
Support at-rest encryption in Files
Closed, Resolved · Public

Description

See some prior discussion in D10176.

Some organizations have compliance requirements which require data to be stored encrypted at rest. S3 has a magic header which claims to do this, but it's impossible to prove that it does anything and there is no specific threat it seems clearly capable of mitigating or defusing.

We can put the keys on the Phabricator webserver and do encryption at the Files level instead, which will let us prove that compromising S3 (or any other file data store) alone is no longer sufficient to compromise file data, and that an attacker who stole S3 drives alone would be unable to extract anything useful from them.

This reduces the problem to a different problem of where and how we store the master keys. Naively, putting them on disk on the webserver nodes is only marginally better. You do need to steal two different disks now, which is a bit better, but an attacker who can steal S3 disks can plausibly steal EC2 disks too. There is a narrow range of threat scenarios where this is obviously an improvement (e.g., webservers are first-party and S3 is a blind third-party datastore) but relatively few users are likely covered here.

We don't really need to resolve this since master key storage is clearly a better problem to have, but it would be nice to have a more obvious pathway forward here.

Event Timeline

One thing I've seen people do is something like an encrypted SSH key (for SSL certs):

  • The master key is stored encrypted on the server, using a passphrase that is known only to a select few. When a machine needs to start, a human logs into the machine and types the passphrase. The key is then loaded unencrypted into RAM, but never written to disk.

This is obviously problematic at scale, but "some installs" might be happy with such an approach.

Using enough magic, this can be limited to the "file pipeline tier" or whatever, so the ops cost is somewhat reduced.

My tentative plan is to define keys in local config with some sort of structure like:

"encryption.keys": {
  "my-key-name": {
    "name": "My AES256 Key",
    "type": "aes-256",
    ...
  }
}

For the first key type (likely AES-256 or similar), you'll just put the key material directly in the config. This will have a lot of drawbacks in many setups, but it will sit next to other sensitive information (API keys, etc.) and, to me, seems like the least-bad place to put key plaintext if we have to put it somewhere. In setups where S3 is a blind third-party filestore, this is a material improvement in security.
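As a rough illustration of "key material directly in the config", here is a minimal Python sketch that decodes and validates one entry. The `material.base64` field name and the base64 encoding are assumptions made for the example; the structure above only shows `name` and `type`.

```python
import base64
import os

# Hypothetical field name: the proposed structure only shows "name" and "type".
MATERIAL_FIELD = "material.base64"


def generate_entry(name: str) -> dict:
    """Build a keyring entry holding 32 random bytes of key material."""
    material = base64.b64encode(os.urandom(32)).decode("ascii")
    return {"name": name, "type": "aes-256", MATERIAL_FIELD: material}


def load_aes256_key(entry: dict) -> bytes:
    """Decode one entry and check that it carries exactly 256 bits of key."""
    if entry.get("type") != "aes-256":
        raise ValueError(f"unsupported key type: {entry.get('type')!r}")
    raw = base64.b64decode(entry[MATERIAL_FIELD], validate=True)
    if len(raw) != 32:
        raise ValueError(f"AES-256 needs 32 bytes of key material, got {len(raw)}")
    return raw


# Usage: a config fragment keyed by key name, as in the structure above.
config = {"my-key-name": generate_entry("My AES256 Key")}
master_key = load_aes256_key(config["my-key-name"])
```

Rejecting short or malformed material at load time means a misconfigured key fails at startup rather than on the first write.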

Later, we could add an aes-256-password type or similar, where you store encrypted key material and then unlock it in RAM with a passphrase at startup, as above. I'm not sure we have a great place to put the unlock UI if you have more than one machine, but we can deal with that when we get there.
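A minimal sketch of the passphrase-unlock idea, using only the Python standard library. PBKDF2 and HMAC are real stdlib calls, but the XOR wrap is a deliberately simple stand-in for a proper AES key wrap, and the record layout, function names, and iteration count are all assumptions for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, tune for real hardware


def _derive(passphrase: str, salt: bytes):
    """Stretch the passphrase into a 32-byte mask and a 32-byte MAC key."""
    out = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS, dklen=64
    )
    return out[:32], out[32:]


def lock_key(master: bytes, passphrase: str) -> dict:
    """Produce the record that would be stored on disk (no key plaintext)."""
    salt = os.urandom(16)
    mask, mac_key = _derive(passphrase, salt)
    # Stand-in for AES key wrap: XOR the 32-byte key with the derived mask.
    wrapped = bytes(a ^ b for a, b in zip(master, mask))
    tag = hmac.new(mac_key, wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}


def unlock_key(record: dict, passphrase: str) -> bytes:
    """Recover the master key in memory, or fail on a wrong passphrase."""
    mask, mac_key = _derive(passphrase, record["salt"])
    expected = hmac.new(mac_key, record["wrapped"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, record["tag"]):
        raise ValueError("wrong passphrase or corrupted key record")
    return bytes(a ^ b for a, b in zip(record["wrapped"], mask))
```

At startup an operator would supply the passphrase once, and only the result of `unlock_key` would be held in RAM. The tag check lets a typo fail loudly instead of silently producing a wrong key.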

I'm also planning to encrypt the data blocks themselves with a per-block key, then encrypt that key with the master key. This would let us do an aes-256-keyserver type later, where web servers send encrypted key text to the keyserver, get key plaintext back, then decrypt blobs locally. It also lets you rotate the master key with relatively less re-encryption of data.
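The two-level scheme can be sketched as follows. This is a structural illustration only: `toy_cipher` is a SHA-256 keystream XOR used so the example runs on the standard library, and it is not AES-256-CBC. The function names are hypothetical; the metadata field names mirror the `storage` metadata shown later in this thread.

```python
import base64
import hashlib
import os


def toy_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. NOT AES, just a stand-in.

    XOR is its own inverse, so the same call both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


def encrypt_block(master_keys: dict, key_name: str, plaintext: bytes):
    """Encrypt data under a fresh per-block key, then wrap that key."""
    block_key, block_iv = os.urandom(32), os.urandom(16)
    ciphertext = toy_cipher(block_key, block_iv, plaintext)
    wrap_iv = os.urandom(16)
    payload = toy_cipher(master_keys[key_name], wrap_iv, block_key + block_iv)
    metadata = {
        "key.name": key_name,
        "iv.base64": b64(wrap_iv),
        "payload.base64": b64(payload),
    }
    return ciphertext, metadata


def decrypt_block(master_keys: dict, ciphertext: bytes, metadata: dict) -> bytes:
    """Unwrap the per-block key with the named master key, then decrypt."""
    master = master_keys[metadata["key.name"]]
    wrap_iv = base64.b64decode(metadata["iv.base64"])
    envelope = toy_cipher(master, wrap_iv, base64.b64decode(metadata["payload.base64"]))
    block_key, block_iv = envelope[:32], envelope[32:]
    return toy_cipher(block_key, block_iv, ciphertext)
```

Because only the small envelope depends on the master key, a keyserver variant would need to see just `payload.base64`, never the file data itself.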

(We can also do true client-side encryption in the future when we do it for Passphrase (T4721) but I think there's not much need/interest for this today.)

I'm planning to do this:

  • Add keyring, a new configuration option similar to the structure above, for storing encryption keys.
  • Let you specify a key as a "default" key. If you do, AES256 will be enabled for all writes going forward.
  • Provide bin/files encode --as aes-256-cbc F123, to explicitly change the storage format of a file. This is primarily for testing. This reads, encodes, and writes the entire file data (up to 4MB per chunk).
  • Provide bin/files cycle F123, to update the key encoding on applicable storage formats. In particular, if your master key leaks, you can add a new master, make it the default, cycle every file, then remove the bad master. This reads and writes only metadata (~100 bytes per chunk) and doesn't need to touch the storage engine, so it will be much faster than re-encoding everything.

Both files encode and files cycle will have --all options for enrolling an install after configuration.
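To make the leaked-master-key workflow concrete, here is a small Python sketch of cycling: each file's wrapped block key is unwrapped with the old master and re-wrapped with the new default, and the stored file data is never read. As an assumption for the example, `toy_cipher` is a stdlib-only stand-in rather than AES, and the key names, helper names, and in-memory `files` dict are hypothetical.

```python
import hashlib
import os


def toy_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """SHA-256 keystream XOR. NOT AES; the same call encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap_block_key(keyring: dict, key_name: str, block_key: bytes) -> dict:
    """Encrypt a per-block key under the named master key with a fresh IV."""
    iv = os.urandom(16)
    payload = toy_cipher(keyring[key_name], iv, block_key)
    return {"key.name": key_name, "iv": iv, "payload": payload}


def unwrap_block_key(keyring: dict, metadata: dict) -> bytes:
    master = keyring[metadata["key.name"]]
    return toy_cipher(master, metadata["iv"], metadata["payload"])


def cycle(keyring: dict, default_name: str, metadata: dict) -> dict:
    """Re-wrap one file's block key under the default master key."""
    return wrap_block_key(keyring, default_name, unwrap_block_key(keyring, metadata))


# Usage: two files whose block keys are wrapped under a master that leaked.
keyring = {"leaked.2016": os.urandom(32)}
block_keys = {name: os.urandom(32) for name in ("F123", "F124")}
files = {n: wrap_block_key(keyring, "leaked.2016", k) for n, k in block_keys.items()}

keyring["fresh.2016"] = os.urandom(32)  # 1. add a new master, make it default
files = {n: cycle(keyring, "fresh.2016", md) for n, md in files.items()}  # 2. cycle
del keyring["leaked.2016"]  # 3. remove the bad master
```

Each `cycle` call touches only a few dozen bytes of metadata per file, which is why it can be much cheaper than re-encoding the data.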

The format of keyring will probably change in the future, because it will lack ways to specify keys per-storage-engine, for Passphrase, at night, during a full moon, etc. By configuring it, you agree not to complain too much when it changes format next year. I don't have a clear enough sense of the use cases to really anticipate what this will look like in the long run, so I'm just going to do something simple for now.

I've deployed this and configured this server to use AES-256-CBC, although I haven't retroactively migrated existing data.

You can see that newly uploaded files like F1689442 show "Encrypted (AES-256-CBC)" in the "Storage" tab:

[Screenshot: encrypted.png]

The plaintext of the file is "abcdefghijklmnopqrstuvwxyz", and clicking "View File" in the web UI should download the plaintext properly, but the underlying data in the filestore is encrypted:

mysql> select * from file_storageblob where id = 179154;
+--------+----------------------------------+-------------+--------------+
| id     | data                             | dateCreated | dateModified |
+--------+----------------------------------+-------------+--------------+
| 179154 | :??;O{M?r3u?XQTmob?I?)ם?gjR |  1466090159 |   1466090159 |
+--------+----------------------------------+-------------+--------------+
1 row in set (0.00 sec)

The metadata for the file has a unique key + IV, stored encrypted with the master key and a unique IV:

mysql> select * from file where id = 1689442\G
*************************** 1. row ***************************
              id: 1689442
            phid: PHID-FILE-of45k23jmaiz2ohceoxx
            name: alphabet.txt
        mimeType: text/plain
        byteSize: 27
   storageEngine: blob
   storageFormat: aes-256-cbc
   storageHandle: 179154
...
        metadata: {"storage":{"key.name":"secure.20160616","iv.base64":"Uak3jdtkTLarkGYeKKzIhQ==","payload.base64":"t0Ni\/5d59ktugI\/k9iqb1pOPI56tamGTyZ515iDcQt2+qQGYr1G8emMM8T4GrRshqdXmNQK+le4+t+HNMsQrjaV786wbwY7acMUvqnRzlr7a2NOymKUX56KnrXe4X7cKdAtMKPRmqw5FHZ6Cpdxl2Q=="}}
...
1 row in set (0.00 sec)
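For anyone inspecting a row like this by hand, a short Python sketch that parses the `metadata` column and decodes the base64 fields. The JSON string is copied from the output above; `decode_storage` is a hypothetical helper name, and nothing here decrypts the payload, since that would need the master key.

```python
import base64
import json

# The metadata column from the row above, copied verbatim.
RAW_METADATA = r'''{"storage":{"key.name":"secure.20160616","iv.base64":"Uak3jdtkTLarkGYeKKzIhQ==","payload.base64":"t0Ni\/5d59ktugI\/k9iqb1pOPI56tamGTyZ515iDcQt2+qQGYr1G8emMM8T4GrRshqdXmNQK+le4+t+HNMsQrjaV786wbwY7acMUvqnRzlr7a2NOymKUX56KnrXe4X7cKdAtMKPRmqw5FHZ6Cpdxl2Q=="}}'''


def decode_storage(raw_metadata: str) -> dict:
    """Return the master key name plus the decoded IV and payload bytes."""
    storage = json.loads(raw_metadata)["storage"]
    return {
        "key_name": storage["key.name"],
        "iv": base64.b64decode(storage["iv.base64"]),
        "payload": base64.b64decode(storage["payload.base64"]),
    }


decoded = decode_storage(RAW_METADATA)
print(decoded["key_name"], len(decoded["iv"]), len(decoded["payload"]))
```

The IV decodes to 16 bytes, which is one AES block, and `key.name` tells you which master key would be needed to unwrap the payload.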

To get started, see:

Barring any issues, this covers everything I plan to cover at this time, so I'm pausing it for feedback.


Goal              Ref             Time       Notes
Formats           D16122          1 Hour     Prepare Files for modular storage formats.
AES256            D16123, D16124  0.5 Hours  Implement AES-256-CBC.
Keyring           D16127          2 Hours    Keyring configuration, documentation, support utilities.
Subtotal                          3.5 Hours
Cumulative Total                  3.5 Hours

epriestley claimed this task.

This is accounted for and we haven't run into any issues upstream with it in production after using it for about two weeks, but let us know if anyone runs into issues deploying it.