Not really sure, to be honest
Details
- Reviewers
reed • danielmcgrath - Group Reviewers
Blessed Reviewers - Required Signatures
L28 Phacility Individual Contributor License Agreement
Diff Detail
- Repository
- rP Phabricator
- Branch
- arcpatch-D10176
- Lint
Lint Warnings Severity Location Code Message Warning src/applications/files/engine/PhabricatorS3FileStorageEngine.php:76 XHP9 Naming Conventions Warning src/applications/files/engine/PhabricatorS3FileStorageEngine.php:77 XHP9 Naming Conventions Warning src/applications/files/engine/PhabricatorS3FileStorageEngine.php:169 TXT3 Line Too Long Warning src/applications/files/engine/PhabricatorS3FileStorageEngine.php:180 TXT3 Line Too Long - Unit
Tests Skipped - Build Status
Buildable 7368 Build 16103: Run Core Tests Build 7776: [Placeholder Plan] Wait for 30 Seconds Build 7775: arc lint + arc unit
Event Timeline
Can you walk me through why you'd want to enable this feature? I don't understand what the advantage is.
The desire for encryption at rest comes from a desire to protect any company-sensitive or user data that we store. Our policy for such information is always "better safe than sorry". Since the S3 storage engine already supports setting the encryption header when putting an object, the simplicity of adding that functionality into the StorageEngine seemed like a no-brainer to allow us to switch to using S3 storage for Phabricator.
I realize that it is a somewhat obscure use case, but is a blocker for us making the switch.
Thanks for taking the time to look at this! Much appreciated.
Since this encryption is completely transparent, I'm just having hard time coming up with a plausible scenario where it improves anything.
The best I can come up with is that if someone breaches physical security at an Amazon datacenter and grabs a bunch of storage media, this might prevent them from accessing the files, depending on how keys are stored. But we have no way of knowing, and no way to prove that the AWS-side implementation of this header isn't "set a metadata flag and ignore the header".
We are also hard-pressed to prove any implementation of this header works -- if we accidentally got this wrong and sent an x-amz-sarvur-sied-enkrypshun header (that is, accidentally misspelled the header name), there would be almost no evidence that the implementation wasn't functioning. (Maybe your test plan alludes to that?) From the documentation, it seems like it will cause some metadata to be set, but we have no way to know what else it does (if anything).
I don't want to upstream "encryption" features which don't have a practical security benefit, because I worry they're misleading.
We could implement real encryption, where Phabricator controls the keys and the storage engine does not have them. This would protect against a pretty vague but slightly-more-plausible set of threats (compromised network link between you and AWS?) and maybe against some types of physical hardware access (if we stored keys ephemerally, although this seems nontrivial -- some discussion in T4721).
We could also implement client-side encryption, although this would have to be selective -- we can't reasonably do client-side encryption of profile pictures, for example, since no one wants to provide a bunch of decryption keys to view a page.
So, overall:
- If this is purely an organizational decree -- "it says encryption, so we have to enable it" -- I don't want to upstream it. I worry it makes configuration more complex and only provides an illusion of security. You could maintain a local patch, or you could fork the storage engine. Some discussion in T5843 about making this easier in the future.
- If there are good reasons to enable this which I'm not aware of (e.g., the "someone steals AWS disks" is a real threat you're concerned about, or there's more information about the implementation of this feature that I'm missing, or there's some other threat I'm missing), and there are no drawbacks, I think we should enable it unconditionally. I don't see a threat which this realistically protects against, though. We don't know that it really protects against theft of physical disks, and attackers who steal physical disks could just as conceivably steal the machine with your AWS credentials on it and then use the S3 API to read your files, at least in theory.
- I would only want to add this as configuration if there are strong arguments for using it, and also significant drawbacks (like performance). The tradeoffs should be documented in this case, and the documentation should help installs make a decision about whether or not to enable the option. The provided documentation isn't helpful in understanding why you'd want the flag, other than "someone said I have to enable it".
- If your organization is concerned about the security of file storage, we're happy to talk about real solutions which let you prove your data is actually encrypted, give you control of encryption keys, and allow you to make assertions about the kinds of threats you are and are not protected from.
Just pushing this back in your queue, upshot is:
- If there's a solid argument for how this materially improves security...
- ...and it has no downsides, we'd prefer to upstream it as always on.
- ...but enabling the option has downsides (performance?), we'd accept this patch roughly as-is but with more discussion about when to enable or disable the feature. (Also, as far as I can tell, AES256 is the only valid option for the header value.)
- If the argument is more along the lines of "better safe than sorry", we'd prefer not to upstream it, since it's unprovable:
- ...you can maintain a one-line local fork, or implement a custom S3WithSSE engine (T4721).
- ...or we can talk about other ways to satisfy your requirements that make more sense in the upstream (client control of keys, real provable security).
Thanks for taking the time to write all of that up! You raise some good points.
As far as proving this improves security, you're correct in that there's no real way to be sure that data is being encrypted, since the objective of S3's SSE is to be transparent to the user. However, if you look at this blog post announcing the feature way back when, there's some useful tidbits included:
- we generate a unique key, encrypt your data with the key, and then encrypt the key with a master key
- keys are stored in hosts that are separate and distinct from those used to store your data
- The entire encryption, key management, and decryption process is inspected and verified internally on a regular basis as part of our existing audit process.
But at some point, yes, it does require trusting that Amazon is capable of handling the encryption.
Regarding downsides, I could try to gather some performance metrics on S3's encryption. A cursory google didn't reveal much. There's no extra cost associated, though.
Using this feature is basically trusting Amazon's network security, personnel, electronic security, and claims about their security systems, but not their physical security.
It makes perfect sense as a solution to a problem like "we want to comply with data protection regulations or policies which are drafted in such a way that transparent, unprovable server-side encryption technically satisfies the letter of the law and/or policy", and I understand that some organizations realistically exist in corporate or regulatory climates where that's a real problem and this is a good solution to it, but it's not the kind of problem we're currently interested in solving in the upstream, especially since relatively-reasonable workarounds are available.
(We're very open to solving the non-regulatory version of this problem -- "we want to protect our data" -- but that's more complicated, and I could imagine that a more provably secure system might not actually comply with regulations or policies, and thus not solve the actual problem. Particularly, it would probably be time consuming and complicated to get anything we built externally audited.)
Note that this diff implements all three forms of AWS-supported SSE (Server-Side Encryption), including SSE-C, which allows for customer-specified encryption keys. See https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html for more information. You are still trusting Amazon to do the right thing here. https://d0.awsstatic.com/whitepapers/AWS_Securing_Data_at_Rest_with_Encryption.pdf and https://d0.awsstatic.com/whitepapers/KMS-Cryptographic-Details.pdf go into detail on how they do that.
Anyway, I just found out that Phabricator is using https://github.com/tpyo/amazon-s3-php-class for handling S3 actions. As per https://github.com/tpyo/amazon-s3-php-class/issues/96, that class doesn't support Signature Version 4, which is required for SSE-KMS to work correctly. In order for that to be useful, Phabricator would need to swap to using https://github.com/aws/aws-sdk-php (or some other implementation that includes support).
The response headers of any objects coming from S3 show how it was encrypted, which key ID (when using SSE-KMS) was used, etc. You can also check in the AWS console as well and look at the object itself.
We could implement real encryption, where Phabricator controls the keys and the storage engine does not have them. This would protect against a pretty vague but slightly-more-plausible set of threats (compromised network link between you and AWS?) and maybe against some types of physical hardware access (if we stored keys ephemerally, although this seems nontrivial -- some discussion in T4721).
https://github.com/aws/aws-sdk-php/issues/831 seeks to add client-side (where Phabricator is the client) encryption to the aws-sdk for PHP, which could help with this. It's already part of the Ruby, Java, and .NET SDKs, so hopefully somebody will implement it for PHP as well.
- If there are good reasons to enable this which I'm not aware of (e.g., the "someone steals AWS disks" is a real threat you're concerned about, or there's more information about the implementation of this feature that I'm missing, or there's some other threat I'm missing), and there are no drawbacks, I think we should enable it unconditionally. I don't see a threat which this realistically protects against, though. We don't know that it really protects against theft of physical disks, and attackers who steal physical disks could just as conceivably steal the machine with your AWS credentials on it and then use the S3 API to read your files, at least in theory.
The former is a threat for me, sadly. I also have compliance reasons as well for certain things. Also, I would hope they don't store credentials in plain text.
- I would only want to add this as configuration if there are strong arguments for using it, and also significant drawbacks (like performance). The tradeoffs should be documented in this case, and the documentation should help installs make a decision about whether or not to enable the option. The provided documentation isn't helpful in understanding why you'd want the flag, other than "someone said I have to enable it".
I've improved the documentation somewhat, though it can be improved more.
- If your organization is concerned about the security of file storage, we're happy to talk about real solutions which let you prove your data is actually encrypted, give you control of encryption keys, and allow you to make assertions about the kinds of threats you are and are not protected from.
SSE-C does give me control of my encryption keys, if I trust AWS to do it correctly.
You are still trusting Amazon to do the right thing here.
I'm not comfortable with bringing any encryption approach which relies on this upstream.
I'm open to implementing encryption which does not require Amazon to do the right thing -- where Phabricator manages the keys, handles encryption and decryption, and stores only encrypted data in the filestore. I want to implement this at the "Files Application" level, not the S3 API level, so that it is usable with any current or future filestore, and encrypted data can be migrated between filestores without performing encryption operations. (The data storage format partially anticipates such extensions already.)
If you'd like to move forward here, open a feature request (see Contributing Feature Requests).
To set expectations:
First, I'm not comfortable with "put the key in plaintext on a different disk", as in this diff, being the only approach to key management we provide. It's OK as a "I don't really care that much, I'm just doing this for compliance" baseline, and does materially improve security (we no longer need to trust that Amazon is actually encrypting data, and an attack against datacenter facilities would need to steal two different disks), but I'd like to at least have plans in place to allow installs to upgrade to more secure, ephemeral key storage which resists theft of an arbitrary number of disks. I don't immediately have a sound technical plan for providing this. For example, I'm not sure how to prevent an in-memory key from being paged to disk as part of virtual memory with the standard tools we ship with (PHP / MySQL), and I'm not aware of any small standalone solutions for this which we could reasonably bundle as a dependency.
Second, this is a very low priority for us. We've seen little interest in it, and that interest seems largely motivated by compliance concerns. Installs that need this for compliance can use local disk storage and check the "Compliance!" checkbox in EBS, or use MySQL storage with RDS and check the "Compliance!" checkbox for RDS, or provide an S3WithAComplianceCheckbox storage engine, or fork and add this "X-Compliance" header.
Broadly, I am unwilling to bring any "Compliance" checkboxes upstream. Anything that we put our names on as "encryption" or "security" needs to represent a best effort to actually provide security in practice, not just compliance.
I've improved the documentation somewhat, though it can be improved more.
This explains what these options do, but doesn't help users decide whether to enable them or motivate their inclusion in the upstream.
As currently described, enabling SSE is always strictly better than not enabling it. If this is true, why would we have an option at all, instead of just always enabling it? And why would we disable this option by default, when enabled is strictly better in all use cases?
See some discussion in T8227.