Manage object mailKeys automatically in Mail instead of storing them on objects
ClosedPublic

Authored by epriestley on Apr 23 2018, 9:01 PM.
Subscribers
Restricted Owners Package

Details

Summary

Ref T13065. A mailKey is a private secret for each object. In some mail configurations, mailKeys help us ensure that inbound mail is authentic: when we send you mail, the "Reply-To" is "T123+456+abcdef".

  • The T123 is the object you're actually replying to.
  • The 456 is your user ID.
  • The abcdef is a hash of your user account with the mailKey.

Knowing this hash effectively proves that Phabricator has sent you mail about the object before, i.e. that you legitimately control the account you're sending from. Without this, anyone could send mail to any object "From" someone else, and have comments post under their username.

To generate this hash, we need a stable secret per object. (We can't use properties like the PHID because the secret has to be legitimately secret.)
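The scheme above can be sketched as follows. This is a hypothetical illustration in Python (Phabricator itself is PHP), with invented names (`reply_address`, `verify_reply_address`) and an invented digest construction; the real implementation differs in detail:

```python
import hashlib
import hmac

def reply_address(object_monogram, user_id, mail_key):
    """Build a 'T123+456+abcdef'-style reply-address token.

    The hash binds the recipient's user ID to the object's secret
    mailKey, so only someone Phabricator has actually mailed about
    this object can produce a valid token.
    """
    digest = hmac.new(
        mail_key.encode(),
        str(user_id).encode(),
        hashlib.sha256,
    ).hexdigest()[:16]
    return "{}+{}+{}".format(object_monogram, user_id, digest)

def verify_reply_address(token, mail_key):
    """Check an inbound token against the object's stored mailKey."""
    monogram, user_id, _digest = token.split("+")
    expected = reply_address(monogram, user_id, mail_key)
    # Constant-time comparison avoids leaking the digest via timing.
    return hmac.compare_digest(token, expected)
```

A forged "From" header is useless here: without the object's mailKey, the sender cannot compute a digest that verifies.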

Today, we store these in mailKey properties on the actual objects, and manually generate them. This results in tons and tons and tons of copies of this same ~10 lines of code.

Instead, just store them in the Mail application and generate them on demand. This change also anticipates possibly adding flags like "must encrypt" and "original subject", which are other "durable metadata about mail transmission" properties we may have use cases for eventually.
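A minimal sketch of the "generate on demand" idea, using an in-memory dict as a hypothetical stand-in for the new properties table (the class and method names here are invented, not Phabricator's actual API):

```python
import secrets

class MailPropertiesStore:
    """One properties row per object PHID, created lazily.

    Stand-in for a database table: the mailKey is generated the
    first time anything asks for it, instead of every object type
    carrying its own copy-pasted mailKey column and generator.
    """

    def __init__(self):
        self._rows = {}  # phid -> {"mailKey": ...}

    def load_mail_key(self, phid):
        # Generate and persist a key on first request.
        row = self._rows.setdefault(phid, {})
        if "mailKey" not in row:
            row["mailKey"] = secrets.token_hex(10)
        return row["mailKey"]

    def destroy(self, phid):
        # Mirrors nuking the mail properties when the object
        # itself is destroyed.
        self._rows.pop(phid, None)
```

Because the key is stable once generated, previously sent reply addresses keep verifying; because it lives in one place, "must encrypt" or "original subject" flags could later share the same row.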

Test Plan
  • See next change for additional testing and context.
  • Sent mail about Herald rules (next change); saw mail keys generate cleanly.
  • Destroyed a Herald rule with a mail key, saw the mail properties get nuked.
  • Grepped for getMailKey() and converted all the call sites I could that aren't part of the copy/pasted boilerplate present in ~50 places.
  • Used bin/mail receive-test --to T123 to test normal mail receipt of older-style objects and make sure that wasn't broken.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Owners added a subscriber: Restricted Owners Package. (Apr 23 2018, 9:01 PM)
src/applications/auth/storage/PhabricatorAuthSSHKey.php
73–77

This was just a fake key that nothing actually used. It will be automatically generated by the new system now: we get good behavior for free by just deleting this.

src/applications/metamta/storage/PhabricatorMetaMTAMailProperties.php
84

(We load this with the Omnipotent viewer, which skips the actual check. This object only really implements PhabricatorPolicyInterface so that the Query can be a CursorPagedPolicyAwareQuery.)

amckinley added inline comments.
src/applications/metamta/storage/PhabricatorMetaMTAMailProperties.php
14

I'm not actually worried about this, but have you tested the performance impact of fetching a ton of these JSON blobs and deserializing them, compared to just reading the mailKey out of the already-fetched row? I'm trying to imagine what kind of operations could result in fetching a linear number of these blobs, but nothing leaps to mind. Maybe Herald rule evaluations?

And just out of curiosity, why do we prefer blobs of JSON vs tables with (objectID, propertyName, propertyValue) columns?

40–42

Shouldn't we be checking this the other way around? First see if there's a PhabricatorMetaMTAMailProperties object, and if not, fetch $object->getMailKey() and save the result into the new table?

This revision is now accepted and ready to land. (Apr 23 2018, 9:39 PM)
src/applications/metamta/storage/PhabricatorMetaMTAMailProperties.php
14

The big advantage of blobs is that we can add more stuff without migrations. Adding new properties to some tables (like HarbormasterBuildLog, recently) potentially means hours and hours of downtime for large installs.

The big advantage of real columns is that we can query on them (and add keys, and join them, etc).

Decoding JSON is normally very fast, and I believe there's no meaningful inherent performance difference between columns and JSON blobs in most cases. I haven't specifically benchmarked it recently, but I can't recall these JSON decodes ever showing up in a profile while doing performance work. JSON is slightly less space-efficient (e.g., column names are repeated) but this usually isn't an issue (we've never hit a case where we optimized a table's size by moving away from JSON, at least that I can recall offhand).

Increasingly, a lot of objects tend to have their major properties as columns (when we need keys, constraints, etc) and then a $properties-type field for whatever other random stuff turns up.

In this case, it's (almost certainly?) clear that nothing will ever want to query/join/constrain mailKey since it's a random secret, so the advantages of column storage are moot. The other two potential things we know about that we may want to store here in the future ("Must Encrypt" state, "Original Subject") also (almost certainly?) don't need to be queried. This table may also become large-ish (the rows are small, but it may have 1 row per object that exists in any other application) so migrations might be expensive in the future. Thus, a bag-of-JSON-properties felt like a better fit than a dedicated column.

14

And just out of curiosity, why do we prefer blobs of JSON vs tables with (objectID, propertyName, propertyValue) columns?

Oh, this is another advantage of the JSON blobs: with reasonable frequency, we use them to store maps or lists.

So we could use an <object, name, value> table, but then the value would need to be JSON ([1, 2, 3]) anyway, or we'd need another <propertyID, uhIndex?, value> or something, and then another one under that if the lists are nested. Basically a big mess that is either super hard/complicated or just ends up storing a lot of JSON anyway.
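To illustrate the point, here is a hypothetical comparison in Python (the PHIDs and property names are invented): with a single JSON blob, nested values round-trip directly, while an <object, name, value> table ends up storing JSON in its value column anyway as soon as a property is a list or map.

```python
import json

# One JSON blob per object: maps and lists nest naturally.
blob_row = {
    "objectPHID": "PHID-XXXX-abcd",
    "mailProperties": json.dumps({
        "mailKey": "abcdef0123456789",
        "subjectHistory": ["Original subject", "Edited subject"],
    }),
}

# An <object, name, value> table: each value must itself be JSON
# once a property is non-scalar, so we store JSON either way, just
# spread across more rows that need joining back together.
eav_rows = [
    ("PHID-XXXX-abcd", "mailKey",
     json.dumps("abcdef0123456789")),
    ("PHID-XXXX-abcd", "subjectHistory",
     json.dumps(["Original subject", "Edited subject"])),
]
```

The blob form trades queryability (which a random secret never needs) for schema flexibility: new properties appear with no migration.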

40–42

I don't want to migrate things passively one at a time because I think it just makes things more complicated (what's in the table is less clear/knowable), and the ultimate migration needs to hit everything anyway since we can't guarantee that passive migration will ever complete.

I don't think it matters too much in this case, but I think doing a trickle-migration tends to make it harder to make changes here (e.g., if nothing else, the table is larger) without any concrete benefits (at least, I can't think of any).

This revision was automatically updated to reflect the committed changes.