Page MenuHomePhabricator

Manage object mailKeys automatically in Mail instead of storing them on objects
ClosedPublic

Authored by epriestley on Apr 23 2018, 9:01 PM.
Tags
None
Referenced Files
F14754374: D19399.id46416.diff
Tue, Jan 21, 3:30 PM
Unknown Object (File)
Sat, Jan 18, 8:01 AM
Unknown Object (File)
Thu, Jan 2, 12:38 AM
Unknown Object (File)
Dec 13 2024, 7:57 PM
Unknown Object (File)
Dec 2 2024, 8:00 PM
Unknown Object (File)
Dec 2 2024, 8:00 PM
Unknown Object (File)
Dec 2 2024, 8:00 PM
Unknown Object (File)
Dec 2 2024, 8:00 PM
Subscribers
Restricted Owners Package

Details

Summary

Ref T13065. mailKeys are a private secret for each object. In some mail configurations, they help us ensure that inbound mail is authentic: when we send you mail, the "Reply-To" is "T123+456+abcdef".

  • The T123 is the object you're actually replying to.
  • The 456 is your user ID.
  • The abcdef is a hash of your user account with the mailKey.

Knowing this hash effectively proves that Phabricator has sent you mail about the object before, i.e. that you legitimately control the account you're sending from. Without this, anyone could send mail to any object "From" someone else, and have comments post under their username.

To generate this hash, we need a stable secret per object. (We can't use properties like the PHID because the secret has to be legitimately secret.)

Today, we store these in mailKey properties on the actual objects, and manually generate them. This results in tons and tons and tons of copies of this same ~10 lines of code.

Instead, just store them in the Mail application and generate them on demand. This change also anticipates possibly adding flags like "must encrypt" and "original subject", which are other "durable metadata about mail transmission" properties we may have use cases for eventually.

Test Plan
  • See next change for additional testing and context.
  • Sent mail about Herald rules (next change); saw mail keys generate cleanly.
  • Destroyed a Herald rule with a mail key, saw the mail properties get nuked.
  • Grepped for getMailKey() and converted all callsites I could which aren't the copy/pasted boilerplate present in 50 places.
  • Used bin/mail receive-test --to T123 to test normal mail receipt of older-style objects and make sure that wasn't broken.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Owners added a subscriber: Restricted Owners Package.Apr 23 2018, 9:01 PM
src/applications/auth/storage/PhabricatorAuthSSHKey.php
73–77

This was just a fake key that nothing actually used. It will be automatically generated by the new system now: we get good behavior for free by just deleting this.

src/applications/metamta/storage/PhabricatorMetaMTAMailProperties.php
84

(We load this with the Omnipotent viewer, which skips the actual check. This object only really implements PhabricatorPolicyInterface so that the Query can be a CursorPagedPolicyAwareQuery.)

amckinley added inline comments.
src/applications/metamta/storage/PhabricatorMetaMTAMailProperties.php
14

I'm not actually worried about this, but have you tested the performance impact of fetching a ton of these JSON blobs and deserializing them, compared to just reading the mailKey out of the already-fetched row? I'm trying to imagine what kind of operations could result in fetching a linear number of these blobs, but nothing leaps to mind. Maybe Herald rule evaluations?

And just out of curiosity, why do we prefer blobs of JSON vs tables with (objectID, propertyName, propertyValue) columns?

40–42

Shouldn't we be checking this the other way around? First see if there's a PhabricatorMetaMTAMailProperties object, and if not, fetch $object->getMailKey() and save the result into the new table?

This revision is now accepted and ready to land.Apr 23 2018, 9:39 PM
src/applications/metamta/storage/PhabricatorMetaMTAMailProperties.php
14

The big advantage of blobs is that we can add more stuff without migrations. Adding new properties to some tables (like HarbormasterBuildLog, recently) potentially means hours and hours of downtime for large installs.

The big advantage of real columns is that we can query on them (and add keys, and join them, etc).

Decoding JSON is normally very fast, and I believe there's no meaningful inherent performance difference between columns and JSON blobs in most cases. I haven't specifically benchmarked it recently, but I can't recall these JSON decodes ever showing up in a profile while doing performance work. JSON is slightly less space-efficient (e.g., column names are repeated) but this usually isn't an issue (we've never hit a case where we optimized a table's size by moving away from JSON, at least that I can recall offhand).

Increasingly, a lot of objects tend to have their major properties as columns (when we need keys, constraints, etc) and then a $properties-type field for whatever other random stuff turns up.

In this case, it's (almost certainly?) clear that nothing will ever want to query/join/constrain mailKey since it's a random secret, so the advantages of column storage are moot. The other two potential things we know about that we may want to store here in the future ("Must Encrypt" state, "Original Subject") also (almost certainly?) don't need to be queried. This table may also become large-ish (the rows are small, but it may have 1 row per object that exists in any other application) so migrations might be expensive in the future. Thus, a bag-of-JSON-properties felt like a better fit than a dedicated column.

14

And just out of curiosity, why do we prefer blobs of JSON vs tables with (objectID, propertyName, propertyValue) columns?

Oh, this is another advantage of the JSON blobs: with reasonable frequency, we use them to store maps or lists.

So we could use an <object, name, value> table, but then the value would need to be JSON ([1, 2, 3]) anyway, or we'd need another <propertyID, uhIndex?, value> or something, and then another one under that if the lists are nested. Basically a big mess that is either super hard/complicated or just ends up storing a lot of JSON anyway.

40–42

I don't want to migrate things passively one at a time because I think it just makes things more complicated (what's in the table is less clear/knowable), and the ultimate migration needs to hit everything anyway since we can't guarantee that passive migration will ever complete.

I don't think it matters too much in this case, but I think doing a trickle-migration tends to make it harder to make changes here (e.g., if nothing else, the table is larger) without any concrete benefits (at least, I can't think of any).

This revision was automatically updated to reflect the committed changes.