Page MenuHomePhabricator

Apply inverse edge edits after committing primary object edits
ClosedPublic

Authored by epriestley on Jan 14 2019, 9:03 PM.
Tags
None
Referenced Files
F14804676: D19969.id47665.diff
Sat, Jan 25, 5:43 PM
F14795674: D19969.id47761.diff
Sat, Jan 25, 3:08 AM
F14795673: D19969.id47666.diff
Sat, Jan 25, 3:08 AM
F14795672: D19969.id47665.diff
Sat, Jan 25, 3:08 AM
F14795671: D19969.id.diff
Sat, Jan 25, 3:08 AM
Unknown Object (File)
Fri, Jan 24, 7:09 AM
Unknown Object (File)
Tue, Jan 21, 3:34 PM
Unknown Object (File)
Tue, Jan 21, 3:25 PM
Subscribers
None

Details

Summary

Fixes T13082. When you create a revision (say, D111) with Ref T222 in the body, we write a D111 -> T222 edge ("revision 111 references task 222") and an inverse T222 -> D111 edge ("task 222 is referenced by revision 111").

We also apply a transaction to D111 ("alice added a task: Txxx.") and an inverse transaction to T222 ("alice added a revision: Dxxx").

Currently, it appears that the inverse transaction can sometimes generate mail faster than D111 actually commits its (database) transactions, so the mail says "alice added a revision: Unknown Object (Differential Revision)". See T13082 for evidence that this is true, and a reproduction case.

To fix this, apply the inverse transaction (to T222) after we commit the main object (here, D111).

This is tricky because when we apply transactions, the transaction editor automatically "fixes" them to be consistent with the database state. For example, if a task already has title "XYZ" and you set the title to "XYZ" (same title), we just no-op the transaction.

It also fixes edge edits. The old sequence was:

  • Open (database) transaction.
  • Apply our transaction ("alice added a task").
  • Apply the inverse transaction ("alice added a revision").
  • Write the edges to the database.
  • Commit (database) transaction.

Under this sequence, the inverse transaction was "correct" and didn't need to be fixed, so the fixing step didn't touch it.

The new sequence is:

  • Open (database) transaction.
  • Apply our transaction ("alice added a task").
  • Write the edges.
  • Commit (database) transaction.
  • Apply the inverse transaction ("alice added a revision").

Since the inverse transaction now happens after the database edge write, the fixing step detects that it's a no-op and throws it away if we do this naively.

Instead, add some special cases around inverse edits to skip the correction/fixing logic, and just pass the "right" values in the first place.

Test Plan

Added and removed related tasks from revisions, saw appropriate transactions render on both objects.

(It's hard to be certain this completely fixes the issue since it only happened occasionally in the first place, but we can see if it happens any more on secure.)

Diff Detail

Repository
rP Phabricator
Branch
mfa13
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 21519
Build 29321: Run Core Tests
Build 29320: arc lint + arc unit

Event Timeline

  • Correct "remove an inverse edge" transaction.

Some additional context -- normally, when we apply a transaction, we populate only the "new" value. So we pass an object like this into the Editor:

{
  "type": "title",
  "old": <no value yet>,
  "new": "Feed The Zoo Animals"
}

The Editor populates the old value by reading the object. In most cases, this is: easier (we don't have to figure it out ourselves); and more correct (we can't get it wrong). The "get it wrong" part is more of an issue with users calling the API, but we could make mistakes in first-party code too.

So the transaction gets turned into this inside the Editor, before it's actually applied/saved:

{
  "type": "title",
  "old": "Feed The Zoo Anermls",
  "new": "Feed The Zoo Animals"
}

Sometimes, the new value is also modified. This is less common now in new/modern code, since it tends to make API stuff really messy/hard/bad. However, a recent example was the Pholio transactions in D19924 and related changes. In those cases, we passed in something like:

{
  "type": "image",
  "old": <no value yet>,
  "new": <SomeObject>
}

...and the <SomeObject> got mangled down to a PHID inside the Editor. Again, we're generally moving away from this and most modern transactions don't touch the new value, or touch it only in an obvious/trivial way.

However, edges are old and complicated and get hit by basically every step here and have their data mangled in many wonderful ways. Until T13056 is cleared, this is hard to get rid of.

About a year ago, in T13051, we also hit some issues where some objects with a very large number of edges were taking up way too much storage space, since each transaction contained the entire old edge list and entire new edge list. At the time, the storage format was always:

{
  "type": "edge",
  "old": [
    "PHID-X": {
      "src": ...
      "dst": ...
      ...
    },
    "PHID-Y": {
      ...
    }
  ],
  "new": ...
}

...and so on. T13051 has a better example. So these transactions were huge (adding the 201st edge meant we stored 200 "old" dictionaries and 201 "new" dictionaries).

T13051 added a new "compact" storage format, and some translation code to let everything pretend we're still using the old "tons of huge dictionaries" format.

And, to make things worse, edges are edited with values like this:

"new": {
  "+": ...
  "-": ...
  "=": ...
}

..where + means "add these edges", - means "remove these edges", and = means "set edges to this list".

So here, where we're trying to add an edge, the full path is something like:

We build a new value to "add edge X":

{
  "+": {"X": true}
}

We put it in a transaction:

{
  "old": <no value yet>,
  "new": {"+": {"X": true}}
}

We pass that into the Editor. It loads the list of actual edges, then builds an "old" value and a "new" value by applying the adds/removes/sets to that list. Say the current list is <Y, Z>, we get:

{

  "old": [Y, Z],
  "new": [Y, Z, X]
}

...except those aren't actually PHIDs -- they're dictionaries with src, dst, etc.

Then things go along normally for a while and before we save the transaction, the dictionaries get flattened into plain PHIDs.


Okay. So the issue is that at the step where {"+": [X]} becomes old: [Y, Z], new: [Y, Z, X]. The Editor looks in the database and sees that X is already an edge (since this diff has reordered things), so it builds old: [X, Y, Z], applies the "+", gets new: [X, Y, Z]. They're the same so another piece of code throws the transaction away.

This patch computes and provides the raw storage "old" and "new" values as plain old lists of PHIDs, sets them both on the transaction, and then just adds a handful of "if (inverse) { don't do anything, stop, don't touch it, don't throw it away, I promise it's fine, hands off }" logic to skip all the other transform/flatten/expand/reduce/discard steps. This is kind of "bad" in some senses, but the whole mess should un-mess a bit some day after T13056 allows us to get rid of "data" (which is why most of these steps exist) and we can remove most of these transform steps and just have everything operate on lists of PHIDs.

I claim with only limited reservations that I totally understand this change.

This revision is now accepted and ready to land.Jan 16 2019, 11:29 PM

I'll probably hold this until after the release cut -- we have a lot of stuff in queue and the bug this fixes is very small/rare, I just have a personal vendetta against it.

This revision was automatically updated to reflect the committed changes.