Page MenuHomePhabricator

Improve performance of "arc diff" updates for changes with large diff text

Authored by epriestley on Feb 27 2019, 9:49 AM.



See PHI1104. The older "differential.querydiffs" method includes the entire raw diff text for all the diffs associated with a revision in its response, but we: only care about the most recent diff; and don't care about the text at all.

For reasonably large changes with several updates, this can be significantly slow.

We can get this same information more efficiently from the modern "", since D19386 (April 2018). The only trick is that we need a "revisionPHID", which we don't have on hand.

For now, just fetch the revision PHID. In the future, we can likely make adjustments so that we have the revision PHID already by the time we get here.

This may slow down the normal case very slightly (since we now do two calls instead of one), but it speeds up the bad cases dramatically.

Test Plan

Ran arc diff to update a change in a local repository. var_dump()'d the old and new algorithm results, saw the same outcome.

Used arc diff --trace on an update to a change to verify that is called but differential.querydiffs is not.

Diff Detail

rARC Arcanist
Lint Not Applicable
Tests Not Applicable

Event Timeline

amckinley added inline comments.

Just out of curiosity, how could this stuff fail? I see how the control flow gets us back to the legacy version if anything goes wrong, and from your test plan, it's not obvious to me that we know if the new version works.

This revision is now accepted and ready to land.Feb 27 2019, 5:20 PM

The most likely reason the new stuff would fail is that phabricator/ is older than April 2018, so the "commits" attachment does not exist yet (or even older, and "" method does not exist).

My test plan was flimsy and relied on reaching an inserted var_dump() before the first return to know that we'd hit it, but it can easily be made more robust -- I updated the test plan to have a more conclusive test (observe what calls we actually make using --trace) and executed the new plan to double check things.

  • Explain that we anticipate failures primarily because of old server versions.
This revision was automatically updated to reflect the committed changes.
jmeador added inline comments.

any reason this cannot be:

$diff_phid = $revision['fields']['diffPHID'];

And then could use that instead of inferring the diff ID?

Oh, good catch. Yeah, that would be an improvement.

That would also be more correct better than this approach if we ever let you "undo" a diff (T1081) so that the current active diff is not the one with the highest ID, but I don't currently expect that to ever happen since the use cases seem very weak.

Oh, good catch. Yeah, that would be an improvement.

use cases seem very weak.

We’ve actually had a couple requests for that. Primarily because users don’t really understand how base commit rules work.