Page MenuHomePhabricator

Improve performance of "arc diff" updates for changes with large diff text
ClosedPublic

Authored by epriestley on Feb 27 2019, 9:49 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Nov 23, 9:21 AM
Unknown Object (File)
Fri, Nov 22, 9:47 PM
Unknown Object (File)
Tue, Nov 19, 8:50 PM
Unknown Object (File)
Tue, Nov 19, 7:44 AM
Unknown Object (File)
Mon, Nov 18, 6:05 AM
Unknown Object (File)
Fri, Nov 15, 1:09 AM
Unknown Object (File)
Wed, Nov 13, 1:42 AM
Unknown Object (File)
Tue, Nov 12, 11:11 PM
Subscribers
Tokens
"Burninate" token, awarded by joshuaspence.

Details

Summary

See PHI1104. The older "differential.querydiffs" method includes the entire raw diff text for all the diffs associated with a revision in its response, but we: only care about the most recent diff; and don't care about the text at all.

For reasonably large changes with several updates, this can be significantly slow.

We can get this same information more efficiently from the modern "differential.diff.search", since D19386 (April 2018). The only trick is that we need a "revisionPHID", which we don't have on hand.

For now, just fetch the revision PHID. In the future, we can likely make adjustments so that we have the revision PHID already by the time we get here.

This may slow down the normal case very slightly (since we now do two calls instead of one), but it speeds up the bad cases dramatically.

Test Plan

Ran arc diff to update a change in a local repository. var_dump()'d the old and new algorithm results, saw the same outcome.

Used arc diff --trace on an update to a change to verify that differential.diff.search is called but differential.querydiffs is not.

Diff Detail

Repository
rARC Arcanist
Branch
load1
Lint
Lint Passed
SeverityLocationCodeMessage
Advicesrc/workflow/ArcanistDiffWorkflow.php:2269XHP16TODO Comment
Unit
Tests Passed
Build Status
Buildable 22132
Build 30241: Run Core Tests
Build 30240: arc lint + arc unit

Event Timeline

amckinley added inline comments.
src/workflow/ArcanistDiffWorkflow.php
2314

Just out of curiosity, how could this stuff fail? I see how the control flow gets us back to the legacy version if anything goes wrong, and from your test plan, it's not obvious to me that we know if the new version works.

This revision is now accepted and ready to land.Feb 27 2019, 5:20 PM

The most likely reason the new stuff would fail is that phabricator/ is older than April 2018, so the "commits" attachment does not exist yet (or even older, and "differential.diff.search" method does not exist).

My test plan was flimsy and relied on reaching an inserted var_dump() before the first return to know that we'd hit it, but it can easily be made more robust -- I updated the test plan to have a more conclusive test (observe what calls we actually make using --trace) and executed the new plan to double check things.

  • Explain that we anticipate failures primarily because of old server versions.
This revision was automatically updated to reflect the committed changes.
jmeador added inline comments.
src/workflow/ArcanistDiffWorkflow.php
2290

any reason this cannot be:

$diff_phid = $revision['fields']['diffPHID'];

And then differential.diff.search could use that instead of inferring the diff ID?

Oh, good catch. Yeah, that would be an improvement.

That would also be more correct better than this approach if we ever let you "undo" a diff (T1081) so that the current active diff is not the one with the highest ID, but I don't currently expect that to ever happen since the use cases seem very weak.

Oh, good catch. Yeah, that would be an improvement.

use cases seem very weak.

We’ve actually had a couple requests for that. Primarily because users don’t really understand how base commit rules work.