Use "git ls-remote" to guess if "git fetch" is a no-op
Closed · Public

Authored by epriestley on Mar 14 2017, 10:25 PM.

Details

Reviewers

Maniphest Tasks

T12296: Improve Phacility repository import performance
T12392: Instance reporting that synchronizeWorkingCopyBeforeRead() effectively fails

Commits

rP2b0ad243d179: Use "git ls-remote" to guess if "git fetch" is a no-op

Summary

Ref T12296. Ref T12392. Currently, when we're observing a remote repository, we periodically run git fetch ....

Instead, periodically run git ls-remote (to list refs in the remote) and git for-each-ref (to list local refs) and only continue if the two lists are different.
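The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in PhabricatorRepositoryPullEngine.php: the function names are hypothetical, and it assumes the local list was produced with `git for-each-ref --format='%(objectname) %(refname)'` so that both commands emit one "hash, ref name" pair per line.

```python
from typing import Dict


def parse_ref_lines(output: str) -> Dict[str, str]:
    """Build a {ref name: commit hash} map from '<hash> <ref>' lines.

    `git ls-remote` separates the two fields with a tab; the assumed
    `for-each-ref` format string separates them with a space. Ref names
    cannot contain whitespace, so a plain split() handles both.
    """
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        commit, ref = parts
        refs[ref] = commit
    return refs


def fetch_is_probably_noop(remote_output: str, local_output: str) -> bool:
    """Guess that a fetch would do nothing if both ref maps are identical."""
    return parse_ref_lines(remote_output) == parse_ref_lines(local_output)
```

The sketch assumes both lists contain the same kinds of refs; any entries that appear on only one side would need to be filtered out before comparing.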

The motivations for this are:

- In T12296, it appears that doing this is faster than doing a no-op git fetch. This effect seems to reproduce locally in a clean environment (900ms for ls-remote + 100ms for for-each-ref vs about 1.4s for fetch). I don't have any explanation for why this is, but there it is. This isn't a huge change, although the time we're saving does appear to mostly be local CPU time, which is good for us.
- Because we control all writes, we could cache git for-each-ref in the future and do fewer disk operations. This doesn't necessarily seem too valuable, though.
- This allows us to tell if a fetch will do anything or not, and make better decisions around clustering (in particular, simplify how observed repository versioning works). With git fetch, we can't easily distinguish between "fetch, but nothing changed" and "legitimate fetch".
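To illustrate the last point, once both ref lists are available as maps, the difference between "nothing changed" and "legitimate fetch" can be made explicit. The following is a hedged sketch only; `RefDiff` and `diff_refs` are hypothetical names that do not appear in this revision.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RefDiff:
    added: List[str]    # refs present remotely but not locally
    removed: List[str]  # refs present locally but not remotely
    moved: List[str]    # refs present on both sides at different hashes

    @property
    def is_noop(self) -> bool:
        """True when the two ref maps are identical."""
        return not (self.added or self.removed or self.moved)


def diff_refs(local: Dict[str, str], remote: Dict[str, str]) -> RefDiff:
    """Compare {ref name: commit hash} maps for the local and remote sides."""
    added = sorted(ref for ref in remote if ref not in local)
    removed = sorted(ref for ref in local if ref not in remote)
    moved = sorted(
        ref for ref in remote if ref in local and local[ref] != remote[ref]
    )
    return RefDiff(added=added, removed=removed, moved=moved)
```

A caller could skip the fetch entirely when `is_noop` is true, and otherwise treat the update as a real change for versioning purposes.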

If a repository updates very regularly we end up doing slightly more work this way (that is, if ls-remote always comes back with changes, we do a little extra work), but this is normally very rare.
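The tradeoff can be estimated with a simple model. This is an illustration under stated assumptions, not a measurement: it assumes a real fetch costs the same under either strategy, so only the check cost and the no-op fetch cost matter. If p is the fraction of polls that find changes, checking first costs `check` on every poll, while always fetching costs `noop_fetch` on the (1 - p) polls with no changes, so checking first wins when p < 1 - check / noop_fetch.

```python
def break_even_change_rate(check_seconds: float, noop_fetch_seconds: float) -> float:
    """Fraction of polls with changes below which checking first is cheaper.

    Assumes a real (non-no-op) fetch costs the same under both strategies,
    so it cancels out of the comparison.
    """
    return max(0.0, 1.0 - check_seconds / noop_fetch_seconds)


# Figures from the summary: about 1.0s for ls-remote + for-each-ref,
# versus about 1.4s for a no-op fetch.
rate = break_even_change_rate(1.0, 1.4)
print(f"check-first is cheaper when fewer than {rate:.0%} of polls find changes")
```

With the single local timing quoted in the summary, the break-even point is roughly 29% of polls finding changes; other environments may differ.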

This might not get non-bare repositories quite right in some cases (i.e., incorrectly detect them as changed when they are unchanged) but we haven't created non-bare repositories for many years.

Test Plan

Ran bin/repository update --trace --verbose PHABX, saw sensible construction of local and remote maps and accurate detection of whether a fetch would do anything or not.

Diff Detail

Repository

rP Phabricator

Branch

ref1

Lint

Lint Passed

Unit

Tests Passed

Build Status

Buildable 15984
Build 21182: Run Core Tests
Build 21181: arc lint + arc unit

Event Timeline

epriestley created this revision. Mar 14 2017, 10:25 PM

Harbormaster completed remote builds in B15984: Diff 42077. Mar 14 2017, 10:26 PM

epriestley added inline comments. Mar 14 2017, 10:26 PM

src/applications/repository/engine/PhabricatorRepositoryPullEngine.php
161	These aren't precisely related, but `bin/repository update --verbose` and similar were just printing "%s" since `log()` doesn't actually take a pattern.

chad accepted this revision. Mar 14 2017, 10:41 PM

This revision is now accepted and ready to land. Mar 14 2017, 10:41 PM

epriestley mentioned this in T12392: Instance reporting that synchronizeWorkingCopyBeforeRead() effectively fails. Mar 17 2017, 11:42 PM

Closed by commit rP2b0ad243d179: Use "git ls-remote" to guess if "git fetch" is a no-op (authored by epriestley, committed by epriestley). Mar 17 2017, 11:43 PM

This revision was automatically updated to reflect the committed changes.

joshuaspence added a subscriber: joshuaspence. Mar 20 2017, 6:30 AM

joshuaspence added inline comments.

src/applications/repository/engine/PhabricatorRepositoryPullEngine.php
425	I think you can just pass `--refs` to `git ls-remote` to avoid it returning pseudorefs.

iiam

src/applications/repository/engine/PhabricatorRepositoryPullEngine.php
425	This flag appears to work, but be completely undocumented? How did you discover it?

joshuaspence added inline comments. Mar 21 2017, 1:41 AM

src/applications/repository/engine/PhabricatorRepositoryPullEngine.php
425	`man git ls-remote` on Git 2.11.0

$ man git ls-remote | grep -- --refs
No manual entry for ls-remote

epriestley@orbital ~ $ git help ls-remote | grep -- --refs
epriestley@orbital ~ $

epriestley@orbital ~ $ man git-ls-remote | grep -- --refs
epriestley@orbital ~ $

iiam
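For context on the thread above: in current Git documentation, `git ls-remote --refs` is described as not showing peeled tags or pseudorefs like HEAD. Where the flag is unavailable, a similar effect can be approximated by filtering the parsed output. The sketch below is an assumption about how that filtering might look, not the approach taken in this revision; `strip_pseudorefs` is a hypothetical name.

```python
from typing import Dict


def strip_pseudorefs(refs: Dict[str, str]) -> Dict[str, str]:
    """Drop entries that `for-each-ref` would not list locally.

    Approximates `git ls-remote --refs` by removing:
      - pseudorefs such as HEAD (anything outside the refs/ namespace)
      - peeled tag entries, which `ls-remote` marks with a '^{}' suffix
    """
    return {
        ref: commit
        for ref, commit in refs.items()
        if ref.startswith("refs/") and not ref.endswith("^{}")
    }
```

Filtering this way keeps the remote map comparable to the local one, so a HEAD entry or a peeled tag does not make an unchanged repository look changed.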

epriestley mentioned this in T12778: Repository mirroring is much more frequent than necessary. May 30 2017, 11:29 PM

Revision Contents
Changeset List

Path: src/applications/repository/engine/PhabricatorRepositoryPullEngine.php
Size: 88 lines
