Page MenuHomePhabricator

Repository mirroring is much more frequent than necessary
Closed, InvalidPublic

Description

We have a few repositories on our installation that are mirrored to another repo host. We recently added a repo (which, in case it matters, is also observed from a third location) that is committed to automatically once an hour. We noticed a much-greater-than-expected increase in the number of git requests made to our repo host.

We discovered that this is because the PhabricatorRepositoryPullLocalDaemon never hits a frequency below 15 seconds. An hour is the exact threshold before it begins to wait longer between checks, and every check pushes to the mirror. This strikes us as a bit unfortunate - there is no reason to make 240 pushes per hour. Even if that's ameliorated load-wise by 239 of them being noops, it still adds unnecessary noise to our logging and metrics.

Would it be possible to switch to a strategy where Phabricator only pushes to mirrors when it actually gets updates? Or is there a particular reason that strategy was already considered and rejected?

Event Timeline

neilfitz added a project: Restricted Project.
neilfitz added a subscriber: alexmv.

Can you walk me through why you consider this to be a bug report rather than a feature request?

I actually wasn't sure which to pick myself, since it's not as specific as most bug reports, but it seemed more like undesirable behavior of an existing feature, as opposed to a new feature. Would it be more appropriate to consider this as a feature request instead?

I'm not sure, since I'm not really sure what problem we're solving. It sounds like there are potentially three motivations:

  • Resource cost, e.g. CPU and I/O. If you want to motivate this from a resource cost perspective, please measure/quantify the resource costs. I don't see any actual evidence that this is legitimately slow or resource intensive.
  • Logging/monitoring. We can go down this path, but it will likely end in "it's easier to ask you to change Phabricator than to change our logging to ignore/de-emphasize no-op pushes since we do monitoring by pointing a spectrometer at the network card" or something like that, i.e. you want us to fix a problem of your own creation which is unique to your install by making Phabricator more complicated for everyone forever.
  • Pure aesthetics. We generally favor the aesthetics of the codebase over the aesthetics of Phabricator's interaction with the environment, and are very unlikely to make Phabricator more complex internally to perform fewer pushes if this has no actual cost in real terms and just makes some stuff prettier while imposing a direct complexity cost.

A resource cost motivation might rise to the level of "bug" if this legitimately prevents scaling Phabricator. Scale and security issues have significant leeway to bend the rules.

If you can't quantify a specific resource cost, I think this is generally likely to end in a "wait for T12681, then pay us a lot of money" sort of answer.

To your question specifically: we can change this, but any change will make Phabricator more complicated to maintain and support (the mirroring behavior will become less predictable/consistent for all users if we ever skip pushes). In the general case, we can not determine that a mirror is a no-op by looking at only our own state because a third party may have pushed to the remote since our last push. Any behavioral change here will affect existing installs which may rely on the current behavior to enforce a rule like "keep the GitHub mirror clean/synchronized by continually overwriting it even if someone pushes to it by accident".

We can git ls-remote before pushing to make the no-op determination accurately but the cost may not be much different, and there may be no difference in logging impact.

We can guess if we're going to no-op blindly, but this is a meaningful behavioral change which may negatively impact existing installs. If it's an option/setting, see T8227.

Thanks for the detailed response! We hadn't considered that people might tamper with mirror remotes, so you make a very good point there that our proposed change might cause a regression. And you're quite right that logging is our problem.

Our main concern was on the performance angle, but it's not a huge impact at the moment. We agree it's probably for the best to back off on this one for now -
especially since the bigger win (from our perspective) of cutting down the number of git requests...doesn't actually work.

We still speculate that creating sending N empty packfiles might be much more expensive than running git ls-remote, for both the Phabricator host and the remote host. But it's more of a general "who-knows-what-horrors-lurk-in-the-depths-of-git" (e.g., maybe something is secretly quadratic) - not an evidence-based hypothesis at the moment.

For pulls (vs mirrors), we switched to ls-remote to detect no-ops in D17497. The primary motivation was to let us make better clustering decisions, but it also "appeared" to improve performance, which suggests that it may not be outlandish to imagine that git ls-remote would use fewer resources here as well.

Since there's no remaining pathway forward, I'm going to close this. In the future:

  • We're happy to pursue this as a bug if you can quantify a large, scalability-impacting resource impact from no-op git push --mirror operations.
  • We're happy to pursue this as a feature request too, but you should follow Describing Root Problems and have realistic expectations about upstream priority if the problem basically boils down to something pretty narrow like "our logging isn't great with no-op noise and it's pretty hard to fix" or "our sensibilities are offended by Phabricator's boorish behavior".

In the future, please try to be more specific when filing requests upstream. We would normally just reject this sort of "here's something we observed, what are some options / general thoughts?" question as outside the realm of free support, which covers only a very narrow range of requests (see Support Resources for a list) because of the large amount of time it takes to provide general context on complex, open-ended problems. See also vaguely related T12134 for discussion about further tightening of free support.