Page MenuHomePhabricator

Add an intracluster synchronization log for cluster repositories
ClosedPublic

Authored by epriestley on Nov 6 2018, 10:45 PM.

Details

Summary

Depends on D19778. Ref T13216. See PHI943, PHI889, et al.

We currently have a push log and a pull log, but do not separately log intracluster synchronization events. We've encountered several specific cases where having this kind of log would be helpful:

  • In PHI943, an install was accidentally aborting locks early. Having timing information in the sync log would let us identify this more quickly.
  • In PHI889, an install hit an issue with MaxStartups configuration in sshd. A log would let us identify when this is an issue.
  • In PHI889, I floated a "push the linux kernel + fetch timeout" theory. A sync log would let us see sync/fetch timeouts to confirm this being a problem in practice.
  • A sync log will help us assess, develop, test, and monitor intracluster routing sync changes (likely those in T13211) in the future.

Some of these events are present in the pull log already, but only if they make it as far as running a git upload-pack subprocess (not the case with MaxStartups problems) -- and they can't record end-to-end timing.

No UI yet, I'll add that in a future change.

Test Plan
  • Forced all operations to synchronize by adding || true to the version check.
  • Pulled, got a sync log in the database.

Diff Detail

Repository
rP Phabricator
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

epriestley created this revision.Nov 6 2018, 10:45 PM
Owners added a subscriber: Restricted Owners Package.Nov 6 2018, 10:45 PM
epriestley requested review of this revision.Nov 6 2018, 10:46 PM
amckinley accepted this revision.Nov 7 2018, 6:22 PM
amckinley added inline comments.
src/applications/diffusion/protocol/DiffusionRepositoryClusterEngine.php
734

Worth it to add the name/PHID of the repo being synchronized? I know we get that info in the PhabricatorRepositorySyncEvent object, but seems like it would be good to log it here as well.

Also, is it always the cluster leader that we sync from? Don't we now just pick any device that's at max_version?

src/applications/repository/phid/PhabricatorRepositorySyncEventPHIDType.php
6

I'm guessing this is "sync event" and not a typo for SYNC?

src/applications/repository/query/PhabricatorRepositorySyncEventQuery.php
7–11

This could probably also use resultType, devicePHID and fromDevicePHID, no? Or are you just going to add stuff as needed when you build the UI?

This revision is now accepted and ready to land.Nov 7 2018, 6:22 PM
epriestley marked 3 inline comments as done.Nov 7 2018, 6:30 PM
epriestley added inline comments.
src/applications/diffusion/protocol/DiffusionRepositoryClusterEngine.php
734

I think the repository is always clear from context: this message ends up on the user's command line after a git fetch or git push or similar, not in a general logfile. Hopefully, users know what they're fetching/pushing. Hopefully.

"Devices at max_version" and "leaders" are the same machines: the definition of "leader" is "any device with the most current version of the repository".

src/applications/repository/phid/PhabricatorRepositorySyncEventPHIDType.php
6

Yeah, the other two are PSHE ("Push Event") and PULE ("Pull Event"), so this is SYNE for consistency.

src/applications/repository/query/PhabricatorRepositorySyncEventQuery.php
7–11

Yeah, just didn't have any callers yet. I expect to flesh this out at least a bit once I add UI for it.

This revision was automatically updated to reflect the committed changes.