
Improve cluster write feedback and write routing
Open, Normal, Public

Description

See PHI480. An install with a master/master repository cluster has seen feedback from users about this message on the SSH wire flow:

# Waiting up to 120 second(s) for a cluster write lock...

Specifically:

  • This message is missing an English translation.
  • This message isn't as clear as it could be that the lock will be acquired instantaneously when it's available (i.e., we aren't wasting time in sleep(...)).
  • This message isn't as clear as it could be about the scope of the lock (global per repository).

It would be more helpful if waiting for a write lock gave you more information about who holds the lock, so it's clear that the server is doing something and that you aren't being made to wait for no reason or because of a bug:

Acquiring the repository write lock on rXYZ...
Waiting for epriestley to finish pushing abcdef1234 to master...
...

Beyond this, we could improve the write routing algorithm. The write process looks like this:

  • Choose a random node.
  • Acquire the repository write lock.
  • Acquire the host read lock.
  • Synchronize.
  • Release the host read lock.
  • Accept the write.
  • Release the repository write lock.

This synchronization step can be skipped if we get lucky and randomly choose the same node that the writer ahead of us chose, since that node already has the write we just waited for.
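The flow above can be sketched as follows. This is an illustrative model only, not Phabricator code: `synchronize`, `accept_write`, and the log structure are invented names, and the locks themselves are elided.

```python
import random

def synchronize(node, log):
    """Stand-in for pulling the latest writes onto `node` under the
    host read lock (steps 3-5 above)."""
    log.append(('sync', node))

def accept_write(nodes, last_write_node, log):
    """One pass through the write flow. `last_write_node` is the node
    the previous writer pushed to; returns the node used now."""
    node = random.choice(nodes)        # 1. Choose a random node.
    # 2. Acquire the repository write lock (elided in this sketch).
    if node != last_write_node:
        # The chosen node is stale: it must synchronize before
        # accepting the write.
        synchronize(node, log)
    # If we happened to pick the previous writer's node, the sync
    # above is skipped entirely: that node already has the write.
    log.append(('write', node))        # 6. Accept the write.
    return node                        # 7. Release the write lock.
```

With a single-node pool this is deterministic, which makes the lucky case easy to see: writing to the same node twice produces no `sync` entry, while switching nodes does.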

We can make clients exceptionally lucky by routing like this:

  • If possible, choose the node with a writer holding the lock.
  • If possible, choose a leader node.
  • Choose a random node.

Since writes are always one-at-a-time, there's effectively no downside to this: we can't hit a "thundering herd" problem with write traffic.
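The preference order above amounts to a small selection function. The sketch below is hypothetical (the parameter names and the notion of a "leader" list as a plain collection are assumptions, not Phabricator APIs), but it captures the three-tier fallback:

```python
import random

def choose_write_node(nodes, lock_holder_node=None, leader_nodes=()):
    """Preference-ordered node selection for cluster writes.

    `lock_holder_node`: the node the writer currently holding the lock
    is pushing to, if any. `leader_nodes`: nodes known to be up to date.
    """
    # 1. Prefer the lock holder's node: once the lock is released,
    #    that node needs no synchronization at all.
    if lock_holder_node in nodes:
        return lock_holder_node
    # 2. Otherwise prefer any up-to-date "leader" node.
    leaders = [n for n in nodes if n in leader_nodes]
    if leaders:
        return random.choice(leaders)
    # 3. Fall back to the current behavior: a random node.
    return random.choice(nodes)
```

Because writes are serialized by the repository write lock, this never concentrates concurrent load on one node the way biased read routing can.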

See also T10884 for read routing, which is a little trickier, since a bad read routing algorithm can kill a cluster by focusing too much traffic on a small set of nodes.


I plan to make these changes:

  • Modify the PushEvent table to contain more information about locks and waits (e.g., active queue status; how long the push waited for the write lock; how long the push spent synchronizing).
  • Modify the "Waiting..." prompt to show more information by querying the PushEvent table.

Based on this data, we can make a decision about whether smarter write routing is worthwhile by seeing how much time pushes are spending synchronizing after acquiring the write lock.

(I suspect it will be, and the bar to implement this isn't very high, but it would still be nice to have some data suggesting that it will have a real effect: although I think the smarter routing is strictly better, it is more complex than routing with shuffle().)
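The measurement itself is simple in principle: wrap the two phases with a monotonic clock and store both durations on the push record. This sketch assumes hypothetical callables for the three phases; none of these names exist in Phabricator.

```python
import time

def timed_push(acquire_write_lock, synchronize, accept_write):
    """Run a push and return the two durations an expanded PushEvent
    record would store: time waiting for the write lock, and time
    spent synchronizing after acquiring it."""
    t0 = time.monotonic()
    acquire_write_lock()
    lock_wait = time.monotonic() - t0

    t1 = time.monotonic()
    synchronize()
    sync_time = time.monotonic() - t1

    accept_write()
    # If sync_time is a large share of total push time across many
    # pushes, smarter routing is worth implementing.
    return {'lock_wait': lock_wait, 'sync_time': sync_time}
```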

Event Timeline

epriestley triaged this task as Normal priority. Mar 19 2018, 2:07 PM
epriestley created this task.

Modify the PushEvent table to contain more information about locks and waits (e.g., active queue status; how long the push waited for the write lock; how long the push spent synchronizing).

This is tricky because the PushEvent is actually written by DiffusionCommitHookEngine. This runs in a subprocess two levels down:

- sshd
  - ssh-exec             <--- Where we can measure locks.
    - git receive-pack
      - commit-hook      <--- Where we write PushEvent.

So there are a couple of issues:

  • we don't have a great way to pass timing information down to the commit hook;
  • the timing isn't actually complete when PushEvent finishes writing.

I can come up with two approaches to deal with this:

  1. We can write to the SSH log instead. This isn't as nice since it disconnects timing information from push information and puts application-specific information into the SSH log. The advantage is that ssh-exec fully manages the SSH log.
  2. We can try to tie the SSH request to the PushEvent by passing some request identifier down or some log ID up.

I tentatively favor approach (2), and passing something down seems better than passing something up, so I'm going to see if I can get traction on that.
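One natural channel for passing an identifier down is the process environment, since child processes inherit it across the whole `git receive-pack` → commit-hook chain. The sketch below assumes that channel is acceptable; the variable name `CLUSTER_REQUEST_ID` and both function names are invented for illustration.

```python
import os
import subprocess
import sys
import uuid

# Name invented for this sketch, not a real Phabricator variable.
REQUEST_ID_VAR = 'CLUSTER_REQUEST_ID'

def run_with_request_id(argv):
    """Run a child process (e.g. `git receive-pack`) with a fresh
    request identifier exported in its environment. This is where
    ssh-exec, which measures the locks, would generate the ID."""
    env = dict(os.environ)
    env[REQUEST_ID_VAR] = uuid.uuid4().hex
    return subprocess.run(argv, env=env)

def read_request_id():
    """Called two levels down, inside the commit hook, to tie the
    PushEvent row it writes back to the originating SSH request."""
    return os.environ.get(REQUEST_ID_VAR)
```

With the identifier on both ends, ssh-exec can update the matching PushEvent row with lock and synchronization timings after the hook finishes, addressing the "timing isn't complete when PushEvent is written" problem.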

@epriestley: our current Phabricator version is c46be2a70b4db72d61e76690ef095384de2c3f91, which landed on 04/13. Will there be any issues if we directly cherry-pick the two recent commits without pulling any other changes?
rP51073b972ece: Try to route cluster writes to nodes which won't need to synchronize first. (Wed, Oct 17, 3:08 PM)
rPbc6c8c0e93a7: Explicitly shuffle nodes before selecting one for cluster sync.

Also, are there any dependent changes to arc, lib, or the schema if we cherry-pick these two commits?

I think you can cherry-pick those changes safely. I suspect further changes connected to T13211 and/or T10884 in the near future are likely to be difficult to cherry-pick, though, so ideally you should try to upgrade.

(It's possible you'll also need D19701 and D19702, and those need D19720 (in libphutil/), so even just picking these changes may be a bit of a mess.)