Currently, when choosing a repository, database, or notification server to connect to, we mostly just shuffle() the list of valid servers and pick one at random.
This isn't usually a bad way to rank them, and is probably close to optimal when all the servers are identical. However, if you deploy a cluster across multiple regions, some servers may be much closer to you on the network than others, and it would often be better to try those first and send relatively little traffic to out-of-region datacenters.
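As a rough sketch of the current behavior (the names here are illustrative, not actual functions in the codebase):

```python
import random

def rank_servers_random(servers):
    """Sketch of the current behavior: shuffle all valid servers and try
    them in a random order, spreading load evenly but ignoring proximity."""
    ranked = list(servers)
    random.shuffle(ranked)
    return ranked
```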
If network latency is the only additional input we want to feed into the ranking algorithm, it might be easiest to measure it automatically rather than requiring users to configure it. We can keep a record of how long it took to connect to each server from the current host, and favor nearby servers over distant ones based on that measured latency.
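A minimal sketch of what this could look like, assuming we record connect times as connections are made (all of the names here are hypothetical, and details like using the median over a small window are just one possible choice):

```python
import random
import statistics
from collections import defaultdict, deque

class LatencyRanker:
    """Hypothetical sketch: remember recent connect times per server and
    prefer the servers that have historically been fastest from this host."""

    def __init__(self, window=32):
        # Keep only the most recent `window` samples for each server.
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, server, seconds):
        """Record how long a connection attempt to `server` took."""
        self._samples[server].append(seconds)

    def rank(self, servers):
        """Order servers by median measured latency. Servers we have no
        measurements for yet are shuffled in after the measured ones."""
        measured = [s for s in servers if self._samples[s]]
        unmeasured = [s for s in servers if not self._samples[s]]
        measured.sort(key=lambda s: statistics.median(self._samples[s]))
        random.shuffle(unmeasured)
        return measured + unmeasured
```

A real version of this would also need to decide how to age out stale samples and occasionally re-probe "slow" servers, so a transient hiccup doesn't pin traffic away from a server forever.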
However, this is a bit magical, and there may be other use cases where you'd want to select a different server than the nearest one. I don't have a great example, really, but:
- Implicit magic is often bad; this is pretty magical.
- Gut feeling is that network latency almost certainly isn't the only reasonable concern here.
- You have a primary office in California, and a satellite office in Antarctica. You want Antarctica to write to California so that the penguin engineers bear the full cost of their high latency and minimally disrupt the California office, where 99% of your engineering headcount is. (If there are primarily-Antarctic repositories, they could be on a second cluster configured with Antarctic masters; see the sketch after this list.)
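For the two-cluster case, explicit topology configuration might look something like this. This is a purely hypothetical sketch, not a proposed format; none of these keys or hostnames exist today:

```python
# Hypothetical sketch of explicit, per-cluster master placement, as an
# alternative to automatic latency measurement.
cluster_config = {
    "main-cluster": {
        # All writes go to California; Antarctic readers accept the latency.
        "masters": ["db001.california.example.com"],
        "replicas": ["db002.antarctica.example.com"],
    },
    "antarctic-cluster": {
        # Primarily-Antarctic repositories live on a second cluster with
        # Antarctic masters, so their writes stay local.
        "masters": ["db003.antarctica.example.com"],
        "replicas": ["db004.california.example.com"],
    },
}
```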
It would be nice to get a better sense of use cases where "use measured latency" gets the wrong result before designing a solution here.