
Send some database traffic to replicas even while the master is still alive
Open, Normal, Public


Currently, Phabricator can be configured in a database cluster mode with a master database and zero or more read replicas.

While the master is reachable, we currently send no (normal) traffic to the read replicas. However, we could safely serve some traffic (like read-only logged-out traffic) from replicas. Doing so would reduce load on the master. The largest beneficiaries of this are likely to be active, open-source installs with a large public presence. Private installs may also benefit, but T11044 is probably generally more fruitful.

One additional class of beneficiary is installs with physically distant locations (e.g., a San Francisco office and a Mumbai office). Being able to send traffic originating in the Mumbai office to a local database server in Mumbai could improve performance substantially for users at that location.

The major technical problem that needs to be solved before we can support this is the "read-after-write" problem. It looks like this:

  • You submit a comment on a task. Your comment is written to the master. This happens at T+0.
  • We redirect back to the task page, /T123.
  • The server processes the request for this page at T+1. This page load is read-only, so it is served by the replica.
  • But! The replica has a replication delay! Your data isn't there yet. Your comment doesn't appear on the page.
  • You reload at T+2, T+3, and T+4, then give up. Your comment is gone! You write a long passive-aggressive tweet about how Phabricator destroyed your data.
  • At T+5, replication finishes and the comment would appear if you reloaded again.
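The timeline above can be modeled with a toy simulation (all names hypothetical): a master that logs each write with its commit time, and a replica that only sees writes older than its replication lag.

```python
class Master:
    def __init__(self):
        self.rows = {}
        self.log = []  # replication log: (commit_time, key, value)

    def write(self, t, key, value):
        self.rows[key] = value
        self.log.append((t, key, value))


class Replica:
    def __init__(self, master, lag):
        self.master = master
        self.lag = lag  # replication delay, in the same time units as t

    def read(self, t, key):
        # Only entries committed at least `lag` ago have replicated.
        rows = {}
        for commit_time, k, v in self.master.log:
            if commit_time + self.lag <= t:
                rows[k] = v
        return rows.get(key)


master = Master()
replica = Replica(master, lag=5)

master.write(0, "T123.comment", "LGTM")  # comment written at T+0
replica.read(1, "T123.comment")          # T+1: None -- the stale read
replica.read(5, "T123.comment")          # T+5: "LGTM" -- replication caught up
```

The master always has the row, so reads routed to the master never see this problem; everything below is about deciding which reads can safely go to the replica anyway.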

See also some discussion starting at T1969#19465.

The easiest way to get started with this is probably to enable it for logged-out traffic only. This will let us navigate some special cases (like Conduit) relatively safely while avoiding the bulk of the read-after-write problem.
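A minimal routing rule for that starting point might look like this sketch (function and parameter names are hypothetical, not Phabricator's actual API):

```python
def choose_database(is_write, is_logged_in, master, replicas):
    """Pick a database for one request.

    Writes and logged-in traffic always go to the master. Only
    read-only, logged-out traffic is eligible for a replica, which
    sidesteps most of the read-after-write problem: anonymous
    readers rarely notice a few seconds of replication delay.
    """
    if is_write or is_logged_in or not replicas:
        return master
    # Any replica will do; a real router might pick by load or locality.
    return replicas[0]
```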

Afterwards, there are two major strategies we could pursue:

  • block on the server until replication completes after a write;
  • identify clients that recently wrote, and send their reads to the master for a while.

The second strategy is probably somewhat better for users, but it is also more complicated.

We may also need to identify pages which will write, so that reads before writes can go to the master. POST is a rough approximation of this. Presence of a valid CSRF token is another rough approximation. We're safe as long as we identify a superset of the requests that actually write.
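A conservative classifier over those two rough signals might look like this (a sketch; names are hypothetical):

```python
# Methods that, by HTTP semantics, may have side effects.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def request_may_write(method, has_valid_csrf_token):
    """Return True if the request might perform writes.

    Errs on the side of True: routing is only safe if this flags a
    superset of the requests that actually write, so the two rough
    signals (method and CSRF token presence) are OR'd together.
    """
    return method.upper() in WRITE_METHODS or has_valid_csrf_token
```

A false positive here just sends one extra read to the master; a false negative risks the read-after-write problem, so OR-ing the signals is the safe direction.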

We can also detect, before we perform a write, that we've already served possibly-stale reads from a replica on the same request, so the outcome isn't silently unsafe, but we'd have to fatal at that point.
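One way to sketch that safety net (hypothetical names, not Phabricator's actual connection layer): a session wrapper that remembers whether any replica read has happened, and refuses a later write rather than proceed unsafely.

```python
class ReplicaReadError(Exception):
    """Raised when a write follows a possibly-stale replica read."""


class GuardedSession:
    """Track replica reads so a later write can fail loudly instead
    of committing a result derived from stale data."""

    def __init__(self, master, replica):
        self.master = master      # dicts stand in for connections here
        self.replica = replica
        self.did_replica_read = False

    def read(self, key):
        self.did_replica_read = True
        return self.replica.get(key)

    def write(self, key, value):
        if self.did_replica_read:
            # Earlier reads on this request may have been stale, so
            # the only safe remaining option is to abort the request.
            raise ReplicaReadError("write after replica read; aborting")
        self.master[key] = value
```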

Likewise, we can detect when we routed a request to the master but never actually attempted a write, to get a sense of how many false positives the heuristic produces.
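Measuring that could be as simple as two counters (a sketch with hypothetical names):

```python
class RoutingStats:
    """Count requests routed to the master that never wrote, to
    measure how loose the "may write" heuristic is."""

    def __init__(self):
        self.routed_to_master = 0
        self.actually_wrote = 0

    def on_master_request(self):
        self.routed_to_master += 1

    def on_write(self):
        self.actually_wrote += 1

    def false_positive_rate(self):
        """Fraction of master-routed requests that did not write."""
        if self.routed_to_master == 0:
            return 0.0
        return 1.0 - self.actually_wrote / self.routed_to_master
```

If the rate stays high, the heuristic can be tightened to shift more read load back to the replicas without risking correctness.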