HomePhabricator

Automatically degrade to read-only mode when unable to connect to the master

Description

Automatically degrade to read-only mode when unable to connect to the master

Summary:
Ref T4571. If we fail to connect to the master, automatically try to degrade into a temporary read-only mode ("UNREACHABLE") for the remainder of the request, if possible.

If the request was something like "load the homepage", that'll work fine. If it was something like "submit a comment", there's nothing we can do and we just have to fail.

Detecting this condition imposes a performance penalty: every request checks the connection and gives the database a long time to respond, since we don't want to drop writes unless we have to. So the degraded mode works, but it's really slow, and may perpetuate the problem if the root issue is load-related.

This lays the groundwork for improving this case by degrading futher into a "SEVERED" mode which will persist across requests. In the future, if several requests in a short period of time fail, we'll sever the database host and refuse to try to connect to it for a little while, connecting directly to replicas instead (basically, we're "health checking" the master, like a load balancer would health check a web application server). This will give us a better (much faster) degraded mode in a major service disruption, and reduce load on the master if the root cause is load-related, giving it a better chance of recovering on its own.

Test Plan:

  • Disabled master in config by changing the host/username, got degraded automatically to UNREACAHBLE mode immediately.
  • Faked full SEVERED mode, requests hit replicas and put me in the mode properly.
  • Made stuff work, hit some good pages.
  • Hit some non-cluster pages.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15674