
Automatically degrade to read-only mode when unable to connect to the master
ClosedPublic

Authored by epriestley on Apr 10 2016, 1:53 PM.
Tags
None
Subscribers
None
Tokens
"Grey Medal" token, awarded by avivey.

Details

Summary

Ref T4571. If we fail to connect to the master, automatically try to degrade into a temporary read-only mode ("UNREACHABLE") for the remainder of the request, if possible.

If the request was something like "load the homepage", that'll work fine. If it was something like "submit a comment", there's nothing we can do and we just have to fail.
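
The per-request fallback described above can be sketched roughly as follows. This is a minimal illustration in Python, not Phabricator's actual code: `RequestDatabase`, `MasterUnreachable`, `ReadOnlyError`, and the two connector callables are all hypothetical names invented for this example.

```python
class MasterUnreachable(Exception):
    """Raised by the (hypothetical) master connector when it cannot connect."""


class ReadOnlyError(Exception):
    """Raised when a write is attempted while degraded to read-only mode."""


class RequestDatabase:
    """Picks a connection for a single request (illustrative sketch only)."""

    def __init__(self, connect_master, connect_replica):
        # Both arguments are assumed to be callables that return a
        # connection object or raise an exception.
        self._connect_master = connect_master
        self._connect_replica = connect_replica
        self.mode = "NORMAL"

    def connection(self, for_write=False):
        if self.mode == "NORMAL":
            try:
                return self._connect_master()
            except MasterUnreachable:
                # Degrade for the remainder of this request only; the
                # master is not retried until the next request.
                self.mode = "UNREACHABLE"
        if for_write:
            # Writes such as "submit a comment" cannot be served.
            raise ReadOnlyError("master unreachable; cannot write")
        # Reads such as "load the homepage" fall back to a replica.
        return self._connect_replica()
```

In this sketch a read request still succeeds against a replica, while a write request fails with `ReadOnlyError`, matching the two cases described above.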

Detecting this condition imposes a performance penalty: every request checks the connection and gives the database a long time to respond, since we don't want to drop writes unless we have to. So the degraded mode works, but it's really slow, and may perpetuate the problem if the root issue is load-related.

This lays the groundwork for improving this case by degrading further into a "SEVERED" mode which will persist across requests. In the future, if several requests in a short period of time fail, we'll sever the database host and refuse to try to connect to it for a little while, connecting directly to replicas instead. Basically, we're "health checking" the master, like a load balancer would health check a web application server. This will give us a better (much faster) degraded mode in a major service disruption, and reduce load on the master if the root cause is load-related, giving it a better chance of recovering on its own.
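
One way the planned health check could work is sketched below. Since "SEVERED" mode is described here only as future work, everything in this example is an assumption: the `MasterHealth` class, the thresholds, and the sliding-window policy are hypothetical, and a real implementation would need to keep this state somewhere shared across requests rather than in one process's memory.

```python
import time
from collections import deque


class MasterHealth:
    """Tracks recent master connection failures (hypothetical sketch)."""

    def __init__(self, max_failures=3, window=10.0, sever_for=30.0,
                 clock=time.monotonic):
        # All thresholds are illustrative values, not taken from the source.
        self._max_failures = max_failures
        self._window = window
        self._sever_for = sever_for
        self._clock = clock
        self._failures = deque()
        self._severed_until = None

    def record_failure(self):
        now = self._clock()
        self._failures.append(now)
        # Forget failures that are older than the window.
        while self._failures and now - self._failures[0] > self._window:
            self._failures.popleft()
        if len(self._failures) >= self._max_failures:
            # Several failures in a short period: stop trying the master.
            self._severed_until = now + self._sever_for
            self._failures.clear()

    def is_severed(self):
        if self._severed_until is None:
            return False
        if self._clock() >= self._severed_until:
            # The sever period has elapsed; allow the master to be retried.
            self._severed_until = None
            return False
        return True
```

A request handler would consult `is_severed()` before attempting the master and go straight to a replica when it returns true, which avoids paying the long connection timeout on every request.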

Test Plan
  • Disabled master in config by changing the host/username, got degraded automatically to UNREACHABLE mode immediately.
  • Faked full SEVERED mode; requests hit replicas and put me in the mode properly.
  • Made stuff work, hit some good pages.
  • Hit some non-cluster pages.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision from to Automatically degrade to read-only mode when unable to connect to the master.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: chad.
chad edited edge metadata.
This revision is now accepted and ready to land. (Apr 10 2016, 4:04 PM)
This revision was automatically updated to reflect the committed changes.