
Multiserver / High-Availability Configuration
Closed, Duplicate (Public)

Description

Discussion of multi-host / high-availability stuff. These features serve three major use cases:

  • Large installs that want to improve availability (e.g., if a machine dies, failover should be as painless as possible and not involve full restore from backup).
  • Phacility SAAS, where many installs are served by a homogenous web tier.
  • Scaling reads for huge/public/open source installs.

The major considerations are:

  • letting many web frontends and daemon hosts access a small number of copies of a repository;
  • having a database failover strategy (and possibly formalizing read/write databases);
  • having a repository failover strategy; and
  • routing SSH requests.

Web/Daemon Access to Repositories: Currently, webservers access repositories by running a PullLocal daemon in --no-discovery mode. This keeps up-to-date copies of repositories on all the web frontends. Facebook is likely the only install which uses this, and it does not currently support hosted repositories (their deployment is months behind that feature appearing in the upstream).

Looking forward, in the Phacility SAAS case and in the general case of large installs, this is not a very scalable strategy. We tend to incur costs on the order of O(WebFrontends * NumberAndSizeOfRepositories), because each repository needs to be kept on each frontend. This will hit scaling limits fairly quickly, and we should abandon it as soon as we're able to.
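To make that cost concrete, here's an illustrative back-of-the-envelope comparison of the mirror-everywhere model against the routed, one-copy model (all numbers are invented):

```python
# Illustrative arithmetic only: compare total repository storage under the
# current "every frontend mirrors everything" model against the routed
# model with one copy per repository. All numbers are invented.

web_frontends = 10
repositories = 200
avg_repo_size_gb = 2

# Current model: O(WebFrontends * NumberAndSizeOfRepositories).
mirrored_total_gb = web_frontends * repositories * avg_repo_size_gb

# Routed model: one copy of each repository somewhere in the pool.
routed_total_gb = repositories * avg_repo_size_gb

print(mirrored_total_gb)  # 4000
print(routed_total_gb)    # 400
```

Adding a frontend under the routed model costs nothing in repository storage, which is the property that makes the web tier horizontally scalable.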

The intended strategy for accomplishing this is to move all repository access to Conduit, and let Conduit route requests to the right place. The web UI already does this, although the daemons do not yet, and not all of the infrastructure is in place here. When this does work, it means that we only need one copy of each repository to exist across the host pools, and it can satisfy all of the requests to that repository. This will also let us spread repository masters across as many machines as we want, and also spread daemons across machines. Finally, we can remove the --no-discovery daemons on the web frontends and make them pure web boxes which run web processes only.

Implementation here is mostly straightforward and many of the building blocks are in place, although it will be time consuming to complete.

Database Failover: Currently, there is no official plan for setting up database contingencies. Likely, this comes in two forms:

  1. You set up a MySQL slave, and when the master fails you point Phabricator at the slave. Phabricator doesn't need to know about this at all.
  2. You set up one or more MySQL slaves, and when the master fails you point Phabricator at a slave. In the meantime, you tell Phabricator about the slaves and it routes read connections to them. There is some discussion of this in T1969, although that task is a sticky mire. The major difficulty with this is figuring out how to approach read-after-write.
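As a sketch of form (2), here is one way to route reads to replicas while handling read-after-write; this is not Phabricator's actual implementation, and the hostnames and sticky-window policy are assumptions:

```python
import random
import time

class ConnectionRouter:
    """Route writes to the master and reads to replicas, except during a
    short "sticky" window after a session writes, when its reads also go
    to the master so the session always sees its own writes."""

    def __init__(self, master, replicas, sticky_seconds=5.0):
        self.master = master
        self.replicas = list(replicas)
        self.sticky_seconds = sticky_seconds
        self._last_write = {}  # session id -> timestamp of last write

    def host_for(self, session_id, is_write):
        now = time.time()
        if is_write:
            self._last_write[session_id] = now
            return self.master
        wrote_at = self._last_write.get(session_id)
        if wrote_at is not None and now - wrote_at < self.sticky_seconds:
            return self.master  # a replica may not have replayed the write yet
        if not self.replicas:
            return self.master  # no replicas configured: this is form (1)
        return random.choice(self.replicas)

router = ConnectionRouter("db-master", ["db-replica1", "db-replica2"])
print(router.host_for("alice", is_write=True))   # db-master
print(router.host_for("alice", is_write=False))  # db-master (sticky window)
```

A time-based window is only a heuristic; a stricter approach would compare replica replication positions against the master, at the cost of extra bookkeeping per request.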

Repository Failover: This probably looks like database failover, but we need to do more work on our side. Likely, we'll map each repository to a master and zero or more slave(s), and mirror the slaves after commits (by pushing in Git and Mercurial, and with svnsync in SVN?). Since we'll know about the slaves, we can balance reads to them. This has fewer read-after-write problems, although they're still present. Apparently Gitolite does a passable job of this, so I can double check what it's doing. This seems very easy if the readers can lag, and tractable if they aren't allowed to lag.
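A hedged sketch of the mirroring step described above (the paths and mirror URIs are hypothetical, and this builds commands rather than claiming to be Phabricator's actual code):

```python
import shlex

def mirror_commands(repo_path, mirrors, vcs="git"):
    """Build the shell commands that would re-mirror a repository to its
    replicas after a write lands. Paths and URIs are hypothetical."""
    if vcs == "git":
        return ["git -C %s push --mirror %s" % (shlex.quote(repo_path), shlex.quote(m))
                for m in mirrors]
    if vcs == "hg":
        return ["hg -R %s push %s" % (shlex.quote(repo_path), shlex.quote(m))
                for m in mirrors]
    if vcs == "svn":
        # svnsync works the other way around: each replica pulls from the
        # master, so the sync command runs against the replica's URI.
        return ["svnsync sync %s" % shlex.quote(m) for m in mirrors]
    raise ValueError("unknown vcs: %r" % vcs)

for cmd in mirror_commands("/var/repo/libphutil",
                           ["ssh://repo2.example.com/var/repo/libphutil"]):
    print(cmd)
```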

Routing SSH: In the large-scale case, we need to be able to receive SSH on many hosts and route it correctly. We have much of what we need in place to do this (we decode protocol frames and can detect which repository a request targets and whether it's a read or a write very quickly), but don't actually have the interface layer in place where we examine the request and decide how to route it. This needs to get built; for small installs it will just be "route locally".
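As a toy illustration of that routing decision (this is not the actual protocol-frame decoder; the service map, hostnames, and command parsing are invented for the example):

```python
import re

# Hypothetical map of repository -> master host and read replicas.
SERVICES = {
    "libphutil": {"master": "repo1.example.com", "replicas": ["repo2.example.com"]},
}

# Commands Git sends over SSH: upload-pack serves reads (fetch/clone),
# receive-pack serves writes (push).
READ_COMMANDS = {"git-upload-pack"}
WRITE_COMMANDS = {"git-receive-pack"}

def route(ssh_original_command):
    """Inspect the requested command, identify the target repository and
    whether it's a read or a write, and pick a host to forward to."""
    match = re.match(r"(git-[a-z-]+) '/?([^/']+?)(?:\.git)?'", ssh_original_command)
    if not match:
        raise ValueError("unrecognized command")
    command, repo = match.groups()
    service = SERVICES[repo]
    if command in WRITE_COMMANDS:
        return service["master"]       # writes must hit the master
    if command in READ_COMMANDS and service["replicas"]:
        return service["replicas"][0]  # reads can go to a replica
    return service["master"]           # small installs: everything is local

print(route("git-upload-pack '/libphutil.git'"))   # repo2.example.com
print(route("git-receive-pack '/libphutil.git'"))  # repo1.example.com
```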


There are some other considerations:

  • what the management UI looks like;
  • the conduit protocol; and
  • automated failover.

Management UI: Managing host clusters and roles may get complicated, especially in the Phacility case. I'm not sure if it's worthwhile to build a general-purpose tool for it -- basically, something a little bit like Facebook's SMC, where you have a central console for bringing masters down, toggling failover, etc. This might make sense for Phacility but might be an overreach for everyone else. Needs more consideration.

Conduit Protocol: We probably need to do conduit SSH support (T550) and revisit the protocol as part of the proxying junk.

Automated Failover: I don't plan to support this for now, since I think it often causes more problems than it's worth. We can look at this once everything's stable, but for now I'm assuming an admin will actually flip the failover switch if a machine bites the dust, and any detection will focus on alerting rather than recovery.


Event Timeline

epriestley claimed this task.
epriestley raised the priority of this task from to Normal.
epriestley updated the task description.
epriestley added subscribers: epriestley, zeeg.

I'm presuming that in a high-availability configuration, there'd be machines running Phabricator without a web interface, which are just responsible for running daemons or repository hosting / replication?

I have a diff that locks down Phabricator when configured as a daemon tier machine if that's desirable. It basically disables all non-daemon related applications and prevents all non-administrators from logging into the machine.

That's potentially useful, yes. I think a rough sketch of the plan of attack here is:

Finish T2783: Most of the remaining work on T2783 can happen at any time: convert remaining calls in Diffusion and the daemons (other than the PullLocal daemon) into Conduit calls. There should be only a handful of these left.

Build a Service Directory: This is a new application which lists all the hosts which provide services. For example: all of the machines running with some Phabricator responsibilities, all the MySQL databases, etc. It might make sense to generalize this. For example, maybe it would be reasonable to list Jenkins instances as services here for Harbormaster to use?

At Facebook, there was an internal tool called the "Service Management Console", which basically acted like DNS-with-extensions for services. You could go look up a central database (approximately, issue a DNS query for "cdb023.facebook.com", essentially) and get a list of available servers, but with a bunch of extra attributes like "this is available", "this is read/write vs read-only", etc. DBAs could swap hosts from the web UI easily, and everyone could see service status. This tool was significantly useful and I suspect it's worthwhile to build something at least slightly general.

One possibility is that this is just part of Drydock, although I think it's likely to have enough meat to justify being a separate application.

Support Host Identity and Authentication: Hosts need to know who they are (so they know which services they should provide, like Conduit vs Web, and which repositories they should host, and which calls should be routed locally vs remotely). They also need to be able to identify themselves to one another. I think the most straightforward way to do this is through private keys, which are sufficient to accomplish both goals. We can maybe even use the machine private keys (/etc/ssh_host_rsa_key, e.g.), with an option to use a specific alternate private key.

If SSH is enabled, machines can make service calls over SSH directly (ssh ... conduit method.name). If SSH is not enabled, we can sign HTTP requests using keys to achieve the same effect.

So machines would look up their public key in the service directory. If they find it, they say "Oh, I am phabweb03, I should provide web services to users" or "Oh, I am phabdaemon09, I should provide SSH/Conduit services to other hosts only, and I host these 73 repositories: ...". To make a service call, they look up the correct host and connect to it using their private key to sign an SSH/HTTP request. The other end of the connection looks up the public key and identifies the internal service.
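A minimal sketch of that lookup, assuming a directory keyed by public key fingerprint (every fingerprint, hostname, and repository name below is invented for illustration):

```python
# The service directory maps each host's public key fingerprint to its
# identity: which roles it serves and which repositories it hosts.
# All fingerprints, hostnames, and repository names are invented.

SERVICE_DIRECTORY = {
    "SHA256:aaaa": {"host": "phabweb03", "roles": ["web"], "repositories": []},
    "SHA256:bbbb": {"host": "phabdaemon09", "roles": ["ssh", "conduit"],
                    "repositories": ["libphutil", "arcanist"]},
}

def identify(fingerprint):
    """Resolve a connecting host's key to its directory entry. Unknown
    keys are refused, which is what authenticates internal calls."""
    entry = SERVICE_DIRECTORY.get(fingerprint)
    if entry is None:
        raise PermissionError("unknown host key; refusing service call")
    return entry

me = identify("SHA256:bbbb")
print(me["host"], me["roles"])  # phabdaemon09 ['ssh', 'conduit']
```

The same lookup serves both directions: a host identifies itself on boot to learn its responsibilities, and a receiving host identifies a caller to decide whether to serve it.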

Probably sort out Host routing?: Discussion in T5702. We have some mess around how we route "Host" headers, which might need to get sorted out here. This doesn't really block anything, but the "lock down the web UI" diff probably touches this code and makes it more complicated. By refactoring it to be more general, we could have machines which aren't supposed to provide user web services just not serve the user web services "virtual host", which could be an easy, effective way to lock segments of functionality.

> That's potentially useful, yes. I think a rough sketch of the plan of attack here is:
>
> Finish T2783: Most of the remaining work on T2783 can happen at any time: convert remaining calls in Diffusion and the daemons (other than the PullLocal daemon) into Conduit calls. There should be only a handful of these left.
>
> Build a Service Directory: This is a new application which lists all the hosts which provide services. For example: all of the machines running with some Phabricator responsibilities, all the MySQL databases, etc. It might make sense to generalize this. For example, maybe it would be reasonable to list Jenkins instances as services here for Harbormaster to use?

I think having this generalized would be very useful; if something like this was available, I think there's a high probability we'd be using it at our workplace.

> At Facebook, there was an internal tool called the "Service Management Console", which basically acted like DNS-with-extensions for services. You could go look up a central database (approximately, issue a DNS query for "cdb023.facebook.com", essentially) and get a list of available servers, but with a bunch of extra attributes like "this is available", "this is read/write vs read-only", etc. DBAs could swap hosts from the web UI easily, and everyone could see service status. This tool was significantly useful and I suspect it's worthwhile to build something at least slightly general.

Again I think this has the potential to be highly useful; the AWS console doesn't provide the level of granularity needed when attempting to work out where services are running and what particular servers do.

> One possibility is that this is just part of Drydock, although I think it's likely to have enough meat to justify being a separate application.

I'd imagine that Drydock might use the "Service Management Console" to query whether hosts are still in good condition? If we're generalizing the latter, I'd expect there to be some way of configuring a "this host is online" ping / HTTP request or something of that nature.

> Support Host Identity and Authentication: Hosts need to know who they are (so they know which services they should provide, like Conduit vs Web, and which repositories they should host, and which calls should be routed locally vs remotely). They also need to be able to identify themselves to one another. I think the most straightforward way to do this is through private keys, which are sufficient to accomplish both goals. We can maybe even use the machine private keys (/etc/ssh_host_rsa_key, e.g.), with an option to use a specific alternate private key.
>
> If SSH is enabled, machines can make service calls over SSH directly (ssh ... conduit method.name). If SSH is not enabled, we can sign HTTP requests using keys to achieve the same effect.
>
> So machines would look up their public key in the service directory. If they find it, they say "Oh, I am phabweb03, I should provide web services to users" or "Oh, I am phabdaemon09, I should provide SSH/Conduit services to other hosts only, and I host these 73 repositories: ...". To make a service call, they look up the correct host and connect to it using their private key to sign an SSH/HTTP request. The other end of the connection looks up the public key and identifies the internal service.

This all sounds like a great idea. In particular, Conduit over SSH probably means the daemon and storage tiers don't need to have a web server at all, since all of the API methods can be running over SSH (which is probably far more reliable anyway).

> Probably sort out Host routing?: Discussion in T5702. We have some mess around how we route "Host" headers, which might need to get sorted out here. This doesn't really block anything, but the "lock down the web UI" diff probably touches this code and makes it more complicated. By refactoring it to be more general, we could have machines which aren't supposed to provide user web services just not serve the user web services "virtual host", which could be an easy, effective way to lock segments of functionality.

I think Conduit over SSH should pretty much resolve any need to have a web server running on the daemon / storage tiers. Basically, any initial host-specific configuration can be done through bin/config, and any non-locked, runtime host-specific configuration can probably be routed over Conduit / SSH and displayed in the service management console (for example, operations to migrate a Git repository from one host to another, or something like that?)

> If SSH is enabled, machines can make service calls over SSH directly (ssh ... conduit method.name). If SSH is not enabled, we can sign HTTP requests using keys to achieve the same effect.

I'd also pretty much argue that in a High Availability configuration, you should just be using SSH here and then we don't need to bother with signing HTTP requests or running web servers at the daemon / storage tier.

I think the motivation for HTTP is likely to be performance, since the overhead of spinning up an SSH connection to run git cat-file in order to show a user file content may be higher than we want to pay. Pure SSH is fine for the daemons. If we can get away with it, it would definitely be nice to use pure SSH everywhere.

Specifically, performance from the Diffusion browse views.

> Specifically, performance from the Diffusion browse views.

What about using persistent SSH control connections to avoid spinning up a new connection for each request? I can't think of any reason that wouldn't work.
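For reference, OpenSSH connection multiplexing -- the mechanism the question refers to -- is configured roughly like this (the host pattern and socket path are illustrative):

```
# ~/.ssh/config -- reuse one authenticated TCP connection for many
# ssh invocations. Host pattern and ControlPath are illustrative.
Host phabrepo*
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
```

Subsequent `ssh` commands matching the pattern attach to the existing master connection instead of performing a fresh TCP and key exchange handshake.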

I think the first SSH connection is still more expensive than several HTTP connections, and we can't easily just pool all traffic from a host over a single connection because a host may run multiple instances of Phabricator -- and we already built all the HTTP stuff anyway.

Specifically, here's the progress of the stuff I outlined above:

  • Finish T2783: Nearly complete for Git; D11874 might be the last callsite. Needs more work for Mercurial/Subversion.
  • Build a Service Directory: This is the Almanac application, which has been in production in this role in the Phacility cluster since February.
  • Host Identity: Conduit has supported public/private key authentication over HTTP since early this year; this has also been in production in the Phacility cluster since launch.
  • Host Headers: Host handling got split out fairly nicely and is visible in ConfigSites.

On the top-level goals:

  • Web/Daemon Access to Repositories: Web access is effectively complete and has been in production since February, with one currently known bug (T9319). Daemon access is substantially complete but needs a bit more work (the "Finish T2783" stuff).
  • Database Failover: No progress on this.
  • Repository Failover: Some indirect progress, but this mostly depends on moving T4292 forward.
  • Routing SSH: Complete and in production since February.
  • Management UI: Substantially complete (Almanac).

Overall, if you have an exceptionally detailed understanding of technically-functional-but-mostly-undocumented Phabricator features, here's roughly what you can deploy in a cluster today and soon:

                   | Today                                  | After T2783                                                         | After T4292
Web Hosts          | Unlimited                              | Unlimited                                                           | Unlimited
Daemon Hosts       | 1 (must also run repos on this host)   | Unlimited                                                           | Unlimited
Repository Hosts   | 1 (must also run daemons on this host) | Unlimited, but losing a host impacts service for some repositories  | Unlimited
Database Hosts     | 1                                      | 1                                                                   | 1
Notification Hosts | 1                                      | 1                                                                   | 1

In cases I've noted as "Unlimited" without qualification, losing hosts does not impact service availability (except that you'll have less capacity).

In all cases, a single host can serve multiple roles (you can put a total of 2 hosts in production, put repo + daemon + web on each, and get HA on those services after T4292).

The amount of work in T2783 is not very large, but not trivial either.

The amount of work in T4292 is a bit more substantial, but I think it's well-defined and surmountable.

We haven't made any progress on databases and the pathway forward there isn't very concrete, although I don't think it's hugely complex overall.

The current HA plan for the notification server is "suffer without it until it gets fixed". It would probably be relatively easy to make this HA (or "more HA"), but it doesn't seem terribly important.

Although some of this is in production, there's essentially zero documentation on any of it, and I don't expect installs to be able to figure it out on their own. Today you can only configure "half-HA-of-unimportant-nodes", which is great if you're running the Phacility cluster and primarily care about serving a large number of Phabricator instances on a single hardware pool, but I assume it's not hugely useful for anyone else. I expect to complete at least T2783 before we have a real user-facing narrative for configuring this stuff -- ideally both T2783 and T4292, and really ideally also get databases sorted.

eadler added a project: Restricted Project.Jan 9 2016, 12:34 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jan 9 2016, 12:37 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Feb 24 2016, 12:07 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Apr 7 2016, 6:05 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Apr 7 2016, 6:07 PM

I'm merging this into T10751, which is a cleaner followup without two years of outdated history. The goals remain the same.