src/docs/user/cluster/cluster.diviner
Repository replicas are important for availability if you host repositories
on Phabricator, but less important if you host repositories elsewhere
(instead, you should focus on making that service more available).

The distributed nature of Git and Mercurial tends to mean that they are
naturally somewhat resistant to data loss: every clone of a repository
includes the entire history.

Repositories may become a scalability bottleneck, although this is rare unless
your install has an unusually heavy repository read volume. Slow clones and
fetches may hint at a repository capacity problem. Adding more repository
hosts will provide an approximately linear increase in capacity.

For details, see @{article:Cluster: Repositories}.
Cluster: Daemons
================

Configuring multiple daemon hosts is straightforward, but you must configure
repositories first.

With daemons running on multiple hosts, you can transparently survive the loss
of any subset of hosts without an interruption to daemon services, as long as
at least one host remains alive. Daemons are stateless, so spreading daemons
across multiple hosts provides no resistance to data loss.

Daemons can become a bottleneck, particularly if your install sees a large
volume of write traffic to repositories. If the daemon task queue has a
backlog, that hints at a capacity problem. If existing hosts have unused
resources, increase `phd.taskmasters` until they are fully utilized. From
there, adding more daemon hosts will provide an approximately linear increase
in capacity.
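For example, if daemon hosts have spare CPU and memory, you might raise the
taskmaster pool before adding hardware. A sketch; the pool size of `8` below
is illustrative, not a recommendation:

```shell
# Run more taskmaster daemons on each host (8 is an example value;
# raise it gradually while watching CPU and memory usage).
phabricator/ $ ./bin/config set phd.taskmasters 8

# Restart the daemons so the new pool size takes effect.
phabricator/ $ ./bin/phd restart
```

You can then watch the task queue in the Daemons console to confirm the
additional taskmasters are keeping up with the backlog.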
For details, see @{article:Cluster: Daemons}.
Cluster: Web Servers
====================

Configuring multiple web hosts is straightforward, but you must configure
repositories first.

With multiple web hosts, you can transparently survive the loss of any subset
of hosts as long as at least one host remains alive. Web hosts are stateless,
so putting multiple hosts in service provides no resistance to data loss
because no data is at risk.

Web hosts can become a bottleneck, particularly if you have a workload that is
heavily focused on reads from the web UI (like a public install with many
anonymous users). Slow responses to web requests may hint at a web capacity
problem. Adding more hosts will provide an approximately linear increase in
capacity.
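When balancing requests across several web hosts, each host can be health
checked through the unauthenticated `/status/` endpoint, which is intended
for load balancers. A sketch, using `web001.example.com` as a placeholder
hostname:

```shell
# Probe a web host directly (bypassing the load balancer); a healthy
# host answers with HTTP 200. The hostname below is a placeholder.
$ curl -i http://web001.example.com/status/
```

Pointing your load balancer's health checks at this endpoint lets it pull a
failed web host out of rotation automatically.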
For details, see @{article:Cluster: Web Servers}.
Cluster: Notifications
======================

Configuring multiple notification hosts is simple and has no prerequisites.
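For example, multiple servers can be listed in the `notification.servers`
option. A sketch with placeholder hostnames and commonly used ports; verify
the exact schema against the documentation for your version:

```shell
# Register two notification servers: clients connect to "client"
# entries, and web hosts publish through "admin" entries. The
# hostnames and ports here are placeholders.
phabricator/ $ ./bin/config set notification.servers '[
  {
    "type": "client",
    "host": "notify.example.com",
    "port": 22280,
    "protocol": "https"
  },
  {
    "type": "admin",
    "host": "127.0.0.1",
    "port": 22281,
    "protocol": "http"
  }
]'
```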
With multiple notification hosts, you can survive the loss of any subset of
hosts as long as at least one host remains alive. Service may be briefly
disrupted immediately after the incident which destroys the other hosts.
Notifications are noncritical, so this normally has little practical impact
on service availability. Notifications are also stateless, so clustering this
service provides no resistance to data loss because no data is at risk.

Notification delivery normally requires very few resources, so adding more
hosts is unlikely to have much impact on scalability.
For details, see @{article:Cluster: Notifications}.
Overlaying Services
===================

Although hosts can run a single dedicated service type, certain groups of
services work well together. Phabricator clusters usually do not need to be
very large, so deploying a small number of hosts with multiple services is a
good place to start.

In planning a cluster, consider these blended host types:

**Everything**: Run HTTP, SSH, MySQL, notifications, repositories and daemons
on a single host. This is the starting point for single-node setups, and
usually also the best configuration when adding the second node.

**Everything Except Databases**: Run HTTP, SSH, notifications, repositories
and daemons on one host, and MySQL on a different host. MySQL uses many of the
same resources that other services use. It's also simpler to separate than
other services, and tends to benefit the most from dedicated hardware.

**Repositories and Daemons**: Run repositories and daemons on the same host.
Repository hosts //must// run daemons, and it normally makes sense to
completely overlay repositories and daemons. These services tend to use
different resources (repositories are heavier on I/O and lighter on CPU/RAM;
daemons are heavier on CPU/RAM and lighter on I/O).

Repositories and daemons are also both less latency sensitive than other
Cluster Recipes
===============

This section provides some guidance on reasonable ways to scale up a cluster.

The smallest possible cluster is **two hosts**. Run everything (web, ssh,
database, notifications, repositories, and daemons) on each host. One host
will serve as the master; the other will serve as a replica.

Ideally, you should physically separate these hosts to reduce the chance that
a natural disaster or infrastructure disruption could disable or destroy both
hosts at the same time.
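Whatever the layout, hosts in a cluster must be permitted to make
intracluster requests, which is controlled by the `cluster.addresses` option.
A sketch; the CIDR blocks below are placeholders for your own network ranges:

```shell
# Allow hosts in these address blocks to make intracluster requests.
# The ranges are examples only; list the blocks your hosts occupy.
phabricator/ $ ./bin/config set cluster.addresses '["10.0.0.0/24", "10.0.1.0/24"]'
```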
From here, you can choose how you expand the cluster.
To improve **scalability and performance**, separate loaded services onto
dedicated hosts and then add more hosts of that type to increase capacity. If
you have a two-node cluster, the best way to improve scalability by adding one
host is likely to separate the master database onto its own host.

Note that increasing scale may //decrease// availability by leaving you with
too little capacity after a failure. If you have three hosts handling traffic
and one datacenter fails, too much traffic may be sent to the single remaining
host in the surviving datacenter. You can hedge against this by mirroring new
hosts in other datacenters (for example, also separate the replica database
onto its own host).
After separating databases, separating repository + daemon nodes is likely
the next step to consider.
To improve **availability**, add another copy of everything you run in one
datacenter to a new datacenter. For example, if you have a two-node cluster,
the best way to improve availability is to run everything on a third host in
a third datacenter. If you have a 6-node cluster with a web node, a database
node and a repo + daemon node in two datacenters, add 3 more nodes to create
a copy of each node in a third datacenter.