src/docs/user/cluster/cluster_repositories.diviner
Show All 13 Lines | |||||
advantages of doing this are: | advantages of doing this are: | ||||
- you can completely survive the loss of repository hosts; | - you can completely survive the loss of repository hosts; | ||||
- reads and writes can scale across multiple machines; and | - reads and writes can scale across multiple machines; and | ||||
- read and write performance across multiple geographic regions may improve. | - read and write performance across multiple geographic regions may improve. | ||||
This configuration is complex, and many installs do not need to pursue it. | This configuration is complex, and many installs do not need to pursue it. | ||||
This configuration is not currently supported with Subversion. | This configuration is not currently supported with Subversion or Mercurial. | ||||
Repository Hosts | Repository Hosts | ||||
================ | ================ | ||||
Repository hosts must run a complete, fully configured copy of Phabricator, | Repository hosts must run a complete, fully configured copy of Phabricator, | ||||
including a webserver. If you make repositories available over SSH, they must | including a webserver. They must also run a properly configured `sshd`. | ||||
also run a properly configured `sshd`. | |||||
Generally, these hosts will run the same set of services and configuration that | Generally, these hosts will run the same set of services and configuration that | ||||
web hosts run. If you prefer, you can overlay these services and put web and | web hosts run. If you prefer, you can overlay these services and put web and | ||||
repository services on the same hosts. | repository services on the same hosts. See @{article:Clustering Introduction} | ||||
for some guidance on overlaying services. | |||||
When a user requests information about a repository that can only be satisfied | When a user requests information about a repository that can only be satisfied | ||||
by examining a repository working copy, the webserver receiving the request | by examining a repository working copy, the webserver receiving the request | ||||
will make an HTTP service call to a repository server which hosts the | will make an HTTP service call to a repository server which hosts the | ||||
repository to retrieve the data it needs. It will use the result of this query | repository to retrieve the data it needs. It will use the result of this query | ||||
to respond to the user. | to respond to the user. | ||||
Show All 9 Lines | |||||
Before responding to a read, replicas make sure their version of the repository | Before responding to a read, replicas make sure their version of the repository | ||||
is up to date (no node in the cluster has a newer version of the repository). | is up to date (no node in the cluster has a newer version of the repository). | ||||
If it isn't, they block the read until they can complete a fetch. | If it isn't, they block the read until they can complete a fetch. | ||||
Before responding to a write, replicas obtain a global lock, perform the same | Before responding to a write, replicas obtain a global lock, perform the same | ||||
version check and fetch if necessary, then allow the write to continue. | version check and fetch if necessary, then allow the write to continue. | ||||
Additionally, repositories passively check other nodes for updates and | |||||
replicate changes in the background. After you push a change to a repository, | |||||
it will usually spread passively to all other repository nodes within a few | |||||
minutes. | |||||
Even if passive replication is slow, the active replication makes acknowledged | |||||
changes sequential to all observers: after a write is acknowledged, all | |||||
subsequent reads are guaranteed to see it. The system does not permit stale | |||||
reads, and you do not need to wait for a replication delay to see a consistent | |||||
view of the repository no matter which node you ask. | |||||
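
The read and write rules above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not Phabricator's implementation: the `Cluster` and `Node` classes, the `_synchronize` helper, and the use of a `threading.Lock` to stand in for the global lock are all hypothetical.

```python
import threading


class Node:
    """Hypothetical repository node: a version number plus its commits."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.commits = []


class Cluster:
    """Toy model of the active replication rules (assumed, simplified)."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}
        # Stands in for the global write lock described in the text.
        self.write_lock = threading.Lock()

    def _synchronize(self, node):
        # Version check: if any node is newer, fetch from it before going on.
        newest = max(self.nodes.values(), key=lambda n: n.version)
        if newest.version > node.version:
            node.commits = list(newest.commits)
            node.version = newest.version

    def read(self, name):
        node = self.nodes[name]
        self._synchronize(node)  # a stale node blocks the read until it fetches
        return list(node.commits)

    def write(self, name, commit):
        node = self.nodes[name]
        with self.write_lock:  # obtain the global lock first
            self._synchronize(node)
            node.commits.append(commit)
            node.version += 1
            return node.version


cluster = Cluster(["X", "Y"])
cluster.write("X", "c1")
cluster.write("Y", "c2")  # Y fetches c1 from X before accepting c2
print(cluster.read("X"))  # X fetches from Y before answering
```

In this model a read never returns stale data, because every read and write performs the version check first. Passive background replication is deliberately left out.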
HTTP vs HTTPS | HTTP vs HTTPS | ||||
============= | ============= | ||||
Intracluster requests (from the daemons to repository servers, or from | Intracluster requests (from the daemons to repository servers, or from | ||||
webservers to repository servers) are permitted to use HTTP, even if you have | webservers to repository servers) are permitted to use HTTP, even if you have | ||||
set `security.require-https` in your configuration. | set `security.require-https` in your configuration. | ||||
Show All 11 Lines | |||||
repository hosts and bind to them with the "https" protocol. Just be aware that | repository hosts and bind to them with the "https" protocol. Just be aware that | ||||
the `security.require-https` setting won't prevent you from making | the `security.require-https` setting won't prevent you from making | ||||
configuration mistakes, as it doesn't cover intracluster traffic. | configuration mistakes, as it doesn't cover intracluster traffic. | ||||
Other mitigations are possible, but securing a network against the NSA and | Other mitigations are possible, but securing a network against the NSA and | ||||
similar agents of other rogue nations is beyond the scope of this document. | similar agents of other rogue nations is beyond the scope of this document. | ||||
Monitoring Replication | |||||
====================== | |||||
You can review the current status of a repository on cluster nodes in | |||||
{nav Diffusion > (Repository) > Manage Repository > Cluster Configuration}. | |||||
This screen shows all the configured devices which are hosting the repository | |||||
and the available version. | |||||
**Version**: When a repository is mutated by a push, Phabricator increases | |||||
an internal version number for the repository. This column shows which version | |||||
is on disk on the corresponding node. | |||||
After a change is pushed, the node which received the change will have a larger | |||||
version number than the other nodes. The change should be passively replicated | |||||
to the remaining nodes after a brief period of time, although this can take | |||||
a while if the change was large or the network connection between nodes is | |||||
slow or unreliable. | |||||
You can click the version number to see the corresponding push logs for that | |||||
change. The logs contain details about what was changed, and can help you | |||||
identify if replication is slow because a change is large or for some other | |||||
reason. | |||||
**Writing**: This shows that the node is currently holding a write lock. This | |||||
normally means that it is actively receiving a push, but can also mean that | |||||
there was a write interruption. See "Write Interruptions" below for details. | |||||
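
As a rough illustration of how to read the **Version** column, the sketch below compares per-device version numbers and reports which devices are lagging. The function name `replication_status` and the device names are hypothetical; the only assumption taken from the text is that a larger version number means a newer copy.

```python
def replication_status(versions):
    """Map each device to a short status based on its version number.

    `versions` is a dict of device name -> version on disk (hypothetical
    input; in practice you would read these values off the screen).
    """
    newest = max(versions.values())
    return {
        device: "up to date" if version == newest
        else f"behind by {newest - version}"
        for device, version in versions.items()
    }


status = replication_status({"repo001": 7, "repo002": 7, "repo003": 5})
for device, state in sorted(status.items()):
    print(device, state)
```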
Write Interruptions | |||||
=================== | |||||
A repository cluster can be put into an inconsistent state by an interruption | |||||
in a brief window immediately after a write. | |||||
Phabricator cannot commit changes to a working copy (stored on disk) and to | |||||
the global state (stored in a database) atomically, so there is a narrow window | |||||
between committing these two different states when some tragedy (like a | |||||
lightning strike) can befall a server, leaving the global and local views of | |||||
the repository state divergent. | |||||
In these cases, Phabricator fails into a "frozen" state where further writes | |||||
are not permitted until the failure is investigated and resolved. | |||||
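
The non-atomic window can be illustrated with a toy model. Everything here is an assumption made for illustration: the `GlobalState` class, the `writer` marker, and the `push` function are hypothetical and do not describe Phabricator's actual data model. The sketch only shows why a crash between the two commits leaves a held write lock behind and blocks later writes.

```python
class FrozenRepository(Exception):
    """Raised when an earlier write never finished (hypothetical name)."""


class GlobalState:
    """Stands in for the database-backed global view of the repository."""

    def __init__(self):
        self.version = 0
        self.writer = None  # device currently holding the write lock, if any


def push(state, disk, device, commit, crash_in_window=False):
    if state.writer is not None:
        # A previous write never released the lock: refuse further writes.
        raise FrozenRepository(f"write lock still held by {state.writer}")
    state.writer = device
    disk.append(commit)      # step 1: commit to the working copy on disk
    if crash_in_window:
        # Simulated failure between the two commits; the lock stays held.
        raise RuntimeError("simulated crash")
    state.version += 1       # step 2: commit to the global state
    state.writer = None


state, disk = GlobalState(), []
push(state, disk, "X", "c1")
try:
    push(state, disk, "X", "c2", crash_in_window=True)
except RuntimeError:
    pass
print(state.version, disk, state.writer)
```

After the simulated crash the disk holds two commits while the global version is still 1, so the two views have diverged and any further `push` raises `FrozenRepository`.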
TODO: Complete the support tooling and provide recovery instructions. | |||||
Loss of Leaders | |||||
=============== | |||||
A more straightforward failure condition is the loss of all servers in a | |||||
cluster which have the most up-to-date copy of a repository. For example: | |||||
- There is a cluster setup with two nodes, X and Y. | |||||
- A new change is pushed to server X. | |||||
- Before the change can propagate to server Y, lightning strikes server X | |||||
and destroys it. | |||||
Here, all of the "leader" nodes with the most up-to-date copy of the repository | |||||
have been lost. Phabricator will refuse to serve this repository because it | |||||
cannot serve it consistently, and cannot accept writes without data loss. | |||||
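
The X and Y scenario can be expressed as a small check. This is a sketch under assumptions, not the real logic: `reachable_leaders` and `RepositoryUnavailable` are hypothetical names, and the recorded versions are assumed to come from the global state.

```python
class RepositoryUnavailable(Exception):
    """Raised when no node with the newest version can be reached."""


def reachable_leaders(recorded_versions, reachable):
    """Return reachable nodes holding the newest recorded version.

    `recorded_versions` maps node name -> last recorded version;
    `reachable` lists the nodes that are currently online.
    """
    newest = max(recorded_versions.values())
    leaders = {n for n, v in recorded_versions.items() if v == newest}
    available = sorted(leaders & set(reachable))
    if not available:
        # Serving from a follower would silently drop acknowledged changes.
        raise RepositoryUnavailable("all leaders are lost or offline")
    return available


versions = {"X": 2, "Y": 1}  # a change reached X but has not replicated to Y
print(reachable_leaders(versions, ["X", "Y"]))

try:
    reachable_leaders(versions, ["Y"])  # lightning has struck X
except RepositoryUnavailable as error:
    print("refusing to serve:", error)
```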
The most straightforward way to resolve this issue is to restore any leader to | |||||
service. The change will be able to replicate to other nodes once a leader | |||||
comes back online. | |||||
If you are unable to restore a leader or unsure that you can restore one | |||||
quickly, you can use the monitoring console to review which changes are | |||||
present on the leaders but not present on the followers by examining the | |||||
push logs. | |||||
TODO: Complete the support tooling and provide recovery instructions. | |||||
Backups | Backups | ||||
====== | ====== | ||||
Even if you configure clustering, you should still consider retaining separate | Even if you configure clustering, you should still consider retaining separate | ||||
backup snapshots. Replicas protect you from data loss if you lose a host, but | backup snapshots. Replicas protect you from data loss if you lose a host, but | ||||
they do not let you rewind time to recover from data mutation mistakes. | they do not let you rewind time to recover from data mutation mistakes. | ||||
If something issues a `--force` push that destroys branch heads, the mutation | If something issues a `--force` push that destroys branch heads, the mutation | ||||
Show All 18 Lines |