Differential D15764 Diff 37985 src/docs/user/cluster/cluster_repositories.diviner

src/docs/user/cluster/cluster_repositories.diviner

	Show All 13 Lines
	advantages of doing this are:			advantages of doing this are:

	- you can completely survive the loss of repository hosts;			- you can completely survive the loss of repository hosts;
	- reads and writes can scale across multiple machines; and			- reads and writes can scale across multiple machines; and
	- read and write performance across multiple geographic regions may improve.			- read and write performance across multiple geographic regions may improve.

	This configuration is complex, and many installs do not need to pursue it.			This configuration is complex, and many installs do not need to pursue it.

	This configuration is not currently supported with Subversion.			This configuration is not currently supported with Subversion or Mercurial.


	Repository Hosts			Repository Hosts
	================			================

	Repository hosts must run a complete, fully configured copy of Phabricator,			Repository hosts must run a complete, fully configured copy of Phabricator,
	including a webserver. If you make repositories available over SSH, they must			including a webserver. They must also run a properly configured `sshd`.
	also run a properly configured `sshd`.

	Generally, these hosts will run the same set of services and configuration that			Generally, these hosts will run the same set of services and configuration that
	web hosts run. If you prefer, you can overlay these services and put web and			web hosts run. If you prefer, you can overlay these services and put web and
	repository services on the same hosts.			repository services on the same hosts. See @{article:Clustering Introduction}
				for some guidance on overlaying services.

	When a user requests information about a repository that can only be satisfied			When a user requests information about a repository that can only be satisfied
	by examining a repository working copy, the webserver receiving the request			by examining a repository working copy, the webserver receiving the request
	will make an HTTP service call to a repository server which hosts the			will make an HTTP service call to a repository server which hosts the
	repository to retrieve the data it needs. It will use the result of this query			repository to retrieve the data it needs. It will use the result of this query
	to respond to the user.			to respond to the user.


	Show All 9 Lines

	Before responding to a read, replicas make sure their version of the repository			Before responding to a read, replicas make sure their version of the repository
	is up to date (no node in the cluster has a newer version of the repository).			is up to date (no node in the cluster has a newer version of the repository).
	If it isn't, they block the read until they can complete a fetch.			If it isn't, they block the read until they can complete a fetch.

	Before responding to a write, replicas obtain a global lock, perform the same			Before responding to a write, replicas obtain a global lock, perform the same
	version check and fetch if necessary, then allow the write to continue.			version check and fetch if necessary, then allow the write to continue.

				Additionally, repositories passively check other nodes for updates and
				replicate changes in the background. After you push a change to a repository,
				it will usually spread passively to all other repository nodes within a few
				minutes.

				Even if passive replication is slow, the active replication makes acknowledged
				changes sequential to all observers: after a write is acknowledged, all
				subsequent reads are guaranteed to see it. The system does not permit stale
				reads, and you do not need to wait for a replication delay to see a consistent
				view of the repository no matter which node you ask.
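
The read and write rules described above can be sketched as a small model. This is only an illustration under stated assumptions: `Node`, `Cluster`, and their methods are hypothetical names, the "fetch" just copies a version number, and an in-process lock stands in for the global lock. It is not Phabricator's implementation.

```python
import threading


class Node:
    """Hypothetical replica that tracks one on-disk version number."""

    def __init__(self, name, version=0):
        self.name = name
        self.version = version

    def fetch_from(self, other):
        # Stand-in for a real fetch: only the version number is copied.
        self.version = other.version


class Cluster:
    """Hypothetical cluster; a threading.Lock stands in for the global lock."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.write_lock = threading.Lock()

    def synchronize(self, node):
        # Version check: fetch if any node in the cluster is newer.
        newest = max(self.nodes, key=lambda n: n.version)
        if newest.version > node.version:
            node.fetch_from(newest)

    def read(self, node):
        # The read blocks until the node is up to date.
        self.synchronize(node)
        return node.version

    def write(self, node):
        # Take the lock, catch up, then let the write continue.
        with self.write_lock:
            self.synchronize(node)
            node.version += 1
            return node.version


x, y = Node("X"), Node("Y")
cluster = Cluster([x, y])
cluster.write(x)             # X now holds a newer version than Y
seen_by_y = cluster.read(y)  # Y catches up before answering the read
```

In this model a read through `y` sees the write that was acknowledged by `x`, even though no background replication has run.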


	HTTP vs HTTPS			HTTP vs HTTPS
	=============			=============

	Intracluster requests (from the daemons to repository servers, or from			Intracluster requests (from the daemons to repository servers, or from
	webservers to repository servers) are permitted to use HTTP, even if you have			webservers to repository servers) are permitted to use HTTP, even if you have
	set `security.require-https` in your configuration.			set `security.require-https` in your configuration.

	Show All 11 Lines
	repository hosts and bind to them with the "https" protocol. Just be aware that			repository hosts and bind to them with the "https" protocol. Just be aware that
	the `security.require-https` setting won't prevent you from making			the `security.require-https` setting won't prevent you from making
	configuration mistakes, as it doesn't cover intracluster traffic.			configuration mistakes, as it doesn't cover intracluster traffic.

	Other mitigations are possible, but securing a network against the NSA and			Other mitigations are possible, but securing a network against the NSA and
	similar agents of other rogue nations is beyond the scope of this document.			similar agents of other rogue nations is beyond the scope of this document.


				Monitoring Replication
				======================

				You can review the current status of a repository on cluster nodes in
				{nav Diffusion > (Repository) > Manage Repository > Cluster Configuration}.

				This screen shows all the configured devices which are hosting the repository
				and the available version.

				Version: When a repository is mutated by a push, Phabricator increases
				an internal version number for the repository. This column shows which version
				is on disk on the corresponding node.

				After a change is pushed, the node which received the change will have a larger
				version number than the other nodes. The change should be passively replicated
				to the remaining nodes after a brief period of time, although this can take
				a while if the change was large or the network connection between nodes is
				slow or unreliable.
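
As a rough illustration of reading the version column, the sketch below computes how far each node is behind the newest one. The node names and version numbers are made up, and `replication_lag` is a hypothetical helper, not part of Phabricator.

```python
def replication_lag(versions):
    """Map each node to how many versions it is behind the newest node."""
    newest = max(versions.values())
    return {node: newest - version for node, version in versions.items()}


# Hypothetical snapshot taken just after repo001 received a push.
versions = {"repo001": 12, "repo002": 11, "repo003": 11}
lag = replication_lag(versions)
behind = sorted(node for node, delta in lag.items() if delta > 0)
```

Nodes listed in `behind` are the ones still waiting for passive replication.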

				You can click the version number to see the corresponding push logs for that
				change. The logs contain details about what was changed, and can help you
				identify if replication is slow because a change is large or for some other
				reason.

				Writing: This shows that the node is currently holding a write lock. This
				normally means that it is actively receiving a push, but can also mean that
				there was a write interruption. See "Write Interruptions" below for details.


				Write Interruptions
				===================

				A repository cluster can be put into an inconsistent state by an interruption
				in a brief window immediately after a write.

				Phabricator cannot commit changes to a working copy (stored on disk) and to
				the global state (stored in a database) atomically, so there is a narrow window
				between committing these two different states when some tragedy (like a
				lightning strike) can befall a server, leaving the global and local views of
				the repository state divergent.

				In these cases, Phabricator fails into a "frozen" state where further writes
				are not permitted until the failure is investigated and resolved.

				TODO: Complete the support tooling and provide recovery instructions.


				Loss of Leaders
				===============

				A more straightforward failure condition is the loss of all servers in a
				cluster which have the most up-to-date copy of a repository. For example:

				- There is a cluster setup with two nodes, X and Y.
				- A new change is pushed to server X.
				- Before the change can propagate to server Y, lightning strikes server X
				and destroys it.

				Here, all of the "leader" nodes with the most up-to-date copy of the repository
				have been lost. Phabricator will refuse to serve this repository because it
				cannot serve it consistently, and cannot accept writes without data loss.
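
The X and Y scenario can be modeled in a few lines. This is a minimal sketch, assuming the cluster knows the last recorded version for every node; `find_leaders` and `can_serve` are hypothetical helpers used only for illustration.

```python
def find_leaders(versions):
    """Return the nodes that hold the highest known version."""
    newest = max(versions.values())
    return {node for node, version in versions.items() if version == newest}


def can_serve(versions, online):
    """Serving is only consistent if at least one leader is reachable."""
    return bool(find_leaders(versions) & set(online))


# A change was pushed to X, and X was lost before Y could fetch it.
versions = {"X": 8, "Y": 7}
leaders = find_leaders(versions)
before_loss = can_serve(versions, online={"X", "Y"})
after_loss = can_serve(versions, online={"Y"})
```

Here `after_loss` is false because the only leader, X, is no longer reachable.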

				The most straightforward way to resolve this issue is to restore any leader to
				service. The change will be able to replicate to other nodes once a leader
				comes back online.

				If you are unable to restore a leader, or are unsure whether you can restore
				one quickly, you can use the monitoring console to review which changes are
				present on the leaders but not present on the followers by examining the
				push logs.

				TODO: Complete the support tooling and provide recovery instructions.


	Backups			Backups
	======			======

	Even if you configure clustering, you should still consider retaining separate			Even if you configure clustering, you should still consider retaining separate
	backup snapshots. Replicas protect you from data loss if you lose a host, but			backup snapshots. Replicas protect you from data loss if you lose a host, but
	they do not let you rewind time to recover from data mutation mistakes.			they do not let you rewind time to recover from data mutation mistakes.

	If something issues a `--force` push that destroys branch heads, the mutation			If something issues a `--force` push that destroys branch heads, the mutation
	Show All 18 Lines