D15764.id37985.diff

diff --git a/src/docs/user/cluster/cluster_repositories.diviner b/src/docs/user/cluster/cluster_repositories.diviner
--- a/src/docs/user/cluster/cluster_repositories.diviner
+++ b/src/docs/user/cluster/cluster_repositories.diviner
@@ -19,19 +19,19 @@
This configuration is complex, and many installs do not need to pursue it.
-This configuration is not currently supported with Subversion.
+This configuration is not currently supported with Subversion or Mercurial.
Repository Hosts
================
Repository hosts must run a complete, fully configured copy of Phabricator,
-including a webserver. If you make repositories available over SSH, they must
-also run a properly configured `sshd`.
+including a webserver. They must also run a properly configured `sshd`.
Generally, these hosts will run the same set of services and configuration that
web hosts run. If you prefer, you can overlay these services and put web and
-repository services on the same hosts.
+repository services on the same hosts. See @{article:Clustering Introduction}
+for some guidance on overlaying services.
When a user requests information about a repository that can only be satisfied
by examining a repository working copy, the webserver receiving the request
@@ -57,6 +57,17 @@
Before responding to a write, replicas obtain a global lock, perform the same
version check and fetch if necessary, then allow the write to continue.
+Additionally, repositories passively check other nodes for updates and
+replicate changes in the background. After you push a change to a repository,
+it will usually spread passively to all other repository nodes within a few
+minutes.
+
+Even if passive replication is slow, the active replication makes acknowledged
+changes sequential to all observers: after a write is acknowledged, all
+subsequent reads are guaranteed to see it. The system does not permit stale
+reads, and you do not need to wait for a replication delay to see a consistent
+view of the repository no matter which node you ask.
+
HTTP vs HTTPS
=============
@@ -84,6 +95,81 @@
similar agents of other rogue nations is beyond the scope of this document.
+Monitoring Replication
+======================
+
+You can review the current status of a repository on cluster nodes in
+{nav Diffusion > (Repository) > Manage Repository > Cluster Configuration}.
+
+This screen shows every configured device that hosts the repository and the
+version available on each one.
+
+**Version**: When a repository is mutated by a push, Phabricator increases
+an internal version number for the repository. This column shows which version
+is on disk on the corresponding node.
+
+After a change is pushed, the node which received the change will have a larger
+version number than the other nodes. The change should be passively replicated
+to the remaining nodes after a brief period of time, although this can take
+a while if the change was large or the network connection between nodes is
+slow or unreliable.
+
+You can click the version number to see the corresponding push logs for that
+change. The logs contain details about what was changed, and can help you
+identify if replication is slow because a change is large or for some other
+reason.
+
+**Writing**: This shows that the node is currently holding a write lock. This
+normally means that it is actively receiving a push, but can also mean that
+there was a write interruption. See "Write Interruptions" below for details.
+
+
+Write Interruptions
+===================
+
+A repository cluster can be put into an inconsistent state by an interruption
+in a brief window immediately after a write.
+
+Phabricator cannot commit changes to a working copy (stored on disk) and to
+the global state (stored in a database) atomically, so there is a narrow window
+between committing these two different states when some tragedy (like a
+lightning strike) can befall a server, leaving the global and local views of
+the repository state divergent.
+
+In these cases, Phabricator fails into a "frozen" state where further writes
+are not permitted until the failure is investigated and resolved.
+
+TODO: Complete the support tooling and provide recovery instructions.
+
+
+Loss of Leaders
+===============
+
+A more straightforward failure condition is the loss of all servers in a
+cluster which have the most up-to-date copy of a repository. For example,
+consider this sequence of events:
+
+ - There is a cluster setup with two nodes, X and Y.
+ - A new change is pushed to server X.
+ - Before the change can propagate to server Y, lightning strikes server X
+ and destroys it.
+
+Here, all of the "leader" nodes with the most up-to-date copy of the repository
+have been lost. Phabricator will refuse to serve this repository because it
+cannot serve it consistently, and cannot accept writes without data loss.
+
+The most straightforward way to resolve this issue is to restore any leader to
+service. The change will be able to replicate to other nodes once a leader
+comes back online.
+
+If you are unable to restore a leader or unsure that you can restore one
+quickly, you can use the monitoring console to review which changes are
+present on the leaders but not present on the followers by examining the
+push logs.
+
+TODO: Complete the support tooling and provide recovery instructions.
+
+
Backups
======
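
The "Loss of Leaders" scenario added by this diff can be sketched in a few lines of Python. This is only an illustration of the rule the text describes (leaders are the nodes holding the highest version; the repository is refused when no leader is reachable), not Phabricator's implementation. The `Node` type and the `find_leaders` and `can_serve` helpers are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical record of one repository host and its last known version."""
    name: str
    version: int
    online: bool


def find_leaders(nodes):
    """Return every node holding the highest recorded version, online or not."""
    newest = max(node.version for node in nodes)
    return [node for node in nodes if node.version == newest]


def can_serve(nodes):
    """The repository can be served only if at least one leader is reachable."""
    return any(node.online for node in find_leaders(nodes))


# The two-node example from the text: X received the push, then was lost.
cluster = [Node("X", 5, online=False), Node("Y", 4, online=True)]
print([node.name for node in find_leaders(cluster)])
print(can_serve(cluster))
```

In this sketch, bringing X back online makes `can_serve` return `True` again, which mirrors the advice to restore any leader to service.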
