D15763.id.diff

diff --git a/src/docs/user/cluster/cluster_databases.diviner b/src/docs/user/cluster/cluster_databases.diviner
--- a/src/docs/user/cluster/cluster_databases.diviner
+++ b/src/docs/user/cluster/cluster_databases.diviner
@@ -6,31 +6,76 @@
Overview
========
-WARNING: This feature is a very early prototype; the features this document
-describes are mostly speculative fantasy.
-
You can deploy Phabricator with multiple database hosts, configured as a master
and a set of replicas. The advantages of doing this are:
- faster recovery from disasters by promoting a replica;
- - graceful degradation if the master fails;
- - reduced load on the master; and
+ - graceful degradation if the master fails; and
- some tools to help monitor and manage replica health.
This configuration is complex, and many installs do not need to pursue it.
-Phabricator can not currently be configured into a multi-master mode, nor can
-it be configured to automatically promote a replica to become the new master.
-
If you lose the master, Phabricator can degrade automatically into read-only
mode and remain available, but can not fully recover without operational
intervention unless the master recovers on its own.
+Phabricator will not currently send read traffic to replicas unless the master
+has failed, so configuring a replica will not currently spread any load away
+from the master. Future versions of Phabricator are expected to be able to
+distribute some read traffic to replicas.
+
+Phabricator can not currently be configured into a multi-master mode, nor can
+it be configured to automatically promote a replica to become the new master.
+There are no current plans to support multi-master mode or autonomous failover,
+although this may change in the future.
+
Setting up MySQL Replication
============================
-TODO: Write this section.
+To begin, set up a replica database server and configure MySQL replication.
+
+If you aren't sure how to do this, refer to the MySQL manual for instructions.
+The MySQL documentation is comprehensive and walks through the steps and
+options in good detail. You should understand MySQL replication before
+deploying it in production: Phabricator layers on top of it, and does not
+attempt to abstract it away.
+
+Some useful notes for configuring replication for Phabricator:
+
+**Binlog Format**: Phabricator issues some queries which MySQL will detect as
+unsafe if you use the `STATEMENT` binlog format (the default before MySQL
+5.7.7). Instead, use `MIXED` (recommended) or `ROW` as the `binlog_format`.
+
+**Grant `REPLICATION CLIENT` Privilege**: If you give the user that Phabricator
+will use to connect to the replica database server the `REPLICATION CLIENT`
+privilege, Phabricator's status console can give you more information about
+replica health and state.
+
+**Copying Data to Replicas**: Phabricator currently uses a mixture of MyISAM
+and InnoDB tables, so it can be difficult to guarantee that a dump is wholly
+consistent and suitable for loading into a replica because MySQL uses different
+consistency mechanisms for the different storage engines.
+
+To limit downtime while still producing a consistent dump, consider leaving
+Phabricator running but configured in read-only mode while you take the
+dump:
+
+ - Stop all the daemons.
+ - Set `cluster.read-only` to `true` and deploy the new configuration. The
+ web UI should now show that Phabricator is in "Read Only" mode.
+ - Dump the database. Running `bin/storage dump --for-replica` adds the
+ `--master-data` flag to the underlying command, so the dump includes a
+ `CHANGE MASTER ...` statement.
+ - Once the dump finishes, turn `cluster.read-only` off again to restore
+ service. Continue loading the dump into the replica normally.
+
+**Log Expiration**: You can configure MySQL to automatically clean up old
+binary logs (on startup and whenever the logs are flushed) with the
+`expire_logs_days` option. If you do not configure this and do not explicitly
+purge old logs with `PURGE BINARY LOGS`, the binary logs on disk will grow
+without bound, and relatively quickly.
+
+Once you have a working replica, continue below to tell Phabricator about it.
Configuring Replicas
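
The read-only dump steps in the hunk above could be sketched as the following shell script. This is a rough illustration, not part of the patch. It assumes configuration is changed locally with `bin/config set` and that daemons are managed with `bin/phd`; on a multi-host install you would deploy the configuration change however you normally do. `PHABRICATOR_ROOT` and `DUMP_FILE` are hypothetical placeholders. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the read-only dump procedure. Assumptions, not from the original
# text: PHABRICATOR_ROOT and DUMP_FILE are hypothetical placeholders, and
# configuration is applied locally with `bin/config set`.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes them.

PHABRICATOR_ROOT="${PHABRICATOR_ROOT:-/path/to/phabricator}"
DUMP_FILE="${DUMP_FILE:-replica-dump.sql}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command in dry-run mode; otherwise run it from the install root.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$1"
  else
    (cd "$PHABRICATOR_ROOT" && sh -c "$1")
  fi
}

# Stop daemons, enter read-only mode, dump, then restore service.
# The chain stops at the first failure, so a failed dump leaves the install
# in read-only mode until you turn `cluster.read-only` off by hand.
dump_for_replica() {
  run "./bin/phd stop" &&
  run "./bin/config set cluster.read-only true" &&
  run "./bin/storage dump --for-replica > $DUMP_FILE" &&
  run "./bin/config set cluster.read-only false" &&
  run "./bin/phd start"
}

dump_for_replica
```

Review the printed commands before running with `DRY_RUN=0`. When executed, a relative `DUMP_FILE` is written inside `PHABRICATOR_ROOT`, and the dump is then loaded into the replica as usual.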
@@ -207,7 +252,38 @@
Promoting a Replica
===================
-TODO: Write this section.
+If you lose access to the master database, Phabricator will degrade into
+read-only mode. This is described in greater detail below.
+
+The easiest way to get out of read-only mode is to restore the master database.
+If the database recovers on its own or operations staff can revive it,
+Phabricator will return to full working order after a few moments.
+
+If you can't restore the master or are unsure you will be able to restore the
+master quickly, you can promote a replica to become the new master instead.
+
+Before doing this, you should first assess how far behind the master the
+replica was when the link died. Any data which was not replicated will either
+be lost or become very difficult to recover after you promote a replica.
+
+For example, if a task `T1234` had been created on the master but had not yet
+replicated when you promote the replica, a new `T1234` may be created on the
+replica after promotion. Even if you can recover the master later, merging
+the data will be difficult because each database may have conflicting changes
+which can not be merged easily.
+
+If there was a significant replication delay at the time of the failure, you
+may want to try harder or spend more time attempting to recover the master
+before choosing to promote.
+
+Once you have decided to promote, disable replication on the replica and
+mark it as the `master` in `cluster.databases`. Remove the original master
+from the configuration and deploy the change to all surviving hosts.
+
+Once write service is restored, you should provision, deploy, and configure a
+new replica by following the steps you took the first time around. You are
+critically vulnerable to a second disruption until you have restored the
+redundancy.
Unreachable Masters
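
One way to start assessing how far behind the replica was is to compare the binary log coordinates it had received with the coordinates it had executed. The helper below is a minimal sketch, not part of the patch. It assumes the pre-8.0.22 `SHOW SLAVE STATUS` field names, and `replica_applied_all` is a hypothetical function name.

```shell
# Read `SHOW SLAVE STATUS\G` output on stdin and report whether the replica
# had executed every event it had received from the master.
# Hypothetical helper; it cannot see writes the master never sent.
replica_applied_all() {
  awk -F': ' '
    { gsub(/^[ \t]+/, "", $1) }                      # strip field indentation
    $1 == "Master_Log_File"       { read_file = $2 } # last file received
    $1 == "Read_Master_Log_Pos"   { read_pos  = $2 } # last position received
    $1 == "Relay_Master_Log_File" { exec_file = $2 } # last file executed
    $1 == "Exec_Master_Log_Pos"   { exec_pos  = $2 } # last position executed
    END {
      if (read_file != "" && read_file == exec_file && read_pos == exec_pos) {
        print "applied-all-received"
      } else {
        print "behind: read " read_file ":" read_pos ", executed " exec_file ":" exec_pos
        exit 1
      }
    }'
}
```

You might feed it with something like `mysql -e 'SHOW SLAVE STATUS\G' | replica_applied_all`, run against the replica. A passing result only means the replica applied what it received. Writes committed on the master but never shipped to the replica are still lost on promotion, so treat this as a lower bound on the gap.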
