Allow Phabricator to run in Read-Only Mode
Closed, Resolved, Public

Assigned To

	epriestley

Authored By

	epriestley
	Mar 7 2014, 5:44 PM

Description

This came up in IRC: there's some interest in a read-only Phabricator. Some use cases:

- Lower-effort alternative to real HA which doesn't require intervention to switch or manage database masters. This was the driving use case here. Basically, trade writes away to get lower administrative costs.
- Allow Phabricator to stay up in read-only mode while updating, performing maintenance, etc., without risking data loss. This has come up as a nice-to-have in the past.

I think we can do this relatively easily. Roughly:

- Implement a read-only config flag. In this mode, we throw when establishing a write connection.
- No-op all the nonessential writes we perform when read-only is set (user access logging and cache fills come to mind).
- ApplicationSearch might be a little tricky, but should be able to switch to a pure GET mode.
- Eventually, remove write options from the UI, too, to make it clearer to users that they can't do writes.
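The first two steps above (a read-only flag that makes write connections throw, and no-op'ing nonessential writes) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not Phabricator's implementation: `ConnectionBroker`, `ReadOnlyError`, and `nonessential_write` are hypothetical names, the "r"/"w" modes borrow the read/write connection naming mentioned for LiskDAO in T8543, and placeholder strings stand in for real MySQL connections.

```python
class ReadOnlyError(RuntimeError):
    """Raised when code asks for a write connection in read-only mode."""


class ConnectionBroker:
    """Hands out database connections by mode: "r" (read) or "w" (write).

    Hypothetical sketch: instead of live MySQL connections it returns
    placeholder strings, so only the read-only control flow is shown.
    """

    def __init__(self, read_only: bool) -> None:
        self.read_only = read_only

    def establish(self, mode: str) -> str:
        if mode not in ("r", "w"):
            raise ValueError(f"unknown connection mode: {mode!r}")
        if mode == "w" and self.read_only:
            # Essential writes fail loudly rather than silently losing data.
            raise ReadOnlyError("cannot establish a write connection in read-only mode")
        return f"{mode}-connection"

    def nonessential_write(self, what: str) -> bool:
        """Access logging, cache fills, etc. Returns True if the write ran."""
        if self.read_only:
            # Nonessential writes become no-ops instead of raising.
            return False
        self.establish("w")  # a real implementation would write `what` here
        return True


broker = ConnectionBroker(read_only=True)
print(broker.establish("r"))                    # r-connection
print(broker.nonessential_write("access log"))  # False
try:
    broker.establish("w")
except ReadOnlyError as err:
    print("refused:", err)
```

The split is deliberate: an essential write that can't happen raises, so no user data is silently dropped, while incidental writes quietly do nothing so pages still render.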

The only tricky part that's critical is that we need an alternate, writable session store. Some ideas on this:

- The most obvious thing is to write things to SQLite or disk, but this seems fairly complicated, won't generalize, and won't scale across multiple readers.
- We could write to a second MySQL, or to a specific table in MySQL which is only present for readers. This would work, but would require more work and configuration from everyone.
- We could issue sessions without requiring storage:
  - Define a new readonly session type which is only valid for readonly installs.
  - The session cookie is <time, hash(time + user secret + application secret)> and is good for a short period of time. Where we currently extend the session, we rotate the cookie instead.
  - This isn't as nice as a writable session store from a security perspective, but is much simpler than the alternatives.
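The storage-free session idea can be sketched as a signed, short-lived cookie. This is a hedged Python sketch, not the approach Phabricator shipped: it swaps the plain `hash(time + user secret + application secret)` for HMAC-SHA256 keyed with the application secret, assumes the user is identified some other way (for example, by a separate username cookie) so the right user secret can be looked up, and uses hypothetical names and lifetimes (`issue_cookie`, `check_cookie`, `TTL_SECONDS`, `ROTATE_AFTER_SECONDS`).

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple

TTL_SECONDS = 15 * 60          # hypothetical: how long a cookie stays valid
ROTATE_AFTER_SECONDS = 5 * 60  # hypothetical: age at which a fresh cookie is issued


def _sign(issued_at: int, user_secret: bytes, app_secret: bytes) -> str:
    # HMAC-SHA256 keyed with the application secret, over "<time>|<user secret>".
    # This stands in for the plain hash(...) in the description; HMAC avoids the
    # length-extension weakness of hashing a secret-prefixed message directly.
    message = f"{issued_at}|".encode() + user_secret
    return hmac.new(app_secret, message, hashlib.sha256).hexdigest()


def issue_cookie(user_secret: bytes, app_secret: bytes, now: Optional[int] = None) -> str:
    issued_at = int(time.time()) if now is None else now
    return f"{issued_at}:{_sign(issued_at, user_secret, app_secret)}"


def check_cookie(cookie: str, user_secret: bytes, app_secret: bytes,
                 now: Optional[int] = None) -> Tuple[bool, Optional[str]]:
    """Return (valid, replacement cookie or None). Nothing is read from or written to storage."""
    now = int(time.time()) if now is None else now
    try:
        raw_time, mac = cookie.split(":", 1)
        issued_at = int(raw_time)
        presented = mac.encode("ascii")
    except ValueError:  # also covers UnicodeEncodeError for non-ASCII input
        return False, None
    expected = _sign(issued_at, user_secret, app_secret).encode("ascii")
    if not hmac.compare_digest(presented, expected):
        return False, None
    age = now - issued_at
    if age < 0 or age > TTL_SECONDS:
        return False, None
    if age >= ROTATE_AFTER_SECONDS:
        # Where a stored session would be extended, hand out a fresh cookie instead.
        return True, issue_cookie(user_secret, app_secret, now)
    return True, None
```

Because validity depends only on the two secrets and the clock, nothing is written at login or on rotation. Changing a user's secret invalidates every outstanding cookie for that user; on the other hand, a rotated-out cookie stays valid until its own TTL expires, which is part of the security tradeoff noted above.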

After thinking about this, I lean heavily toward just defining a read-only session type for dealing with auth, since it's roughly 10x less dedicated code than anything else and can be made approximately as secure.

Revisions and Commits

rPHU libphutil
	D15673	rPHUa46f7b7e5d80 Provide a way to explicitly establish a database connection
	D15661	rPHU1dddbacbb252 Add a read-only flag to database connections
rP Phabricator
	D15679	rPac35246d0d54 Never sever non-cluster database; write more read-only documentation
	D15677	rPebff07d01983 Automatically sever databases after prolonged unreachability
	D15674	rP146fb646f92b Automatically degrade to read-only mode when unable to connect to the master
	D15672	rPe0a8cac703d1 When no master database is configured, automatically degrade to read-only mode
	D15671	rP071741c61da7 When Phabricator is in read-only mode, explain why
	D15668	rP6a4a9bb2d284 When `cluster.databases` is configured, read the master connection from it
	D15667	rP0439645d5bb0 Add a "Database Cluster Status" console in Config
	D15663	rP3f51b7853956 Lay `cluster.databases` configuration groundwork for database clustering
	D15662	rP49d93dcf98aa Add a `cluster.read-only` option
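Several of the commit titles above describe a small state machine: degrade to read-only when no master is configured or the master can't be reached, sever a cluster database after prolonged unreachability but never a non-cluster one, and tell users why. The following Python sketch models that behavior under assumptions; it is not Phabricator's code, and `MasterMonitor`, `record_probe`, and the `SEVER_AFTER_SECONDS` threshold are hypothetical.

```python
import time
from typing import Optional, Tuple

SEVER_AFTER_SECONDS = 300.0  # hypothetical; the commits don't say how long "prolonged" is


class MasterMonitor:
    """Decides whether a request runs read-write or read-only, and why.

    Hypothetical model of the behavior named in the commit titles,
    not Phabricator's actual implementation.
    """

    def __init__(self, master_configured: bool, is_cluster: bool) -> None:
        self.master_configured = master_configured
        self.is_cluster = is_cluster  # False for a plain, non-cluster database
        self.unreachable_since: Optional[float] = None
        self.severed = False

    def record_probe(self, reachable: bool, now: Optional[float] = None) -> None:
        """Feed in the result of one connection attempt to the master."""
        now = time.monotonic() if now is None else now
        if reachable:
            self.unreachable_since = None
            return
        if self.unreachable_since is None:
            self.unreachable_since = now
        # Only cluster databases are ever severed; a non-cluster database is
        # retried forever. This sketch never un-severs a host.
        if self.is_cluster and now - self.unreachable_since >= SEVER_AFTER_SECONDS:
            self.severed = True

    def mode(self, config_read_only: bool = False) -> Tuple[str, Optional[str]]:
        """Return ("read-write", None) or ("read-only", reason shown to users)."""
        if config_read_only:
            return "read-only", "cluster.read-only is enabled"
        if not self.master_configured:
            return "read-only", "no master database is configured"
        if self.severed:
            return "read-only", "the master was unreachable for too long and was severed"
        if self.unreachable_since is not None:
            return "read-only", "unable to connect to the master"
        return "read-write", None
```

Returning a reason alongside the mode is what lets the UI explain why writes are unavailable, rather than just failing.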

Related Objects

Status	Assigned	Task
Duplicate	epriestley	T4209 Multiserver / High-Availability Configuration
Resolved	epriestley	T10751 Make Phabricator Highly Available
Resolved	epriestley	T4571 Allow Phabricator to run in Read-Only Mode
Resolved	epriestley	T10758 Make `bin/storage dump` replica-aware
Resolved	epriestley	T6996 Write an `--output <file>` mode for `storage dump` which can gzip
Resolved	epriestley	T10759 Run PhabricatorDatabase/MySQLSetupCheck against all configured replicas
Resolved	None	T6710 Support "timeout" in vanilla MySQL connections
Open	None	T8543 Prevent write queries from executing on "r" connections in LiskDAO
Open	epriestley	T10813 Maybe fix various statements that MySQL statement-based replication gets upset about

Event Timeline

epriestley created this task. Mar 7 2014, 5:44 PM

epriestley claimed this task.

epriestley raised the priority of this task to Normal.

epriestley updated the task description.

epriestley added subscribers: epriestley, jbrown.

epriestley edited this Maniphest Task. Mar 7 2014, 5:44 PM

jbrown added a subscriber: frgtn. Mar 7 2014, 5:48 PM

joshuaspence added a subscriber: joshuaspence. Jun 13 2014, 6:03 PM

bartus added a subscriber: bartus. Jun 15 2014, 6:08 PM

jevripio added a subscriber: jevripio. Jun 17 2014, 9:22 AM

hach-que added a subscriber: hach-que. Aug 2 2014, 1:45 AM

chasemp awarded a token. Aug 12 2014, 6:05 PM

chasemp added a subscriber: chasemp.

+@btrahan, per offline conversation.

We don't need this for v0, because we'll stop the world (prevent reads and writes to all instances) for schema changes. This shouldn't be hugely worse than a more subtle approach until we're at greater scale.

We probably don't need it for v1, since the v1 plan is probably double writes: push new code to all webservers, then migrate instances using full R/W stops one at a time.

Somewhere around v2 we could replace "stop all reads and writes" with some sort of dance like "put instance in readonly mode; take master out of the pool; stop replication; upgrade master; put master back online; take replica offline; resume writes; resume replication; return replica to pool once it catches up".

When moving logical instance DBs between physical DB servers, having readonly also lets us improve from "stop the instance, dump data, load data, start instance" to "put instance in readonly, dump data, load data, resume writes", and eventually to "dump data, load data, begin replication, once replication catches up swap which node is master, resume writes".
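The final step of that last sequence ("once replication catches up swap which node is master, resume writes") could be sketched as below, assuming read-only mode is used to pause writes while the new node drains its backlog. Everything here is hypothetical: `cut_over` and its callables stand in for whatever actually toggles read-only mode, promotes a node, and reports replication lag.

```python
import time
from typing import Callable, Optional


def cut_over(
    replica_lag: Callable[[], Optional[float]],
    set_read_only: Callable[[bool], None],
    promote_new_master: Callable[[], None],
    timeout: float = 60.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Swap masters once the new node has caught up. Returns True on success.

    replica_lag() returns the new node's lag in seconds, or None if
    replication is broken (MySQL reports NULL for Seconds_Behind_Master
    when replication isn't running).
    """
    set_read_only(True)  # pause writes so the new node can fully catch up
    deadline = clock() + timeout
    while True:
        lag = replica_lag()
        if lag == 0:
            promote_new_master()
            set_read_only(False)  # resume writes against the new master
            return True
        if lag is None or clock() >= deadline:
            set_read_only(False)  # abort: resume writes on the old master
            return False
        sleep(poll_interval)
```

Treating a lag of 0 as "caught up" is a simplification: `Seconds_Behind_Master` is only an estimate, and comparing binary log or GTID positions between the two nodes is more reliable. If `promote_new_master` raises, the sketch deliberately leaves the instance read-only rather than resuming writes against an uncertain master.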

Finally, this gives us a better disaster recovery story, where we might be able to restore a damaged instance as readonly (e.g., from the last backup) while recovering a better dataset (e.g., replaying replication logs).

I'm not sure if we'll feel more pressure to improve migrations or improve load spreading or improve disaster recovery (hopefully not) first. There may also be a variety of ways we can cheat on migrations (for example, migrations which only add new tables can just be pushed to the DB tier first without any downtime), so I'm also not sure how far we can get without building this.

However, since we're very likely to need it eventually and it will impact a lot of stuff (there's probably a fair amount of effort in making applications understand readonly mode so we get something serviceable out of it), it may be worthwhile to begin building it sooner rather than later, so there are no surprises if the need becomes more urgent later on.

epriestley moved this task from Backlog to Do After Launch on the Phacility board. Nov 22 2014, 11:09 PM

sokcevic added a subscriber: sokcevic. Dec 1 2014, 5:54 PM

Cluster tokens (T5955, D10990) will also need a readonly alternative once this starts getting built out.

epriestley moved this task from Do After Launch to Do Eventually on the Phacility board. Feb 20 2015, 3:57 PM

epriestley mentioned this in T8209: Migrations which use handles may fail after introduction of cache columns. May 15 2015, 12:47 PM

In T8209, we hit an issue where a migration performed a cache fill as a side effect of loading handles. This caused a problem because that migration runs before the migration which introduces the cache columns, so the cache fill tried to write to columns that did not exist yet.

This sort of interaction is likely to become more common in the future now that we're dipping our toes into some level of readthrough caching (see T7707).

We could reduce the likelihood of surprises during migrations by introducing a "migration" level between "read/write" and "read only". This level would disable side-effect writes (cache fills, auxiliary logging, multimeter, xhprof profiling) but would not complain about explicit writes. This is a little bit magical, but I think it's generally OK: it doesn't add appreciable complexity over having "read/write" and "read only" modes (code still needs to handle both cases), and it should meaningfully reduce our exposure to incidental writes during migrations. Overall, I think this tradeoff would improve stability.
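The proposed "migration" level can be sketched as a three-level policy that distinguishes explicit writes from side-effect writes. This Python sketch is illustrative only: `WriteLevel`, `WriteKind`, `ReadOnlyWriteError`, `write_allowed`, and `perform_write` are hypothetical names, and it assumes side-effect writes are silently skipped while refused explicit writes raise.

```python
from enum import Enum
from typing import Callable


class WriteLevel(Enum):
    READ_WRITE = "read/write"
    MIGRATION = "migration"  # the proposed middle level
    READ_ONLY = "read only"


class WriteKind(Enum):
    EXPLICIT = "explicit"        # e.g. a migration or an edit saving data
    SIDE_EFFECT = "side effect"  # cache fills, auxiliary logging, multimeter, xhprof


class ReadOnlyWriteError(RuntimeError):
    """Raised when an explicit write is refused."""


def write_allowed(level: WriteLevel, kind: WriteKind) -> bool:
    if level is WriteLevel.READ_WRITE:
        return True
    if level is WriteLevel.MIGRATION:
        return kind is WriteKind.EXPLICIT
    return False  # READ_ONLY: nothing is written


def perform_write(level: WriteLevel, kind: WriteKind, action: Callable[[], None]) -> bool:
    """Run `action` if allowed. Returns True if it ran, False if it was skipped."""
    if write_allowed(level, kind):
        action()
        return True
    if kind is WriteKind.SIDE_EFFECT:
        return False  # incidental writes are silently dropped
    raise ReadOnlyWriteError(f"explicit write refused at level {level.value!r}")
```

Under this policy, the T8209 situation (a handle load triggering a cache fill mid-migration) becomes a skipped write instead of a failed migration, while the migration's own explicit writes still go through.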

epriestley mentioned this in D12858: Don't use handles in the Calendar event name migration. May 15 2015, 12:54 PM

epriestley mentioned this in T8301: Unapproved users should be allowed to view objects just like anonymous users. May 23 2015, 12:30 PM

epriestley mentioned this in T8685: Move secure.phabricator.com halfway into the cluster. Jun 26 2015, 5:54 PM

eadler added a subscriber: eadler. Jun 26 2015, 9:47 PM

cburroughs added a subscriber: cburroughs. Jul 6 2015, 2:38 PM

devurandom added a subscriber: devurandom. Aug 19 2015, 5:43 AM

greggrossmeier added a subscriber: greggrossmeier. Sep 3 2015, 6:04 PM

epriestley mentioned this in T8594: Develop system for dumping/backing up Phriction documents. Sep 10 2015, 12:12 AM

Pawka added a subscriber: Pawka. Oct 15 2015, 11:48 AM

polybuildr added a subscriber: polybuildr. Nov 14 2015, 1:28 AM

BYK added a subscriber: BYK. Jan 7 2016, 8:53 PM

epriestley mentioned this in Q288: Importing from another Phabricator install (Answer 316). Jan 24 2016, 1:19 PM

epriestley mentioned this in T10751: Make Phabricator Highly Available. Apr 8 2016, 6:23 PM

epriestley added a parent task: T10751: Make Phabricator Highly Available.

epriestley added a revision: D15661: Add a read-only flag to database connections. Apr 8 2016, 10:25 PM

epriestley added a revision: D15662: Add a `cluster.read-only` option. Apr 8 2016, 10:41 PM

epriestley added a commit: rPHU1dddbacbb252: Add a read-only flag to database connections. Apr 9 2016, 11:15 AM

epriestley created subtask T10758: Make `bin/storage dump` replica-aware. Apr 9 2016, 11:40 AM

epriestley created subtask T10759: Run PhabricatorDatabase/MySQLSetupCheck against all configured replicas. Apr 9 2016, 12:56 PM

epriestley changed the edit policy from "All Users" to "Community (Project)".

epriestley added a revision: D15663: Lay `cluster.databases` configuration groundwork for database clustering. Apr 9 2016, 1:04 PM

Will this task support setting a separate read-only username and password for database connections? This is useful in our environment for accessing the slaves.

I think the answer is yes:

You can configure separate credentials when connecting to each configured host, e.g. writer/trustno1 when connecting to db001 and reader/hunter2 when connecting to db002.

You can't configure multiple sets of "read" and "write" credentials for a single host, so if db001 caught on fire and you were promoting db002, but it was currently set up to connect with reader credentials, you would need to change the credentials at the same time you changed its role from replica to master.

Does that answer your question?

(Is there any value in using dedicated reader credentials beyond having an extra layer of certainty that we won't be able to write to replicas?)

(This is also all subject to change since it's largely fantasy today.)
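Per-host credentials, and why promoting a replica means changing its role and credentials together, can be sketched as below. This is a hypothetical Python model, not the `cluster.databases` format: `DatabaseHost`, `pick_host`, and `promote` are illustrative names, and it assumes reads fall back to the master when no replica is left.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class DatabaseHost:
    """One host in a hypothetical per-host database configuration."""
    host: str
    role: str  # "master" or "replica"
    user: str
    password: str


def pick_host(hosts: List[DatabaseHost], for_write: bool) -> DatabaseHost:
    """Prefer a replica for reads; writes (and reads with no replica) use the master."""
    if not for_write:
        for entry in hosts:
            if entry.role == "replica":
                return entry
    for entry in hosts:
        if entry.role == "master":
            return entry
    raise LookupError("no usable database host is configured")


def promote(hosts: List[DatabaseHost], failed: str, new_master: str,
            user: str, password: str) -> List[DatabaseHost]:
    """Drop the failed master and promote a replica in its place.

    Each host has exactly one set of credentials, so the promoted host's
    role and credentials must change in the same step.
    """
    result = []
    for entry in hosts:
        if entry.host == failed:
            continue
        if entry.host == new_master:
            entry = replace(entry, role="master", user=user, password=password)
        result.append(entry)
    return result


hosts = [
    DatabaseHost("db001", "master", "writer", "trustno1"),
    DatabaseHost("db002", "replica", "reader", "hunter2"),
]
print(pick_host(hosts, for_write=True).host)  # db001
hosts = promote(hosts, failed="db001", new_master="db002",
                user="writer", password="trustno1")
print(pick_host(hosts, for_write=True).user)  # writer
```

With a VIP setup like the one described in the next comment, the same effect is achieved outside Phabricator: each VIP carries its own credentials, and promotion only rebinds the VIP.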

Ah, I see. In my world, master promotion happens independently of the web tier. The web hosts connect over a VIP which always reaches the current master. We can also connect over a VIP to reach the slaves, but we don't currently use this.

We can just configure different users and passwords for each VIP. Thanks!

Yeah, that should work fine. To promote, you'd update the VIP stuff and Phabricator could remain in the dark.

There may be some additional safety checks in the future which would fail in this set of conditions (for example, a way to have Phabricator verify that the MySQL server-id has the expected value). Rebinding a VIP could cause these checks to fail, but I can make sure there are ways to opt-out of these if I do add them. I think these are probably very low-value anyway and likely not worth adding.

epriestley added a commit: rP49d93dcf98aa: Add a `cluster.read-only` option. Apr 9 2016, 8:40 PM

epriestley added a commit: rP3f51b7853956: Lay `cluster.databases` configuration groundwork for database clustering.

epriestley merged a task: T6710: Support "timeout" in vanilla MySQL connections. Apr 9 2016, 9:07 PM

epriestley added a subtask: T6710: Support "timeout" in vanilla MySQL connections.

epriestley mentioned this in T6710: Support "timeout" in vanilla MySQL connections.

epriestley added a revision: D15667: Add a "Database Cluster Status" console in Config. Apr 9 2016, 10:24 PM

epriestley added a revision: D15668: When `cluster.databases` is configured, read the master connection from it. Apr 10 2016, 3:33 AM

epriestley added a commit: rP0439645d5bb0: Add a "Database Cluster Status" console in Config. Apr 10 2016, 3:34 AM

eadler added a subtask: T8543: Prevent write queries from executing on "r" connections in LiskDAO. Apr 10 2016, 6:12 AM

epriestley added a revision: D15671: When Phabricator is in read-only mode, explain why. Apr 10 2016, 11:30 AM

epriestley added a revision: D15672: When no master database is configured, automatically degrade to read-only mode. Apr 10 2016, 12:24 PM

epriestley added a revision: D15673: Provide a way to explicitly establish a database connection. Apr 10 2016, 12:37 PM

epriestley added a revision: D15674: Automatically degrade to read-only mode when unable to connect to the master. Apr 10 2016, 1:53 PM

epriestley added a commit: rPHUa46f7b7e5d80: Provide a way to explicitly establish a database connection. Apr 10 2016, 7:17 PM

epriestley added a commit: rP6a4a9bb2d284: When `cluster.databases` is configured, read the master connection from it.

epriestley closed subtask T6710: Support "timeout" in vanilla MySQL connections as Resolved.

epriestley added a commit: rP071741c61da7: When Phabricator is in read-only mode, explain why.

epriestley added a commit: rPe0a8cac703d1: When no master database is configured, automatically degrade to read-only mode. Apr 10 2016, 7:19 PM

epriestley added a commit: rP146fb646f92b: Automatically degrade to read-only mode when unable to connect to the master.

epriestley added a revision: D15677: Automatically sever databases after prolonged unreachability. Apr 10 2016, 9:51 PM

epriestley added a revision: D15679: Never sever non-cluster database; write more read-only documentation. Apr 11 2016, 1:50 PM

epriestley added a commit: rPebff07d01983: Automatically sever databases after prolonged unreachability. Apr 11 2016, 3:43 PM

epriestley added a commit: rPac35246d0d54: Never sever non-cluster database; write more read-only documentation.

epriestley closed subtask T10758: Make `bin/storage dump` replica-aware as Resolved. Apr 14 2016, 8:23 PM

epriestley created subtask T10813: Maybe fix various statements that MySQL statement-based replication gets upset about. Apr 14 2016, 8:55 PM

The foundation for this now exists, and core features like browsing user profiles, wiki pages, tasks, etc., appear to work properly in read-only mode.

A lot of stuff still doesn't work, or doesn't work as well as it could. Some of this is cosmetic (for example, we shouldn't offer you a comment form on objects in read-only mode) while some of it is more fundamental (for example, you can't establish new login sessions).

T10769 is tracking this followup work. I expect some of this is stuff we'll pursue soon, while some of it (like DarkConsole not working) is stuff that no one will ever really care about.

epriestley closed subtask T10759: Run PhabricatorDatabase/MySQLSetupCheck against all configured replicas as Resolved. Nov 21 2016, 11:55 PM

urzds added a subscriber: urzds. Jul 12 2017, 11:15 AM
