Allow Phabricator to run in Read-Only Mode
Closed, ResolvedPublic

Description

This came up in IRC, but there's some interest in read-only Phabricator. Some use cases:

  • Lower-effort alternative to real HA which doesn't require intervention to switch or manage database masters. This was the driving use case here. Basically, trade writes away to get lower administrative costs.
  • Allow Phabricator to stay up in read-only mode while updating, performing maintenance, etc., without risking data loss. This has come up as a nice-to-have in the past.

I think we can do this relatively easily. Roughly:

  • Implement a read-only config flag. In this mode, we throw when establishing a write connection.
  • No-op all the nonessential writes we perform when read-only is set (user access logging and cache fills come to mind).
  • ApplicationSearch might be a little tricky, but should be able to switch to a pure GET mode.
  • Eventually, remove write options from the UI, too, to make it clearer to users that they can't do writes.
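
A minimal sketch of the first two bullets, in Python for illustration only (Phabricator itself is PHP). The names `Storage`, `establish_connection`, `log_user_access`, and `ReadOnlyError` are hypothetical and not Phabricator APIs. The idea is that asking for a write connection throws in read-only mode, while a nonessential write checks the flag first and quietly does nothing.

```python
class ReadOnlyError(RuntimeError):
    """Raised when code asks for a write connection in read-only mode."""


class Connection:
    def __init__(self, mode):
        self.mode = mode  # "r" for read, "w" for write


class Storage:
    """Hypothetical storage layer guarded by a read-only config flag."""

    def __init__(self, read_only=False):
        self.read_only = read_only
        self.access_log = []  # stands in for a log table

    def establish_connection(self, mode):
        # Essential writes fail loudly: throw when establishing a write
        # connection while the read-only flag is set.
        if mode == "w" and self.read_only:
            raise ReadOnlyError("write connection refused: read-only mode")
        return Connection(mode)

    def log_user_access(self, user):
        # Nonessential write: no-op in read-only mode instead of throwing.
        if self.read_only:
            return False
        self.establish_connection("w")
        self.access_log.append(user)
        return True
```

With this shape, callers doing real writes have to handle `ReadOnlyError`, while incidental writes such as access logging or cache fills degrade silently.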

The only tricky part that's critical is that we need an alternate, writable session store. Some ideas on this:

  • The most obvious thing is to write things to SQLite or disk, but this seems fairly complicated, won't generalize, and won't scale across multiple readers.
  • We could write to a second MySQL, or a specific table in MySQL which is only present for readers. This would work, but require more work and configuration from everyone.
  • We could issue sessions without requiring storage:
    • Define a new readonly session type which is only valid for readonly installs.
    • The session cookie is <time, hash(time + user secret + application secret)> and good for a short period of time. Where we currently extend the session, we rotate the cookie instead.
    • This isn't as nice as a writable session store from a security perspective, but is much simpler than alternatives.

After thinking about this, I lean heavily toward just defining a read-only session type for dealing with auth, since it's like 10x less dedicated code than anything else, and can be made approximately as secure.
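
As a rough illustration of the storage-free session idea, here is a Python sketch. It swaps the plain `hash(...)` from the description for an HMAC-SHA256 keyed with the application secret, which is my substitution rather than anything specified above. The 15-minute lifetime, the cookie layout, and all function names are assumptions made for the example.

```python
import hashlib
import hmac
import time

SESSION_TTL = 15 * 60  # assumed "short period of time", in seconds


def _sign(issued, user_secret, app_secret):
    # HMAC over the issue time and the user secret, keyed with the
    # application secret (a substitution for the plain hash).
    message = str(issued).encode() + b"|" + user_secret
    return hmac.new(app_secret, message, hashlib.sha256).hexdigest()


def issue_cookie(user_secret, app_secret, now=None):
    issued = int(time.time() if now is None else now)
    return f"{issued}:{_sign(issued, user_secret, app_secret)}"


def verify_cookie(cookie, user_secret, app_secret, now=None):
    now = time.time() if now is None else now
    try:
        issued_text, digest = cookie.split(":", 1)
        issued = int(issued_text)
    except ValueError:
        return False  # malformed cookie
    expected = _sign(issued, user_secret, app_secret)
    if not hmac.compare_digest(digest.encode(), expected.encode()):
        return False
    # Valid only for a short window after issue; no server-side state.
    return 0 <= now - issued <= SESSION_TTL


def rotate_cookie(cookie, user_secret, app_secret, now=None):
    # Where a stored session would be extended, issue a fresh cookie instead.
    if not verify_cookie(cookie, user_secret, app_secret, now):
        return None
    return issue_cookie(user_secret, app_secret, now)
```

Because nothing is stored, a cookie cannot be revoked individually before it expires, which is the security tradeoff noted in the list above.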

Event Timeline

epriestley claimed this task.
epriestley raised the priority of this task from to Normal.
epriestley updated the task description.
epriestley added subscribers: epriestley, jbrown.
epriestley added a subscriber: btrahan.

+@btrahan, per offline conversation.

We don't need this for v0, because we'll stop the world (prevent reads and writes to all instances) for schema changes. This shouldn't be hugely worse than a more subtle approach until we're at greater scale.

We probably don't need it for v1, since the v1 plan is probably double writes: push new code to all webservers, then migrate instances using full R/W stops one at a time.

Somewhere around v2 we could replace "stop all reads and writes" with some sort of dance like "put instance in readonly mode; take master out of the pool; stop replication; upgrade master; put master back online; take replica offline; resume writes; resume replication; return replica to pool once it catches up".

When moving logical instance DBs between physical DB servers, having readonly also lets us improve from "stop the instance, dump data, load data, start instance" to "put instance in readonly, dump data, load data, resume writes", and eventually to "dump data, load data, begin replication, once replication catches up swap which node is master, resume writes".

Finally, this gives us a better disaster recovery story, where we might be able to restore a damaged instance as readonly (e.g., from the last backup) while recovering a better dataset (e.g., replaying replication logs).

I'm not sure if we'll feel more pressure to improve migrations or improve load spreading or improve disaster recovery (hopefully not) first. There may also be a variety of ways we can cheat on migrations (for example, migrations which only add new tables can just be pushed to the DB tier first without any downtime), so I'm also not sure how far we can get without building this.

However, since we're very likely to need it eventually and it will impact a lot of stuff (there's probably a fair amount of effort in making applications understand readonly mode so we get something serviceable out of it) it may be worthwhile to begin building sooner rather than later so we don't have any surprises if the need for it becomes more urgent later on.

Cluster tokens (T5955, D10990) will also need a readonly alternative once this starts getting built out.

In T8209, we hit an issue where a migration performed a cache fill as a side effect of loading handles. This created a problem because the cache fill happened from a migration that runs before the cache was introduced.

This sort of interaction is likely to become more common in the future now that we're dipping our toes into some level of readthrough caching (see T7707).

We could reduce the likelihood of surprises during migrations by introducing a "migration" level between "read/write" and "read only". This level would disable side-effect writes (cache fills, auxiliary logging, multimeter, xhprof profiling) but not complain about explicit writes. This is a little bit magical, but I think it's generally OK: it doesn't really add appreciable complexity over having "read/write" vs "read only" modes (code still needs to handle both cases), and it should meaningfully reduce our exposure to incidental writes during migrations. I think we'd increase stability overall by making this tradeoff between complexities.
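
To make the three levels concrete, here is a small Python sketch. `StorageMode`, `allow_write`, and the `side_effect` flag are hypothetical names for this example, not existing Phabricator code. Side-effect writes are skipped in every mode except "read/write", and explicit writes are rejected only in "read only".

```python
from enum import Enum


class StorageMode(Enum):
    READ_WRITE = "read/write"
    MIGRATION = "migration"
    READ_ONLY = "read only"


class ReadOnlyError(RuntimeError):
    """Raised for an explicit write attempted in read-only mode."""


def allow_write(mode, side_effect):
    """Return True to perform the write, False to skip it silently.

    side_effect marks incidental writes (cache fills, auxiliary logging,
    profiling) as opposed to writes the caller explicitly asked for.
    """
    if mode is StorageMode.READ_WRITE:
        return True
    if side_effect:
        # Both "migration" and "read only" drop incidental writes.
        return False
    if mode is StorageMode.MIGRATION:
        # Explicit writes are the whole point of a migration.
        return True
    raise ReadOnlyError("explicit write attempted in read-only mode")
```

Under this sketch, the cache fill from T8209 would have been skipped during the migration, while the migration's own writes would proceed normally.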

Will this task support setting a separate read-only username and password for database connections? This is useful in our environment to access the slaves.

I think the answer is yes:

You can configure separate credentials when connecting to each configured host, e.g. writer/trustno1 when connecting to db001 and reader/hunter2 when connecting to db002.

You can't configure multiple sets of "read" and "write" credentials for a single host, so if db001 caught on fire and you were promoting db002, but it was currently set up to connect with reader credentials, you would need to change the credentials at the same time you changed its role from replica to master.
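
A hedged sketch of what that could look like, in Python for illustration. The `DatabaseHost` fields and the `promote` helper are invented for this example and are not actual Phabricator configuration keys. The hosts and credentials are the ones from the example above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatabaseHost:
    host: str
    role: str  # "master" or "replica"
    user: str
    password: str


# One set of credentials per configured host.
HOSTS = [
    DatabaseHost("db001", "master", "writer", "trustno1"),
    DatabaseHost("db002", "replica", "reader", "hunter2"),
]


def promote(hosts, failed, promoted, user, password):
    """Drop the failed master and promote a replica.

    The role and the credentials change together, because each host has
    only a single set of credentials.
    """
    result = []
    for entry in hosts:
        if entry.host == failed:
            continue
        if entry.host == promoted:
            entry = replace(entry, role="master", user=user, password=password)
        result.append(entry)
    return result
```

For example, `promote(HOSTS, "db001", "db002", "writer", "trustno1")` leaves a single entry for db002 with the master role and writer credentials.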

Does that answer your question?

(Is there any value in using dedicated reader credentials beyond having an extra layer of certainty that we won't be able to write to replicas?)

(This is also all subject to change since it's largely fantasy today.)

Ah, I see. In my world, master promotion happens independently of the web tier. The web hosts connect over a VIP which always reaches the current master. We can also connect over a VIP to reach the slaves but don't currently use this.

We can just configure different users and passwords for each VIP. Thanks!

Yeah, that should work fine. To promote, you'd update the VIP stuff and Phabricator could remain in the dark.

There may be some additional safety checks in the future which would fail in this set of conditions (for example, a way to have Phabricator verify that the MySQL server-id has the expected value). Rebinding a VIP could cause these checks to fail, but I can make sure there are ways to opt-out of these if I do add them. I think these are probably very low-value anyway and likely not worth adding.

The foundation for this now exists, and core features like browsing user profiles, wiki pages, tasks, etc., appear to work properly in read-only mode.

A lot of stuff still doesn't work, or doesn't work as well as it could. Some of this is cosmetic (for example, we shouldn't offer you a comment form on objects in read-only mode) while some of it is more fundamental (for example, you can't establish new login sessions).

T10769 is tracking this followup work. I expect some of this is stuff we'll pursue soon, while some of it (like DarkConsole not working) is stuff that no one will ever really care about.