Page MenuHomePhabricator

"bin/storage dump" option "--for-replica" might not interact properly with "--no-indexes" and PERSISTENCE_CACHE
Closed, ResolvedPublic

Description

Currently, bin/storage dump has a --for-replica option used to seed a replica database.

It also has a --no-indexes option used to skip indexes, and dumps tables marked as cache tables (PhabricatorConfigTableSchema::PERSISTENCE_CACHE) without data in all cases.

I suspect this won't quite work properly. In particular:

  • Combining the --for-replica and --no-indexes flags should probably be explicitly disallowed.
  • In --for-replica mode, we should probably dump all data, including caches.
  • The help for the --for-replica flag should be updated with a note that it dumps more data.

Before making changes, we should verify that this is actually a problem (i.e., loading a dump with missing data into a replica and then starting replication really causes issues) but it's hard to imagine that this works properly.

Event Timeline

epriestley created this task.

An adjacent issue is that PhabricatorMarkupCache is not currently marked as having cache persistence (PhabricatorConfigTableSchema::PERSISTENCE_CACHE), so the data dumps even when we do not intend to dump data for readthrough caches.

(On one random large instance this table has 130MB of data in cache_markupcache vs 3,500MB of data in differential_changeset_parse_cache so this is likely a minor effect for most installs.)

loading a dump with missing data into a replica and then starting replication really causes issues

I did this today, see T13390. It did not cause issues. I think this operation is "safe" through luck, because these cache tables are all key-value stores which are only accessed with SELECT WHERE key = ... and INSERT IGNORE .... Both of these operations are (presumably/empirically, since nothing explodes) safe if the replica is missing the data.

Intuitively, this makes sense. If we do a read on the replica, see no data, write a cache fill on the master, and replicate that to the replica (at least, with statement-based replication), we converge to the right state. I'm not sure what binlog-based replication does here, but whatever it is is apparently also fine.

So this is probably still "wrong", but may not actually be detectable as wrong by observing application behavior.

An adjacent issue is that PhabricatorMarkupCache is not currently marked as having cache persistence

This is the only cache table defined by a LiskDAO (a higher-level way to define a table, used for "real" application-level objects). Other cache tables are defined raw in a SchemaSpec (a lower-level way to define a table).

The low-level internals operate on SchemaSpec tables, and LiskDAO objects can emit a lower-level table representation for these internals to consume.

Currently, there is no way for a LiskDAO table to specify that the table has a particular persistence flag: all LiskDAO tables have PERSISTENCE_DATA. So this is a little bit of work to fix, not just one line of config somewhere, but nothing too involved.