
Automate more of the migration process for moving instances across shards in the Phacility cluster
Closed, Resolved · Public

Description

See T11665 for context: several above-average-sized active instances have ended up co-hosted on the same shard, which is unevenly loading that shard.

A good tool here would let us pick the biggest instance and move it to a new, empty shard.

In general, moving instances between shards is something we need in the long run, and we'll need it in a more general form for private clustering (T11230).

In theory, we would do this seamlessly by configuring double writes, making the new shard authoritative, and turning off the configuration pointing at the old shard. In practice, I think this won't work well. It would mean that the database server was replicating only some tables (and possibly a master for some databases and replica for other databases, at least briefly), which seems fairly fragile. This also requires MySQL service restarts, which impact all instances instead of only the moving instance.

I think a "stop the world" approach is likely better: suspend the instance, copy the data, swap the config, restore the instance. This results in some downtime for the instance, but should generally be simpler and cleaner.

I'm planning to:

  • bring up a new shard;
  • migrate a test instance;
  • document the process;
  • migrate a real upstream instance;
  • migrate a real user instance.

I'll automate this if possible, but I suspect we may need to refine the process a bit first.

Revisions and Commits

Restricted Differential Revision (×13)

Event Timeline

I've provisioned and deployed a new shard, but not yet opened it for allocations. I'm going to launch and then migrate a test instance next.

I'm expecting to do this:

  • Suspend the instance.
  • Stop the daemons.
  • Trigger a backup on the old repository shard + upload the data (in theory, host dump should work on repository shards).
  • Trigger a dump on the database shard (this automatically uploads the data).
  • Download and extract the backup on the new repository shard.
  • Download and load the backup on the new database shard.
  • Now comes the fiddly bit:
    • Swap the services in the admin console.
    • Sync the instance.
    • This does not destroy the old services. That's probably an issue, since new repositories will allocate on them. Manually destroy them instead?
    • Update all almanacServicePHID values in repository to point at the new service.
    • Update all devicePHID values in repository_workingcopyversion to point at the new device.
  • Unsuspend the instance.
  • Start the daemons.
  • Suspend the instance: Admin > Instance > Manage Instance > Suspend Instance
  • Stop the daemons:
old-repo$ PHABRICATOR_INSTANCE=turtle /core/lib/phabricator/bin/phd stop
  • Trigger a repository backup:
old-repo$ ./bin/host backup --instance turtle --name turtle-migration
  • Trigger a dump:
old-db$ ./bin/host dump --instance test-ln6fysaztujc
  • Upload the backup.
old-repo$ ./bin/host upload --file /core/bak/turtle-migration/turtle/turtle-migration.turtle.repositories.tgz
  • Note the database dump file PHID and repository dump file PHID.
  • Move to the new hosts. Download and import the database dump:
new-db$ ./bin/host download --phid <DATABASE-DUMP-PHID> --save-as turtle.dbdump.gz
new-db$ gzip -cd turtle.dbdump.gz | mysql -uroot -A
  • Download and extract the repository dump:
new-repo$ cd /core/repo/turtle/
new-repo$ /core/bin/host download --phid <REPOSITORY-DUMP-PHID> --save-as turtle.repodump.tgz
new-repo$ tar -xzvvf turtle.repodump.tgz --strip-components=5 # Tarball format is slightly goofy.

Now, time for the fiddly bit.

  • Swap services in: Admin > Instance > Manage Instance > Edit Services
  • Upgrade the instance to create accounts and sync it.
  • Remove the cluster cache:
new-db$ rm /core/tmp/cache/cluster/turtle.*
new-repo$ rm /core/tmp/cache/cluster/turtle.*
  • Swap, then destroy the old services:
new-db$ ./bin/host query --instance turtle --query 'SELECT phid, name FROM <INSTANCE>_almanac.almanac_service'
= turtle	PHID-ASRV-gcw2ls7e5cji3jyopff3	dbx005.phacility.net
= turtle	PHID-ASRV-zirvqlicsfn4sm72ddyz	repox005.phacility.net
= turtle	PHID-ASRV-cmvzyjdjwtxw5slu2q5q	services.phacility.net
= turtle	PHID-ASRV-rmqdbdlfwzbdu37hgwry	dbx006.phacility.net
= turtle	PHID-ASRV-tramkrrnlj2sy4quk6np	repox006.phacility.net
new-db$ ./bin/host query --instance turtle --query 'SELECT phid, name FROM <INSTANCE>_almanac.almanac_device'
= turtle	PHID-ADEV-wbtoqovrd6fktsij6u2d	daemon.phacility.net
= turtle	PHID-ADEV-hmcie2obnwzcyzhycpen	db005.phacility.net
= turtle	PHID-ADEV-5i4juu6f437y7srnoogf	repo005.phacility.net
= turtle	PHID-ADEV-udjv3tyedvdgdigb735q	web.phacility.net
= turtle	PHID-ADEV-d2hd3fc235t2kgxnf2qo	db006.phacility.net
= turtle	PHID-ADEV-dylswgfmtlerwlzvrqlg	repo006.phacility.net
new-db$ # Save these lists.
new-db$ mysql -uroot -A
mysql> UPDATE turtle_repository.repository SET almanacServicePHID = 'PHID-ASRV-tramkrrnlj2sy4quk6np' WHERE almanacServicePHID = 'PHID-ASRV-zirvqlicsfn4sm72ddyz';
mysql> UPDATE turtle_repository.repository_workingcopyversion SET devicePHID = 'PHID-ADEV-dylswgfmtlerwlzvrqlg' WHERE devicePHID = 'PHID-ADEV-5i4juu6f437y7srnoogf';
mysql> exit
new-db$ PHABRICATOR_INSTANCE=turtle /core/lib/phabricator/bin/remove destroy PHID-ASRV-gcw2ls7e5cji3jyopff3
new-db$ PHABRICATOR_INSTANCE=turtle /core/lib/phabricator/bin/remove destroy PHID-ASRV-zirvqlicsfn4sm72ddyz
new-db$ PHABRICATOR_INSTANCE=turtle /core/lib/phabricator/bin/remove destroy PHID-ADEV-hmcie2obnwzcyzhycpen
new-db$ PHABRICATOR_INSTANCE=turtle /core/lib/phabricator/bin/remove destroy PHID-ADEV-5i4juu6f437y7srnoogf
  • Unsuspend the instance.
  • Restart the daemons.

I migrated a test instance, which appears to have worked properly. I'm going to run through things again, then move a large instance off repo002.

I moved the largest instance from shard 002 to shard 006. This took about 35 minutes, most of which was backup/transfer time. I hit a couple of minor issues, but things appear to have worked properly overall.

epriestley renamed this task from "Move instances across shards in the Phacility cluster" to "Automate more of the migration process for moving instances across shards in the Phacility cluster". Sep 22 2016, 10:09 PM
epriestley moved this task from Backlog to Do Eventually on the Phacility board.

Some of the syntax has changed slightly from the above, but I think the overall plan remains basically correct.

I plan to separate "migrate repositories" and "migrate databases" in anticipation of dropping the db003 = repo003 constraint. So we'll probably end up with something like:

$ bin/instances migrate-repo --to repox-asdlnbq --instance xyz
$ bin/instances migrate-db --to dbx-23f892 --instance xyz

I think "suspend" sends email now too, and I want to separate migrations from email (e.g., no point in emailing all the suspended/disabled instances) so I'll probably tweak how that works a bit. It would also be bad if moving a suspended instance un-suspended it, and we should probably have something that's more like a "maintenance" flag which is separate from instance status so we can set it / clear it on disabled/suspended instances without changing their state. We don't need anything until we start touching live instances, though.

epriestley added a revision: Restricted Differential Revision. Jun 6 2017, 12:39 PM
epriestley added a revision: Restricted Differential Revision. Jun 6 2017, 1:22 PM
epriestley added a revision: Restricted Differential Revision. Jun 6 2017, 3:27 PM
epriestley added a commit: Restricted Diffusion Commit. Jun 7 2017, 10:25 AM
epriestley added a commit: Restricted Diffusion Commit.

I think we have a generally tricky situation on the database stuff here. The general algorithm this process is using so far is:

  • Put the instance in maintenance mode (right now, this is just a "TODO" comment since I haven't written this mode).
  • Add the target service (that we're moving to) to the instance.
  • (Actually migrate.)
  • Remove the source service (that we moved from).
  • Take the instance out of maintenance mode.

So, in the middle there, the instance is bound to both the source and destination services.

That's fine if the services are repository services (and, at least in theory, this is perfectly acceptable as a valid runtime state -- you can have multiple repository services), but it's a problem if the services are database services since this definitely isn't a valid runtime state.

For the database services I think we probably need to introduce some additional instance state to track which service is the actual master, vs services which are just attached, and have a separate "promote to master" step in the middle where we swap.

This touches T12801 a bit, since the "select a concrete database master" phase of things is currently part of the Instances > Almanac sync step.
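A rough sketch of what the database flow might look like with that extra state (the subcommand names here are purely illustrative, not existing tooling):

$ bin/instances attach-db --to dbx-23f892 --instance xyz        # hypothetical: bind the new service; the old service remains master
$ # ...copy the data across while the instance is in maintenance mode...
$ bin/instances promote-db --to dbx-23f892 --instance xyz       # hypothetical: swap the "master" marker to the new service
$ bin/instances detach-db --from <OLD-DB-SERVICE> --instance xyz  # hypothetical: remove the old binding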


A semi-related issue is that when we're doing the "restore" step, I'd like to move any data on the target host aside first. That is, if we're about to restore a bunch of stuff to repos/example/ but that directory already exists, we should move it to repos/example.moved-aside.20170611.zefnokf.bak~/ first. This prevents data loss if someone (for example) manages to screw up the migration direction and move data "from" an empty host "to" a full host.
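Concretely, for the repository side this is just a timestamped rename before the restore; a minimal sketch (paths and suffix are illustrative):

new-repo$ mv /core/repo/turtle /core/repo/turtle.moved-aside.$(date +%Y%m%d).bak~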

This is easy for repositories (mv works fine) but a little tougher for databases. Two issues are:

  • We probably can't use bin/host dump or any other command which relies on bin/storage, since the instance will still be pointing at the old database. We could add flags like --use-the-new-database x.y.z.q but this feels a little complicated and error prone to me.
  • These commands would not necessarily produce the right result anyway, because the schema version at HEAD and the schema version of the "left over" data we find on the target host may differ -- so the set of databases on the target may not be the same set of databases we're moving. If we just dumped the databases we're expecting that's good enough to avoid overwriting/destroying anything, but might produce an incomplete leftovers.moved-aside.sql (missing some databases which we've dropped since the last time the target was upgraded). Some of these databases may also not exist (if we've added databases since the last time the target was upgraded) so bin/storage commands would fail.

I think the best approach here is probably to just mysqldump any databases with the instance prefix rather than try to go through the whole normal backup/dump pathway. This duplicates a little bit of code, but it will always work properly regardless of the target's state, and it's simpler overall -- both of which seem desirable.
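As a sketch of that approach, using only stock mysql/mysqldump (the instance name and backup path are illustrative):

new-db$ mysql -uroot -N -e "SHOW DATABASES LIKE 'turtle\_%'" \
    | xargs -r mysqldump -uroot --databases \
    | gzip > /core/bak/turtle.moved-aside.$(date +%Y%m%d).sql.gz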


So my next step here is a bin/host restore command which roughly does this:

  • Operate in two modes: either bin/host restore --path <file> (restore from a file on disk, e.g. a normal backup) or bin/host restore --download <phid> (restore from a download).
  • If we're downloading, download first.
  • Check if the host has data already.
    • If it does, move that data aside with mv or copy it to disk with mysqldump so that we're never destroying anything.
  • Restore the backup file.
  • If we downloaded data to disk earlier, remove it.
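Usage would look something like this (the --path/--download flags are as described above; I'm assuming it takes --instance like the other bin/host workflows, and the paths are illustrative):

new-db$ ./bin/host restore --instance turtle --path /core/bak/turtle-migration/turtle-migration.turtle.sql.gz
new-db$ ./bin/host restore --instance turtle --download <DATABASE-DUMP-PHID>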

Thinking about this a little more, there's really no reason to do any of the rebinding stuff until after we move the data. I did that earlier when moving manually, largely for convenience -- but we don't need to sequence things like that when automating, particularly if we don't need bin/storage to work on the new database.

This would work just as well:

  • Enter maintenance mode.
  • Move the data.
  • Swap and sync services.
  • Leave maintenance mode.

I think that's also generally simpler and a bit easier to resume if interrupted.

Actually, looking at the logs above, I didn't even sequence it that way when migrating manually -- I just made that ordering up for no reason.

epriestley added a revision: Restricted Differential Revision. Jun 7 2017, 11:16 AM

Conveniently, bin/restore already exists and does much of the above, although it doesn't have a --download flag yet and its current behavior is to abort if data already exists.

epriestley added a revision: Restricted Differential Revision. Jun 7 2017, 11:22 AM
epriestley added a revision: Restricted Differential Revision. Jun 7 2017, 11:29 AM
epriestley added a revision: Restricted Differential Revision. Jun 7 2017, 11:51 AM
epriestley added a revision: Restricted Differential Revision. Jun 7 2017, 12:26 PM
epriestley added a commit: Restricted Diffusion Commit. Jun 7 2017, 4:30 PM
epriestley added a commit: Restricted Diffusion Commit.
epriestley added a commit: Restricted Diffusion Commit.
epriestley added a commit: Restricted Diffusion Commit. Jun 7 2017, 4:40 PM
epriestley added a commit: Restricted Diffusion Commit. Jun 7 2017, 4:42 PM
epriestley added a commit: Restricted Diffusion Commit. Jun 7 2017, 6:17 PM
epriestley added a revision: Restricted Differential Revision. Jun 7 2017, 6:56 PM
epriestley added a revision: Restricted Differential Revision. Jun 7 2017, 7:05 PM
epriestley added a commit: Restricted Diffusion Commit. Jun 7 2017, 7:08 PM
epriestley added a commit: Restricted Diffusion Commit. Jun 7 2017, 7:16 PM
epriestley added a revision: Restricted Differential Revision. Jun 8 2017, 12:29 AM
epriestley added a revision: Restricted Differential Revision. Jun 8 2017, 12:34 AM
epriestley added a commit: Restricted Diffusion Commit. Jun 8 2017, 1:23 PM
epriestley added a commit: Restricted Diffusion Commit. Jun 8 2017, 1:26 PM
epriestley added a revision: Restricted Differential Revision. Jun 9 2017, 3:55 PM
epriestley added a commit: Restricted Diffusion Commit. Jun 9 2017, 4:02 PM

I'm going to call this "resolved" since I migrated instances in T12817 successfully. There's followup work, but T12798 is probably a better point of consolidation for future work now than this task is.