Page MenuHomePhabricator

Prevent upgrade scripts from being run multiple times simultaneously
Closed, ResolvedPublic

Description

We have puppet code for provoisioning Phabricator which includes this:

exec { "${phabricator::params::base_dir}/phabricator/bin/storage upgrade":
  command   => "${phabricator::params::base_dir}/phabricator/bin/storage upgrade --force",
  onlyif    => "${phabricator::params::base_dir}/phabricator/bin/storage status | grep -q 'Not Applied'",
  subscribe => Vcsrepo['phabricator'],
}

For us, updating Phabricator basically involves running puppet on all Phabricator hosts using parallel-ssh. I had a failure whilst upgrading Phabricator today:

notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns: Applying patch 'phabricator:20151030.harbormaster.initiator.sql'...
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns: [2015-11-05 07:29:34] EXCEPTION: (AphrontQueryException) #1060: Duplicate column name 'initiatorPHID' at [<phutil>/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:311]
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns: arcanist(head=master, ref.master=2db40f995337), phabricator(head=master, ref.master=621f806e3b54, custom=1), phlab(head=master, ref.master=cd07327495c5), phutil(head=master, ref.master=ceca9d1122ea)
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns:   #0 AphrontBaseMySQLDatabaseConnection::throwQueryCodeException(integer, string) called at [<phutil>/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:275]
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns:   #1 AphrontBaseMySQLDatabaseConnection::throwQueryException(mysqli) called at [<phutil>/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:181]
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns:   #2 AphrontBaseMySQLDatabaseConnection::executeRawQuery(string) called at [<phutil>/src/xsprintf/queryfx.php:6]
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns:   #3 queryfx(AphrontMySQLiDatabaseConnection, string, string) called at [<phabricator>/src/infrastructure/storage/management/PhabricatorStorageManagementAPI.php:252]
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns:   #4 PhabricatorStorageManagementAPI::applyPatchSQL(string) called at [<phabricator>/src/infrastructure/storage/management/PhabricatorStorageManagementAPI.php:220]
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns:   #5 PhabricatorStorageManagementAPI::applyPatch(PhabricatorStoragePatch) called at [<phabricator>/src/infrastructure/storage/management/workflow/PhabricatorStorageManagementUpgradeWorkflow.php:192]
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns:   #6 PhabricatorStorageManagementUpgradeWorkflow::execute(PhutilArgumentParser) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:406]
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns:   #7 PhutilArgumentParser::parseWorkflowsFull(array) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:301]
notice: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns:   #8 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/sql/manage_storage.php:176]
err: /Stage[main]/Phabricator::Config/Exec[/usr/src/phabricator/bin/storage upgrade]/returns: change from notrun to 0 failed: /usr/src/phabricator/bin/storage upgrade --force returned 255 instead of one of [0] at /etc/puppet/modules/phabricator/manifests/config.pp:82

The issue seems to be that multiple hosts were trying to apply the migrations simultaneously. This is potentially dangerous so I might disable the automatic execution of storage upgrade, but maybe we should fix this on the Phabricator end by using some sort of lock?

Event Timeline

joshuaspence raised the priority of this task from to Needs Triage.
joshuaspence updated the task description. (Show Details)
joshuaspence added a project: Database.
joshuaspence added a subscriber: joshuaspence.

Trying to run db migration automagically is a path to ruin. Having multiple hosts trying to run it is one likely error; Having one host run migration while another one is reading/writing is bad. Having one host migrate, and another expect the old schema.
Just don't go there.

Oh, I take the hosts offline first.

joshuaspence added a subscriber: epriestley.

Happy to submit a patch if @epriestley is on board.

Yeah, I'm fine with putting a global lock on this. You shouldn't really ever be running it on multiple hosts simultaneously (in the best case, this just slows you down by running the slow checks in adjust multiple times), but it's dangerous enough to explicitly prevent.

One vague concern is that users sometimes have trouble figuring out what's holding a lock (e.g., see T9707). I may add a "lock log" table, and this lock would sometimes acquire before the lock log table exists. But that's not a concern today.

Both storage upgrade and storage adjust should probably acquire and hold the same lock, and when storage upgrade runs storage adjust implicitly it should hold the lock for the entire duration.

epriestley renamed this task from Prevent migrations from being applied multiple times to Prevent upgrade scripts from being run multiple times simultaneously.Nov 5 2015, 12:28 PM