Observed in the Phacility cluster. An instance with an especially large total data size is encountering an issue while dumping backups to disk:
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] [2019-06-25 11:55:46] EXCEPTION: (PhutilProxyException) Error while executing Task ID 16700631. {>} (CommandException) Command failed with error #255! Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] COMMAND Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] '/core/bin/remote' --internal -u instances -i /core/conf/keystore/instances.key backup db006.phacility.net '--instance' '<turtles>' '--weak' '--no-indexes' Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] STDOUT Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] (empty) Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] STDERR Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] BEGIN Writing backup "20190625.<turtles>.databases.sql.gz"... Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] [2019-06-25 11:55:46] EXCEPTION: (CommandException) Command failed with error #255! Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] COMMAND Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] '/core/lib/phabricator/bin/storage' dump '--output' '/core/bak/20190625/<turtles>/20190625.<turtles>.databases.sql.gz' '--compress' '--overwrite' '--no-indexes' Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] STDOUT Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] Fatal error: Out of memory (allocated 311164928) (tried to allocate 105988097 bytes) in /core/lib/libphutil/src/future/exec/ExecFuture.php on line 246 Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] STDERR Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] PHP Fatal error: Out of memory (allocated 311164928) (tried to allocate 105988097 bytes) in /core/lib/libphutil/src/future/exec/ExecFuture.php on line 246 Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] at [<phutil>/src/future/exec/ExecFuture.php:380] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] libcore(), phutil(head=stable, ref.master=fa6519f4cea1, ref.stable=674d69718e88) Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] #0 ExecFuture::resolvex() called at [<libcore>/workflow/host/CoreHostBackupWorkflow.php:128] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] #1 CoreHostBackupWorkflow::backupInstance(string, PhutilArgumentParser, string) called at [<libcore>/workflow/host/CoreHostBackupWorkflow.p... (1,032 more bytes) ... at [<phutil>/src/future/exec/ExecFuture.php:380] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] arcanist(head=stable, ref.master=7d15b85a1bc0, ref.stable=feb5f4d42c4f), corgi(head=master, ref.master=216435c68a94), instances(head=stable, ref.master=db56d5d6ad91, ref.stable=37fb1f4917c7), libcore(), phabricator(head=stable, ref.stable=51cd3f097928), phutil(head=stable, ref.master=92882eb9404d, ref.stable=b675a06bf741), services(head=stable, ref.master=5b51b63027b7, ref.stable=e9147e2871e1) Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] #0 <#2> ExecFuture::resolvex() called at [<instances>/applications/instances/worker/InstancesWorker.php:110] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] #1 <#2> InstancesWorker::execxRemote(string, string, string, array) called at [<instances>/applications/instances/worker/InstancesBackupWorker.php:39] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] #2 <#2> InstancesBackupWorker::doWork() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:124] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] #3 <#2> PhabricatorWorker::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php:158] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] #4 <#2> PhabricatorWorkerActiveTask::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php:22] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] #5 PhabricatorTaskmasterDaemon::run() called at [<phutil>/src/daemon/PhutilDaemon.php:219] Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] #6 PhutilDaemon::execute() called at [<phutil>/scripts/daemon/exec/exec_daemon.php:131] Daemon 1141723 FAIL [Tue, 25 Jun 2019 11:55:46 +0000] Process exited with error 255.
I suspect we're reading from mysqldump faster than we're writing to disk, and the buffer is growing unboundedly. If so, we should just cap the size of the read buffer (at, say, 32MB) and pause new reads until the writes catch up.