Page MenuHomePhabricator

Database dumps can buffer (?) too much data to hold in memory
Closed, ResolvedPublic

Description

Observed in the Phacility cluster. An instance with an especially large total data size is encountering an issue while dumping backups to disk:

Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] [2019-06-25 11:55:46] EXCEPTION: (PhutilProxyException) Error while executing Task ID 16700631. {>} (CommandException) Command failed with error #255!
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] COMMAND
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] '/core/bin/remote' --internal -u instances -i /core/conf/keystore/instances.key backup db006.phacility.net '--instance' '<turtles>' '--weak' '--no-indexes'
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] 
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] STDOUT
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] (empty)
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] 
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] STDERR
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]  BEGIN  Writing backup "20190625.<turtles>.databases.sql.gz"...
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] [2019-06-25 11:55:46] EXCEPTION: (CommandException) Command failed with error #255!
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] COMMAND
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] '/core/lib/phabricator/bin/storage' dump '--output' '/core/bak/20190625/<turtles>/20190625.<turtles>.databases.sql.gz' '--compress' '--overwrite' '--no-indexes'
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] 
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] STDOUT
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] 
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] Fatal error: Out of memory (allocated 311164928) (tried to allocate 105988097 bytes) in /core/lib/libphutil/src/future/exec/ExecFuture.php on line 246
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] 
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] 
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] STDERR
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] PHP Fatal error:  Out of memory (allocated 311164928) (tried to allocate 105988097 bytes) in /core/lib/libphutil/src/future/exec/ExecFuture.php on line 246
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]  at [<phutil>/src/future/exec/ExecFuture.php:380]
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] libcore(), phutil(head=stable, ref.master=fa6519f4cea1, ref.stable=674d69718e88)
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]   #0 ExecFuture::resolvex() called at [<libcore>/workflow/host/CoreHostBackupWorkflow.php:128]
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]   #1 CoreHostBackupWorkflow::backupInstance(string, PhutilArgumentParser, string) called at [<libcore>/workflow/host/CoreHostBackupWorkflow.p... (1,032 more bytes) ... at [<phutil>/src/future/exec/ExecFuture.php:380]
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000] arcanist(head=stable, ref.master=7d15b85a1bc0, ref.stable=feb5f4d42c4f), corgi(head=master, ref.master=216435c68a94), instances(head=stable, ref.master=db56d5d6ad91, ref.stable=37fb1f4917c7), libcore(), phabricator(head=stable, ref.stable=51cd3f097928), phutil(head=stable, ref.master=92882eb9404d, ref.stable=b675a06bf741), services(head=stable, ref.master=5b51b63027b7, ref.stable=e9147e2871e1)
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]   #0 <#2> ExecFuture::resolvex() called at [<instances>/applications/instances/worker/InstancesWorker.php:110]
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]   #1 <#2> InstancesWorker::execxRemote(string, string, string, array) called at [<instances>/applications/instances/worker/InstancesBackupWorker.php:39]
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]   #2 <#2> InstancesBackupWorker::doWork() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:124]
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]   #3 <#2> PhabricatorWorker::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php:158]
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]   #4 <#2> PhabricatorWorkerActiveTask::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php:22]
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]   #5 PhabricatorTaskmasterDaemon::run() called at [<phutil>/src/daemon/PhutilDaemon.php:219]
Daemon 1141723 STDE [Tue, 25 Jun 2019 11:55:46 +0000]   #6 PhutilDaemon::execute() called at [<phutil>/scripts/daemon/exec/exec_daemon.php:131]
Daemon 1141723 FAIL [Tue, 25 Jun 2019 11:55:46 +0000] Process exited with error 255.

I suspect we're reading from mysqldump faster than we're writing to disk, and the buffer is growing unboundedly. If so, we should just cap the size of the read buffer (at, say, 32MB) and pause new reads until the writes catch up.

Event Timeline

epriestley created this task.

PHP Fatal error: Out of memory (allocated 311164928) (tried to allocate 105988097 bytes) in /core/lib/libphutil/src/future/exec/ExecFuture.php on line 246

The actual line is:

ExecFuture->readStdout()
$result = (string)substr($this->stdout, $this->stdoutPos);

In the case that stdoutPos is 0, this performs an unnecessary copy of the string, which we could perhaps optimize away.

However, the buffer also just shouldn't be allowed to become large, which is the more fundamental issue.

I'm going to try to sneak this out to the db tier to resolve things before the west coast wakes up, at least.

The stalled production backup process completed successfully after deploying the change.

As a followup, it would be nice to provide a method like id(new FutureIterator(...))->setWakeOnIO(true) or similar to give FutureIterator more select()-like semantics, but I think we have so much headroom here that there's no urgency on this.