
PhabricatorTaskmasterDaemon using fabulous amounts of RAM
Closed, Invalid · Public

Description

PhabricatorTaskmasterDaemon is using 1.1GB of physical RAM, and its usage keeps growing rapidly until the process crashes with an out-of-memory (OOM) error. It is then restarted automatically and the cycle repeats, putting the system under memory pressure and causing other processes to fail.

The command line of the process, according to /proc/34458/cmdline, is below. (That file separates arguments with NUL bytes rather than spaces, which is why they appear run together when printed; spaces are restored here.)

php /srv/phabricator/libphutil/scripts/daemon/exec/exec_daemon.php PhabricatorTaskmasterDaemon --load-phutil-library=/srv/phabricator/arcanist/src --load-phutil-library=/srv/phabricator/phabricator/src --load-phutil-library=/srv/phabricator/libpageup --log=/var/tmp/phd/log/daemons.log
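Because /proc/<pid>/cmdline stores each argument followed by a NUL byte, it has to be split on `\0` to be readable. A minimal PHP sketch follows; `read_proc_cmdline()` is a hypothetical helper, not part of Phabricator or libphutil.

```php
<?php

// Hypothetical helper (not Phabricator/libphutil code): split a
// /proc/<pid>/cmdline file into its argument list. The kernel separates
// arguments with NUL bytes and normally ends the buffer with one, which is
// why a raw dump shows them run together.
function read_proc_cmdline(string $path): array {
  $raw = file_get_contents($path);
  if ($raw === false) {
    throw new RuntimeException("Unable to read {$path}");
  }

  // Drop the trailing NUL terminator(s) so explode() does not produce a
  // spurious empty final argument.
  $raw = rtrim($raw, "\0");
  if ($raw === '') {
    // Kernel threads, for example, have an empty cmdline.
    return array();
  }

  return explode("\0", $raw);
}

// Example: print the taskmaster's command line with readable spacing.
// echo implode(' ', read_proc_cmdline('/proc/34458/cmdline')), "\n";
```

From a shell, `tr '\0' ' ' < /proc/34458/cmdline` gives the same readable output.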

Is there any way to find out what it's doing that causes such excessive memory usage?

Event Timeline

Unknown Object (User) assigned this task to epriestley. Sep 19 2013, 1:58 AM
Unknown Object (User) added a project: Phabricator.
Unknown Object (User) added a subscriber: Unknown Object (User).

In broad strokes:

  • We don't have great tools for figuring out what's leaking, and I'm not aware of any way to integrate great tools (for example, I don't know of any way to ask a PHP process where its memory is allocated and get a meaningful answer that humans can understand, although one may exist or we could potentially write one).
  • We do have coarse tools, wherein we can ask for total allocated memory and look at how it changes over time. However, there's no integration with the taskmaster right now.
  • In essentially all cases (at least, so far), memory leaks come from things like caches and logs that build up as side effects of normal operation. There are a manageable number of these, and a very small number tend to cause most of the pain. While not pleasant to diagnose and identify, they aren't unduly challenging either, and they appear at a low enough rate (a handful per year) that I think we can manage them manually, at least for now.
  • Because of the nature of the taskmaster, we can also implement a scorched earth solution, where we just cycle the process after N tasks, after M minutes, once memory usage exceeds X, or whatever.
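The scorched earth triggers in the last bullet can be sketched as a small policy object. This is a hypothetical illustration, not Phabricator code: the class name, method name, and thresholds are assumptions. It uses PHP's built-in `memory_get_usage()` as the coarse "total allocated memory" figure from the second bullet, and it assumes a supervisor (the overseer, in Phabricator's case) restarts the process after it exits.

```php
<?php

// Hypothetical sketch (not Phabricator code): retire a worker after N tasks,
// after M seconds, or once its memory usage exceeds X bytes. A supervisor is
// assumed to restart the process with a fresh heap.
final class WorkerRetirementPolicy {
  private $maxTasks;
  private $maxSeconds;
  private $maxBytes;

  public function __construct(int $max_tasks, int $max_seconds, int $max_bytes) {
    $this->maxTasks = $max_tasks;
    $this->maxSeconds = $max_seconds;
    $this->maxBytes = $max_bytes;
  }

  // Returns the name of the first trigger that fired, or null to keep working.
  // $now and $memory_bytes default to live values; passing them explicitly
  // makes the policy easy to test.
  public function getRetirementReason(
    int $tasks_executed,
    int $started_at,
    ?int $now = null,
    ?int $memory_bytes = null
  ): ?string {
    if ($now === null) {
      $now = time();
    }
    if ($memory_bytes === null) {
      // Memory the PHP allocator has reserved from the system, which tracks
      // process growth more closely than the default memory_get_usage().
      $memory_bytes = memory_get_usage(true);
    }

    // Triggers are checked in order; the first one that fires is reported.
    if ($tasks_executed >= $this->maxTasks) {
      return 'tasks';
    }
    if ($now - $started_at >= $this->maxSeconds) {
      return 'age';
    }
    if ($memory_bytes >= $this->maxBytes) {
      return 'memory';
    }
    return null;
  }
}
```

A worker loop would record `time()` at startup and exit once `getRetirementReason()` returns anything other than null; logging the returned reason shows which trigger is actually doing the work.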

So steps here are probably:

  • To identify and fix big leaks:
    • Improve logging to include memory information, so we can look at a log and get a pointer to which task types cause the biggest problems.
    • And/or record memory information alongside task duration.
    • Improve task repeatability (e.g., something like bin/worker run <id>?) so we can narrow down the coarse information this gives us to find specific leaks.
  • To mitigate small leaks:
    • Implement scorched earth, where the taskmasters just exit after every 100 tasks or whatever. Of the triggers, number of tasks initially seems the least volatile to me.
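For the first group of steps, per-task memory information could be gathered by sampling `memory_get_usage()` before and after each task and totalling the growth by task class. The sketch below is hypothetical (`TaskMemoryLedger`, `measure()`, and `report()` are invented names, not Phabricator APIs), and because it only sees memory tracked by PHP's allocator, it may miss memory that extensions allocate outside it.

```php
<?php

// Hypothetical sketch: record how much the PHP heap grows across each task
// and total it by task class, so the log points at the worst offenders.
final class TaskMemoryLedger {
  private $growthByClass = array();

  // Runs $work and records the change in memory_get_usage() across it,
  // even if the task throws.
  public function measure(string $task_class, callable $work) {
    $before = memory_get_usage();
    try {
      return $work();
    } finally {
      $delta = memory_get_usage() - $before;
      if (!isset($this->growthByClass[$task_class])) {
        $this->growthByClass[$task_class] = 0;
      }
      $this->growthByClass[$task_class] += $delta;
    }
  }

  // Task classes sorted by total retained growth in bytes, largest first.
  public function report(): array {
    $report = $this->growthByClass;
    arsort($report);
    return $report;
  }
}
```

Only memory that is still held after a task returns (caches, logs, static arrays) shows up as growth, which is exactly what accumulates across tasks. Totals can go negative for a task that frees memory retained by earlier ones.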

When you see this, what does your 15m table (in /daemon/) look like? Although I've seen some probably-slightly-larger-than-necessary taskmasters, I haven't seen memory usage behavior like what you describe.

I'm not immediately able to reproduce this. I added a bunch of debug garbage all over the place and ran a ton of tasks, but nothing leaked an appreciable amount of memory. You can try applying this patch, which restarts the taskmaster after every 128 tasks (adjust to taste):

diff --git a/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php b/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php
index 8726263..969311d 100644
--- a/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php
+++ b/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php
@@ -3,6 +3,7 @@
 final class PhabricatorTaskmasterDaemon extends PhabricatorDaemon {
 
   public function run() {
+    $tasks_executed = 0;
     $sleep = 0;
     do {
       $tasks = id(new PhabricatorWorkerLeaseQuery())
@@ -16,7 +17,7 @@ final class PhabricatorTaskmasterDaemon extends PhabricatorDaemon {
 
           $this->log("Working on task {$id} ({$class})...");
 
-          $task = $task->executeTask();
+          $task->executeTask();
           $ex = $task->getExecutionException();
           if ($ex) {
             if ($ex instanceof PhabricatorWorkerPermanentFailureException) {
@@ -30,6 +31,8 @@ final class PhabricatorTaskmasterDaemon extends PhabricatorDaemon {
           } else {
             $this->log("Task {$id} complete! Moved to archive.");
           }
+
+          $tasks_executed++;
         }
 
         $sleep = 0;
@@ -37,6 +40,13 @@ final class PhabricatorTaskmasterDaemon extends PhabricatorDaemon {
         $sleep = min($sleep + 1, 30);
       }
 
+      if ($tasks_executed > 128) {
+        // Retire the taskmaster process after executing a threshold number of
+        // tasks. The process will be restarted by the overseer. This prevents
+        // low-volume memory leaks from building up over time.
+        return 0;
+      }
+
       $this->sleep($sleep);
     } while (true);
   }

I'm not sure if that will help or not.

Unknown Object (User) added a comment. Sep 20 2013, 12:10 AM

Running this patch now to see if it resolves the issue.

The '15m completed' table might still be useful too if you have a chance to grab that.

Unknown Object (User) added a comment. Sep 20 2013, 12:53 AM

The patch looks like it's resolved the issue. Prior to the patch there was a taskmaster daemon taking up 1.3GB of RAM, now the highest at any point seems to be 263MB.

My guess is that it was the commit parser. When I checked the daemon logs, there were a whole bunch of exceptions from parsing commits, and that's probably the cause of the memory leak.

Unknown Object (User) added a comment. Sep 20 2013, 12:54 AM

Also the "15m" table is empty on the daemon overview page.

Unknown Object (User) added a comment. Sep 20 2013, 12:57 AM

Mind you, there's also:

PhabricatorRepositoryGitCommitChangeParserWorker	7
PhabricatorRepositoryCommitHeraldWorker	4

which doesn't seem to be changing at all.

Doesn't xdebug let you make a dump of

In T3848#4, @epriestley wrote:

In broad strokes:

  • We don't have great tools for figuring out what's leaking, and I'm not aware of any way to integrate great tools (like, I don't know of any way to ask a PHP process where its memory is allocated and get a meaningful answer which humans can understand, although one may exist or we could potentially write one).

xdebug has some memory profiling features and there is https://github.com/arnaud-lb/php-memory-profiler which I just discovered.

Further to what @20after4 was saying, there do appear to be PHP memory profiling tools available, should this become an issue again:
http://stackoverflow.com/questions/880458/php-memory-profiling

epriestley renamed this task from PhabricatorTaskmasterDaemon using stupid amounts of RAM to PhabricatorTaskmasterDaemon using fabulous amounts of RAM. Aug 25 2014, 2:30 AM
jon.krawczuk added a subscriber: jon.krawczuk.

Review as soon as possible

chad raised the priority of this task from High to Needs Triage. Sep 17 2014, 5:21 PM
chad added a subscriber: chad.
chad removed a subscriber: chad.
This comment was removed by pere.
chad changed the visibility from "All Users" to "Public (No Login Required)". Jul 23 2015, 4:38 AM

We haven't seen reports about this from other installs or experienced it ourselves, and we've run an instanced production cluster with a mixture of workloads for more than 6 months now. All the daemon stuff has also changed a lot over the course of the last two years since this was originally reported.

Since we can't reproduce this and no other installs appear to experience it, there's not much we can really do about it.