
Garbage collect old daemon records based on modification date, not creation date
ClosedPublic

Authored by epriestley on May 26 2017, 3:51 PM.
Tags
None
Subscribers
None

Details

Summary

Fixes T12720. Currently, old daemon records are collected based on creation date. By default, the GC collects them after 7 days.

After T12298, this can incorrectly collect hibernating daemons which are in state "wait".

In all cases, this could fail to collect daemons which are stuck in "running" for a long time for some reason. This doesn't seem to be causing any problems right now, but it makes me hesitant to switch to "dateCreated + not running or waiting": that might turn this into a real problem, or worsen an existing problem that we just haven't bumped into yet.

Daemons always heartbeat periodically and update their rows, so dateModified is always fresh for any daemon that is still alive. Collect rows based only on modification date instead.
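The difference between the two rules can be sketched outside Phabricator. This is a minimal illustration in Python with an in-memory SQLite table, not the actual collector (which is PHP and is not shown on this page). The helper names `collect_garbage`, `make_table`, and `surviving_ids` are hypothetical, the table is reduced to four columns, and the creation-date variant ignores status for simplicity.

```python
import sqlite3

DAY = 24 * 60 * 60
SEVEN_DAYS = 7 * DAY  # default retention mentioned in the summary


def collect_garbage(conn, now, ttl=SEVEN_DAYS, column="dateModified"):
    """Delete rows whose timestamp in `column` is older than now - ttl.

    Hypothetical helper: returns the number of rows deleted.
    """
    # The column name is interpolated into SQL, so restrict it to known values.
    if column not in ("dateCreated", "dateModified"):
        raise ValueError("unexpected column: %r" % (column,))
    cursor = conn.execute(
        "DELETE FROM daemon_log WHERE %s < ?" % column, (now - ttl,)
    )
    return cursor.rowcount


def make_table(now):
    """Build a small stand-in for daemon_log with three illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daemon_log ("
        "id INTEGER PRIMARY KEY, status TEXT, "
        "dateCreated INTEGER, dateModified INTEGER)"
    )
    rows = [
        (1, "exit", now - 10 * DAY, now - 9 * DAY),  # old and long dead
        (2, "wait", now - 10 * DAY, now - 60),       # old, hibernating, still heartbeating
        (3, "run", now - 1 * DAY, now - 30),         # recently started
    ]
    conn.executemany("INSERT INTO daemon_log VALUES (?, ?, ?, ?)", rows)
    return conn


def surviving_ids(conn):
    """Return the ids still present in the table, in ascending order."""
    return [row[0] for row in conn.execute("SELECT id FROM daemon_log ORDER BY id")]
```

With these made-up rows, collecting on `dateCreated` removes the hibernating daemon (id 2) along with the dead one, while collecting on `dateModified` removes only the dead one, because the heartbeat keeps the hibernating row's modification time fresh.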

Test Plan
  • Ran daemons (bin/phd start).
  • Waited a few minutes.
  • Verified that hibernating daemons in the "wait" state had fresh timestamps.
  • Verified that very old daemons still got GC'd properly.
mysql> select id, daemon, status, FROM_UNIXTIME(dateCreated), FROM_UNIXTIME(dateModified) from daemon_log;
+-------+--------------------------------------+--------+----------------------------+-----------------------------+
| id    | daemon                               | status | FROM_UNIXTIME(dateCreated) | FROM_UNIXTIME(dateModified) |
+-------+--------------------------------------+--------+----------------------------+-----------------------------+
| 73377 | PhabricatorTaskmasterDaemon          | exit   | 2017-05-19 10:53:03        | 2017-05-19 12:38:54         |
...
| 73388 | PhabricatorRepositoryPullLocalDaemon | run    | 2017-05-26 08:43:29        | 2017-05-26 08:45:30         |
| 73389 | PhabricatorTriggerDaemon             | run    | 2017-05-26 08:43:29        | 2017-05-26 08:46:35         |
| 73390 | PhabricatorTaskmasterDaemon          | wait   | 2017-05-26 08:43:29        | 2017-05-26 08:46:35         |
| 73391 | PhabricatorTaskmasterDaemon          | wait   | 2017-05-26 08:43:33        | 2017-05-26 08:46:33         |
| 73392 | PhabricatorTaskmasterDaemon          | wait   | 2017-05-26 08:43:37        | 2017-05-26 08:46:31         |
| 73393 | PhabricatorTaskmasterDaemon          | wait   | 2017-05-26 08:43:40        | 2017-05-26 08:46:33         |
+-------+--------------------------------------+--------+----------------------------+-----------------------------+
17 rows in set (0.00 sec)

Note that:

  • The oldest daemon is <7 days old -- I had some other older rows but they got GC'd properly.
  • The hibernating taskmasters (at the bottom, in state "wait") have dateModified values that are more recent than their dateCreated values.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable