Last night I queued a large number of background jobs (reparsing of commit messages and reindex search documents). A few hours later, I noticed that our two Phabricator daemons hosts had died (they quite possibly died earlier than this, but it took me a few hours to notice.
It is worth noting that I have phd.taskmasters set to 8. The instance is a c3.large. The following groups show the CPU usage over the past 24 hours:
My theory is that the taskmasters are autoscaling themselves and then eventually running out of memory and dying miserably. I've attached some relevant log files:
- Actually the daemon log files aren't particularly useful here.