
Use stdio, not signals, to heartbeat from the daemons
Closed, Public

Authored by epriestley on Feb 22 2015, 2:06 PM.
Tags
None

Details

Summary

Ref T7352. We currently run one overseer per daemon. I want to run one overseer for a group of daemons, to reduce the minimum memory footprint of an instance.

One barrier is how hang detection works: we detect daemon hangs by requiring them to send a periodic heartbeat. If a daemon doesn't heartbeat for a while, we assume it has hung and restart it.

Currently, this heartbeat is sent by having the daemons send SIGUSR1 to the overseer. When the overseer receives the signal, it extends the deadline for the next heartbeat.
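The deadline-extension logic described above can be sketched roughly as follows. This is an illustrative Python sketch, not the libphutil implementation (which is PHP): the `HeartbeatDeadline` class, its method names, and the injectable `clock` parameter are all hypothetical, and the signal wiring itself is omitted.

```python
import time


class HeartbeatDeadline:
    """Tracks when the next heartbeat is due (hypothetical helper)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        # The first heartbeat is due one full timeout after startup.
        self.deadline = clock() + timeout_seconds

    def heartbeat(self):
        # Any heartbeat pushes the deadline forward by a full timeout.
        self.deadline = self.clock() + self.timeout_seconds

    def is_hung(self):
        # The daemon is presumed hung once the deadline has passed.
        return self.clock() > self.deadline
```

Note that `heartbeat()` takes no argument identifying the sender, which mirrors the limitation of the signal-based approach: the overseer only learns that some heartbeat arrived.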

However, the overseer can't tell where the signal came from. Right now it can only come from one place, but in a world where overseers run multiple daemons it could have come from any of the children.

Instead of using signals, this turns the daemon's stdout (which we already consume) into a structured message pipeline, and sends the heartbeat over stdout.
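As a rough illustration of what a structured message pipeline over stdout might look like, here is a minimal Python sketch. The wire format (one JSON array per line), the message type names, and every function and class name here are assumptions made for the example; the actual format used by libphutil is not described in this summary and may differ.

```python
import io
import json

# Hypothetical message types; the real constants in libphutil may differ.
MESSAGE_STDOUT = "stdout"
MESSAGE_HEARTBEAT = "heartbeat"


def encode_message(message_type, data=None):
    """Serialize one message as a single line of JSON (assumed format)."""
    return json.dumps([message_type, data]) + "\n"


def decode_line(line):
    """Parse one line read from a daemon's stdout.

    Lines that are not well-formed messages are treated as plain output,
    so a stray print from the daemon is not lost.
    """
    text = line.rstrip("\n")
    try:
        parsed = json.loads(text)
    except ValueError:
        return (MESSAGE_STDOUT, text)
    if (isinstance(parsed, list) and len(parsed) == 2
            and parsed[0] in (MESSAGE_STDOUT, MESSAGE_HEARTBEAT)):
        return (parsed[0], parsed[1])
    return (MESSAGE_STDOUT, text)


class StructuredStdout:
    """Daemon-side wrapper: plain writes become 'stdout' messages."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, text):
        self.stream.write(encode_message(MESSAGE_STDOUT, text))
        return len(text)

    def heartbeat(self):
        self.stream.write(encode_message(MESSAGE_HEARTBEAT))

    def flush(self):
        self.stream.flush()


def demo():
    """Round-trip a plain write and a heartbeat through an in-memory pipe."""
    pipe = io.StringIO()
    out = StructuredStdout(pipe)
    out.write("hello\n")
    out.heartbeat()
    return [decode_line(line) for line in pipe.getvalue().splitlines()]
```

Because the overseer already reads the daemon's stdout, a scheme like this needs no extra channel: ordinary output and heartbeats share one stream and are told apart by message type.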

In a future diff, the overseer will be able to attribute heartbeats to the correct child process.
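This revision does not say how that attribution will work. One plausible approach, sketched below in Python purely as an assumption, relies on each child having its own stdout pipe, so the overseer knows which child a heartbeat came from by which pipe it read. The `ChildHeartbeats` class and its methods are hypothetical names for illustration.

```python
import time


class ChildHeartbeats:
    """Keeps one heartbeat deadline per child (hypothetical helper)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.deadlines = {}

    def add_child(self, child_id):
        # A newly started child gets a full timeout before it is due.
        self.deadlines[child_id] = self.clock() + self.timeout_seconds

    def heartbeat(self, child_id):
        # Extend only the deadline of the child whose pipe carried the message.
        self.deadlines[child_id] = self.clock() + self.timeout_seconds

    def hung_children(self):
        # Children whose deadline has passed are candidates for restart.
        now = self.clock()
        return sorted(c for c, d in self.deadlines.items() if now > d)
```

With per-child deadlines, a heartbeat from one daemon no longer masks a hang in another.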

Test Plan
  • Ran daemon in the raw, saw sensible output.
  • Made daemon use plain echo, saw output get wrapped.
  • Artificially set heartbeat deadline to 10 seconds, saw heartbeating daemons continue running and hung daemons restart.

Diff Detail

Repository
rPHU libphutil
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision to "Use stdio, not signals, to heartbeat from the daemons".
epriestley updated this object.
epriestley edited the test plan for this revision.
epriestley added a reviewer: btrahan.
btrahan edited edge metadata.
This revision is now accepted and ready to land. (Feb 23 2015, 5:18 PM)
This revision was automatically updated to reflect the committed changes.