
Plans: Daemon Status Reporting
Open, Normal, Public

Description

See PHI1063. See T6768.

  • Daemon lease names are meaningless/unwieldy.
  • There's no way easy way to connect a particular log message to a particular daemon.
  • There's no easy way to find logs for a given task.

See PHI1062. This is T13121. The -o LogLevel=quiet option passed to ssh actively impedes debugging. Even if the eventual remedy is only to "filter stderr", removing it would put us in a better position to stop the bleeding.
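To make the "filter stderr" remedy concrete, here is a minimal Python sketch (not Phabricator's code) that keeps ssh's stderr but drops known noise lines, instead of silencing everything with -o LogLevel=quiet. The function name filter_ssh_stderr and the NOISE_PATTERNS list are hypothetical. The single pattern targets ssh's "Permanently added ... to the list of known hosts" warning, on the assumption that this is the kind of message LogLevel=quiet was being used to suppress.

```python
import re
from typing import List, Pattern

# Hypothetical noise list: stderr lines matching these are dropped.
# Everything else (auth failures, connection errors) is kept for debugging.
NOISE_PATTERNS: List[Pattern[str]] = [
    re.compile(r"^Warning: Permanently added .* to the list of known hosts\.?\s*$"),
]


def filter_ssh_stderr(stderr: str, patterns: List[Pattern[str]] = NOISE_PATTERNS) -> str:
    """Return stderr with lines matching any noise pattern removed."""
    kept = [
        line
        for line in stderr.splitlines()
        if not any(p.match(line) for p in patterns)
    ]
    return "\n".join(kept)
```

The design choice is to filter by an explicit allow-to-drop list, so an unexpected ssh error still reaches the logs instead of being swallowed.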

Other adjacent changes:

Event Timeline

epriestley created this task.

I'm looking at these storage/backend changes:

  • Rename daemon_log.pid to daemon_log.overseerPID. This column already records the overseer PID, and a given daemon "slot" may have multiple PIDs over its lifetime, so this can't be "fixed" by recording a different PID.
  • Add pid to daemon_logevent, and record the PID of the daemon which emitted the event.
  • Add taskID to daemon_logevent, and record the task ID the daemon was working on. Not all logs will have a value (for example, the Trigger daemon does not work on tasks), but the task queue is the source of most log messages.
  • Add daemonID to the activetask and archivetask tables and record the (most recent) daemon to work on the task.
  • Add archivedEpoch or similar to archivetask, to record the total time a task spent in the queue (T5401).
  • Change lease values to purely random values (T6768).
  • Add generation to daemon_logevent, and have bin/phd start and similar commands launch daemon groups with an incrementing generation ID (T10867).
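The proposed columns can be sketched as a small in-memory model. This is a minimal Python illustration under stated assumptions, not Phabricator's schema or PHP code: the names LogEvent, ArchivedTask, new_lease_owner, logs_for_task, logs_for_pid and time_in_queue, the 32-hex-character lease format, and the use of a task creation timestamp to compute queue time are all hypothetical. It shows how a per-event pid and taskID answer "which daemon emitted this?" and "what are the logs for this task?", and how archivedEpoch yields total time in queue.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional


def new_lease_owner() -> str:
    """Return a purely random lease value (hypothetical format: 32 hex chars)."""
    return secrets.token_hex(16)


@dataclass
class LogEvent:
    """Hypothetical mirror of a daemon_logevent row with the proposed columns."""
    log_id: int              # the daemon_log "slot" this event belongs to
    pid: int                 # PID of the daemon process that emitted the event
    task_id: Optional[int]   # None for daemons (e.g. the trigger daemon) with no task
    generation: int          # incremented on each "bin/phd start"-style launch
    message: str


@dataclass
class ArchivedTask:
    """Hypothetical mirror of an archivetask row with the proposed columns."""
    task_id: int
    daemon_id: Optional[int]  # most recent daemon to work on the task
    date_created: int         # epoch seconds when the task was queued (assumed)
    archived_epoch: int       # epoch seconds when the task was archived


def logs_for_task(events: List[LogEvent], task_id: int) -> List[LogEvent]:
    """All events emitted while some daemon was working on the given task."""
    return [e for e in events if e.task_id == task_id]


def logs_for_pid(events: List[LogEvent], pid: int) -> List[LogEvent]:
    """All events emitted by one daemon process."""
    return [e for e in events if e.pid == pid]


def time_in_queue(task: ArchivedTask) -> int:
    """Total seconds between queueing and archiving."""
    return task.archived_epoch - task.date_created
```

Usage: with events for PIDs 100 and 101, logs_for_pid(events, 100) isolates one process even when the same log slot has cycled through several PIDs, which is exactly what the overseer-level pid column cannot do.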

See https://discourse.phabricator-community.org/t/sql-error-during-ferret-migration-of-ponder/2471.

Some old migrations call PhabricatorSearchWorker::queueDocumentForIndexing(). This no longer works after D20200, because the PHP implementation expects a dateCreated column to exist, and that column won't exist until 20190220.daemon_worker.completed.02.sql runs.

This kind of migration is inherently fragile and we haven't added any since 2017. It was also largely obsoleted by T11932.

I'm just going to make all of these migrations no-ops. Installs that haven't upgraded since 2017 will get a rebuild activity after upgrading anyway, so these migrations don't really accomplish anything.

The index can also always be rebuilt with bin/search index ....