Provide an error log for `sshd` subprocesses
Open, Normal, Public

Description

See PHI2009. The sshd subprocesses (bin/ssh-auth and bin/ssh-exec) currently don't have a configurable error log. These processes are somewhat unique: most other processes log to the webserver error log or the daemon log.

See also T12611.

See also T4472, perhaps.

It would be nice to better unify all log behavior, but just adding log.ssh-error.path for now is probably the more practical approach.

Revisions and Commits

Restricted Differential Revision
Restricted Differential Revision
Restricted Differential Revision
rARC Arcanist
Audit Required
D21578
D21578
rP Phabricator
D21579

Event Timeline

epriestley triaged this task as Normal priority. Feb 26 2021, 7:38 PM
epriestley created this task.
epriestley added a revision: Restricted Differential Revision.
epriestley added a revision: Restricted Differential Revision. Feb 26 2021, 10:49 PM

This deployed, and appears resolved.

This ran into some filesystem permissions issues and needs followup.

In general, Phacility production hosts may interact with logs as several different users:

  • The default ubuntu user creates the log/ volume mount by checking out core/, creates log directories, and performs log maintenance (rotation and destruction).
  • The ubuntu user writes to phd logs.
  • The dweller user writes to sshd logs.
  • The www-data user writes to httpd logs.
  • Logs are organized by instance, so we can't (easily) guarantee all log directories exist before they are written. For example, if instance turtle signs up, we'd like to create turtle/ directories on disk at the time the logs are written. The on-disk layout is /logs/<instance>/<service>/<thing>.log, which is the preferred layout for human operators.
  • And: some hosts may have older logs or log directories in arbitrary states.

This is a big mess. It can be mostly navigated by:

  • Adding ubuntu, dweller, and www-data to a unix loggers group.
  • Making sure logs are always owned by group loggers and chmod 0664.
  • Making sure log directories are always owned by group loggers and chmod 0775.
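As a rough illustration of the group-and-mode approach, here is a minimal Python sketch. It is not the patch mentioned in this task: the function names, the `root` argument, and the optional `group` parameter are hypothetical. It creates `<root>/<instance>/<service>/<thing>.log` at write time and applies 0775 to directories and 0664 to the log file.

```python
import os
import shutil

DIR_MODE = 0o775   # log directories: group-writable
FILE_MODE = 0o664  # log files: group-writable


def _apply(path, mode, group):
    """Set group ownership (if requested) and an explicit mode on path."""
    if group is not None:
        # Needs sufficient privileges and an existing group, e.g. "loggers".
        shutil.chown(path, group=group)
    # Explicit chmod, because the process umask may have masked the mode.
    # This raises PermissionError if another user owns the path, which is
    # the "older logs in arbitrary states" case.
    os.chmod(path, mode)


def ensure_log_file(root, instance, service, name, group=None):
    """Create <root>/<instance>/<service>/<name>.log lazily and return its path."""
    path = root
    for part in (instance, service):
        path = os.path.join(path, part)
        os.makedirs(path, exist_ok=True)
        _apply(path, DIR_MODE, group)
    log_path = os.path.join(path, name + ".log")
    # Append mode creates the file if missing and preserves existing content.
    with open(log_path, "a"):
        pass
    _apply(log_path, FILE_MODE, group)
    return log_path
```

For example, `ensure_log_file("/logs", "turtle", "sshd", "error", group="loggers")` would prepare `/logs/turtle/sshd/error.log`. Every writer (ubuntu, dweller, www-data) would need to run the same logic, which is part of why this approach adds so much configuration.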

I wrote a patch for this, but it's big, adds a lot of configuration, and generally feels like it's walking down a dark path. In particular, in the long term, it would be nice to send logs directly to an aggregation service (and ingest logs on disk). A lot of per-log filesystem configuration moves us further from this.

I'm imagining this instead:

  • A new Log daemon listens on a unix domain socket.
  • All first-party services that want to log write to it over the domain socket.
  • It can also ingest logs on disk.
  • It can buffer logs in memory.
  • It can report log status via the database.

From there, it can send logs to disk or some other service, including an instance of itself running on some other host.

In the short term, this produces a single writer and fixes all the filesystem junk, and provides a reasonable way to monitor log status. In the long term, this supports real logging behavior.
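To make the single-writer idea concrete, here is a minimal Python sketch of a log collector on a unix domain socket. It is only an illustration under stated assumptions, not a design for the actual daemon: the `LogDaemon` class, `send_log` helper, datagram transport, and `buffer_limit` parameter are all hypothetical, and disk ingestion, database status reporting, and forwarding to another host are omitted.

```python
import os
import socket


class LogDaemon:
    """Illustrative single-writer log collector on a unix datagram socket."""

    def __init__(self, socket_path, log_path, buffer_limit=100):
        self.socket_path = socket_path
        self.log_path = log_path
        self.buffer_limit = buffer_limit
        self.buffer = []  # in-memory buffer of pending log lines
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        if os.path.exists(socket_path):
            os.unlink(socket_path)  # remove a stale socket from a prior run
        self.sock.bind(socket_path)

    def handle_one(self):
        """Receive one message, buffer it, and flush once the buffer is full."""
        data = self.sock.recv(65536)
        self.buffer.append(data.decode("utf-8", "replace").rstrip("\n"))
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        """Append buffered lines to the log file; only this process writes it."""
        if not self.buffer:
            return
        with open(self.log_path, "a", encoding="utf-8") as f:
            for line in self.buffer:
                f.write(line + "\n")
        self.buffer.clear()

    def close(self):
        """Flush remaining lines, then close and remove the socket."""
        self.flush()
        self.sock.close()
        os.unlink(self.socket_path)


def send_log(socket_path, message):
    """Client side: any service sends one log line to the daemon's socket."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        client.sendto(message.encode("utf-8"), socket_path)
    finally:
        client.close()
```

In this sketch only the daemon ever touches the log file, so the other users need write access to the socket rather than to every log directory. A real daemon would loop over `handle_one()` and also flush on a timer, so that buffered lines are not held indefinitely on a quiet host.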

Evidence increasingly suggests that the root problem here was the GET_LOCK() issue in T13627, not an error in an sshd subprocess context.