This is pretty fuzzy, but here's a list of observations that might lead toward something actionable:
- One or more processes occasionally leave a large number of stale files in /tmp/. This is rare and hard to reproduce, so I haven't tracked down the cause yet.
- After manually cleaning up /tmp/, the problem doesn't recur for months or years.
- The actual files look like they're being written by us (same format as TempFile).
- A lot of these files are exactly 4194304 bytes long (4 MiB, the size of one file storage chunk), which is probably the strongest clue so far.
- So the working theory: we're incorrectly doing MIME type detection on every chunk of large files, and then not cleaning up the temporary files for some reason? Running MIME detection in isolation seems fine, though, so the trigger is probably somewhat subtle or involves several steps.
- We saw daemon stability problems on an instance with a full /tmp/, although nothing in the logs points to a clear cause. However, it's possible that a full /tmp/ causes some other kind of failure at an awkward time between log setup phases, which could explain why nothing useful gets logged.
- We currently write to the normal system /tmp/, which is on the same volume as /, so a full /tmp/ also means a full /. The actual trigger for the daemon problem could therefore be some other kind of write failure, like a pidfile update. We write pidfiles to /core/run/... but that is on the same volume too, so it doesn't help.
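
To make the next occurrence easier to characterize, here is a rough diagnostic sketch (not part of our codebase). It counts files directly under a temp directory by size, so the chunk-sized ones stand out, and checks whether two paths sit on the same volume. The function names `survey_tmp` and `same_volume` are made up for illustration, and it assumes the junk lives at the top level of /tmp/ rather than in subdirectories.

```python
import os
from collections import Counter

# 4194304 bytes: the size of one file storage chunk, per the notes above.
CHUNK_SIZE = 4 * 1024 * 1024


def survey_tmp(tmp_dir="/tmp"):
    """Count regular files directly under tmp_dir, grouped by exact size.

    Hypothetical helper: it only looks at the top level of the directory.
    """
    sizes = Counter()
    with os.scandir(tmp_dir) as entries:
        for entry in entries:
            try:
                if entry.is_file(follow_symlinks=False):
                    sizes[entry.stat(follow_symlinks=False).st_size] += 1
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sizes


def same_volume(path_a, path_b):
    """Return True if both paths are on the same device (st_dev matches)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    sizes = survey_tmp("/tmp")
    total = sum(sizes.values())
    print(f"{sizes[CHUNK_SIZE]} of {total} files are exactly {CHUNK_SIZE} bytes")
    print("/tmp shares a volume with /:", same_volume("/tmp", "/"))
```

If most of the files turn out to be exactly chunk-sized, that supports the per-chunk theory. The same `same_volume` check can be pointed at the pidfile directory to confirm it shares a volume with /tmp/.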