
Daemons created 881 identical copies of a 4MB diff on a Phacility instance
Closed, Resolved (Public)

Description

These changes have a total compressed data size of roughly 4GB. Hunks are stored compressed, and gzip isn't smart enough to recompress this data in the dump, so the dumps are also about 4GB. This is currently creating storage pressure on dbak001.

Event Timeline

I think this was caused by the issue resolved by D15072. Specifically:

  • daemons attempted to import a commit;
  • they built the diff for it, then failed to pull a file out of the repository;
  • they bailed, but the diff stuck around as a side effect.

Evidence for this is pretty convincing:

  • identical diffs all have "commit" as a source, so they came from the daemons;
  • daemon logs are full of the error from D15072;
  • after the update, there are no more log problems and no failing tasks.

So I believe I can safely destroy all of them (none are attached to revisions) and this won't happen again.

We could also:

  • make the daemon destroy the diff if it hits an exception; and/or
  • add a Files-based hunk storage mechanism.

These are maybe noble eventual goals, but I don't think they're important to pursue right now since I think this was a bit of a one-off.
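For reference, the first option (destroy the diff if the import hits an exception) could look roughly like the sketch below. This is not the actual daemon code: it is a minimal Python illustration of the pattern, and Diff, import_commit, build_diff and pull_files are all hypothetical names made up for the example.

```python
class Diff:
    """Hypothetical stand-in for a stored diff record."""

    def __init__(self, commit_id):
        self.commit_id = commit_id
        self.destroyed = False

    def destroy(self):
        # A real implementation would delete the diff and its hunks from storage.
        self.destroyed = True


def import_commit(commit_id, build_diff, pull_files):
    """Build a diff for a commit and destroy it if a later import step fails."""
    diff = build_diff(commit_id)
    try:
        # The step that failed in this incident: pulling a file out of the repository.
        pull_files(commit_id, diff)
    except Exception:
        # Clean up the orphaned diff, then re-raise so the task still fails visibly.
        diff.destroy()
        raise
    return diff
```

The point of re-raising is that the task still fails and can be retried, but each failed attempt no longer leaves a copy of the diff behind.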

I used bin/remove destroy <phid> to destroy the 881 identical diffs, and verified that this trimmed roughly 4GB of data out of the datastore.
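Running that command once per diff is easy to script. The sketch below is one way to do it in Python, not a record of what was actually run: the PHIDs shown are placeholders, the helper names are hypothetical, and it only uses the bin/remove destroy <phid> form quoted above (any confirmation prompts or extra flags are not covered here). It defaults to a dry run that just prints the commands.

```python
import subprocess


def build_destroy_commands(phids, remove_bin="bin/remove"):
    """Return one argv list per PHID, skipping repeats and keeping input order."""
    seen = set()
    commands = []
    for phid in phids:
        if phid in seen:
            continue
        seen.add(phid)
        commands.append([remove_bin, "destroy", phid])
    return commands


def destroy_all(phids, dry_run=True):
    """Print each command (dry run) or execute it, stopping on the first failure."""
    for argv in build_destroy_commands(phids):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder PHIDs for illustration only.
    destroy_all(["PHID-DIFF-aaaa", "PHID-DIFF-bbbb"])
```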

I pruned some out-of-date backups on dbak001 to give us a little more headroom. Pressure on the device should decrease over time as this data rotates out (old, bloated backups get replaced by more recent, small backups).