
Implement garbage collection / automatic archiving for Harbormaster logs
Open, Normal, Public

Description

Currently the log chunk table in Harbormaster grows very large very quickly, especially for builds with lots of output. The GC should take old content from this table and compress / archive it somewhere. This means logs for old builds will take longer to load, but logs for new builds should be very quick to look at (versus all builds being slow to look at because the table is so huge).

Event Timeline

hach-que claimed this task.
hach-que raised the priority of this task from to Normal.
hach-que updated the task description.
hach-que changed the visibility from "Public (No Login Required)" to "All Users".
hach-que added subscribers: hach-que, epriestley.
hach-que lowered the priority of this task from Normal to Wishlist. Sep 1 2014, 2:47 AM

Adding the index to the build log table seems to have resolved the performance issues with viewing build logs, so I'm moving this down in priority.

I had to basically rename and recreate harbormaster_buildlogchunk for the utf8mb migration because this table was gigabytes in size.

hach-que raised the priority of this task from Wishlist to Needs Triage. Nov 12 2014, 4:24 AM

Placing this back for triage, but I think this should probably be of a reasonably high priority on the Harbormaster roadmap, simply because of the impact it can have in migration and schema upgrade times.

chad added a subscriber: chad.

No idea how that project fubar happened.

Yeah I was wondering why Harbormaster was removed. You also fubar'd T5821 by the way :)

epriestley changed the visibility from "All Users" to "Public (No Login Required)". Jul 7 2015, 10:32 PM

T9494 discusses a GC TTL change I'd like to make before putting real GC here.

We can also probably get chunk compression into HEAD before too long since it's pretty simple and should be dramatic on most build logs.
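
As a rough illustration of why chunk compression tends to pay off on build logs, here is a minimal Python sketch. It is not Harbormaster's implementation: the function names, the `"zlib"` / `"text"` encoding labels, and the sample log line are all assumptions made up for this example. It stores an encoding marker next to each chunk so that chunks which do not shrink can stay uncompressed.

```python
import zlib
from typing import Tuple

# Hypothetical encoding labels stored alongside each chunk (not Harbormaster's).
ENCODING_ZLIB = "zlib"
ENCODING_TEXT = "text"


def compress_chunk(data: bytes) -> Tuple[str, bytes]:
    """Compress one log chunk, keeping it raw if compression does not help."""
    packed = zlib.compress(data, 6)
    if len(packed) < len(data):
        return (ENCODING_ZLIB, packed)
    return (ENCODING_TEXT, data)


def read_chunk(encoding: str, payload: bytes) -> bytes:
    """Return the original chunk bytes regardless of how they were stored."""
    if encoding == ENCODING_ZLIB:
        return zlib.decompress(payload)
    return payload


# Build output is usually highly repetitive, which is what makes it compress well.
sample = b"[INFO] Compiling module core... ok\n" * 2000
encoding, payload = compress_chunk(sample)
```

Keeping the encoding per chunk means old uncompressed rows remain readable while new or rewritten rows are stored compressed.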

Archiving old logs into Files I'm less excited about and it's probably a v-Far-Future sort of thing.

epriestley edited projects, added Harbormaster (v3); removed Harbormaster.

Pretty sure I can scope this into the current iteration under T10457 and provide some corresponding purge tools, just in case any installs hypothetically insert gigabytes of data into build logs.

I believe our Harbormaster log table is several GB in size, and by "several GB" I mean somewhere in the 20-80GB range, depending on when I last hard-deleted the records.

Current plan here is:

  • Compress log chunks in-place.
  • Add a GC to move the data to Files (default = 30 days).
  • Add a second GC to destroy them (default = 180 days). This GC can be disabled if you want to retain them indefinitely.
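
The two-stage retention plan above can be sketched as a small decision function. This is only a sketch under assumptions: `gc_action`, the constant names, and the returned labels are hypothetical, and only the 30-day and 180-day defaults come from the plan. Passing `destroy_after=None` models disabling the second GC.

```python
from datetime import datetime, timedelta
from typing import Optional

# Defaults taken from the plan; the names themselves are hypothetical.
ARCHIVE_AFTER = timedelta(days=30)
DESTROY_AFTER = timedelta(days=180)


def gc_action(
    created: datetime,
    now: datetime,
    archive_after: timedelta = ARCHIVE_AFTER,
    destroy_after: Optional[timedelta] = DESTROY_AFTER,
) -> str:
    """Decide what the collectors should do with a log of the given age."""
    age = now - created
    # Second GC: destroy old logs, unless it is disabled with None.
    if destroy_after is not None and age >= destroy_after:
        return "destroy"
    # First GC: move log data out of the chunk table and into Files.
    if age >= archive_after:
        return "archive"
    return "keep"


now = datetime(2016, 3, 1)
print(gc_action(now - timedelta(days=10), now))
print(gc_action(now - timedelta(days=45), now))
print(gc_action(now - timedelta(days=200), now))
print(gc_action(now - timedelta(days=200), now, destroy_after=None))
```

With the second GC disabled, a log older than 180 days simply stays archived.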

Oh, and:

  • Provide some way to treat "too much output" as a categorical build failure, similar to how you might treat "excessive runtime" as a build failure.
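
One way to picture an output limit is a log writer that refuses to grow past a byte budget, with the build runner turning that refusal into a failure. The sketch below is hypothetical throughout: `BoundedLog`, `OutputLimitExceeded`, `run_build`, and the result strings are invented for illustration and are not part of Harbormaster.

```python
from typing import Iterable, List


class OutputLimitExceeded(Exception):
    """Raised when a build writes more log output than its budget allows."""


class BoundedLog:
    """Accumulates log chunks and enforces a total size limit in bytes."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.size = 0
        self.chunks: List[bytes] = []

    def append(self, data: bytes) -> None:
        if self.size + len(data) > self.max_bytes:
            raise OutputLimitExceeded(
                "log would exceed %d bytes" % self.max_bytes
            )
        self.chunks.append(data)
        self.size += len(data)


def run_build(output: Iterable[bytes], max_bytes: int) -> str:
    """Consume build output; fail the build if it produces too much of it."""
    log = BoundedLog(max_bytes)
    try:
        for chunk in output:
            log.append(chunk)
    except OutputLimitExceeded:
        return "failed: too much output"
    return "passed"
```

Treating the limit as an exception keeps it parallel to a runtime timeout: the build stops at the point where the budget is exhausted rather than after the table has already absorbed the data.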

This should maybe also somehow interact with Artifact Files.

eadler added a project: Restricted Project. Aug 5 2016, 4:44 PM