Page MenuHomePhabricator

Add "bin/storage optimize" to run OPTIMIZE TABLE on everything
ClosedPublic

Authored by epriestley on Apr 8 2017, 5:29 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Mar 22, 4:55 PM
Unknown Object (File)
Wed, Mar 20, 7:14 AM
Unknown Object (File)
Sat, Mar 16, 9:28 AM
Unknown Object (File)
Mon, Mar 4, 7:41 PM
Unknown Object (File)
Fri, Mar 1, 2:41 AM
Unknown Object (File)
Feb 21 2024, 9:59 AM
Unknown Object (File)
Feb 3 2024, 3:29 PM
Unknown Object (File)
Feb 3 2024, 3:29 PM
Tokens
"Yellow Medal" token, awarded by cspeckmim.

Details

Summary

Even with innodb_file_per_table enabled, individual table files on disk don't normally shrink.

For most tables, like maniphest_task, this is fine, since the data in the table normally never shrinks, or only shinks a tiny amount.

However, some tables (like the "worker" and "daemon" tables) grow very large during a huge import but most of the data is later deleted by garbage collection. In these cases, this lost space can be reclaimed by running OPTIMIZE TABLE on the tables.

Add a script to OPTIMIZE TABLE every table.

My primary goal here is just to reduce storage pressure on db001 since there are a couple of "import the linux kernel" installs on that host wasting a bunch of space. We're not in any trouble, but this should buy us a good chunk of headroom.

Test Plan

Ran bin/storage optimize locally and manually ran OPTIMIZE TABLE in production, saw tables get optimized.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land.Apr 8 2017, 10:06 PM
This revision was automatically updated to reflect the committed changes.

Tried it out on our test install (~1/5th the size of our regular)

Completed optimizations, reclaimed 9 GB of disk space.

Woo! Should make backups a bit easier to manage~

do your users frequently upload the linux kernel repository?

We got ~3GB back on ~25GB here.

On db001, things went from 46GB -> 33GB.

No linux kernels~

Our primary repository has a ~10 year history with 100k+ commits though. Our branching strategy when cutting a version is to clone the repository (when we started with Mercurial this was the style at the time~). I believe on our test install we had ~3 copies of that repo, but then I had deleted extras that weren't necessary to help keep disk space on the test lean -- I'm not familiar with OPTIMIZE but maybe things that were previously allocated for the repositories and then no longer needed were also reclaimed as part of this? After I validate the update and get it installed on our production environment I'll report back what our disk space reclamation is.

Would it probably make sense to run this also as a regular task through phd?
Maybe it has to be triggered once manually before it is run automatically, as the initial run might be quite resource intensive while any subsequent runs should be just "regular cleanup tasks".

@eliaspro - I was considering adding it to our upgrade script. The process probably shouldn't be regular as it would likely degenerate response times while running.

Also 9GB on production install, interesting

I'm not sure if it would be informative or not but I captured the output. The larger/interesting ones I see are

OPTIMIZE Optimizing table "phabricator_daemon"."daemon_logevent"...
DONE Compacted table by 457 MB in 507ms.
OPTIMIZE Optimizing table "phabricator_differential"."differential_changeset_parse_cache"...
DONE Compacted table by 540 MB in 76,254ms.


OPTIMIZE Optimizing table "phabricator_repository"."repository_commit"...
DONE Compacted table by 156 MB in 54,999ms.
OPTIMIZE Optimizing table "phabricator_repository"."repository_commitdata"...
DONE Compacted table by 50 MB in 32,327ms.
OPTIMIZE Optimizing table "phabricator_repository"."repository_parents"...
DONE Compacted table by 28 MB in 33,999ms.
OPTIMIZE Optimizing table "phabricator_repository"."repository_path"...
DONE Compacted table by 44 MB in 28,318ms.
OPTIMIZE Optimizing table "phabricator_repository"."repository_pathchange"...
DONE Compacted table by 1 GB in 806,286ms.


OPTIMIZE Optimizing table "phabricator_worker"."worker_activetask"...
DONE Compacted table by 99 MB in 285ms.
OPTIMIZE Optimizing table "phabricator_worker"."worker_archivetask"...
DONE Compacted table by 1 GB in 963ms.
OPTIMIZE Optimizing table "phabricator_worker"."worker_taskdata"...
DONE Compacted table by 4 GB in 1,109ms.

I had an interesting experience with this where I purportedly reclaimed negative space (and none of it via the worker schema). Happy to past more from the probe if people are interested:

luca:~/phabriactors$ ./bin/storage probe
...
phabricator_worker                                  160.0 MB  1.5%
    edgedata                                          0.0 MB  0.0%
    lisk_counter                                      0.0 MB  0.0%
    worker_triggerevent                               0.0 MB  0.0%
    worker_trigger                                    0.0 MB  0.0%
    edge                                              0.0 MB  0.0%
    worker_activetask                                 0.1 MB  0.0%
    worker_bulkjob                                    0.3 MB  0.0%
    worker_bulkjobtransaction                         0.4 MB  0.0%
    worker_bulktask                                   0.8 MB  0.0%
    worker_archivetask                               48.1 MB  0.4%
    worker_taskdata                                 110.1 MB  1.0%
...
TOTAL                                            10,839.3 MB  100.0%
luca:~/phabriactors$ ./bin/storage optimize
...
 OPTIMIZE  Optimizing table "phabricator_differential"."differential_transaction_comment"...
 DONE  Compacted table by 1 MB in 5,834ms.
 OPTIMIZE  Optimizing table "phabricator_feed"."feed_storyreference"...
 DONE  Compacted table by 3 MB in 49,097ms.
 OPTIMIZE  Optimizing table "phabricator_search"."search_documentfield"...
 DONE  Compacted table by 43 MB in 97,513ms.
...
 OPTIMIZE  Optimizing table "phabricator_worker"."edge"...
 DONE  Optimized table (in 11ms) but it had little effect.
 OPTIMIZE  Optimizing table "phabricator_worker"."edgedata"...
 DONE  Optimized table (in 9ms) but it had little effect.
 OPTIMIZE  Optimizing table "phabricator_worker"."lisk_counter"...
 DONE  Optimized table (in 10ms) but it had little effect.
 OPTIMIZE  Optimizing table "phabricator_worker"."worker_activetask"...
 DONE  Optimized table (in 18ms) but it had little effect.
 OPTIMIZE  Optimizing table "phabricator_worker"."worker_archivetask"...
 DONE  Optimized table (in 3,703ms) but it had little effect.
 OPTIMIZE  Optimizing table "phabricator_worker"."worker_bulkjob"...
 DONE  Optimized table (in 61ms) but it had little effect.
 OPTIMIZE  Optimizing table "phabricator_worker"."worker_bulkjobtransaction"...
 DONE  Optimized table (in 45ms) but it had little effect.
 OPTIMIZE  Optimizing table "phabricator_worker"."worker_bulktask"...
 DONE  Optimized table (in 117ms) but it had little effect.
 OPTIMIZE  Optimizing table "phabricator_worker"."worker_taskdata"...
 DONE  Optimized table (in 4,186ms) but it had little effect.
 OPTIMIZE  Optimizing table "phabricator_worker"."worker_trigger"...
 DONE  Optimized table (in 12ms) but it had little effect.
 OPTIMIZE  Optimizing table "phabricator_worker"."worker_triggerevent"...
 DONE  Optimized table (in 10ms) but it had little effect.
...
 OPTIMIZED  Completed optimizations, reclaimed -4 GB of disk space.`
luca:~/phabriactors$ ./bin/storage probe
...
phabricator_worker                                  144.0 MB  1.4%
    edgedata                                          0.0 MB  0.0%
    lisk_counter                                      0.0 MB  0.0%
    worker_triggerevent                               0.0 MB  0.0%
    worker_trigger                                    0.0 MB  0.0%
    edge                                              0.0 MB  0.0%
    worker_activetask                                 0.1 MB  0.0%
    worker_bulkjob                                    0.3 MB  0.0%
    worker_bulkjobtransaction                         0.4 MB  0.0%
    worker_bulktask                                   0.8 MB  0.0%
    worker_archivetask                               32.6 MB  0.3%
    worker_taskdata                                 109.5 MB  1.1%
...
TOTAL                                            10,105.5 MB  100.0%