I've been doing a whole lot of dump/restore recently. These operations take forever and are clearly not limited by disk I/O, at least on the external side.
The dump process is already table-by-table, so it would be interesting to see whether parallelizing these dumps improves performance: instead of serially dumping each table, dump N tables simultaneously.
Loading could work the same way: retain the file-per-table dumps and load them in parallel.
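A minimal sketch of the idea, using a worker pool to run N per-table operations at once. Everything here is hypothetical: `dump_table` and `load_table` stand in for whatever per-table dump/restore invocation the actual tooling provides (e.g. a `subprocess.run` of the dump command), and `workers` is the parallelism knob to experiment with.

```python
from concurrent.futures import ThreadPoolExecutor

def dump_table(table):
    # Hypothetical stand-in: in practice this would shell out to the
    # dump tool and write a file-per-table dump to disk, e.g.
    #   subprocess.run(["dumptool", "-t", table, "-f", f"{table}.dump"], check=True)
    return f"{table}.dump"

def load_table(path):
    # Hypothetical stand-in for loading one table's dump file, e.g.
    #   subprocess.run(["loadtool", "-f", path], check=True)
    return path

def run_parallel(fn, items, workers=4):
    # Apply fn to up to `workers` items at once instead of serially.
    # Threads suffice here because the real work happens in subprocesses.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```

Usage would be `run_parallel(dump_table, tables)` on the dump side and `run_parallel(load_table, dump_files)` on the load side; comparing wall-clock time at `workers=1` versus higher values answers the "does this help" question.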
There's some complexity involved in stitching the pieces together, but figuring out whether this helps should be straightforward.