
Limit the read buffer size in `bin/storage dump`
Closed, Public

Authored by epriestley on Jun 25 2019, 12:25 PM.

Details

Summary

Ref T13328. Currently, we read from mysqldump something like this:

until (done) {
  for (100 ms) {
    mysqldump > in-memory-buffer;   // buffer grows without bound during the window
  }

  in-memory-buffer > disk;          // flush once per 100ms wakeup
}
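
In terms of the actual primitives, the loop looks roughly like this (a sketch: `FutureIterator::setUpdateInterval()` and `ExecFuture::read()`/`discardBuffers()` are real libphutil methods, but the write path is simplified here):

```
// Simplified sketch of the current structure. The iterator wakes every
// 100ms; by then, everything mysqldump wrote in that window is already
// sitting in the future's in-memory read buffer.
$future = new ExecFuture('mysqldump ...');
$output_file = fopen('dump.sql', 'wb');

$iterator = id(new FutureIterator(array($future)))
  ->setUpdateInterval(0.100); // wake every 100ms

foreach ($iterator as $ready) {
  list($stdout) = $future->read(); // whatever accumulated since last read
  $future->discardBuffers();

  fwrite($output_file, $stdout); // flush the whole buffer to disk

  if ($ready !== null) {
    break; // mysqldump exited
  }
}
```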

This general structure isn't great. In this use case, where we're streaming a large amount of data from a source to a sink, we'd prefer a "select()"-like way to interact with futures, so our code is called after every read (or perhaps once some small buffer fills up, if we want to do the writes in larger chunks).
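
As an aside, here's the shape of that pattern with raw PHP streams rather than futures, purely for illustration (`proc_open()` and `stream_select()` are standard PHP; the command is a stand-in):

```
// Illustration only: a "wake on readable" copy loop built on raw PHP
// streams instead of futures. stream_select() blocks until the child's
// stdout has data, so at most one fread() worth of output is ever
// buffered in our process.
$spec = array(1 => array('pipe', 'w')); // capture child stdout
$proc = proc_open('mysqldump ...', $spec, $pipes);
$out = fopen('dump.sql', 'wb');

while (!feof($pipes[1])) {
  $read = array($pipes[1]);
  $write = null;
  $except = null;

  // Sleep until the source is readable; no fixed polling interval.
  if (stream_select($read, $write, $except, null) > 0) {
    fwrite($out, fread($pipes[1], 1024 * 1024)); // copy up to 1MB
  }
}

fclose($pipes[1]);
fclose($out);
proc_close($proc);
```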

We don't currently have this (FutureIterator can wake up every X milliseconds, or when a future exits, but, today, cannot wake when a future becomes readable), so we may buffer an arbitrary amount of data in memory (however much data mysqldump can write in 100ms).

Reduce the update interval from 100ms to 10ms, and limit the buffer size to 32MB. This effectively imposes an artificial throughput limit of 3,200MB/sec (a full 32MB buffer drained every 10ms), but hopefully that's fast enough that we'll have a "wake on readable" mechanism by the time it's a problem.
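
Concretely, the change amounts to roughly the following sketch. The 32MB cap is written here as a `setReadBufferSize()` call on ExecFuture; treat the method name and exact wiring as assumptions, since only the interval and cap values come from the summary above:

```
// Sketch of the adjusted loop. The setReadBufferSize() call is an
// assumption about how the cap is exposed; the idea is that once the
// in-memory buffer holds 32MB, the future stops reading and mysqldump
// blocks on a full pipe until we drain the buffer.
$future = new ExecFuture('mysqldump ...');
$future->setReadBufferSize(32 * 1024 * 1024); // cap the buffer at 32MB

$iterator = id(new FutureIterator(array($future)))
  ->setUpdateInterval(0.010); // wake every 10ms instead of every 100ms

// Worst case, each wakeup drains a full 32MB buffer: 32MB / 10ms is the
// artificial ~3,200MB/sec throughput ceiling mentioned above.
```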

Test Plan
  • Replaced `mysqldump` with `cat /dev/zero` as the source command, to get fast input.
  • Ran `bin/storage dump` with a `var_dump()` on the buffer size (sketched below).
  • Before change: saw arbitrarily large buffers (300MB+).
  • After change: saw consistent maximum buffer size of 32MB.
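
The instrumentation amounted to roughly this (hypothetical reconstruction: variable names and the exact print site differ in the real workflow):

```
// Hypothetical reconstruction of the test-plan instrumentation: use a
// fast source instead of mysqldump and print how much data each wakeup
// finds buffered in memory.
$future = new ExecFuture('cat /dev/zero');

$iterator = id(new FutureIterator(array($future)))
  ->setUpdateInterval(0.010);

foreach ($iterator as $ready) {
  list($stdout) = $future->read();
  $future->discardBuffers();

  // Before the change this grows without bound (300MB+ observed);
  // after the change it tops out at 32MB.
  var_dump(strlen($stdout));

  if ($ready !== null) {
    break; // ("cat /dev/zero" never exits; ^C once satisfied)
  }
}
```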

Diff Detail

Repository: rP Phabricator
Branch: dump1
Lint: Lint Passed
Unit: Tests Passed
Build Status: Buildable 23056
  Build 31650: Run Core Tests
  Build 31649: arc lint + arc unit