Page MenuHomePhabricator

GC old Differential changesets into File storage
Closed, WontfixPublic

Description

Some of the heavier, older shards with smaller db volumes are approaching fullness. We have tools for dealing with this in the general case now (i.e., the ability to move instances across shards, per T11670) but most of the data on most of these instances is old changeset hunk data in Differential.

There is no reason this data needs to stay in the database since it's rarely accessed (and, even when accessed, it's okay if it's a little laggy since it's just doing a cache fill anyway), and one of the goals of D9290 was to provide support for alternate storage so we could eventually improve this.

Specifically, since D9290, DifferentialModernHunk (then DifferentialHunkModern) has had a dataType column and a DATATYPE_FILE constant, but currently always reads and writes hunks in DATATYPE_TEXT instead. We can implement:

  • Implement support for storing data in Files instead of in the database.
  • Support migration between formats in some bin/differential mangle-hunks sort of workflow.
  • Add a GC step to move text hunks to Files after, say, 30 or 60 days.

Moving hunks to files implies a (usually, very small) read performance penalty but should generally be transparent.

Event Timeline

Only db001 and db002 are anywhere close to having problems with this and they still have a large amount of headroom, and we can bin/storage optimize at least some amount of additional headroom into existence, so this isn't "we're about to fall off a cliff" urgent or anything, but would be nice to tackle sooner than later.

Just keeping an eye on this: growth on these shards is slow and both are still below the 80% caution threshold, so I'm punting this for another week.

Looking at the actual data, I'm less sure this is actually a good strategy. Here's the data for this install, considering the production configuration of storage.mysql-engine.max-size as 65535:

mysql> select IF(LENGTH(data) > 65535, 'S3', 'MySQL') Engine, SUM(LENGTH(data)) Size, COUNT(*) N FROM differential_hunk_modern GROUP BY Engine;
+--------+-----------+--------+
| Engine | Size      | N      |
+--------+-----------+--------+
| MySQL  | 838006438 | 285338 |
| S3     | 222007657 |   2458 |
+--------+-----------+--------+
2 rows in set (19.52 sec)

That is, if we archive all hunks on this install, only 0.8% of them (2,458 of 287,796) are actually large enough to go to S3. The hunks that go to S3 are the biggest ones, so this 0.8% of hunks represents about 20% of all data stored in hunks, but we would still put 80% of data back in MySQL, just in Files instead of Differential. This is probably worse on the balance.

So I'm currently thinking:

  • Land D18584 (support for serving hunks from Files, plus fixes for destroying diffs), but it serves mostly as a tool for dealing with exceptional cases ("we accidentally put a 1GB hunk in the database and can't just delete it for complicated reasons").
  • Don't pursue D18585 (automatically moving old hunks to Files) since the actual savings under production settings are fairly small, and I don't think this use case really motivates changing production settings.

To deal with shard fullness, I'll just schedule maintenance in this week's upgrade cycle and expand the volumes with some shard downtime. This should be relatively brief and happen off-hours so I don't think it will be particularly disruptive.

We could possibly revisit this in the future, but given that the rows are numerous and small I think there's less opportunity here than I'd originally suspected.

epriestley claimed this task.

Per above, not planning to actually go forward with the GC step since the impact isn't ultimately very large.