Page MenuHomePhabricator

Dump tables one at a time, rather than all at once
ClosedPublic

Authored by epriestley on Oct 4 2017, 5:24 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 17, 12:58 PM
Unknown Object (File)
Mon, Jan 6, 2:42 AM
Unknown Object (File)
Mon, Dec 30, 3:33 PM
Unknown Object (File)
Mon, Dec 30, 3:00 AM
Unknown Object (File)
Tue, Dec 24, 2:24 AM
Unknown Object (File)
Dec 20 2024, 7:25 AM
Unknown Object (File)
Dec 4 2024, 9:55 AM
Unknown Object (File)
Dec 3 2024, 5:32 PM

Details

Summary

Ref T13000. This allows us to be more selective about which tables we dump data for, to reduce the size of backups and exports. The immediate goal is to make large ngrams tables more manageable in the cluster, but this generally makes all backups and exports faster and easier.

Here, tables are dumped one at a time. A followup change will sometimes add the --no-data flag, to skip dumping readthrough caches and (optionally) rebuildable indexes.

Test Plan

Compared a dump from master and from this branch, found them to be essentially identical. The new dump has a little more header information in each section. Verified each contains the same number of CREATE TABLE statements.

Diff Detail

Repository
rP Phabricator
Branch
dump1
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 18629
Build 25097: Run Core Tests
Build 25096: arc lint + arc unit

Event Timeline

This presumably affects the restore workflow as well, right? This looks good, but I don't think we should land it until the corresponding diff for restore is ready.

There's actually no impact to the restore workflow -- we still end up with one output file, and the output file is exactly the same except that it has more copies of the pre-dump / post-dump compatibility statements between each table dump:

-- MySQL dump 10.13  Distrib 5.6.21, for osx10.8 (x86_64)
--
-- Host: localhost    Database: local_almanac
-- ------------------------------------------------------
-- Server version       5.6.21

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

...

/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2017-10-04  9:10:34

Those are safe to run as many times as we want, so it's fine to just run them more often than we previously did. We could try to trim them out but that seems like more trouble than it's worth since I have no idea what they might look like in different versions of mysqldump. It looks like --compact disables at least some of this but I'm a little wary about it since it says "useful for debugging".

That is, the old dump looked like this:

save variables;
  create table a;
  insert data into table a;
  create table b;
  insert data into table b;
  ...
restore variables;

The new one looks like this:

save variables;
  create table a;
  insert data into table a;
restore variables;
save variables;
  create table b;
  insert data into table b;
restore variables;
...

Practically, they do the same thing when piped into mysql.

This revision is now accepted and ready to land.Oct 4 2017, 6:59 PM
This revision was automatically updated to reflect the committed changes.
joshuaspence added inline comments.
src/infrastructure/storage/management/workflow/PhabricatorStorageManagementDumpWorkflow.php
151

Shouldn't this be $command = ...?

avivey added inline comments.
src/infrastructure/storage/management/workflow/PhabricatorStorageManagementDumpWorkflow.php
151

Fixed in T13004.

src/infrastructure/storage/management/workflow/PhabricatorStorageManagementDumpWorkflow.php
151