
Perform a large data export from Phacility
Closed, Resolved (Public)

Description

See PHI2236. This has taken me months to get to and is academic at this point, but a large file transfer encountered a reliability issue during the export.

I vaguely recall that this code takes some shortcuts in error handling/retries, and the easiest approach is probably just to fix edge cases until the export goes through.

(Of course, it'll probably just work the first time now...)

Event Timeline

epriestley created this task.

The (anonymized) error the process encountered while transferring the dump to central storage was:

STDOUT
[ 
  { 
    "path": "/core/bak/tmp//71von3l9q9kw0044/********.sql.gz",
    "phid": null,
    "errors": [
      "Unable to upload file chunks: [HTTP/504]"
    ]
  }
]

This file has 2,802 chunks so it's ~11GB, but the system handles data at that scale (the only technical limit is the CloudFront 20GB object limit described in T13352) and has performed larger exports in the past.
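(For context on the arithmetic: the chunk count and the ~11GB size are consistent with a chunk size of roughly 4MiB, which is an assumption used for illustration here rather than something stated in this task.)

# Rough arithmetic relating chunk count to file size. The ~4MiB chunk size
# is an assumption; the task only states "2,802 chunks" and "~11GB".
ASSUMED_CHUNK_SIZE = 4 * 1024 * 1024  # bytes
chunks = 2802
print(chunks * ASSUMED_CHUNK_SIZE / (1024 ** 3))  # ~10.9, i.e. roughly 11GB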

The export process is already robust at a coarse level: the dump is retained on disk and the process can be retried at the "upload the whole file again" level, then picked up with bin/host export using the --database or --database-file flags (probably with --keep-file).

But a single HTTP error (especially a 504) across 2,800 chunks shouldn't cause the whole transfer to fail immediately; it's reasonable to push the failed chunks to the end of the queue and retry them a couple of times.
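A minimal sketch of that requeue-and-retry idea, in Python for illustration only; upload_chunk() is a hypothetical callable standing in for the actual chunk upload, not part of Phabricator's API:

import collections

MAX_ATTEMPTS = 3  # retry each chunk a couple of times before giving up

def upload_with_requeue(chunks, upload_chunk):
    # "chunks" is a list of chunk descriptors; "upload_chunk" is a hypothetical
    # callable that uploads one chunk and raises on failure (for example, on
    # an HTTP 504 from the upstream).
    queue = collections.deque((chunk, 1) for chunk in chunks)
    failures = []
    while queue:
        chunk, attempt = queue.popleft()
        try:
            upload_chunk(chunk)
        except Exception as error:
            if attempt < MAX_ATTEMPTS:
                # Push the failed chunk to the back of the queue so the other
                # chunks get a chance to upload before it is retried.
                queue.append((chunk, attempt + 1))
            else:
                failures.append((chunk, error))
    return failures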

It would also be reasonable to support resuming uploads (e.g., arc upload --resume <identifier> [--resume <identifier>] -- file [file] ..., where files and identifiers can be matched by hash), but I suspect the export will go through long before I make it that far.
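For completeness, a sketch of the hash-matching half of that idea, again with hypothetical names: pair each local file with a previously allocated partial upload whose content hash matches, so only the missing chunks would need to be re-sent. How arc would actually store or query partial uploads is not specified here.

import hashlib

def file_hash(path, algorithm="sha256"):
    # Hash the file contents in 1MiB blocks so a large dump never has to fit
    # in memory.
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()

def match_resumable_uploads(paths, partial_uploads):
    # "partial_uploads" is assumed to map content hashes to identifiers of
    # files that were partially uploaded earlier. Returns (path, identifier)
    # pairs that could be resumed, plus paths that need a fresh upload.
    resumable, fresh = [], []
    for path in paths:
        identifier = partial_uploads.get(file_hash(path))
        if identifier is not None:
            resumable.append((path, identifier))
        else:
            fresh.append(path)
    return resumable, fresh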

(Of course, it'll probably just work the first time now...)

Narrator: It did.