
Make migrating files less scary
Open, Needs Triage, Public

Description

I think this problem has been present for ~3 years. I've had this experience every time I've tried to migrate files, across four production installs at three different companies. Two of these installs were "pristine", meaning we never did any trickery or wrote any of our own extensions.

  • Dry run shows that some files are going to fail
  • Actually migrating the files shows even more failures that weren't present in the dry run
  • Nothing ever explains what these files are (or were; some files don't exist anymore) or how much risk / loss you're incurring

Being a seasoned file migration error encounterer, I've come to find that the majority of files fall into these categories:

  • Pholio thumbnails
  • Files with manually set strange policies
  • Files that were not manually uploaded by users (e.g., files that some other Phabricator application made for some reason)

The failures encountered are mostly harmless, or I have managed to fix things up without a lot of pain. However, it's still terrifying to not really know what's happening when you migrate files, why things failed, what the files are/were, etc.

I've migrated files as follows and always had these problems:

  • Disk -> DB (twice)
  • DB -> S3
  • Disk -> S3

It's a relatively small sample size I suppose (4 migrations), but it seems pretty consistent. I have DB dumps and files from the most recent migration that I can make mostly available to Phacility for debugging, if you like.

Event Timeline

yelirekim updated the task description.
yelirekim added a project: Files.
yelirekim added a subscriber: yelirekim.

> Dry run shows that some files are going to fail

Do you know which error messages you saw? As far as I can tell, it is not possible for --dry-run to produce any error messages or warnings about impending failures (it can only print out "already stored" or "would migrate").

I generally can't really reproduce this offhand -- both locally and in production the --dry-run output is clean across hundreds of thousands of files (only those two message types are shown).

I migrated my development instance (~3,600 files from ~1 year of work -- I wiped the instance when we started doing instancing for Phacility circa Jan 2015) from S3 -> Disk and then partially back to S3 without any real problems, modulo an issue with chunked files. (Two files did error, but I found them in a different S3 bucket I'd used for testing a year or so ago.)

The error output was not from the script itself; it was just raw errors. I think I remember looking into this a long while ago: the migrate script (including dry run) actually reads all of the files, and the act of reading a file is what generated the error. I should have described this a little better than "dry run shows some files are going to fail", and instead described it as "dry run shows errors for some of the files".

This might be what you're talking about with chunked files?

We are right in the middle of moving our Phabricator install to multiple nodes, I will re-run the migrations on Monday and send you the output.
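
Since --dry-run is only expected to print "already stored" or "would migrate" lines, anything else in captured output is presumably an error raised while reading a file. A minimal sketch of triaging saved output on that basis follows. The function name `triage_output` is hypothetical, and matching on those two substrings is an assumption about the line format, which this thread does not show.

```python
from collections import Counter

# The two message types mentioned in this thread. The exact wording and
# layout of real output lines is an assumption, so match on substrings.
EXPECTED_MARKERS = ("already stored", "would migrate")


def triage_output(lines):
    """Split captured migrate output into expected status lines and the rest.

    Returns (counts, unexpected): counts maps each expected marker to the
    number of lines containing it; unexpected lists every other non-blank line.
    """
    counts = Counter()
    unexpected = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        lowered = line.lower()
        for marker in EXPECTED_MARKERS:
            if marker in lowered:
                counts[marker] += 1
                break
        else:
            # Not one of the two known message types: likely an error
            # emitted while reading the file, so keep it for inspection.
            unexpected.append(line)
    return counts, unexpected
```

To use it, redirect both stdout and stderr of a dry run to a file, pass the open file object to `triage_output`, and review the `unexpected` list before running the real migration.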

I think chunked files shouldn't have produced errors on reads -- at least, the --dry-run was still clean for me on both hosts before D14981.

> We are right in the middle of moving our Phabricator install to multiple nodes, I will re-run the migrations on Monday and send you the output.

Cool, sounds good.

epriestley claimed this task.

Presuming this is fixed / irrelevant / not scary / resolved / something, since I don't think this went anywhere else. Yell if not, or if I just forgot something (or we can follow up after T11044 if this is waiting on that).

I'm trying to figure out if I destroyed our files this weekend by migrating them or by using purge / compress.

It did a lot of damage, so I just restored from backups. There are 100k files / 40GB of data, and I just haven't taken the time to re-run the migrations in a test environment yet.

Migrate didn't actually throw exceptions, but I suspect that it caused some of the problems, which were then exacerbated by compress or purge. Throwing encryption into the mix didn't help.

Well, that sounds scary.

eadler added a project: Restricted Project. (Aug 5 2016, 5:05 PM)