Page MenuHomePhabricator

When migrating files between storage engines with "bin/files migrate ...", skip expired temporary files
ClosedPublic

Authored by epriestley on Jul 25 2018, 11:21 PM.
Tags
None
Referenced Files
F13096760: D19536.diff
Thu, Apr 25, 6:40 PM
Unknown Object (File)
Thu, Apr 4, 12:21 AM
Unknown Object (File)
Mon, Apr 1, 8:38 PM
Unknown Object (File)
Mar 26 2024, 4:52 AM
Unknown Object (File)
Jan 31 2024, 5:23 PM
Unknown Object (File)
Jan 31 2024, 3:14 PM
Unknown Object (File)
Dec 4 2023, 10:53 PM
Unknown Object (File)
Nov 30 2023, 1:48 AM
Subscribers
None

Details

Summary

See T7148. This just cheats us out of a weird sort of race where we:

  • Dump an instance, including some F123 which is a temporary file which expires in 3 minutes.
  • A few minutes later, the daemons delete the data for that file.
  • A few minutes after that, we try to bin/files migrate --copy to copy the data from S3 into the MySQL blob store.
  • This fails since the data is already gone.

Instead, just skip these files since they're already dead to us.

Test Plan

Faked this locally, will migrate the PHI769 instance on aux001.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision from When migrating files between storage engines with "bin/files migrate ...", skip expires temporary files to When migrating files between storage engines with "bin/files migrate ...", skip expired temporary files.Jul 25 2018, 11:25 PM
amckinley added inline comments.
src/applications/files/management/PhabricatorFilesManagementMigrateWorkflow.php
103–104

I assume PHP casts a null TTL to something that would always pass the less-than check? Still curious why this isn't written as if ($ttl && $ttl < PhabricatorTime::getNow()).

Also, do we need to check if the files have actually expired? Why not just skip all files that have a TTL (assuming that's the definition of a temp file)?

This revision is now accepted and ready to land.Jul 26 2018, 1:12 AM

if ($ttl && $ttl < PhabricatorTime::getNow()) is logically equivalent, I just find the separate clauses a little easier to read here. (I think I usually prefer separate clauses unless there's some reason they're worse, and usually find if (...) statements with exactly one test easiest to read, although this is super subjective and I'm probably not consistent about it.)

I'm ignoring only expired temporary files (instead of all temporary files) because it's possible for a temporary file to expire a year from now, or a hundred years from now, and effectively be a permanent file. In particular, file.allocate lets you specify any TTL epoch with deleteAfterEpoch, and the lower-level APIs let you pick any ttl.relative or ttl.absolute. We could just skip these and effectively say "exporting data deletes your /tmp directory", but that's a slightly stronger statement than "exporting data doesn't bring any files which were recently in your /tmp directory but have already been deleted". Practically, I believe the upstream only ever uses a 24-hour TTL on anything right now, but file.allocate is an outward-facing API so any install could theoretically have long-lifetime temporary files.

(Actually, it looks like meme-generator output images have a 30-day TTL.)

Maybe another vague argument in favor of handling temporary files is that bin/files migrate also isn't only for exports, and I could imagine wanting to test the other options on a temporary file.