
File deletion should be queued up and run by daemons
Closed, Resolved · Public

Description

Problem: I upload a big file into some high-latency backend, like S3 (or in my case, the Backblaze B2 API). This sits under the chunked storage engine, which will write small individual chunks of the file into separate entities, and recombobulate everything on command.

Now, I want to delete that file. In my case, I have a 200 MB file I want to delete which has been chunked into several files. I click on 'Delete' in the UI, and this pops up a modal dialog.

delete-modal.png (162×559 px, 8 KB)

The moment you click Delete, the UI greys out, and the web process serving your request begins issuing synchronous deleteFile commands, one by one, for each individual file chunk. In my case it took so long that even after I could see every chunk gone from my bucket, the page still hadn't redirected. Naturally this is going to take a long time, especially because you can upload arbitrary-sized files, which is a nice feature!

Steps to reproduce: Upload a large file that's a few hundred megabytes into S3 (drag and drop in the UI, etc). Then try to delete, and go get coffee while the UI thread issues dozens of deletion commands (each requiring several HTTP requests) for each individual chunk.

Solution: As @epriestley noted, the best fix is to have the Delete button simply enqueue the deletion commands for the daemons to handle, instead of running them in the web process.

Revisions and Commits

Event Timeline

Kicking this to the daemons should be like 20 lines of code, but we probably also need to flag the file as "queued for deletion" in the UI in the meantime and prevent you from interacting with it, which touches a few other interfaces, so this isn't completely trivial.

When using local disk storage, the file is owned by the web-server user, not the daemon user.

Broadly:

  • When you click "Delete File", we currently delete the file in the web process. Since we've supported enormous files and pluggable storage backends for a while, this could take an arbitrarily long amount of time to complete.
  • Instead, we want to flag the file as "deleted", hide it in the web UI, and queue up a task in the daemons to actually get rid of the data.
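The flag-then-queue flow described above might look roughly like this. This is a sketch only: the worker class name, the 'filePHID' task-data key, the isDeleted flag, and the withIsDeleted(...) query method are assumptions, not existing Phabricator code.

```php
<?php

// Hypothetical worker: the daemons pick this up and destroy the
// (possibly chunked) file data outside the web request.
final class PhabricatorFileDeletionWorker extends PhabricatorWorker {

  protected function doWork() {
    $file_phid = idx($this->getTaskData(), 'filePHID');

    $file = id(new PhabricatorFileQuery())
      ->setViewer(PhabricatorUser::getOmnipotentUser())
      ->withPHIDs(array($file_phid))
      ->withIsDeleted(true)
      ->executeOne();

    if ($file) {
      // Actually destroy the storage (all chunks, transforms, etc.).
      // This is the slow part we're moving out of the web process.
      $file->delete();
    }
  }
}
```

The DeleteController side then only flags the file and schedules the task, which is fast:

```php
// In the controller: flag the file as deleted, then queue the real work.
$file->setIsDeleted(1)->save();
PhabricatorWorker::scheduleTask(
  'PhabricatorFileDeletionWorker',
  array('filePHID' => $file->getPHID()));
```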

To do this:

  • D15743 may be somewhat helpful as a reference, particularly for the daemon queue stuff.
  • Since we want to hide these files from queries, we should use a separate isDeleted column, not just a property.
    • Add a new column and a schema migration for it. D17566 is a recent change you can look at.
  • Change PhabricatorFileQuery to have a method like withIsDeleted(...), similar to withIsPartial(...). This method should accept true (find only deleted files), false (find only non-deleted files) and null (find all files).
  • If the query has withIsDeleted(false) specified, don't match files with isDeleted set (deleted files) or with a non-null ttl less than PhabricatorTime::getNow() (temporary files which have expired).
  • Add withIsDeleted(false) to the queries in PhabricatorFileSearchEngine, PhabricatorFileEditEngine, FileDataController, FileInfoController, FileDeleteController (and maybe some others, like FileTransformController) so they treat "deleted" files as though they really don't exist.

To test:

  • Deleting a file or setting its TTL to some time in the past (e.g., manually in the database) should make it vanish from the web UI.
  • A daemon task should queue up (visible in /daemon/ as long as you aren't running the daemons -- if you are, it may process immediately).
  • You should be able to run it in the foreground with bin/worker execute --id <id> to debug it and make sure it really destroys the file data.
  • If you want to make this a little easier to test, you can write a bin/files delete <file> workflow that applies logic similar to the DeleteController, but uses PhabricatorWorker::setRunAllTasksInProcess(true); to run the worker in the foreground. Then you won't have to go through the whole "delete a file -> find the task ID -> run it manually" cycle every time you want to test a change. PhabricatorFeedManagementRepublishWorkflow is a similar example.
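A skeleton for that management workflow might look like the following, modeled loosely on PhabricatorFeedManagementRepublishWorkflow. The class name and argument handling are assumptions; the key piece is forcing queued tasks to run in-process.

```php
<?php

final class PhabricatorFilesManagementDeleteWorkflow
  extends PhabricatorFilesManagementWorkflow {

  protected function didConstruct() {
    $this
      ->setName('delete')
      ->setSynopsis(pht('Delete a file and destroy its data.'))
      ->setArguments(
        array(
          array(
            'name' => 'names',
            'wildcard' => true,
          ),
        ));
  }

  public function execute(PhutilArgumentParser $args) {
    // Run any queued workers synchronously in this process, so the
    // deletion is easy to step through and debug.
    PhabricatorWorker::setRunAllTasksInProcess(true);

    // ...load each named file, then apply the same flag-and-queue
    // logic as the DeleteController here...

    return 0;
  }
}
```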

Once this works, convert deletion to use ModularTransactions; eventually, once file.edit gets written, this will also make deletion available via the API:

  • Add a DeleteTransaction, similar to FileNameTransaction, and move the code to flag the file as deleted there.
  • Have the DeleteController apply it, similar to PhabricatorPasteArchiveController.
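For the ModularTransactions step, a DeleteTransaction might take roughly this shape, following the pattern of other file transaction types like FileNameTransaction. The class name, TRANSACTIONTYPE constant, and title string are all assumptions:

```php
<?php

final class PhabricatorFileDeleteTransaction
  extends PhabricatorFileTransactionType {

  const TRANSACTIONTYPE = 'file:delete';

  public function generateOldValue($object) {
    return (bool)$object->getIsDeleted();
  }

  public function applyInternalEffects($object, $value) {
    // Only flag the file here; the worker queued elsewhere does the
    // actual (slow) destruction of the underlying data.
    $object->setIsDeleted((int)$value);
  }

  public function getTitle() {
    return pht(
      '%s deleted this file.',
      $this->renderAuthor());
  }
}
```

The DeleteController then just applies this transaction via the file's transaction editor, the same way PhabricatorPasteArchiveController applies its status transaction.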

This will also run into T4752 but installs generally shouldn't be worse off after these changes than they already are.