MIME type detection does not run on files large enough to activate the chunking engine
Closed, Resolved, Public

Description

This may be an expected behavior, but today I was working on setting up video embedding. I dropped a sample mp4 onto our phab homepage to upload and it worked as expected. However, I was having trouble getting it to display/embed using the {FXXX media=video} syntax. I looked at the file details and its mime type had been changed to application/octet-stream. This may be intended behavior, but it wasn't clear to me.

Screen Shot 2016-06-30 at 3.53.06 PM.png (272×368 px, 23 KB)

What I think is especially noteworthy is that going into the Files application and uploading a file through the picker behaves differently. Specifically, in our install it threw an exception about not having a configured storage engine. (I realize this may be an issue specific to our instance/config, but I would expect the two ways of uploading files to behave the same...?)

Screen Shot 2016-06-30 at 3.54.47 PM.png (185×977 px, 20 KB)

Edit:

We're fairly up to date, but here you go nonetheless

phabricator cadac75b82bbed18d52c3ee7ba6d396bff69c009 (Fri, Jun 24)
arcanist 18b27b03fa3d9f2439bf998c5fa2e4f5bd93db16 (Sat, Jun 18)
phutil 8aa8612a094b4dafcf5c461b746a613a1e229b86 (Sat, Jun 18)

Event Timeline

Several behaviors are interacting here. First, here's how MIME types work:

  • We don't trust client MIME types because they're client-controlled. Trusting them potentially creates various security issues where you upload a relatively dangerous file (like an SVG with executable JavaScript), claim it has a safer MIME type (like image/png), and trick the server into a more dangerous behavior than it is configured for.
  • Instead, we detect MIME types on the server side by running file (or an equivalent detector) on them.
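
To illustrate the idea of ignoring the client's claim and looking at the bytes instead, here is a minimal Python sketch. It is not Phabricator's implementation (which, as described above, runs file or an equivalent detector); the signature table and the detect_mime() name are hypothetical and cover only a handful of formats.

```python
# Hypothetical, deliberately tiny signature table. A real detector such as
# `file` recognizes far more formats than this.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def detect_mime(data: bytes, claimed_type: str = "") -> str:
    """Return a MIME type based on content only.

    claimed_type is accepted but deliberately ignored, because the client
    controls it and could lie about it.
    """
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    # Unknown content falls back to the generic binary type.
    return "application/octet-stream"


# An SVG with script content, uploaded with a claimed type of image/png.
svg = b'<svg xmlns="http://www.w3.org/2000/svg"><script>alert(1)</script></svg>'
print(detect_mime(svg, claimed_type="image/png"))  # application/octet-stream
```

The SVG is not treated as a PNG no matter what the client says, which is the property the server-side check is meant to provide.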

Here's how large files work:

  • Files over 4MB (approximately) are normally stored in chunks (bytes 0-4MB, bytes 4MB-8MB, bytes 8MB-12MB, etc). This allows us to support arbitrarily large (well, many GB, at least) file uploads, resuming, etc., even when storage engines impose limitations (like max_allowed_packet in MySQL), and without requiring all the interfaces to support streaming.
  • Chunked storage requires the client to know how to upload chunks. New versions of arc, the drag-and-drop flows, and the new file control in Remarkup text areas know how to do this. Old versions of arc, the manual upload flow in Files, and some other upload flows (profile images, macros, etc.) do not yet.
  • When you try to upload a large file using a non-chunked upload interface, the size is limited by the maximum size the engine can support. This is why uploading is failing via Files but succeeding via drag-and-drop.
  • Until recently, we didn't have a chunk-aware <input type="file" /> control. This was added about a month ago in T5187, but hasn't made it everywhere yet.
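
The chunk layout described above can be sketched in a few lines of Python. This is only an illustration: the exact 4MB constant and the names chunk_ranges() and needs_chunking() are assumptions, not Phabricator's actual values or API.

```python
from typing import Iterator, Tuple

# Assumed chunk size; the text above only says "4MB (approximately)".
CHUNK_SIZE = 4 * 1024 * 1024


def needs_chunking(file_size: int, chunk_size: int = CHUNK_SIZE) -> bool:
    """Files larger than one chunk are stored in pieces."""
    return file_size > chunk_size


def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte offsets for each chunk, with end exclusive."""
    for start in range(0, file_size, chunk_size):
        yield start, min(start + chunk_size, file_size)


# A 10MB upload becomes two full chunks plus a 2MB remainder.
ten_mb = 10 * 1024 * 1024
for start, end in chunk_ranges(ten_mb):
    print(start, end)
```

A chunk-aware client uploads each range separately, so no single request or storage write has to hold the whole file. A client that is not chunk-aware sends everything at once and runs into the storage engine's size limit instead.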

Finally, the two interact:

  • When you upload a file that activates chunking, we currently never construct the entire file on disk, and thus do not MIME-detect it. We always detect these files as "application/octet-stream".
  • Until adding video support, this never mattered, because no one is uploading 200MB .png files.

What we should probably do instead is this:

  • Detect the MIME type of large files by running file on the first chunk only, under the assumption that all or almost all MIME-magic is based on file headers? Is this reliable? Initially, it seems likely to be sufficient. In particular, the "mime magic" format appears to have no rules for detecting magic in any part of the file other than at the beginning. This will fix the primary issue.
  • Eventually, convert all vanilla uploaders to become chunk-aware uploaders.
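
As a rough Python sketch of the first-chunk idea: only the first chunk is read, and a header-based check runs on it. The function names are hypothetical, the check covers just two formats, and mapping every "ftyp" header to video/mp4 is a simplification (other ISO base media formats use the same box).

```python
from typing import Iterable


def sniff_header(header: bytes) -> str:
    """Guess a MIME type from leading bytes only (simplified, hypothetical)."""
    # ISO base media files such as MP4 carry "ftyp" at byte offset 4.
    if len(header) >= 12 and header[4:8] == b"ftyp":
        return "video/mp4"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    return "application/octet-stream"


def detect_chunked_mime(chunks: Iterable[bytes]) -> str:
    """Detect the type of a chunked file without assembling it on disk."""
    # Pull exactly one chunk; later chunks are never requested.
    first_chunk = next(iter(chunks), b"")
    return sniff_header(first_chunk)


def fake_chunks():
    """Stand-in for chunk storage; fails loudly if a second chunk is read."""
    yield b"\x00\x00\x00\x18ftypmp42" + b"\x00" * 16
    raise AssertionError("later chunks should not be read")


print(detect_chunked_mime(fake_chunks()))  # video/mp4
```

This relies on the assumption stated in the bullet above, that the relevant magic lives at the start of the file. A format identified only by trailing data would still come back as application/octet-stream.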

epriestley renamed this task from "Loss of mime type when file is uploaded via drag-and-drop uploader" to "MIME type detection does not run on files large enough to activate the chunking engine". (Jun 30 2016, 8:13 PM)
epriestley triaged this task as Normal priority.
epriestley added a project: Files.

Appears to only occur when a file needs to be chunked. Small files work fine.

Thanks for the ridiculously fast triage. Not an urgent issue for us, just wanted to make sure you were aware of it.

In T11242#183007, @chad wrote:

This timely comment saved me a great deal of effort in exporting a high-resolution video of my cat with music and effects or something to get it big enough.

I am sad I didn't get to see this. But for future reference, http://sample-videos.com

Thanks for the report! The primary issue should be resolved at HEAD of master, and should be promoted to stable within about 48 hours.

This change is not retroactive, so it won't affect existing files, but new files should be detected properly.

This doesn't tackle any of the vanilla file uploader stuff, but we'll phase those out eventually.