Page MenuHomePhabricator

Show images in Diffusion diffs
Closed, ResolvedPublic

Description

When viewing images in Diffusion, we should show the files inline if they're images.

Event Timeline

PhabricatorFile has an isViewableImage() method you can use to make this determination. Things should be pretty straightforward otherwise.

Also, the trickiest part of this is getting a repository imported. This workflow is vastly more difficult than it should be right now (see {T2231}). Let me know if you run into trouble.

The way I have this stuff testable locally is I just created a testing repo on my machine with git init in some random directory, and then imported that into Phabricator. Then you can make commits that add/remove images and other binaries and get reasonable confidence your change behaves correctly.

This was about viewing images INSIDE diffs. Like if someone adds images as part of a commit.

@blc, I haven't forgotten about you but have to run to a meeting. I'll update this when I have a chance.

Okay, so the revised issue makes far more sense. However, I think that turns this from a reasonably-sized task into a complicated minefield. Here are some issues it raises:

How do we actually get the data? There are two major approaches I see: we can get the binaries in the diffs themselves, or we can fill them in afterward.

In the diffs: We can use git diff --binary when we load the change, and get the binary with the diff. At first blush, this seems quite reasonable. However:

  1. No way to do this in SVN.
  2. We ignore these bodies in the parser today (marked with TODO here). Implementing this support doesn't gain us much else (we can parse --binary diffs with arc diff --raw, but ~nobody has ever tried to do this).
  3. The code to parse them is quite complicated -- we do have an encoder in ArcanistBundle, which would need to be inverted.
  4. If the user commits a 1GB non-image binary, we shouldn't try to pull it. But we can't tell how big the response will be, so this is kind of a mess.
  5. This format is substantially larger than raw for most binaries.
  6. I think --binary may cause Git to provide binary diffs in some cases, which aren't sufficient to reconstruct the original files.

Afterward: We can use DiffusionFileContentQuery to execute queries to get the data afterward. This works in SVN and will transfer the data in raw format. We still have the "1GB binary" problem though, and this requires gluing together a bunch of pieces which aren't quite there yet.

All of these problems are tractable, but most of them are complicated. They also require setting up hg, git and SVN repositories to test and develop against. And the benefit of doing all this work is fairly small.

I think we should probably find you something else to work on instead, and maybe get back to this once you've had more exposure to the code and/or the infrastructure is better prepared to support this change. There are some changes happening soon which will make this at least a little easier to deal with.

If you want to pursue it anyway, here's the approach I would take:

First, attack it by pulling data separately instead of using --binary. I think this will be simpler overall.

Second, build it for Git only for now. You can just do an explicit check against the repository VCS with a "// TODO:" comment about SVN/hg. So the skeleton will look something like this:

// DiffusionDiffController, after loading $changeset from
// DiffusionDiffQuery, near line 53.

if ($changeset->getFileType() == DifferentialChangeType::FILE_BINARY) {
  // TODO: Implement Mercurial and Subversion support.
  if ($drequest instanceof DiffusionGitRequest) {
    // Here, we will load image data and enrich $changeset.
  }
}

Next, implement a setByteLimit() method for DiffusionGitFileContentQuery. This will let us solve the "1GB binary change" problem, by loading only up to, say, 1MB of a binary file. We can eventually display a message if it is too large. In Git, you can implement this method using git cat-file -s ..., which shows the size of a file. In Mercurial and SVN, there is no such feature, so you have to stream off the process until you hit the limit. That looks something like this (this strategy will also work in Git):

$corpus_len = 0;
$future = $repository->getLocalCommandFuture(
  'hg cat --rev %s -- %s',
  ...);
$future->setStdoutSizeLimit($this->byteLimit + 1);

// select()-based wait which returns control to us ever 10ms so we can
// see how much data we've read.
foreach (Futures($future)->setUpdateInterval(0.01) as $ready) {
  if ($ready === null) {
    $corpus_len += strlen($future->read());
    if ($corpus_len > $this->byteLimit) {
      throw new ThisFileIsWayTooBigAintNobodyGotSpaceForThatException();
    }
  }
}

list($corpus) = $future->resolvex();

One especially tricky thing here is that $future->read() advances an internal cursor across the read buffer, but does not discard the buffer, so resolvex() still returns the entire content. Also, the subprocess will be automatically killed when $future leaves scope as the exception returns up the stack.

Then, you can use the new byte limit to execute a sane query which won't return 1GB binaries:

  if ($drequest instanceof DiffusionGitRequest) {
+   $new_content_query = DiffusionFileContentQuery::newFromDiffusionRequest(
+     $drequest);
+   $new_content_query->setByteLimit(1024 * 1024);
+   $new_data = $new_content_query->loadFileContent();
+   if ($new_data !== null) {
+     // Here, we will actually attach the data to the changeset
+   } 
  }

(The null check should probably really be try/catch on a TooBigException.) Finally, you can create a file and associate it with the changeset:

$file = PhabricatorFile::newFromData($new_data, $params = array());
$meta = $changeset->getMetadata();
$meta['new:binary-phid'] = $file->getPHID();
$changeset->setMetadata($meta);

Then you need to do the same for the old file (which is slightly more complicated, since you need to figure out which commit you need it at). And then SVN/hg, and make the too-large error message nice, and I guess that's it?

blc removed blc as the assignee of this task.Apr 25 2013, 4:08 PM
epriestley renamed this task from Show images in Diffusion to Show images in Diffusion diffs.Aug 25 2014, 12:43 AM
chad changed the visibility from "All Users" to "Public (No Login Required)".
epriestley claimed this task.

This will ultimately be covered by T13105.