Page MenuHomePhabricator

Diffusion shows unexpected diff when an entire directory is removed
Open, Needs TriagePublic


Diffusion is showing unexpected diffs for commits in which an entire directory is removed. See rP5c1e4488dedafda08684b33a8a4786cf687d2811 for example, in which the src/infrastructure/daemon/bot directory was deleted:

138  rP5c1e4488deda.png (951×1 px, 182 KB)

(For rP5c1e4488dedafda08684b33a8a4786cf687d2811, the "diff" shown for src/infrastructure/daemon/bot is actually the diff for src/infrastructure/daemon/bot/PhabricatorBot.php)

I often see this bug on our Phabricator installation as well.

Event Timeline

What does "mysterious" mean.

In T12959#230799, @chad wrote:

What does "mysterious" mean.

Maybe unexpected would be a better word. I meant that it was a mystery to me as to why it was showing a diff for a deleted directory.

joshuaspence renamed this task from Diffusion shows mysterious diff when an entire directory is removed to Diffusion shows unexpected diff when an entire directory is removed.Aug 9 2017, 10:23 PM
joshuaspence updated the task description. (Show Details)
joshuaspence updated the task description. (Show Details)

In general are you aware we no longer take bug reports or feature requests here?

In T12959#230805, @chad wrote:

In general are you aware we no longer take bug reports or feature requests here?

My understanding was that bug reports are filed on Discourse, where they are verified by a Community member and then filed here. I guess I need to re-read the documentation.

I would open to accepting this as a bug report if I felt it was deliberately designed to work the way you expect, but was not (like a regression). Or if information was presented that all other repository viewers do it this way. Otherwise I think it's more 'nitpick' or 'feature request'. A cursory look at GitHub, seems they show the full diff as well:

Is there something I'm missing here, maybe?

I think this is a real bug. Here's a more complete example of what's going wrong, why it's wrong, and how to reproduce it.

To set up a reproduction case, first, I added a new directory (T12959/) with one file in it (T12959/nectarine-facts.txt), using a process like this:

$ mkdir T12959/
$ nano T12959/nectarine-facts.txt # Add some facts.
$ git add T12959/nectarine-facts.txt
$ git commit -m ...
$ git push

This produced this commit: rGITTESTd0fdaf135f0d008f2f8e186a42fd9dd98d6ba7b2

Then, I removed that file, using a process like this:

$ git rm T12959/nectarine-facts.txt
$ git commit -m ...
$ git push

That produced this commit: rGITTEST657f03ff624b6bbb88166d289d8d29797b87fd61

So the second commit removes a file, and implicitly removes the containing directory because there are no remaining paths inside it.

Looking at the second commit, in the table of contents section, we list that both the directory and file were removed. I think this is correct and desirable, although it might be a little better if the icon for the directory was () instead of (), and if the path was written as T12959/ (with a slash) instead of T12959 (without a slash), since these would help give the user a hint that the path is a directory:

Screen Shot 2017-08-10 at 3.43.54 AM.png (166×368 px, 13 KB)

Note that GitHub's behavior here differs: it does not show a similar table of contents, and does not have a listing for the removed directory. All else being equal, I think this is less desirable than our behavior: it's good to have a way to see from the web UI that a directory was removed by a change, and this can help reviewers make sure that "remove X" changes are really cleaning everything up.

However, just below that, we show an actual diff section for the directory itself:

Screen Shot 2017-08-10 at 3.47.54 AM.png (512×1 px, 63 KB)

This is potentially okay. I believe one reason we do this is that in Subversion, a directory may legitimately have property changes. In this case, we must show an actual diff section to show those property changes. In Git, a directory can never have any "real" changes, and we could probably omit this element (and just have the table of contents show the directory without linking it anywhere).

The only downside I can come up with to doing this is that it would become impossible to share a link to a directory removal or leave an inline comment about it, e.g. "Why did this whole directory get removed? Shouldn't there be more stuff here?" but I'm not sure that comment really makes sense or that there's much value in this.

Even if we show the element in Git, these parts of the element are misleading:

  • The icon is , but would be a better cue.
  • The path is written as T12959, but T12959/ (with a slash) would be a better cue.
  • The text says "This file was deleted.", but "This directory was deleted." would be a better cue.
  • The UI has a "Show File Contents" link, but this is misleading (this is a "Directory", not a "File") and there is nothing valid this link could ever do.

The most buggy behavior here is that when you click this link, you get a fairly arbitrary and meaningless diff, which is what this task is surfacing. To explain the "mystery" here, I believe this is the diff for the first file in the directory, because we're probably running git diff path/to/directory/ internally (which returns diffs for every affected file) and parsing out only the first diff. That is, if you click both the "Show File Contents" links, you get this:

Screen Shot 2017-08-10 at 3.54.20 AM.png (364×1 px, 46 KB)

This is wrong -- the change is listed twice, and the change we show for the directory is relatively arbitrary -- and not similar to what GitHub is doing.

Better behavior would probably be:

  • Use directory icons in both places.
  • Use directory path cues (with a trailing slash).
  • Use "directory" language, not "file" language.
  • Either omit the diff (and don't link it in the table of contents) or keep it but don't offer a "Show File Contents" link.

I believe we already have code to do most of this, and the issue is really that the "this is a directory" flag on the path is being lost somewhere. I think if that flag is properly set/retained, most of these rendering behavior changes will correct themselves automatically. For example, here, we already have better language in the code for directories, we're just not reaching it:;71eaf3e8c4d61203ab2887ab1a294ea81bd6ae83$38-40

Certainly, the desired behavior here is better rendering. I believe that's also the intended behavior and that this is a bug (the "this is a directory" flag is being dropped somewhere), but it's possible there's a // TODO: Figuring out if these are directories or not is too hard so just treat them as files for now. somewhere in the code that I'm forgetting about, muddying intent/bugginess.

The new rules on reports are still in flux a bit and extra-muddy for Community members, but I'd generally like to move to a "report + review" model for community reports, something like we currently have for code review. So the general idea is that your bug needs someone to review it and agree with it (e.g., reproduce a bug, or agree that a feature request is a brilliant new idea that makes sense and would be appropriate for the upstream and completely describes a reasonable, generally-experienced root problem) before it comes upstream.

Some pathways are:

  1. You post on Discourse. Some other community member follows your reproduction steps and confirms that they can reproduce the issue. They file it upstream in their own words, referencing the original report. In this case, since you need to convince someone to work with you out of the goodness of their own heart, you're incentivized to make your case more clearly (e.g., not file a report which takes an hour to reproduce, since no one else is going to do that). Since they're the one bringing the report upstream, they're incentivized to make sure they really understand the issue.
  2. You post through Support. Whoever is in the support rotation (currently always me) follow your reproduction steps and confirms that they can reproduce the issue. They file it upstream if appropriate, referencing the original support ticket. In this case, you're basically paying us to review and triage (and, usually, fix) the issue for you promptly. The "mana" system defends this channel against bogus reports and reports with outsized reproduction costs.
  3. If you're totally certain that there's no possibility your bug needs review, you can file it upstream directly. T12960 is probably a reasonable example of this. This is sort of analogous to the "you can just send a diff for a typo, you don't need to file a task" rule that covers review one step deeper. If someone abuses this, they get warned a lot and eventually stripped of Community privileges if we can't find some more productive compromise.

(We're also trying to give long-standing community members some degree of special access to Support, per discussion elsewhere, so they can go through channel (2) without needing to come up with enterprise-level support fees.)

I think this report was a bit borderline under rule (3), and that we're generally raising the bar on report quality here and anything coming upstream should be clear and complete enough that any reasonable (technical, familiar-with-Phabricator) reader has very little chance of misunderstanding the issue. Empirically, both this task and my task in T12956 don't meet that bar, since they needed a lot of clarification in discussion. As written (e.g., a relatively off-the-cuff report of bad behavior with a link to an example -- but no actual reproduction instructions, or explicit expect/actual language, and a screenshot which doesn't completely show the problem) this would probably be better as a Discourse thread. Then the clarifying discussion could have taken place there (e.g., "oh, this actually isn't as clear as I thought, here's what the issue is more narrowly") and a more complete version could have made it upstream eventually. I don't think it needs to be as complete as my comment above, but somewhere between the two is probably what we're aiming for.

PHI1776 is another report of this behavior.