Page MenuHomePhabricator

Implement rough content-aware inline adjustment rules for ghosts
ClosedPublic

Authored by epriestley on May 6 2015, 11:48 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 17, 5:56 PM
Unknown Object (File)
Sun, Jan 12, 4:39 AM
Unknown Object (File)
Thu, Jan 2, 2:22 AM
Unknown Object (File)
Tue, Dec 31, 12:50 PM
Unknown Object (File)
Sun, Dec 29, 11:54 PM
Unknown Object (File)
Fri, Dec 20, 10:35 AM
Unknown Object (File)
Dec 19 2024, 7:06 PM
Unknown Object (File)
Dec 18 2024, 8:00 AM
Tokens
"Mountain of Wealth" token, awarded by btrahan.

Details

Summary

Ref T7447. Fixes T7600. This likely needs significant adjustment, but implements content-aware comment porting for line changes.

Specifically, this moves lines around to adjust their position considering added and removed lines between the diffs and across rebases.

It does not try to do any actual content (line against line) matching.

Test Plan
  • Unit tests.
  • Poking around in the web UI seems to generate mostly reasonable-ish results?
  • This may be a huge step backward in some cases that I just haven't hit.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision from to Implement rough content-aware inline adjustment rules for ghosts.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: btrahan.
src/applications/differential/render/DifferentialChangesetTwoUpRenderer.php
281

This is the fix for T7600, which required inlines with the same text (often synthetic lint messages) in order to trigger.

The major case I'm aware of that we don't do a great job of right now is when you make a comment in a block of new text and there are rebase changes. For example, suppose we have these lines:

  1 a
+ 2 b
+ 3 c
+ 4 d
  5 e

...and you leave a comment on line 3, "c".

We'll map it backward to the rebase, but since none of those lines existed in the old file:

1 a
2 e

...it maps to the range (1, 2), which seems correct. We'll then bring it across the rebase and push it forward through the new map.

At this point, we'll map 1 > 1 and 2 > 5, so the final range will be (2, 5), meaning the overall mapping was (3, 3) -> (2, 5). Although this makes sense given the algorithm, it expands 1-line inlines to cover a whole block.

Depending on what else crops up, using text-aware analysis to refine the (2, 5) range (which is a reasonable superset of the "best place" to put the comment) into a tighter range might be reasonable.

I thought of a couple cases I want to test more thoroughly.

  • Do a better job with comments inside large blocks of new code.

Overall, this is building "line adjustment mappings" across hunks. For example, for this diff:

  a
  b
  c
+ d
+ e
+ f
  g
  h
  i

...we get these lines in the left and right files:

1 a     1 a
2 b     2 b
3 c     3 c
4 g     4 d
5 h     5 e
6 i     6 f
        7 g
        8 h
        9 i

We compute a mapping based on the lines:

1 => 1
2 => 2
3 => 3, 4, 5, 6
4 => 7
5 => 8
6 => 9

In the simple case, like a comment on line 5 of the left side of the diff, we can just follow the map and put it on line 8 on the right side of the diff. This is correct (lines 5 and 8 are both "h").

Things get trickier when we're trying to map a line like line 3. To map 3, first I'm going to simplify the map: when we map to multiple lines (like 3 => 3, 4, 5, 6) the lines are always contiguous, so we can express that as "3-6" or just "3, 6":

1 => 1
2 => 2
3 => 3, 6
4 => 7
5 => 8
6 => 9

Since inline comments are really line ranges (an inline comment might be on lines 5-10; if it's only on one line, we say it's on "3-3"), we map this by taking the "top" and "bottom" of the map. For the "top", we take the first line of the inline and always choose the earliest value if there are multiple mappings. For the bottom, we take the last line and always choose the last value.

So we'd map a comment on lines 3-5 (on the left) to lines 3-8 (on the right). We'd map a comment on lines 3-3 to lines 3-6.

To actually move real inline comments, it's not sufficient to just map things once. There are two possible sources of changes:

  • the actual difference between the files;
  • intermediate rebases between the updates.

To account for both, we perform a chain of up to three remappings:

+----------+     +----------+
| Old File |     | New File |
|          | <<< |          |
|  Diff 1  |     |  Diff 1  |
+----------+     +----------+
     V
     V
     V
+----------+     +----------+
| Old File |     | New File |
|          | >>> |          |
|  Diff 2  |     |  Diff 2  |
+----------+     +----------+

First, the line is mapped back to the old file if it isn't already on the old file. Then, it's mapped across the rebase, and finally forward to the new file (it's possible that we'd be better off building a synthetic diff of new vs new and mapping directly, but this seemed more likely to work well to me).

In the case of comments on a block of new code, we end up with a wide range after these mappings, so there's some sketchy magic to try to store the offset across these changes.

This doesn't look at any of the text in the actual diff. This is in contrast to GitHub's approach, which only uses the text in the diff. Using the text in the diff is easier, but I want to try this first, because I think it may produce much better results when the text changes a lot but the comment is still related. Basically, this attempts to correct major errors due to lines being added or removed above or below a hunk, which I think are the most important to correct.

btrahan awarded a token.
btrahan edited edge metadata.
This revision is now accepted and ready to land.May 7 2015, 9:09 PM
This revision was automatically updated to reflect the committed changes.