Page MenuHomePhabricator

Improve prose diff smoothing rules for whitespace and prefix/suffix changes

Authored by epriestley on Jun 7 2016, 7:15 PM.
Referenced Files
Sat, Sep 24, 1:07 PM
Unknown Object (File)
Sat, Sep 17, 1:52 AM
Unknown Object (File)
Fri, Sep 16, 8:10 AM
Unknown Object (File)
Sun, Sep 11, 7:20 AM
Unknown Object (File)
Thu, Sep 8, 11:59 AM
Unknown Object (File)
Mon, Sep 5, 5:14 PM
Unknown Object (File)
Mon, Sep 5, 1:38 PM
Unknown Object (File)
Sat, Sep 3, 4:40 PM



Ref T7643. In D11297, I rewote the test plan but the algorithm chose to share spaces and produce a silly diff which a human would not produce:

Screen Shot 2016-06-07 at 12.05.01 PM.png (318×625 px, 38 KB)

This diff is technically correct, but not particularly readable.

To improve this, first allow noisy changes to be smoothed at the beginning and end of runs, not just in the middle. Part of the problem was that apple and banana (both with spaces after them) were being diffed as "xxxxxs" or similar, since the spaces were removed early in the process and not smoothed. Pad the string before smoothing, and allow strings like "...xxxxs" to be smoothed into "...xxxxx".

Second when merging runs of "-" and "+", humans would apply different rules depending on the content of the added and removed text. For example, if "elephants" is changed to "cats", it's easier for humans to read this:

- elephants
+ cats

..than this:

- elephan
+ ca
= ts

This is basically the smoothing rule we already apply. However, if the suffix isn't letters like ts but something like . (period, space), humans would prefer this:

- in the past
+ once upon a time
= .<space>

So when we merge runs of changes, find common "layout" prefixes and suffixes and merge them as "=" blocks. This eliminates cases where the smoothing rule smooths things out more than a human editor would.

Test Plan

Ran unit tests. Generated a similar local change. Before, got this "correct" mishmash which a human would not produce:

Screen Shot 2016-06-07 at 12.02.33 PM.png (293×652 px, 24 KB)

After, got this better version which is what a human editor would do with this:

Screen Shot 2016-06-07 at 12.02.44 PM.png (325×631 px, 24 KB)

Diff Detail

rPHU libphutil
Lint Not Applicable
Tests Not Applicable

Event Timeline

epriestley retitled this revision from to Improve prose diff smoothing rules for whitespace and prefix/suffix changes.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: chad.
epriestley added a subscriber: asherkin.
chad edited edge metadata.
This revision is now accepted and ready to land.Jun 7 2016, 7:52 PM
This revision was automatically updated to reflect the committed changes.