Page MenuHomePhabricator

Improve prose diff smoothing rules for whitespace and prefix/suffix changes
ClosedPublic

Authored by epriestley on Jun 7 2016, 7:15 PM.
Tags
None
Referenced Files
F14110673: D16071.id38671.diff
Wed, Nov 27, 6:31 PM
F14107592: D16071.id38672.diff
Wed, Nov 27, 8:35 AM
Unknown Object (File)
Sun, Nov 24, 12:56 AM
Unknown Object (File)
Thu, Nov 7, 6:07 PM
Unknown Object (File)
Oct 20 2024, 2:22 PM
Unknown Object (File)
Sep 21 2024, 7:15 AM
Unknown Object (File)
Sep 20 2024, 1:43 AM
Unknown Object (File)
Sep 12 2024, 7:57 PM
Subscribers

Details

Summary

Ref T7643. In D11297, I rewote the test plan but the algorithm chose to share spaces and produce a silly diff which a human would not produce:

Screen Shot 2016-06-07 at 12.05.01 PM.png (318×625 px, 38 KB)

This diff is technically correct, but not particularly readable.

To improve this, first allow noisy changes to be smoothed at the beginning and end of runs, not just in the middle. Part of the problem was that apple and banana (both with spaces after them) were being diffed as "xxxxxs" or similar, since the spaces were removed early in the process and not smoothed. Pad the string before smoothing, and allow strings like "...xxxxs" to be smoothed into "...xxxxx".

Second when merging runs of "-" and "+", humans would apply different rules depending on the content of the added and removed text. For example, if "elephants" is changed to "cats", it's easier for humans to read this:

- elephants
+ cats

..than this:

- elephan
+ ca
= ts

This is basically the smoothing rule we already apply. However, if the suffix isn't letters like ts but something like . (period, space), humans would prefer this:

- in the past
+ once upon a time
= .<space>

So when we merge runs of changes, find common "layout" prefixes and suffixes and merge them as "=" blocks. This eliminates cases where the smoothing rule smooths things out more than a human editor would.

Test Plan

Ran unit tests. Generated a similar local change. Before, got this "correct" mishmash which a human would not produce:

Screen Shot 2016-06-07 at 12.02.33 PM.png (293×652 px, 24 KB)

After, got this better version which is what a human editor would do with this:

Screen Shot 2016-06-07 at 12.02.44 PM.png (325×631 px, 24 KB)

Diff Detail

Repository
rPHU libphutil
Branch
prose9
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 12543
Build 15911: Run Core Tests
Build 15910: arc lint + arc unit

Event Timeline

epriestley retitled this revision from to Improve prose diff smoothing rules for whitespace and prefix/suffix changes.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: chad.
epriestley added a subscriber: asherkin.
chad edited edge metadata.
This revision is now accepted and ready to land.Jun 7 2016, 7:52 PM
This revision was automatically updated to reflect the committed changes.