HomePhabricator

Improve prose diff smoothing rules for whitespace and prefix/suffix changes

Tags
None
Referenced Files
F1679109: Screen Shot 2016-06-07 at 12.02.44 PM.png
Jun 7 2016, 7:58 PM
F1679108: Screen Shot 2016-06-07 at 12.02.33 PM.png
Jun 7 2016, 7:58 PM
F1679097: Screen Shot 2016-06-07 at 12.05.01 PM.png
Jun 7 2016, 7:58 PM
Subscribers
None

Description

Improve prose diff smoothing rules for whitespace and prefix/suffix changes

Summary:
Ref T7643. In D11297, I rewote the test plan but the algorithm chose to share spaces and produce a silly diff which a human would not produce:

Screen Shot 2016-06-07 at 12.05.01 PM.png (318×625 px, 38 KB)

This diff is technically correct, but not particularly readable.

To improve this, first allow noisy changes to be smoothed at the beginning and end of runs, not just in the middle. Part of the problem was that apple and banana (both with spaces after them) were being diffed as "xxxxxs" or similar, since the spaces were removed early in the process and not smoothed. Pad the string before smoothing, and allow strings like "...xxxxs" to be smoothed into "...xxxxx".

Second when merging runs of "-" and "+", humans would apply different rules depending on the content of the added and removed text. For example, if "elephants" is changed to "cats", it's easier for humans to read this:

- elephants
+ cats

..than this:

- elephan
+ ca
= ts

This is basically the smoothing rule we already apply. However, if the suffix isn't letters like ts but something like . (period, space), humans would prefer this:

- in the past
+ once upon a time
= .<space>

So when we merge runs of changes, find common "layout" prefixes and suffixes and merge them as "=" blocks. This eliminates cases where the smoothing rule smooths things out more than a human editor would.

Test Plan:
Ran unit tests. Generated a similar local change. Before, got this "correct" mishmash which a human would not produce:

Screen Shot 2016-06-07 at 12.02.33 PM.png (293×652 px, 24 KB)

After, got this better version which is what a human editor would do with this:

Screen Shot 2016-06-07 at 12.02.44 PM.png (325×631 px, 24 KB)

Reviewers: chad

Reviewed By: chad

Subscribers: asherkin

Maniphest Tasks: T7643

Differential Revision: https://secure.phabricator.com/D16071