Improve prose diff smoothing rules for whitespace and prefix/suffix changes
Closed (Public)

Authored by epriestley on Jun 7 2016, 7:15 PM.

Details

Reviewers

chad

Maniphest Tasks

T7643: Improve prose diffs (was: description changes don't generate usable diffs)

Commits

rPHU6d1eea50fb9a: Improve prose diff smoothing rules for whitespace and prefix/suffix changes

Summary

Ref T7643. In D11297, I rewrote the test plan, but the algorithm chose to match up shared spaces and produced a silly diff that a human would never produce:

[Screenshot: Screen Shot 2016-06-07 at 12.05.01 PM.png]

This diff is technically correct, but not particularly readable.

To improve this, first allow noisy changes to be smoothed at the beginning and end of runs, not just in the middle. Part of the problem was that "apple" and "banana" (each followed by a space) were being diffed as "xxxxxs" or similar, because the spaces were removed early in the process and never smoothed. Pad the string before smoothing, and allow strings like "...xxxxs" to be smoothed into "...xxxxx".
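A minimal Python sketch of edge smoothing. It assumes "x" marks a changed character and "s" an unchanged one, and uses a hypothetical threshold `min_same`; `smooth_mask` is an illustrative name, not a libphutil function, and the padding step is not modeled:

```python
import re


def smooth_mask(mask: str, min_same: int = 3) -> str:
    """Absorb short unchanged runs into neighbouring changed runs.

    Hypothetical sketch: "x" marks a changed character, "s" an unchanged
    one, and min_same is an illustrative threshold.
    """
    runs = [m.group(0) for m in re.finditer(r"s+|x+", mask)]
    if len(runs) < 2:
        # All unchanged or all changed: nothing to smooth.
        return mask
    smoothed = []
    for run in runs:
        # Runs alternate, so every "s" run here touches at least one "x" run,
        # whether it sits in the middle or at either end of the mask.
        if run[0] == "s" and len(run) < min_same:
            run = "x" * len(run)
        smoothed.append(run)
    return "".join(smoothed)
```

Because runs alternate, any "s" run that is not the whole mask borders an "x" run, so a single length check covers the old middle case and both new edge cases: `smooth_mask("xxxxs")` yields `"xxxxx"`.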

Second, when merging runs of "-" and "+", humans would apply different rules depending on the content of the removed and added text. For example, if "elephants" is changed to "cats", it's easier for humans to read this:

- elephants
+ cats

...than this:

- elephan
+ ca
= ts

This is basically the smoothing rule we already apply. However, if the suffix isn't letters like "ts" but something like ". " (period, space), humans would prefer this:

- in the past
+ once upon a time
= .<space>

So when we merge runs of changes, find common "layout" prefixes and suffixes and merge them as "=" blocks. This eliminates cases where the smoothing rule smooths things out more than a human editor would.
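A minimal Python sketch of the affix rule. It assumes "layout" means any character that is not a letter or digit (the exact character class isn't specified here), and `split_layout_affixes` / `merge_change` are hypothetical names, not libphutil APIs:

```python
from typing import List, Tuple


def is_layout(ch: str) -> bool:
    # Assumption: any character that is not a letter or digit counts as "layout".
    return not ch.isalnum()


def split_layout_affixes(old: str, new: str) -> Tuple[str, str, str, str]:
    """Split off the longest common prefix and suffix made only of layout
    characters. Returns (prefix, old_core, new_core, suffix)."""
    limit = min(len(old), len(new))

    # Common leading layout characters.
    p = 0
    while p < limit and old[p] == new[p] and is_layout(old[p]):
        p += 1

    # Common trailing layout characters, never overlapping the prefix.
    s = 0
    while s < limit - p and old[-1 - s] == new[-1 - s] and is_layout(old[-1 - s]):
        s += 1

    prefix = old[:p]
    suffix = old[len(old) - s:]
    return prefix, old[p:len(old) - s], new[p:len(new) - s], suffix


def merge_change(old: str, new: str) -> List[Tuple[str, str]]:
    """Turn one removed/added pair into "=", "-", "+" blocks (hypothetical)."""
    prefix, old_core, new_core, suffix = split_layout_affixes(old, new)
    blocks = []
    if prefix:
        blocks.append(("=", prefix))
    if old_core:
        blocks.append(("-", old_core))
    if new_core:
        blocks.append(("+", new_core))
    if suffix:
        blocks.append(("=", suffix))
    return blocks
```

With this rule, "elephants" to "cats" stays a single "-"/"+" pair because "ts" is made of letters, while "in the past. " to "once upon a time. " keeps ". " as a trailing "=" block.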

Test Plan

Ran unit tests. Generated a similar local change. Before, got this "correct" mishmash which a human would not produce: