Description

Make prose diff algorithm more iterative, to improve prose diffs for (among other things) removed commas

Summary:
Ref T7643. This is a little hard to explain, but previously we would do this:

  • Diff paragraphs.
  • For each different paragraph, diff sentences.
  • For each different sentence, diff characters.

Now, we do this:

  • Diff paragraphs.
  • Collect all the identical, purely added, and purely removed paragraphs and set them aside. We know we should have good diffs for these already.
  • What's left over are sequences of removed/added/changed paragraphs, which we may not have great diffs for yet. Smush each of these sequences together into one big diff block.
  • Now, for these blocks, diff sentences.
  • Repeat all of that to diff characters.
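
The steps above can be sketched roughly as follows. This is an illustration only, not the code from this revision: it is written in Python, it leans on the standard library's `difflib.SequenceMatcher` for the underlying sequence diff, and the names `prose_diff`, `merge`, `SPLITTERS`, and the three `split_*` helpers are hypothetical. The splitting rules (blank lines for paragraphs, `.`, `!`, `?` for sentences) are simplifying assumptions.

```python
import re
from difflib import SequenceMatcher


def split_paragraphs(text):
    # Assumption: paragraphs end at one or more blank lines. Separators stay
    # attached so that joining the pieces reproduces the input exactly.
    return re.findall(r".+?(?:\n\n+|\Z)", text, flags=re.S)


def split_sentences(text):
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    return re.findall(r".+?(?:[.!?]+\s+|\Z)", text, flags=re.S)


def split_characters(text):
    return list(text)


# Coarse to fine: each changed block is re-diffed at the next level.
SPLITTERS = [split_paragraphs, split_sentences, split_characters]


def merge(pieces):
    # Join adjacent pieces that carry the same operation.
    merged = []
    for op, text in pieces:
        if merged and merged[-1][0] == op:
            merged[-1] = (op, merged[-1][1] + text)
        else:
            merged.append((op, text))
    return merged


def prose_diff(old, new, level=0):
    # Returns a list of (op, text) pairs, where op is "=", "-" or "+".
    a = SPLITTERS[level](old)
    b = SPLITTERS[level](new)
    result = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_part = "".join(a[i1:i2])
        new_part = "".join(b[j1:j2])
        if tag == "equal":
            result.append(("=", old_part))      # identical: set aside
        elif tag == "delete":
            result.append(("-", old_part))      # purely removed: set aside
        elif tag == "insert":
            result.append(("+", new_part))      # purely added: set aside
        elif level + 1 < len(SPLITTERS):
            # A run of changed pieces, smushed into one block and
            # re-diffed at the next, finer level.
            result.extend(prose_diff(old_part, new_part, level + 1))
        else:
            # Finest level: nothing left to refine.
            result.append(("-", old_part))
            result.append(("+", new_part))
    return merge(result)


print(prose_diff("Hello, world. Bye.", "Hello world. Bye."))
```

In this sketch, `SequenceMatcher` already reports a run of adjacent removed and added pieces as a single `replace` opcode, which plays the role of the "big diff block" in the description. For the removed-comma input shown, the only change reported is the deleted `,`.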

This seems to pass all the existing unit tests, and it also passes new unit tests that I previously could not get passing by fiddling with things without changing the algorithm.

Test Plan: Passed existing unit tests. Passed new unit tests.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T7643

Differential Revision: https://secure.phabricator.com/D16839