Make prose diff algorithm more iterative, to improve prose diffs for (among other things) removed commas
Summary:
Ref T7643. This is a little hard to explain, but previously we would do this:
- Diff paragraphs.
- For each different paragraph, diff sentences.
- For each different sentence, diff characters.
Now, we do this:
- Diff paragraphs.
- Collect all the identical, purely added, and purely removed paragraphs and set them aside. We know we should have good diffs for these already.
- What's left over are sequences of removed/added/changed paragraphs, which we may not have great diffs for yet. Smush each sequence together into one big diff block.
- Now, for these blocks, diff sentences.
- Repeat all of that to diff characters.
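The steps above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the code in this change: the `LEVELS` splitters and the `prose_diff` function are hypothetical names, the splitting rules are simplified assumptions, and Python's `difflib.SequenceMatcher` stands in for the real diff engine.

```python
import re
from difflib import SequenceMatcher

# Hypothetical splitters, coarsest first. Each one cuts the text into
# pieces that concatenate back to the original string.
LEVELS = [
    lambda s: re.findall(r".+?(?:\n\n+|\Z)", s, re.S),      # paragraphs
    lambda s: re.findall(r".+?(?:[.!?]+\s+|\Z)", s, re.S),  # sentences
    list,                                                   # characters
]


def prose_diff(old, new, level=0):
    """Return a list of (op, text) pairs, where op is '=', '-' or '+'."""
    a, b = LEVELS[level](old), LEVELS[level](new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        o, n = "".join(a[i1:i2]), "".join(b[j1:j2])
        if tag == "equal":
            out.append(("=", o))      # identical: set aside
        elif tag == "delete":
            out.append(("-", o))      # purely removed: set aside
        elif tag == "insert":
            out.append(("+", n))      # purely added: set aside
        elif level + 1 < len(LEVELS):
            # SequenceMatcher reports each run of changed pieces as one
            # "replace" opcode, which plays the role of the smushed
            # block. Re-diff that block at the next finer level.
            out.extend(prose_diff(o, n, level + 1))
        else:
            # Finest level reached: emit the change directly.
            out.extend([("-", o), ("+", n)])
    return out


result = prose_diff("Hello, world. Bye.", "Hello world. Bye.")
```

In this sketch, the removed comma in the example input is reported as a lone `("-", ",")` entry, and the rest of the text is marked unchanged.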
This passes all the existing unit tests, and also passes new unit tests which I was previously unable to get passing by fiddling with things without changing the algorithm.
Test Plan: Passed existing unit tests. Passed new unit tests.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7643
Differential Revision: https://secure.phabricator.com/D16839