Make prose diff algorithm more iterative, to improve prose diffs for (among…

Authored by epriestley on Nov 10 2016, 8:21 PM.

Description

Make prose diff algorithm more iterative, to improve prose diffs for (among other things) removed commas

Summary:
Ref T7643. This is a little hard to explain, but previously we would do this:

  • Diff paragraphs.
  • For each different paragraph, diff sentences.
  • For each different sentence, diff characters.

Now, we do this:

  • Diff paragraphs.
  • Collect all the identical, purely added, and purely removed paragraphs and set them aside. We know we should have good diffs for these already.
  • What's left over are sequences of removed/added/changed paragraphs, which we may not have great diffs for yet. Smush these together into big diff blocks.
  • Now, for these blocks, diff sentences.
  • Repeat all of that to diff characters.
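
The steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual Phabricator implementation (which is PHP): the names `split_paragraphs`, `split_sentences`, `split_characters`, `diff_pieces`, and `prose_diff` are hypothetical, and the splitting rules are simplified assumptions. It leans on `difflib.SequenceMatcher`, whose "replace" opcode already groups adjacent removed and added pieces into one block, standing in for the "smush" step.

```python
import difflib
import re

# Hypothetical splitters, ordered coarse to fine. Separators are kept as
# their own pieces so that joining the pieces reproduces the input exactly.
def split_paragraphs(text):
    return [p for p in re.split(r"(\n{2,})", text) if p]

def split_sentences(text):
    return [s for s in re.split(r"(?<=[.!?])(\s+)", text) if s]

def split_characters(text):
    return list(text)

LEVELS = [split_paragraphs, split_sentences, split_characters]

def diff_pieces(old, new, split):
    """Diff at one level; return a list of (op, old_text, new_text)."""
    a, b = split(old), split(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    ops = {"equal": "=", "delete": "-", "insert": "+", "replace": "x"}
    return [
        (ops[tag], "".join(a[i1:i2]), "".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]

def prose_diff(old, new, level=0):
    """Return a list of (op, text) pairs, where op is '=', '-' or '+'."""
    result = []
    for op, old_part, new_part in diff_pieces(old, new, LEVELS[level]):
        if op != "x":
            # Identical, purely removed, or purely added: set aside as-is.
            result.append((op, new_part if op == "+" else old_part))
        elif level + 1 < len(LEVELS):
            # A mixed block: repeat the whole process at the next finer level.
            result.extend(prose_diff(old_part, new_part, level + 1))
        else:
            # Finest level reached: emit the block as a removal plus an addition.
            result.append(("-", old_part))
            result.append(("+", new_part))
    return result

if __name__ == "__main__":
    print(prose_diff("Hello, world. Goodbye.", "Hello world. Goodbye."))
```

For the removed-comma input in the last line, the only non-equal piece this sketch reports is the removed `,`. The unchanged sentence is set aside at the sentence level and never reaches the character diff.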

This seems to pass all the existing unit tests, and also passes new unit tests that I was previously unable to get passing by fiddling with things without changing the algorithm.

Test Plan: Passed existing unit tests. Passed new unit tests.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T7643

Differential Revision: https://secure.phabricator.com/D16839