
Improve prose diffs for changes spanning very large blocks of intermediate text
ClosedPublic

Authored by epriestley on Nov 16 2016, 6:05 PM.
Tags
None
Subscribers
None

Details

Summary

Ref T7643. The failure case described in T7643#200778 is a change, followed by more than 128 sentences, followed by another change.

Because the coarsest level is "split on sentences", this input hits the maximum length guards and the engine just gives up, marking the whole diff as changed.

Add a new level 0 for splitting on paragraphs. This allows us to accommodate a greater range of reasonable input texts.

This will still fail for a change, followed by more than 128 paragraphs, followed by another change. But hopefully that's outside the realm of cases which we reasonably need to handle.

(Because a "paragraph" here is "text between newlines", some types of text may have a lot of "paragraphs" and we may need to continue tweaking this: for example, remarkup tables or inline code blocks.)
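The level scheme described above can be sketched roughly as follows. This is an illustrative Python sketch, not the libphutil implementation: the names `diff_prose`, `split_paragraphs`, `split_sentences` and `MAX_PARTS` are hypothetical, the limit of 128 is taken from the figure quoted in the summary, and the exact way the real engine trims and applies its guards is an assumption. It uses Python's `difflib.SequenceMatcher` as a stand-in for the underlying diff.

```python
import re
from difflib import SequenceMatcher

MAX_PARTS = 128  # assumed guard; the summary only says "more than 128"


def split_paragraphs(text):
    # Hypothetical "level 0": a paragraph is text between newlines.
    # Newlines stay attached so joining the parts reproduces the text.
    return re.findall(r"[^\n]*\n+|[^\n]+", text)


def split_sentences(text):
    # Hypothetical "level 1": split after ., ! or ? followed by whitespace.
    return re.findall(r".+?(?:[.!?]+\s+|\Z)", text, flags=re.S)


def changed(old, new):
    # Mark a whole span as removed and/or added.
    out = []
    if old:
        out.append(("-", old))
    if new:
        out.append(("+", new))
    return out


def diff_prose(old, new, splitters, limit=MAX_PARTS):
    """Diff at the coarsest level first, refining only the changed spans."""
    if old == new:
        return [("=", old)] if old else []
    if not splitters:
        return changed(old, new)
    split, finer = splitters[0], splitters[1:]
    a, b = split(old), split(new)

    # Trim the common prefix and suffix before applying the guard.
    p = 0
    while p < len(a) and p < len(b) and a[p] == b[p]:
        p += 1
    s = 0
    while s < len(a) - p and s < len(b) - p and a[-1 - s] == b[-1 - s]:
        s += 1
    mid_a, mid_b = a[p:len(a) - s], b[p:len(b) - s]

    out = []
    if p:
        out.append(("=", "".join(a[:p])))
    if len(mid_a) > limit or len(mid_b) > limit:
        # Guard hit: give up and mark the whole middle as changed.
        out += changed("".join(mid_a), "".join(mid_b))
    else:
        matcher = SequenceMatcher(None, mid_a, mid_b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            old_span = "".join(mid_a[i1:i2])
            new_span = "".join(mid_b[j1:j2])
            if tag == "equal":
                out.append(("=", old_span))
            else:
                out += diff_prose(old_span, new_span, finer, limit)
    if s:
        out.append(("=", "".join(a[len(a) - s:])))
    return out


# A change, then one paragraph of 200 sentences, then another change.
middle = " ".join(f"S{i}." for i in range(200)) + "\n"
old = "Alpha one.\n" + middle + "Omega one.\n"
new = "Alpha two.\n" + middle + "Omega two.\n"

coarse = diff_prose(old, new, [split_sentences])
fine = diff_prose(old, new, [split_paragraphs, split_sentences])
print([op for op, _ in coarse])
print([op for op, _ in fine])
```

With sentences as the coarsest level, the 202 sentences between the two edits exceed the guard and everything is reported as changed. With a paragraph level in front, the long middle paragraph matches as a single unit and only the first and last paragraphs are refined further. The same sketch would still give up if more than 128 paragraphs separated the two changes.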

Also, reduce the amount of work we do after hitting an internal limit.

Test Plan

Added failing unit test; made it pass.

Diff Detail

Repository
rPHU libphutil
Branch
prose1
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 14530
Build 18950: Run Core Tests
Build 18949: arc lint + arc unit

Event Timeline

epriestley retitled this revision to Improve prose diffs for changes spanning very large blocks of intermediate text.
epriestley updated this object.
epriestley edited the test plan for this revision.
epriestley added a reviewer: chad.
chad edited edge metadata.
This revision is now accepted and ready to land. Nov 16 2016, 6:07 PM
This revision was automatically updated to reflect the committed changes.