Page MenuHomePhabricator

Intraline diff bailout at 80 characters is inflexible and undesirable
Closed, ResolvedPublic

Description

Compare lines 629 and 681 in the attached screenshot.

Event Timeline

MarkoTintor attached 1 file(s): Restricted File.
MarkoTintor added a subscriber: MarkoTintor.

The intraline diff algorithm is O(n^2) (I believe this can't be improved much) and runs in PHP which gives it a huge runtime constant (this could be improved, but at the cost of making installation much more complicated). To avoid skyrocketing performance costs, we bail out rather than computing intraline differences if the changed span is too large. Specifically, the algorithm:

  • Takes the old and new strings, A and B;
  • computes the length of the shared prefix (if any) and shared suffix (if any); and
  • if the remaining string length between the prefix and suffix is more than 80 characters, bails out and marks the whole line as changed.

The bailout code is here:

https://secure.phabricator.com/diffusion/ARC/browse/master/src/difference/ArcanistDiffUtils.php;d23d2df7e1922b32$245

So this is currently working as intended. We could probably make the cutoff configurable, though, and possibly port it to C some day...

epriestley renamed this task from Sometimes the entire line is marked as different instead of just part of it to Intraline diff bailout at 80 characters is inflexible and undesirable.May 20 2012, 8:48 PM
epriestley triaged this task as Wishlist priority.
epriestley added a subscriber: epriestley.
btrahan added a project: Differential.
chad changed the visibility from "All Users" to "Public (No Login Required)".Jul 3 2015, 4:36 AM
epriestley added a comment.EditedMay 9 2018, 8:37 PM

D19442 is marked as fixing this, although it just raises the limit from 80 to 100. This span measures the number of characters between the first changed character and the last changed character (not the total number of characters on the line) and we haven't seen tons of interest in this over the last several years so I think few installs are running into issues with it even at 80. Bumping it to 100 should give us a bit more breathing room but I don't currently plan to make it configurable (see T8227).

Self-hosted installs are free to change 100 to some other number, just be aware that the cost rises with O(N^2).

Some day, we will likely port this to C (see T2312) and be able to raise the limit to some much larger value (perhaps 512 or similar) if you install the companion extension.