
Raise the effective corpus size limit in "PhutilProseDifferenceEngine" by using "diff" for coarse passes
Closed, Resolved (Public)

Description

See PHI1408. An install has a large Phriction document that splits into about 3,000 blocks in the outermost pass of PhutilProseDifferenceEngine. This exceeds the effective limit of 128 blocks between the outermost changes in PhutilEditDistanceMatrix.

We can remedy this by doing coarse prose difference passes with diff instead of with PhutilEditDistanceMatrix. Specifically:

  • Split the corpus into blocks.
  • Hash each block, possibly normalizing it first.
  • Generate an edit string by using diff on the hashed intermediates.
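
The three steps above can be sketched roughly as follows. This is an illustration in Python, not the PHP change itself: it uses Python's difflib.SequenceMatcher as a stand-in for the diff binary, and the helper names (split_blocks, hash_block, coarse_edit_string), the blank-line block splitting, the whitespace normalization, and the "s"/"d"/"i" edit-string alphabet are all assumptions made for the example.

```python
import difflib
import hashlib


def split_blocks(corpus):
    """Split a corpus into coarse blocks on blank lines (assumed rule)."""
    return [block for block in corpus.split("\n\n") if block.strip()]


def hash_block(block):
    """Collapse runs of whitespace, then hash the normalized block."""
    normalized = " ".join(block.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def coarse_edit_string(old, new):
    """Build an edit string over blocks: s = same, d = delete, i = insert."""
    old_hashes = [hash_block(b) for b in split_blocks(old)]
    new_hashes = [hash_block(b) for b in split_blocks(new)]

    # Compare the hash sequences rather than the block text. difflib
    # stands in here for running diff on the hashed intermediates.
    matcher = difflib.SequenceMatcher(
        a=old_hashes, b=new_hashes, autojunk=False)

    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append("s" * (i2 - i1))
        else:
            # "replace", "delete" and "insert" all reduce to some number
            # of deleted old blocks followed by inserted new blocks.
            parts.append("d" * (i2 - i1))
            parts.append("i" * (j2 - j1))
    return "".join(parts)
```

Because only fixed-size hashes are compared, the cost of the coarse pass depends on the number of blocks and not on their length. Blocks marked as changed can then be handed to a finer pass.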

This should scale to any realistic block count (e.g., ~100K).