Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes
Open, Normal, Public

Description

See PHI1934. An active install reports imminent ID space exhaustion of the revision_fngrams and commit_fngrams tables.

When Phabricator indexes a Ferret document, it completely deletes the old document and then inserts an entirely new document. This is simple to implement, but if a document has, say, 10K ngrams, each reindex consumes 10K AUTO_INCREMENT ID slots. Under reasonable use, this may eventually reach the maximum value of an unsigned 32-bit AUTO_INCREMENT ID, about 4.29 billion (2^32 - 1).
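
To put the numbers above in perspective, here is a rough back-of-the-envelope estimate in Python. It assumes an unsigned 32-bit ID column and takes the 10K-ngram document from the example above; the function name is made up for illustration and is not part of Phabricator.

```python
# Rough estimate only. Assumes an unsigned 32-bit AUTO_INCREMENT column
# and that every reindex deletes and reinserts all ngram rows.
MAX_UINT32_ID = 2**32 - 1  # 4,294,967,295


def reindexes_until_exhaustion(ngrams_per_reindex: int,
                               max_id: int = MAX_UINT32_ID) -> int:
    """Number of full reindexes that fit in the ID space (hypothetical helper)."""
    return max_id // ngrams_per_reindex


# 10K ngrams per reindex is the illustrative figure from the description.
print(reindexes_until_exhaustion(10_000))  # 429496
```

That budget is shared by every document stored in the same table, so an install with many large documents that are reindexed often can plausibly get there.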

The ID is not meaningful and not referenced elsewhere: it just has to be unique, and is only used to make table manipulation easier. Possible solutions include:

  1. Make the column 64-bit.
  2. Remove the column entirely and rewrite any code which uses it (this isn't much code, and may be no code at all).
  3. As an immediate remedy, defragment the table. (I wrote a script for this in PHI1934.)
  4. Change the reindex logic to selectively insert/delete instead of just throwing everything out.

I'm inclined to pursue (4) here since I think it's likely fairly simple and has the fewest entanglements with other things.
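
A minimal sketch of what (4) could look like, written in Python for illustration rather than as the actual Ferret implementation. It assumes a document's ngrams can be treated as a set of strings; real rows also carry a document ID, and the function name and sample ngrams are hypothetical.

```python
from typing import Iterable, List, Tuple


def plan_selective_reindex(old_ngrams: Iterable[str],
                           new_ngrams: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (to_insert, to_delete) so unchanged ngram rows are left alone.

    Hypothetical helper: only rows in to_insert would consume new
    AUTO_INCREMENT IDs, instead of one ID per ngram on every reindex.
    """
    old = set(old_ngrams)
    new = set(new_ngrams)
    to_insert = sorted(new - old)  # ngrams that appeared since the last index
    to_delete = sorted(old - new)  # ngrams that no longer occur
    return to_insert, to_delete


# Illustrative data: one ngram changed, two stayed the same.
to_insert, to_delete = plan_selective_reindex(
    ["the", "he ", "qui"],
    ["the", "he ", "slo"],
)
print(to_insert)  # ['slo']
print(to_delete)  # ['qui']
```

With this approach, a reindex after a small edit consumes IDs in proportion to the size of the change rather than the size of the document.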