When updating a Ferret search index document, reuse existing rows where possible
ClosedPublic

Authored by epriestley on Nov 19 2020, 9:36 PM.
Tags
None
Subscribers
None

Details

Summary

Ref T13587. Currently, when a document is reindexed by Ferret, the old document is completely discarded and a new version is inserted to replace it.

This approach is simple to implement, but can lead to exhaustion of the ngram AUTO_INCREMENT id column in reasonable circumstances.
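To make the exhaustion concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a 32-bit unsigned id column and a hypothetical average of 1,000 ngram rows per document; neither figure is taken from this revision, and `reindexes_until_exhaustion` is an illustrative name only.

```python
# Assumption: the AUTO_INCREMENT id column is a 32-bit unsigned integer.
MAX_ID = 2**32 - 1


def reindexes_until_exhaustion(ngrams_per_document: int) -> int:
    """Count how many discard-and-reinsert reindexes fit in the id space
    when every reindex consumes a fresh id for each ngram row."""
    return MAX_ID // ngrams_per_document


# Hypothetical workload: documents averaging 1,000 ngram rows each.
print(reindexes_until_exhaustion(1000))  # prints 4294967
```

Under these assumptions the id space covers only a few million reindex operations in total across all documents, regardless of how many rows are live at any one time.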

Conceptually, this approach "should" be fine, and the exhaustion is an awkward implementation detail. However, it is easy to be less wasteful when performing document updates, and the other approaches are all awkward or leaky in ways that are probably worse. So, use a more complex implementation that avoids executing unnecessary INSERT statements.
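The idea of reusing existing rows can be sketched as a set difference between the ngrams already stored and the ngrams of the new document version. The Python below is a minimal illustration under stated assumptions, not the code in this change: `trigrams`, `plan_ngram_update`, and the `ngram -> row id` mapping are hypothetical names and shapes chosen for the example.

```python
def trigrams(text: str) -> set:
    """Split text into a set of 3-character ngrams (hypothetical tokenizer)."""
    padded = f" {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def plan_ngram_update(old_rows: dict, new_ngrams: set) -> tuple:
    """Plan the minimal write set for a reindex.

    old_rows maps each stored ngram to its existing row id.
    Returns (row ids to delete, ngrams to insert); rows for ngrams present
    in both versions are left alone, so they consume no new ids.
    """
    old_ngrams = set(old_rows)
    delete_ids = sorted(old_rows[n] for n in old_ngrams - new_ngrams)
    insert_ngrams = sorted(new_ngrams - old_ngrams)
    return delete_ids, insert_ngrams


# Existing index rows for the document "cat", with made-up row ids.
old_rows = {ngram: row_id
            for row_id, ngram in enumerate(sorted(trigrams("cat")), start=1)}

# The document is edited to "cap": only the changed ngrams are touched.
delete_ids, insert_ngrams = plan_ngram_update(old_rows, trigrams("cap"))
print(delete_ids)     # prints [2, 3]
print(insert_ngrams)  # prints ['ap ', 'cap']
```

Here the shared ngram `" ca"` keeps its row, so the update issues two inserts rather than three. For a large document with a small edit, the saving is proportionally much greater.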

Test Plan
  • Created and indexed a new document, searched for it.
  • Updated a document, indexed it with bin/search index ... --force --trace, and saw that only the modified rows were written to the index.
  • Searched for newly added terms (got hits) and removed terms (no longer got hits) to verify add/delete index behavior.

Diff Detail

Repository
rP Phabricator
Branch
defrag1
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 25133
Build 34682: Run Core Tests
Build 34681: arc lint + arc unit