Page MenuHomePhabricator

Implement ngram search for File objects
ClosedPublic

Authored by amckinley on Apr 17 2017, 7:14 PM.
Tags
None
Referenced Files
F19089283: D17702.id.diff
Wed, Dec 3, 6:31 AM
F19082303: D17702.diff
Tue, Dec 2, 10:45 AM
F19048289: D17702.diff
Thu, Nov 27, 2:23 PM
F19048240: D17702.diff
Thu, Nov 27, 2:14 PM
F18856606: D17702.diff
Nov 1 2025, 10:34 AM
F18849244: D17702.id42610.diff
Oct 30 2025, 8:39 AM
F18846861: D17702.diff
Oct 29 2025, 9:34 PM
F18761612: D17702.diff
Oct 6 2025, 3:22 PM
Subscribers

Details

Summary

Follows the outline in D15656 for implementing ngram search for names of File objects. Also created FileFullTextEngine, because without implementing PhabricatorFulltextInterface, ./bin/search complains that File is not an indexable type.

Test Plan
  • ran ./bin/storage upgrade to apply the schema change
  • confirmed the presence of a new file_filename_ngrams table
  • added a couple file objects
  • ran bin/search index --type file --force
  • confirmed the presence of rows in file_filename_ngrams
  • did a few keyword searches and saw expected results

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

resources/sql/autopatches/20170417.files.ngrams.php
5 ↗(On Diff #42576)

Need to remove this unless we want to force a reindex of all documents during storage upgrade.

src/applications/files/search/FileFulltextEngine.php
3 ↗(On Diff #42576)

Not sure if this class needs to exist, but I couldn't force reindexing without implementing PhabricatorFulltextInterface.

Note that this also doesn't clean up ngrams once files are deleted. I'm going to follow along with D14846: Implement basic ngram search for Owners Package names to fix that.

This is right, except that we probably should't implement FulltextInterface or fulltext-index these. bin/search index was written before the ngrams stuff was written, let me see if I can send you a quick patch to let it work without needing fulltext.

The cleanup should happen automatically as a side effect of DestructionEngine, but neither PhabricatorFileTemporaryGarbageCollector nor PhabricatorFileDeleteController use the destruction engine properly (bin/remove destroy will). Swapping those to use a destruction engine should fix the ngrams stuff, maybe as part of T10828.

To use the destruction engine, replace this:

$file->delete();

...with this:

$engine = new PhabricatorDestructionEngine();
$engine->destroyObject($file);

D17704 should fix the bin/search index stuff -- it ended up easier than I thought, assuming it works.

  • removing FileFulltextEngine
  • removing migration to force file reindex
  • update File deletion logic to use DestructionEngine
This revision is now accepted and ready to land.Apr 18 2017, 12:35 AM
This revision was automatically updated to reflect the committed changes.