I'm currently trying to find a file that I manually uploaded to the Files application and it's very difficult to query for it through the UI. I resorted to querying the database directly and found it almost instantly.
Related Objects
Mentioned In:
- rP976fbee877c9: Implement ngram search for File objects
Mentioned Here:
- D17702: Implement ngram search for File objects
- D15656: Make badges searchable by name
- T9979: Build support for ngram indexes for substring searches (e.g., file, paste, package, task titles)
- T11932: Manual Activity: Rebuild Search Index
- D14411: Allow querying for files by name
Event Timeline
D14411 implements exact string matching, but I will leave this ticket open to support partial string matching as well. Some technical considerations block this; see D14411#161139.
We ultimately built an "ngram index" for this task; see T9979 for discussion.
- Convert Files to use ngrams. D15656, which did this for badges, is a useful reference.
- bin/search index --type file --force can regenerate indexes for testing.
- PhabricatorFileEditor->supportsSearch() probably needs to be changed to true so that newly created/edited files actually index.
- Files are currently unusual, in that they don't get created through ApplicationTransactions/EditEngine on all pathways, so we may need some tweaking to get the indexing tasks queued on file creation.
- For now, omit the "reindex everything" migration present in D15656 because it could take a very long time for some installs and we should be confident we've hit all the creation pathways first. Once this is stable we can either force the migration, issue upgrade guidance, or add an "activity" (T11932).
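The "ngram index" idea from T9979 can be shown with a small, self-contained sketch. It is only an illustration of the technique, not Phabricator's ngram engine or schema. The function `trigrams()` and the class `SketchNgramIndex` are hypothetical names. The sketch splits each file name into lowercase byte trigrams and maps every trigram to the set of documents that contain it. A substring query then intersects the sets for the needle's trigrams. The result is only a list of candidates, so each one is confirmed with a real case-insensitive substring test. Needles shorter than three bytes produce no trigrams, so the sketch falls back to a full scan.

```php
<?php
// Illustrative trigram ("ngram") index for substring search over file names.
// Hypothetical names throughout; not Phabricator's actual implementation.

/** Return the distinct lowercase byte trigrams of $text. */
function trigrams(string $text): array {
  $text = strtolower($text);
  $grams = array();
  for ($i = 0; $i + 3 <= strlen($text); $i++) {
    // Using the trigram as an array key de-duplicates it.
    $grams[substr($text, $i, 3)] = true;
  }
  return array_keys($grams);
}

final class SketchNgramIndex {
  /** @var array trigram => set of document IDs (ID => true) */
  private $postings = array();
  /** @var array document ID => original name */
  private $names = array();

  public function add(int $id, string $name): void {
    $this->names[$id] = $name;
    foreach (trigrams($name) as $gram) {
      $this->postings[$gram][$id] = true;
    }
  }

  /** Return sorted IDs of documents whose name contains $needle (case-insensitive). */
  public function search(string $needle): array {
    $grams = trigrams($needle);
    if (!$grams) {
      // Needles shorter than three bytes have no trigrams, so scan everything.
      $candidates = array_keys($this->names);
    } else {
      // Intersect posting lists: a match must contain every trigram of the needle.
      $candidates = null;
      foreach ($grams as $gram) {
        $ids = isset($this->postings[$gram])
          ? array_keys($this->postings[$gram])
          : array();
        $candidates = ($candidates === null)
          ? $ids
          : array_values(array_intersect($candidates, $ids));
        if (!$candidates) {
          break;
        }
      }
    }
    // Trigram hits are only candidates; confirm with a real substring test.
    $results = array();
    foreach ($candidates as $id) {
      if (stripos($this->names[$id], $needle) !== false) {
        $results[] = $id;
      }
    }
    sort($results);
    return $results;
  }
}

$index = new SketchNgramIndex();
$index->add(1, 'platypus.jpg');
$index->add(2, 'quarterly-report.pdf');
$index->add(3, 'Platypus-diagram.png');
echo implode(',', $index->search('platypus')), "\n"; // prints "1,3"
```

The final verification step is a common design choice for ngram indexes. The index narrows the search cheaply, but containing every trigram of a needle does not guarantee containing the needle itself.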
@epriestley I just landed D17702, without really tackling the problem of files being created in non-standard code paths. I searched the codebase for PhabricatorFile::initializeNewFile() and didn't find anything scary-looking; do you have any specific examples that might need updating?
If not, how would you like to handle the reindex "migration"?
To make the indexing work for new stuff, I think you can do this:
- In PhabricatorFile::buildFromFileData() and PhabricatorFile::newFileFromContentHash(), replace the calls to $file->save() near the bottom with something like $file->saveAndIndex().
- Have that method call $this->save(), then manually queue the document for indexing (below), then return $this:
PhabricatorSearchWorker::queueDocumentForIndexing($this->getPHID());
- To test: run daemons, upload a new platypus.jpg, search for it?
- Then fix up these sites, I think: PhabricatorFileUploadSource->writeChunkedFile() and PhabricatorChunkedFileStorageEngine->allocateChunks() (only the $file->save(), not the $chunk->save(), since we don't care about indexing individual chunks).
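The saveAndIndex() approach suggested above can also be sketched in isolation. This is a hedged, self-contained illustration, not the code that landed in D17702. `SketchFile` and `SketchSearchQueue` are hypothetical stand-ins for PhabricatorFile and PhabricatorSearchWorker. Only the method names queueDocumentForIndexing(), getPHID(), save() and saveAndIndex() come from the discussion. The PHID strings are made up, and the stub save() counts calls instead of writing to a database.

```php
<?php
// Hypothetical stubs standing in for PhabricatorSearchWorker and PhabricatorFile.

final class SketchSearchQueue {
  /** @var string[] PHIDs queued for indexing, in order. */
  public static $queued = array();

  public static function queueDocumentForIndexing(string $phid): void {
    self::$queued[] = $phid;
  }
}

final class SketchFile {
  private $phid;
  public $saveCount = 0;

  public function __construct(string $phid) {
    $this->phid = $phid;
  }

  public function getPHID(): string {
    return $this->phid;
  }

  public function save(): self {
    // Stand-in for the real database write.
    $this->saveCount++;
    return $this;
  }

  /** Persist the object, then queue it so the search index picks it up. */
  public function saveAndIndex(): self {
    $this->save();
    SketchSearchQueue::queueDocumentForIndexing($this->getPHID());
    return $this;
  }
}

// A whole-file creation pathway uses saveAndIndex(); a chunk-like record
// keeps using plain save(), since individual chunks are not worth indexing.
$file = (new SketchFile('PHID-FILE-0001'))->saveAndIndex();
$chunk = (new SketchFile('PHID-CHUNK-0001'))->save();
echo implode(',', SketchSearchQueue::$queued), "\n"; // prints "PHID-FILE-0001"
```

Returning $this from saveAndIndex() keeps it chainable like save(), which should make it easy to swap in at the creation sites listed above.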
For the migration, let me just publish "run this command if you want to search existing files" in the changelog for now, and then we can do the real migration if we see confusion about it. I bet 99% of the time you want to search for something you recently uploaded, so the index will be functionally useful within a week even if you miss the guidance in the changelog.