Allow querying for files by name
Needs Revision · Public

Authored by joshuaspence on Nov 5 2015, 9:49 AM.

Details

Reviewers

epriestley

Group Reviewers

Blessed Reviewers

Maniphest Tasks

T8788: Allow querying for files by name

Summary

Ref T8788. Allow querying for files by name. This currently only performs exact string matching. In the future, it would be nice to support partial string matching.

Test Plan

Search for a bunch of files by name.

Diff Detail

Repository

rP Phabricator

Branch

master

Lint

Lint Passed

Unit

Tests Passed

Build Status

Buildable 8647
Build 10015: Run Core Tests
Build 10014: arc lint + arc unit

Event Timeline

joshuaspence updated this revision to Diff 34816. · Nov 5 2015, 9:49 AM

joshuaspence retitled this revision to Allow querying for files by name.

joshuaspence updated this object.

joshuaspence edited the test plan for this revision.

joshuaspence added a reviewer: epriestley.

joshuaspence added a task: T8788: Allow querying for files by name.

Herald added a reviewer: Blessed Reviewers. · Nov 5 2015, 9:49 AM

Herald added a subscriber: Korvin.

I don't want to bring this upstream because it doesn't scale and the utility seems very marginal to me.

This install has a small amount of data (~1M files), but LIKE '%...%' queries already take ~150-200ms to execute. I'm not immediately sure what the best strategy for providing these kinds of queries is, but I strongly suspect that simply issuing LIKE against string columns in the main tables isn't it. Some possibilities include:

- Maybe we can reduce the cost of the scan by pulling the data into a separate <id, string> table with just the data we want to LIKE?
- Build an index in MySQL (token/digraph/trigraph?).
- Build an index in the SearchEngine (but JOINs are hard?).
- Build some other sort of dedicated index.
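The trigraph option can be illustrated with a minimal in-memory sketch. This is not a Phabricator or MySQL API. Each string is indexed by its 3-character substrings. A query intersects the posting sets for the needle's trigrams to get candidates, then confirms each candidate with a real substring check. TrigramIndex, add and search are hypothetical names. A real implementation would keep the postings in a database table.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return the set of 3-character substrings of a lowercased string."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Hypothetical ngram index over <id, string> rows (illustration only)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.values: dict[int, str] = {}

    def add(self, row_id: int, value: str) -> None:
        self.values[row_id] = value
        for gram in trigrams(value):
            self.postings[gram].add(row_id)

    def search(self, needle: str) -> list[int]:
        grams = trigrams(needle)
        if not grams:
            # Needles shorter than 3 characters can't use the index,
            # so fall back to scanning every stored value.
            candidates = set(self.values)
        else:
            # A row can only contain the needle if it has every trigram of it.
            sets = [self.postings.get(g, set()) for g in grams]
            candidates = set.intersection(*sets)
        # Candidates can be false positives (all trigrams present but not
        # contiguous), so confirm each one with a real substring check.
        lowered = needle.lower()
        return sorted(i for i in candidates if lowered in self.values[i].lower())
```

For example, after indexing "screenshot.png" and "Screen Recording.mov", search("screen") returns both ids. A row like "abcXbcd" shares both trigrams of "abcd", so it becomes a candidate, but the final check rejects it. The substring check still runs, but only over the candidate rows rather than the whole table.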

I'd like to assess approaches, then implement support for a standard approach here before proceeding (e.g., a way to tag columns for submission to a separate LIKE index on DAO objects).

This implementation is also problematic: if a user searches for % or _, the character will be treated as a LIKE wildcard instead of being matched literally. See this recent post on the GitHub engineering blog:

http://githubengineering.com/like-injection/

Use the %~ (LIKE substring), %> (LIKE prefix) and %< (LIKE suffix) conversions to safely escape a LIKE clause in qsprintf(), not %s.
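Here is a rough Python analogue of what those conversions are described as doing. This is not qsprintf()'s actual implementation. It escapes the LIKE metacharacters (%, _ and the escape character itself) and then adds the wildcards. It assumes MySQL's default escape character, a backslash. escape_like, like_substring, like_prefix and like_suffix are hypothetical names.

```python
def escape_like(value: str, escape: str = "\\") -> str:
    """Escape LIKE metacharacters so they match literally.

    Assumes MySQL's default escape character (backslash). Each occurrence
    of the escape character, % or _ gets the escape character prepended.
    """
    out = []
    for ch in value:
        if ch in (escape, "%", "_"):
            out.append(escape)
        out.append(ch)
    return "".join(out)


def like_substring(value: str) -> str:
    # Analogous to a "LIKE substring" conversion: %value%
    return "%" + escape_like(value) + "%"


def like_prefix(value: str) -> str:
    # Analogous to a "LIKE prefix" conversion: value%
    return escape_like(value) + "%"


def like_suffix(value: str) -> str:
    # Analogous to a "LIKE suffix" conversion: %value
    return "%" + escape_like(value)
```

Pass the resulting pattern as a bound query parameter. That keeps SQL string quoting separate from LIKE escaping, so a search for "50%" matches only names that contain that literal text.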

src/applications/files/query/PhabricatorFileSearchEngine.php (line 54):
Loose check prevents searching for "0".
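PHP converts the string "0" to false, so a truthiness check such as if ($name) silently drops a search for a file named "0". The Python sketch below only models that conversion rule. It is not PHP and not the code at line 54. php_truthy_string, loose_check and strict_check are hypothetical names, and strict_check stands in for a length-based test such as PHP's strlen().

```python
def php_truthy_string(value: str) -> bool:
    """Model PHP's boolean conversion for strings: only "" and "0" are falsy."""
    return value not in ("", "0")


def loose_check(name: str) -> bool:
    # Analogous to PHP `if ($name)`: wrongly rejects the name "0".
    return php_truthy_string(name)


def strict_check(name: str) -> bool:
    # Analogous to a length test: accepts any non-empty name, including "0".
    return len(name) > 0
```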

This revision now requires changes to proceed. · Nov 5 2015, 3:05 PM

epriestley mentioned this in D14006: Allow searching for files by name. · Nov 5 2015, 5:53 PM

I'd be happy to settle for exact matching if that's more likely to be accepted?

Fix LIKE escaping

Exact name matching only

joshuaspence updated this object. · Nov 6 2015, 2:44 AM

joshuaspence mentioned this in T8788: Allow querying for files by name.

epriestley mentioned this in T9964: Fill in Conduit support infrastructure for ApplicationEditor changes. · Dec 14 2015, 11:04 AM

epriestley mentioned this in T9979: Build support for ngram indexes for substring searches (e.g., file, paste, package, task titles). · Dec 14 2015, 11:50 AM

This can now be built with proper substring matching using ngrams (see D14846 for the implementation), but I'd like to give that some time to settle first: it requires reindexing all files, and installs may get grumpy if they have to do that multiple times.

T9979 has more discussion. Ngram implementations for People and Projects are forthcoming to move typeahead over to ngrams, and those might provide better examples (and work out some of the kinks).

This revision now requires changes to proceed. · Dec 22 2015, 4:14 PM

epriestley mentioned this in Blog Post: Development Notes (2015 Week 52). · Dec 26 2015, 2:27 PM