
Allow querying for files by name
Needs Revision, Public

Authored by joshuaspence on Nov 5 2015, 9:49 AM.
Tags
None
Referenced Files
F14317195: D14411.diff

Details

Reviewers
epriestley
Group Reviewers
Blessed Reviewers
Maniphest Tasks
T8788: Allow querying for files by name
Summary

Ref T8788. Allow querying for files by name. This currently only performs exact string matching. In the future, it would be nice to support partial string matching.

Test Plan

Search for a bunch of files by name.

Diff Detail

Repository
rP Phabricator
Branch
master
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 8647
Build 10015: Run Core Tests
Build 10014: arc lint + arc unit

Event Timeline

joshuaspence retitled this revision to "Allow querying for files by name".
joshuaspence updated this object.
joshuaspence edited the test plan for this revision.
joshuaspence added a reviewer: epriestley.
epriestley edited edge metadata.

I don't want to bring this upstream because it doesn't scale and the utility seems very marginal to me.

This install has a small amount of data (~1M files) but %...% queries take ~150-200ms to execute. I'm not immediately sure what the best strategy for providing these kinds of queries is, but I strongly suspect that just issuing LIKE against string columns in main tables isn't it. Some possibilities include:

  • Maybe we can reduce the cost of the scan by pulling the data into a separate <id, string> table with just the data we want to LIKE?
  • Build an index in MySQL (token/digraph/trigraph?).
  • Build an index in the SearchEngine (but JOINs are hard?).
  • Build some other sort of dedicated index.
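All of these options aim to avoid the full scan that a leading wildcard forces. The minimal Python model below shows the difference. It assumes an ordinary B-tree index on the name column, with a sorted list standing in for the index, and the function names are hypothetical. A prefix query such as LIKE 'report%' can binary-search to the first match and stop at the end of the matching run. A substring query such as LIKE '%port%' has no starting point, so it must examine every row.

```python
import bisect

def prefix_matches(sorted_names, prefix):
    # Model of an index-assisted LIKE 'prefix%': names sharing a prefix are
    # contiguous in sorted order, so jump to the first one and walk the run.
    start = bisect.bisect_left(sorted_names, prefix)
    end = start
    while end < len(sorted_names) and sorted_names[end].startswith(prefix):
        end += 1
    # Returns (matches, rows examined after the binary search).
    return sorted_names[start:end], end - start

def substring_matches(names, needle):
    # Model of LIKE '%needle%': the index gives no starting point,
    # so every row has to be examined.
    return [name for name in names if needle in name], len(names)

names = sorted(["screenshot.png", "report-final.pdf", "avatar.png",
                "report-2015.pdf", "build.log", "diff.txt"])
print(prefix_matches(names, "report"))   # (['report-2015.pdf', 'report-final.pdf'], 2)
print(substring_matches(names, "port"))  # (['report-2015.pdf', 'report-final.pdf'], 6)
```

Both queries return the same two files. The prefix lookup touches only the matching run, while the substring lookup reads the whole table, and that cost grows with the number of files.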

I'd like to assess approaches, then implement support for a standard approach here before proceeding (e.g., a way to tag columns for submission to a separate LIKE index on DAO objects).


This implementation is also problematic: if a user searches for % or _, the character will be interpreted as a LIKE wildcard rather than literally. See this recent post on the GitHub engineering blog:

http://githubengineering.com/like-injection/

Use the %~ (LIKE substring), %> (LIKE prefix) and %< (LIKE suffix) conversions to safely escape a LIKE clause in qsprintf(), not %s.
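The fix is to escape the LIKE metacharacters before adding the wildcards the query actually wants. The Python sketch below approximates what substring, prefix and suffix escaping has to do. It is not qsprintf() itself, and all function names are hypothetical. It assumes MySQL's default LIKE escape character (backslash) and builds the pattern value to be sent as a bound parameter. Embedding that value in a quoted SQL string literal would need a further layer of escaping. A tiny case-sensitive LIKE matcher is included only to check the behavior.

```python
import re

def escape_like(value, escape="\\"):
    # Escape the escape character itself, plus the LIKE wildcards % and _.
    return "".join(escape + ch if ch in (escape, "%", "_") else ch for ch in value)

def like_substring(value):
    # Analogous in spirit to qsprintf()'s %~ conversion.
    return "%" + escape_like(value) + "%"

def like_prefix(value):
    # Analogous in spirit to %>.
    return escape_like(value) + "%"

def like_suffix(value):
    # Analogous in spirit to %<.
    return "%" + escape_like(value)

def like_matches(pattern, text, escape="\\"):
    # Minimal case-sensitive model of SQL LIKE, used only to test the escaping.
    parts, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any run of characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), text, flags=re.DOTALL) is not None

user_input = "_"
print(like_matches("%" + user_input + "%", "report.pdf"))            # True
print(like_matches(like_substring(user_input), "report.pdf"))        # False
print(like_matches(like_substring(user_input), "report_2015.pdf"))   # True
```

The first call shows the problem raised in review. A naive "%" + input + "%" built from the input "_" matches any non-empty name. The escaped pattern matches only names that contain a literal underscore.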

Inline comment on src/applications/files/query/PhabricatorFileSearchEngine.php, line 54:

Loose check prevents searching for "0", because PHP treats the string "0" as false.
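For context, PHP converts a string to false when it is "" or "0". A truthiness check such as if ($name) therefore silently drops a search for a file named "0". The Python sketch below models that rule and contrasts it with a strict check on the string length, analogous to testing strlen() in PHP. The code under review is PHP, and the helper names here are hypothetical.

```python
from typing import Optional

def php_string_is_truthy(value: str) -> bool:
    # PHP converts a string to false only when it is "" or "0".
    return value not in ("", "0")

def loose_should_filter(name: Optional[str]) -> bool:
    # Models a loose `if ($name)` check: a search for "0" is ignored.
    return name is not None and php_string_is_truthy(name)

def strict_should_filter(name: Optional[str]) -> bool:
    # Models a strict length check: only a missing or empty value is skipped.
    return name is not None and len(name) > 0

print(loose_should_filter("0"))   # False
print(strict_should_filter("0"))  # True
```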

This revision now requires changes to proceed. (Nov 5 2015, 3:05 PM)

I'd be happy to settle for exact matching if that's more likely to be accepted?

joshuaspence edited edge metadata.
joshuaspence marked an inline comment as done.

Fix LIKE escaping

joshuaspence edited edge metadata.

Exact name matching only

epriestley edited edge metadata.

This can now be built with proper substring matching using Ngrams (see D14846 for the implementation), but I'd like to give it some time to settle first: it requires reindexing all files, and installs may get grumpy if they have to do that multiple times.
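In general terms, n-gram substring search moves the work to write time. Each name is broken into overlapping n-grams that are stored in a separate index. A query looks up only the rows that contain every n-gram of the search string, then confirms the match. The Python sketch below shows that idea with trigrams held in memory. It is not the D14846 implementation. The class, its methods, and the in-memory storage are hypothetical stand-ins for an index table.

```python
from collections import defaultdict
from typing import Dict, List, Set

def trigrams(text: str) -> Set[str]:
    # Every overlapping 3-character window, lowercased for case-insensitive search.
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    """Hypothetical in-memory stand-in for a separate <id, ngram> index table."""

    def __init__(self) -> None:
        self.names: Dict[int, str] = {}
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, object_id: int, name: str) -> None:
        # Indexing writes one posting per distinct trigram of the name.
        self.names[object_id] = name
        for gram in trigrams(name):
            self.postings[gram].add(object_id)

    def search(self, query: str) -> List[int]:
        grams = trigrams(query)
        if grams:
            # A match must contain every trigram of the query.
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        else:
            # Queries under 3 characters yield no trigrams; fall back to a scan.
            candidates = set(self.names)
        # Sharing all trigrams does not prove a substring match, so verify.
        needle = query.lower()
        return sorted(i for i in candidates if needle in self.names[i].lower())

index = TrigramIndex()
index.add(1, "report-2015.pdf")
index.add(2, "Report-final.pdf")
index.add(3, "screenshot.png")
print(index.search("port"))  # [1, 2]
print(index.search("PNG"))   # [3]
print(index.search("re"))    # [1, 2, 3]
```

Every stored name contributes postings to the index. Introducing this kind of index, or changing how names are split into n-grams, therefore means rebuilding postings for every existing file. That is the reindexing cost mentioned above.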

T9979 has more discussion. There are People and Projects implementations of Ngrams forthcoming to swap typeahead stuff over to it, and those might provide better examples (and work out some of the kinks).

This revision now requires changes to proceed. (Dec 22 2015, 4:14 PM)