Page MenuHomePhabricator

Add Ferret support to Paste
Needs ReviewPublic

Authored by amckinley on Jul 15 2019, 7:42 PM.

Details

Reviewers
epriestley
Summary

Ref PHI1292. Enable fulltext searchs in paste. Maybe this should only index a snippet instead of the entire content?

Also updates table names in PhabricatorPasteQuery.

Test Plan

Created some pastes, indexed them, searched for them.

Diff Detail

Repository
rP Phabricator
Branch
ferret-paste (branched from master)
Lint
Lint OK
Unit
Unit Tests OK
Build Status
Buildable 23133
Build 31765: Run Core Tests
Build 31764: arc lint + arc unit

Event Timeline

amckinley created this revision.Jul 15 2019, 7:42 PM
Owners added a subscriber: Restricted Owners Package.Jul 15 2019, 7:42 PM
amckinley added inline comments.Jul 15 2019, 7:43 PM
src/applications/paste/engine/PhabricatorPasteFulltextEngine.php
16

It looks like if I use $paste->getTitle() and that returns null, the document gets indexed without errors, but it doesn't come back in the search results?

amckinley requested review of this revision.Jul 15 2019, 7:44 PM

I'm a little uneasy about indexing the actual content, since I worry this will lead to a tragic event like "we learn that many installs routinely send 1GB logfiles consisting mostly of /dev/urandom output into Paste".

Ferret seems to be performing better than MyISAM/InnoDB FULLTEXT did, but I'm not confident it will stand up to use cases like this (see also T7472). Even if we try to adopt Ferret for this "codebase search" use case in the future, I'd probably like to separate the index and/or apply special rules around discarding documents that are too large or, uh, "too random" or whatever, and have this failure be non-silent ("Some documents matching all query parameters other than your fulltext query terms are really dumb and are not indexed by the fulltext engine, so they might or might not match your query. Click here for a list." or whatever, I guess, although this isn't great).

An especially cheap way to deal with this for now would be to just not index the content. This might be a bit confusing/surprising, but even a title-only search is better than nothing, so I think it's still a step forward.

Let me look at the null title thing, since that seems like it's probably a bug somewhere, and indexing the English-language untitled document string is probably not ideal.

src/applications/paste/engine/PhabricatorPasteFulltextEngine.php
23

Slowvote copy/paste mishap.

epriestley retitled this revision from Add Ferret support to Pastebin to Add Ferret support to Paste.Jul 18 2019, 5:01 PM
epriestley edited the summary of this revision. (Show Details)

(Also, "Pastebin" is the name of a product/company and our database name really shouldn't be pastebin, it just is since it was a contributed patch a million years ago and database names are a pain to change. This is basically like having a database named phabricator_yelp or whatever, though.)

..database names are a pain to change..

Let me also see exactly how much of a pain this is, I think we did it once and this patch definitely makes it harder.

That wasn't as nearly as bad as I thought.