Ref PHI1292. Enable fulltext searchs in paste. Maybe this should only index a snippet instead of the entire content?
Also updates table names in PhabricatorPasteQuery.
epriestley on Jul 15 2019, 7:42 PM.Authored by
I'm a little uneasy about indexing the actual content, since I worry this will lead to a tragic event like "we learn that many installs routinely send 1GB logfiles consisting mostly of /dev/urandom output into Paste".
Ferret seems to be performing better than MyISAM/InnoDB FULLTEXT did, but I'm not confident it will stand up to use cases like this (see also T7472). Even if we try to adopt Ferret for this "codebase search" use case in the future, I'd probably like to separate the index and/or apply special rules around discarding documents that are too large or, uh, "too random" or whatever, and have this failure be non-silent ("Some documents matching all query parameters other than your fulltext query terms are really dumb and are not indexed by the fulltext engine, so they might or might not match your query. Click here for a list." or whatever, I guess, although this isn't great).
An especially cheap way to deal with this for now would be to just not index the content. This might be a bit confusing/surprising, but even a title-only search is better than nothing, so I think it's still a step forward.
Let me look at the null title thing, since that seems like it's probably a bug somewhere, and indexing the English-language untitled document string is probably not ideal.
(Also, "Pastebin" is the name of a product/company and our database name really shouldn't be pastebin, it just is since it was a contributed patch a million years ago and database names are a pain to change. This is basically like having a database named phabricator_yelp or whatever, though.)
This didn't need much: