
Search on secure.phabricator.com doesn't find MySQL stopwords
Closed, ResolvedPublic

Description

I searched for "seen" in All Documents and in Open and Closed Documents and got "No search results.", even though D3620 (abandoned) has this word in its title.

Search for "differential" returns lots of results.

Event Timeline

vrana added a project: Unknown Object (Project). Feb 26 2013, 6:50 PM
vrana added a subscriber: vrana.

I think we haven't disabled the stopword list. At some point this required recompiling MySQL, I think, but maybe it's easier now.

http://dev.mysql.com/doc/refman/5.5/en//fulltext-stopwords.html

Ah, good catch. This one doesn't require recompiling; the setting that did was the threshold for ignoring words that appear in more than 50% of rows. Changing the stopword list just requires repairing the table.

vrana renamed this task from Search on secure.phabricator.com doesn't return all results to Search on secure.phabricator.com doesn't find MySQL stopwords. Feb 26 2013, 11:01 PM

How do we repair the table? Is this just for this install or is it a general issue? (Just going through 'needs triage' tasks...)

epriestley triaged this task as Wishlist priority. May 23 2013, 6:26 PM

Repairing is just REPAIR TABLE x, but we already use IN BOOLEAN MODE, which supposedly disables the 50% threshold:

"...it would be better to search using IN BOOLEAN MODE instead, which does not observe the 50% threshold."
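The two filters discussed above behave differently, which is why boolean mode alone doesn't fix the "seen" search. Below is a simplified Python model, an assumption for illustration and not MySQL's actual code, of why a MyISAM FULLTEXT query can silently drop a word: the stopword list and ft_min_word_len (default 4) apply in every mode, while the "half or more of the rows" rule applies only in natural-language mode. FulltextConfig and ignored_terms are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FulltextConfig:
    # Hypothetical model of the relevant server settings.
    stopwords: frozenset[str]
    min_word_len: int = 4  # MySQL's default for ft_min_word_len


def ignored_terms(query_words, rows, config, boolean_mode):
    """Return {word: reason} for each query word the index would ignore."""
    reasons = {}
    for word in query_words:
        w = word.lower()
        if w in config.stopwords:
            # Stopwords are never indexed, so no search mode can find them.
            reasons[word] = "stopword"
        elif len(w) < config.min_word_len:
            reasons[word] = "too short"
        elif not boolean_mode:
            # Natural-language mode treats words present in half or more
            # of the rows as too common to match.
            matching = sum(1 for row in rows if w in row.lower().split())
            if rows and matching * 2 >= len(rows):
                reasons[word] = "50% threshold"
    return reasons
```

With "seen" in the stopword list, switching to boolean mode changes nothing for that word, but it does rescue a common word like "differential" from the 50% rule.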

Assuming this is a stopword file problem, it looks like we can set ft_stopword_file, repair the table, and see if that fixes it. We probably still want a stopword list (e.g., "the", "a", "an"), just a less exhaustive one.
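A minimal sketch of the "shorter stopword list, then repair" idea, assuming a MyISAM search table. The word list and the table name are hypothetical placeholders. MySQL's manual describes the stopword file as free-form (words separated by any non-alphanumeric character), so one lowercase word per line should be accepted, and it documents REPAIR TABLE ... QUICK as sufficient to rebuild MyISAM FULLTEXT indexes.

```python
from pathlib import Path

# Hypothetical short list: only the most common English function words.
SHORT_STOPWORDS = ["a", "an", "and", "in", "of", "the", "to"]


def render_stopword_file(words):
    """Normalize, de-duplicate, and sort words, one per line."""
    unique = sorted({w.strip().lower() for w in words if w.strip()})
    return "\n".join(unique) + "\n"


def write_stopword_file(path, words=SHORT_STOPWORDS):
    Path(path).write_text(render_stopword_file(words), encoding="utf-8")


def repair_statements(tables):
    """Statements that rebuild the FULLTEXT index after the list changes.

    QUICK repairs only the index file, not the data file.
    """
    return [f"REPAIR TABLE {t} QUICK;" for t in tables]
```

After writing the file, ft_stopword_file would point at it in the [mysqld] section of my.cnf; since it is read at server startup, the server needs a restart before the repair statements are run.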

epriestley changed the visibility from "All Users" to "Public (No Login Required)". Feb 10 2014, 2:57 AM

Likely, the course of action here is something like:

  • Produce a less restrictive stopword file. Start with http://dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html or similar, then delete all the words which users might reasonably want to search for, which is probably most of them. Alternatively, find a list of common English words and just use that. I'd expect us to end up with a short list of only the most common words ("the", "a", "an", "of", "in", etc).
  • Put that in resources/ in some format that ft_stopword_file accepts.
  • Add a setup check which detects that the stopword file is still set to the default and the FULLTEXT engine is in use. Suggest the user configure the ElasticSearch engine or adjust the stopword file and repair the table.
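The setup check in the last step could be sketched roughly as follows. This is hypothetical, not Phabricator's code: `variables` is assumed to be a name-to-value mapping built from SHOW VARIABLES, and the check assumes the server reports "(built-in)" for the default ft_stopword_file, as the MySQL manual documents.

```python
from typing import Mapping, Optional

# Value MySQL reports when no custom stopword file is configured.
DEFAULT_STOPWORD_FILE = "(built-in)"


def stopword_setup_issue(
    variables: Mapping[str, str], fulltext_engine_in_use: bool
) -> Optional[str]:
    """Return a setup-issue message, or None if there is nothing to report."""
    if not fulltext_engine_in_use:
        return None  # e.g. an ElasticSearch engine is configured instead
    stopword_file = variables.get("ft_stopword_file")
    if stopword_file is None:
        return None  # variable not available on this server
    if stopword_file == DEFAULT_STOPWORD_FILE:
        return (
            "MySQL is using its built-in FULLTEXT stopword list, which hides "
            "many searchable words. Configure ElasticSearch, or set "
            "ft_stopword_file to a shorter list and repair the search table."
        )
    return None
```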

T4130 is closely related.

After upgrading Phabricator on my (relatively old) install, this setup check caused Phabricator to fail with "unknown system variable 'ft_stopword_file'".
I had to hunt down the specific file and comment out this check, as well as the one for min_word_len, which produces a similar error.

Might be an old mysqld version? I'm using 5.0.95.
Anyway, just be aware that this potentially breaks older instances.

EDIT: Fixed by upgrading to MySQL 5.6.

You should be able to ignore setup issues in the UI -- there's an "ignore" action to the far right on the list view. We could maybe make that more prominent.

Did you add the options to the [mysqld] section of your config (versus some other section, maybe)? It's odd to me that MySQL would return the values when queried, but not respect them in configuration. The 5.0 documentation also says these exist...

http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html
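One way to answer the "which section?" question is to parse my.cnf and list the sections that set the option. A rough sketch using Python's standard configparser, under the assumption that the file has no include directives before the first section header (configparser would reject those). MySQL treats "-" and "_" in option names as interchangeable, so both spellings are normalized; sections_defining is a hypothetical helper name.

```python
import configparser


def sections_defining(cnf_text: str, option: str) -> list[str]:
    """Return the names of my.cnf sections that set `option`."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # bare flags such as skip-name-resolve
        interpolation=None,  # values may legitimately contain '%'
        strict=False,  # tolerate repeated sections or keys
        inline_comment_prefixes=("#",),
    )
    parser.read_string(cnf_text)
    wanted = option.replace("-", "_").lower()
    return [
        section
        for section in parser.sections()
        if any(key.replace("-", "_") == wanted for key in parser[section])
    ]
```

Only options found under [mysqld] (or another group the server reads) affect the server; the same line under the client's [mysql] group would not change the server's stopword list.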

To clarify: MySQL 5.0.95 does not return the value when queried; instead it returns an error, which crashes the setup warning checker and brings down the entire request with it. Phabricator's homepage fails with an error message. I can't ignore the setup warning because it is never shown (there's no UI, only the error message on a blank page). This happens both before and after adding the option to the [mysqld] section of my.cnf.

So, at a minimum, this should be wrapped in a try-catch block.
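The try-catch suggested above amounts to isolating each check so one failure can't take down the request. A hypothetical Python sketch (Phabricator's real setup checks are PHP and are structured differently): each check returns a message or None, and any exception, such as an "unknown system variable" error from an older server, is recorded as a skipped check instead of propagating.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A check returns an issue message, or None when everything is fine.
Check = Callable[[], Optional[str]]


def run_setup_checks(checks: Dict[str, Check]) -> Tuple[List[str], List[str]]:
    """Run every check; return (issues, skipped) without ever raising."""
    issues: List[str] = []
    skipped: List[str] = []
    for name, check in checks.items():
        try:
            message = check()
        except Exception as exc:  # driver-specific error types vary
            skipped.append(f"{name}: {exc}")
            continue
        if message:
            issues.append(message)
    return issues, skipped
```

Catching the broad Exception class is deliberate here: the goal is that the page still renders and the remaining checks still run, whatever error the database layer raises.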

Aaaaaaaaah! Okay, that makes sense. I'll adjust these.

It seems ft_stopword_file cannot be set on AWS RDS, so I have to ignore the setup issue.