
Advanced Query's "Contains Text" seems to expect entire words instead of substrings?
Closed, Resolved, Public

Description

  1. Go to https://secure.phabricator.com/maniphest/query/advanced/
  2. Set "In Any Project" to "Maniphest"
  3. Set "Contains Text" to "Duplicat"
  4. Execute query

ACTUAL RESULT:
No results found for this query.

EXPECTED RESULT:
The same query for "Duplicate" returns 12 results. Since "Duplicat" is a substring of "Duplicate", those tasks "contain that text" and should be returned for "Duplicat" as well.

Event Timeline

aklapper raised the priority of this task from to Needs Triage.
aklapper updated the task description.
aklapper added a project: Maniphest.
aklapper added a subscriber: aklapper.

Yes, this uses full text search (either MySQL FULLTEXT or ElasticSearch if configured), not substring search. If "contains text" is confusing, what phrasing would you expect to indicate full-text search?
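To illustrate the distinction, here is a minimal Python sketch contrasting substring search with the whole-word matching a full-text index performs. It is not Phabricator's, MySQL's, or ElasticSearch's actual implementation; the function names `tokenize`, `substring_match`, and `word_match` and the sample title are made up for illustration.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens, roughly as a full-text index would."""
    return re.findall(r"[a-z0-9]+", text.lower())

def substring_match(query, text):
    """Substring search: the query may appear anywhere, even inside a word."""
    return query.lower() in text.lower()

def word_match(query, text):
    """Full-text style search: every query word must equal an indexed token."""
    tokens = set(tokenize(text))
    return all(word in tokens for word in tokenize(query))

# Hypothetical task title, used only for this example.
title = "Duplicate tasks should be merged"
print(substring_match("Duplicat", title))  # True
print(word_match("Duplicat", title))       # False
print(word_match("Duplicate", title))      # True
```

Under these assumptions, "Duplicat" matches as a substring but not as a word, which is the behavior reported in the description.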

Anything that supports stemming would help. I need to find "delete" and "deleting"/"deletion" when searching for "delet"; or "duplicate" and "duplication" for "duplicat", because users use different forms of words in ticket trackers.

Referring to the current naming, "contains words" feels more appropriate.

Ah, okay. It sounds like the root problem here is that you're interested in stemming, which is a search engine feature. Stemming strips word suffixes (like "-ed", "-ing", "-s", etc.) before putting them in the index, essentially ignoring them when searching. The default MySQL MyISAM FULLTEXT engine does not support stemming, but the alternate ElasticSearch engine does. With stemming, this would just work like you want (searching for "deletion" would find "delete", "deleting", etc., without requiring you to manually remove suffixes from the words).
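As a rough sketch of the idea, the Python below strips a few English suffixes from both the indexed words and the query words before comparing them. This is a toy under stated assumptions, not how ElasticSearch or any real engine stems: the suffix list and the names `naive_stem`, `stems`, and `stemmed_match` are hypothetical, and real stemmers apply many more rules.

```python
import re

# Hypothetical suffix list, checked longest first; real stemmers use far more rules.
SUFFIXES = ("ions", "ion", "ing", "ed", "es", "e", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stems(text):
    """Return the set of stems for all words in the text."""
    return {naive_stem(w) for w in re.findall(r"[a-z]+", text.lower())}

def stemmed_match(query, text):
    """Match if every stemmed query word appears among the stemmed text words."""
    return stems(query) <= stems(text)

# Hypothetical task titles, used only for this example.
for title in ("Delete old files", "Deleting a task fails", "Deletion is slow"):
    print(stemmed_match("deletion", title))  # True for each title
```

Because the same stemming is applied at index time and at query time, "delete", "deleting", and "deletion" all reduce to the same stem in this sketch, so a search for any one form finds the others.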

Generally, the MyISAM engine is pretty limited, and is missing a lot of features like this. T2632 ("MyISAM fulltext does not support non-latin languages and we don't warn you about it") is another example of a missing feature. The advantage of the engine is that it's very easy to set up, since you don't need to run any more software -- with ElasticSearch, you have to install and configure it in addition to everything else.

We could probably implement support for some of these features ourselves (stemming English isn't too difficult), but particularly outside of English we'll rapidly be out of our depth (I have no idea how to tokenize or stem Japanese or Arabic, for example, and no ability to evaluate whether an implementation is correct or not). In these cases, we should provide better support for administrators to help them evaluate the issue and make an informed decision between easier setup and better search.

Thanks for elaborating! So let's keep this ticket about the wording of the search option, and I'll investigate making our instance use ElasticSearch.