Applying fulltext limits first causes missing results
Closed, Resolved · Public

Assigned To

Authored By

	alexmv
	Mar 23 2017, 1:31 AM

Description

D16944 changed fulltext search to apply the fulltext limit first, capping results at 1000, before filtering on other criteria. This can lead to missing results.

Take, for example, a query for "daemon" that I've narrowed down with other criteria I happen to remember. It finds no results because the particular task isn't among the 1000 documents returned by the fulltext search.

Adding an additional word makes it find results. It is counter-intuitive that adding constraints would broaden the result set. The logic of "Better to return quickly and let the user refine their results" from D16944 does not make sense if the user sees no results, or at least not the result they are expecting -- no user will intuitively assume they need to further refine their query from such a results page.
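To make the failure mode concrete, here is a minimal Python sketch of the behavior described above. It is not Phabricator code: the `Task` class, the toy matcher, the sample corpus, and the "first 1000 in list order" stand-in for relevance ranking are all assumptions made for illustration. Only the 1000-document cap comes from this task.

```python
from dataclasses import dataclass

FULLTEXT_LIMIT = 1000  # the cap described above; everything else is illustrative


@dataclass(frozen=True)
class Task:
    id: int
    text: str
    owner: str


def fulltext_match(task, query):
    # Toy matcher: every word in the query must appear in the task text.
    return all(word in task.text for word in query.split())


def search_fulltext_first(tasks, query, owner, limit=FULLTEXT_LIMIT):
    # Step 1: keep only the first `limit` fulltext hits.
    hits = [t for t in tasks if fulltext_match(t, query)][:limit]
    # Step 2: apply the remaining constraints to the capped set.
    return [t.id for t in hits if t.owner == owner]


def search_all_constraints(tasks, query, owner):
    # Reference behavior: apply every constraint, with no cap in between.
    return [t.id for t in tasks if t.owner == owner and fulltext_match(t, query)]


# 1500 tasks mention "daemon"; only the last one is owned by "alice".
tasks = [Task(i, "daemon restart loop", "bob") for i in range(1499)]
tasks.append(Task(1499, "daemon crashes on taskmaster start", "alice"))

capped = search_fulltext_first(tasks, "daemon", "alice")
uncapped = search_all_constraints(tasks, "daemon", "alice")
refined = search_fulltext_first(tasks, "daemon taskmaster", "alice")
```

With this corpus, `capped` is empty while `uncapped` contains task 1499, and the narrower query in `refined` finds the task again because its fulltext half now matches fewer than 1000 documents.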

Revisions and Commits

rP Phabricator
	Closed	D18484 Build a prototype fulltext engine ("Ferret") using only basic MySQL primitives

Related Objects

		Status	Assigned	Task
		Resolved	epriestley	T12974 Upgrading: "Ferret" Fulltext Engine
		Resolved	epriestley	T12443 Applying fulltext limits first causes missing results

Event Timeline

alexmv created this task. Mar 23 2017, 1:31 AM

Herald added a subscriber: eadler. Mar 23 2017, 1:31 AM

jboning added a project: Restricted Project. Mar 23 2017, 1:33 AM

Herald added a subscriber: jhurwitz. Mar 23 2017, 1:33 AM

There's some additional context in D17384, where Elastic has an internal hard limit of 10K results.

See also T12353 for significant discussion.

I don't think we can reasonably "fix" this in the general case, but we should, at a minimum, provide more specific guidance to users about what's happening so they can reasonably figure out how to move forward.

In this particular case, the result set for all constraints other than fulltext is only 6 results, and we could imagine doing that half of the query first and then passing an "and only look at these documents" constraint to the search engine.

But it will always be possible to construct a non-fulltext constraint which matches a million documents and a fulltext constraint which matches a million different documents.
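A rough Python sketch of that idea, under stated assumptions: `fulltext_search` is a toy stand-in for the search engine, its `restrict_to` parameter is hypothetical, and the `complete` flag is an invented way to tell the caller whether the cap could have hidden results. It shows both the case that can be handled (a small non-fulltext result set) and the case that cannot (both halves are large).

```python
FULLTEXT_LIMIT = 1000


def fulltext_search(index, term, limit, restrict_to=None):
    """Toy engine. `index` maps document ID -> text.

    `restrict_to` is a hypothetical "only look at these documents" constraint.
    """
    hits = []
    for doc_id in sorted(index):
        if restrict_to is not None and doc_id not in restrict_to:
            continue
        if term in index[doc_id]:
            hits.append(doc_id)
            if len(hits) == limit:
                break
    return hits


def search(index, term, structured_ids, limit=FULLTEXT_LIMIT):
    """Return (matching IDs, complete).

    `complete` is False when the fulltext cap may have hidden results.
    """
    if len(structured_ids) <= limit:
        # Small structured result set: restrict the engine to those documents,
        # so the cap can never cut off a document that matches everything.
        return fulltext_search(index, term, limit, restrict_to=structured_ids), True
    # Both halves may be huge: fall back to capped fulltext, then filter.
    hits = fulltext_search(index, term, limit)
    return [d for d in hits if d in structured_ids], len(hits) < limit


index = {i: "daemon" for i in range(5000)}
small = search(index, "daemon", {4990, 4991, 4992, 4993, 4994, 4995})
large = search(index, "daemon", set(range(2000, 5000)))
```

Here `small` returns all 6 documents with `complete` set to True, while `large` returns nothing and reports `complete` as False. The second case is the one that cannot be fixed in general, but the flag at least gives the UI something to explain to the user.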

T12003 is also related to this (providing more explanatory, contextual help for users executing searches which may not be doing what they want for technical reasons).

20after4 added a subscriber: 20after4. Mar 23 2017, 3:11 AM

epriestley moved this task from Backlog to v2 on the Search board. Mar 26 2017, 12:32 PM

epriestley mentioned this in T12450: New Search Configuration Errata. Mar 26 2017, 12:44 PM

I think it would make a lot of sense to construct the two queries separately (and in parallel) with a short timeout, then handle the timeout gracefully, allowing the user to refine their query further. This would avoid the denial-of-service situation which happened to Wikimedia more than once, when users repeatedly executed really expensive searches until MySQL fell over from the load.

It's also possible to pass the constraints on to Elasticsearch so that it can handle all of the filters, not just the fulltext part. That is, however, quite a bit more complex, requires indexing more fields, and it's doubly complex to support that on top of supporting MySQL constraint-based search filters.
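As a sketch of the parallel-queries-with-timeout idea in Python (not Phabricator code): the two query callables, the function name `run_with_timeout`, and the warning text are all hypothetical. The example uses only the standard library's `concurrent.futures`.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timeout(fulltext_query, structured_query, timeout_s):
    """Run both halves of a search concurrently.

    Each argument is a zero-argument callable returning an iterable of IDs.
    Returns (sorted matching IDs, None) on success, or (None, warning) if
    either half does not finish within `timeout_s` seconds.
    """
    pool = ThreadPoolExecutor(max_workers=2)
    deadline = time.monotonic() + timeout_s
    futures = [pool.submit(fulltext_query), pool.submit(structured_query)]
    try:
        id_sets = []
        for future in futures:
            # Both futures share one deadline rather than getting a full
            # timeout each.
            remaining = max(0.0, deadline - time.monotonic())
            id_sets.append(set(future.result(timeout=remaining)))
    except FutureTimeout:
        return None, "Search took too long; try adding more constraints."
    finally:
        # Do not block the caller on a slow query.
        pool.shutdown(wait=False)
    return sorted(id_sets[0] & id_sets[1]), None


def slow_query():
    time.sleep(0.3)  # stands in for an expensive search
    return [1]


ok = run_with_timeout(lambda: [1, 2, 3], lambda: [2, 3, 4], 1.0)
timed_out = run_with_timeout(slow_query, lambda: [2, 3, 4], 0.05)
```

Note that the timeout here only stops the caller from waiting; the worker thread keeps running until the slow query returns. To actually protect the database from repeated expensive searches, a real implementation would also need to cancel the query on the server side.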

In this specific situation, it seems like it would make sense to automatically repeat the search without the fulltext portion and give the 6 results, along with a warning that the fulltext portion wasn't being applied. I don't know if that's practical to implement, though.
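A minimal Python sketch of that fallback, assuming the caller already has the capped fulltext hits and the IDs matching the other constraints. The function name, the `SMALL_RESULT_THRESHOLD` cutoff, and the warning wording are invented for illustration, and the sketch ignores the cost of computing the non-fulltext result set, which is the part that may not be practical.

```python
FULLTEXT_LIMIT = 1000
SMALL_RESULT_THRESHOLD = 25  # hypothetical cutoff for "few enough to just show"


def search_with_fallback(fulltext_hits, structured_ids,
                         limit=FULLTEXT_LIMIT,
                         threshold=SMALL_RESULT_THRESHOLD):
    """Return (IDs, warning).

    `fulltext_hits` are the IDs from the capped fulltext query and
    `structured_ids` is the set of IDs matching every other constraint.
    """
    results = [d for d in fulltext_hits if d in structured_ids]
    if results:
        return results, None
    capped = len(fulltext_hits) >= limit
    if capped and 0 < len(structured_ids) <= threshold:
        # The cap may have hidden the match, and the other constraints alone
        # give a short list: show that list and say what happened.
        warning = ("The fulltext terms matched too many documents to search "
                   "completely. Showing results for the other filters only; "
                   "the fulltext terms were not applied.")
        return sorted(structured_ids), warning
    return [], None


fallback_ids, fallback_warning = search_with_fallback(
    list(range(1000)), {4990, 4991, 4992, 4993, 4994, 4995})
no_fallback = search_with_fallback([1, 2], {5})
```

When the fulltext half did not hit the cap, as in `no_fallback`, an empty result is genuine and no fallback is triggered.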

20after4 mentioned this in D17564: Address some New Search Configuration Errata. Mar 27 2017, 2:41 PM

20after4 mentioned this in rP699228c73b74: Address some New Search Configuration Errata. Mar 28 2017, 8:19 PM

epriestley mentioned this in T12525: amckinley's Onboarding. Apr 9 2017, 1:46 PM

joshuaspence added a subscriber: joshuaspence. May 3 2017, 12:48 PM

freeman-endlessm added a subscriber: freeman-endlessm. Jun 7 2017, 10:42 PM

epriestley mentioned this in T12819: InnoDB FULLTEXT appears to fail catastrophically once it reaches a moderate size. Jun 10 2017, 12:23 AM

epriestley added a revision: D18484: Build a prototype fulltext engine ("Ferret") using only basic MySQL primitives. Aug 28 2017, 9:35 PM

epriestley mentioned this in D18484: Build a prototype fulltext engine ("Ferret") using only basic MySQL primitives. Aug 28 2017, 9:52 PM

epriestley mentioned this in rPf97157e7edb1: Build a prototype fulltext engine ("Ferret") using only basic MySQL primitives. Aug 28 2017, 9:53 PM

epriestley mentioned this in T12974: Upgrading: "Ferret" Fulltext Engine. Sep 1 2017, 5:05 PM

epriestley added a parent task: T12974: Upgrading: "Ferret" Fulltext Engine.