Page MenuHomePhabricator

Convert PhabricatorSearchDocumentQuery to cursor based paging
Closed, InvalidPublic1000 Quest Points

Description

I'm trying to understand why global search uses offset-based paging and I can't see a good reason that it is currently implemented that way. It's terribly inefficient when the fulltext query returns a lot of matches. For example, see https://phabricator.wikimedia.org/T159575#3072387

I'm going to try to figure out how to convert to cursor paging, however, I'd love some input from @epriestley about why it's implemented with offset paging. Is it just because it was done that way originally and hasn't been converted, or is there some fundamental limitation that I am not understanding?

Event Timeline

The results are ordered by "match strength", but this is computed at runtime and isn't unique to each result: we can't start with the 100th result and use it to build any sort of clause like WHERE rank > 12345 that starts on the 101st result.

However, I think the issue in the downstream task is not related to pagination, it's because we have to do the JOIN in the application.

If the user issues a query in Maniphest like this:

  • Priorities: High
  • Contains Words: Wikimedia

...neither the Maniphest index nor the fulltext index can satisfy the entire query. That is, ElasticSeach doesn't know how to find tasks with "high" priority. Maniphest doesn't know how to find tasks which contain "Wikimedia".

We execute this query by asking the fulltext for 10,000 results containing "wikimedia", then searching within them for tasks with "high" priority.

If we ask the fulltext engine for 100 results instead, we may get back 100 results in statuses other than "high". None of these will match the other half of the constraint, so we'll tell the user "nothing matched your query". This is wrong.

We can still end up in this situation, but it's less likely with 10,000 results.

This feature was contributed in the heady days of 2012 (see T1305) when times were simpler and I was full of optimism and willing to accept features with major scalability problems upstream because they were immediately useful in some cases. The task discusses these limitations and some refinements, although application joins are inherently not scalable and none of the refinements are very good.

To fix this properly, either ElasticSearch needs to be able to drive the entire query (which is probably very difficult: I suspect queries like "show only tasks without open subtasks grouped by assigned" will be challenging to execute with ElasticSearch) or MySQL needs to be able to drive the entire query (which seems likely to require building a fulltext engine from primitives, given that FULLTEXT is broken in your environment).

Or we could just remove the feature, and require fulltext search to use the limited set of constraints in global search which the fulltext engine can entirely support on its own.

From the downstream task, I'm not sure what's actually slow. If ElasticSearch is satisfying this query quickly, I wouldn't expect Phabricator to take very long once the query returns: we basically just pull the PHIDs out and pump them to AND phid IN (...), which should not take 1m45s.

To start with here, let's identify exactly what's slow? It's not clear to me from the downstream task unless I'm just misreading, and not evident to me from reasoning about the behavior of this query. This query strategy is fundamentally not scalable, but I would not expect it to be slow unless the ElasticSearch query is slow.

Here on secure, running a similar query with the term "phabricator" takes about 300ms to do the fulltext part and 12ms to do the application join part. If we assume that the fulltext part is fast against ElasticSearch in your environment and also takes about 300ms, that leaves us with the application join executing slightly under 9,000x slower in your environment. Possible, but I suspect there's more to this story.

(That is, the expected failure mode of this query when it fails to scale is "silently return an incomplete result set", not "be really slow".)

I'll investigate further into the slowness (I wasn't involved in the debugging so far so I don't know more than what's said in the task)

I found a problem, however, that isn't really related to the downstream report.

When I started looking at pagination in the elasticsearch fulltext engine, I noticed that phabricator requests 101 results for page one, then 201 results for page 2 (then just skips the first 100 in php land) and so on...

Since the sort order is determined by elasticsearch, it seems we should be able to do paging without fetching all the previous pages from elasticsearch. I haven't figured out how to bypass this behavior and I feel like I must be missing something.

I believe that's unavoidable with offset-based paging (the solution is to switch to cursor-based paging, but I think we can't do that for fulltext).

Basically, the core issue is that the second page doesn't start after the 100th result, it starts after the 100th visible result. This might be the 815th raw result if many results are not visible to the viewer.

ElasticSearch can't apply policies, so it can't ask it return 100 visible results. We ask it for the first 101 results, making the optimistic assumption that they'll usually all be visible. (We ask for 101, instead of just 100, so we can tell if there's a "next page" or not.)

After we get the results back, we filter them. If they're all valid and the viewer can see them all, we're good and can just show the page.

However, suppose some are not visible to the viewer. Maybe they are invalid (for example, they got deleted and the index just isn't up to date yet) or maybe the viewer can't see them (they're private and not visible to the viewer). So of those 101 results, perhaps we only end up with 83 usable results after policy filtering. In this case, we ask for another page of results (currently, I believe the algorithm asks for exactly 18 additional results: the 17 missing results for a complete display page, plus one to control whether we show that there's a "next page").

Let's say we get lucky this time and they're all visible. So we have all the results we need in order to display the page, plus one more that tells us we should draw a "Next Page" button.

But where do we link the "Next Page" button to?

  • If we link it to ?offset=100, we get the wrong result: those 17 extra results we filled in will be the first 17 results on page 2.
  • If we link it to ?offset=117, we'll get the right result, but we disclose that there are 17 more objects the viewer can't see which contain the text they searched for. This is a substantial information leak and attackers can use it to discover meaningful information about objects they can not view by issuing more queries.
  • If we link it to ?after=<id of last visible result> we solve both these problems, but this is cursor-based paging and requires results be orderable in the query.

Related topics:

Why not ask for 111 results instead of 101? We could, on average, reduce the number of re-fetches to fill in missing results by asking for a little more stuff than we actually need. If we asked for 10 extra results, we wouldn't need to refetch if fewer than 10 results got filtered out. The optimal number across all installs is probably larger than 101, although maybe not much larger (I'd guess it might even be 102 or 103). My guess is that it's "usually 101" and "occasionally a bit bigger than 101". We currently just don't have any data to guide this choice or any evidence that this is an issue in practice. Changing this behavior is ~5 lines of code when we eventually get data to guide the decision.

What if all or almost all results are not visible? The query "overheats". See discussion in T11773.

Can't you perform a timing attack and disclose the information anyway? Probably -- we unavoidably do different amounts of work when results are filtered -- but I believe it's a lot harder than getting back a hard ?offset=117 answer. I suspect this is at the very edge of feasibility today, and we can reasonably mitigate it with query limits, overfetching to smooth timing differences, and maybe specific countermeasures like random timing jitter to push attackers into the query limit faster.


I'm also not certain that we can't use cursor-based paging in ElasticSearch -- if the "rank" is a stable value from query to query and you can order results by "rank desc, phid" or similar that's good enough for cursor paging. Or if there's some way to say "start results at the result with phid = X" -- there isn't, as far as I know, in MySQL. But we can't use "Start results at the result with offset 117" because of the information disclosure that represents.

This is only an issue when clicking "Next Page..." on query results, which presumably users should rarely/never need to do -- if you get 100 results back and the thing you're searching for isn't in the first 100 results, it's probably usually faster to refine your query than look through page 2? Google only shows 10 results per page, and how often do you find what you're looking for on page 11? This isn't a perfect analogy but maybe provides some kind of framing.

Particularly, see T8285: "Next" is currently completely broken in some cases and almost no one has noticed.

I agree that results should almost always be on the first page and ideally near the top of the first page. I'm also tempted to say that 10 results per page might be closer to the right number than 100.

It turns out that the downstream issue appears to be that we are somehow accidentally querying mysql (in addition to elasticsearch) and the mysql query is taking a really long time to execute.