Page MenuHomePhabricator

Pagination breaks when a task is tagged with two subprojects that have a mutual ancestor
Closed, ResolvedPublic


According to, it's okay for a task to be tagged with multiple subprojects that have a common ancestor project. When you do that and then query for tasks tagged with the ancestor (either from maniphest advanced search or via conduit) any result page which contains the multi-subproject'd task will not have a "Next" button or an "after" value for the pagination cursor even if there are more results that should be shown.

Repro Steps

  1. Create a project (we'll call it A)
  2. Create two subprojects of A (B and C)
  3. Create several tasks each tagged with one of the above projects
  4. Create a task that is tagged with B and C (we'll call this Problem Task)
  5. Edit one of the tasks you created in step 3 so it has a more recent update time than Problem Task
  6. Go to /maniphest/query/advanced/
  7. Query for projects tagged with A, Group By None, Order By Date Updated (Latest First), Limit 2.

At this point you'd expect to see a "Next" button at the bottom of the page, but there is not. If you increase the Limit to 10, you'll see that it returns more results than 2. Any page of results that contains Problem Task won't have a "Next" button on it (ie. if you set Limit to 1 and page through to Problem Task). The same happens when you perform the same query via /conduit/method/; there is no "after" value in the cursor even though there should be more results.

Version Info

phabricator c4392ba067575c5c5a6523df25dd5b341bcc1be4 (May 24 2017)
arcanist 129d51fa0936c9bae48fadf3a3f39e26d69d24da (May 18 2017)
phutil a900d7b63e954e221efe140f0f33d3d701524aae (Sun, Apr 23)

Event Timeline

I couldn't find a project with multiple subprojects on this install otherwise I would have linked you to an example

epriestley added a subscriber: epriestley.

Thanks! This reproduces for me.

haha nice. I just posted this comment on our downstream task tracking this.

For context, we issue a query that looks like this:

SELECT * FROM task JOIN edge ...

If we know that each task has zero or one corresponding rows in the edge table, we do not need a GROUP BY clause.

However, if some tasks have two or more corresponding rows in the edge table, we might get results back like this:

| T123 | Foo | Edge AAA |
| T124 | Bar | Edge BBB |
| T124 | Bar | Edge CCC |

Here, T124 relates to both edges BBB and CCC, so we get two rows back.

When this may happen we add a GROUP BY task.phid clause to the query.

Normally, queries like this don't require a GROUP BY:

SELECT * FROM task JOIN edge ON t.phid = e.src AND e.type = X AND e.dst = Y

But queries like this do:

SELECT * FROM task JOIN edge ON t.phid = e.src AND e.type = X AND e.dst IN (Y, Z)

We incorrectly considered queries using the "Ancestor Operator" to be part of the first group; they are part of the second group. Since they were categorized wrong, we didn't include GROUP BY and got duplicate tasks back.

Later, this cascaded into confusion at the application level after MySQL said "there are 5 results" but the Query only returned 4 unique results.

We could do things like:

  • Always add GROUP BY (but: this might be bad luck).
  • Throw if raw results and unique results differ (but: I think this isn't always an error, strictly, and there's no formal/general method for identifying unique results, and no formal/general place where they get reduced to only unique results).

So D18012 is not the most general fix we might imagine, but the more categorical design errors here rarely have much impact and I think everyone's probably happier here with a one-line-fix than "rebuild all queries in all applications to understand and detect result uniqueness errors".

And thanks for doing the legwork to pull a reproduction case out of this.