Typeahead project proposals in Maniphest advanced search do not always include exact matches
Open, Normal, Public

Description

Follow-up to T6102: Improve typeahead behavior around exact matches / D11305, which does not seem to be fully fixed.
(Not sure if months later you prefer to reopen old tasks or prefer new ones; feel free to merge into T6102.)

Steps:

  1. Confirm that a project named Discovery (its primary name; the project had been renamed at some point) exists in WM Phabricator.
  2. Go to advanced search in Maniphest: https://phabricator.wikimedia.org/maniphest/query/advanced/
  3. Enter the string "discovery" (an exact match) in the "Projects" field

Actual outcome:
The five proposals do not include the "Discovery" project:

  • discovery-system
  • In Any: discovery-system
  • Not In: discovery-system
  • Discovery-Cirrus-Sprint
  • In Any: Discovery-Cirrus-Sprint

Expected outcome:
"Discovery" should be among the five proposals, preferably as the first match.

Event Timeline

aklapper raised the priority of this task from to Needs Triage.
aklapper updated the task description. (Show Details)
aklapper added projects: Search, Infrastructure.
aklapper added a subscriber: aklapper.

D15981 doesn't, strictly speaking, prioritize exact matches. However, it should fix the underlying issue.

Here's a clue about the behavior:

https://phabricator.wikimedia.org/typeahead/class/?class=PhabricatorProjectDatasource&q=discovery

Here's what it shows, currently:

Note that "Discovery" has the internal name:

search-team search-and-discovery search-and-discovery-department sad-maps discovery Discovery

...likely because it was called "Search Team" when it was first created. That's making it sort as though it was called "search-team" instead of "Discovery".

D15981 fixes it so it will have an internal name with the display name first, so it will have the new name:

Discovery search-team search-and-discovery search-and-discovery-department sad-maps discovery

This should sort as you expect.
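To illustrate why the token order matters, here is a minimal sketch. It assumes (this is not confirmed from the Phabricator source) that candidates are ordered by comparing their whole internal name as a lowercased string. The helper `ranked` is hypothetical, and only the two "Discovery" internal names are taken from this task; the internal names of the other two projects are made up for the example.

```python
# Hypothetical data: only the "Discovery" internal names come from this task.
# The internal names of the other two projects are assumed for illustration.
OLD_INTERNAL = {
    "Discovery": "search-team search-and-discovery "
                 "search-and-discovery-department sad-maps discovery Discovery",
    "discovery-system": "discovery-system",
    "Discovery-Cirrus-Sprint": "discovery-cirrus-sprint",
}

# After the change described above: display name first, then the other tokens.
NEW_INTERNAL = dict(OLD_INTERNAL)
NEW_INTERNAL["Discovery"] = (
    "Discovery search-team search-and-discovery "
    "search-and-discovery-department sad-maps discovery"
)


def ranked(internal_names):
    """Order display names by their lowercased internal name (assumed rule)."""
    return sorted(internal_names, key=lambda display: internal_names[display].lower())


print(ranked(OLD_INTERNAL))
print(ranked(NEW_INTERNAL))
```

Under these assumptions, "Discovery" sorts last with the old internal name (it is compared as "search-team ...") and first with the new one, because a space sorts ahead of a hyphen.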

Until that patch lands and deploys, you may be able to fix this (and other specific cases) by doing this:

  • Edit the project.
  • Remove all "Additional hashtags".
  • Save project.
  • Edit the project again.
  • Put all the hashtags back.
  • Save the project again.

In theory, that will bump "discovery" to the head of the list and give you better behavior until the real fix arrives. Not 100% sure that will actually work, though.

T6102 and the downstream also contain discussion of a second case which does center around exact matches -- searching for "Wikidata" -- but I can't reproduce it at HEAD:

There is probably some degenerate result set where you have projects named AAA-zebra-1 through AAA-zebra-99 and then one project named zebra, and searching for zebra does not show that result first, but I suspect that no actual data in the wild hits this case.

I'm going to pause this since I think D15981 fixes all reproducible issues. If issues remain after D15981 deploys, let me know how to reproduce them -- fixing this stuff tends to be very data-dependent so I have a much better shot at it if I can observe the behavior directly.

(I marked D15981 as fixing this but I'm going to leave it open for confirmation since I'm not completely sure there weren't more cases that I missed.)

Here's at least one specific reproduction case which still isn't working well:

Here's what happens when adding wikidata project tag using the comment form typeahead:

And when clicking the browse button to see more results:

Wikidata isn't even in the first full 'page' of results in the browse dialog.

It's expected that the ordering in "Browse" is alphabetical/raw, and I don't plan to change that.

I think wikidata worked properly for me before mostly by accident.

D16094 should improve things, let me know if you identify specific problems after deploying that. It's deployed on this server now, although we have fewer similarly-named projects so it's harder to see the effects.

Deployed now in downstream. Definitely an improvement, thanks!

Entering "Wikidata" in the "Tags" field on https://phabricator.wikimedia.org/maniphest/query/advanced/ still does not show that project. See https://phabricator.wikimedia.org/T76732#2384782 for a list of downstream tests.

Thanks for testing, that's quite helpful.

It looks like the remaining "wikidata" issue is because there are more than 100 results which are alphabetically earlier than "wikidata" (many of which are logical/function results) so the result set is getting cut off on the server side before the client gets a chance to promote the result to the top:

https://phabricator.wikimedia.org/typeahead/class/?class=PhabricatorProjectLogicalDatasource&q=wikidata

This currently shows a lot of results, ending somewhere in the not(...) function before actually getting to the "wikidata" tag:

I'll look into improving this.

I have some possible approaches for cheating on the "more than 100 results" case, but I think we ultimately need to put the prefix paging rule on the server side, so we return all prefix matches and then all content matches. That's kind of a big mess, but all the ways we can cheat will break down at some point (e.g., when you have 100 "wikidata" projects that are alphabetically earlier than "wikidata") so I think it's not avoidable.
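A rough sketch of the difference between truncating an alphabetical result set and applying the prefix rule before paging. This is not the actual datasource code: the function names, the limit of 100, and the generated project names are assumptions chosen to mirror the "wikidata" case described above.

```python
def naive_page(names, query, limit=100):
    """Alphabetical matches, truncated before any promotion can happen."""
    q = query.lower()
    matches = sorted((n for n in names if q in n.lower()), key=str.lower)
    return matches[:limit]


def prefix_first_page(names, query, limit=100):
    """Return all prefix matches first, then the remaining content matches."""
    q = query.lower()
    prefix = sorted((n for n in names if n.lower().startswith(q)), key=str.lower)
    content = sorted(
        (n for n in names if q in n.lower() and not n.lower().startswith(q)),
        key=str.lower,
    )
    return (prefix + content)[:limit]


# Hypothetical data: 150 results that sort alphabetically before "Wikidata".
names = [f"aaa-wikidata-{i:03d}" for i in range(150)] + ["Wikidata"]

print("Wikidata" in naive_page(names, "wikidata"))
print(prefix_first_page(names, "wikidata")[0])
```

With the naive paging, "Wikidata" never reaches the client, so no client-side reordering can recover it. With the prefix rule applied before truncation, it is the first result of the page.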

I believe this is now resolved, in the sense that I expect all specific, reproducible examples I'm aware of to function in a way consistent with user expectation. In particular:

  • D16826 fixed an issue with case-sensitive sorting on the client, described in T10380 ("peter" vs "Peter") and downstream in https://phabricator.wikimedia.org/T124653 ("joe" vs "Joe").
    • Users "Joe" and "Peter" should now be the first match for queries "joe" and "peter".
  • D16838 fixed an issue where the desired match (like "wikidata") does not appear in the first page of results from the server, described here in T8510#180681.
    • Project "Wikidata" should now be the first match for query "wikidata", even if there are hundreds of other similarly named project and function results available in the result set.
    • This is likely the cause of poor results for "mediawiki", for the same reasons, and I expect they are also now resolved.
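As an illustration of the case-sensitivity issue (not the actual client code), the sketch below ranks candidates by exact match, then prefix match, then alphabetically, once with raw string comparison and once with case folding. The function names and the sample user list are hypothetical.

```python
def rank_case_sensitive(candidates, query):
    """Exact match, then prefix match, then alphabetical, comparing raw strings."""
    def key(name):
        return (name != query, not name.startswith(query), name)
    return sorted(candidates, key=key)


def rank_case_insensitive(candidates, query):
    """Same ranking, but both sides are lowercased before comparing."""
    q = query.lower()

    def key(name):
        n = name.lower()
        return (n != q, not n.startswith(q), n)
    return sorted(candidates, key=key)


# Hypothetical user names.
users = ["Peter", "peterson", "Peter-Pan"]

print(rank_case_sensitive(users, "peter"))
print(rank_case_insensitive(users, "peter"))
```

In the case-sensitive variant, "Peter" is neither an exact nor a prefix match for "peter", so "peterson" is ranked first. With case folding, "Peter" is treated as the exact match and moves to the top.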

After updating, please let us know if you run into additional cases where the results are poor.

epriestley triaged this task as Normal priority.
epriestley added a project: Typeahead.

I caught another case of this: here, "Phacility" should be first, but is not.

Here's the backend data:

Problems with this:

  • The server-side sort is not case-insensitive. This isn't inherently problematic, but should be corrected for consistency. This could create some problems with large, paginated queries with many similar results.
  • The prefix sort is considering internal token components which are not parts of the display name. It should not.
    • We could either define a separate matchable display name, or separate the components of the matchable name with a delimiter character that will sort ahead of spaces (like tab or newline).
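The delimiter idea can be sketched as follows. This assumes the matchable name is the display name followed by the extra tokens, and that ordering compares those strings character by character; `matchable_name`, `ranked`, and the project data are hypothetical, since the actual backend data from this comment is not reproduced here.

```python
def matchable_name(display_name, extra_tokens, delimiter):
    """Join the display name and extra tokens into one lowercased sort string."""
    return delimiter.join([display_name] + list(extra_tokens)).lower()


# Hypothetical projects: "Phacility" carries an extra internal token.
projects = {
    "Phacility": ["zz-legacy-alias"],
    "Phacility Blog": [],
}


def ranked(delimiter):
    """Order display names by their matchable name built with `delimiter`."""
    return sorted(
        projects,
        key=lambda name: matchable_name(name, projects[name], delimiter),
    )


print(ranked(" "))   # space-delimited tokens
print(ranked("\t"))  # tab-delimited tokens
```

With a space delimiter, the extra token takes part in the comparison against "Phacility Blog" and pushes "Phacility" down. A tab (0x09) sorts ahead of a space (0x20), so the display name "Phacility" on its own sorts before any display name that continues with a space.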

phabricator.wikimedia.org is running a version including these changes plus D16886 on top. So far it looks like a great improvement!

Testing in the Tags field of https://phabricator.wikimedia.org/maniphest/query/advanced/ , some minor things (only regarding "Not In:" and "In Any:") surprised me. Just sharing them here FYI:

  • Entering Labs not displays Not In: Labs as 4th item, however entering labs not in only displays Not In: Labs-Infrastructure but not Not In: Labs
  • While entering Wikidata now does show the Wikidata project as the first result (yay!), I have not been successful in getting Not in: Wikidata offered somehow. Entering wikidata in I only get three options related to the MediaWiki-extensions-WikibaseMediaInfo project.

The most-likely-to-work secret magic for "Not in: Wikidata" is not(wikidata):

I think that should work for "Labs", too:

This is sort of vaaaaaaguely hinted at if you read between the lines on the help page (click the magnifying glass, then "Reference: Advanced Functions") but we can probably improve the behavior.

Ah, thanks! I've asked downstream if anyone still runs into any issues, but so far I also believe this is now resolved.

In https://phabricator.wikimedia.org/T76732#2803813 ksmith brings up that

In the search bar in the toolbar at the top of the screen, searching for "Maps" brings up Maps as the third option. Searching for "Discovery" brings up Discovery as the second option.
When editing a task and entering tags, autocomplete does bring up Maps and Discovery as the first items.

but that could be a separate (unprioritized) task I guess.

Do you want that described situation filed as a separate task, or keep this one opened for it?

This one is fine, I'm still going to fiddle with "in" too.

Specifically, this task is about how the JavaScript in typeaheads ranks and matches choices.

In https://phabricator.wikimedia.org/T76732#2803813 ksmith brings up that "In the search bar in the toolbar at the top of the screen, searching for "Maps" brings up Maps as the third option. Searching for "Discovery" brings up Discovery as the second option."

For the record, I cannot reproduce that behavior anymore; I am currently not aware of any other examples of this issue in Wikimedia's Phab instance.

So from my POV this ticket could get resolved.