- User Since
- Nov 28 2011, 9:35 AM (286 w, 1 d)
FWIW I partially implemented this in Wikimedia's fork, and I did so in a reusable way. I'd like to eventually upstream it but I'm not sure that my approach is desired upstream. I'll give it another shot though.
Yeah I think we are good now.
Mon, May 8
phabricator trading cards ftw.
I think that @epriestley just made capitalism and socialism obsolete by re-imagining everything in terms of Mana points. You sir, have won the internet.
This should clearly be priority: Unbreak now!
Tue, May 2
@epriestley: I'll submit a new revision that expands on the resultset class as I mentioned above. Of course this is without any expectations regarding when you'll have time to look at it :)
Wed, Apr 26
Changing the type is going to run into the issue of what to do about the fields which differ between the two types. Fields which are present in the old type will continue to be displayed unless you clear them when changing types.
Mon, Apr 24
@epriestley: Given that this is an elastic-only feature and not really wanted upstream, and given that you have implemented result sets as objects in rP3245e74f16bb: Show users how fulltext search queries are parsed and executed; don't query…, I think I should abandon this and submit a new revision with just the PhabricatorSearchResultEngineExtension infrastructure.
Apr 8 2017
Apr 7 2017
Apr 5 2017
Warning: This code is still a bit messy, I will clean things up a bit more in a future diff.
Significant refactor to use a PhabricatorFulltextResult object to represent search
result hits. Results views are made extensible using the EngineExtension pattern:
Apr 4 2017
Just to see what would happen, I tried returning 100 dummy results + the real results. That didn't seem to satisfy the pager, so perhaps I'm wrong about what's actually happening. I spent quite a bit of time tracing through PhabricatorPolicyAwareQuery and still didn't ascertain exactly what is going on. However, it seems like it's skipping the results in PhabricatorPolicyAwareQuery::execute(), right around line 211.
The problem is that we do pagination at a higher layer and the cursor skips 100 results when the offset is set to 100. So if we also apply an offset to elastic, then we skip 100 results in two places, thus skipping to the 200th result.
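To make the double-offset arithmetic concrete, here is a toy model in Python. It is only a sketch of the description above, not Phabricator's actual pager or the Elasticsearch client: `backend_search` and `pager_slice` are hypothetical stand-ins, and the hits are just numbers.

```python
def backend_search(hits, offset, limit):
    """Hypothetical stand-in for the search backend: return one window of hits."""
    return hits[offset:offset + limit]


def pager_slice(rows, offset, limit):
    """Hypothetical stand-in for the higher layer, which applies the offset itself."""
    return rows[offset:offset + limit]


all_hits = list(range(1, 501))  # pretend ranked hits, numbered from 1
offset, limit = 100, 10

# Offset applied in both layers: the backend skips 100, then the pager skips 100 more.
backend_window = backend_search(all_hits, offset, offset + limit)
double_skipped = pager_slice(backend_window, offset, limit)

# Offset applied only by the pager: the backend returns the first offset + limit hits.
backend_rows = backend_search(all_hits, 0, offset + limit)
single_skipped = pager_slice(backend_rows, offset, limit)

print(double_skipped[0], single_skipped[0])
```

In this model the page that should start at hit 101 starts at hit 201 when both layers apply the offset. Leaving the offset to one layer only, and asking the backend for `offset + limit` rows, is one way to avoid that, at the cost of fetching rows that are then thrown away.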
track upstream branch so arcanist can push to staging.
Apr 3 2017
I implemented this recently but then pulled it out because ... project slug lookup is a gigantic mess right now.
@dreadlord2203 indeed, hiding those tags would be useful. I found one problem with using a separate repository for change handoff, and that is access control. If you maintain several repositories with different access controls, then you would have to duplicate each of them and maintain a second set of access controls for the corresponding staging repositories. This is a lot of extra work and it would be easy to make a mistake and leak the staging repo to people who shouldn't be able to see it.
Apr 2 2017
Maybe let bin/search commands target a specific service.
It would be nice to build a practical test suite instead, where we put specific documents into the index and then search for them.
It's definitely not obvious that objects are skipped based on the object version rather than what's in the index. So yeah this is helpful.
Good idea. Definitely much easier than storing per-index version numbers and trying to keep those in sync.
Yes this seems reasonable.
Really try to avoid suggesting anyone configure Elasticsearch ever for any reason.
nice! Big improvement over my initial document.
Mar 30 2017
So I ran into one problem when deploying the latest code to WMF production. We now throw a setup error if we have a cluster with no readable hosts. It actually makes sense to have a cluster that is write-only so that setup error is bogus.
I figured out that one component of the WMF scaling issue was caused by the very pathological case of searching for a word that appears in very many documents, exacerbated by lots of simultaneous queries fired off from users repeatedly searching from the typeahead in the related tasks editor. I agree that we still don't know the exact cause of the 100x slowdown.
@cos: probably should not have a default value.
Mar 29 2017
Don't use $future->write, that doesn't work in all cases
Mar 28 2017
push to staging for harbormaster
Better formatting of setup warning messages.
This fixes the stemmer and tokenizer to do a better job of matching words.separated.by.punctuation as well as other issues found by @epriestley.
I've updated D17564: Address some New Search Configuration Errata to address the tokenization and word stemming issues.
- Fixed the stemmer. user matches users and vice versa.
- Added a different tokenizer so that this.is.a.test tokenizes to the following:
Mar 27 2017
trying once more...
Try to make harbormaster happy by setting repository.callsign globally in ~/.arcrc
Maniphest advanced search is somewhat buried, indeed. I think one easy solution to this would be to add "Task search" to the main phab menu (using the new custom menus feature)... In fact, I think I will do that now at https://phabricator.wikimedia.org
f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q returns the same results as
f a c t o r y s u r p l u s z z q q z z q q so it appears to be treating those as individual single-letter tokens. strange.
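One way this could happen is a tokenizer that treats every non-alphanumeric character as a separator. The Python sketch below illustrates that assumption only; it is not the analyzer the search backend actually uses, and `tokenize` is a hypothetical helper.

```python
import re


def tokenize(text):
    """Assumption: tokens are maximal runs of letters and digits, lowercased."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


starred = tokenize("f*a*c*t*o*r*y")
spaced = tokenize("f a c t o r y")

# Under this assumption both queries reduce to the same single-letter tokens.
print(starred == spaced, starred)
```

If the real tokenizer behaves like this, the asterisks are never seen as wildcards; they are simply dropped as separators, which would explain the identical results.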
I think it would make a lot of sense to construct the two queries separately (and in parallel) with a short timeout, then handle the timeout gracefully, allowing the user to refine their query further. This would avoid the denial-of-service situation which happened to Wikimedia more than once due to users repeatedly executing really expensive searches until MySQL fell over from the load.
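A rough sketch of that idea in Python, using only the standard library. Everything here is hypothetical: `run_with_timeout`, `fast_query`, and `slow_query` are illustrative names, the sleep stands in for an expensive search, and the timeout values are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timeout(queries, timeout_seconds):
    """Run each named query in parallel; return (results, names that timed out)."""
    results, timed_out = {}, []
    pool = ThreadPoolExecutor(max_workers=len(queries))
    futures = {name: pool.submit(fn) for name, fn in queries.items()}
    deadline = time.monotonic() + timeout_seconds  # one shared deadline for all queries
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = future.result(timeout=remaining)
        except FutureTimeout:
            timed_out.append(name)
    pool.shutdown(wait=False)  # do not block the caller on the slow query
    return results, timed_out


def fast_query():
    return ["T1", "T2"]  # pretend hits


def slow_query():
    time.sleep(1.0)  # pretend expensive search
    return ["T3"]


results, timed_out = run_with_timeout({"fast": fast_query, "slow": slow_query}, 0.2)
if timed_out:
    print("Some results are missing; try refining your query:", timed_out)
```

Note that this only stops waiting for the slow query; the worker thread keeps running until it finishes. To actually shed load, the timeout would also have to be enforced by the database or search service itself.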