Mar 28 2017
In D17564#210986, @epriestley wrote: Just as a general workflow suggestion, I'd encourage you to do this as a bunch of small changes instead of one big "fix everything" change.
Seems legit.
push to staging for harbormaster
fix '\n'
Better formatting of setup warning messages.
This fixes the stemmer and tokenizer to do a better job of matching words.separated.by.punctuation, as well as other issues found by @epriestley.
I've updated D17564: Address some New Search Configuration Errata to address the tokenization and word stemming issues.
- Fixed the stemmer: user matches users and vice versa.
- Added a different tokenizer so that this.is.a.test tokenizes to the following:
- this.is.a.test
- this
- is
- a
- test
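The tokenization and stemming behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not the analyzer configuration from D17564: `tokenize` and `naive_stem` are hypothetical names, and the plural-stripping rule is a toy stand-in for a real stemmer.

```python
import re

def tokenize(text):
    """Emit each whitespace-separated word whole, then its punctuation-separated parts."""
    tokens = []
    for word in text.split():
        tokens.append(word)
        # Split on runs of non-word characters, dropping empty pieces.
        parts = [p for p in re.split(r"[^\w]+", word) if p]
        tokens.extend(p for p in parts if p != word)
    return tokens

def naive_stem(token):
    """Toy stemmer (assumption): strips one trailing 's'. Real stemmers do far more."""
    t = token.lower()
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        return t[:-1]
    return t

print(tokenize("this.is.a.test"))
print(naive_stem("users") == naive_stem("user"))
```

Keeping the whole word as a token alongside its parts means a search for either maniphest.query or maniphest can match the same document.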
Mar 27 2017
trying once more...
Try to make harbormaster happy by setting repository.callsign globally in ~/.arcrc
Maniphest advanced search is somewhat buried, indeed. I think one easy solution to this would be to add "Task search" to the main phab menu (using the new custom menus feature)... In fact, I think I will do that now at https://phabricator.wikimedia.org
In T12450#216943, @epriestley wrote:
- Searching for f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q produces nonsensical results (many results, when I would expect no results: the results do not contain that sequence of letters in order).
- Searching for user fails to find task "Grant users tokens when a mention is created", suggesting that stemming is not working.
- Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.
- Searching for maniphest fails to find task "maniphest.query elephant", suggesting that tokenization in ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).
f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q returns the same results as f a c t o r y s u r p l u s z z q q z z q q, so it appears to be treating those as individual single-letter tokens. Strange.
I think it would make a lot of sense to construct the two queries separately (and in parallel) with a short timeout, then handle the timeout gracefully allowing the user to refine their query further. This would avoid the denial of service situation which happened to Wikimedia more than once due to users repeatedly executing really expensive searches until mysql fell over from the load.
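A rough sketch of that idea in Python, assuming each query is a plain callable: `run_with_timeout` and the query names are hypothetical, and this is not how Phabricator executes searches. Note that a timed-out thread keeps running here, so a real implementation would also need the backend to enforce its own query timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def run_with_timeout(queries, timeout_s):
    """Run each named query callable in parallel; report slow ones as timed out."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(queries)))
    futures = {name: pool.submit(fn) for name, fn in queries.items()}
    deadline = time.monotonic() + timeout_s  # one shared deadline for all queries
    results = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = ("ok", future.result(timeout=remaining))
        except TimeoutError:
            # Degrade gracefully: the caller can show partial results
            # and ask the user to refine the query.
            results[name] = ("timeout", None)
    pool.shutdown(wait=False)  # do not block the request on slow queries
    return results

def fast_query():
    return ["T1"]

def slow_query():
    time.sleep(0.5)  # stands in for an expensive search
    return ["T2"]

results = run_with_timeout({"query_a": fast_query, "query_b": slow_query}, timeout_s=0.1)
for name, (status, rows) in results.items():
    print(name, status, rows)
```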
Mar 26 2017
In T12450#216943, @epriestley wrote: I ran into a lot of confusion because the versioned object indexes are not namespaced per-service. Basically, if you insert version 95 of a document into Elastic, the indexer thinks that version 95 doesn't need to go into MySQL, even though it does. So when you run bin/search index ..., you may get only a subset of the updates you actually need. The object index versions need to change to become engine-aware so they are stored per-service, not globally, and/or the whole mechanism needs to include a hash of cluster.search or just be turned off. Until this is fixed, it can be worked around by using --force everywhere.
bin/search index might reasonably provide summary output about this ("392 documents were not indexed because they haven't changed, use --force to update them.").
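To illustrate the per-service versioning being proposed, here is a small Python sketch. `IndexVersions` and `index_all` are hypothetical names, and the in-memory dictionary stands in for whatever storage the real indexer uses. The point is only that the version key includes the service, so indexing into Elastic no longer hides a missing MySQL update.

```python
class IndexVersions:
    """Tracks the last indexed version of each document, keyed per service."""

    def __init__(self):
        self._seen = {}  # (service, doc_id) -> version

    def needs_index(self, service, doc_id, version, force=False):
        return force or self._seen.get((service, doc_id)) != version

    def mark_indexed(self, service, doc_id, version):
        self._seen[(service, doc_id)] = version

def index_all(versions, services, docs, force=False):
    """Index every document into every service; return how many were skipped."""
    skipped = 0
    for service in services:
        for doc_id, version in docs.items():
            if versions.needs_index(service, doc_id, version, force):
                # A real indexer would write the document to the service here.
                versions.mark_indexed(service, doc_id, version)
            else:
                skipped += 1
    return skipped

versions = IndexVersions()
versions.mark_indexed("elastic", "T1", 95)  # version 95 already in Elastic only
skipped = index_all(versions, ["elastic", "mysql"], {"T1": 95})
print(f"{skipped} documents were not indexed because they haven't changed, "
      "use --force to update them.")
```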
In T12450#216943, @epriestley wrote:
- Searching for f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q produces nonsensical results (many results, when I would expect no results: the results do not contain that sequence of letters in order).
- Searching for user fails to find task "Grant users tokens when a mention is created", suggesting that stemming is not working.
- Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.
- Searching for maniphest fails to find task "maniphest.query elephant", suggesting that tokenization in ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).
- Searching for users -blue returns a huge number of results: significantly more than users. Expected behavior: fewer results, omitting those results matching blue.
- Searching for users blue returns more results than users or blue. Expected behavior: fewer results, because only results which match "users" AND "blue" are returned. The result set includes completely irrelevant results.
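The expected behavior in the last two items can be stated as set operations. The Python sketch below is only a model of those semantics over a toy inverted index; `search` and the sample index are hypothetical and say nothing about how the Elasticsearch query is actually built.

```python
def search(index, query):
    """index maps term -> set of document ids. Terms are ANDed; '-term' excludes."""
    terms = query.split()
    required = [t for t in terms if not t.startswith("-")]
    excluded = [t[1:] for t in terms if t.startswith("-")]
    if not required:
        return set()
    # AND semantics: a document must contain every required term.
    hits = set.intersection(*(index.get(t, set()) for t in required))
    # Exclusion can only shrink the result set, never grow it.
    for t in excluded:
        hits -= index.get(t, set())
    return hits

index = {"users": {1, 2, 3}, "blue": {2, 4}}
print(search(index, "users"))
print(search(index, "users -blue"))
print(search(index, "users blue"))
```

Under this model, both `users -blue` and `users blue` always return a subset of the results for `users`, which is the property the reported results violated.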
@epriestley: Thanks for the detailed feedback... I'll get to work ;)
- Has T8602 been resolved?
I can not reproduce it on wikimedia's install.
- Write an "Upgrading: ..." guidance task with narrow instructions for installs that are upgrading.
TODO
- Do we need to add an indexing activity (T11932) for installs with ElasticSearch?
Yes, I think so
- We should more clearly detail exactly which versions of ElasticSearch are supported (for example, is ElasticSearch <2 no longer supported?). From T9893 it seems like we may only have supported ElasticSearch <2 before, so are the two regions of support totally nonoverlapping and all ElasticSearch users will need to upgrade?
I haven't been testing with elasticsearch < 2.0 so this might break backwards compatibility. It wouldn't be difficult to fix any compatibility issues though, with a tiny bit of testing.
With the elasticsearch 'simple_query_string' query parser it only works if you use *pricot, for example, outside of quoted phrases.
Note: there will finally be a little bit of documentation once this install rebuilds the diviner docs. The URL should (eventually) be https://secure.phabricator.com/book/phabricator/article/cluster_search/
This should work just fine with the index mapping and query generation in rPe41c25de5050: Support multiple fulltext search clusters with 'cluster.search' config
resubmit with arc diff --config repository.callsign=P
Addressed epriestley's feedback.
try to get harbormaster to build (push to staging?)
Mar 25 2017
- actually, actually utilize the health monitoring...
- Improved the status monitoring UI in config/cluster/search/
- Actually utilize the health monitoring cache to avoid connecting to downed servers.
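As a sketch of what reading the health cache before connecting could look like, here is a small Python model. `HealthCache`, `first_usable`, the 30-second TTL, and the host names are all assumptions for illustration, not the PhabricatorSearchCluster implementation.

```python
import time

class HealthCache:
    """Remembers hosts that recently failed so callers can skip them for a while."""

    def __init__(self, down_ttl_s=30.0, clock=time.monotonic):
        self._down_until = {}  # host -> time after which it may be retried
        self._ttl = down_ttl_s
        self._clock = clock

    def mark_down(self, host):
        self._down_until[host] = self._clock() + self._ttl

    def is_usable(self, host):
        until = self._down_until.get(host)
        return until is None or self._clock() >= until

def first_usable(hosts, cache, connect):
    """Return a connection to the first host that is healthy and reachable."""
    for host in hosts:
        if not cache.is_usable(host):
            continue  # known to be down: do not even try to connect
        try:
            return connect(host)
        except ConnectionError:
            cache.mark_down(host)
    raise ConnectionError("no usable search hosts")

now = [0.0]  # fake clock so the example is deterministic
cache = HealthCache(down_ttl_s=30.0, clock=lambda: now[0])
attempts = []

def connect(host):
    attempts.append(host)
    if host == "es1":
        raise ConnectionError("es1 is down")
    return "conn:" + host

first_usable(["es1", "es2"], cache, connect)  # tries es1, falls back to es2
first_usable(["es1", "es2"], cache, connect)  # skips es1 entirely
print(attempts)
```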
Mar 23 2017
@epriestley sweet, I'll land this as soon as I see that you've merged to stable.
I can confirm that In Any: does not seem to include subprojects. I tried to make some sense of the way the project search functions work but it's pretty complicated.
@epriestley: I think this is ready to land but I want to give you one more chance to change your mind.
- Created diviner documentation: Cluster: Search
- removed stray phlog
- Fix searching relationships which I had inadvertently broken.
- Better elasticsearch 2.x and 5.x support
- more optimized query
In T12441#216428, @epriestley wrote: If there's any good content in this feature at all, why do I never see it reposted to Reddit or Facebook or Twitter? Are Reddit and Twitter just for old people now?
Elasticsearch has much better support for non-latin language analysis. See https://www.elastic.co/guide/en/elasticsearch/guide/current/icu-tokenizer.html, which discusses their ability to properly tokenize Thai, Chinese and Japanese text.
Mar 22 2017
Fix method signature, un-final PhabricatorElasticFulltextStorageEngine
Ok I think I've eliminated the problematic parts like indexing project slugs.
Get rid of static.
address review feedback that I hadn't gotten to yet.
Note: I'm not sure why harbormaster is failing?
- Cleaned up the elastic query and added comments describing the purpose of the clauses
- a couple of bugfixes found by further testing
Ok I've reworked this quite a bit and I may have messed up somewhere in the process.
Mar 21 2017
In T12296#216132, @epriestley wrote: While the git ls-remote change isn't really motivated as a performance improvement, it does seem to have reduced CPU usage a measurable amount (deployed on the morning of 3/18), maybe 15%:
So I've done a bit more thinking about how to implement the changes to the engine class, especially with regards to any bits that are not wanted in the upstream but are desirable for wikimedia's implementation.
In D17384#209987, @epriestley wrote: Just to make sure I haven't missed anything:
- We currently write health checks but never read them, right? So there's no effect (other than the UI "Status" changing) when a service fails health checks? That seems fine for now, I just want to make sure I didn't miss a health check read somewhere.
Mar 20 2017
In D17509#210005, @epriestley wrote: Does this require a full bin/search index for installs using Elastic? It looks like the index structure changes...
@avivey: I'm somewhat interested in this. If you have any tips for getting it working locally, I would like to try it out and see if I can contribute anything towards a finished extension.
Mar 16 2017
- Move the stats definitions into the engine so the status UI remains engine agnostic.
- Fix a bug where role => false was being treated like role => true in the UI
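The source does not say what caused the role bug, but one common way it arises is checking whether a role key exists instead of checking its value. The Python sketch below illustrates that pattern only; both function names and the sample config are hypothetical.

```python
def enabled_roles_buggy(roles):
    # Bug pattern: any configured key counts as enabled, even if set to False.
    return [name for name in roles]

def enabled_roles(roles):
    # Fix: a role is enabled only when its value is truthy.
    return [name for name, enabled in roles.items() if enabled]

roles = {"read": True, "write": False}
print(enabled_roles_buggy(roles))
print(enabled_roles(roles))
```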
I'm pleased to report that this has been live on wikimedia's phabricator for about a week without any incidents whatsoever. Additionally, we are in the process of migrating from elasticsearch 2.x to 5.x, and the ability to write to multiple clusters is really working out nicely for the transition.
Mar 14 2017
@stettberger: If there is anything missing from the WMF policy API extension, please feel free to file tasks under https://phabricator.wikimedia.org/tag/wikimedia_phabricator_extensions/ or open a differential revision against https://phabricator.wikimedia.org/source/phab-extensions/. I will be glad to review patches. I can't make any promises about feature requests, but reasonable changes will be considered.
Mar 10 2017
- Added index stats to status ui
- Separate mysql status from elasticsearch status and show a different set of columns appropriate to each cluster type.
What I would find especially useful would be a way to create named policies and then reference them when creating objects. That might warrant a separate feature-request task though.
I don't think this is related to wikimedia since we already have the functionality live in the WMF production install.
Mar 9 2017
- Remove unused healthrecord stuff from PhabricatorSearchCluster class
- Add back getDisplayName to the PhabricatorSearchCluster class because it's needed.
Addressed latest round of feedback.