Page MenuHomePhabricator

Address some New Search Configuration Errata
ClosedPublic

Authored by 20after4 on Mar 27 2017, 2:41 PM.

Details

Summary
  • Write an "Upgrading: ..." guidance task with narrow instructions for installs that are upgrading.
  • Do we need to add an indexing activity (T11932) for installs with ElasticSearch?
  • We should more clearly detail exactly which versions of ElasticSearch are supported (for example, is ElasticSearch <2 no longer supported)? From T9893 it seems like we may only have supported ElasticSearch <2 before, so are the two regions of support totally nonoverlapping and all ElasticSearch users will need to upgrade?
  • Documentation should provide stronger guidance toward MySQL and away from Elastic for the vast majority of installs, because we've historically seen users choosing Elastic when they aren't actually trying to solve any specific problem.
  • When users search for fulltext results in Maniphest and hit too many documents, the current behavior is approximately silent failure (see T12443). D17384 has also lowered the ceiling for ElasticSearch, although previous changes lowered it for MySQL search. We should not fail silently, and ideally should build toward T12003.
  • D17384 added a new "keywords" field, but MySQL does not search it (I think?). The behavior should be as consistent across MySQL and Elastic as we can make it. Likely cleaner is giving "Project" objects a body, with "slugs" and "description" separated by newlines?
  • PhabricatorSearchEngineTestCase is now pointless and only detects local misconfigurations.
  • It would be nice to build a practical test suite instead, where we put specific documents into the index and then search for them. The upstream test could run against MySQL, and some bin/search test could run against a configured engine like ElasticSearch. This would make it easier to make sure that behavior was as uniform as possible across engine implementations.
  • Does every assigned task now match "user" in ElasticSearch?
  • PhabricatorElasticFulltextStorageEngine has a json_encode() which should be phutil_json_encode().
  • PhabricatorSearchService throws an untranslated exception.
  • When a search cluster is down, we probably don't degrade with much grace (unhandled exception)?
  • I haven't run bin/search init, but bin/search index doesn't warn me that I may want to. This might be worth adding. The UI does warn me.
  • bin/search init warns me that the index is "incorrect". It might be more clear to distinguish between "missing" and "incorrect", since it's more comforting to users to see "everything is as we expect, doing normal first-time setup now" than "something is wrong, fixing it".
  • CLI message "Initializing search service "ElasticSearch"" does not end with a period, which is inconsistent with other UI messages.
  • It might be nice to let bin/search commands like init and index select a specific service (or even service + host) to act on, as bin/storage --ref ... now does. You can generally get the result you want by fiddling with config.
  • When a service isn't writable, bin/search init reports "Search cluster has no hosts for role "write".". This is accurate but does not provide guidance: it might be more useful to the user to explain "This service is not writable, so we're skipping index check for it.".
  • Even with write off for MySQL, bin/search index --type task --trace still updates MySQL, I think? I may be misreading the trace output. But this behavior doesn't make sense if it is the actual behavior, and it seems like reindexAbstractDocument() uses "all services", not "writable services", and the MySQL engine doesn't make sure it's writable before indexing.
  • Searching or user fails to find task Grant users tokens when a mention is created, suggesting that stemming is not working.
  • Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.
  • Searching for maniphest fails to find task maniphest.query elephant, suggesting that tokenization in ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).
  • The "index incorrect" warning UI uses inconsistent title case.
  • The "index incorrect" warning UI could format the command to be run more cleanly (with addCommand(), I think).

refs T12450

Test Plan
  • Stared blankly at the code.
  • Disabled 'write' role on mysql fulltext service.
  • Edited a task, ran search indexer, verified that the mysql index wasn't being updated.

Diff Detail

Repository
rP Phabricator
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Try to make harbormaster happy by setting repository.callsign globally in ~/.arcrc

epriestley added inline comments.
src/infrastructure/cluster/search/PhabricatorMySQLSearchHost.php
26–33

Maybe split this into public function getHealthRecord() (to do the caching) and protected function newHealthRecord() (to create a new object if the cache is empty). Then we wouldn't be duplicating the caching in the subclass, and the property could become private.

src/infrastructure/cluster/search/PhabricatorSearchHost.php
124

Ah, good catch.

src/infrastructure/cluster/search/PhabricatorSearchService.php
9

Can we remove this, too, if getDisabled() / setDisabled() were removed?

154–155

I don't think we should have this rule: I think it's more valuable for the configuration to be consistent (sub-config always overrides parent-config) than to make it easier to disable a role.

226

Prefer if ($hosts) by convention. Particularly, this at least partially sidesteps bizarre bugs where count(false) is 1, which is unintuitive and can be hard to track down.

This revision is now accepted and ready to land.Mar 27 2017, 7:46 PM
  • Fixed the stemmer. user matches users and vise-versa.
  • Added a different tokenizer so that this.is.a.test tokenizes to the following:
  • this.is.a.test
    • this
    • is
    • a
    • test
20after4 edited edge metadata.

this fixes the stemmer and tokenizer to do a better job of matching words.separated.by.punctuation as well as other issues found by @epriestley.

This revision is now accepted and ready to land.Mar 28 2017, 1:42 AM

20after4 requested review of this revision.
This revision is now accepted and ready to land.

(This is a bug related to T10967, I'll fix it.)

Better formatting of setup warning messages.

Just as a general workflow suggestion, I'd encourage you to do this as a bunch of small changes instead of one big "fix everything" change: I'm completely happy to review very small changes (e.g., just remove code with no callers; just do text fixes; etc). I'm also planning to help with T12450 myself and getting small pieces upstream sooner makes that easier in terms of testing/conflicts. It's completely fine for T12450 to end up with like 20 revisions attached to it, and I think this generally makes all the review/coordination/release/project management type stuff easier (and, at least for me, the engineering stuff too).

Just as a general workflow suggestion, I'd encourage you to do this as a bunch of small changes instead of one big "fix everything" change

I think I prefer that way as well. I'll pick this up in another revision.

This revision was automatically updated to reflect the committed changes.