T12450 New Search Configuration Errata

Herald added a subscriber: eadler. · Mar 26 2017, 12:20 PM

epriestley mentioned this in T5282: Provide documentation on setting up ElasticSearch. · Mar 26 2017, 12:20 PM

epriestley updated the task description. · Mar 26 2017, 12:24 PM

epriestley mentioned this in T10161: Fulltext indexing produces invalid JSON documents in Elasticsearch.

epriestley updated the task description.

epriestley added a subscriber: 20after4.

epriestley mentioned this in T4692: Slashes are being double-escaped (or not escaped?) when passed to ElasticSearch. · Mar 26 2017, 12:26 PM

I haven't been testing with elasticsearch < 2.0 so this might break backwards compatibility. It wouldn't be difficult to fix any compatibility issues though, with a tiny bit of testing.

epriestley mentioned this in T6892: Invalid search result when I input less than 2 Korean character. · Mar 26 2017, 12:28 PM

epriestley mentioned this in T8598: Incorrect Elasticsearch index.

epriestley mentioned this in T9460: Unable to search for open tasks using Elasticsearch. · Mar 26 2017, 12:30 PM

epriestley updated the task description.

Write an "Upgrading: ..." guidance task with narrow instructions for installs that are upgrading.

TODO

Do we need to add an indexing activity (T11932) for installs with ElasticSearch?

Yes, I think so

We should more clearly detail exactly which versions of ElasticSearch are supported (for example, is ElasticSearch <2 no longer supported?). From T9893 it seems like we may only have supported ElasticSearch <2 before, so are the two regions of support totally nonoverlapping, and will all ElasticSearch users need to upgrade?

Previously there were some minor issues with 2.x and 5.x was impossible. Now I suspect that there are minor issues with 1.x and 2.x-5.x work flawlessly.

Documentation should provide stronger guidance toward MySQL and away from Elastic for the vast majority of installs, because we've historically seen users choosing Elastic when they aren't actually trying to solve any specific problem.

Agreed. Although I feel that elasticsearch provides a vastly superior query parser and fulltext scoring, it is definitely not something that people should default to immediately after installing phabricator.

epriestley moved this task from Backlog to v2 on the Search board. · Mar 26 2017, 12:32 PM

Has T8602 been resolved?

I can not reproduce it on wikimedia's install.

In T12450#216917, @20after4 wrote:

I can not reproduce it on wikimedia's install.

Great, thanks for checking!

epriestley updated the task description. · Mar 26 2017, 12:44 PM

epriestley updated the task description.

epriestley updated the task description. · Mar 26 2017, 12:54 PM

epriestley updated the task description.

epriestley updated the task description. · Mar 26 2017, 1:04 PM

Personal notes:

ElasticSearch requires Java.
ElasticSearch requires 2GB of free RAM to even start.

Okay:

$ ./bin/elasticsearch
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/elasticsearch/bootstrap/Elasticsearch : Unsupported major.minor version 52.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:803)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:442)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:64)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:354)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)

That's Java for "ElasticSearch requires Java 1.8". I resolved this by installing Java 1.8 and then removing Java 1.7, which I'm sure nothing on the system depends on.
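
For anyone decoding that trace: "major.minor version 52.0" is the class-file format version, and 52 corresponds to Java 8 (51 is Java 7). A minimal sketch of the mapping, assuming the usual offset of 44 for Java 5 and later; the function name is made up for illustration.

```python
def java_release_for_class_major(major):
    """Map a class-file major version to a Java release number.

    Assumption: only valid for major >= 49 (Java 5 and later), where the
    release number is the major version minus 44.
    """
    if major < 49:
        raise ValueError("mapping only covers Java 5 and later")
    return major - 44


# The trace reports 52.0, so the classes were compiled for Java 8 ("1.8").
print(java_release_for_class_major(52))
# A Java 7 runtime only understands class files up to major version 51.
print(java_release_for_class_major(51))
```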

max file descriptors [4096] for elasticsearch process is too low

"Change nofile system ulimit."

max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

"...perhaps in /etc/sysctl.conf"

I configured cluster.search like this, copying the mysql config:

[
    {
      "type": "elastic",
      "roles": {
        "read": true,
        "write": true
      }
    }
]

That got me a fatal error:

[2017-03-26 08:33:55] ERROR 8: Undefined index: elastic at [/Users/epriestley/dev/core/lib/phabricator/src/infrastructure/cluster/search/PhabricatorSearchService.php:208]

My type is not valid, but the error is not detected at runtime.
Phabricator's web UI fatals rather than detecting + repairing this error.
For now, I worked around this by typing "elasticsearch" correctly.

That got me this error:

EXCEPTION: (PhabricatorWorkerPermanentFailureException) Failed to update search index for document "PHID-TASK-gfvrhh4g6twcjonrprwb": [cURL/6] (phabricator/TASK/PHID-TASK-gfvrhh4g6twcjonrprwb/) <CURLE_COULDNT_RESOLVE_HOST> There was an error resolving the server hostname. Check that you are connected to the internet and that DNS is correctly configured. (Did you add the domain to `/etc/hosts` on some other machine, but not this one?) at [<phabricator>/src/applications/search/worker/PhabricatorSearchWorker.php:54]

This is because I have not specified a "host", but this engine can not possibly run in this mode and should abort with a useful error earlier.
I provided a "host" and "port", which got me this error:

[HTTP/500] Internal Server Error
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <title>Unrecognized Hostname</title>
...

This is because I did not specify "protocol".
But the install can't possibly work without "protocol", and this doesn't guide me to the issue.
The UI issues misleading guidance:

Configuration option 'cluster.search' has invalid value and was restored to the default: Search engine configuration has an invalid service specification (at index "0"): Got unexpected parameters: host.

This is actually not right; "host" is valid at the top level, I think? bin/search index appears to be working with this config, but the UI says it isn't valid:

{
  "type": "elasticsearch",
  "host": "elastic001.epriestley.com",
  "port": 9200,
  "protocol": "http",
  "roles": {
    "read": true,
    "write": true
  }
}
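
A rough sketch of the kind of up-front check that would have caught the three failures in these notes (bad "type", missing "host", missing "protocol") before any request was made. This is Python for illustration only, not Phabricator's validation code; the set of known types and the list of required keys are assumptions drawn from the notes here.

```python
# Assumption: the two engine types mentioned in this thread.
KNOWN_TYPES = ("mysql", "elasticsearch")

# Assumption: keys an engine cannot run without, per the errors in the notes.
REQUIRED_KEYS = {"elasticsearch": ("host", "protocol")}


def validate_service(spec):
    """Return a list of human-readable problems; an empty list means OK."""
    service_type = spec.get("type")
    if service_type not in KNOWN_TYPES:
        # Stop here: required keys are unknowable for an unknown type.
        return [
            'Unknown search service type "%s"; expected one of: %s.'
            % (service_type, ", ".join(KNOWN_TYPES))
        ]
    return [
        'Service of type "%s" is missing required key "%s".' % (service_type, key)
        for key in REQUIRED_KEYS.get(service_type, ())
        if key not in spec
    ]


# The first config tried in these notes: wrong type name.
print(validate_service({"type": "elastic", "roles": {"read": True, "write": True}}))
# Correct type and a host, but no protocol yet.
print(validate_service({"type": "elasticsearch", "host": "elastic001.epriestley.com"}))
```

Failing at config time with messages like these would replace the "Undefined index" fatal and the misleading DNS and HTTP 500 errors with something that points at the actual missing key.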

I haven't run bin/search init, but bin/search index doesn't warn me that I may want to. This might be worth adding. The UI does warn me.
The "index incorrect" warning UI uses inconsistent title case.
The "index incorrect" warning UI could format the command to be run more cleanly (with addCommand(), I think).
bin/search init warns me that the index is "incorrect". It might be more clear to distinguish between "missing" and "incorrect", since it's more comforting to users to see "everything is as we expect, doing normal first-time setup now" than "something is wrong, fixing it".
CLI message "Initializing search service "ElasticSearch"" does not end with a period, which is inconsistent with other UI messages.
It might be nice to let bin/search commands like init and index select a specific service (or even service + host) to act on, as bin/storage --ref ... now does. You can generally get the result you want by fiddling with config.
When a service isn't writable, bin/search init reports "Search cluster has no hosts for role "write".". This is accurate but does not provide guidance: it might be more useful to the user to explain "This service is not writable, so we're skipping index check for it.".
Even with write off for MySQL, bin/search index --type task --trace still updates MySQL, I think? I may be misreading the trace output. But this behavior doesn't make sense if it is the actual behavior, and it seems like reindexAbstractDocument() uses "all services", not "writable services", and the MySQL engine doesn't make sure it's writable before indexing.
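
If the behavior is what it looks like, the fix is roughly "filter to writable services before indexing". A minimal sketch of that filter, in Python for illustration; the dictionary shape mirrors the cluster.search config shown earlier, and treating a missing "write" role as not writable is an assumption, not necessarily how Phabricator defaults roles.

```python
def is_writable(service):
    """True when the service explicitly has the "write" role enabled."""
    return bool(service.get("roles", {}).get("write", False))


def services_to_index(services):
    """Keep only the services that should receive index writes."""
    return [service for service in services if is_writable(service)]


cluster_search = [
    {"type": "mysql", "roles": {"read": True, "write": False}},
    {"type": "elasticsearch", "roles": {"read": True, "write": True}},
]

# Only the Elasticsearch service should be touched by an index run.
print([service["type"] for service in services_to_index(cluster_search)])
```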

Searching for f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q produces nonsensical results (many results, when I would expect no results: the results do not contain that sequence of letters in order).
Searching for user fails to find the task "Grant users tokens when a mention is created", suggesting that stemming is not working.
Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.
Searching for maniphest fails to find the task "maniphest.query elephant", suggesting that tokenization in ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).
Searching for users -blue returns a huge number of results: significantly more than users. Expected behavior: fewer results, omitting those results matching blue.
Searching for users blue returns more results than users or blue. Expected behavior: fewer results, because only results which match "users" AND "blue" are returned. The result set includes completely irrelevant results.
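
To make the expected versus observed behavior for the last two cases concrete, here is a toy model using sets of document IDs. The documents are invented, and reading the symptoms as "terms are being OR'd together" is only one possible explanation, not something confirmed in this thread.

```python
# Invented sample documents, keyed by ID.
DOCS = {
    1: "grant users tokens",
    2: "blue users",
    3: "blue widgets",
    4: "unrelated task",
}


def matching(term):
    """IDs of documents containing the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


users = matching("users")
blue = matching("blue")
everything = set(DOCS)

# Expected: "users blue" is an intersection, never larger than either term.
expected_and = users & blue
# Expected: "users -blue" removes matches, so it is never larger than "users".
expected_not = users - blue

# Symptom-like: OR semantics give a union, larger than either term alone.
observed_like_or = users | blue
# Symptom-like: OR-ing "users" with "anything not blue" matches nearly everything.
observed_like_or_not = users | (everything - blue)

print(sorted(expected_and), sorted(observed_like_or))
print(sorted(expected_not), sorted(observed_like_or_not))
```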

I ran into a lot of confusion because the versioned object indexes are not namespaced per-service. Basically, if you insert version 95 of a document into Elastic, the indexer thinks that version 95 doesn't need to go into MySQL, even though it does. So when you run bin/search index ..., you may get only a subset of the updates you actually need. The object index versions need to change to become engine-aware so they are stored per-service, not globally, and/or the whole mechanism needs to include a hash of cluster.search or just be turned off. Until this is fixed, it can be worked around by using --force everywhere.

bin/search index might reasonably provide summary output about this ("392 documents were not indexed because they haven't changed, use --force to update them.").
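
A minimal sketch of the "include a hash of cluster.search" option together with the skipped-document summary, in Python for illustration. The function names and the in-memory version store are hypothetical; the real indexer stores versions in the database.

```python
import hashlib
import json


def config_hash(cluster_search):
    """Short, stable digest of the search config (key order is normalized)."""
    blob = json.dumps(cluster_search, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def index_documents(documents, cluster_search, seen_versions, force=False):
    """Index documents whose version is new for this config.

    documents:     dict of PHID -> current version
    seen_versions: dict of (config hash, PHID) -> last indexed version
    Returns (list of indexed PHIDs, count of skipped documents).
    """
    prefix = config_hash(cluster_search)
    indexed, skipped = [], 0
    for phid, version in documents.items():
        key = (prefix, phid)
        if not force and seen_versions.get(key) == version:
            skipped += 1
            continue
        seen_versions[key] = version
        indexed.append(phid)
    return indexed, skipped


seen = {}
mysql_only = [{"type": "mysql"}]
with_elastic = mysql_only + [{"type": "elasticsearch"}]
docs = {"PHID-TASK-1": 95}

index_documents(docs, mysql_only, seen)
# Changing the config changes the hash, so version 95 is indexed again.
indexed, skipped = index_documents(docs, with_elastic, seen)
print(indexed, skipped)
# An unchanged config skips the document and can report it.
indexed, skipped = index_documents(docs, with_elastic, seen)
print(
    "%d documents were not indexed because they haven't changed, "
    "use --force to update them." % skipped
)
```

Hashing the whole config is the blunt version of the fix: any config change invalidates every stored version. Storing versions per service would be more precise at the cost of more bookkeeping.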

J5lx added a subscriber: J5lx. · Mar 26 2017, 6:11 PM

@epriestley: Thanks for the detailed feedback... I'll get to work ;)

In T12450#216943, @epriestley wrote:

Searching for f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q produces nonsensical results (many results, when I would expect no results: the results do not contain that sequence of letters in order).

Searching for user fails to find the task "Grant users tokens when a mention is created", suggesting that stemming is not working.

Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.

Searching for maniphest fails to find the task "maniphest.query elephant", suggesting that tokenization in ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).

Searching for users -blue returns a huge number of results: significantly more than users. Expected behavior: fewer results, omitting those results matching blue.

Searching for users blue returns more results than users or blue. Expected behavior: fewer results, because only results which match "users" AND "blue" are returned. The result set includes completely irrelevant results.

I can't actually explain the search anomalies you've encountered; my experience with elasticsearch has been the opposite. I will test these cases and try to identify the cause.

In T12450#216943, @epriestley wrote:

I ran into a lot of confusion because the versioned object indexes are not namespaced per-service. Basically, if you insert version 95 of a document into Elastic, the indexer thinks that version 95 doesn't need to go into MySQL, even though it does. So when you run bin/search index ..., you may get only a subset of the updates you actually need. The object index versions need to change to become engine-aware so they are stored per-service, not globally, and/or the whole mechanism needs to include a hash of cluster.search or just be turned off. Until this is fixed, it can be worked around with using --force everywhere.

bin/search index might reasonably provide summary output about this ("392 documents were not indexed because they haven't changed, use --force to update them.").

This explains something that I totally missed - I've been using --force a lot because of the same confusion.

f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q returns the same results as f a c t o r y s u r p l u s z z q q z z q q, so it appears to be treating those as individual single-letter tokens. Strange.
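
A toy illustration of why the two queries would be equivalent if "*" is treated as a separator rather than a wildcard. The splitting rule here is an assumption for the sake of the example, not Elasticsearch's actual analyzer.

```python
import re


def naive_tokens(text):
    """Lowercase, then split on anything that is not a letter or digit."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


starred = naive_tokens("f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q")
spaced = naive_tokens("f a c t o r y s u r p l u s z z q q z z q q")

# Both queries collapse to the same list of single-letter tokens.
print(starred == spaced)
print(starred[:7])
```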

In T12450#216943, @epriestley wrote:

Searching for f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q produces nonsensical results (many results, when I would expect no results: the results do not contain that sequence of letters in order).

Searching for user fails to find the task "Grant users tokens when a mention is created", suggesting that stemming is not working.

Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.

Searching for maniphest fails to find the task "maniphest.query elephant", suggesting that tokenization in ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).

Downstream, a search for "phids to maniphest query" finds "add ids and phids to maniphest.query" so it's tokenizing maniphest.query as two separate tokens. Not sure what happened in your test case.

Searching for users -blue returns a huge number of results: significantly more than users. Expected behavior: fewer results, omitting those results matching blue.

Searching for users blue returns more results than users or blue. Expected behavior: fewer results, because only results which match "users" AND "blue" are returned. The result set includes completely irrelevant results.

Downstream search for "users -blocked -deprecated -anonymous" effectively removes results for blocked, deprecated and anonymous

I don't think this is due to any Wikimedia-specific configuration on our elasticsearch, especially because my local testing has been done with a vanilla docker image directly from elasticsearch upstream.

20after4 added a revision: D17564: Address some New Search Configuration Errata. · Mar 27 2017, 2:41 PM

so it's tokenizing maniphest.query as two separate tokens

I think the commit message just has "maniphest" and "query" as separate words, and it's finding those -- not finding them from "maniphest.query". Here's a case showing this doesn't work on the WMF install:

Search for "diffusion.looksoon" finds 5 results: https://phabricator.wikimedia.org/search/query/q9Av9De3IafM/#R
Search for looksoon finds only 1 result: https://phabricator.wikimedia.org/search/query/CEwbMlWS2pAW/#R
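
One way to read the "looksoon" result is that the analyzer keeps "diffusion.looksoon" as a single token, so a search for the second half alone has nothing to match. The sketch below contrasts whitespace-only tokenization with also indexing the punctuation-separated parts; it is a toy model rather than Elasticsearch's analyzer, and the sample sentence is invented.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace only, so "diffusion.looksoon" stays one token."""
    return text.lower().split()


def tokens_with_parts(text):
    """Keep each whole word and also add its punctuation-separated parts."""
    tokens = []
    for word in whitespace_tokens(text):
        tokens.append(word)
        parts = [part for part in re.split(r"[^0-9a-z]+", word) if part]
        if len(parts) > 1:
            tokens.extend(parts)
    return tokens


doc = "Use diffusion.looksoon to poke the repository"

# With whole-word tokens only, the bare method name cannot match.
print("looksoon" in whitespace_tokens(doc))
# Indexing the parts as well lets both forms of the query match.
print("looksoon" in tokens_with_parts(doc))
print("diffusion.looksoon" in tokens_with_parts(doc))
```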

I've updated D17564: Address some New Search Configuration Errata to address the tokenization and word stemming issues.

20after4 updated the task description. · Mar 28 2017, 1:38 AM

epriestley updated the task description. · Mar 28 2017, 11:30 AM

epriestley mentioned this in D17564: Address some New Search Configuration Errata. · Mar 28 2017, 11:47 AM

bin/search init warns me that the index is "incorrect". It might be more clear to distinguish between "missing" and "incorrect", since it's more comforting to users to see "everything is as we expect, doing normal first-time setup now" than "something is wrong, fixing it".

It looks like we already have logic for this, so maybe indexExists() doesn't detect that the ElasticSearch index is missing properly in some cases.

epriestley added a revision: D17571: Fix isReadable() and isWritable() in SearchService. · Mar 28 2017, 7:51 PM

epriestley added a revision: D17572: Make `bin/search init` messaging a little more consistent. · Mar 28 2017, 7:55 PM

epriestley added a revision: D17573: Remove PhabricatorSearchEngineTestCase. · Mar 28 2017, 7:57 PM

20after4 added a commit: rP699228c73b74: Address some New Search Configuration Errata. · Mar 28 2017, 8:19 PM

epriestley added a revision: D17574: Re-run config validation from `bin/search`. · Mar 28 2017, 8:54 PM

epriestley added a commit: rPe7c76d92d546: Make `bin/search init` messaging a little more consistent. · Mar 28 2017, 8:57 PM

epriestley added a commit: rPc22693ff2915: Remove PhabricatorSearchEngineTestCase.

epriestley added a commit: rPc40be811ea9b: Fix isReadable() and isWritable() in SearchService.

epriestley updated the task description. · Mar 28 2017, 8:59 PM

epriestley added a commit: rP5f939dcce0f8: Re-run config validation from `bin/search`. · Mar 28 2017, 9:53 PM

20after4 added a revision: D17575: Provide some guidance about elasticsearch in cluster docs. · Mar 28 2017, 10:06 PM

epriestley added a revision: D17576: Soften a possible cluster search setup fatal. · Mar 28 2017, 10:18 PM

epriestley added a commit: rP88798354e8c9: Soften a possible cluster search setup fatal. · Mar 28 2017, 10:28 PM

So I ran into one problem when deploying the latest code to WMF production. We now throw a setup error if we have a cluster with no readable hosts. It actually makes sense to have a cluster that is write-only so that setup error is bogus.

20after4 updated the task description. · Mar 30 2017, 6:11 PM

epriestley updated the task description. · Apr 2 2017, 3:05 PM

epriestley updated the task description. · Apr 2 2017, 3:42 PM

In T12450#217586, @20after4 wrote:

So I ran into one problem when deploying the latest code to WMF production. We now throw a setup error if we have a cluster with no readable hosts. It actually makes sense to have a cluster that is write-only so that setup error is bogus.

I couldn't immediately reproduce this -- I used a config like this:

{
  "type": "elasticsearch",
  "hosts": [
    {
      "host": "elastic001.epriestley.com",
      "port": 9200,
      "protocol": "http",
      "roles": {
        "read": false,
        "write": true
      }
    }
  ]
},

That didn't seem to raise a config/setup error or an error when actually searching -- do you have a copy of the error itself or a reproducing config?

epriestley added a revision: D17597: Count and report skipped documents from "bin/search index". · Apr 2 2017, 4:20 PM

dlackty added a subscriber: dlackty. · Apr 2 2017, 4:32 PM

epriestley added a revision: D17598: When "cluster.search" changes, don't trust the old index versions. · Apr 2 2017, 4:33 PM

epriestley updated the task description. · Apr 2 2017, 4:37 PM

epriestley updated the task description. · Apr 2 2017, 4:44 PM

epriestley added a revision: D17599: After a fulltext write to a particular service fails, keep trying writes to other services. · Apr 2 2017, 4:59 PM

epriestley added a revision: D17600: Remove "url" from Elasticsearch index. · Apr 2 2017, 5:05 PM

epriestley updated the task description. · Apr 2 2017, 5:06 PM

epriestley updated the task description. · Apr 2 2017, 6:19 PM

epriestley updated the task description. · Apr 2 2017, 6:29 PM

(This is all wrong, see below.)

One note here is that elastic.search.namespace has been removed without replacement. Since this was added by WMF in D9798 and effectively removed by WMF in D17384, I presume it no longer has any use cases.

If that isn't the case, we should maybe consider piggy-backing on storage.default-namespace rather than having a separate setting, although cluster.search is sufficiently flexible that it probably doesn't matter too much.

If we do use storage.default-namespace (either as the entire setting, or as a default for the setting) we should include it in the index version hash added by D17598.

Oh, I'm wrong. It hasn't been removed without replacement: there's a new path option for Elastic. I think there's probably a bug in indexExists() then, since it looks for 'phabricator' by name:

$res = $this->executeRequest($host, $uri, array());
return isset($res['indices']['phabricator']);

Presumably that should actually be trim($path, '/') or something?
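
A sketch of that suggestion, in Python for illustration: derive the index name from the configured path instead of hardcoding it. The response shape follows the $res['indices'][...] lookup in the snippet; the sample path and index name are hypothetical.

```python
def index_name_from_path(path):
    """Strip leading and trailing slashes, like trim($path, '/') in PHP."""
    return path.strip("/")


def index_exists(stats, path):
    """True when the stats response lists the index named by the path."""
    name = index_name_from_path(path)
    return stats.get("indices", {}).get(name) is not None


# Hypothetical stats response for an install using a custom index name.
stats = {"indices": {"phabricator_wmf": {"primaries": {}}}}

print(index_exists(stats, "/phabricator_wmf/"))
# Looking for the hardcoded name misses the index that actually exists.
print(index_exists(stats, "/phabricator"))
```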

epriestley added a revision: D17601: Spell "Elasticsearch" correctly, not "ElasticSearch". · Apr 2 2017, 7:10 PM

epriestley updated the task description. · Apr 2 2017, 7:19 PM

epriestley updated the task description.

epriestley added a revision: D17602: Adjust and wordsmith Search documentation. · Apr 2 2017, 8:04 PM

epriestley updated the task description. · Apr 2 2017, 8:08 PM

epriestley added a commit: rP287e708c4d3e: Adjust and wordsmith Search documentation. · Apr 2 2017, 8:09 PM

epriestley added a commit: rP6d8167503266: Remove "url" from Elasticsearch index. · Apr 2 2017, 8:26 PM

epriestley added a commit: rPbd939782001e: Count and report skipped documents from "bin/search index". · Apr 2 2017, 8:45 PM

epriestley added a commit: rP0f144d29e920: When "cluster.search" changes, don't trust the old index versions.

epriestley added a commit: rP304d19f92a7b: After a fulltext write to a particular service fails, keep trying writes to…. · Apr 2 2017, 8:47 PM

It would be nice to build a practical test suite instead, where we put specific documents into the index and then search for them.

@epriestley: It's straightforward to launch an Elasticsearch instance using Docker. I could write a test that fires up a new docker instance, configures Phabricator to use it, indexes a few specific documents and then does a search to make sure results look correct.

You came up with some pretty good test cases when you were testing my patches. The only thing I'm not really clear about is how to temporarily override phabricator's config to use the test container for search. Is there a config override mechanism for use in phabricator unit tests?

dlackty removed a subscriber: dlackty. · Apr 2 2017, 8:51 PM

Yeah:

https://secure.phabricator.com/source/phabricator/browse/master/src/applications/base/controller/__tests__/PhabricatorAccessControlTestCase.php;304d19f92a7bea08573045d6951cefa4b14e7086$55-59

PhabricatorEnv::beginScopedEnv(); works anywhere if this makes more sense as, like, bin/search self-test or something rather than a unit test.

Maybe let bin/search commands target a specific service.

This is a good idea, and would be very helpful for operational use during a migration.

It might also be handy if bin/search could actually execute queries and return results, just for easy testing of the backend. Not sure about potential policy bypass though.

This reminds me of something else I have been thinking about.... If we add the view policy field to search indexes then it would be possible to implement a public search engine for logged out users without worrying about filtering results or pagination issues (at least with Elasticsearch, this is a simple filter on ViewPolicy:public). This obviously wouldn't be useful beyond identifying public documents - the search engine can't evaluate complex policy logic but it can match on static values.

If you can run bin/whatever, policies no longer matter since you can bin/storage shell or any other number of dangerous things.

If we add the view policy field to search indexes then it would be possible to implement a public search engine for logged out users without worrying about filtering results or pagination issues

Only if the objects aren't part of Spaces, don't later implement SpacesInterface, don't currently implement ExtendedPolicyInterface, don't later implement it, don't have dependent/implicit policies, no one turns off policy.allow-public, and so on. Offhand, this feels like a lot of complexity for pretty limited utility.

epriestley added a commit: rPa9e2732a5cb3: Spell "Elasticsearch" correctly, not "ElasticSearch". · Apr 2 2017, 10:00 PM

joshuaspence added a subscriber: joshuaspence. · Apr 3 2017, 8:51 AM

20after4 added a revision: D17615: Don't apply offset to elasticsearch query. · Apr 4 2017, 5:18 PM

epriestley mentioned this in T12493: Upgrading: Fulltext Search Services. · Apr 12 2017, 2:43 PM

epriestley mentioned this in T12677: Support multiple mail delivery services for automatic failover. · May 5 2017, 3:45 PM

epriestley mentioned this in T12965: When no "master" database is configured, the ElasticSearch setup check can fatal. · Aug 17 2017, 4:41 PM

hskiba added a subscriber: hskiba. · Nov 24 2020, 1:19 PM

epriestley moved this task from v2 to External Search on the Search board. · Mar 11 2021, 5:49 PM

rP Phabricator
Status     Revision  Commit          Title
Abandoned  D17615                    Don't apply offset to elasticsearch query
Closed     D17580                    Set content-type to application/json
Closed     D17581                    Make sure writes go to the right cluster
Abandoned  D17575                    Provide some guidance about elasticsearch in cluster docs
           D17601    rPa9e2732a5cb3  Spell "Elasticsearch" correctly, not "ElasticSearch"
           D17599    rP304d19f92a7b  After a fulltext write to a particular service fails, keep trying writes to…
           D17598    rP0f144d29e920  When "cluster.search" changes, don't trust the old index versions
           D17597    rPbd939782001e  Count and report skipped documents from "bin/search index"
           D17600    rP6d8167503266  Remove "url" from Elasticsearch index
           D17602    rP287e708c4d3e  Adjust and wordsmith Search documentation
           D17576    rP88798354e8c9  Soften a possible cluster search setup fatal
           D17574    rP5f939dcce0f8  Re-run config validation from `bin/search`
           D17571    rPc40be811ea9b  Fix isReadable() and isWritable() in SearchService
           D17573    rPc22693ff2915  Remove PhabricatorSearchEngineTestCase
           D17572    rPe7c76d92d546  Make `bin/search init` messaging a little more consistent
           D17564    rP699228c73b74  Address some New Search Configuration Errata

Task status: Open, Normal · Public