
Fulltext indexing produces invalid JSON documents in Elasticsearch
Closed, Resolved (Public)

Description

Indexing Phabricator objects using "./bin/search index" creates JSON documents with trailing trash appended to them.

Example:

./bin/search index --type PhabricatorUser

GET http://xxx:9200/phabricator/USER/PHID-USER-ciil6p5ve27rvlck2qsj

returns

{"_index":"phabricator","_type":"USER","_id":"PHID-USER-ciil6p5ve27rvlck2qsj","_version":1,"found":true,"_source":{"title":"smith (Sam Smith)","url":"http:\/\/xxx:8066\/p\/smith\/","dateCreated":"1390566742","_timestamp":"1437979845","field":[{"type":"titl","corpus":"smith (Sam Smith)","aux":null}],"relationship":{"open":[{"phid":"PHID-USER-ciil6p5ve27rvlck2qsj","phidType":"USER","when":1452764234}]}}local:0}

Note the "local:0" at the end of the document. The trash is random: sometimes single letters, sometimes special characters. Almost every JSON document is corrupted this way, which breaks search via ElasticSearch.
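
To check how widespread the corruption is, one option is to parse only the leading JSON value of each response and report whatever follows it. The snippet below is a minimal sketch, not part of Phabricator: it assumes the raw response body is already available as a string, and the helper name split_trailing_garbage is made up for illustration. The sample is a shortened form of the response shown above.

```python
import json


def split_trailing_garbage(body: str):
    """Parse the leading JSON value in body.

    Returns (document, trailing_text). trailing_text is empty when the
    body is a single clean JSON document.
    """
    decoder = json.JSONDecoder()
    # raw_decode does not skip leading whitespace, so strip it first.
    stripped = body.lstrip()
    document, end = decoder.raw_decode(stripped)
    return document, stripped[end:].strip()


# Shortened form of the corrupted response, with the same "local:0" suffix.
sample = (
    '{"_id":"PHID-USER-ciil6p5ve27rvlck2qsj","found":true,'
    '"_source":{"title":"smith (Sam Smith)"}}local:0'
)

doc, garbage = split_trailing_garbage(sample)
print(doc["_id"])      # PHID-USER-ciil6p5ve27rvlck2qsj
print(repr(garbage))   # 'local:0'
```

A non-empty second value means the body would be rejected by a strict JSON parser, which matches the broken search behaviour described here.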

This error seems to be quite new.

Versions:

:~/phabricator$ uname -a
Linux xxx 4.2.0-23-generic #28-Ubuntu SMP Sun Dec 27 17:47:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

phabricator

2e7f2b735702f84cdc9a7fb2167dda40dc47390c (Sat, Jan 9)

arcanist

6833ae5bd33e86b5dbc8ee75221f778fc458b89c (Sat, Jan 9)

phutil

f5120574826088cba45c5ed4c2c05be4cbacbc86 (Sat, Jan 9)

Thanks for your help.

Event Timeline

What version of ElasticSearch are you using?

(Also: why are you using ElasticSearch instead of the default search?)

I am using the elasticsearch:1.7.4 Docker image. We thought using ElasticSearch had advantages over MySQL fulltext search. Is that wrong?

See some general discussion in T9893, although this doesn't seem to be an ElasticSearch 2.0 issue.

epriestley claimed this task.

Presuming this is either resolved by T9893/D17384 or no longer relevant. Follow up on T12450 or file a new task if you're still seeing issues.