
Implement partial / wildcard searching (Elasticsearch)
Closed, ResolvedPublic

Description

We were getting complaints that searching our Phabricator install was not straightforward: you always had to type words exactly as they appear in a task's title. For example, searching for "charts" will not match a task titled "graphingcharts".

Our install uses an Elasticsearch instance for indexing and searching. As far as I understand, this kind of wildcard query should be possible with Elasticsearch?

See also: T6740: Put some kind of stemmer on the MySQL search index

Event Timeline

GMTA raised the priority of this task from to Needs Triage.
GMTA updated the task description.
GMTA added a project: Search.
GMTA added a subscriber: GMTA.
chad triaged this task as Normal priority. Nov 15 2014, 5:50 AM
chad added a subscriber: chad.

(I would like this tooooo)

One way of doing this is by adjusting the mapping for the Elasticsearch index, which is actually far more powerful than a wildcard search.
I have switched to the mapping below. It is added as an index template that expects the index to be named 'phabricator' and uses English language settings.
Feel free to change the min/max ngram settings (e.g. 4-letter instead of 3-letter ngrams) if you get too many or too few results.
If you would like to switch to another language, the docs for doing so are here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
For it to take effect, you have to delete the index and then reindex all objects (bin/search index --all).

{
	"template": "phabricator",
	"settings": {
		"analysis": {
			"filter": {
				"trigrams_filter": {
					"type":     "ngram",
					"min_gram": 3,
					"max_gram": 3
				},
				"english_stop": {
					"type":       "stop",
					"stopwords":  "_english_"
				},
				"english_stemmer": {
					"type":       "stemmer",
					"language":   "english"
				},
				"english_possessive_stemmer": {
					"type":       "stemmer",
					"language":   "possessive_english"
				}
			},
			"analyzer": {
				"english_trigrams": {
					"type":      "custom",
					"tokenizer": "standard",
					"filter":   [
						"english_possessive_stemmer",
						"lowercase",
						"english_stop",
						"english_stemmer",
						"trigrams_filter"
					]
				}
			}
		}
	},
	"mappings": {
		"CMIT": {
			"properties": {
				"field": {
					"properties": {
						"corpus": {
							"type": "string",
							"analyzer": "english_trigrams"
						}
					}
				}
			}
		},
		"DREV": {
			"properties": {
				"field": {
					"properties": {
						"corpus": {
							"type": "string",
							"analyzer": "english_trigrams"
						}
					}
				}
			}
		},
		"MOCK": {
			"properties": {
				"field": {
					"properties": {
						"corpus": {
							"type": "string",
							"analyzer": "english_trigrams"
						}
					}
				}
			}
		},
		"PROJ": {
			"properties": {
				"field": {
					"properties": {
						"corpus": {
							"type": "string",
							"analyzer": "english_trigrams"
						}
					}
				}
			}
		},
		"TASK": {
			"properties": {
				"field": {
					"properties": {
						"corpus": {
							"type": "string",
							"analyzer": "english_trigrams"
						}
					}
				}
			}
		},
		"USER": {
			"properties": {
				"field": {
					"properties": {
						"corpus": {
							"type": "string",
							"analyzer": "english_trigrams"
						}
					}
				}
			}
		},
		"WIKI": {
			"properties": {
				"field": {
					"properties": {
						"corpus": {
							"type": "string",
							"analyzer": "english_trigrams"
						}
					}
				}
			}
		}
	}
}
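
To illustrate why a trigram filter lets "charts" find "graphingcharts", here is a rough Python sketch of the idea. It is not what Elasticsearch runs internally: it leaves out the stemming and stop word filters from the analyzer above, the function names are made up for this example, and it assumes that any shared gram counts as a hit.

```python
import re

def trigrams(text, n=3):
    """Lowercase the text, split on non-alphanumerics, and collect every n-character window."""
    grams = set()
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Tokens shorter than n characters produce no grams at all.
        for i in range(len(token) - n + 1):
            grams.add(token[i:i + n])
    return grams

def matches(query, title, n=3):
    """Assumption: a document is a hit if it shares at least one gram with the query."""
    return bool(trigrams(query, n) & trigrams(title, n))

print(sorted(trigrams("charts")))              # ['art', 'cha', 'har', 'rts']
print(matches("charts", "graphingcharts"))     # True
print(matches("charts", "party"))              # True: both contain 'art'
print(matches("charts", "party", n=4))         # False: no shared 4-gram
```

The last two lines show the trade-off behind the min/max ngram setting: short grams match loosely related words, while longer grams cut down on such noise.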

It is installed like this:
curl -XPUT 'http://elasticsearchhost:9200/_template/template_phabricator' -d @mapping.json
where mapping.json is the file containing the JSON above.
If you do not want this level of detail for certain types, you can remove them from the mapping and Elasticsearch will fall back to its defaults.
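
If curl is not at hand, the same upload could be sketched in Python with only the standard library. This is a minimal sketch, not a tested deployment script: the helper name build_template_request is made up for this example, the host name and port are taken from the curl command above, and the Content-Type header is an assumption (newer Elasticsearch versions require it, older ones tolerate it).

```python
import json
import urllib.request

def build_template_request(host, mapping_path, name="template_phabricator"):
    """Build (but do not send) the PUT request that uploads the index template."""
    with open(mapping_path, "rb") as fh:
        body = fh.read()
    json.loads(body)  # Fail early if the file is not valid JSON.
    return urllib.request.Request(
        url=f"http://{host}:9200/_template/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# Usage (needs a reachable Elasticsearch host):
# req = build_template_request("elasticsearchhost", "mapping.json")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```

Parsing the file before sending catches a stray comma or bracket locally instead of as an error response from the server.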

qgil moved this task from Backlog to Important on the Wikimedia board.
qgil renamed this task from Implement partial / wildcard searching to Implement partial / wildcard searching (Elasticsearch). Mar 13 2015, 5:00 PM
qgil updated the task description.
qgil removed a project: Wikimedia.

As a general product decision, I do not expect search to be substring search by default -- searching for pricot on Google does not match documents containing apricot. But we can sort this out in the long run.

With Elasticsearch's 'simple_query_string' query parser, this only works if you add the wildcard explicitly (for example, *pricot), and only outside of quoted phrases.