
try to find duplicates using an analyzed elasticsearch field

Authored by fabe on Nov 28 2014, 1:03 PM.



Ref T6656. Use Elasticsearch to find duplicates.

Test Plan

Run ./bin/search find_duplicates to try it without any index changes.
Then run ./bin/search find_duplicates --installmapping to change the mapping.
After that, all tasks need to be reindexed (./bin/search index --type TASK). Then run
./bin/search find_duplicates --analyzed and compare with the initial result.
The analysis language is hardcoded to English for now.
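
For readers who want a feel for what an "analyzed" field changes, here is a rough sketch of the two request bodies involved: a mapping that runs a field through the built-in english analyzer, and a more_like_this query against either the plain or the analyzed field. This is not the code from this diff. The field name title_analyzed, the helper names, and the choice of a more_like_this query are all assumptions made for illustration, and the syntax is Elasticsearch 1.x-era ("string" type, like_text).

```python
import json

# Hypothetical field name; the field names used by the actual diff are not shown here.
ANALYZED_FIELD = "title_analyzed"


def build_mapping(doc_type="TASK", language="english"):
    """Return a mapping fragment with a language-analyzed string field.

    Elasticsearch 1.x-era syntax; the language defaults to English,
    mirroring the hardcoded language mentioned in the test plan.
    """
    return {
        doc_type: {
            "properties": {
                ANALYZED_FIELD: {"type": "string", "analyzer": language},
            }
        }
    }


def build_duplicate_query(title, analyzed=False, size=5):
    """Return a more_like_this query body for finding similar titles.

    With analyzed=True the query targets the analyzed field, so stemming
    and stop-word removal apply to both the query text and the index.
    """
    field = ANALYZED_FIELD if analyzed else "title"
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": [field],
                "like_text": title,
                # Low thresholds, since task titles are short.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(build_duplicate_query("Search finds duplicates", analyzed=True), indent=2))
```

Comparing the hits for analyzed=False and analyzed=True on the same title is roughly the comparison the test plan asks for, under the assumptions above.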

Event Timeline

fabe updated this revision to Diff 26201. Nov 28 2014, 1:03 PM
fabe retitled this revision from to try to find duplicates using an analyzed elasticsearch field.
fabe updated this object.
fabe edited the test plan for this revision.

This should probably be rebased on top of the work in D10955: Properly create Elasticsearch index.

fabe added a comment. Dec 19 2014, 12:55 PM

This diff is not really meant to be merged. (And I'm not sure if I can even create a diff on top of another diff that has not landed yet?)
But you're right that if we want the numbers to be correct, you should apply the patch from D10955 and then this one on top,
and then only run: ./bin/search find_duplicates
I'll just remove the mapping stuff from this diff.

fabe updated this revision to Diff 26454. Dec 19 2014, 1:03 PM
fabe edited edge metadata.

Remove the mapping stuff.

fabe abandoned this revision. May 25 2015, 12:44 PM

The Elasticsearch mapping is part of HEAD now, so the rest of this diff is quite obsolete.