
try to find duplicates using an analyzed elasticsearch field

Authored by fabe on Nov 28 2014, 1:03 PM.



Ref T6656: use Elasticsearch to find duplicates.

Test Plan

Run ./bin/search find_duplicates to get a baseline without any index changes.
Then run ./bin/search find_duplicates --installmapping to change the mapping.
Next, reindex all tasks (./bin/search index --type TASK), run
./bin/search find_duplicates --analyzed, and compare the output with the baseline.
The analyzer language is hardcoded to English for now.
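The test plan compares the duplicates found without index changes against those found through an English-analyzed field. The Python sketch below shows why the two runs can disagree. It is a toy under stated assumptions, not the diff's implementation. Duplicates are assumed to be detected by comparing task titles. The names toy_english_analyze, raw_key, analyzed_key and find_duplicates are hypothetical. The analyzer is only a crude stand-in for Elasticsearch's "english" analyzer, which uses Porter stemming and a longer stop-word list. Grouping tasks by identical token sets is a simplification of how Elasticsearch would score similar documents.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for Elasticsearch's "english" analyzer: lowercase,
# split on non-alphanumerics, drop a small subset of English stop words,
# and strip a few common suffixes. The real analyzer behaves differently.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}
SUFFIXES = ("ing", "ed", "s")


def toy_english_analyze(text):
    tokens = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # Strip at most one suffix, and only if a stem of 3+ chars remains.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        tokens.append(word)
    return tokens


def raw_key(title):
    # "Not analyzed": only byte-identical titles match.
    return title


def analyzed_key(title):
    # Analyzed: an order-insensitive set of normalized tokens.
    return frozenset(toy_english_analyze(title))


def find_duplicates(titles, key):
    groups = defaultdict(list)
    for task_id, title in titles.items():
        groups[key(title)].append(task_id)
    # Only keys shared by more than one task are duplicate candidates.
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)


tasks = {
    "T1": "Crash when uploading files",
    "T2": "crash when uploading a file",
    "T3": "Crash when uploading files",
    "T4": "Add dark mode",
}
print(find_duplicates(tasks, raw_key))       # [['T1', 'T3']]
print(find_duplicates(tasks, analyzed_key))  # [['T1', 'T2', 'T3']]
```

With these made-up tasks, exact matching only pairs T1 and T3. The analyzed key also catches T2, which differs only in case, a stop word and a plural.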

Event Timeline

fabe retitled this revision to "try to find duplicates using an analyzed elasticsearch field".
fabe updated this object.
fabe edited the test plan for this revision.

This diff is not really meant to be merged. (And I'm not sure whether I can even create a diff on top of another diff that hasn't landed yet.)
But you're right: if we want the numbers to be correct, you should apply the patch from D10955, apply this one on top,
and then run only ./bin/search find_duplicates.
I'll remove the mapping changes from this diff.

The Elasticsearch mapping is in HEAD now, so the rest of this diff is mostly obsolete.