
try to find duplicates using an analyzed elasticsearch field
AbandonedPublic

Authored by fabe on Nov 28 2014, 1:03 PM.
Tags
None

Details

Summary

Ref T6656. Use Elasticsearch to find duplicates.

Test Plan

Run ./bin/search find_duplicates first, without any index changes, to get a baseline result.
Then run ./bin/search find_duplicates --installmapping to change the mapping.
After that, all tasks need to be reindexed (./bin/search index --type TASK). Finally, run
./bin/search find_duplicates --analyzed and compare with the initial result.
The analysis language is hardcoded to English for now.
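
For readers unfamiliar with analyzed fields, here is a minimal sketch in Python of what such a mapping and a duplicate-search query could look like. It is not the code from this diff: the field name title_analyzed, the helper names, and the use of "TASK" as the document type are assumptions for illustration. The "string" field type reflects the Elasticsearch 1.x era; newer versions use "text". The sketch only builds the request bodies and does not contact a server.

```python
import json

# Hypothetical field name; the field actually used by the diff is not shown here.
ANALYZED_FIELD = "title_analyzed"


def build_mapping(doc_type="TASK", language="english"):
    """Return a mapping body that adds a language-analyzed string field."""
    return {
        doc_type: {
            "properties": {
                ANALYZED_FIELD: {
                    "type": "string",  # ES 1.x-era type; newer versions use "text"
                    "analyzer": language,  # built-in language analyzer, e.g. "english"
                }
            }
        }
    }


def build_duplicate_query(title, exclude_id=None, size=5):
    """Return a search body that matches documents with a similar title."""
    query = {"match": {ANALYZED_FIELD: {"query": title}}}
    if exclude_id is not None:
        # Keep the document being checked out of its own result list.
        query = {
            "bool": {
                "must": [query],
                "must_not": [{"ids": {"values": [exclude_id]}}],
            }
        }
    return {"size": size, "query": query}


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(build_duplicate_query("Login page crashes"), indent=2))
```

Because the english analyzer stems and lowercases terms at both index and query time, titles that differ only in word form (for example "crash" and "crashes") can match, which is presumably why the --analyzed run is compared against the baseline.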

Event Timeline

fabe retitled this revision to "try to find duplicates using an analyzed elasticsearch field".
fabe updated this object.
fabe edited the test plan for this revision.

This diff is not really meant to be merged. (And I'm not sure whether I can even create a diff on top of another diff that has not landed yet.)
But you're right that if we want the numbers to be correct, you should apply the patch from D10955, then this one on top,
and then only run: ./bin/search find_duplicates
I'll just remove the mapping stuff from this diff.

The Elasticsearch mapping is part of HEAD now, so the rest is largely obsolete.