Make Ferret indexing more robust (UTF8, exception handling)
Summary:
Ref T12819. Two minor improvements from live data:
- Tokenize in a UTF8-aware way.
- When one document fails to index, kill the transaction explicitly (rather than leaving it hanging) so we don't cause other failures later.
Test Plan: Created some UTF8 documents locally, indexed them, got clean results.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12819
Differential Revision: https://secure.phabricator.com/D18487