Search · Project · Active · Public

Recent Activity

Mar 11 2021

epriestley closed T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes as Resolved.

Nothing new has arisen for a while, so presuming this is resolved.

Mar 11 2021, 5:52 PM · Search
epriestley moved T13501: Improve search index normalization of "é" and other characters with variants or multiple representations from Backlog to Future on the Search board.
Mar 11 2021, 5:49 PM · Search
epriestley moved T13196: Allow search cluster path value point to alias for Elasticsearch fulltext engine from Backlog to External Search on the Search board.
Mar 11 2021, 5:49 PM · Search, Elasticsearch
epriestley moved T12965: When no "master" database is configured, the ElasticSearch setup check can fatal from Backlog to External Search on the Search board.
Mar 11 2021, 5:49 PM · Database, Clusters, Search
epriestley moved T12450: New Search Configuration Errata from v2 to External Search on the Search board.
Mar 11 2021, 5:49 PM · Search
epriestley triaged T13633: Ferret searches which match very large result sets may be dominated by result ordering as Low priority.
Mar 11 2021, 5:47 PM · Search

Mar 10 2021

epriestley renamed T13632: Compile `_...` search tokens as substring searches from "Compile `__X__` search tokens as substring searches" to "Compile `_...` search tokens as substring searches".
Mar 10 2021, 8:01 PM · Search
epriestley closed T13632: Compile `_...` search tokens as substring searches as Resolved.

Seems like it works:

Mar 10 2021, 8:01 PM · Search
epriestley added a revision to T13632: Compile `_...` search tokens as substring searches: D21602: Interpret search tokens in the form "_..." as substring search.
Mar 10 2021, 7:55 PM · Search
epriestley added a comment to T13632: Compile `_...` search tokens as substring searches.

I think we can be slightly more general about this and assume that any token beginning with `_` is a substring search. This covers `__FILE__`, `__construct`, etc. Users almost certainly intend these to be substring searches.

Mar 10 2021, 7:41 PM · Search
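A minimal sketch of the rule described in that comment, assuming a compiler that tags each bare token with an operator. The names `compile_token`, `CompiledToken`, `SUBSTRING`, and `WORD` are hypothetical and do not come from Ferret's actual query compiler.

```python
from dataclasses import dataclass

# Hypothetical operator labels; Ferret's real compiler uses its own constants.
SUBSTRING = "substring"
WORD = "word"


@dataclass(frozen=True)
class CompiledToken:
    value: str
    operator: str


def compile_token(token: str) -> CompiledToken:
    """Compile any bare token that starts with "_" as a substring search.

    Identifiers such as __FILE__ or __construct are almost always meant
    as substrings, so they get the substring operator instead of the
    default term operator.
    """
    if token.startswith("_"):
        return CompiledToken(token, SUBSTRING)
    return CompiledToken(token, WORD)


if __name__ == "__main__":
    for t in ["__FILE__", "__construct", "_private", "ferret"]:
        print(t, compile_token(t).operator)
```

The check is deliberately broad (a leading `_` rather than the narrower `__X__` shape), matching the reasoning that users typing such tokens almost always want substring matches.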
epriestley triaged T13632: Compile `_...` search tokens as substring searches as Wishlist priority.
Mar 10 2021, 7:35 PM · Search

Feb 17 2021

epriestley added a comment to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes.

I've deployed these changes to secure, so hopefully any issues will present themselves.

Feb 17 2021, 12:18 AM · Search
epriestley added a comment to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes.

If something goes wrong with this, the patch which fixes the problem can now change the indexer version and then all mis-indexed documents can be reindexed with:

Feb 17 2021, 12:10 AM · Search

Feb 16 2021

epriestley added a revision to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes: D21560: When documents are indexed, record the indexer version (versus the object version) and index epoch.
Feb 16 2021, 11:59 PM · Search
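Recording the indexer version separately from the object version is what makes a bad index recoverable: if the indexing logic changes or turns out to be buggy, bumping the indexer version identifies exactly the documents built by older logic, which the object version alone cannot do. The sketch below is illustrative only; `IndexRecord`, `needs_reindex`, `stale_documents`, and `CURRENT_INDEXER_VERSION` are hypothetical names, not Phabricator's schema.

```python
from dataclasses import dataclass

# Hypothetical: bump whenever the indexing logic changes (or a bug is fixed).
CURRENT_INDEXER_VERSION = 2


@dataclass
class IndexRecord:
    object_id: int
    object_version: int   # version of the object content that was indexed
    indexer_version: int  # version of the indexing code that built the index
    epoch: int            # when the index was built, as a Unix timestamp


def needs_reindex(record: IndexRecord, current_object_version: int) -> bool:
    # Stale if the object changed since it was indexed, or if older
    # indexing logic built this entry.
    return (
        record.object_version != current_object_version
        or record.indexer_version < CURRENT_INDEXER_VERSION
    )


def stale_documents(records, object_versions):
    """Return IDs of documents whose index must be rebuilt."""
    return [
        r.object_id
        for r in records
        if needs_reindex(r, object_versions[r.object_id])
    ]
```

For example, with records `(1, v5, indexer 2)`, `(2, v3, indexer 1)`, and `(3, v4, indexer 2)` and current object versions `{1: 5, 2: 3, 3: 5}`, only documents 2 (old indexer) and 3 (changed object) are selected.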
epriestley added a comment to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes.

The existing SearchIndexVersion table (which stores document versions) can reasonably store index versions too. This limits the need to change the `fdocument` tables.

Feb 16 2021, 10:00 PM · Search
epriestley added a comment to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes.

This has stalled for a while because it's moderately expensive to recover from if the updated index logic has a bug: rebuilding all document indexes is expensive, and it's difficult to identify the set of documents that need to be reindexed if a bug is present.

Feb 16 2021, 8:52 PM · Search

Nov 19 2020

epriestley added a revision to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes: D21495: When updating a Ferret search index document, reuse existing rows where possible.
Nov 19 2020, 9:36 PM · Search
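A toy model of why reusing rows matters for T13587: if every reindex deletes a document's ngram rows and inserts fresh ones, each reindex consumes new AUTO_INCREMENT IDs even when nothing changed, so the ID space can eventually run out. Reusing the rows for unchanged ngrams avoids that. `NgramTable` and the two `reindex_*` functions are hypothetical stand-ins, not the code in D21495.

```python
class NgramTable:
    """Toy stand-in for an ngrams table with an AUTO_INCREMENT id column."""

    def __init__(self):
        self.next_id = 1
        self.rows = {}  # row id -> (document_id, ngram)

    def insert(self, document_id, ngram):
        row_id = self.next_id
        self.next_id += 1  # an ID is consumed even if the row is later deleted
        self.rows[row_id] = (document_id, ngram)

    def delete(self, row_id):
        del self.rows[row_id]

    def ngrams_for(self, document_id):
        """Map each ngram of a document to the row id that stores it."""
        return {ngram: row_id for row_id, (doc, ngram) in self.rows.items()
                if doc == document_id}


def reindex_naive(table, document_id, ngrams):
    # Delete everything, then reinsert: burns one ID per ngram per reindex.
    for row_id in list(table.ngrams_for(document_id).values()):
        table.delete(row_id)
    for ngram in sorted(ngrams):
        table.insert(document_id, ngram)


def reindex_reusing_rows(table, document_id, ngrams):
    # Only delete rows for ngrams that disappeared and insert new ones.
    existing = table.ngrams_for(document_id)
    for ngram, row_id in existing.items():
        if ngram not in ngrams:
            table.delete(row_id)
    for ngram in sorted(set(ngrams) - set(existing)):
        table.insert(document_id, ngram)
```

Reindexing the same two ngrams 100 times leaves the naive table's counter at 201 and the row-reusing table's counter at 3, with the same two rows stored in both.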
epriestley triaged T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes as Normal priority.
Nov 19 2020, 8:24 PM · Search

Apr 17 2020

epriestley renamed T13501: Improve search index normalization of "é" and other characters with variants or multiple representations from "Ngram search for "é" has slicing and collation issues with multibyte characters and multicharacter glyphs" to "Improve search index normalization of "é" and other characters with variants or multiple representations".
Apr 17 2020, 1:05 PM · Search
epriestley closed T13511: Allow extensions to define new document fields (like "title:") in Ferret search as Resolved.

This is now possible.

Apr 17 2020, 12:23 PM · Search
epriestley closed T13503: Index Paste documents in Ferret as Resolved.

This is now supported.

Apr 17 2020, 12:23 PM · Search, Paste
epriestley closed T13503: Index Paste documents in Ferret, a subtask of T13511: Allow extensions to define new document fields (like "title:") in Ferret search, as Resolved.
Apr 17 2020, 12:23 PM · Search

Apr 16 2020

epriestley added a revision to T13511: Allow extensions to define new document fields (like "title:") in Ferret search: D21131: Modularize Ferret fulltext functions.
Apr 16 2020, 8:39 PM · Search
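Modularizing the functions presumably means a query prefix like `title:` is looked up in a registry rather than hard-coded, which is what would let an extension add its own field. A minimal sketch under that assumption; `FieldFunction`, `register_function`, `resolve`, and the `summary` field are hypothetical illustrations, not Ferret's classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldFunction:
    name: str       # prefix typed in queries, e.g. "title"
    field_key: str  # key of the indexed field it searches


_FUNCTIONS: dict[str, FieldFunction] = {}


def register_function(fn: FieldFunction) -> None:
    """Add a function; core code and extensions use the same entry point."""
    if fn.name in _FUNCTIONS:
        raise ValueError(f"function {fn.name!r} is already defined")
    _FUNCTIONS[fn.name] = fn


def resolve(token: str):
    """Split "name:value" and look up the function.

    Returns (field_key, value); unknown prefixes and bare tokens
    return (None, token) so they search all fields.
    """
    name, sep, value = token.partition(":")
    if sep and name in _FUNCTIONS:
        return _FUNCTIONS[name].field_key, value
    return None, token


register_function(FieldFunction("title", "title"))
# A hypothetical extension-defined field:
register_function(FieldFunction("summary", "summary"))
```

With this shape, `resolve("summary:cache")` yields `("summary", "cache")` without the parser knowing about `summary` in advance, while `resolve("cache")` falls through to an all-fields search.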
epriestley added a comment to T13511: Allow extensions to define new document fields (like "title:") in Ferret search.

These field functions have somewhat weird scopes and context.

Apr 16 2020, 6:08 PM · Search
epriestley added a revision to T13511: Allow extensions to define new document fields (like "title:") in Ferret search: D21130: Remove Ferret function aliases and overrides.
Apr 16 2020, 5:31 PM · Search
epriestley added a revision to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations: D21128: Combine the two different ngram-splitting algorithms into a single engine.
Apr 16 2020, 4:38 PM · Search
epriestley added a revision to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations: D21127: Remove broken and unfixable "prefix" ngram behavior.
Apr 16 2020, 4:32 PM · Search
epriestley added a revision to T13511: Allow extensions to define new document fields (like "title:") in Ferret search: D21126: Remove unused "getAllFunctionFields()" from Ferret.
Apr 16 2020, 3:05 PM · Search
epriestley added a parent task for T13503: Index Paste documents in Ferret: T13511: Allow extensions to define new document fields (like "title:") in Ferret search.
Apr 16 2020, 3:05 PM · Search, Paste
epriestley added a parent task for T13501: Improve search index normalization of "é" and other characters with variants or multiple representations: T13511: Allow extensions to define new document fields (like "title:") in Ferret search.
Apr 16 2020, 3:05 PM · Search
epriestley added subtasks for T13511: Allow extensions to define new document fields (like "title:") in Ferret search: T13509: Support "field present" and "field absent" operators in Ferret, T13503: Index Paste documents in Ferret, T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.
Apr 16 2020, 3:05 PM · Search
epriestley added a parent task for T13509: Support "field present" and "field absent" operators in Ferret: T13511: Allow extensions to define new document fields (like "title:") in Ferret search.
Apr 16 2020, 3:05 PM · Search
epriestley triaged T13511: Allow extensions to define new document fields (like "title:") in Ferret search as Normal priority.
Apr 16 2020, 3:04 PM · Search
epriestley added a comment to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.

Getting through the ngram index alone isn't good enough, because LIKE operators against utf8mb4_unicode_ci treat combining accents as separate characters:

Apr 16 2020, 2:59 PM · Search
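The underlying problem: "é" can be stored as one precomposed code point or as "e" plus a combining acute accent. Both render the same but are different code point sequences, so any per-character matching (ngram slicing or `LIKE`) sees them as different strings. The sketch uses Python's standard `unicodedata` module as a stand-in for PHP's `Normalizer`; normalizing both indexed text and queries to one form, such as NFC, makes them compare equal.

```python
import unicodedata

precomposed = "\u00e9"  # "é" as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Same rendered glyph, different code point sequences.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2

# Normalizing to NFC composes the two-code-point form into one.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)                 # True
```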
epriestley added a comment to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.

`Normalizer` requires the `intl` extension, which I'm hesitant to add a dependency on.

Apr 16 2020, 2:00 PM · Search
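As an illustration of handling "variants" in the sense of the task title, one common approach (not necessarily what Ferret adopted) is to decompose with NFKD and drop combining marks. This folds "é" to "e" and compatibility characters such as the "ﬁ" ligature to plain letters. It is sketched with Python's `unicodedata` rather than PHP's `Normalizer`, and `fold_accents` is a hypothetical helper.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose with NFKD, then drop combining marks ("é" -> "e")."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("caf\u00e9"))   # cafe  (precomposed input)
print(fold_accents("cafe\u0301"))  # cafe  (decomposed input)
print(fold_accents("\ufb01le"))    # file  (ligature expanded)
```

Folding is lossy, so a search engine applying it would typically fold both the indexed text and the query the same way.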

Apr 14 2020

epriestley closed T13509: Support "field present" and "field absent" operators in Ferret as Resolved.

This appears to be working properly, now.

Apr 14 2020, 6:19 PM · Search
epriestley added a comment to T13509: Support "field present" and "field absent" operators in Ferret.

Query parsing of certain unusual or ambiguous inputs has changed slightly.

Apr 14 2020, 5:31 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21112: Document the "field present" and "field absent" operators in Ferret.
Apr 14 2020, 5:31 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21111: Make the Ferret query compiler keep functions sticky across non-initial quoted tokens.
Apr 14 2020, 5:23 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21110: Implement the "present" and "absent" operators in the Ferret execution engine.
Apr 14 2020, 5:22 PM · Search
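The semantics of the two operators can be sketched independently of query syntax, assuming "present" means a document has a non-empty value in the field and "absent" means it does not. This is a hypothetical in-memory model (`field_present`, `field_absent`, and the document layout are illustrative); it does not reproduce Ferret's query syntax or its SQL execution.

```python
def field_present(documents, field):
    """IDs of documents with a non-empty value for `field`."""
    return {doc_id for doc_id, fields in documents.items() if fields.get(field)}


def field_absent(documents, field):
    """IDs of documents with no value (or an empty value) for `field`."""
    return set(documents) - field_present(documents, field)


documents = {
    1: {"title": "Ferret", "body": ""},
    2: {"title": "Ngrams"},
    3: {"title": "Paste", "body": "some text"},
}
print(field_present(documents, "body"))  # {3}
print(field_absent(documents, "body"))   # {1, 2}
```

Defining "absent" as the complement of "present" guarantees the two operators partition every document set for a given field.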
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21109: Tighten query compiler rules around spaces inside and after operators.
Apr 14 2020, 5:18 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21108: Make Ferret query functions sticky only if their values are not quoted.
Apr 14 2020, 5:03 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21107: Add "absent" and "present" field operators to the Ferret query compiler.
Apr 14 2020, 4:56 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21106: Tighten Ferret query parsing of empty tokens and empty functions.
Apr 14 2020, 4:52 PM · Search
epriestley triaged T13509: Support "field present" and "field absent" operators in Ferret as Low priority.
Apr 14 2020, 4:48 PM · Search

Mar 20 2020

epriestley triaged T13503: Index Paste documents in Ferret as Low priority.
Mar 20 2020, 7:12 PM · Search, Paste

Mar 9 2020

epriestley added a comment to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.

We also have two separate pieces of ngram extraction code:

Mar 9 2020, 5:43 PM · Search
epriestley added a comment to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.

For now, I'm going to change the ngram slicing to be character-oriented. This should never be worse than the current behavior, and moves us closer to effective normalization.

Mar 9 2020, 5:38 PM · Search
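To see why byte-oriented slicing is a problem, compare 3-byte and 3-character windows over "café", whose "é" takes two bytes in UTF-8. Byte windows can cut a character in half and produce invalid UTF-8; character windows cannot. This sketch assumes 3-character ngrams, and `byte_trigrams`, `char_trigrams`, and `is_valid_utf8` are hypothetical helpers. Python indexes strings by code point, so a decomposed "é" would still be split between its base letter and accent, which is why normalization remains necessary.

```python
def byte_trigrams(text: str) -> list[bytes]:
    """Slice the UTF-8 encoding into 3-byte windows."""
    data = text.encode("utf-8")
    return [data[i:i + 3] for i in range(len(data) - 2)]


def char_trigrams(text: str) -> list[str]:
    """Slice the string into 3-code-point windows."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def is_valid_utf8(chunk: bytes) -> bool:
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


word = "caf\u00e9"  # "café" with a precomposed "é"
print(byte_trigrams(word))  # [b'caf', b'af\xc3', b'f\xc3\xa9']
print(char_trigrams(word))  # ['caf', 'afé']
```

The middle byte window, `b'af\xc3'`, ends in the first half of a two-byte character and does not decode.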
epriestley added a comment to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.

This appears to be the Unicode normalization chart:

Mar 9 2020, 5:27 PM · Search
epriestley triaged T13501: Improve search index normalization of "é" and other characters with variants or multiple representations as Low priority.
Mar 9 2020, 5:16 PM · Search