Mar 11 2021
Nothing new has arisen for a while, so I'm presuming this is resolved.
Mar 10 2021
Seems like it works:
I think we can be slightly more general about this and assume any token beginning with _ is a substring search. This covers __FILE__, __construct, etc. Users almost certainly intend these to be substring searches.
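As a rough illustration of that rule (not the actual query compiler), a minimal Python sketch might classify tokens like this. The function names and the "substring"/"term" labels are hypothetical, and whitespace splitting stands in for real query parsing.

```python
def is_substring_token(token):
    # Assumed rule from the comment above: a leading underscore
    # means the user wants a substring search.
    return token.startswith("_")


def classify(query):
    # Hypothetical helper: split on whitespace and label each token.
    return [
        (tok, "substring" if is_substring_token(tok) else "term")
        for tok in query.split()
    ]


print(classify("__FILE__ __construct parser"))
```

Under this sketch, both __FILE__ and __construct are labeled as substring searches, while an ordinary word such as parser is treated as a normal term.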
Feb 17 2021
I've deployed these changes to secure, so hopefully any issues will present themselves.
If something goes wrong with this, the patch which fixes the problem can now change the indexer version and then all mis-indexed documents can be reindexed with:
Feb 16 2021
The existing SearchIndexVersion table (which stores document versions) may reasonably be able to store index versions too. This limits the need to apply changes to fdocument.
This has stalled for a while because it's moderately expensive to recover from if the updated index logic has a bug: rebuilding all document indexes is expensive, and it's difficult to identify the set of documents that need to be reindexed if a bug is present.
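The recovery idea can be sketched roughly as follows, assuming each document records the indexer version it was last indexed with. This is an in-memory Python illustration only: the CURRENT_INDEX_VERSION constant, the needs_reindex name, and the dictionary standing in for stored versions are all hypothetical, not the real table layout.

```python
# Hypothetical version string; a patch that fixes an indexing bug
# would change this value.
CURRENT_INDEX_VERSION = "2"


def needs_reindex(stored_versions, current=CURRENT_INDEX_VERSION):
    # stored_versions maps a document identifier to the indexer version
    # recorded when it was last indexed (None if never recorded).
    return sorted(
        doc for doc, version in stored_versions.items() if version != current
    )


stale = needs_reindex({"T1": "1", "T2": "2", "T3": None})
print(stale)
```

With a recorded version per document, the set of documents to rebuild is just those whose stored version differs from the current one, so a buggy indexer release no longer forces a full rebuild.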
Nov 19 2020
Apr 17 2020
This is now possible.
This is now supported.
Apr 16 2020
These field functions have somewhat-weird scopes/context.
Getting through the ngram index alone isn't good enough, because LIKE operators against utf8mb4_unicode_ci treat combining accents as separate characters:
Normalizer requires intl which I'm hesitant to add a dependency on.
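To show why normalization matters here, the following Python sketch uses the standard unicodedata module as a stand-in for a normalizer. It is an analogue of the problem at the code point level, not a reproduction of MySQL's collation behavior, and the helper names are hypothetical.

```python
import unicodedata

composed = "caf\u00e9"     # "é" as one precomposed code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent


def contains(haystack, needle):
    # Plain substring test, with no normalization.
    return needle in haystack


def contains_normalized(haystack, needle, form="NFC"):
    # Normalize both sides to the same form before comparing.
    return unicodedata.normalize(form, needle) in unicodedata.normalize(
        form, haystack
    )


print(contains(decomposed, "\u00e9"))
print(contains_normalized(decomposed, "\u00e9"))
```

The two strings render identically, but the plain test misses the precomposed "é" inside the decomposed string, while the normalized comparison finds it.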
Apr 14 2020
This appears to be working properly now.
Query parsing of certain unusual or ambiguous inputs has changed slightly.
Mar 20 2020
Mar 9 2020
We also have two separate pieces of ngram extraction code:
For now, I'm going to change the ngram slicing to be character-oriented. This should never be worse than the current behavior, and moves us closer to effective normalization.
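The difference between byte-oriented and character-oriented slicing can be illustrated with a small Python sketch. The function names are hypothetical and the window size of 3 is an assumption made for the example.

```python
def byte_ngrams(text, n=3):
    # Slice the UTF-8 encoding; windows can cut a multibyte
    # character in half.
    data = text.encode("utf-8")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def char_ngrams(text, n=3):
    # Slice by code point, so every ngram is made of whole characters.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


word = "caf\u00e9"
print(byte_ngrams(word))
print(char_ngrams(word))
```

For this word the byte-oriented version produces a window ending in the first half of the two-byte "é", which is not valid UTF-8 on its own, while the character-oriented version only yields whole characters.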
This appears to be the Unicode normalization chart: