Mar 11 2021

epriestley closed T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes as Resolved.

Nothing new has arisen for a while, so presuming this is resolved.

Mar 11 2021, 5:52 PM · Search
epriestley moved T13501: Improve search index normalization of "é" and other characters with variants or multiple representations from Backlog to Future on the Search board.
Mar 11 2021, 5:49 PM · Search
epriestley moved T13196: Allow search cluster path value point to alias for Elasticsearch fulltext engine from Backlog to External Search on the Search board.
Mar 11 2021, 5:49 PM · Search, Elasticsearch
epriestley moved T12965: When no "master" database is configured, the ElasticSearch setup check can fatal from Backlog to External Search on the Search board.
Mar 11 2021, 5:49 PM · Database, Clusters, Search
epriestley moved T12450: New Search Configuration Errata from v2 to External Search on the Search board.
Mar 11 2021, 5:49 PM · Search
epriestley triaged T13633: Ferret searches which match very large result sets may be dominated by result ordering as Low priority.
Mar 11 2021, 5:47 PM · Search

Mar 10 2021

epriestley renamed T13632: Compile `_...` search tokens as substring searches from Compile `__X__` search tokens as substring searches to Compile `_...` search tokens as substring searches.
Mar 10 2021, 8:01 PM · Search
epriestley closed T13632: Compile `_...` search tokens as substring searches as Resolved.

Seems like it works:

Mar 10 2021, 8:01 PM · Search
epriestley added a revision to T13632: Compile `_...` search tokens as substring searches: D21602: Interpret search tokens in the form "_..." as substring search.
Mar 10 2021, 7:55 PM · Search
epriestley added a comment to T13632: Compile `_...` search tokens as substring searches.

I think we can be slightly more general about this, and assume any token beginning with _ is substring search. This covers __FILE__, __construct, etc. Users almost certainly intend these to be substring searches.
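The rule described above can be sketched roughly as follows. This is an illustration only, not code from Phabricator: the function name `classify_token` is hypothetical, and the mode names "substring" and "term" are borrowed from the wording used elsewhere in this feed.

```python
def classify_token(token: str) -> str:
    """Pick a search mode for one query token (hypothetical helper).

    Any token beginning with "_" is treated as a substring search;
    everything else falls back to ordinary term matching.
    """
    if token.startswith("_"):
        return "substring"
    return "term"


if __name__ == "__main__":
    for token in ["__FILE__", "__construct", "ferret"]:
        print(token, "->", classify_token(token))
```

Checking only the first character keeps the rule simple and covers both `__FILE__` and `__construct` without needing to match a closing `__`.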

Mar 10 2021, 7:41 PM · Search
epriestley triaged T13632: Compile `_...` search tokens as substring searches as Wishlist priority.
Mar 10 2021, 7:35 PM · Search

Feb 17 2021

epriestley added a comment to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes.

I've deployed these changes to secure, so hopefully any issues will present themselves.

Feb 17 2021, 12:18 AM · Search
epriestley added a comment to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes.

If something goes wrong with this, the patch which fixes the problem can now change the indexer version and then all mis-indexed documents can be reindexed with:

Feb 17 2021, 12:10 AM · Search

Feb 16 2021

epriestley added a revision to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes: D21560: When documents are indexed, record the indexer version (versus the object version) and index epoch.
Feb 16 2021, 11:59 PM · Search
epriestley added a comment to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes.

The existing SearchIndexVersion table (which stores document versions) may reasonably be able to store index versions too. This limits the need to apply changes to fdocument.

Feb 16 2021, 10:00 PM · Search
epriestley added a comment to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes.

This has stalled for a while because it's moderately expensive to recover from if the updated index logic has a bug: rebuilding all document indexes is expensive, and it's difficult to identify the set of documents that need to be reindexed if a bug is present.

Feb 16 2021, 8:52 PM · Search

Nov 19 2020

epriestley added a revision to T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes: D21495: When updating a Ferret search index document, reuse existing rows where possible.
Nov 19 2020, 9:36 PM · Search
epriestley triaged T13587: Ferret may exhaust AUTO_INCREMENT ID space of "ngrams" table after many reindexes as Normal priority.
Nov 19 2020, 8:24 PM · Search

Apr 17 2020

epriestley renamed T13501: Improve search index normalization of "é" and other characters with variants or multiple representations from Ngram search for "é" has slicing and collation issues with multibyte characters and multicharacter glyphs to Improve search index normalization of "é" and other characters with variants or multiple representations.
Apr 17 2020, 1:05 PM · Search
epriestley closed T13511: Allow extensions to define new document fields (like "title:") in Ferret search as Resolved.

This is now possible.

Apr 17 2020, 12:23 PM · Search
epriestley closed T13503: Index Paste documents in Ferret as Resolved.

This is now supported.

Apr 17 2020, 12:23 PM · Search, Paste
epriestley closed T13503: Index Paste documents in Ferret, a subtask of T13511: Allow extensions to define new document fields (like "title:") in Ferret search, as Resolved.
Apr 17 2020, 12:23 PM · Search

Apr 16 2020

epriestley added a revision to T13511: Allow extensions to define new document fields (like "title:") in Ferret search: D21131: Modularize Ferret fulltext functions.
Apr 16 2020, 8:39 PM · Search
epriestley added a comment to T13511: Allow extensions to define new document fields (like "title:") in Ferret search.

These field functions have somewhat-weird scopes/context.

Apr 16 2020, 6:08 PM · Search
epriestley added a revision to T13511: Allow extensions to define new document fields (like "title:") in Ferret search: D21130: Remove Ferret function aliases and overrides.
Apr 16 2020, 5:31 PM · Search
epriestley added a revision to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations: D21128: Combine the two different ngram-splitting algorithms into a single engine.
Apr 16 2020, 4:38 PM · Search
epriestley added a revision to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations: D21127: Remove broken and unfixable "prefix" ngram behavior.
Apr 16 2020, 4:32 PM · Search
epriestley added a revision to T13511: Allow extensions to define new document fields (like "title:") in Ferret search: D21126: Remove unused "getAllFunctionFields()" from Ferret.
Apr 16 2020, 3:05 PM · Search
epriestley added a parent task for T13503: Index Paste documents in Ferret: T13511: Allow extensions to define new document fields (like "title:") in Ferret search.
Apr 16 2020, 3:05 PM · Search, Paste
epriestley added a parent task for T13501: Improve search index normalization of "é" and other characters with variants or multiple representations: T13511: Allow extensions to define new document fields (like "title:") in Ferret search.
Apr 16 2020, 3:05 PM · Search
epriestley added subtasks for T13511: Allow extensions to define new document fields (like "title:") in Ferret search: T13509: Support "field present" and "field absent" operators in Ferret, T13503: Index Paste documents in Ferret, T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.
Apr 16 2020, 3:05 PM · Search
epriestley added a parent task for T13509: Support "field present" and "field absent" operators in Ferret: T13511: Allow extensions to define new document fields (like "title:") in Ferret search.
Apr 16 2020, 3:05 PM · Search
epriestley triaged T13511: Allow extensions to define new document fields (like "title:") in Ferret search as Normal priority.
Apr 16 2020, 3:04 PM · Search
epriestley added a comment to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.

Getting through the ngram index alone isn't good enough, because LIKE operators against utf8mb4_unicode_ci treat combining accents as separate characters:

Apr 16 2020, 2:59 PM · Search
epriestley added a comment to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.

Normalizer requires intl, which I'm hesitant to add a dependency on.
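To illustrate what normalization would buy here: "é" can be stored either as one precomposed code point or as "e" plus a combining accent, and the two do not compare equal until they are normalized. The sketch below uses Python's standard `unicodedata` module purely for illustration (the comment above concerns PHP's `Normalizer`); the helper name `normalize_for_index` is hypothetical, and NFC is an assumed choice of normalization form.

```python
import unicodedata


def normalize_for_index(text: str) -> str:
    """Hypothetical helper: fold text to one canonical form (NFC assumed)."""
    return unicodedata.normalize("NFC", text)


composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

if __name__ == "__main__":
    # The raw strings differ even though they render identically.
    print(composed == decomposed)                       # False
    # After normalization they compare equal.
    print(normalize_for_index(decomposed) == composed)  # True
    # The decomposed form is two code points; the NFC form is one.
    print(len(decomposed), len(normalize_for_index(decomposed)))  # 2 1
```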

Apr 16 2020, 2:00 PM · Search

Apr 14 2020

epriestley closed T13509: Support "field present" and "field absent" operators in Ferret as Resolved.

This appears to be working properly, now.

Apr 14 2020, 6:19 PM · Search
epriestley added a comment to T13509: Support "field present" and "field absent" operators in Ferret.

Query parsing of certain unusual or ambiguous inputs has changed slightly.

Apr 14 2020, 5:31 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21112: Document the "field present" and "field absent" operators in Ferret.
Apr 14 2020, 5:31 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21111: Make the Ferret query compiler keep functions sticky across non-initial quoted tokens.
Apr 14 2020, 5:23 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21110: Implement the "present" and "absent" operators in the Ferret execution engine.
Apr 14 2020, 5:22 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21109: Tighten query compiler rules around spaces inside and after operators.
Apr 14 2020, 5:18 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21108: Make Ferret query functions sticky only if their values are not quoted.
Apr 14 2020, 5:03 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21107: Add "absent" and "present" field operators to the Ferret query compiler.
Apr 14 2020, 4:56 PM · Search
epriestley added a revision to T13509: Support "field present" and "field absent" operators in Ferret: D21106: Tighten Ferret query parsing of empty tokens and empty functions.
Apr 14 2020, 4:52 PM · Search
epriestley triaged T13509: Support "field present" and "field absent" operators in Ferret as Low priority.
Apr 14 2020, 4:48 PM · Search

Mar 20 2020

epriestley triaged T13503: Index Paste documents in Ferret as Low priority.
Mar 20 2020, 7:12 PM · Search, Paste

Mar 9 2020

epriestley added a comment to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.

We also have two separate pieces of ngram extraction code:

Mar 9 2020, 5:43 PM · Search
epriestley added a comment to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.

For now, I'm going to change the ngram slicing to be character-oriented. This should never be worse than the current behavior, and moves us closer to effective normalization.
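A rough sketch of the difference, assuming the earlier slicing operated on raw UTF-8 bytes and assuming an ngram length of 3 (both are assumptions for illustration; the function names are hypothetical and this is not Phabricator's code):

```python
from typing import List


def byte_ngrams(text: str, n: int = 3) -> List[bytes]:
    """Slice the UTF-8 encoding into n-byte windows (assumed old behavior)."""
    data = text.encode("utf-8")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Slice the string into n-character windows (character-oriented)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    word = "caf\u00e9"  # "café" with a precomposed "é" (two bytes in UTF-8)
    # Byte windows can cut the two-byte "é" in half.
    print(byte_ngrams(word))
    # Character windows always keep "é" whole.
    print(char_ngrams(word))
```

Character-oriented slicing never produces a window that ends in the middle of a multibyte character, which is why it should not be worse than byte slicing. It does not by itself handle combining sequences, since a base letter and its combining accent are still separate characters.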

Mar 9 2020, 5:38 PM · Search
epriestley added a comment to T13501: Improve search index normalization of "é" and other characters with variants or multiple representations.

This appears to be the Unicode normalization chart:

Mar 9 2020, 5:27 PM · Search
epriestley triaged T13501: Improve search index normalization of "é" and other characters with variants or multiple representations as Low priority.
Mar 9 2020, 5:16 PM · Search

Jan 14 2020

epriestley closed T13472: Porter stemmer library uses obsolete array access syntax which raises warning under PHP 7.4 as Resolved by committing rPdb6b4ca480ad: Update deprecated array access syntax in Porter stemmer.
Jan 14 2020, 8:11 PM · Search
epriestley added a revision to T13472: Porter stemmer library uses obsolete array access syntax which raises warning under PHP 7.4: D20941: Update deprecated array access syntax in Porter stemmer.
Jan 14 2020, 8:04 PM · Search
epriestley added a comment to T13472: Porter stemmer library uses obsolete array access syntax which raises warning under PHP 7.4.

I thought this was some kind of complicated mess with the regex on line 420, but it's actually an issue with this:

Jan 14 2020, 8:03 PM · Search
epriestley added a revision to T13472: Porter stemmer library uses obsolete array access syntax which raises warning under PHP 7.4: D20940: Move search query compiler / stemmer classes out of libphutil.
Jan 14 2020, 7:48 PM · Search
epriestley added a revision to T13472: Porter stemmer library uses obsolete array access syntax which raises warning under PHP 7.4: D20939: Move search query parser/compiler classes to Phabricator.
Jan 14 2020, 7:41 PM · Search

Jan 13 2020

epriestley triaged T13472: Porter stemmer library uses obsolete array access syntax which raises warning under PHP 7.4 as Wishlist priority.
Jan 13 2020, 4:49 PM · Search

Sep 9 2019

epriestley closed T13412: Searching for the install URI with no trailing slash fatals as Resolved by committing rPaaaea5759133: Fix fatal during redirection safety check for searching for Phabricator base….
Sep 9 2019, 7:45 PM · Search
epriestley added a revision to T13412: Searching for the install URI with no trailing slash fatals: D20794: Fix fatal during redirection safety check for searching for Phabricator base-uri with no trailing slash.
Sep 9 2019, 7:30 PM · Search
epriestley triaged T13412: Searching for the install URI with no trailing slash fatals as Low priority.
Sep 9 2019, 5:10 PM · Search

Jul 18 2019

epriestley closed T13345: Ferret does not match documents with no title as Resolved by committing rPcb4add311649: In Ferret, allow documents with no title to match query terms by using LEFT….
Jul 18 2019, 5:37 PM · Search
amckinley updated the task description for T13345: Ferret does not match documents with no title.
Jul 18 2019, 5:26 PM · Search
epriestley added a revision to T13345: Ferret does not match documents with no title: D20660: In Ferret, allow documents with no title to match query terms by using LEFT JOIN on the "title" ranking field.
Jul 18 2019, 5:22 PM · Search
epriestley triaged T13345: Ferret does not match documents with no title as Low priority.
Jul 18 2019, 5:16 PM · Search

Mar 25 2019

epriestley closed T13091: Ferret "Relevance" order does not always have all the columns it needs available as Resolved.
Mar 25 2019, 6:58 PM · Search

Mar 19 2019

epriestley added a comment to T13091: Ferret "Relevance" order does not always have all the columns it needs available.

Also, what is "By Relevance"?

Mar 19 2019, 6:34 PM · Search
epriestley added a revision to T13091: Ferret "Relevance" order does not always have all the columns it needs available: D20298: When paging by Ferret "rank", page using "HAVING rank > ...", not "WHERE rank > ...".
Mar 19 2019, 6:24 PM · Search

Mar 18 2019

epriestley added a revision to T13091: Ferret "Relevance" order does not always have all the columns it needs available: D20297: Select Ferret fulltext columns in results so fulltext queries work under UNION.
Mar 18 2019, 11:07 PM · Search
epriestley added a revision to T13091: Ferret "Relevance" order does not always have all the columns it needs available: D20296: Skip Ferret fulltext columns in "ORDER BY" if there's no fulltext query.
Mar 18 2019, 10:52 PM · Search

Feb 19 2019

epriestley closed T12425: User-initiated search reindex tasks can end up stuck behind import tasks in the daemon queue as Resolved by committing rP312ba3071485: Don't report search indexing errors to the daemon log except from "bin/search….
Feb 19 2019, 7:17 PM · Customer Impact, Daemons, Search, Diffusion

Feb 15 2019

epriestley added a revision to T12425: User-initiated search reindex tasks can end up stuck behind import tasks in the daemon queue: D20178: Don't report search indexing errors to the daemon log except from "bin/search index".
Feb 15 2019, 1:30 PM · Customer Impact, Daemons, Search, Diffusion
epriestley added a revision to T12425: User-initiated search reindex tasks can end up stuck behind import tasks in the daemon queue: D20177: Queue search indexing tasks at a new PRIORITY_INDEX, not PRIORITY_IMPORT.
Feb 15 2019, 1:09 PM · Customer Impact, Daemons, Search, Diffusion
epriestley added a comment to T12425: User-initiated search reindex tasks can end up stuck behind import tasks in the daemon queue.

A related issue here is exemplified in https://discourse.phabricator-community.org/t/importing-libphutil-repository-on-fresh-phabricator-triggers-an-error/2391/, which basically amounts to:

Feb 15 2019, 1:01 PM · Customer Impact, Daemons, Search, Diffusion
epriestley closed T8871: Indexing a task with 2,000 comments required a lot of RAM in mid-2015 as Resolved.

Presumably resolved elsewhere by D19503.

Feb 15 2019, 2:00 AM · Search, Daemons
epriestley moved T12425: User-initiated search reindex tasks can end up stuck behind import tasks in the daemon queue from Backlog to vNext on the Daemons board.
Feb 15 2019, 1:57 AM · Customer Impact, Daemons, Search, Diffusion

Feb 6 2019

epriestley merged T13246: Phabricator search has issues with paging when `relevance` ordering is chosen and crashes with `failed to return a value from getPagingValueMap() for column "rank"` into T13091: Ferret "Relevance" order does not always have all the columns it needs available.
Feb 6 2019, 1:18 PM · Search

Sep 20 2018

20after4 added a comment to T13196: Allow search cluster path value point to alias for Elasticsearch fulltext engine.

@epriestley: I can help with testing and validating on elasticsearch, if that helps at all.

Sep 20 2018, 12:54 PM · Search, Elasticsearch

Sep 5 2018

epriestley added a comment to T13196: Allow search cluster path value point to alias for Elasticsearch fulltext engine.

Yes. The diff is a relatively small part of the work I must do to respond to this issue: install ElasticSearch, configure it in the way you describe, reproduce the issue, write or apply a change, verify it actually fixes the problem, and read through the documentation enough that I'm confident the fix is the best available fix and that we've understood the issue reasonably well.

Sep 5 2018, 4:28 PM · Search, Elasticsearch
Pawka added a comment to T13196: Allow search cluster path value point to alias for Elasticsearch fulltext engine.

Even if I provide the diff?

Sep 5 2018, 4:16 PM · Search, Elasticsearch
epriestley added a comment to T13196: Allow search cluster path value point to alias for Elasticsearch fulltext engine.

(Just to set expectations, anything which isn't coming in through Support Pacts may take a very long time for us to get to.)

Sep 5 2018, 2:27 PM · Search, Elasticsearch
Pawka triaged T13196: Allow search cluster path value point to alias for Elasticsearch fulltext engine as Normal priority.
Sep 5 2018, 6:09 AM · Search, Elasticsearch

Jul 13 2018

epriestley added a comment to T13091: Ferret "Relevance" order does not always have all the columns it needs available.

Trying to reproduce this locally just hits the ft_doc.epochModified issue. I'm not immediately sure why the behavior differs between my local install and secure, but that issue probably needs to be fixed first.

Jul 13 2018, 3:57 PM · Search
epriestley added a comment to T13091: Ferret "Relevance" order does not always have all the columns it needs available.

Actually, I'm not entirely right in merging that task -- T13163 isn't quite the same as the other two issues here. I think they're similar, but the query text is relevant in the case of T13163. Notably:

Jul 13 2018, 3:50 PM · Search
epriestley merged T13163: Unhandled Exception: Query "ManiphestTaskQuery" failed to return a value from getPagingValueMap() for column "rank". into T13091: Ferret "Relevance" order does not always have all the columns it needs available.
Jul 13 2018, 3:44 PM · Search

Jun 29 2018

aklapper added a comment to T8510: Typeahead project proposals in Maniphest advanced search do not always include exact matches.

In https://phabricator.wikimedia.org/T76732#2803813 ksmith brings up that "In the search bar in the toolbar at the top of the screen, searching for "Maps" brings up Maps as the third option. Searching for "Discovery" brings up Discovery as the second option."

Jun 29 2018, 9:37 AM · Typeahead, Prioritized, Wikimedia, Infrastructure, Search

Mar 19 2018

epriestley added a comment to T12993: Datasource queries with many "JOIN ... LIKE" can have explosive complexity.

See also T13102/PHI442. The MySQL setting optimizer_search_depth=0 may fix the weird explosive complexity here.

Mar 19 2018, 2:55 PM · Infrastructure, Typeahead, Search

Feb 24 2018

epriestley renamed T13091: Ferret "Relevance" order does not always have all the columns it needs available from Exception in search in Differential to Ferret "Relevance" order does not always have all the columns it needs available.
Feb 24 2018, 3:25 PM · Search

Jan 28 2018

epriestley closed T12974: Upgrading: "Ferret" Fulltext Engine as Resolved.

I think everything we know about has been resolved.

Jan 28 2018, 12:03 AM · Installing & Upgrading, Search

Oct 26 2017

epriestley added a revision to T12974: Upgrading: "Ferret" Fulltext Engine: D18736: Convert Ponder Questions to Ferret engine.
Oct 26 2017, 8:41 PM · Installing & Upgrading, Search

Oct 23 2017

epriestley added a revision to T12974: Upgrading: "Ferret" Fulltext Engine: D18728: Clean up virtual "_ft_rank" column for query construction of Ferret objects.
Oct 23 2017, 10:09 PM · Installing & Upgrading, Search

Oct 3 2017

clark.woo.12 added a comment to T12974: Upgrading: "Ferret" Fulltext Engine.

@epriestley
After I upgraded my application to the newest version, which swaps the search engine to "Ferret", I reindexed several times, but I cannot get any results. This is my version information:
phabricator 33756bcf1d70ea5579dff1ab276bbe660d10494c (Tue, Oct 3) (branched from f9110b87abf337dd1e7714d755775e53cffd4db9 on origin)
arcanist 0a7f403333fe9082b39bd007b9d5f9e765c8b9ce (Tue, Oct 3) (branched from c804c5026011f27614a7bbdb2bb32cab590d68ca on origin)
phutil b400c6b04bb247a3e0f1941390bc450f36ac2ccd (Tue, Oct 3) (branched from 9f9c33797a3ebbf1c4dcaa474a0c4e0b32d5396a on origin)
diff 2.8.1 at /usr/bin/diff
git 1.7.1 at /usr/bin/git
hg Not Available
pygmentize 2.0.2 at /usr/bin/pygmentize
svn 1.6.11 at /usr/bin/svn

Oct 3 2017, 10:15 AM · Installing & Upgrading, Search
clark.woo.12 added a comment to T12974: Upgrading: "Ferret" Fulltext Engine.
Oct 3 2017, 10:14 AM · Installing & Upgrading, Search

Sep 30 2017

jcarrillo7 added a comment to T12974: Upgrading: "Ferret" Fulltext Engine.

Looks to be caused by this commit: rPb75a4151c8996c5dfb1c8c14378fe3259666eac2

Sep 30 2017, 11:14 PM · Installing & Upgrading, Search
jcarrillo7 added a comment to T12974: Upgrading: "Ferret" Fulltext Engine.

https://discourse.phabricator-community.org/t/daemons-tasks-crashing-in-a-loop-during-reindex/506

Sep 30 2017, 11:10 PM · Installing & Upgrading, Search
jcarrillo7 added a comment to T12974: Upgrading: "Ferret" Fulltext Engine.

Running the reindex led to a couple of crash-looping tasks on my end:

Sep 30 2017, 11:10 PM · Installing & Upgrading, Search

Sep 26 2017

epriestley added a revision to T12974: Upgrading: "Ferret" Fulltext Engine: D18649: Improve performance of Ferret engine ngram extraction, particularly for large input strings.
Sep 26 2017, 4:21 PM · Installing & Upgrading, Search
epriestley added a revision to T12974: Upgrading: "Ferret" Fulltext Engine: D18648: Improve search stemmer performance for large inputs.
Sep 26 2017, 2:18 AM · Installing & Upgrading, Search
epriestley added a revision to T12974: Upgrading: "Ferret" Fulltext Engine: D18647: Improve Ferret engine indexing performance for large blocks of text.
Sep 26 2017, 2:15 AM · Installing & Upgrading, Search

Sep 22 2017

epriestley closed T12995: When search terms contain CJK characters, default to substring mode as Resolved by committing rP1ac52c09e757: Improve search highlighting for CJK and substring queries.
Sep 22 2017, 6:34 PM · Localization, Search
epriestley added a revision to T12995: When search terms contain CJK characters, default to substring mode: D18635: Improve search highlighting for CJK and substring queries.
Sep 22 2017, 3:15 PM · Localization, Search
epriestley added a revision to T12995: When search terms contain CJK characters, default to substring mode: D18634: Default CJK query terms to "substring" mode, not "term" mode.
Sep 22 2017, 2:25 PM · Localization, Search