20after4 (Mukunda Modell)
Release Engineer @ Wikimedia Foundation

Projects

Today

  • Clear sailing ahead.

Tomorrow

  • Clear sailing ahead.

Thursday

  • Clear sailing ahead.

User Details

User Since
Nov 28 2011, 9:35 AM (286 w, 1 d)
Availability
Available

Recent Activity

Yesterday

20after4 added a comment to T8646: Provide more context for search results, particularly wiki documents.

FWIW I partially implemented this in Wikimedia's fork, and I did so in a reusable way. I'd like to eventually upstream it but I'm not sure that my approach is desired upstream. I'll give it another shot though.

Mon, May 22, 3:51 AM · Design & Planning, Search, Restricted Project
20after4 abandoned D13581: Support custom fields in diffusion commit property views..

not upstream-able.

Mon, May 22, 3:46 AM
20after4 added a comment to D17615: Don't apply offset to elasticsearch query.

Yeah I think we are good now.

Mon, May 22, 3:46 AM
20after4 abandoned D17615: Don't apply offset to elasticsearch query.
Mon, May 22, 3:46 AM
20after4 awarded T12733: (2017 Week 20) Inline Comments Errata / Feedback a Like token.
Mon, May 22, 2:25 AM · Inline Comments, Installing & Upgrading, Differential

Mon, May 8

20after4 added a comment to T12681: Upcoming Support Product / End of Paid Prioritization.

phabricator trading cards ftw.

Mon, May 8, 6:21 PM · Guides, Phacility
20after4 added a comment to T12681: Upcoming Support Product / End of Paid Prioritization.

I think that @epriestley just made capitalism and socialism obsolete by re-imagining everything in terms of Mana points. You sir, have won the internet.

Mon, May 8, 5:52 PM · Guides, Phacility
20after4 awarded T8236: `arc weld` should do something a Pirate Logo token.
Mon, May 8, 5:46 PM · Contributor Onboarding, Arcanist
20after4 added a comment to T8236: `arc weld` should do something.

This should clearly be priority: Unbreak now!

Mon, May 8, 5:45 PM · Contributor Onboarding, Arcanist

Tue, May 2

20after4 awarded T12003: Explain to users how fulltext queries are parsed and executed a Love token.
Tue, May 2, 9:27 PM · Search
20after4 abandoned D17608: Add highlighting support to Elasticsearch fulltext engine.

@epriestley: I'll submit a new revision that expands on the resultset class as I mentioned above. Of course this is without any expectations regarding when you'll have time to look at it :)

Tue, May 2, 9:26 PM

Wed, Apr 26

20after4 added a comment to T12314: Support formal task subtypes (like "bug" vs "feature").

Changing the type is going to run into the issue of what to do about the fields which differ between the two types. Fields which are present in the old type will continue to be displayed unless you clear them when changing types.

Wed, Apr 26, 3:47 PM · EditEngine, Maniphest, Custom Fields, Feature Request

Mon, Apr 24

20after4 added a comment to D17608: Add highlighting support to Elasticsearch fulltext engine.

@epriestley: Given that this is an elastic-only feature and not really wanted upstream, and given that you have implemented result sets as objects in rP3245e74f16bb: Show users how fulltext search queries are parsed and executed; don't query…, I think I should abandon this and submit a new revision with just the PhabricatorSearchResultEngineExtension infrastructure.

Mon, Apr 24, 10:05 PM

Apr 8 2017

20after4 accepted rP699ab153e375: (stable) Promote 2017 Week 14.
Apr 8 2017, 6:16 PM

Apr 7 2017

20after4 added a comment to D17608: Add highlighting support to Elasticsearch fulltext engine.

I think this approach is at least somewhere in the realm of reasonable, yeah.

The extensions should probably get passed a list of all results at once, not individual results, so they can bulk-load additional data (for example, hierarchies for wiki documents or thumbnails for files or whatever) instead of needing to do 100 queries to show 100 items.

So maybe the API ends up looking like $extension->messAroundWithTheseViewsBeforeWeShowThemToTheUserIfYouWant($items), but general flow seems reasonable to me.

Apr 7 2017, 9:55 AM

Apr 5 2017

20after4 planned changes to D17608: Add highlighting support to Elasticsearch fulltext engine.

Warning: This code is still a bit messy, I will clean things up a bit more in a future diff.

Apr 5 2017, 7:03 AM
20after4 updated the diff for D17608: Add highlighting support to Elasticsearch fulltext engine.

Significant refactor to use a PhabricatorFulltextResult object to represent search
result hits. Results views are made extensible using the EngineExtension pattern:

Apr 5 2017, 6:57 AM

Apr 4 2017

20after4 added a comment to D17615: Don't apply offset to elasticsearch query.

Just to see what would happen, I tried returning 100 dummy results + the real results. That didn't seem to satisfy the pager, so perhaps I'm wrong about what's actually happening. I spent quite a bit of time tracing through PhabricatorPolicyAwareQuery and still didn't ascertain exactly what is going on, however, it seems like it's skipping the results in PhabricatorPolicyAwareQuery::execute() right around line 211

Apr 4 2017, 5:44 PM
20after4 added a comment to D17615: Don't apply offset to elasticsearch query.

With the HEAD behavior we return 100 results for offset = 700, limit = 100 but with the proposed behavior we return 800, which is a technical argument for fixing this somewhere above the engine in the stack.

Apr 4 2017, 5:35 PM
20after4 added a comment to D17615: Don't apply offset to elasticsearch query.

The problem is that we do pagination at a higher layer and the cursor skips 100 results when the offset is set to 100. So if we also apply an offset to elastic, then we skip 100 results in two places, thus skipping to the 200th result.

Apr 4 2017, 5:33 PM
20after4 added a task to D17615: Don't apply offset to elasticsearch query: T12450: New Search Configuration Errata.
Apr 4 2017, 5:18 PM
20after4 added a revision to T12450: New Search Configuration Errata: D17615: Don't apply offset to elasticsearch query.
Apr 4 2017, 5:18 PM · Search
20after4 updated the diff for D17615: Don't apply offset to elasticsearch query.

track upstream branch so arcanist can push to staging.

Apr 4 2017, 5:17 PM
20after4 created D17615: Don't apply offset to elasticsearch query.
Apr 4 2017, 5:15 PM
20after4 added a revision to T8285: Global advanced search only shows max. 100 results; pagination link ("Next") shown but empty: D17615: Don't apply offset to elasticsearch query.
Apr 4 2017, 5:15 PM · Search
20after4 added 1 mock(s) for T8646: Provide more context for search results, particularly wiki documents: M1474: Show snippets in search results....
Apr 4 2017, 4:40 PM · Design & Planning, Search, Restricted Project
20after4 added 1 task(s) for M1474: Show snippets in search results...: T8646: Provide more context for search results, particularly wiki documents.
Apr 4 2017, 4:40 PM · Search
20after4 created M1474: Show snippets in search results....
Apr 4 2017, 4:39 PM · Search
20after4 added a comment to D17608: Add highlighting support to Elasticsearch fulltext engine.

I lean toward having this return a SearchResult kind of object which has methods like getHandle() and getBodySnippet() and whatever. I think we need this for T8646 anyway. It doesn't make sense to put stuff like "breadcrumb hierarchy" or "thumbnail document preview" or whatever on handles, since there's like a 99% chance that only global search results will ever use them.

Apr 4 2017, 4:15 PM
20after4 added a comment to D17608: Add highlighting support to Elasticsearch fulltext engine.

Finally, it overrides subtitles on handles which really have them. Today I think this is only Users, but could be more types of objects in the future.

Apr 4 2017, 3:59 PM
20after4 added a comment to D17608: Add highlighting support to Elasticsearch fulltext engine.

I'm not really comfortable trusting Elasticsearch to return a safe blob of HTML here: it seems like we're putting a lot of trust in it by doing this (willingly applying PhutilSafeHTML() to anything it returns) but not really getting very much benefit (basically just some highlighting, which we already have code for elsewhere) relative to the level of trust involved.

In theory there's almost no possible way Elasticsearch could get things wrong since the task is so simple, and I don't think this is likely to ever lead to a security issue, it just creates a very large amount of new attack surface which we don't really need to have.

Apr 4 2017, 3:54 PM
20after4 created D17608: Add highlighting support to Elasticsearch fulltext engine.
Apr 4 2017, 2:10 AM
20after4 added a revision to T12500: Render highlighted search result body snippets in global search results views: D17608: Add highlighting support to Elasticsearch fulltext engine.
Apr 4 2017, 2:10 AM · Elasticsearch, Search, Feature Request

Apr 3 2017

20after4 created T12500: Render highlighted search result body snippets in global search results views.
Apr 3 2017, 5:41 PM · Elasticsearch, Search, Feature Request
20after4 added a comment to T7860: When users search for "#x y", treat "#x" as a project hashtag.

I implemented this recently but then pulled it out because ... project slug lookup is a gigantic mess right now.

Apr 3 2017, 3:16 PM · Search
20after4 awarded D17596: Remove FIELD_KEYWORDS, index project slugs as body content a Like token.
Apr 3 2017, 2:50 PM
20after4 added a comment to T8238: Formally support side-band change handoff in external repositories.

@dreadlord2203 indeed, hiding those tags would be useful. I found one problem with using a separate repository for change handoff, and that is access control. If you maintain several repositories with different access controls, then you would have to duplicate each of them and maintain a second set of access controls for the corresponding staging repositories. This is a lot of extra work and it would be easy to make a mistake and leak the staging repo to people who shouldn't be able to see it.

Apr 3 2017, 2:48 PM · Restricted Project, Restricted Project, Diffusion, Harbormaster
20after4 awarded T12493: Upgrading: Fulltext Search Services a Love token.
Apr 3 2017, 2:44 PM · Elasticsearch, Search, Guides

Apr 2 2017

20after4 added a comment to T12450: New Search Configuration Errata.

Maybe let bin/search commands target a specific service.

Apr 2 2017, 8:56 PM · Search
20after4 added a comment to T12450: New Search Configuration Errata.
It would be nice to build a practical test suite instead, where we put specific documents into the index and then search for them.
Apr 2 2017, 8:49 PM · Search
20after4 accepted D17597: Count and report skipped documents from "bin/search index".

It's definitely not obvious that objects are skipped based on the object version rather than what's in the index. So yeah this is helpful.

Apr 2 2017, 8:44 PM
20after4 accepted D17598: When "cluster.search" changes, don't trust the old index versions.

Good idea. Definitely much easier than storing per-index version numbers and trying to keep those in sync.

Apr 2 2017, 8:41 PM
20after4 accepted D17599: After a fulltext write to a particular service fails, keep trying writes to other services.

Yes this seems reasonable.

Apr 2 2017, 8:39 PM
20after4 added a comment to D17600: Remove "url" from Elasticsearch index.

plustwo

Apr 2 2017, 8:37 PM
20after4 added a comment to D17602: Adjust and wordsmith Search documentation.

Really try to avoid suggesting anyone configure Elasticsearch ever for any reason.

Apr 2 2017, 8:09 PM
20after4 accepted D17602: Adjust and wordsmith Search documentation.

nice! Big improvement over my initial document.

Apr 2 2017, 8:07 PM

Mar 30 2017

20after4 updated the task description for T12450: New Search Configuration Errata.
Mar 30 2017, 6:11 PM · Search
20after4 added a comment to T12450: New Search Configuration Errata.

So I ran into one problem when deploying the latest code to WMF production. We now throw a setup error if we have a cluster with no readable hosts. It actually makes sense to have a cluster that is write-only so that setup error is bogus.

Mar 30 2017, 6:10 PM · Search
20after4 added a task to D17580: Set content-type to application/json: T12450: New Search Configuration Errata.
Mar 30 2017, 6:09 PM
20after4 added a revision to T12450: New Search Configuration Errata: D17580: Set content-type to application/json.
Mar 30 2017, 6:09 PM · Search
20after4 added a revision to T12450: New Search Configuration Errata: D17581: Make sure writes go to the right cluster.
Mar 30 2017, 6:09 PM · Search
20after4 added a task to D17581: Make sure writes go to the right cluster: T12450: New Search Configuration Errata.
Mar 30 2017, 6:09 PM
20after4 committed rPcb1d90465447: Make sure writes go to the right cluster (authored by 20after4).
Make sure writes go to the right cluster
Mar 30 2017, 6:08 PM
20after4 closed D17581: Make sure writes go to the right cluster by committing rPcb1d90465447: Make sure writes go to the right cluster.
Mar 30 2017, 6:08 PM
20after4 committed rP67a1c4047647: Set content-type to application/json (authored by 20after4).
Set content-type to application/json
Mar 30 2017, 6:07 PM
20after4 closed D17580: Set content-type to application/json by committing rP67a1c4047647: Set content-type to application/json.
Mar 30 2017, 6:07 PM
20after4 added a comment to D17575: Provide some guidance about elasticsearch in cluster docs.

I figured out that one component of the WMF scaling issue was caused by the very pathological case of searching for a word that appears in very many documents, exacerbated by lots simultaneous queries fired off from users repeatedly searching from the typeahead in the related tasks editor. I agree that we still don't know the exact cause of the 100x slowdown.

Mar 30 2017, 6:05 PM
20after4 added a comment to T12314: Support formal task subtypes (like "bug" vs "feature").

@cos: probably should not have a default value.

Mar 30 2017, 5:21 PM · EditEngine, Maniphest, Custom Fields, Feature Request
Ruthvika awarded rP654f0f6043f8: Make messages translatable and more sensible. a Like token.
Mar 30 2017, 9:56 AM
20after4 created D17581: Make sure writes go to the right cluster.
Mar 30 2017, 2:31 AM

Mar 29 2017

20after4 updated the diff for D17580: Set content-type to application/json.

Don't use $future->write, that doesn't work in all cases

Mar 29 2017, 11:48 PM
20after4 created D17580: Set content-type to application/json.
Mar 29 2017, 9:48 PM

Mar 28 2017

20after4 committed rP654f0f6043f8: Make messages translatable and more sensible. (authored by 20after4).
Make messages translatable and more sensible.
Mar 28 2017, 11:17 PM
20after4 closed D17578: Make messages translatable and more sensible. by committing rP654f0f6043f8: Make messages translatable and more sensible..
Mar 28 2017, 11:17 PM
20after4 created D17578: Make messages translatable and more sensible..
Mar 28 2017, 11:14 PM
20after4 accepted rP8879118b696f: Fix a mid-air collision around SearchService roles.
Mar 28 2017, 10:13 PM
20after4 added a task to D17575: Provide some guidance about elasticsearch in cluster docs: T12450: New Search Configuration Errata.
Mar 28 2017, 10:07 PM
20after4 added a revision to T12450: New Search Configuration Errata: D17575: Provide some guidance about elasticsearch in cluster docs.
Mar 28 2017, 10:07 PM · Search
20after4 created D17575: Provide some guidance about elasticsearch in cluster docs.
Mar 28 2017, 10:06 PM
20after4 accepted D17574: Re-run config validation from `bin/search`.
Mar 28 2017, 9:46 PM
20after4 committed rP699228c73b74: Address some New Search Configuration Errata (authored by 20after4).
Address some New Search Configuration Errata
Mar 28 2017, 8:19 PM
20after4 added a commit to T12450: New Search Configuration Errata: rP699228c73b74: Address some New Search Configuration Errata.
Mar 28 2017, 8:19 PM · Search
20after4 closed D17564: Address some New Search Configuration Errata by committing rP699228c73b74: Address some New Search Configuration Errata.
Mar 28 2017, 8:19 PM
20after4 added a comment to D17564: Address some New Search Configuration Errata.

Just as a general workflow suggestion, I'd encourage you to do this as a bunch of small changes instead of one big "fix everything" change

Mar 28 2017, 8:16 PM
20after4 added a comment to D17572: Make `bin/search init` messaging a little more consistent.

Seems legit.

Mar 28 2017, 8:00 PM
20after4 accepted D17572: Make `bin/search init` messaging a little more consistent.
Mar 28 2017, 8:00 PM
20after4 accepted D17573: Remove PhabricatorSearchEngineTestCase.
Mar 28 2017, 7:59 PM
20after4 accepted D17571: Fix isReadable() and isWritable() in SearchService.
Mar 28 2017, 7:58 PM
20after4 committed rP9e2f263bb49c: Add repositories to fulltext search index. (authored by 20after4).
Add repositories to fulltext search index.
Mar 28 2017, 7:58 AM
20after4 closed D17300: Add repositories to fulltext search index. by committing rP9e2f263bb49c: Add repositories to fulltext search index..
Mar 28 2017, 7:58 AM · Diffusion, Search
20after4 updated the diff for D17300: Add repositories to fulltext search index..

push to staging for harbormaster

Mar 28 2017, 7:56 AM · Diffusion, Search
20after4 updated the diff for D17300: Add repositories to fulltext search index..

fix '\n'

Mar 28 2017, 7:55 AM · Diffusion, Search
20after4 updated the summary of D17564: Address some New Search Configuration Errata.
Mar 28 2017, 7:41 AM
20after4 updated the diff for D17564: Address some New Search Configuration Errata.

Better formatting of setup warning messages.

Mar 28 2017, 7:41 AM
20after4 requested review of D17564: Address some New Search Configuration Errata.

this fixes the stemmer and tokenizer to do a better job of matching words.separated.by.punctuation as well as other issues found by @epriestley.

Mar 28 2017, 1:42 AM
20after4 updated the summary of D17564: Address some New Search Configuration Errata.
Mar 28 2017, 1:40 AM
20after4 updated the task description for T12450: New Search Configuration Errata.
Mar 28 2017, 1:38 AM · Search
20after4 added a comment to T12450: New Search Configuration Errata.

I've updated D17564: Address some New Search Configuration Errata to address the tokenization and word stemming issues.

Mar 28 2017, 1:35 AM · Search
20after4 updated the diff for D17564: Address some New Search Configuration Errata.
  • Fixed the stemmer. user matches users and vise-versa.
  • Added a different tokenizer so that this.is.a.test tokenizes to the following:
  • this.is.a.test
    • this
    • is
    • a
    • test
Mar 28 2017, 1:34 AM

Mar 27 2017

20after4 updated the diff for D17564: Address some New Search Configuration Errata.

trying once more...

Mar 27 2017, 2:49 PM
20after4 updated the diff for D17564: Address some New Search Configuration Errata.

Try to make harbormaster happy by setting repository.callsign globally in ~/.arcrc

Mar 27 2017, 2:48 PM
20after4 created D17564: Address some New Search Configuration Errata.
Mar 27 2017, 2:41 PM
20after4 added a revision to T12450: New Search Configuration Errata: D17564: Address some New Search Configuration Errata.
Mar 27 2017, 2:41 PM · Search
20after4 added a comment to T10640: Allow application queries to be promoted as global search modes.

Maniphest advanced search is somewhat buried, indeed. I think one easy solution to this would be to add "Task search" to the main phab menu (using the new custom menus feature)... In fact, I think I will do that now at https://phabricator.wikimedia.org

Mar 27 2017, 10:21 AM · Search, Feature Request
20after4 awarded D17563: Cleaner fullscreen / preview states for Remarkup bar a Love token.
Mar 27 2017, 10:14 AM
20after4 added a comment to T12450: New Search Configuration Errata.

  • Searching for f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q produces nonsenical results (many results, when I would expect no results: the results do not contain that sequence of letters in order).
  • Searching or user fails to find task Grant users tokens when a mention is created, suggesting that stemming is not working.
  • Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.
  • Searching for maniphest fails to find task maniphest.query elephant, suggesting that tokenization is ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).
Mar 27 2017, 9:45 AM · Search
20after4 added a comment to T12450: New Search Configuration Errata.

f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q returns the same results as
f a c t o r y s u r p l u s z z q q z z q q so it appears to be treating those as individual single-letter tokens. strange.

Mar 27 2017, 9:37 AM · Search
20after4 added a comment to T12443: Applying fulltext limits first causes missing results.

I think it would make a lot of sense to construct the two queries separately (and in parallel) with a short timeout, then handle the timeout gracefully allowing the user to refine their query further. This would avoid the denial of service situation which happened to Wikimedia more than once due to users repeatedly executing really expensive searches until mysql fell over from the load.

Mar 27 2017, 8:43 AM · Restricted Project, Search, Bug Report

Mar 26 2017

20after4 added a comment to T12450: New Search Configuration Errata.

I ran into a lot of confusion because the versioned object indexes are not namespaced per-service. Basically, if you insert version 95 of a document into Elastic, the indexer thinks that version 95 doesn't need to go into MySQL, even though it does. So when you run bin/search index ..., you may get only a subset of the updates you actually need. The object index versions need to change to become engine-aware so they are stored per-service, not globally, and/or the whole mechanism needs to include a hash of cluster.search or just be turned off. Until this is fixed, it can be worked around with using --force everywhere.

bin/search index might reasonably provide summary output about this ("392 documents were not indexed because they haven't changed, use --force to update them.").

Mar 26 2017, 11:08 PM · Search
20after4 added a comment to T12450: New Search Configuration Errata.

  • Searching for f*a*c*t*o*r*y*s*u*r*p*l*u*s*z*z*q*q*z*z*q*q produces nonsenical results (many results, when I would expect no results: the results do not contain that sequence of letters in order).
  • Searching or user fails to find task Grant users tokens when a mention is created, suggesting that stemming is not working.
  • Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.
  • Searching for maniphest fails to find task maniphest.query elephant, suggesting that tokenization is ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).
  • Searching for users -blue returns a huge number of results: significantly more than users. Expected behavior: fewer results, omitting those results matching blue.
  • Searching for users blue returns more results than users or blue. Expected behavior: fewer results, because only results which match "users" AND "blue" are returned. The result set includes completely irrelevant results.
Mar 26 2017, 10:59 PM · Search