This query is always driven by MySQL whether ElasticSearch is configured or not (and never served by any FULLTEXT engine), so I would not expect moving to ElasticSearch to have any effect here.
Sep 1 2017
20after4 added a comment to rP72cb3d3c8490: Limit the damage that degenerate project name typeahead queries can cause.
FWIW we ran into this problem at wikimedia. In fact, I think it was the thing that killed innodb enough times to motivate us to move to elasticsearch. Elastic handles whatever you throw at it.
20after4 awarded D18513: Separate fulltext engine extensions into "enrich" and "index" phases a Like token.
Aug 30 2017
20after4 added a comment to T12965: When no "master" database is configured, the ElasticSearch setup check can fatal.
Whoops this is probably my bad :-/
Aug 10 2017
Jul 18 2017
20after4 added a comment to T12819: InnoDB FULLTEXT appears to fail catastrophically once it reaches a moderate size.
For phabricator.wikimedia.org, we decided to go with elasticsearch, partially because we already have a massive elasticsearch cluster and a lot of institutional elasticsearch knowledge / experience. My opinion is that we made the right choice. I believe this opinion is shared by most of the folks who use our phabricator on a daily basis. I've seen zero complaints about search since we made the switch, which is a huge improvement from what I saw with mysql FTS. Conclusion: Elasticsearch seems to perform well and the results are generally better (obviously this is subjective, but like I said, no complaints from users)
looks good to me and confirmed to work by testing locally on my test instance running upstream master branch.
Jul 13 2017
In T12896#228825, @jmeador wrote: We "solved" the problem with a small python daemon that pings repository_statusmessage.epoch every minute and calculates the delta. When these deltas surpass Phabricator's max delta (21,600 sec) we page the oncall. For repositories that are mission critical, we use a smaller value. It obviously isn't a perfect solution, but it gets the job done and it's been extremely stable.
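The delta check described in that quote could look roughly like the sketch below. This is not @jmeador's daemon: the function name, the dict inputs, and the per-repository override mechanism are assumptions made for illustration, and reading repository_statusmessage.epoch from the database is left out entirely.

```python
import time

# Phabricator's max delta as quoted above (21,600 sec).
MAX_DELTA_SECONDS = 21_600


def stale_repositories(epochs, now=None, overrides=None,
                       default_max=MAX_DELTA_SECONDS):
    """Return {repo: delta} for repositories whose status epoch is too old.

    epochs:    dict of repo name -> last status epoch (hypothetical input,
               assumed to come from repository_statusmessage.epoch)
    overrides: dict of repo name -> smaller threshold for mission-critical
               repositories (hypothetical)
    """
    now = time.time() if now is None else now
    overrides = overrides or {}
    stale = {}
    for repo, epoch in epochs.items():
        delta = now - epoch
        # Use the per-repo threshold when one is set, else the default.
        if delta > overrides.get(repo, default_max):
            stale[repo] = delta
    return stale
```

A daemon would call this once a minute and page the oncall whenever the returned dict is non-empty.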
20after4 added a comment to T12909: Calendar ignores email preferences if a user is invited via a project.
I've seen complaints about this on phabricator.wikimedia.org as well, so I can confirm that this is an issue.
Jun 27 2017
doh! nice. I obviously need more coffee.
I know just maintaining the hooks is more work but certainly less than what's involved in maintaining the graph code.
Maybe provide some extension hook-points in diffusion so that the community can maintain the commit graph visualization as an extension?
Jun 20 2017
20after4 awarded T12855: In PHP7, "Throwable" and "Error" are exciting new exception classes a Baby Tequila token.
Jun 18 2017
Jun 16 2017
That does sound powerful. +1
20after4 awarded T11439: Retrieve Diff PHID via phid.lookup a Love token.
Jun 14 2017
I have spent a little time poking at code, trying to figure out how to build a simple template engine into the description field, so that you can reference hidden custom fields by {field-identifier} or similar from within the remarkup. I resigned myself to building something just like publish.php, where the descriptions are maintained via conduit. However, I still long for it to be built in.
It would be really nice to default newly added fields to hidden. Going through 20+ forms to hide the fields is tedious.
Jun 13 2017
Indeed, I tested this and removing the setDisplayName retains the correct behavior (show the monogram AND let you search by monogram)
20after4 committed rPe1850b3c4e4e: Allow dashboard panels to be found by monogram (authored by 20after4).
Allow dashboard panels to be found by monogram
This seems like it was probably just an oversight and it's a trivial change so I'm upstreaming. Already in queue to be deployed downstream in Wikimedia's install.
Jun 10 2017
20after4 added a comment to T7593: Allow administrators to disable files to prevent "l33t w4r3z" abuse cases.
FWIW we have seen several users attempting to distribute l33t w4r3z via Wikimedia's instance of Phabricator. I had to set file upload limits to < 8MB in order to prevent chunked file storage.
This sounds REALLY useful. Relatedly, it would be useful if I could auto-generate a task for each recurrence of an event.
20after4 added a comment to T12819: InnoDB FULLTEXT appears to fail catastrophically once it reaches a moderate size.
:(
Jun 7 2017
In T12804#226348, @epriestley wrote: You can use, for example, -C3 with git grep to show 3 lines of Context around each match.
About 95% of the time when I'm searching for content it's because I want to edit a subset of the callsites. I do this with git grep some_function | maybe_more_grep | give, which opens all matches in my editor. This would take me about 100 years from the web UI for a nontrivial number of results. Do you have some other use case for searching for content?
In T12804#226366, @epriestley wrote: When I view a repository, it would be incredibly helpful if the files I most recently touched were at the top of the UI.
What I would really love to see is global filename search. Currently you can search for a filename within one repository but you have to know which repo to look in first.
We have thousands of repositories and it's hard to differentiate in a list which repository I need.
May 22 2017
20after4 added a comment to T8646: Provide more context for search results, particularly wiki documents.
FWIW I partially implemented this in Wikimedia's fork, and I did so in a reusable way. I'd like to eventually upstream it but I'm not sure that my approach is desired upstream. I'll give it another shot though.
not upstream-able.
Yeah I think we are good now.
20after4 awarded T12733: (2017 Week 20) Inline Comments Errata / Feedback a Like token.
May 8 2017
phabricator trading cards ftw.
I think that @epriestley just made capitalism and socialism obsolete by re-imagining everything in terms of Mana points. You sir, have won the internet.
20after4 awarded T8236: `arc weld` should do something a Pirate Logo token.
This should clearly be priority: Unbreak now!
May 2 2017
20after4 awarded T12003: Explain to users how fulltext queries are parsed and executed a Love token.
@epriestley: I'll submit a new revision that expands on the resultset class as I mentioned above. Of course this is without any expectations regarding when you'll have time to look at it :)
Apr 26 2017
Changing the type is going to run into the issue of what to do about the fields which differ between the two types. Fields which are present in the old type will continue to be displayed unless you clear them when changing types.
Apr 24 2017
@epriestley: Given that this is an elastic-only feature and not really wanted upstream, and given that you have implemented result sets as objects in rP3245e74f16bb: Show users how fulltext search queries are parsed and executed; don't query…, I think I should abandon this and submit a new revision with just the PhabricatorSearchResultEngineExtension infrastructure.
Apr 8 2017
Apr 7 2017
In D17608#211910, @epriestley wrote: I think this approach is at least somewhere in the realm of reasonable, yeah.
The extensions should probably get passed a list of all results at once, not individual results, so they can bulk-load additional data (for example, hierarchies for wiki documents or thumbnails for files or whatever) instead of needing to do 100 queries to show 100 items.
So maybe the API ends up looking like $extension->messAroundWithTheseViewsBeforeWeShowThemToTheUserIfYouWant($items), but general flow seems reasonable to me.
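As a rough illustration of the "pass all results at once" idea, here is a minimal sketch in Python rather than Phabricator's PHP. The class name, the will_render_items method, and the load_thumbnails callback are all hypothetical; the point is only that one bulk load serves the whole page of results instead of one query per item.

```python
class ThumbnailExtension:
    """Hypothetical result extension that decorates items in bulk."""

    def __init__(self, load_thumbnails):
        # load_thumbnails: callable taking a list of PHIDs and returning
        # a dict of PHID -> thumbnail (an assumed bulk loader).
        self.load_thumbnails = load_thumbnails

    def will_render_items(self, items):
        phids = [item["phid"] for item in items]
        # One bulk query for every item on the page.
        thumbnails = self.load_thumbnails(phids)
        for item in items:
            item["thumbnail"] = thumbnails.get(item["phid"])
        return items
```

With 100 items on a page, the loader is called once with 100 PHIDs rather than 100 times with one PHID each.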
Apr 5 2017
Warning: This code is still a bit messy, I will clean things up a bit more in a future diff.
Significant refactor to use a PhabricatorFulltextResult object to represent search
result hits. Results views are made extensible using the EngineExtension pattern:
Apr 4 2017
Just to see what would happen, I tried returning 100 dummy results + the real results. That didn't seem to satisfy the pager, so perhaps I'm wrong about what's actually happening. I spent quite a bit of time tracing through PhabricatorPolicyAwareQuery and still didn't ascertain exactly what is going on. However, it seems like it's skipping the results in PhabricatorPolicyAwareQuery::execute() right around line 211.
In D17615#211735, @epriestley wrote: With the HEAD behavior we return 100 results for offset = 700, limit = 100 but with the proposed behavior we return 800, which is a technical argument for fixing this somewhere above the engine in the stack.
The problem is that we do pagination at a higher layer and the cursor skips 100 results when the offset is set to 100. So if we also apply an offset to elastic, then we skip 100 results in two places, thus skipping to the 200th result.
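The double skip can be reproduced with a toy model. This is not Phabricator code: both functions are hypothetical and simply assume that the higher layer asks the engine for offset + limit hits and then skips the first offset of them itself.

```python
def page_double_offset(hits, offset, limit):
    """Engine applies the offset, then the higher layer skips it again."""
    # The engine skips `offset` hits and returns offset + limit of them.
    fetched = hits[offset:offset + offset + limit]
    # The pager skips another `offset` hits from what the engine returned.
    return fetched[offset:offset + limit]


def page_single_offset(hits, offset, limit):
    """Engine returns the first offset + limit hits; only the pager skips."""
    fetched = hits[:offset + limit]
    return fetched[offset:offset + limit]
```

With hits numbered from 0, offset = 100 and limit = 100, the first function returns a page starting at hit 200, while the second returns a page starting at hit 100. In this model the engine hands back 800 hits for offset = 700, limit = 100 when it does not apply the offset itself.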
20after4 added a revision to T12450: New Search Configuration Errata: D17615: Don't apply offset to elasticsearch query.
track upstream branch so arcanist can push to staging.
In D17608#211696, @epriestley wrote: I lean toward having this return a SearchResult kind of object which has methods like getHandle() and getBodySnippet() and whatever. I think we need this for T8646 anyway. It doesn't make sense to put stuff like "breadcrumb hierarchy" or "thumbnail document preview" or whatever on handles, since there's like a 99% chance that only global search results will ever use them.
In D17608#211656, @epriestley wrote: Finally, it overrides subtitles on handles which really have them. Today I think this is only Users, but could be more types of objects in the future.
In D17608#211656, @epriestley wrote: I'm not really comfortable trusting Elasticsearch to return a safe blob of HTML here: it seems like we're putting a lot of trust in it by doing this (willingly applying PhutilSafeHTML() to anything it returns) but not really getting very much benefit (basically just some highlighting, which we already have code for elsewhere) relative to the level of trust involved.
In theory there's almost no possible way Elasticsearch could get things wrong since the task is so simple, and I don't think this is likely to ever lead to a security issue, it just creates a very large amount of new attack surface which we don't really need to have.
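One way to avoid trusting engine-supplied markup is to take plain text from the engine and do the highlighting locally, escaping everything on the way out. The sketch below shows that general technique in Python; it is not the highlighting code Phabricator already has, and highlight_snippet is a hypothetical name.

```python
import html
import re


def highlight_snippet(raw_text, terms):
    """Escape untrusted text and wrap matched terms in <strong> tags."""
    terms = [t for t in terms if t]
    if not terms:
        return html.escape(raw_text)
    pattern = re.compile("|".join(re.escape(t) for t in terms),
                         re.IGNORECASE)
    parts = []
    pos = 0
    # Match against the raw text, then escape every piece separately so
    # the only markup in the output is the tags added here.
    for match in pattern.finditer(raw_text):
        parts.append(html.escape(raw_text[pos:match.start()]))
        parts.append("<strong>" + html.escape(match.group(0)) + "</strong>")
        pos = match.end()
    parts.append(html.escape(raw_text[pos:]))
    return "".join(parts)
```

Any markup that arrives in the snippet text is rendered inert, and the search backend never gets to decide which HTML reaches the page.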
Apr 3 2017
I implemented this recently but then pulled it out because ... project slug lookup is a gigantic mess right now.
20after4 awarded D17596: Remove FIELD_KEYWORDS, index project slugs as body content a Like token.
20after4 added a comment to T8238: Formally support side-band change handoff in external repositories.
@dreadlord2203 indeed, hiding those tags would be useful. I found one problem with using a separate repository for change handoff, and that is access control. If you maintain several repositories with different access controls, then you would have to duplicate each of them and maintain a second set of access controls for the corresponding staging repositories. This is a lot of extra work and it would be easy to make a mistake and leak the staging repo to people who shouldn't be able to see it.
20after4 awarded T12493: Upgrading: Fulltext Search Services a Love token.
Apr 2 2017
Maybe let bin/search commands target a specific service.
It would be nice to build a practical test suite instead, where we put specific documents into the index and then search for them.
It's definitely not obvious that objects are skipped based on the object version rather than what's in the index. So yeah this is helpful.
Good idea. Definitely much easier than storing per-index version numbers and trying to keep those in sync.
20after4 accepted D17599: After a fulltext write to a particular service fails, keep trying writes to other services.
Yes this seems reasonable.
Really try to avoid suggesting anyone configure Elasticsearch ever for any reason.
nice! Big improvement over my initial document.
Mar 30 2017
So I ran into one problem when deploying the latest code to WMF production. We now throw a setup error if we have a cluster with no readable hosts. It actually makes sense to have a cluster that is write-only so that setup error is bogus.
20after4 added a revision to T12450: New Search Configuration Errata: D17580: Set content-type to application/json.
20after4 added a revision to T12450: New Search Configuration Errata: D17581: Make sure writes go to the right cluster.
Make sure writes go to the right cluster
Set content-type to application/json
I figured out that one component of the WMF scaling issue was caused by the very pathological case of searching for a word that appears in very many documents, exacerbated by lots of simultaneous queries fired off from users repeatedly searching from the typeahead in the related tasks editor. I agree that we still don't know the exact cause of the 100x slowdown.
@cos: probably should not have a default value.
Ruthvika awarded rP654f0f6043f8: Make messages translatable and more sensible. a Like token.
Mar 29 2017
Don't use $future->write, that doesn't work in all cases
Mar 28 2017
20after4 committed rP654f0f6043f8: Make messages translatable and more sensible. (authored by 20after4).
Make messages translatable and more sensible.