User Details
- User Since: Aug 19 2016, 5:35 PM (446 w, 6 d)
- Availability: Available
Aug 1 2017
@ofbeaton Thanks for the tips. Sounds better than the approach we are using to slightly modify the code base.
@ofbeaton Just curious: what's your strategy for applying patches as part of your Phabricator deployment? Do you use a find/replace strategy, or do you apply diffs?
Apr 20 2017
Yes, it would. We haven't fully integrated the build system with Phabricator, but I'll review the docs and see how much more work we need to do.
Engineers are not hitting a prompt when landing; they are disregarding the unit test failure message or not reviewing why the build failed. Since the build failure is on an external system, the engineer needs to view this information separately (the system posts a comment explaining why the build failed). The overall goal of a Herald rule blocking the land is to let the owner communicate that checking in code without running unit tests, or with failing unit tests, is not cool.
Dec 22 2016
Thanks, I made the smtp-encoding change. We're just about ready to update to a vanilla install. If that doesn't fix it and I can provide reproducible steps, I'll update this bug report.
Sorry, we're running a fork, and I know it's a little behind stable. We're working on getting back to a vanilla install to make updates easier.
$ git merge-base HEAD origin/master
33bf2a79def2e03c2248730afa080ead489942be
Oct 12 2016
@jcox I just noticed this line from you:
@jcox, I dove into this a bit, but like you, I didn't come up with much. We have not used bin/remove destroy for any of our repos and there doesn't appear to be any meaningful relationship between the failing tasks that I have found yet. The script that @epriestley kindly provided was able to load any diff I threw at it. The script was helpful in determining the right queries to make to investigate further.
Just checked this morning and many tasks have returned and started to fail again. I'm sure there's a way to reproduce this, but our setup is a bit complex and I'm not sure where to start. @epriestley Could you tell me how to determine what these tasks are even doing? Is there a way to see what action triggered them and why they will fail indefinitely?
Oct 11 2016
Also, like @jcox, the diffs always load in that advanced space-age script like there's no problem. Perhaps these exceptions are aware of our attempts to end them. @epriestley Do you have an idea of something else we could try to see why so many exceptions are occurring?
We're also seeing about 100k of these failures in our logs over the past few weeks. I clear out all the tasks and they quickly return. Have you found the root cause?
Sep 1 2016
Just some quick googling revealed this issue with a prior version of APCu.
I turned it back on and ran additional load tests but Apache was able to handle everything. To be honest, I was surprised how well the site held up with APCu turned off. The numbers are about the same.
I've been running load tests for the past 2-3 hours and for some reason, I can't bring it down anymore. You can probably mark this bug as 'unable to reproduce'. I'll continue looking into the issue and see if there's a definitive way to reproduce.
I've started the load tests without APC. One odd item I didn't notice before, which may or may not be related, is that Phabricator is having a hard time knowing that daemons are running (e.g. you have 2 unresolved setup issues instead of just 1 because APC is turned off). I checked the daemon logs and see they are flooded with entries like this, one of which occurs a lot. I can't track down the source because I don't know what address it's trying to reach (is this a repository pull or a Herald rule?). Again, this may be completely unrelated and a different issue altogether.
Here's the output. I removed "events" from some of them to improve formatting.
I've added a simple JavaScript file on the installation I grabbed this report from. As part of the troubleshooting process, I set up a vanilla installation of Phabricator on a separate machine with no modifications (the version referenced in the bug report) and was able to reproduce the problem.
Aug 22 2016
To build on what was originally suggested, you can stop all your repos from publishing events by running the following, which lists all repos and puts them back into importing mode so they don't publish events:
Aug 20 2016
Yes, this seems to be exactly what the other issue was referring to. We've just recently upgraded (the previous codebase was 2+ years old) and massive amounts of commit email are getting sent out.