To make a new comment on a Differential revision, it takes a long time (1s) for the UI to respond with the comment box.
Description
Revisions and Commits
rP Phabricator | |||
D13956 | rP4b1815d6cca2 Add a "Startup" to DarkConsole |
Status | Assigned | Task | ||
---|---|---|---|---|
Open | None | T8444 Increase typeahead speed | ||
Resolved | epriestley | T7664 Moving a task on a workboard is very slow (and sometimes times out with a fatal error) | ||
Resolved | epriestley | T8588 One instance has a severe, global performance issue |
Event Timeline
If you have a chance, can you pull a Services/Profile on this too? Round trip time on this install is <50ms.
Chrome network console shows 654ms for the AJAX call to /differential/comment/inline/edit/####. The Services profile shows:
Calls to External Services Type Count Total Cost Page Weight connect 2 3,097 us 0.6% query 3 1,464 us 0.3% kvcache-get 2 73 us 0.0% event 1 36 us 0.0% All Services 8 4,671 us 0.9% Entire Page 0 496,299 us 100.0% Event Start Duration Details Analysis connect +284 ms 1,395 us 172.17.8.82:phabricator_config query +285 ms 371 us SELECT * FROM `config_entry` WHERE namespace = 'default' AND isDeleted = 0 Disabled kvcache-get +387 ms 38 us - kvcache-get +387 ms 36 us - connect +411 ms 1,702 us 172.17.8.82:phabricator_user query +413 ms 747 us SELECT s.id AS s_id, s.sessionExpires AS s_sessionExpires, s.sessionStart AS s_sessionStart, s.highSecurityUntil AS s_highSecurityUntil, s.isPartial AS s_isPartial, s.signedLegalpadDocuments as s_signedLegalpadDocuments, u.* FROM `user` u JOIN `phabricator_session` s ON u.phid = s.userPHID AND s.type = 'web' AND s.sessionKey = 'XXXX' Disabled query +416 ms 347 us SELECT * FROM `user_preferences` WHERE userPHID = 'PHID-USER-XXXX' Disabled event +428 ms 36 us -
Those service calls look fine (~4.5ms -- about the same as this install) and the 150ms discrepancy between the 654ms on the network tab vs 500ms of observed wall time isn't too far out of line (I see about half that discrepancy on this install, but services can't do true end-to-end timing and at least some of that time is spent saving Services data after the timer stops).
The 496ms spent in PHP is not fine, and is ~11x slower than the 45ms of PHP time I see on this install.
Have you had any luck getting XHProf set up? That's probably most likely to give us additional clues.
Just installed XHProf and ran another profile. Services:
Type Count Total Cost Page Weight connect 4 5,601 us 1.1% query 6 42,350 us 8.1% kvcache-get 2 160 us 0.0% event 1 23 us 0.0% All Services 13 48,135 us 9.2% Entire Page 0 521,100 us 100.0%
(The extra query time, relative to the last profile I posted, appears to be from an INSERT INTO file_storageblob query. But still, most of the time is in PHP.)
The XHProf profile shows a total wall time of 107ms:
That bump for the INSERT is expected when you enable the profiler -- it's just saving the profile itself (the page as a whole will run a bit slower, too, which is normal/expected).
The huge discrepancy between "Entire Page" (521ms) and "Wall Time" in the profile (107ms) is bizarre and not something I've encountered before. My best guess is that something terrible is happening between the time we start services and the time we start the profiler, or after we stop the profiler but before we stop services. I'll take a look at those execution phases and see if I can get them more fully accounted for.
D13956 adds a new DarkConsole tab with performance information about request startup. Here's what it looks like on this server:
That corresponds to 8ms spent loading libraries and 29ms spent initializing environmental configuration. This should promote to stable tomorrow morning.
When you get around to upgrading, let me know what it looks like for slow requests?
New startup tab for creating an inline comment:
. The time spend loading libraries varied from 50k-75kI also tried this on my crappy local phabricator install (no apc or anything fancy) just for comparison. - 19,800 us for library, 51,100 us for env.init
Interesting. That "load libraries" step is much slower than I'd expect; we see ~5-10ms on this host.
The load libraries step roughly does this:
- Loads all the library maps.
- Loads anything in src/extensions/.
It's possible that (2) is the problem: for example, if one of your extensions has usleep(50000); at the top (or something similar) that could explain the observation. Locating the extensions might also be an issue, conceivably (we do some directory traversal to enumerate them, so if they were on some super slow NFS volume that cost could be paid every time, even though the actual loads would come from APC).
You could rule this out by adding this to phabricator/support/preamble.php (if this file does not exist, create it):
<?php $_SERVER['PHUTIL_DISABLE_RUNTIME_EXTENSIONS'] = true; // ...
This will temporarily stop Phabricator from trying to load extensions or enter the extension codepath at all.
Then dump APC, reload a couple times until things stabilize, and see if the number drops into the ~5ms range.
If it does, we can go figure out if this is an NFS/disk problem or a specific extension problem.
If not, APC it looks like APC is somehow making things slower in production than on your local machine. It might be easiest to try updating to a more modern PHP with OPCache + APCu and see if that just fixes things, since I believe we've already tweaked most of the common stuff APC exposes.
If it's helpful, we run these versions in the Phacility cluster, including on this server:
- OS: ami-5c120b19 (Ubuntu 14.04.3 LTS)
- PHP 5.5.9-1ubuntu4.11 (apt-get install php5)
- Zend OPcache v7.0.3 (default)
- APCu 4.0.2 (apt-get install php5-apcu)
I'd expect any modern version of PHP with OpCache to have similar performance to these versions, though (I selected them for the cluster because they were just the defaults for the distro).
Conceivably, you could try bringing a second web frontend up with more modern versions and pointing it at the same database to see if this avenue is promising. This is safe as long as you do not start the daemons (repository stuff won't work in a naive configuration, but everything else will). (I'm not sure how practical this is in your environment, I know it took me 8 months to get one machine provisioned at one point at Facebook.)
One caveat to the above is that loading src/extensions/ may trigger loads of other code through autoloading extended classes. So I'd expect one of three outcomes from the PHUTIL_DISABLE_RUNTIME_EXTENSIONS test:
- No change: points at APC not working properly.
- Significant reduction in overall runtime (cheaper "load libraries", no corresponding increase in "env.init"): points at extensions incurring a high cost.
- Shift of time from "load libraries" to "env.init" ("load libraries" gets cheaper, but "env.init" gets about as expensive): this points at APC as the problem, not extensions (in this scenario, the theory is that extensions triggered some core loading work to occur earlier, but that work is what is slow, so APC is the root issue).
just a heads up that I'm not actively working on this any more (have to finish other things before my contract ends.)
Last time this was discussed, @vishal proposed tweaking our current APC settings.
We can't reproduce this and haven't seen other reports of similar issues (including from other installs of greater size and complexity), so we have no way to move forward on this.
This issue is in fact actually fixed 🎊 ! Most recently, moving to a new host with a newer PHP (and everything else, running Ubuntu 16.04 rather than 12.04) made a big difference; we've also made a lot of improvements to our Phabricator setup over the past year or so, some of which likely helped too.
Browsing around our install and around this one, ours feels maybe slightly slower but I'm not even 100% sure it's slower. Previously the difference was really obvious with just a little browsing.
/cc @jboning