Page MenuHomePhabricator

One instance has a severe, global performance issue
Closed, ResolvedPublic

Description

To make a new comment on a Differential revision, it takes a long time (1s) for the UI to respond with the comment box.

Revisions and Commits

Event Timeline

angie renamed this task from Making a new comment slow to Making a new comment UI is too slow.
angie raised the priority of this task from to Needs Triage.
angie updated the task description. (Show Details)
angie added a project: Restricted Project.
angie added subscribers: jhurwitz, angie, aj.

If you have a chance, can you pull a Services/Profile on this too? Round trip time on this install is <50ms.

Chrome network console shows 654ms for the AJAX call to /differential/comment/inline/edit/####. The Services profile shows:

Calls to External Services
Type	Count	Total Cost	Page Weight
connect	2	3,097 us	0.6%
query	3	1,464 us	0.3%
kvcache-get	2	73 us	0.0%
event	1	36 us	0.0%
All Services	8	4,671 us	0.9%
Entire Page	0	496,299 us	100.0%


Event	Start	Duration	Details	Analysis
connect	+284 ms	1,395 us	172.17.8.82:phabricator_config	
query	+285 ms	371 us	SELECT * FROM `config_entry` WHERE namespace = 'default' AND isDeleted = 0	Disabled
kvcache-get	+387 ms	38 us	-	
kvcache-get	+387 ms	36 us	-	
connect	+411 ms	1,702 us	172.17.8.82:phabricator_user	
query	+413 ms	747 us	SELECT s.id AS s_id, s.sessionExpires AS s_sessionExpires, s.sessionStart AS s_sessionStart, s.highSecurityUntil AS s_highSecurityUntil, s.isPartial AS s_isPartial, s.signedLegalpadDocuments as s_signedLegalpadDocuments, u.* FROM `user` u JOIN `phabricator_session` s ON u.phid = s.userPHID AND s.type = 'web' AND s.sessionKey = 'XXXX'	Disabled
query	+416 ms	347 us	SELECT * FROM `user_preferences` WHERE userPHID = 'PHID-USER-XXXX'	Disabled
event	+428 ms	36 us	-

Those service calls look fine (~4.5ms -- about the same as this install) and the 150ms discrepancy between the 654ms on the network tab vs 500ms of observed wall time isn't too far out of line (I see about half that discrepancy on this install, but services can't do true end-to-end timing and at least some of that time is spent saving Services data after the timer stops).

The 496ms spent in PHP is not fine, and is ~11x slower than the 45ms of PHP time I see on this install.

Have you had any luck getting XHProf set up? That's probably most likely to give us additional clues.

Just installed XHProf and ran another profile. Services:

Type	Count	Total Cost	Page Weight
connect	4	5,601 us	1.1%
query	6	42,350 us	8.1%
kvcache-get	2	160 us	0.0%
event	1	23 us	0.0%
All Services	13	48,135 us	9.2%
Entire Page	0	521,100 us	100.0%

(The extra query time, relative to the last profile I posted, appears to be from an INSERT INTO file_storageblob query. But still, most of the time is in PHP.)

The XHProf profile shows a total wall time of 107ms:

That bump for the INSERT is expected when you enable the profiler -- it's just saving the profile itself (the page as a whole will run a bit slower, too, which is normal/expected).

The huge discrepancy between "Entire Page" (521ms) and "Wall Time" in the profile (107ms) is bizarre and not something I've encountered before. My best guess is that something terrible is happening between the time we start services and the time we start the profiler, or after we stop the profiler but before we stop services. I'll take a look at those execution phases and see if I can get them more fully accounted for.

jhurwitz moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 15 2015, 9:29 PM

D13956 adds a new DarkConsole tab with performance information about request startup. Here's what it looks like on this server:

Screen Shot 2015-08-21 at 2.55.05 PM.png (803×1 px, 162 KB)

That corresponds to 8ms spent loading libraries and 29ms spent initializing environmental configuration. This should promote to stable tomorrow morning.

When you get around to upgrading, let me know what it looks like for slow requests?

angie moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Aug 23 2015, 3:12 PM

New startup tab for creating an inline comment:

Screen Shot 2015-09-07 at 7.37.27 AM.png (824×2 px, 150 KB)
. The time spend loading libraries varied from 50k-75k

I also tried this on my crappy local phabricator install (no apc or anything fancy) just for comparison. - 19,800 us for library, 51,100 us for env.init

/changes/inline is new and has only existed for <1 week, btw

Interesting. That "load libraries" step is much slower than I'd expect; we see ~5-10ms on this host.

The load libraries step roughly does this:

  1. Loads all the library maps.
  2. Loads anything in src/extensions/.

It's possible that (2) is the problem: for example, if one of your extensions has usleep(50000); at the top (or something similar) that could explain the observation. Locating the extensions might also be an issue, conceivably (we do some directory traversal to enumerate them, so if they were on some super slow NFS volume that cost could be paid every time, even though the actual loads would come from APC).

You could rule this out by adding this to phabricator/support/preamble.php (if this file does not exist, create it):

<?php

$_SERVER['PHUTIL_DISABLE_RUNTIME_EXTENSIONS'] = true;

// ...

This will temporarily stop Phabricator from trying to load extensions or enter the extension codepath at all.

Then dump APC, reload a couple times until things stabilize, and see if the number drops into the ~5ms range.

If it does, we can go figure out if this is an NFS/disk problem or a specific extension problem.

If not, APC it looks like APC is somehow making things slower in production than on your local machine. It might be easiest to try updating to a more modern PHP with OPCache + APCu and see if that just fixes things, since I believe we've already tweaked most of the common stuff APC exposes.

If it's helpful, we run these versions in the Phacility cluster, including on this server:

  • OS: ami-5c120b19 (Ubuntu 14.04.3 LTS)
  • PHP 5.5.9-1ubuntu4.11 (apt-get install php5)
  • Zend OPcache v7.0.3 (default)
  • APCu 4.0.2 (apt-get install php5-apcu)

I'd expect any modern version of PHP with OpCache to have similar performance to these versions, though (I selected them for the cluster because they were just the defaults for the distro).

Conceivably, you could try bringing a second web frontend up with more modern versions and pointing it at the same database to see if this avenue is promising. This is safe as long as you do not start the daemons (repository stuff won't work in a naive configuration, but everything else will). (I'm not sure how practical this is in your environment, I know it took me 8 months to get one machine provisioned at one point at Facebook.)

One caveat to the above is that loading src/extensions/ may trigger loads of other code through autoloading extended classes. So I'd expect one of three outcomes from the PHUTIL_DISABLE_RUNTIME_EXTENSIONS test:

  • No change: points at APC not working properly.
  • Significant reduction in overall runtime (cheaper "load libraries", no corresponding increase in "env.init"): points at extensions incurring a high cost.
  • Shift of time from "load libraries" to "env.init" ("load libraries" gets cheaper, but "env.init" gets about as expensive): this points at APC as the problem, not extensions (in this scenario, the theory is that extensions triggered some core loading work to occur earlier, but that work is what is slow, so APC is the root issue).
epriestley renamed this task from Making a new comment UI is too slow to One instance has a severe, global performance issue.Sep 9 2015, 10:58 PM
epriestley triaged this task as Normal priority.

just a heads up that I'm not actively working on this any more (have to finish other things before my contract ends.)
Last time this was discussed, @vishal proposed tweaking our current APC settings.

chad moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Feb 2 2016, 4:46 AM
chad moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Feb 2 2016, 4:51 AM
epriestley claimed this task.

We can't reproduce this and haven't seen other reports of similar issues (including from other installs of greater size and complexity), so we have no way to move forward on this.

This hasn't seen activity in four months so I'm sending it to the murky depths.

gregprice changed the task status from Wontfix to Resolved.May 9 2017, 2:14 AM
gregprice added subscribers: jboning, gregprice.

This issue is in fact actually fixed 🎊 ! Most recently, moving to a new host with a newer PHP (and everything else, running Ubuntu 16.04 rather than 12.04) made a big difference; we've also made a lot of improvements to our Phabricator setup over the past year or so, some of which likely helped too.

Browsing around our install and around this one, ours feels maybe slightly slower but I'm not even 100% sure it's slower. Previously the difference was really obvious with just a little browsing.

/cc @jboning