
Add a setup warning about using "Burstable CPU" (T2) instance classes in AWS
Closed, Resolved · Public

Description

When I launch PhabricatorTaskmasterDaemon or PhabricatorTriggerDaemon, everything on the server slows to a complete halt — pages will not load, and even so much as hitting ctrl-C to interrupt them takes minutes to have an effect.

I don't know how to give repro instructions using a test install because I don't have access to daemon control on those. Please let me know the best way to develop repro instructions.

PhabricatorTaskmasterDaemon produces the following output:

2016-08-26 3:49:49 PM [STDE] <VERB> PhabricatorTaskmasterDaemon Working on task 182098 (PhabricatorApplicationTransactionPublishWorker)...

PhabricatorTriggerDaemon produces no output other than the normal "Starting process" stuff.

Versions:
Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-34-generic x86_64)
php -v → 5.6.25-1+deb.sury.org~xenial+1 (cli)
phabricator 067d12d7168a580bce695cb26331df785f86c2e9 (Fri, Aug 26)
arcanist 9e82ef979e8148c43b9b8439025d505b1219e213 (Thu, Aug 25)
phutil 5fd1af8b4f2b9631e2ceb06bd88d21f2416123c2 (Wed, Aug 24)

It may be relevant that this problem began after I upgraded to Ubuntu 16.04. Doing so brought in PHP version 7, which I understand is not supported; I reverted to 5.6 (successfully as far as I can tell — I can't find any remaining evidence of PHP 7, and everything seems to run correctly on the website as long as neither of those daemons is running).

Event Timeline

Do you have any more information from the logs? This also seems more like a question than a bug report, since daemons not working right is rather hard to debug.

I'm running 16.04 with PHP 5.6 as well and haven't seen this issue. Here's my installed PHP info, if that helps. You might want to verify which PHP your webserver is running, not just the CLI (a quick way to check follows the package list below).

root@phac-dev:/var/www/html/dev# dpkg -l|grep php
ii  libapache2-mod-php5.6                5.6.25-1+deb.sury.org~xenial+1      amd64        server-side, HTML-embedded scripting language (Apache 2 module)
ii  php-common                           1:44+deb.sury.org~xenial+1          all          Common files for PHP packages
ii  php5.6                               5.6.25-1+deb.sury.org~xenial+1      all          server-side, HTML-embedded scripting language (metapackage)
ii  php5.6-cli                           5.6.25-1+deb.sury.org~xenial+1      amd64        command-line interpreter for the PHP scripting language
ii  php5.6-common                        5.6.25-1+deb.sury.org~xenial+1      amd64        documentation, examples and common module for PHP
ii  php5.6-curl                          5.6.25-1+deb.sury.org~xenial+1      amd64        CURL module for PHP
ii  php5.6-fpm                           5.6.25-1+deb.sury.org~xenial+1      amd64        server-side, HTML-embedded scripting language (FPM-CGI binary)
ii  php5.6-gd                            5.6.25-1+deb.sury.org~xenial+1      amd64        GD module for PHP
ii  php5.6-json                          5.6.25-1+deb.sury.org~xenial+1      amd64        JSON module for PHP
ii  php5.6-mbstring                      5.6.25-1+deb.sury.org~xenial+1      amd64        MBSTRING module for PHP
ii  php5.6-mcrypt                        5.6.25-1+deb.sury.org~xenial+1      amd64        libmcrypt module for PHP
ii  php5.6-mysql                         5.6.25-1+deb.sury.org~xenial+1      amd64        MySQL module for PHP
ii  php5.6-opcache                       5.6.25-1+deb.sury.org~xenial+1      amd64        Zend OpCache module for PHP
ii  php5.6-readline                      5.6.25-1+deb.sury.org~xenial+1      amd64        readline module for PHP
ii  php5.6-xml                           5.6.25-1+deb.sury.org~xenial+1      amd64        DOM, SimpleXML, WDDX, XML, and XSL module for PHP
root@phac-dev:/var/www/html/dev#
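
For what it's worth, one quick way to check the webserver's PHP version (a hypothetical sketch, assuming you can drop a file into the document root and remove it afterwards; the filename is a placeholder):

root@phac-dev:/var/www/html/dev# echo '<?php echo phpversion();' > version.php
root@phac-dev:/var/www/html/dev# curl http://localhost/version.php

If the webserver matches the CLI, this should print something like 5.6.25. Delete version.php when you're done.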

Does /usr/bin/env php -v also print 5.6, versus 7?

If you run a single Taskmaster daemon in debug mode with tracing like this:

phabricator/ $ ./bin/phd debug task --trace

...what happens?

@michaeljs1990:

do you have any more information from the logs?

Should I post the output from bin/phd log? I can't even tell where the actual logfiles are.

@chad

ubuntu@ip-172-31-30-226:~/phacility/phabricator$ dpkg -l|grep php
ii  dh-php                             0.21+deb.sury.org~xenial+1                           all          debhelper add-on to handle PHP PECL extensions
ii  libapache2-mod-php5.6              5.6.25-1+deb.sury.org~xenial+1                       amd64        server-side, HTML-embedded scripting language (Apache 2 module)
ii  php-common                         1:44+deb.sury.org~xenial+1                           all          Common files for PHP packages
ii  php-pear                           1:1.10.1+submodules+notgz-8+donate.sury.org~xenial+2 all          PEAR Base System
ii  php5.6                             5.6.25-1+deb.sury.org~xenial+1                       all          server-side, HTML-embedded scripting language (metapackage)
ii  php5.6-cli                         5.6.25-1+deb.sury.org~xenial+1                       amd64        command-line interpreter for the PHP scripting language
ii  php5.6-common                      5.6.25-1+deb.sury.org~xenial+1                       amd64        documentation, examples and common module for PHP
ii  php5.6-curl                        5.6.25-1+deb.sury.org~xenial+1                       amd64        CURL module for PHP
ii  php5.6-dev                         5.6.25-1+deb.sury.org~xenial+1                       amd64        Files for PHP5.6 module development
ii  php5.6-gd                          5.6.25-1+deb.sury.org~xenial+1                       amd64        GD module for PHP
ii  php5.6-json                        5.6.25-1+deb.sury.org~xenial+1                       amd64        JSON module for PHP
ii  php5.6-mbstring                    5.6.25-1+deb.sury.org~xenial+1                       amd64        MBSTRING module for PHP
ii  php5.6-mcrypt                      5.6.25-1+deb.sury.org~xenial+1                       amd64        libmcrypt module for PHP
ii  php5.6-mysql                       5.6.25-1+deb.sury.org~xenial+1                       amd64        MySQL module for PHP
ii  php5.6-opcache                     5.6.25-1+deb.sury.org~xenial+1                       amd64        Zend OpCache module for PHP
ii  php5.6-readline                    5.6.25-1+deb.sury.org~xenial+1                       amd64        readline module for PHP
ii  php5.6-xml                         5.6.25-1+deb.sury.org~xenial+1                       amd64        DOM, SimpleXML, WDDX, XML, and XSL module for PHP
ii  php5.6-xmlrpc                      5.6.25-1+deb.sury.org~xenial+1                       amd64        XMLRPC-EPI module for PHP
ii  pkg-php-tools                      1.33+deb.sury.org~xenial+1                           all          various packaging tools and scripts for PHP packages

@epriestley

Does /usr/bin/env php -v also print 5.6, versus 7?

ubuntu@ip-172-31-30-226:~$ /usr/bin/env php -v
PHP 5.6.25-1+deb.sury.org~xenial+1 (cli) 
Copyright (c) 1997-2016 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2016 Zend Technologies
    with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2016, by Zend Technologies

./bin/phd debug task --trace

A crapton of stuff goes scrolling by, which I realize is not a very helpful description. The site remains usable. Do you want all the actual stuff that goes by?

@jonah214 I vote for seeing the stuff that goes by. You can use the paste application https://secure.phabricator.com/paste/ so it's easier for us to look at.

P2005. This was all written to stderr, not stdout. I let it run for a little while and then interrupted.

That looks pretty normal to me.

you posted this in the description

2016-08-26 3:49:49 PM [STDE] <VERB> PhabricatorTaskmasterDaemon Working on task 182098 (PhabricatorApplicationTransactionPublishWorker)...

does this have a stack trace after it?

Not that was printed to the console.

(Is your server an underclocked Raspberry Pi emulated on an Amiga?)

That trace looks pretty normal. That was from running it for about 30 seconds, right?

From the trace, the daemon didn't actually do anything in those 30 seconds. There were no tasks for it to execute, it just looked for new tasks about once per second. Those queries look normal/responsive (they took about 2-3ms to execute).

Was the server impacted when running bin/phd debug task --trace (e.g., increased load)?

Is this a convenient, well-isolated test system with no important data on it which you can give me access to mutate?

(Is your server an underclocked Raspberry Pi emulated on an Amiga?)

EC2 t2.micro

That trace looks pretty normal. That was from running it for about 30 seconds, right?

I didn't time it, but that sounds about right.

Was the server impacted when running bin/phd debug task --trace (e.g., increased load)?

No.

Is this a convenient, well-isolated test system with no important data on it which you can give me access to mutate?

I'm not sure what you mean by "convenient" or "well-isolated". It's not a test system and has important data on it, but I'm happy to give you access to it after I do a database backup. Do you want that? How do you prefer to transfer keys?

In AWS, my understanding is that micro instances have CPU credits, and their CPU is stopped or heavily throttled once those credits are exhausted. Has the instance exhausted its CPU credits? I think there's a graph in the console, maybe in the "CloudWatch" tab or something like that? I don't actually have any micro instances up right now to check.

We also repeatedly had micro instances stop working for no apparent reason (see T8204) and stopped using them after the third or fourth one died.

By default, we run an autoscaling pool with up to 4 taskmasters, which could leave you with 8 processes (MySQL, Webserver, PullLocal, Trigger, Taskmaster x4) vying for one virtual CPU. You may have somewhat better luck adjusting phd.taskmasters down to 1.
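
For reference, that adjustment looks something like this (a sketch, assuming a standard install where bin/config writes local configuration and bin/phd restarts the daemon pool):

phabricator/ $ ./bin/config set phd.taskmasters 1
phabricator/ $ ./bin/phd restart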

Hm, that's interesting. I know basically nothing about AWS, but I'll try adjusting phd.taskmasters, and if that doesn't work, I'll try using a bigger instance. Is the next size up sufficient, in your experience?

It's hard to say because workloads differ, but I'd steer away from the CPU-credited "burstable" t2.* instances based on generally negative experiences with micros in the past.

We currently run everything on m3.large instances. I suspect you'd be OK on an m3.medium, but haven't run production workloads on them. AWS doesn't really have an instance type which feels like it's in the sweet spot as an entry-level low-cost production instance type for this workload. The c3.large is almost certainly fine but only slightly cheaper than an m3.large.

One thing we could possibly do is try to hit the AWS meta-service during startup:

$ curl http://169.254.169.254/latest/meta-data/instance-type

If that works and returns t2.*, we could raise a setup warning like:

You are running on an instance in a "burstable CPU" instance class, which means AWS may aggressively throttle the CPU this instance is allowed to consume, causing everything to run very slowly and become unresponsive. These instances are generally unsuitable for production use. If this is not a production instance and you have reviewed the AWS documentation on CPU credits, you can ignore this setup issue.

There doesn't immediately appear to be a good way for us to read CPUCreditBalance or CPUCreditUsage from the CLI without credentials.
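
(For completeness: with AWS credentials configured, something like this AWS CLI invocation should read the metric. The instance ID and time window here are placeholders:)

$ aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUCreditBalance \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --start-time 2016-08-26T00:00:00Z \
    --end-time 2016-08-26T23:59:00Z \
    --period 300 \
    --statistics Average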

Here's the actual credit graph in the AWS web console, under the Monitoring tab (lower right):

[Screenshot: Screen Shot 2016-08-26 at 3.54.22 PM.png, 169 KB — CPU credit balance graph]

If yours looks like that but upside down, that's probably fairly conclusive that this is a burstable CPU throttling issue.

My graph certainly seems to imply that this is the issue.

The weird thing is that it was running totally fine until I upgraded to 16.04. Maybe performing the upgrade just depleted the credits so much that doing normal Phabricator stuff killed the rest?

If you fiddle with the dropdown in the upper right you should be able to get historic data and maybe see if the graph went not-so-good around then. Doesn't seem implausible.

I did, and it certainly seems to match up. I'm surprised it's so dramatic. Since it was working fine before, the sensible thing is probably to just wait for the credits to recover and then go back to running things on the t2.micro.

I'll retarget this task to add a setup warning about micro instances; this isn't the first time they've done surprising things (T2727, T8204, T8210).

epriestley renamed this task from "Launching daemons kills the system" to "Add a setup warning about using 'Burstable CPU' (T2) instance classes in AWS". Aug 26 2016, 11:44 PM
epriestley edited projects, added Setup, Documentation; removed Bug Report.

Specifically, my plan here is:

  • As a setup warning, make a request to http://169.254.169.254/latest/meta-data/instance-type with a short timeout (~2 seconds). On installs not in AWS, this will just fail. On installs in AWS, this will give us data from the AWS secret meta-service.
  • If we get HTTP 200 back from that and the response body matches "t2.*", raise a warning about burstable CPU types. A sketch of what that check might look like is below.
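
A minimal sketch of such a check (the class name is hypothetical; this assumes libphutil's HTTPFuture and the standard PhabricatorSetupCheck/newIssue API, and is not the actual implementation):

<?php

// Hypothetical setup check, a sketch rather than the shipped implementation.
final class PhabricatorAWSBurstableCPUSetupCheck extends PhabricatorSetupCheck {

  protected function executeChecks() {
    $uri = 'http://169.254.169.254/latest/meta-data/instance-type';

    try {
      // Short timeout: on installs not in AWS, this just fails quickly.
      list($body) = id(new HTTPFuture($uri))
        ->setTimeout(2)
        ->resolvex();
    } catch (Exception $ex) {
      // Request failed, so we're presumably not in AWS. Raise nothing.
      return;
    }

    if (preg_match('/^t2\./', trim($body))) {
      $this->newIssue('aws.burstable-cpu')
        ->setName(pht('AWS "Burstable CPU" Instance'))
        ->setMessage(
          pht(
            'This host appears to be a burstable CPU (t2.*) instance. '.
            'AWS may aggressively throttle its CPU, causing everything '.
            'to run very slowly. These instances are generally '.
            'unsuitable for production use.'));
    }
  }

}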

There are a cluster of other similar checks ("make an HTTP request to something") which we'd benefit from running at the same time so we can wait on them in parallel instead of in sequence. I'll collect them into some kind of group.

In case someone else ends up with my original problem, I'll mention that https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/69 fixed it.