Differential D16389

In taskmaster daemons, only close connections which were not used recently
Closed · Public

Authored by epriestley on Aug 11 2016, 4:07 PM.

Details

Reviewers

chad

Maniphest Tasks

T11458: Make daemons less aggressive about cycling connections

Commits

rP5e3efca08a57: In taskmaster daemons, only close connections which were not used recently

Summary

Ref T11458. Depends on D16388. Currently, we're very aggressive about closing connections in the taskmaster daemons.

This can end up taking up a lot of resources. In particular, because the outgoing port for an outbound connection normally cannot be reused for 60 seconds after the connection closes, we may exhaust outbound ports on the host if there's a big queue full of stuff that's being processed very quickly.

At a minimum, we are always holding open a worker connection, which we always need again right away. So even in the best case we end up opening/closing this about once per second, and each daemon takes up ~60 outbound ports when it should take up ~1.
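The arithmetic behind the ~60 figure can be sketched as follows. This is an illustrative model, not code from this change: the function name and its parameters are hypothetical, and it assumes that every closed connection keeps its outbound port unavailable for a fixed hold period (60 seconds, per the summary above).

```python
def estimated_ports_held(cycles_per_second: float, hold_seconds: float = 60.0) -> int:
    """Estimate outbound ports tied up by one daemon in steady state.

    Hypothetical model: each closed connection keeps its port unavailable
    for hold_seconds, so the lingering ports number roughly
    cycles_per_second * hold_seconds. One more port is counted for the
    connection that is currently open.
    """
    lingering = round(cycles_per_second * hold_seconds)
    return lingering + 1


# Cycling the worker connection about once per second versus never cycling it.
churning = estimated_ports_held(1.0)
steady = estimated_ports_held(0.0)
print(churning, steady)
```

Under this model, a daemon that reopens its worker connection once per second holds about sixty times as many ports as one that keeps the connection open.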

So, make two adjustments:

First, only close connections which we haven't issued a query on in the last 60 seconds. This should prevent us from closing connections that we'll need again immediately in most cases. In the worst case, we shouldn't be eating up any extra ports under default TCP behavior.
Second, explicitly close connections. We were relying on implicit/GC behavior (maybe as a holdover from very long ago, before we got connection wrappers in place?), which probably did about the same thing but isn't as predictable and can't be profiled or instrumented.
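The two adjustments can be sketched roughly as below. This is a minimal Python illustration of the idea, not the PHP in this diff: `TrackedConnection`, `close_idle_connections`, and the injectable `clock` are all hypothetical names, and the sketch assumes the connection object records the time of its last query (which is what the parent revision D16388 adds).

```python
import time
from typing import Callable, Dict, List


class TrackedConnection:
    """Hypothetical connection wrapper that records when it was last used."""

    def __init__(self, name: str, clock: Callable[[], float] = time.monotonic):
        self.name = name
        self._clock = clock
        self.last_used = clock()
        self.is_open = True

    def query(self, sql: str) -> None:
        # A real wrapper would send the query here; this sketch only
        # records the time of use.
        self.last_used = self._clock()

    def close(self) -> None:
        # Explicit close, rather than relying on garbage collection.
        self.is_open = False


def close_idle_connections(
    connections: Dict[str, TrackedConnection],
    idle_seconds: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Close and drop connections with no query in the last idle_seconds.

    Returns the keys of the connections that were closed.
    """
    now = clock()
    closed = []
    for key, conn in list(connections.items()):
        if now - conn.last_used >= idle_seconds:
            conn.close()
            del connections[key]
            closed.append(key)
    return closed
```

A daemon loop would call `close_idle_connections(pool)` between tasks; a connection that was queried recently, such as the worker connection, stays open and is reused.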

Test Plan

This is somewhat difficult to test completely convincingly in isolation since the problem behavior depends on production scales and the workload, and to some degree on configuration.

I tested that this stuff basically works by adding logging to connect/close and running the daemons, verifying that they churned connections a lot before this change (e.g., ~1/s even at no load) and churn rarely afterward (e.g., almost never at no load).

I ran some workload through them to make sure I didn't completely break anything.

The best real test is just seeing how production responds. Current inbound/outbound connections on secure001 number roughly 1,200:

secure001 $ netstat -t | grep :mysql | wc -l
1164

Current outbound connections from repo001 number roughly 18,600:

repo001 $ netstat -t | grep :mysql | wc -l
18663

Diff Detail

Repository

rP Phabricator

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

epriestley updated this revision to Diff 39413. Aug 11 2016, 4:07 PM

epriestley retitled this revision to "In taskmaster daemons, only close connections which were not used recently".

epriestley updated this object.

epriestley edited the test plan for this revision.

epriestley added a reviewer: chad.

epriestley added a task: T11458: Make daemons less aggressive about cycling connections.

epriestley added a parent revision: D16388: Record the last time a connection was used on the connection object.

Actually close after 60 seconds inactive instead of 15 (which I was testing with), to align with default TCP behavior.

chad accepted this revision. Aug 11 2016, 6:19 PM

chad edited edge metadata.

This revision is now accepted and ready to land. Aug 11 2016, 6:19 PM

Closed by commit rP5e3efca08a57: In taskmaster daemons, only close connections which were not used recently (authored by epriestley, committed by epriestley). Aug 11 2016, 7:04 PM

This revision was automatically updated to reflect the committed changes.

epriestley mentioned this in T11458: Make daemons less aggressive about cycling connections. Aug 11 2016, 7:24 PM

epriestley mentioned this in D21369: When acquiring a GlobalLock, put good connections that just got unlucky back in the pool. Jun 26 2020, 1:00 AM

epriestley mentioned this in rP22de618d3bd3: When acquiring a GlobalLock, put good connections that just got unlucky back in…. Jun 26 2020, 1:06 AM

epriestley mentioned this in rP0e4d62847cfc: (stable) When acquiring a GlobalLock, put good connections that just got….

Revision Contents
Changeset List

Path                                              Size
src/infrastructure/daemon/PhabricatorDaemon.php   2 lines
src/infrastructure/storage/lisk/LiskDAO.php       47 lines

Diff 39416

src/infrastructure/daemon/PhabricatorDaemon.php