
Modularize rate/connection limits in Phabricator
Closed, Public

Authored by epriestley on Oct 11 2017, 11:39 PM.
Tags
None
Subscribers
None

Details

Summary

Depends on D18702. Ref T13008. This replaces the old hard-coded single rate limit with multiple flexible limits, and defines two types of limits:

  • Rate: reject requests if a client has completed too many requests recently.
  • Connection: reject requests if a client has too many more connections than disconnections recently.

The connection limit adds +1 to the score for each connection and -1 for each disconnection, so the overall score is approximately the number of connections the client currently has open.
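
As a rough illustration of that scoring (a sketch only; the class and method names below are hypothetical and not the actual PhabricatorClientConnectionLimit API):

// Hypothetical sketch of the +1/-1 scoring described above; this is not
// the real PhabricatorClientConnectionLimit, just an illustration.
final class ExampleConnectionScore {

  private $score = 0;

  public function didConnect() {
    // Each connection the client opens adds +1 to its score.
    $this->score += 1;
  }

  public function didDisconnect() {
    // Each disconnection subtracts 1 again.
    $this->score -= 1;
  }

  public function shouldReject($limit) {
    // Since every disconnect cancels an earlier connect, the running
    // score approximates how many connections the client has open now.
    return ($this->score >= $limit);
  }

}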

Supporting multiple limits will let us limit by hostname and by remote address (e.g., a specific IP can't exceed a low limit, and all requests to a hostname can't exceed a higher limit).

Configuring the new limits looks something like this:

// Rate limit: reject a client that completes too many requests recently
// (a deliberately low limit of 5, as used in the test plan below).
PhabricatorStartup::addRateLimit(new PhabricatorClientRateLimit())
  ->setLimitKey('rate')
  ->setClientKey($_SERVER['REMOTE_ADDR'])
  ->setLimit(5);

// Connection limit: reject a client that holds more than 2 connections
// open at the same time.
PhabricatorStartup::addRateLimit(new PhabricatorClientConnectionLimit())
  ->setLimitKey('conn')
  ->setClientKey($_SERVER['REMOTE_ADDR'])
  ->setLimit(2);
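
A hostname-based limit could then be layered on top of the per-address limits above, roughly like this (a sketch only: the 'host' limit key, the use of HTTP_HOST as the client key, and the limit value are assumptions, not part of this change):

// Hypothetical: a higher limit keyed on the requested hostname, so all
// traffic to one host shares a budget in addition to the per-IP limits.
PhabricatorStartup::addRateLimit(new PhabricatorClientRateLimit())
  ->setLimitKey('host')
  ->setClientKey($_SERVER['HTTP_HOST'])
  ->setLimit(100);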
Test Plan
  • Configured limits as above.
  • Made a lot of requests, got cut off by the rate limit.
  • Used curl --limit-rate -F 'data=@the_letter_m.txt' ... to upload files really slowly. Got cut off by the connection limit. With enable_post_data_reading off, this correctly killed the connections before the uploads finished.
  • I'll send this stuff to secure before production to give it more of a chance.

Diff Detail

Repository
rP Phabricator
Branch
post5
Lint
Lint Errors
Severity  Location                                         Code   Message
Error     support/startup/PhabricatorClientLimit.php:202   XHP45  PHP Compatibility
Error     support/startup/PhabricatorClientLimit.php:219   XHP45  PHP Compatibility
Error     support/startup/PhabricatorClientLimit.php:240   XHP45  PHP Compatibility
Error     support/startup/PhabricatorClientLimit.php:250   XHP45  PHP Compatibility
Error     support/startup/PhabricatorClientLimit.php:264   XHP45  PHP Compatibility
Error     support/startup/PhabricatorClientLimit.php:265   XHP45  PHP Compatibility
Error     support/startup/PhabricatorClientLimit.php:279   XHP45  PHP Compatibility
Unit
Tests Passed
Build Status
Buildable 18672
Build 25154: Run Core Tests
Build 25153: arc lint + arc unit

Event Timeline

  • Fix unit test path to PhabricatorStartup.php.
amckinley added inline comments.
support/startup/PhabricatorClientLimit.php:265

Have we tested this code yet? I'm worried that if this somehow doesn't work, the size of APCu will grow forever.

This revision is now accepted and ready to land. Oct 13 2017, 7:50 PM

The previous version of this code has been active in production on secure since D17761 and is still active, and we haven't seen issues. (You can use Config → Cache Status to review APCu usage. Whichever secure host I just hit is using 2MB, which suggests there's no leak.)

That doesn't completely guarantee that I didn't make a mistake in the conversion, but I pretty much just renamed variables, and my local install isn't showing any meaningful usage either.

APC also just drops the whole cache if it fills up, so even if we fill it with unlimited buckets (or, e.g., there is no bug but an attacker repeatedly makes requests with very long, unique, random hostnames to intentionally fill the host-based cache) I'd expect to just see the rate limiting stutter momentarily when the cache gets dropped.

I do think this might be slightly clearer by using buckets 0-N and selecting a bucket with (time() / 60) % bucket_count. The problem with this is that figuring out when we should clear a bucket because it has expired is a little tricky. We could sort of cheat and use APC TTLs, or we could add a version/generation marker to buckets. I think this might end up cleaner (the current code is kind of hard to figure out) but didn't want to expand the scope of this rewrite.
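
For example, a rough sketch of that bucket-index approach with a generation marker (hypothetical code, not what this revision implements; it assumes one-minute buckets stored in APCu):

// Hypothetical sketch: pick a bucket by time, and reset it when it holds
// data from an earlier generation (a previous pass through the ring).
$bucket_count = 5;
$client_key = $_SERVER['REMOTE_ADDR'];
$now = time();

// Which bucket does the current minute map to?
$bucket = ((int)($now / 60)) % $bucket_count;

// The generation advances once per full cycle of the ring, so a bucket
// written under an older generation is known to be expired.
$generation = (int)($now / (60 * $bucket_count));

$key = "limit:{$client_key}:{$bucket}";
$entry = apcu_fetch($key);
if (!is_array($entry) || $entry['generation'] != $generation) {
  // Empty, or left over from a previous cycle: start the bucket fresh.
  $entry = array('generation' => $generation, 'score' => 0);
}

$entry['score'] += 1;
apcu_store($key, $entry);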

This revision was automatically updated to reflect the committed changes.