Modularize rate/connection limits in Phabricator

Authored by epriestley on Oct 11 2017, 11:39 PM.



Depends on D18702. Ref T13008. This replaces the old hard-coded single rate limit with multiple flexible limits, and defines two types of limits:

  • Rate: reject requests if a client has completed too many requests recently.
  • Connection: reject requests if a client has too many more connections than disconnections recently.

The connection limit adds +1 to the score for each connection, then adds -1 for each disconnection. So the overall number is how many open connections they have, at least approximately.

Supporting multiple limits will let us do limiting by Hostname and by remote address (e.g., a specific IP can't exceed a low limit, and all requests to a hostname can't exceed a higher limit).

Configuring the new limits looks something like this:

PhabricatorStartup::addRateLimit(new PhabricatorClientRateLimit())

PhabricatorStartup::addRateLimit(new PhabricatorClientConnectionLimit())
Test Plan
  • Configured limits as above.
  • Made a lot of requests, got cut off by the rate limit.
  • Used curl --limit-rate -F 'data=@the_letter_m.txt' ... to upload files really slowly. Got cut off by the connection limit. With enable_post_data_reading off, this correctly killed the connections before the uploads finished.
  • I'll send this stuff to secure before production to give it more of a chance.

Diff Detail

rP Phabricator
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
epriestley created this revision.Oct 11 2017, 11:39 PM
  • Fix unit test path to PhabricatorStartup.php.
amckinley accepted this revision.Oct 13 2017, 7:50 PM
amckinley added inline comments.

Have we tested this code yet? I'm worried that if this somehow doesn't work, the size of APCu will grow forever.

This revision is now accepted and ready to land.Oct 13 2017, 7:50 PM

The previous version of this code has been active in production on secure since D17761, and is currently active, and we haven't seen issues. (You can use ConfigCache Status to review APCu usage. Whichever secure host I just hit is using 2MB, which suggests there's no leak.)

That doesn't completely guarantee that I didn't make a mistake in the conversion, but I pretty much just renamed variables, and my local install isn't showing any meaningful usage either.

APC also just drops the whole cache if it fills up, so even if we fill it with unlimited buckets (or, e.g., there is no bug but an attacker repeatedly makes requests with very long, unique, random hostnames to intentionally fill the host-based cache) I'd expect to just see the rate limiting stutter momentarily when the cache gets dropped.

I do think this might be slightly clearer by using buckets 0-N and selecting a bucket with (time() / 60) % bucket_count. The problem with this is that figuring out when we should clear a bucket because it has expired is a little tricky. We could sort of cheat and use APC TTLs, or we could add a version/generation marker to buckets. I think this might end up cleaner (the current code is kind of hard to figure out) but didn't want to expand the scope of this rewrite.

This revision was automatically updated to reflect the committed changes.