integration with reverse caching proxies such as squid, varnish
Closed, Resolved (Public)

Description

It would be great if Phabricator integrated with reverse caching proxies such as Squid and Varnish, for example by setting appropriate Cache-Control and Expires headers and by sending purge requests to the reverse proxy when pages are modified, perhaps also using Edge Side Includes.

One web app capable of this is MediaWiki. Example implementation:
https://doc.wikimedia.org/mediawiki-core/master/php/html/SquidUpdate_8php_source.html

Note that "SquidUpdate" works fine with Varnish. You don't need one implementation for Varnish plus an extra one for Squid. If this is done once, it's up to the reverse proxies to work with it.

Another web app capable of this is WordPress with the W3 Total Cache plugin:
https://wordpress.org/plugins/w3-total-cache/

Event Timeline

patrick.schleizer raised the priority of this task from to Needs Triage.
patrick.schleizer updated the task description.

We should already do this. See this document for some details. We refer to caching reverse proxies as "CDNs", under the assumption that most users will not be configuring their own, but the principle is the same:

https://secure.phabricator.com/book/phabricator/article/configuring_file_domain/

In particular:

  • Cacheable static resources (like JS, CSS, and some images) are served with cacheable headers. These URIs are versioned and their content never changes, so they can be cached indefinitely. See T7134, though, for caveats about deploying new code to a large tier behind a caching proxy. You can avoid these problems by stopping the world during a deployment or dumping the cache after a deployment.
  • Other pages are never cacheable, and are served with headers which prevent caching (Cache-Control, Expires, Pragma).
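
A minimal sketch of the two header policies described in the list above, written in Python rather than Phabricator's PHP. The `/res/` prefix, the `cache_headers` name, and the exact header values are assumptions made for illustration; they are not taken from Phabricator's source.

```python
from typing import Dict

# Hypothetical prefix for versioned static resources (an assumption for this sketch).
STATIC_PREFIX = "/res/"
ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def cache_headers(path: str) -> Dict[str, str]:
    """Return response headers for a request path under the two-tier policy."""
    if path.startswith(STATIC_PREFIX):
        # Versioned URI: the content behind it never changes, so any
        # browser or proxy may keep it for a long time.
        return {"Cache-Control": f"public, max-age={ONE_YEAR}"}
    # Everything else is dynamic: tell browsers and proxies not to cache it.
    return {
        "Cache-Control": "no-store, private",
        "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",
        "Pragma": "no-cache",
    }
```

With this split, a reverse proxy in front of the application can cache static resources purely from the response headers, without any application-specific purge logic.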

Notably, this install is set up with CloudFlare as a caching proxy for static resources.

If you're interested in pursuing content caching, not just static resource caching (for example, if you want to cache the content of pages like /T123), it's not supported and we have no plans to ever pursue this:

  • Content is too dynamic and interconnected to accurately dirty caches. For example, if someone adds vacation information to Calendar, every page where their name is mentioned becomes dirty.
  • It would only be relevant for logged out users, who are a tiny minority of Phabricator users.
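
To make the dirtying problem concrete, here is a rough sketch of the bookkeeping a content cache would need. The reverse index, the function names, and the example paths are all hypothetical; nothing like this exists in Phabricator as far as this thread says.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical reverse index: user name -> cached page paths that mention them.
mentions: Dict[str, Set[str]] = defaultdict(set)


def record_render(page: str, users_mentioned: Set[str]) -> None:
    """Remember which users a cached page mentions."""
    for user in users_mentioned:
        mentions[user].add(page)


def pages_to_purge(changed_user: str) -> Set[str]:
    """Every cached page mentioning the user is dirty once their status changes."""
    return set(mentions.get(changed_user, set()))
```

If `/T123` mentions alice and bob and `/D45` mentions alice, a single Calendar change for alice dirties both pages. Every other kind of cross-reference on a page would need the same sort of tracking, which is why accurate invalidation gets expensive quickly.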

I'm also not sure what actual problem you're looking to solve here (performance? load? love of caching?). See:

https://secure.phabricator.com/book/phabcontrib/article/feature_requests/#describe-problems

chad claimed this task.
chad added a subscriber: chad.

@patrick.schleizer if there's something specific with performance we can address, let us know, otherwise I think @epriestley covered the basics on where we are and what's not going to be possible.

Thanks a lot for your detailed answer!

I should have checked that static resources already have sufficient headers.

> It would only be relevant for logged out users, who are a tiny minority of Phabricator users.

Are you sure about that? I don't understand a lot about this, but isn't this what Edge Side Includes is supposed to solve? Better caching for logged in users?

> I'm also not sure what actual problem you're looking to solve here (performance? load? love of caching?). See:

A bit of all of those. Any computation that can be saved is load the server doesn't have to carry.

> https://secure.phabricator.com/book/phabcontrib/article/feature_requests/#describe-problems

Good writeup!

> Are you sure about that? I don't understand a lot about this, but isn't this what Edge Side Includes is supposed to solve? Better caching for logged in users?

I haven't actually used Edge Side Includes before, but it looks like the problem they solve is more narrow in scope than the problem we generally face. If we had mostly-similar content with a few variant pieces (like just a "logged in" vs "logged out" header) they might be a good approach to consider, but a lot of our page content varies significantly from user to user and doesn't look like it's a great fit for ESI.

For example, we execute policy checks for each object we load which is represented on the page, so one viewer may see:

Projects: Surprise Birthday Party

...while another sees:

Projects: Restricted Project

In the general case, policy checks cannot be meaningfully cached and cannot be efficiently dirtied, so we'd be hard-pressed to get much benefit from proxy-level caching even if we were just serving the same page to the same user multiple times in a row.
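
The per-viewer difference above can be sketched roughly as follows. The `Project` class, its membership-based `can_view` rule, and `render_projects_line` are hypothetical stand-ins; Phabricator's real policy system is more general than a member list.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Project:
    name: str
    members: Set[str] = field(default_factory=set)

    def can_view(self, viewer: str) -> bool:
        # Hypothetical policy: only members may see the project.
        return viewer in self.members


def render_projects_line(project: Project, viewer: str) -> str:
    """Render the 'Projects:' line as a specific viewer would see it."""
    label = project.name if project.can_view(viewer) else "Restricted Project"
    return f"Projects: {label}"
```

The same URL produces different output for each viewer, so a proxy cache keyed on the URL alone would either leak the project name or hide it from people allowed to see it.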

This isn't to say that we can't be faster or more resource efficient. We have an architecture which is generally mindful of performance at a high level, but have spent relatively little effort directly attempting to improve it (except in cases where something is noticeably slow), and can definitely do far better than we do today -- I just think proxy-level caching and ESI aren't the most promising avenues.

Some of the performance-related tech in the pipeline includes:

  • Better tools for understanding performance from a non-Phabricator-developer point of view (T6930).
  • Using ajax to perform most navigation events (T2086).
  • I'd like to take another shot at this patch, which runs includes before the page loads, making them free from a user perspective -- the server still spends the CPU cycles, but the user doesn't see the end-to-end time (we ran that patch in production for ~a year, but I haven't gotten around to updating it for newer PHP).
  • Some other similar cheap wins by moving a few things like UTF8 functions to C (T2312).
  • Because we have no conditional global code and store no request state in globals or statics, we can theoretically go further than that and reuse the entire PHP interpreter. This would make includes and anything we can move to a setup phase nearly-free to the server, too.
  • We have some high-concept patches to make SQL queries future-based (see D5104 through D5112), although it's not really clear if that's going anywhere.
  • A few pieces of the infrastructure, like Handles, were specifically built to enable cache optimizations in the future, but we haven't gotten around to implementing those optimizations.

Broadly, a lot of this stuff just isn't moving forward very quickly because Phabricator is "fast enough" in most situations and users are almost universally more interested in new features than in performance and resource utilization improvements (with a few exceptions, like T5644). Some of it also makes configuration more complicated, and for many installs it's cheaper to give Phabricator more hardware than make configuration (and thus the operational/maintenance cost in employee-hours) more complex.

You can also address scaling issues by configuring Phabricator on a cluster of machines. This is in its infancy, but we run it in production on Phacility today; see T7024. We currently support an arbitrarily large number of web/application machines; T4292 and T4209 discuss plans around scaling repository and database hosts. But this stuff is also not a major priority because installs aren't generally outscaling a single machine (WMF has a better chance than most installs, of course) and are far more interested in features than theoretical scalability headroom.