
Things You Should Do Soon: Static Resources
Phabricator Flavor Text (Sundries)

Over time, you'll write more JS and CSS and eventually need to put systems in place to manage it.

This is part of Things You Should Do Soon, which describes architectural problems in web applications that you should begin to consider before you encounter them.

Manage Dependencies Automatically

The naive way to add static resources to a page is to include them at the top of the page, before rendering begins, by enumerating filenames. Facebook used to work like that:

<?php

// Every page listed each file it needed, by hand, in dependency order.
require_js('js/base.js');
require_js('js/utils.js');
require_js('js/ajax.js');
require_js('js/dialog.js');
// ...

This was okay for a while but had become unmanageable by 2007. Because dependencies were managed completely manually and you had to explicitly list every file you needed in the right order, everyone copy-pasted a giant block of this stuff into every page. The major problem this created was that each page pulled in way too much JS, which slowed down frontend performance.

We moved to a system (called Haste) which declared JS dependencies in the files using a docblock-like header:

/**
 * @provides dialog
 * @requires utils ajax base
 */

We annotated files manually, although in theory you could use static analysis instead (we couldn't realistically do that, because our JS was pretty unstructured). This allowed us to pull in the entire dependency chain of a component with one call:

require_static('dialog');

...instead of copy-pasting every dependency.
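A minimal sketch of how such headers can be turned into an ordered include list. The helper names (`parse_resource_header()`, `build_graph()`, `resolve_dependencies()`) and the headers given for `base`, `utils`, and `ajax` are hypothetical; this is not Haste's or Celerity's actual code. It reads the `@provides` and `@requires` tags, then does a depth-first walk so that each dependency appears before the resource that needs it, and rejects cycles.

```php
<?php

// Hypothetical helper: read @provides and @requires from a docblock header.
function parse_resource_header(string $source): array {
  $provides = null;
  $requires = [];
  if (preg_match('/@provides\s+(\S+)/', $source, $m)) {
    $provides = $m[1];
  }
  if (preg_match('/@requires\s+([^\n*]+)/', $source, $m)) {
    $requires = preg_split('/\s+/', trim($m[1]), -1, PREG_SPLIT_NO_EMPTY);
  }
  return [$provides, $requires];
}

// Map each provided name to the names it requires.
function build_graph(array $sources): array {
  $graph = [];
  foreach ($sources as $source) {
    [$provides, $requires] = parse_resource_header($source);
    if ($provides !== null) {
      $graph[$provides] = $requires;
    }
  }
  return $graph;
}

// Depth-first walk: every dependency is emitted before the resource needing it.
function resolve_dependencies(array $graph, array $names): array {
  $order = [];
  $state = []; // name => 'visiting' while on the stack, 'done' once emitted
  $visit = function (string $name) use (&$visit, &$order, &$state, $graph): void {
    $current = $state[$name] ?? null;
    if ($current === 'done') {
      return;
    }
    if ($current === 'visiting') {
      throw new RuntimeException("Dependency cycle involving '{$name}'.");
    }
    if (!array_key_exists($name, $graph)) {
      throw new RuntimeException("Unknown resource '{$name}'.");
    }
    $state[$name] = 'visiting';
    foreach ($graph[$name] as $dependency) {
      $visit($dependency);
    }
    $state[$name] = 'done';
    $order[] = $name;
  };
  foreach ($names as $name) {
    $visit($name);
  }
  return $order;
}

// Hypothetical headers for the four files from the require_js() example.
$graph = build_graph([
  "/**\n * @provides base\n */",
  "/**\n * @provides utils\n * @requires base\n */",
  "/**\n * @provides ajax\n * @requires base utils\n */",
  "/**\n * @provides dialog\n * @requires utils ajax base\n */",
]);

echo implode(' ', resolve_dependencies($graph, ['dialog'])), "\n";
// prints "base utils ajax dialog"
```

Asking for `dialog` alone is enough: the resolver finds `base`, `utils`, and `ajax` through the headers and orders them correctly.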

Include When Used

The other part of this problem was that all the resources were required at the top of the page instead of when they were actually used. This meant two things:

you needed to include every resource that could ever appear on a page;
if you were adding something new to 2+ pages, you had a strong incentive to put it in base.js.

So every page pulled in a bunch of silly stuff like the CAPTCHA code (because there was one obscure workflow involving unverified users which could theoretically show any user a CAPTCHA on any page) and every random thing anyone had stuck in base.js.

We moved to a system where JS and CSS tags were output after page rendering had run, so require_static() could appear anywhere in the code. The tags still appeared at the top of the page; they were just prepended rather than appended before the page was sent to the browser. (There are some complexities here, but they are beyond the immediate scope.) Then we moved all the require_static() calls next to their use sites, so that, for example, dialog rendering code pulled in dialog-related CSS and JS rather than having every page that might show a dialog require them. We also split base.js into a bunch of smaller files.
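A minimal sketch of the deferred approach, using hypothetical names (`StaticResourceCollector`, `render_dialog()`, `render_page()`, and the `/rsrc/js/dialog.js` URI) rather than Facebook's or Phabricator's actual APIs. The page body is rendered first, while rendering code records the resources it uses; only then are the tags built and placed ahead of the body. Dependency resolution is omitted to keep it short.

```php
<?php

// Hypothetical collector: rendering code records what it needs as it runs.
final class StaticResourceCollector {
  /** @var array<string, bool> Required resource names, in first-use order. */
  private array $required = [];

  public function requireStatic(string $name): void {
    $this->required[$name] = true; // Repeated requests are deduplicated.
  }

  // Called only after the whole page body has been rendered.
  public function renderTags(array $uris): string {
    $tags = [];
    foreach (array_keys($this->required) as $name) {
      if (!isset($uris[$name])) {
        throw new RuntimeException("No URI for resource '{$name}'.");
      }
      $src = htmlspecialchars($uris[$name], ENT_QUOTES);
      $tags[] = "<script src=\"{$src}\"></script>";
    }
    return implode("\n", $tags);
  }
}

// The dialog code requires its own JS right where the dialog is rendered.
function render_dialog(StaticResourceCollector $static): string {
  $static->requireStatic('dialog');
  return '<div class="dialog">Are you sure?</div>';
}

function render_page(StaticResourceCollector $static): string {
  $body = render_dialog($static);                                  // 1. Render.
  $head = $static->renderTags(['dialog' => '/rsrc/js/dialog.js']); // 2. Collect.
  // 3. Prepend the tags so they still appear at the top of the page.
  return "<html><head>\n{$head}\n</head><body>{$body}</body></html>";
}

echo render_page(new StaticResourceCollector()), "\n";
```

A page that never renders a dialog never calls `requireStatic('dialog')`, so it never downloads the dialog code.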

Packaging

The biggest frontend performance killer in most cases is the raw number of HTTP requests, and the biggest hammer for addressing it is to package related JS and CSS into larger files, so you send down all the core JS code in one big file instead of a lot of smaller ones. Once the other groundwork is in place, this is a relatively easy change. We started with manual package definitions and eventually moved to automatic generation based on production data.
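A minimal sketch of manual packaging, assuming a hypothetical hand-written package map and hypothetical URIs. Each required resource is replaced by the package that contains it, and each package is emitted once, so four requests become two. It assumes the input is already in dependency order; a real packager also has to keep that order valid once files are merged.

```php
<?php

// Hypothetical manual package definitions: package URI => member resources.
const PACKAGES = [
  '/rsrc/pkg/core.js' => ['base', 'utils', 'ajax'],
];

// Hypothetical URIs for serving resources individually.
const RESOURCE_URIS = [
  'base' => '/rsrc/js/base.js',
  'utils' => '/rsrc/js/utils.js',
  'ajax' => '/rsrc/js/ajax.js',
  'dialog' => '/rsrc/js/dialog.js',
];

// Replace each resource with its package (if any), emitting each URI once.
// Assumes $resources is already in dependency order.
function package_uris(array $resources): array {
  $packageOf = [];
  foreach (PACKAGES as $packageUri => $members) {
    foreach ($members as $member) {
      $packageOf[$member] = $packageUri;
    }
  }

  $uris = [];
  foreach ($resources as $name) {
    $uris[$packageOf[$name] ?? RESOURCE_URIS[$name]] = true;
  }
  return array_keys($uris);
}

// Four resources, but only two HTTP requests.
echo implode("\n", package_uris(['base', 'utils', 'ajax', 'dialog'])), "\n";
```

Generating `PACKAGES` automatically from production data, as described above, changes only where the map comes from; the lookup stays the same.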

Caches and Serving Content

In the simplest implementation of static resources, you write out a raw JS tag with something like src="/js/base.js". This will break disastrously as you scale, because clients will be running with stale versions of resources. There are a bunch of subtle problems (especially once you have a CDN), but the big one is that if a user is browsing your site while you push/deploy, their client will not make requests for the resources it already has in cache. So even if your servers respond correctly to If-None-Match (ETag) and If-Modified-Since (Last-Modified), the site will appear completely broken to everyone who was using it when you push a breaking change to static resources.

The best way to solve this problem is to version your resources in the URI, so each version of a resource has a unique URI:

rsrc/af04d14/js/base.js

When you push, users will receive pages which reference the new URI so their browsers will retrieve it.
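One way to produce such URIs, sketched under the assumption that the version segment is a short content hash (the `af04d14` above could just as well come from a commit or build number, and the `versioned_uri()` helper is hypothetical):

```php
<?php

// Hypothetical helper: the version segment is a short hash of the file's
// content, so the URI changes exactly when the file's bytes change.
function versioned_uri(string $path, string $content): string {
  $version = substr(sha1($content), 0, 7);
  return "rsrc/{$version}/{$path}";
}

$before = versioned_uri('js/base.js', "function base() {}\n");
$after = versioned_uri('js/base.js', "function base() { return 1; }\n");
echo $before, "\n", $after, "\n"; // Same path, two distinct URIs.
```

Hashing content rather than stamping every URI with a global deploy number means a file that did not change keeps its URI, so clients keep their cached copy across pushes.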

But there's a big problem once you have a bunch of web frontends:

While you're pushing, a user may make a request which is handled by a server running the new version of the code, which delivers a page with a new resource URI. Their browser then makes a request for the new resource, but that request is routed to a server which has not been pushed yet, which delivers an old version of the resource. They now have a poisoned cache: old resource data for a new resource URI.

You can do a lot of clever things to solve this, but the solution we chose at Facebook was to serve resources out of a database instead of off disk. Before a push begins, new resources are written to the database so that every server is able to satisfy both old and new resource requests.

This also made it relatively easy to do processing steps (like stripping comments and whitespace) in one place, and just insert a minified/processed version of CSS and JS into the database.
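A minimal sketch of both ideas, with hypothetical names throughout (`ResourceStore`, `minify_css()`, and an in-memory array standing in for the database table). Resources are processed once and written, keyed by content hash, before the push begins; any frontend, running old or new code, then serves exactly the version named in the URI, or nothing. The minifier is deliberately naive and would, for example, also rewrite whitespace inside quoted strings.

```php
<?php

// Naive CSS minifier (hypothetical): strips comments and collapses whitespace.
// Not safe for every input; it also rewrites text inside quoted strings.
function minify_css(string $css): string {
  $css = preg_replace('#/\*.*?\*/#s', '', $css);       // Remove comments.
  $css = preg_replace('/\s+/', ' ', $css);             // Collapse whitespace.
  $css = preg_replace('/\s*([{};])\s*/', '$1', $css);  // Tighten around { } ;
  return trim($css);
}

// Stand-in for a database table of processed resources (hypothetical).
final class ResourceStore {
  /** @var array<string, string> Maps "version/path" to processed content. */
  private array $rows = [];

  // Run before a push starts, so every server can answer for the new URIs.
  public function publish(string $path, string $source): string {
    $content = substr($path, -4) === '.css' ? minify_css($source) : $source;
    $version = substr(sha1($content), 0, 7);
    $this->rows["{$version}/{$path}"] = $content; // Older versions stay available.
    return "rsrc/{$version}/{$path}";
  }

  // Any frontend serves exactly the requested version, or null (a 404).
  public function serve(string $uri): ?string {
    if (!preg_match('#^rsrc/([0-9a-f]{7})/(.+)$#', $uri, $m)) {
      return null;
    }
    return $this->rows["{$m[1]}/{$m[2]}"] ?? null;
  }
}

$store = new ResourceStore();
$oldUri = $store->publish('css/dialog.css', "/* v1 */\n.dialog {\n  color: red;\n}\n");
$newUri = $store->publish('css/dialog.css', "/* v2 */\n.dialog {\n  color: blue;\n}\n");
echo $store->serve($newUri), "\n"; // prints ".dialog{color: blue;}"
```

Because the hash is taken after processing, a change that only touches comments or whitespace produces the same URI and does not invalidate anyone's cache.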

Reference Implementation: Celerity

Some of the ideas discussed here are implemented in Phabricator's Celerity system, which is essentially a simplified version of the Haste system used by Facebook.

Defined: src/docs/flavor/soon_static_resources.diviner:1