Currently, there's no CLI caching for instance data, so it gets fetched on every script startup. This is particularly costly for the PullLocal daemon, which runs several Phabricator subprocesses, and it creates a significant amount of load on the admin tier (>1M requests/day).
This isn't really causing any concrete problems today, although I've occasionally seen what appear to be load-related failures during deploys, which I think stem from the large additional burst of API queries that deploys cause.
We should put a disk cache in front of the API call so that, e.g., repository subprocesses hit the disk cache most of the time. This cache does not need to be particularly aggressive about invalidation as long as management commands (like RestartWorker) can invalidate it explicitly.
My plan is:
- Put a disk-based, per-instance cache in tmp/cache/cluster/<instance>.json.
- Add it to the PhacilityServices::getClusterCache() stack.
- Provide a way for administrative commands (deploy, restart) to purge it (by nuking the file wholesale).
- Purge the cache when upgrading (entire host) or restarting daemons (instance only).
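As a rough illustration of the plan above, here is a minimal, language-neutral sketch in Python (not the actual Phabricator/PHP code). Everything in it is hypothetical: the `DiskInstanceCache` class, the `fetch` callable standing in for the admin-tier API call, and the `ttl_seconds` fallback expiry, which is an assumption added so stale entries eventually age out even if nothing purges them. It also assumes instance names are safe to use as filenames.

```python
import json
import os
import tempfile
import time


class DiskInstanceCache:
    """Hypothetical per-instance disk cache, one JSON file per instance."""

    def __init__(self, root, fetch, ttl_seconds=300):
        self.root = root          # e.g. "tmp/cache/cluster"
        self.fetch = fetch        # stand-in for the real API call
        self.ttl = ttl_seconds    # assumed fallback expiry, not from the plan

    def _path(self, instance):
        return os.path.join(self.root, instance + ".json")

    def get(self, instance):
        path = self._path(instance)
        try:
            with open(path) as f:
                entry = json.load(f)
            if time.time() - entry["fetched_at"] < self.ttl:
                return entry["data"]
        except (OSError, ValueError, KeyError, TypeError):
            # Missing, unreadable, or malformed file: treat as a cache miss.
            pass
        data = self.fetch(instance)
        self._write(path, {"fetched_at": time.time(), "data": data})
        return data

    def _write(self, path, entry):
        os.makedirs(self.root, exist_ok=True)
        # Write to a temp file and rename, so concurrent subprocesses
        # never read a half-written cache file.
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(entry, f)
        os.replace(tmp, path)

    def purge_instance(self, instance):
        """Restart-style purge: remove one instance's file."""
        try:
            os.remove(self._path(instance))
        except FileNotFoundError:
            pass

    def purge_all(self):
        """Upgrade-style purge: remove every file for the host."""
        try:
            names = os.listdir(self.root)
        except FileNotFoundError:
            return
        for name in names:
            try:
                os.remove(os.path.join(self.root, name))
            except FileNotFoundError:
                pass
```

In this sketch, a restart command would call `purge_instance(name)` and a host upgrade would call `purge_all()`. Repository subprocesses only call `get(name)`, so they hit the API at most once per purge or expiry window.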
This should drop API load by a large factor (99%?) with no visible effect on behavior.