Page MenuHomePhabricator

`bin/ssh-auth` cache includes paths which may be (or may become) stale
Closed, ResolvedPublic


Quoting from Z1336: General Chat:

ran into an ugly issue today for which I'm still trying to figure out what to make out of it…
in our setup, Phabricator (and subcomponents) updates happen atomically, meaning: we don't update an existing phabricator installation but whenever any param (new commit, configuration changes, …) changes, a new unique build is generated and once the build is complete, the finished build is symlinked as the active one… (a bit similar to what Nix(OS) is doing on a global level)

this lead to a weird result in bin/ssh-auth which generated the response for the bin/ssh-exec command pointing to old (already removed) builds, e.g.:

# echo {} | ssh git@ conduit
sh: 1: /var/www/ not found

it turned out to be a caching issue in bin/ssh-auth which apparently doesn't take into account, whether the realpath of itself ($root) has changed and reuses a now invalid cache result for $authfile
I worked around this issue for now by setting phabricator.cache-namespace to a per-build unique value (e.g. phabricator-0f42c560)


If you want to file a task for that, we can make the cachekey depend on the Phabricator directory root sooner or later. Using cache-namespace will also work.

Version Information

Event Timeline

eliaspro created this task.Mar 15 2017, 2:08 AM
epriestley triaged this task as Wishlist priority.Mar 20 2017, 2:31 PM

This is vaguely involved.

My initial thought was just to add the binary path to the end of the cache key, but the key is also used in PhabricatorAuthSSHKeyEditor to purge the cache, and the cache purge should not really be path-dependent. For example, if you deploy Phabricator from two different paths on two different hosts, updating an SSH key on one should invalidate the cache on both. A simple adjustment to the cache key would mean that only one host saw its cache invalidated. This deployment scheme is exceptionally ridiculous, but within the realm of possibility, even with the scheme described in the task description (for example, half of a cluster may be upgraded to a more recent build, while the other half is still serving an old build).

An alternate approach would be to cache a template document and replace the path outside of the cache layer, but the document is currently built as a string with several layers of escaping.

So we could either:

  • purge more aggressively (e.g., all keys with some prefix); or
  • cache structured data and build the document from it outside of the cache layer.

Either of these changes require more than a trivial set of edits.

The circumstances required to trigger this issue (moving the directory where Phabricator is stored while it's running) are also rare. The build/deployment strategy described in the task description is also somewhat pointless, because database migrations can not be run backward so it is unsafe to revert to older builds in the general case. I think our behavior is still incorrect and worth fixing eventually.

See PHI801. This is a similar case to the one above, but I think the underlying use case was a bit more compelling: the install was transitioning from a set of older web nodes to newer web nodes as a part of a general infrastructure upgrade, and making various fixes to the deployment and support software as part of that process. One aspect of these adjacent fixes was changing the path from /something/something/phabricator to /something-else/something-else/phabricator (I think as a consequence of adjusting deploy strategy), so newer nodes would have a different absolute path than older nodes. Both sets of nodes were in production simultaneously.

The net effect is the same as above, but this use case ("rare upgrades to infrastructure") feels like it's on somewhat firmer ground to me.

To fix it, I think I lean strongly toward caching structured data over trying to invalidate more broadly.

epriestley raised the priority of this task from Wishlist to Normal.Aug 6 2018, 10:44 PM
epriestley renamed this task from `bin/ssh-auth` uses stale caches for its response to `bin/ssh-auth` cache includes paths which may be (or may become) stale.