
Phacility deploy workflow should not conflate versions-for-deployment with "latest stable release"
Open, Low, Public

Description

See PHI1363. In the Phacility cluster, bin/remote deploy and related workflows currently deploy the stable branch. This is normally reasonable, but if an incident that requires a redeploy occurs between a release cut and the next regular deploy, redeploying affected nodes will upgrade them as a side effect.

This window is usually small (Friday evening to Saturday morning), and issues which require node redeployment are rare, but we're asking for trouble by conflating "latest stable release version" with "desired deploy version". (In this case, the release was cut at an unusual time and we suffered what seems to be an unusual hardware failure.)

Although most versions of Phabricator aren't wildly incompatible across release boundaries, any change which adds a database migration will cause the new version to complain that the migration hasn't been run, so the user-facing behavior is "all HTTP service calls to the node fail".

(See also T13320, which added a --hold-versions flag to bin/host upgrade. This is a step in the right direction, and it's good that this flag reduces conflation of "actual deployed version" with other versions, but it does not help if an entire node is dead: even if bin/remote deploy accepted --hold-versions, there's no version to hold if we're doing a clean deploy on a fresh host.)
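To make the distinction concrete, here is a minimal sketch, assuming a hypothetical cluster-level record that stores the desired deploy version separately from the stable branch tip. The class `DeployState`, its methods, the host name `db042`, and the version labels `r1`/`r2` are all illustrative inventions, not part of bin/remote, bin/host, or any real Phacility tooling. Because the deploy version lives at the cluster level rather than on a host, a clean deploy onto a fresh host still has a version to use.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DeployState:
    """Hypothetical version record; NOT Phacility's actual data model."""

    latest_stable: str                    # tip of the stable branch; moves on every release cut
    deploy_version: Optional[str] = None  # version operators have explicitly chosen to run
    host_versions: Dict[str, str] = field(default_factory=dict)

    def cut_release(self, version: str) -> None:
        # A release cut only advances the stable tip; it does not change
        # what the cluster should be running.
        self.latest_stable = version

    def promote(self) -> None:
        # A regular, deliberate deploy adopts the current stable tip as the
        # desired deploy version.
        self.deploy_version = self.latest_stable

    def version_for_redeploy(self, host: str) -> str:
        # Incident redeploys (including clean deploys on fresh hosts) use the
        # recorded deploy version, never the stable tip, so they cannot
        # upgrade a node as a side effect.
        if self.deploy_version is None:
            raise RuntimeError("no deploy version recorded; promote a release first")
        self.host_versions[host] = self.deploy_version
        return self.deploy_version


if __name__ == "__main__":
    state = DeployState(latest_stable="r1")
    state.promote()                # regular deploy: cluster runs r1
    state.cut_release("r2")        # release cut before the next regular deploy
    print(state.version_for_redeploy("db042"))  # fresh host after a hardware failure
```

Running this prints `r1`: the replacement host comes up on the same version as the rest of the cluster, and `r2` is only deployed after an explicit `promote()`.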