Phacility deploy workflow should not conflate versions-for-deployment with "latest stable release"
Open, Low, Public

Assigned To

Authored By

	epriestley
	Aug 1 2019, 5:48 PM

Description

See PHI1363. In the Phacility cluster, bin/remote deploy and related workflows currently deploy the stable branch. This is normally reasonable, but if an incident that requires a redeploy occurs between a release cut and the next regular deploy, redeploying affected nodes will upgrade them as a side effect.

This window is usually small (Friday evening to Saturday morning) and issues which require node redeployment are rare, but we're asking for trouble by conflating "latest stable release version" with "desired deploy version". (In this case, the release was cut at an unusual time and we suffered what seems to be an unusual hardware failure.)

Although most versions of Phabricator aren't wildly incompatible across release boundaries, any change which adds a database migration will cause the new version to complain that the migration hasn't been run, so the user-facing behavior is "all HTTP service calls to the node fail".
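To make the failure mode concrete, here is a simplified, hypothetical model of why a node upgraded past a migration stops serving. It is not Phabricator's actual setup check. The patch names, `pending_migrations()` and `handle_request()` are illustrative only. The idea is that the running code knows a set of migrations, the database records which ones were applied, and any difference makes the node refuse requests.

```python
from __future__ import annotations


def pending_migrations(known: set[str], applied: set[str]) -> set[str]:
    """Migrations the running code expects but the database has not applied."""
    return known - applied


def handle_request(known: set[str], applied: set[str]) -> tuple[int, str]:
    """Hypothetical request guard: refuse service while migrations are pending."""
    pending = pending_migrations(known, applied)
    if pending:
        return 500, "unapplied migrations: " + ", ".join(sorted(pending))
    return 200, "ok"


# Hypothetical patch names: the old release shipped patch-a, and the newly
# cut release adds patch-b, which the node's database has not applied.
old_code = {"patch-a"}
new_code = {"patch-a", "patch-b"}
database = {"patch-a"}

print(handle_request(old_code, database))  # (200, 'ok')
print(handle_request(new_code, database))  # (500, 'unapplied migrations: patch-b')
```

A redeploy that silently picks up the new release moves the node from the first case to the second, even though nothing about its data changed.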

(See also T13320, which added a --hold-versions flag to bin/host upgrade. This is a step in the right direction, since the flag reduces conflation of "actual deployed version" with other versions. However, it does not help when an entire node is dead: even if bin/remote deploy accepted --hold-versions, there is no version to hold when we're doing a clean deploy on a fresh host.)
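One possible way to separate the two concepts is to record the desired deploy version somewhere other than the host itself, independently of the stable branch. Cutting a release and deploying it then become distinct steps, and a clean deploy onto a fresh host can still find a version to use. The sketch below models that idea under stated assumptions. `ClusterVersions`, `cut_release()`, `regular_deploy()` and `version_for_redeploy()` are hypothetical names, not existing Phacility tooling, and the sketch is not the fix this task settled on.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ClusterVersions:
    """Hypothetical record kept off-host, separate from the stable branch."""
    stable_head: str               # newest release on the stable branch
    deploy_version: Optional[str]  # version explicitly chosen for deploys


def cut_release(state: ClusterVersions, new_head: str) -> ClusterVersions:
    # Cutting a release moves stable but does not change what gets deployed.
    return replace(state, stable_head=new_head)


def regular_deploy(state: ClusterVersions) -> ClusterVersions:
    # Only a scheduled deploy promotes the stable head to the deploy version.
    return replace(state, deploy_version=state.stable_head)


def version_for_redeploy(state: ClusterVersions) -> str:
    # A clean redeploy of a fresh host uses the recorded deploy version, so
    # it does not upgrade the node as a side effect of a recent release cut.
    if state.deploy_version is None:
        raise RuntimeError("no deploy version recorded")
    return state.deploy_version


state = ClusterVersions(stable_head="release-1", deploy_version="release-1")
state = cut_release(state, "release-2")  # Friday evening release cut
print(version_for_redeploy(state))       # release-1: a node dies before the deploy
state = regular_deploy(state)            # Saturday morning deploy
print(version_for_redeploy(state))       # release-2
```

Because the record lives outside any individual host, it survives the "entire node is dead" case that a per-host flag such as --hold-versions cannot cover.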

Related Objects

Mentioned In: T13466: AWS instance termination may fail/hang indefinitely
Mentioned Here: T13320: Unannounced/unlogged AWS reboots are apparently routine procedure, not cosmic rays / ghosts

Event Timeline

epriestley triaged this task as Low priority. Aug 1 2019, 5:48 PM

epriestley created this task.

Herald added a subscriber: amckinley. Aug 1 2019, 5:48 PM

epriestley mentioned this in T13466: AWS instance termination may fail/hang indefinitely. Nov 25 2019, 3:57 PM
