Paths

Table of Contentst

Differential D14116

storage status`
AbandonedPublic
Actions

Authored by joshuaspence on Sep 16 2015, 6:10 AM.

Details

Reviewers

epriestley

Group Reviewers

Blessed Reviewers

Summary

Return a non-zero status from ./bin/storage status if there are unapplied patches.

Test Plan

Added an empty file to the resources/sql/autopatches/ directory and ran ./bin/storage status. Saw a non-zero exit status.

Diff Detail

Repository

rP Phabricator

Branch

master

Lint

Lint Passed

Unit

Tests Passed

Build Status

Buildable 7990
Build 9010: [Placeholder Plan] Wait for 30 Seconds
Build 9009: arc lint + arc unit

Event Timeline

joshuaspence updated this revision to Diff 34109.Sep 16 2015, 6:10 AM

joshuaspence retitled this revision from to Return a non-zero status from `./bin/storage status`.

joshuaspence updated this object.

joshuaspence edited the test plan for this revision. (Show Details)

joshuaspence added a reviewer: epriestley.

Herald added a reviewer: Blessed Reviewers. · View Herald TranscriptSep 16 2015, 6:10 AM

Herald added a subscriber: Korvin. · View Herald Transcript

I think a better solution to whatever problem you are facing is likely a --json flag, not an exit status.

Callers generally can not distinguish between system failure, shell failure, script failure, interpreter failure, random segfault, argument errors, etc., and specific exit conditions by examining exit statuses. It is much more important to me that callers can be implemented correctly than that callers can be implemented easily. If you want an easy implementation, pipe the output to grep and look for "Not Applied". This patch makes this case particularly bad because it may cause the script to exit with literally any status code.

I believe exit status is generally represented as an unsigned byte internally, so (a) "-1" is not really a meaningful status and (b) incrementing a counter will run off the end of the range. For example, the shell does not appear to recognize exit(777) on my system:

$ php -r 'exit(777);'
$ echo $?
9

My system has 914 storage patches, so we exceeded the size of a byte long ago.

Because of this behavior, callers can not distinguish between 256, 512, ... unapplied patches and 0 unapplied patches on some (probably most?) systems:

$ php -r 'exit(512);'
$ echo $?
0

This is also why "-1" is meaningless:

$ php -r 'exit(-1);'
$ echo $?
255

This can't be distinguished from 255 unapplied patches, at least on my system.

I'll accept a --json flag if you have a good reason to test this instead of just running bin/storage upgrade unconditionally (this is what we do in the Phacility cluster).

This revision now requires changes to proceed.Sep 16 2015, 11:51 AM

The use case here is for having serverspec tests that run on our Phabricator hosts (independently of our puppet manifests which do run ./bin/storage upgrade. The problem with using JSON is that it really complicates the actual test case, which would end up looking like this:

context command('./bin/storage status --json') do
  # Not real code, I fail at Ruby.
  JSON(command.stdout)['success'] should eq true
end

Instead of:

context command('./bin/storage status') do
  its(:exit_status) { should eq 0 }
end

I probably got carried away with the exit status and would probably be fine with returning 2 (or some other constant integer) if there are unapplied patches.

I probably can't be bothered writing a --json mode so the alternative for me would be this:

context command('./bin/storage status') do
  its(:stdout) { should_not match(/Not Applied/) }
end

I think that's fairly reasonable as a correctness vs ease of use tradeoff (although test the exit code for 0, too, if that's not automatic)?

Generally, I never want to accept patches which make correctness impossible if someone does actually want to do things right. In my view, this generally means that exit codes can only signal total/catastrophic failure and can't be meaningfully distinguishable (broadly, the system may cause commands to exit with codes we consider "meaningful" without reaching the command actually doing the "meaningful" thing).

For example, you can get a 127 exit code if PHP is not installed on the system.
You can get a 126 exit code if you don't have permission to execute PHP.
You can get a 1 exit code if you try to run a script you can't read (php -f unreadable).
You can get a 143 exit code by SIGTERM'ing the process.
You can get a 137 exit code by SIGKILL'ing the process.

As far as I know, there is no way to enumerate all the possible codes or guarantee that they are stable across systems and shells.

Obviously, Linux as a whole works fine overall with exit codes everywhere and my viewpoint here is philosophically unusual since pretty much everyone is fine with the simplifying assumption that exit codes are fully controlled by the command you ran, but this just feels like a big janky mess that I want no part of to me.

In my view, T6996 is a particularly compelling example of exit codes being unsound for handling application-level errors: it is very difficult to run x | y correctly (i.e., checking errors), and impossible to do so portably (you have to use a bunch of bash nonsense) as far as I was able to determine. Exit codes have a place, I just view it as a place very low in the stack which shouldn't be involved in application-level error behavior.

epriestley mentioned this in D14173: Explain upstream attitudes toward CLI exit codes.Sep 27 2015, 5:32 PM

epriestley mentioned this in rPc99508cfe26f: Explain upstream attitudes toward CLI exit codes.Sep 27 2015, 8:10 PM

joshuaspence abandoned this revision.Oct 31 2015, 11:19 AM