Return a non-zero status from ./bin/storage status if there are unapplied patches.
Details
- Reviewers
epriestley - Group Reviewers
Blessed Reviewers
Added an empty file to the resources/sql/autopatches/ directory and ran ./bin/storage status. Saw a non-zero exit status.
Diff Detail
- Repository
- rP Phabricator
- Branch
- master
- Lint
Lint Passed - Unit
Tests Passed - Build Status
Buildable 7990 Build 9010: [Placeholder Plan] Wait for 30 Seconds Build 9009: arc lint + arc unit
Event Timeline
I think a better solution to whatever problem you are facing is likely a --json flag, not an exit status.
Callers generally can not distinguish between system failure, shell failure, script failure, interpreter failure, random segfault, argument errors, etc., and specific exit conditions by examining exit statuses. It is much more important to me that callers can be implemented correctly than that callers can be implemented easily. If you want an easy implementation, pipe the output to grep and look for "Not Applied". This patch makes this case particularly bad because it may cause the script to exit with literally any status code.
I believe exit status is generally represented as an unsigned byte internally, so (a) "-1" is not really a meaningful status and (b) incrementing a counter will run off the end of the range. For example, the shell does not appear to recognize exit(777) on my system:
$ php -r 'exit(777);' $ echo $? 9
My system has 914 storage patches, so we exceeded the size of a byte long ago.
Because of this behavior, callers can not distinguish between 256, 512, ... unapplied patches and 0 unapplied patches on some (probably most?) systems:
$ php -r 'exit(512);' $ echo $? 0
This is also why "-1" is meaningless:
$ php -r 'exit(-1);' $ echo $? 255
This can't be distinguished from 255 unapplied patches, at least on my system.
I'll accept a --json flag if you have a good reason to test this instead of just running bin/storage upgrade unconditionally (this is what we do in the Phacility cluster).
The use case here is for having serverspec tests that run on our Phabricator hosts (independently of our puppet manifests which do run ./bin/storage upgrade. The problem with using JSON is that it really complicates the actual test case, which would end up looking like this:
context command('./bin/storage status --json') do # Not real code, I fail at Ruby. JSON(command.stdout)['success'] should eq true end
Instead of:
context command('./bin/storage status') do its(:exit_status) { should eq 0 } end
I probably got carried away with the exit status and would probably be fine with returning 2 (or some other constant integer) if there are unapplied patches.
I probably can't be bothered writing a --json mode so the alternative for me would be this:
context command('./bin/storage status') do its(:stdout) { should_not match(/Not Applied/) } end
I think that's fairly reasonable as a correctness vs ease of use tradeoff (although test the exit code for 0, too, if that's not automatic)?
Generally, I never want to accept patches which make correctness impossible if someone does actually want to do things right. In my view, this generally means that exit codes can only signal total/catastrophic failure and can't be meaningfully distinguishable (broadly, the system may cause commands to exit with codes we consider "meaningful" without reaching the command actually doing the "meaningful" thing).
For example, you can get a 127 exit code if PHP is not installed on the system.
You can get a 126 exit code if you don't have permission to execute PHP.
You can get a 1 exit code if you try to run a script you can't read (php -f unreadable).
You can get a 143 exit code by SIGTERM'ing the process.
You can get a 137 exit code by SIGKILL'ing the process.
As far as I know, there is no way to enumerate all the possible codes or guarantee that they are stable across systems and shells.
Obviously, Linux as a whole works fine overall with exit codes everywhere and my viewpoint here is philosophically unusual since pretty much everyone is fine with the simplifying assumption that exit codes are fully controlled by the command you ran, but this just feels like a big janky mess that I want no part of to me.
In my view, T6996 is a particularly compelling example of exit codes being unsound for handling application-level errors: it is very difficult to run x | y correctly (i.e., checking errors), and impossible to do so portably (you have to use a bunch of bash nonsense) as far as I was able to determine. Exit codes have a place, I just view it as a place very low in the stack which shouldn't be involved in application-level error behavior.