Details
- Reviewers
hach-que btrahan - Group Reviewers
Blessed Reviewers - Maniphest Tasks
- T6240: Implement Conduit request signing for host-to-host calls
T4209: Multiserver / High-Availability Configuration - Commits
- Restricted Diffusion Commit
rP657b36dd0685: Allow Phabricator to accept Conduit requests signed with an SSH key
Diff Detail
- Repository
- rP Phabricator
- Branch
- conduitsigned
- Lint
Lint Passed - Unit
Tests Passed - Build Status
Buildable 2409 Build 2413: [Placeholder Plan] Wait for 30 Seconds
Event Timeline
This broadly looks reasonable, some discussion inline.
src/applications/almanac/util/AlmanacConduitUtil.php | ||
---|---|---|
22–24 | Oh, yuck. We need to cache this. | |
63–66 | We need to sign other stuff too, not just the parameters. For example, the method is not part of the signature, so you can replay a call to user.query with no parameters as a call to topsecretstuff.query. The method, host, and all the metadata should be signed. Ideally, the algorithm for sorting and representing the parameters should also be specified, so some code other than this code could produce a signature. This might be too much of a pain to deal with right now, but we'll probably have to do it sooner or later if we don't, and changing the signature scheme will break existing callers. For example, AWS has a bunch of sorting/encoding rules (http://bit.ly/wU0JFh). I don't think we can use the same rules, since AWS doesn't transmit objects/arrays. Offhand, one issue is that array() will encode as {}, when the encoding in another language would likely be []. It might also be worthwhile to include a timestamp and a nonce to prevent replays completely, although we could safely add those later. Another issue with this signature scheme is that JSON isn't binary safe, so we won't always be able to sign the JSON encoding of a request (T5955). I guess we have to specify a signature per encoding type (yuck). Or we could just provide some unambiguous binary-safe way to serialize lists and objects and use it for everything. Maybe that's cleanest? | |
src/applications/conduit/controller/PhabricatorConduitAPIController.php | ||
219–221 | We should let these requests act as a user. For example, when you view a repository in Diffusion, you should be authenticated as yourself on the other side of the Conduit call, so you can't see anything you couldn't normally see. |
src/applications/almanac/util/AlmanacConduitUtil.php | ||
---|---|---|
22–24 | Yeah, in hindsight we probably need to have an additional field on the authorized host table; the current one for OpenSSH public key (in case we want servers to SSH into each other or something of that nature) and the OpenSSL public key. | |
63–66 | I'll address the signing not including method name, host and metadata. As for the other stuff: I guess this implementation is strongly orientated around letting Phabricator instances internally call each other, and not so much external entities using public / private keys to sign requests and make them. In this implementation, there's no way to register an external device (without running bin/almanac register and the same DB connection) to authorize it to make signed requests like this. So basically you can really only register Phabricator for these kinds of signed requests at the moment. This is also the reason that the signing code isn't in D10402 (and the reason it pulls in those paths to do the signing). Now we can change the signing / verification mechanism later on to support external (non-Phabricator servers making calls). Since all of the tiers need to be upgraded in sync because of DB changes (i.e. you can't have an older web tier talking to a newer storage tier), I think it's pretty safe to change the mechanism to offer a more consistent mechanism for external code later. That said, if there are places where we need to make Conduit calls that contain binary data, between servers in order for HA to work, then we probably need to address this now instead of later. | |
src/applications/conduit/controller/PhabricatorConduitAPIController.php | ||
219–221 | Should web requests not just use the (current) way of making Conduit requests (by getting a session for the user and using that)? |
I'd guess the signature algorithm needs to look something like:
- Build up some object with the parameters and also all the metadata.
- Encode the object as a value, with these rules:
- If the value is an object/dictionary, sort the keys. Write "O", then the number of keys, then a colon, then encode each key and each value in order.
- If the value is an array, write "A", then the number of elements, then a colon, then encode each value in order.
- If the value is a string, write "S", then the number of bytes, then a colon, then the bytes of the string.
- If the value is an integer, write "I", then the number of digits in the integer, then a colon, then the digits of the integer.
- If the value is null, write "N", then a colon.
- If the value is false, write B0:.
- If the value is true, write B1:.
- Sign the resulting string.
I think this should be pretty easy to implement in another language, and easy to understand/debug (no ambiguity around which characters get urlencoded, for example, and theoretically fairly easy to implement in-place in memory if you're into that kind of thing), and binary safe, while giving every request a unique encoding.
(Or we could look if there's a standard way that everyone else does it, but the need to be binary-safe and support signatures across encodings seems likely to make things complicated and prevent reuse of any standard signing technique.)
We might be able to use protobuf. That has implementations in most languages and I think it should support all the types we need.
(Then we just need to document the ordering semantics since fields are by numeric ID)
We need at least one binary encoding as an alternative to JSON anyway, although the id-encoding of protobuf might be less convenient than BSON or messagepack.
I think messagepack is probably very close, although some of the encodings aren't human-readable (minor) and array ordering isn't part of the spec (easy to compensate for by sorting before or during encoding).
I guess messagepack technically has ambiguous encodings, like the integer 1 can be encoded as either 0x01 or 0xcc 0x01, but presumably no encoders do this and there's an unambiguous minimal encoding.
Rework with signing / verification in libphutil. Also use public keys for identifiation
- Reduce scope: only accept SSH signed requests, don't try to support host-to-host communication yet.
- Use the more modern key storage stuff.
- Work slightly harder to raise usable errors.
src/applications/auth/sshkey/PhabricatorAuthSSHPublicKey.php | ||
---|---|---|
116–118 ↗ | (On Diff #26033) | The big messy issue here is that this doesn't work on OSX because ssh-keygen fails. We only need to do this on the host, so it's not a huge issue (you can still sign requests on an OSX client, and it's unlikely that anyone will want to deploy OSX host clusters too soon, so maybe this will get fixed by then). It does make testing on an OSX host a big pain, though. My plan to partly reduce that is:
|
Actually, if you have the private key, you should be able to generate the correctly formatted public key with:
openssl rsa -in <private-keyfile> -pubout
...without needing to use some other intermediate host to run a good ssh-keygen. But that requires the private key, so we can't use it on the host (which only has knowledge of the public key).
Is there no newer binary for ssh-keygen on OSX / version through homebrew or something like that? Presumably it's possible to grab the OpenSSH code and build a more modern version that supports pcks8 (and then no caching is needed for a single platform unlikely to be used in production).
If you wanted to host a cluster on OSX you'd have to do that, yeah. For development, it's probably easier to just prime the cache.
I don't think it's a version problem, I think it's an issue with Apple's build. Mavericks (where I'm hitting the issue) has 6.2p2, which is recent.
I also found a couple of references to it possibly working in 6.1 and being broken in 6.2, but this server (secure.phabricator.com) also has 6.2p2 and everything works fine, which leads me toward believing this is an Apple thing.
(We want the cache no matter what, I don't want to shell out to ssh-keygen on every Conduit request, so the only special code would be priming it artificially.)
Oh right, because the pcks8 version isn't being stored when the Almanac device is registered (like in the previous diffs), so it has to be calculated + cached at runtime anyway.
Yeah. That cache could technically go in the SSH key table (and maybe I'll just put it there) but we don't have the cache anywhere else now since I unified SSH key storage across users and devices.
Sorry I missed this one for a bit.
src/applications/auth/sshkey/PhabricatorAuthSSHPublicKey.php | ||
---|---|---|
116–118 ↗ | (On Diff #26033) | I know 3ish Mac OSX developers... :D I think it would be neat if we had some command like bin/almanac gen-and-prime-ssh-key <public-key> this command should fatal horribly on mac osx, letting you know it won't work, and then otherwise it should kind of collapse steps 2 and 3 from above. |
src/applications/auth/sshkey/PhabricatorAuthSSHPublicKey.php | ||
---|---|---|
116–118 ↗ | (On Diff #26033) | well, I meant fail horribly if the server is mac osx; clients can be whatever. |
I know 3ish Mac OSX developers... :D I think it would be neat if we had some command like
bin/almanac gen-and-prime-ssh-key <public-key>
this command should fatal horribly on mac osx, letting you know it won't work, and then otherwise it should kind of collapse steps 2 and 3 from above.
Just to respond to this specifically, since I'm about to shoot a diff out for it:
- If your ssh-keygen works, we're just doing the whole thing inline, so you don't have to do any work -- everything works automatically, and you don't have to run any scripts. This is everywhere except OSX.
- If you're on OSX, ssh-keygen is busted, so writing a CLI script that says "ssh-keygen is busted" wouldn't be very helpful. The automatic thing is going to fail with a message "ssh-keygen is busted, go run xxxxx" instead, and then you'll have to go run that. I think there's no real way around this.