Page MenuHomePhabricator

Allow Phabricator to accept Conduit requests signed with an SSH key
ClosedPublic

Authored by epriestley on Sep 2 2014, 10:46 AM.

Details

Summary

Ref T4209. Depends on D10402.

This updates Conduit to support authenticating calls from other servers by signing the request parameters with the sending server's private key and verifying it with the public key stored in the database.

Test Plan
  • Made like 500 bad calls using the stuff in D10402.
  • Made a few valid calls using the stuff in D10402.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

hach-que retitled this revision from to Implement signed HTTP Conduit calls for making requests between servers.
hach-que updated this object.
hach-que edited the test plan for this revision. (Show Details)
hach-que added a reviewer: epriestley.
hach-que edited edge metadata.

This broadly looks reasonable, some discussion inline.

src/applications/almanac/util/AlmanacConduitUtil.php
22–24 ↗(On Diff #25025)

Oh, yuck. We need to cache this.

63–66 ↗(On Diff #25025)

We need to sign other stuff too, not just the parameters. For example, the method is not part of the signature, so you can replay a call to user.query with no parameters as a call to topsecretstuff.query. The method, host, and all the metadata should be signed.

Ideally, the algorithm for sorting and representing the parameters should also be specified, so some code other than this code could produce a signature. This might be too much of a pain to deal with right now, but we'll probably have to do it sooner or later if we don't, and changing the signature scheme will break existing callers.

For example, AWS has a bunch of sorting/encoding rules (http://bit.ly/wU0JFh). I don't think we can use the same rules, since AWS doesn't transmit objects/arrays.

Offhand, one issue is that array() will encode as {}, when the encoding in another language would likely be [].

It might also be worthwhile to include a timestamp and a nonce to prevent replays completely, although we could safely add those later.

Another issue with this signature scheme is that JSON isn't binary safe, so we won't always be able to sign the JSON encoding of a request (T5955). I guess we have to specify a signature per encoding type (yuck).

Or we could just provide some unambiguous binary-safe way to serialize lists and objects and use it for everything. Maybe that's cleanest?

src/applications/conduit/controller/PhabricatorConduitAPIController.php
219–221

We should let these requests act as a user. For example, when you view a repository in Diffusion, you should be authenticated as yourself on the other side of the Conduit call, so you can't see anything you couldn't normally see.

src/applications/almanac/util/AlmanacConduitUtil.php
22–24 ↗(On Diff #25025)

Yeah, in hindsight we probably need to have an additional field on the authorized host table; the current one for OpenSSH public key (in case we want servers to SSH into each other or something of that nature) and the OpenSSL public key.

63–66 ↗(On Diff #25025)

I'll address the signing not including method name, host and metadata. As for the other stuff:

I guess this implementation is strongly orientated around letting Phabricator instances internally call each other, and not so much external entities using public / private keys to sign requests and make them. In this implementation, there's no way to register an external device (without running bin/almanac register and the same DB connection) to authorize it to make signed requests like this. So basically you can really only register Phabricator for these kinds of signed requests at the moment. This is also the reason that the signing code isn't in D10402 (and the reason it pulls in those paths to do the signing).

Now we can change the signing / verification mechanism later on to support external (non-Phabricator servers making calls). Since all of the tiers need to be upgraded in sync because of DB changes (i.e. you can't have an older web tier talking to a newer storage tier), I think it's pretty safe to change the mechanism to offer a more consistent mechanism for external code later. That said, if there are places where we need to make Conduit calls that contain binary data, between servers in order for HA to work, then we probably need to address this now instead of later.

src/applications/conduit/controller/PhabricatorConduitAPIController.php
219–221

Should web requests not just use the (current) way of making Conduit requests (by getting a session for the user and using that)?

I'd guess the signature algorithm needs to look something like:

  • Build up some object with the parameters and also all the metadata.
  • Encode the object as a value, with these rules:
    • If the value is an object/dictionary, sort the keys. Write "O", then the number of keys, then a colon, then encode each key and each value in order.
    • If the value is an array, write "A", then the number of elements, then a colon, then encode each value in order.
    • If the value is a string, write "S", then the number of bytes, then a colon, then the bytes of the string.
    • If the value is an integer, write "I", then the number of digits in the integer, then a colon, then the digits of the integer.
    • If the value is null, write "N", then a colon.
    • If the value is false, write B0:.
    • If the value is true, write B1:.
  • Sign the resulting string.

I think this should be pretty easy to implement in another language, and easy to understand/debug (no ambiguity around which characters get urlencoded, for example, and theoretically fairly easy to implement in-place in memory if you're into that kind of thing), and binary safe, while giving every request a unique encoding.

src/applications/conduit/controller/PhabricatorConduitAPIController.php
219–221

Oh, right.

I do want to give users more explicit control over Conduit tokens/sessions in T5955, but that's fine for now, and we can change it in T5955 if necessary.

(Or we could look if there's a standard way that everyone else does it, but the need to be binary-safe and support signatures across encodings seems likely to make things complicated and prevent reuse of any standard signing technique.)

bencode is similar and unambiguous, but doesn't have null or bool types.

We might be able to use protobuf. That has implementations in most languages and I think it should support all the types we need.

(Then we just need to document the ordering semantics since fields are by numeric ID)

We need at least one binary encoding as an alternative to JSON anyway, although the id-encoding of protobuf might be less convenient than BSON or messagepack.

I think messagepack is probably very close, although some of the encodings aren't human-readable (minor) and array ordering isn't part of the spec (easy to compensate for by sorting before or during encoding).

I guess messagepack technically has ambiguous encodings, like the integer 1 can be encoded as either 0x01 or 0xcc 0x01, but presumably no encoders do this and there's an unambiguous minimal encoding.

hach-que edited edge metadata.

Rework with signing / verification in libphutil. Also use public keys for identifiation

Not sure if this already matches the arc diff?

epriestley edited reviewers, added: hach-que; removed: epriestley.

Yanking this to make it play with D10402.

epriestley retitled this revision from Implement signed HTTP Conduit calls for making requests between servers to Allow Phabricator to accept Conduit requests signed with an SSH key.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
  • Reduce scope: only accept SSH signed requests, don't try to support host-to-host communication yet.
  • Use the more modern key storage stuff.
  • Work slightly harder to raise usable errors.
src/applications/auth/sshkey/PhabricatorAuthSSHPublicKey.php
116–118

The big messy issue here is that this doesn't work on OSX because ssh-keygen fails. We only need to do this on the host, so it's not a huge issue (you can still sign requests on an OSX client, and it's unlikely that anyone will want to deploy OSX host clusters too soon, so maybe this will get fixed by then). It does make testing on an OSX host a big pain, though.

My plan to partly reduce that is:

  • Put a cache in front of this.
  • Add a bin/almanac prime-ssh-key-cache <ssh-key> <pcks8-key>.
  • If you're on OSX, you run ssh-keygen -e -m pcks8 -f ... on your public key on some other server and then use the result to prime the cache and you're good for ~60 days or whatever, until the cache dies.
  • We can point at this in some kind of error message here.
  • This command also doesn't work on older versions of SSH, so it's not like 100% crazy-insane, even though it's highly ridiculous.

Actually, if you have the private key, you should be able to generate the correctly formatted public key with:

openssl rsa -in <private-keyfile> -pubout

...without needing to use some other intermediate host to run a good ssh-keygen. But that requires the private key, so we can't use it on the host (which only has knowledge of the public key).

Is there no newer binary for ssh-keygen on OSX / version through homebrew or something like that? Presumably it's possible to grab the OpenSSH code and build a more modern version that supports pcks8 (and then no caching is needed for a single platform unlikely to be used in production).

If you wanted to host a cluster on OSX you'd have to do that, yeah. For development, it's probably easier to just prime the cache.

I don't think it's a version problem, I think it's an issue with Apple's build. Mavericks (where I'm hitting the issue) has 6.2p2, which is recent.

I also found a couple of references to it possibly working in 6.1 and being broken in 6.2, but this server (secure.phabricator.com) also has 6.2p2 and everything works fine, which leads me toward believing this is an Apple thing.

(We want the cache no matter what, I don't want to shell out to ssh-keygen on every Conduit request, so the only special code would be priming it artificially.)

Oh right, because the pcks8 version isn't being stored when the Almanac device is registered (like in the previous diffs), so it has to be calculated + cached at runtime anyway.

Yeah. That cache could technically go in the SSH key table (and maybe I'll just put it there) but we don't have the cache anywhere else now since I unified SSH key storage across users and devices.

Oops, forgot to get you in on this action.

btrahan edited edge metadata.

Sorry I missed this one for a bit.

src/applications/auth/sshkey/PhabricatorAuthSSHPublicKey.php
116–118

I know 3ish Mac OSX developers... :D I think it would be neat if we had some command like

bin/almanac gen-and-prime-ssh-key <public-key>

this command should fatal horribly on mac osx, letting you know it won't work, and then otherwise it should kind of collapse steps 2 and 3 from above.

This revision is now accepted and ready to land.Nov 17 2014, 6:03 PM
src/applications/auth/sshkey/PhabricatorAuthSSHPublicKey.php
116–118

well, I meant fail horribly if the server is mac osx; clients can be whatever.

This revision was automatically updated to reflect the committed changes.

I know 3ish Mac OSX developers... :D I think it would be neat if we had some command like

bin/almanac gen-and-prime-ssh-key <public-key>

this command should fatal horribly on mac osx, letting you know it won't work, and then otherwise it should kind of collapse steps 2 and 3 from above.

Just to respond to this specifically, since I'm about to shoot a diff out for it:

  • If your ssh-keygen works, we're just doing the whole thing inline, so you don't have to do any work -- everything works automatically, and you don't have to run any scripts. This is everywhere except OSX.
  • If you're on OSX, ssh-keygen is busted, so writing a CLI script that says "ssh-keygen is busted" wouldn't be very helpful. The automatic thing is going to fail with a message "ssh-keygen is busted, go run xxxxx" instead, and then you'll have to go run that. I think there's no real way around this.