Fully separate live credentials from development repositories
Closed, ResolvedPublic

Description

Currently, some keys live in the rCORE repository. These keys are all either inactive outside the cluster or imply no elevated level of access. However, we will have keys in the future which do not have these properties (e.g., Stripe API keys, S3 API keys), and it would be nice to have stronger technical access barriers around even the useless/inactive keys: a design intern working on the instances portal can't use client.key, but also shouldn't have access to it.

For now, I'm going to make a separate key store and have rCORE contain "development" credentials. In production, it will deploy to point at the production keystore instead.

Event Timeline

epriestley raised the priority of this task to Normal.
epriestley updated the task description.
epriestley added a project: Phacility.
epriestley moved this task to Do After Launch on the Phacility board.
epriestley added a subscriber: epriestley.

Making the deploy.key available in order to make the other production keys available will take a bit of work. I think we have to pick it up as a side effect of deploying through the bastion. I was originally thinking I'd just lock to the cluster range, but the visible remote address won't be on the VPC.

epriestley added a commit: Restricted Diffusion Commit. Feb 27 2015, 5:44 PM

The only important key we have which is active outside the cluster (production host identity key) is now separated, and deployment credentials have been made fully separable. (SSL keys are also active outside the cluster, but deployed manually at the moment.) We can improve segmentation here over time, but we're in stable shape now.

epriestley renamed this task from Separate live credentials from development repositories to Fully separate live credentials from development repositories. Feb 28 2015, 10:38 PM
epriestley moved this task from v1 Open Beta to Do After Launch on the Phacility board.

I am fully separating these credentials now in preparation for deploying sbuild with rCORE. Specifically:

  • I am generating new, separated client.key, device.key and deploy.key keys in rKEYSTORE.
  • I'll add these keys to instances-client on admin.phacility.com, the device record on admin.phacility.com, and the deploy agent on this host, respectively.
  • I'll push all hosts in all tiers to switch to the new keys.
  • I'll remove the old keys.

(I may also make regeneration of the development version of these keys mandatory in sandboxes at some point, although the risk this creates is virtually nonexistent.)

I will also limit access to rCORE on sbuild hosts, but an attacker should gain no foothold by having access to rCORE.

  • I've generated new keys (rKEYSTORE851c37c6).
  • I've added the public keys to the relevant users/devices.
  • I've realigned rCORE to always read keys from conf/keystore/ and am now going to start cycling services. If I got things right, they'll transparently pick up the new keys.
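
A minimal sketch of what "always read keys from conf/keystore/" could look like. Only the `conf/keystore/` path and the key file names come from this task; the `key_path` helper and its `root` parameter are hypothetical and are not rCORE's actual code. The point is that callers ask for a key by name, and whether they get a development or production key depends only on what the deploy placed in that directory.

```python
from pathlib import Path

# Path taken from the task; everything else here is an illustrative assumption.
DEFAULT_KEYSTORE = Path("conf/keystore")


def key_path(name, root=None):
    """Return the location of a named key inside the keystore directory."""
    # Reject anything that could escape the keystore directory.
    if name in ("", ".", "..") or "/" in name or "\\" in name:
        raise ValueError(f"invalid key name: {name!r}")
    base = Path(root) if root is not None else DEFAULT_KEYSTORE
    return base / name
```

For example, `key_path("client.key")` resolves to `conf/keystore/client.key` regardless of environment.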

This does make the availability of the deploy key (per T7399#98823) a little tricky, but I'm just requiring it be locally available from the controlling host for now, which is reasonable and sufficient. This could be picked up on the bastion host in the future (at least, when deploying non-bastion hosts).

Assuming cycling goes cleanly it should also be straightforward to swap these keys in the future if we have a need to.

epriestley added a commit: Restricted Diffusion Commit. Sep 21 2015, 1:25 PM

I've pushed some non-production and semi-production services (aux, sbuild, bastion, admin), but sorting out deployment keys took a little longer than I anticipated (I don't want to put the general cluster deployment key on the sbuild tier at all). We're edging into normal working hours for a larger portion of Phacility users, so I'm going to hold off on more service cycling until tomorrow.

I'm resuming work here. Next steps are:

  • Deploy new keys everywhere.
  • Remove old keys.

No issues since yesterday. I expect this to be fairly straightforward.

epriestley added a commit: Restricted Diffusion Commit. Sep 22 2015, 12:29 PM
epriestley added a commit: Restricted Diffusion Commit.

I think all the pushing is complete now. Ran into two issues:

  • The deploy user on meta.phacility.com also needs a copy of the deploy.pub key to authorize deploying the corp tier. I added the new key.
  • Synchronizing daemons.phacility.net device keys to instances isn't trivial, and how this works should change.

On the second issue:

  • Each instance has a list of trusted keys. The new key should have been added, then keys swapped, then the old key removed.
  • What actually happened was that I swapped keys, realized all the services were out of sync, then deployed + sync'd them. This wasn't a huge deal but wasn't good.
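
The intended order (add the new key, swap, then remove the old key) can be sketched as follows. This is an illustration only: `rotate_trusted_key`, the in-memory `trusted` set, and the `swap_active` callback are hypothetical stand-ins for the real trusted-key list and service restart, which this task does not show.

```python
def rotate_trusted_key(trusted, old_key, new_key, swap_active):
    """Rotate keys so the active key is always in the trusted set.

    trusted: set of public keys an instance trusts (mutated in place).
    swap_active: callable that switches the service over to new_key.
    """
    # 1. Trust both keys, so requests signed with either are accepted.
    trusted.add(new_key)
    # 2. Switch the service to the new key while the old one is still trusted.
    swap_active(new_key)
    # 3. Only after the swap, stop trusting the old key.
    trusted.discard(old_key)
    return trusted
```

Swapping first, as actually happened, skips step 1 and leaves a window where services present a key that instances do not yet trust.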

Instead, it should work like this:

  • services sync should sync what's on admin.phacility.com, not what's on disk.
  • services sync should be able to sync multiple keys.
  • There should be a remote sync/host sync command which synchronizes all instances on a host.
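
A rough sketch of the proposed behavior, assuming key lists are plain collections of strings. The function names `plan_sync` and `plan_host_sync` are hypothetical; they are not existing `services sync` code. The sketch treats the key list on admin.phacility.com as the source of truth, handles multiple keys, and plans a whole host at once.

```python
def plan_sync(admin_keys, instance_keys):
    """Compute the changes that make an instance trust exactly the admin keys."""
    admin_keys = set(admin_keys)
    instance_keys = set(instance_keys)
    return {
        "add": sorted(admin_keys - instance_keys),
        "remove": sorted(instance_keys - admin_keys),
    }


def plan_host_sync(admin_keys, instances):
    """Plan a sync for every instance on a host, skipping ones already in sync.

    instances: mapping of instance name to the keys it currently trusts.
    """
    plans = {}
    for name, keys in instances.items():
        plan = plan_sync(admin_keys, keys)
        if plan["add"] or plan["remove"]:
            plans[name] = plan
    return plans
```

Because the admin list can hold both the old and the new key at once, a rotation becomes: add the new key on admin, sync, swap, remove the old key on admin, sync again.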

If we had these, this would have gone more smoothly.

I'm now going to remove old keys.

This part seems to have gone cleanly:

  • I removed the old SSH key from deploy on secure.phabricator.com.
  • I removed the old SSH key from instance-client on admin.phacility.com.
  • I removed the old SSH key from daemons.phacility.net on admin.phacility.com.
  • I removed the old SSH key from deploy on meta.phacility.com.

I tested web requests and repository requests to instances, and deployed hosts in the aux and corp tiers to vet those changes.

Progress:

  • rCORE no longer contains any live SSH key material (nor are any keys in the repository history live).

Next steps:

  • rSAAS still contains key material which needs to be extracted and cycled at some point, but this is straightforward and I do not plan to deploy this repository to sbuild.
  • rCORE still contains some live API key material.
  • The services sync changes above need to happen before this can be a routine operation, because there is currently no clean way to cycle trusted keys on instance services.

I'm going to stop here for today and let this settle. I'll probably extract the rSAAS key material and API key material tomorrow and leave the services sync stuff for the future.

I'm going to work on separating the rSAAS key material and remaining rCORE key material now.

The rSAAS material provides administrative access to the bastion host, and should be straightforward to cycle. I don't expect it to disrupt anyone.

The rCORE key material is Mailgun API keys, and it looks like we can't have two active at the same time. They're also per-account global values. So I may just set those up to cycle now, then do the actual cycle with the push on Saturday to limit disruption.

Diffusion added a commit: Restricted Diffusion Commit. Sep 24 2015, 11:27 AM
epriestley added a commit: Restricted Diffusion Commit. Sep 24 2015, 11:29 AM
Diffusion added a commit: Restricted Diffusion Commit. Sep 24 2015, 11:42 AM
epriestley added a commit: Restricted Diffusion Commit. Sep 24 2015, 11:49 AM
epriestley added a commit: Restricted Diffusion Commit. Sep 24 2015, 11:53 AM
Diffusion added a commit: Restricted Diffusion Commit. Sep 24 2015, 12:11 PM
epriestley added a commit: Restricted Diffusion Commit. Sep 24 2015, 12:12 PM

  • I've separated key material from rSAAS and cycled the key. It was always ephemeral, but now it is managed in a more standard way.
  • I've separated all API key material from rCORE, but not cycled it. There were a couple more keys than I found earlier (ReCaptcha, SES + S3 for this host).

I'm generally unsatisfied with how key deployment currently works. I think rKEYSTORE should probably be bastion-only and all deployment of key material should be ad-hoc from the bastion based on tier requirements. Although I believe all material is deployed correctly now and effectively minimally available, it's not completely consistent and there's no way to just say "put keys X and Y on this tier" without putting the entire keystore there.

So near-term plans are now:

  • Cycle the API keys so nothing in the rCORE history is live (likely on Saturday morning).

Then eventually:

  • Make key deployment a part of the deploy process and remove rKEYSTORE from hosts.
  • Fix services sync to make key cycling repeatable with minimal disruption.

Diffusion added a commit: Restricted Diffusion Commit. Sep 26 2015, 1:48 PM
Diffusion added a commit: Restricted Diffusion Commit. Sep 26 2015, 1:57 PM

(Testing S3 / SES keys on this host...)

shibe.jpg (266×400 px, 29 KB)

API keys are now all cycled (except ReCAPTCHA, which doesn't seem to be cyclable and is a ~zero-value target anyway) and everything seems to have swapped over cleanly.

Mailgun is actually easier to cycle than I thought: although it only has a "cycle" button, the old key remains active for 24 hours after you cycle, so cycling was seamless.

We should be cleared on the operational front to begin deploying an sbuild tier attached to the secure tier.

epriestley added a commit: Restricted Diffusion Commit. Sep 28 2015, 11:33 AM

I deployed the sbuild and saux tiers after this, and they've been running in production for some time.

Key management is still a little less granular than I'd like. Generally, you either get rKEYSTORE or you don't, which leads to two problems:

  • Some keys are deployed more widely than they really should be, since a host needs rKEYSTORE and it gets some extra keys.
  • Some keys are deployed in an ad-hoc way because they shouldn't be on all the hosts that have rKEYSTORE.

These problems would both be fixed by letting tiers specify exactly which keys they need and deploying only those keys, but that's a little bit more involved than what we're doing right now. Overall, I'm satisfied with where things are from a security standpoint; they could just be a little cleaner from a maintenance/purity standpoint.
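
As an illustration of the per-tier idea, here is a small sketch in which each tier declares the keys it needs and a deploy step selects only those. Everything here is hypothetical: the `TIER_KEYS` manifest, the `keys_for_tier` helper, and in particular the tier-to-key assignments, which are invented for the example and do not reflect the real deployment.

```python
# Hypothetical manifest: tier and key names appear in the task, but these
# assignments are invented purely for illustration.
TIER_KEYS = {
    "sbuild": {"client.key", "device.key"},
    "bastion": {"deploy.key"},
}


def keys_for_tier(tier, keystore, manifest=TIER_KEYS):
    """Return only the key material a tier has declared it needs.

    keystore: mapping of key name to key material.
    Tiers without a manifest entry receive no keys at all.
    """
    wanted = manifest.get(tier, set())
    missing = wanted - set(keystore)
    if missing:
        raise KeyError(f"keystore is missing keys for {tier}: {sorted(missing)}")
    return {name: keystore[name] for name in sorted(wanted)}
```

With a manifest like this, a host would no longer need the whole of rKEYSTORE, and no key would have to be deployed ad hoc just because it should not be on every host.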