
Evaluate support for AWS IAM Roles in S3 Client
Open, Low, Public

Description

AWS allows you to authenticate with IAM Roles, which provide temporary credentials that are available on each host and accessible to any process on the host by performing an HTTP request to an unauthenticated local metadata server.

My understanding is that a simple IAM workflow might work like this:

  • An operator goes to the AWS console and says that web001.mycompany.com has permission to access S3.
  • When software on web001.mycompany.com goes to access S3, it first makes an HTTP request to the local instance metadata service in AWS, at http://169.254.169.254/.
    • This service is automatic and not authenticated. Any process on the host may access it and retrieve credentials and any other data it exposes.
  • After retrieving a credential from the instance metadata service, the software uses that credential to sign its S3 requests.

We do not currently support this. Specifically, Phabricator does not have the code required to make the call to the instance metadata service.
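
For reference, a minimal sketch (not Phabricator code) of the call in question, using the documented IMDSv1-style metadata paths; error handling and IMDSv2 token handling are omitted, and this assumes allow_url_fopen is enabled:

<?php
// Base path for IAM role credentials in the instance metadata service.
$base = 'http://169.254.169.254/latest/meta-data/iam/security-credentials/';

// 1. Ask the metadata service which role is attached to this instance.
$role = trim(file_get_contents($base));

// 2. Fetch the temporary credentials for that role. The JSON response
//    contains AccessKeyId, SecretAccessKey, Token and Expiration.
$credentials = json_decode(file_get_contents($base.$role), true);

// The Token must accompany requests signed with these credentials,
// typically as the X-Amz-Security-Token header.
echo $credentials['AccessKeyId']."\n";
echo $credentials['Expiration']."\n";

Anything built on this also needs to refresh the credentials before Expiration passes, since they rotate automatically.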


Earlier Description

We should write our own S3 client instead of using an external one. S3 has a fairly simple API.

Original Description

It would be great if IAM roles could be used to provide credentials for S3 instead of explicitly providing an S3 access key. I think that in order to do this, the S3 library would likely need to be upgraded/replaced. I'm not sure if the S3 library supports IAM roles or not.

Event Timeline

joshuaspence raised the priority of this task from to Needs Triage.
joshuaspence updated the task description.
joshuaspence added projects: Phabricator, Files.
joshuaspence added a subscriber: joshuaspence.
joshuaspence triaged this task as Wishlist priority. May 25 2014, 10:25 PM
joshuaspence raised the priority of this task from Wishlist to Low. Jun 4 2014, 7:04 PM

Increasing the priority for this task because it is something that we would really like to use.

I've opened an issue on the amazon-s3-php-class project regarding support for IAM credentials. See https://github.com/tpyo/amazon-s3-php-class/issues/84.

Another option is the official AWS PHP SDK, which I believe already has support for IAM credentials. Unfortunately, the SDK requires PHP 5.3.3+.

After discussion with @epriestley, it's probably best to eventually provide a native S3 client, rather than relying on an external library. There already exists a PhutilAWSFuture class, which could probably be extended to support S3. My rough plan for this is as follows:

  1. D9781: Add a `PhutilAWSS3Future` class
  2. D9782: Add support for instance profile credentials
  3. Create a PhutilAWSS3Client class which can act as a drop-in replacement for the current S3 library.
joshuaspence renamed this task from Update S3 library to support IAM Roles to Write a native S3 client. Sep 21 2014, 9:23 AM
joshuaspence updated the task description.
joshuaspence added a project: libphutil.

I have half a diff for this (D10530) that I can probably finish off. The main issues I had were with signing the requests and the difficulty associated with testing.
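
For reference, the hard part of signing is Signature Version 4. A minimal sketch of the signing-key derivation follows (this is the documented AWS algorithm, not existing Phabricator code; building the canonical request and string-to-sign is omitted):

<?php
// Derive the SigV4 signing key from the secret key, the request date
// (YYYYMMDD), the region (e.g. "us-east-1") and the service (e.g. "s3").
function sigv4_signing_key($secret_key, $date, $region, $service) {
  $k_date    = hash_hmac('sha256', $date,    'AWS4'.$secret_key, true);
  $k_region  = hash_hmac('sha256', $region,  $k_date, true);
  $k_service = hash_hmac('sha256', $service, $k_region, true);
  return hash_hmac('sha256', 'aws4_request', $k_service, true);
}

// The final signature is the hex HMAC-SHA256 of the string-to-sign:
//   $signature = hash_hmac('sha256', $string_to_sign, $signing_key);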

Support for streaming files (vs fire-and-forget requests) -- which I think was discussed somewhere -- is no longer very important after T7149. We can now store files of arbitrary size in engines that only expose the existing writeEntireFile($big_chunk_of_data) API without major downsides.

It would still be nice to get IAM support.

I think there are two different issues with IAM.

One is using IAM Users, which let you create credentials with reduced access (for example, only access to S3). This is a sensible step in limiting access, but I believe IAM user credentials have the same format as primary credentials and should already work properly.

The second is using IAM Roles, which I'm not as thrilled about. We've already had one vulnerability which would have exposed them, see:

I think putting credentials in an unauthenticated container that is vulnerable to SSRF isn't a great idea as a general policy. The existence of this container is in no way obvious, and per the HackerOne report it sounds like other software has the same issue (not considering SSRF to be a high-risk vector, although it is hugely high-risk inside EC2).

Even after T6755, we are not completely immune to SSRF if the attacker can get enough stars to align.

Why are you interested in using IAM roles? Is it primarily because they're slightly easier to use than distributing IAM user credentials?

How do you feel about the SSRF risk?

Perhaps notably, this was filed about a year before I learned about this magic service and the associated SSRF risk, and a production deployment of IAM roles would have been vulnerable until then if we'd implemented it in 2014.

(The case for doing this is weaker if it isn't for IAM Roles, but the S3.php thing doesn't support signature v4 (which AWS now requires in some regions), the official SDK doesn't support older PHP, and having a first-party, future-based client will perhaps make things easier eventually with T5544.)

epriestley renamed this task from Write a native S3 client to Evaluate support for AWS IAM Roles in S3 Client. Jan 10 2016, 2:38 PM
epriestley updated the task description.

I've realigned this around support for IAM Roles, because we now use a first-party client after D14982.

I believe you can implement IAM Roles entirely as an extension after that lands:

  • Copy/Paste PhabricatorS3FileStorageEngine to create PhabricatorS3WithIAMRoleFileStorageEngine.
  • Have the new engine pull credentials off the instance metadata service instead of from configuration.
  • Use addHeader() to explicitly add the additional X-Amz-Security-Token header required when using temporary credentials.

I'm not 100% sure this will work, but from my read of the documentation I think it should. It doesn't look like the Security-Token header needs to be specially involved in the signing process.
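
To make the last step concrete, a hedged sketch: assuming $future is the S3 request/future object and $credentials is the decoded instance metadata response from the earlier sketch, the only addition beyond swapping in the temporary key pair would be roughly:

// Illustrative only; addHeader() is the method mentioned above, and the
// rest of the setup (using AccessKeyId and SecretAccessKey in place of
// the configured static credentials) is elided.
$future->addHeader('X-Amz-Security-Token', $credentials['Token']);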

Roughly, here's how I currently evaluate IAM Roles against traditional credentials:

Pros

  • More robust against some attacks (e.g., that steal physical hardware or capture backups) because there is no key on disk and credentials rotate automatically.
  • May be slightly easier to manage? (I'm not sure about this since other types of keys need to be managed traditionally anyway, and it seems easier to use one approach to manage everything than two different approaches?)

Cons

  • Less robust against SSRF attacks.
  • More difficult to test (you can only make calls from a host inside AWS).

I'm more concerned about SSRF attacks than about attacks which could access credentials on hosts. I don't think ease of management is particularly compelling (I likely would not use IAM Roles in the cluster if we did support them, even if I was not concerned about the security risk, because then I'd need to manage two credential mechanisms instead of just one, which is more complicated). And, as the upstream, I have a strong bias against bringing things upstream which make testing more difficult.

I'm concerned about the SSRF risk because we were vulnerable to it for a very long time, discussed above. I did not realize the instance metadata service existed, and believed the SSRF risk to generally be low, because it was implausible to me that anyone would reasonably deploy an unauthenticated HTTP GET service which spits out credentials to any caller. Obviously, I was mistaken.

While we've fixed this as well as we can, the fix is highly imperfect and there are a lot of ways to punch through it. Beyond "the stars must align" attacks on SSL with DNS control, we are limited in our ability to blacklist calls in repositories and OAuth. We don't control the connections git or hg make, so we can't technically protect them from DNS timing attacks, and many reasonable configurations rely on cloning from private IP space or accessing OAuth providers in private IP space.

Broadly, SSRF protection involves building up a blacklist of all "bad / unsafe" services, which is very difficult to get right and fails in a way that makes things insecure. This is made more challenging because we have legitimate needs to make some kinds of calls into private IP space.

AWS specifically warns about the SSRF risk:

Warning
If you use services that use instance metadata with IAM roles, ensure that you don't expose your credentials when the services make HTTP calls on your behalf. The types of services that could expose your credentials include HTTP proxies, HTML/CSS validator services, and XML processors that support XML inclusion.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

I think AWS is being fairly understated about the risk here.

I'd like to see a more compelling argument that IAM Role support is valuable (and that the value outweighs the security concerns) before considering it for the upstream, and I'd generally prefer this not come upstream. I'm open to client adjustments that support development of a third-party storage engine which uses IAM Roles, per above (and I believe no changes are actually required), provided they are not unreasonably extensive or complicated (I don't think they will be).

For me, the big advantage to using IAM roles is that the credentials rotate automatically, every hour.

With IAM users, the credentials end up in a file. A lot of people could end up with those credentials, either by getting access to the backups as you mention, or just by SSHing into the box and looking at the configuration. Any of these people could save these credentials, maybe for a long time, maybe long after their employment has been terminated.

Since IAM role credentials are always rotating, anyone who wants to abuse those credentials (possibly to avoid using their own credentials which would get them caught) only has an hour at most. It's pretty easy to see who SSHed into the server in the past hour. Not so easy to see who might have had access to the filesystem, ever. We recognize that we must necessarily trust some number of system administrators to not abuse their powers. We try to deter abuse by preserving accountability. Service credentials that never rotate undermine accountability.

Also, when a credential is compromised, if we are using IAM roles we already have the mechanism in place to rotate that credential. It's not fun as a system administrator to learn at 2 AM that a credential has been compromised and then have to figure out where it's been used, whether you can rotate it without knocking out an entire cluster of servers that all use it, and how to update the credentials for a bunch of services that you didn't write.

It is far easier to tell everyone to use IAM roles. It's not difficult to audit an AWS account for compliance with this policy.

Since IAM role credentials are always rotating, anyone who wants to abuse those credentials (possibly to avoid using their own credentials which would get them caught) only has an hour at most. It's pretty easy to see who SSHed into the server in the past hour. Not so easy to see who might have had access to the filesystem, ever.

The IAM role vulnerability described above does not necessarily require SSH access to a machine. If you can get software to perform an HTTP GET request on your behalf to the instance metadata IP address (and Phabricator supports configuring HTTP GET requests in a number of places, although there is a blacklist as @epriestley described above), then you can potentially gain access to credentials without having SSH access to the machine that has the IAM role assigned to it.

In addition, once they have access to those credentials, they may be able to spin up new EC2 instances. In this case, the hour limit provides no protection because once you spin up a new instance you can pretty much do whatever you like.

I think you missed the point. The attacker in this instance is an employee, or a contractor, or an ex-employee. If you have static credentials sitting in files on your servers, all your ex-employees probably have them, too. And they can use them until you rotate them, which is pretty much never, in practice.

You can use a policy like this to restrict a key to requests originating from a particular VPC:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1457549907000",
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:sourceVpc": "vpc-a908f83e"
                }
            },
            "Resource": [
                "arn:aws:s3:::phacility-example/*"
            ]
        }
    ]
}

Ex-employees presumably do not have access to hosts in your VPC.

That's a good point, and certainly makes me feel less bad about putting IAM user credentials in a file on the host.

However, I'd still want to rotate those credentials, if for no other reason than to demonstrate that I can. A problem we've had with IAM user credentials is that employees innocently copy them and use them for other things that need the same capabilities. Then the credential gets compromised, and we need to rotate it. But with the credential in use in many places, rotating the credential breaks many things. It's not always easy to figure out what will be broken, and it's not always easy to distribute the new credential to all the things that need it. Trying to solve for these unknowns only after the credentials are compromised is the opposite of a good idea.

Of course we ask people not to copy credentials, but they do it anyway. Often they are "just testing" and then they forget. Periodically rotating keys is the solution: if someone copies a credential and it rotates every night, it will only work for a day at most. Thus the situation tends to sort itself out. We periodically audit all of our AWS credentials to find any that aren't being rotated. Usually these audits find nothing now that we have automation for creating EC2 instances with associated IAM roles.

Even if restricted to a particular VPC, a compromised credential can be used for evil. There are a lot of hosts in a VPC, and most of them I do not administer, so I am doubtful of their security. Many of them have no AWS credentials at all, so a compromise would be limited to that host. That's no longer the case if the attacker has AWS credentials from some other host (the most likely scenario being a former employee).

One solution to this would be to get the credentials from the standard location of ~/.aws/credentials. Then I could use a tool like awsrotate in cron to rotate the key. This is how we handle the rotation of developers' and administrators' credentials. It would complicate my installation of Phabricator, since I'd now need to:

  1. make sure cron is running (it isn't, currently)
  2. install awsudo and awsrotate
  3. configure a job in cron
  4. implement monitoring to assure that cron job is running
  5. run Phabricator as a user that has a home directory

Phabricator would also need to read the new credentials from ~/.aws/credentials when it changes. I haven't checked if it does that or not.
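
For illustration, reading the shared credentials file itself is simple; a minimal sketch (not existing Phabricator code), assuming a standard ~/.aws/credentials INI file with a [default] profile:

<?php
// The shared credentials file is an INI file with profile sections
// containing aws_access_key_id and aws_secret_access_key. Re-reading
// it on each use (rather than caching at startup) is what would let a
// rotated key take effect without restarting anything.
$path = getenv('HOME').'/.aws/credentials';
$profiles = parse_ini_file($path, true);

$access_key = $profiles['default']['aws_access_key_id'];
$secret_key = $profiles['default']['aws_secret_access_key'];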

Alternatively, Phabricator could support IAM roles. If the SSRF vulnerability has been fixed, and it sounds like it has, then this isn't introducing any new security problems that I see. It's fair to dislike IAM roles philosophically, but the fact remains that AWS will always support them, and nearly every AWS-using program in existence supports them too. Even if Phabricator does not support them, you'll still need to consider SSRF attacks, since you can't know what other services are running in Phabricator's network, and exposing the EC2 metadata isn't great even if it doesn't contain credentials.

As a general note here, we've been vulnerable to credential theft from the local service throughout this discussion, and still are until T12701 resolves: attackers can create a Harbormaster build plan which sends requests to 169.254.169.254, then read credentials from the output of the "failed build".

Users can also currently add http://169.254.169.254/ as a Git repository, although I'm not certain if they can actually capture credentials by doing this.

As a general note here, we've been vulnerable to credential theft from the local service throughout this discussion, and still are until T12701 resolves: attackers can create a Harbormaster build plan which sends requests to 169.254.169.254, then read credentials from the output of the "failed build".

Isn't that a general problem with running builds on instances within your otherwise-trusted network? Ideally the instances running Harbormaster builds would be isolated from other infrastructure, and would use a different IAM role with limited (ideally none) IAM access.

No. It's not that the builds can access credentials on the build hosts -- that part is a general problem -- it's that the Phabricator application hosts can be made to divulge their credentials (which are necessarily the same as the S3 credentials, because the same hosts must access S3) by instructing them to run builds that use 169.254.169.254 as a "build server".

Shouldn't this behavior be blocked by the default settings of security.outbound-blacklist? 169.254.0.0/16 appears in that list, but as you say, a Harbormaster fetch to http://169.254.169.254 still succeeds.

Harbormaster and Diffusion do not use the outbound blacklist, since it would prevent users from interacting with local-network build hosts (for example, a local install of Jenkins) or local-network repositories (for example, a local install of GitHub Enterprise). We currently consider these use cases to be common / valuable.

We could subject Harbormaster to the outbound blacklist, since we control the HTTP requests.

We could make an effort to defuse the attack in Diffusion, but I think it would ultimately not be effective. I believe it is impossible for us to blacklist hosts in Diffusion because, e.g., we can not pre-resolve DNS for git clone, so a DNS timing attack can (at least in theory) evade any protection we erect. We could prevent http (vs https) and maybe git by default, as HTTPS is difficult/impossible to retarget with DNS.

Either change would break existing installs, and make Phabricator more difficult to install/configure.

There may also be a similar attack possible by adding an address in local network space as a JIRA install, outbound HTTP hook (unlikely to be a problem today, but maybe in the future), Phabricator OAuth install, LDAP server, etc. Today, we're generally balancing things as "by default, it's OK for administrators to register services in local network space".

FWIW, I am using a custom file and mail engine that utilises the AWS SDK and supports instance profile credentials. See P2082 and P2083. You need to install and require the AWS SDK before using these extensions.