amckinley (Austin McKinley) Administrator
User

User Details

User Since
Feb 20 2011, 8:41 PM (338 w, 5 d)
Roles
Administrator
Availability
Available

Recent Activity

Fri, Jul 21

amckinley added a comment to T12847: A Pathway Towards Private Clusters.

Are we going to let customers start with a single EC2 instance, or require them to have at least some form of HA? My plan is to always create the VPC/subnet infrastructure assuming that the customer will have a second AZ (and ALBs for example require you to listen on at least two different subnets).
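
As a rough illustration of planning for a second AZ up front, the sketch below carves one VPC CIDR block into a public and a private subnet per zone. It is not the contents of P2065 subnetter.py; the function name, the /16 and /24 sizes, the zone labels, and the public/private split are all assumptions made for the example.

```python
import ipaddress


def plan_customer_vpc(vpc_cidr, zones=("a", "b"), subnet_prefix=24):
    """Return a {zone: {"public": cidr, "private": cidr}} plan for one VPC.

    Hypothetical helper: block sizes and zone labels are illustrative only.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks of the given size.
    subnets = vpc.subnets(new_prefix=subnet_prefix)
    plan = {}
    for zone in zones:
        plan[zone] = {
            "public": str(next(subnets)),
            "private": str(next(subnets)),
        }
    return plan


if __name__ == "__main__":
    for zone, nets in plan_customer_vpc("10.1.0.0/16").items():
        print(zone, nets)
```

Reserving the second zone's blocks even for a single-instance customer keeps the addressing stable if HA is added later.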

Fri, Jul 21, 4:32 PM · Ops, Phacility

Thu, Jul 20

amckinley is attending E1584: Spider-Man: Homecoming.
Thu, Jul 20, 1:12 PM · Phacility

Jul 19 2017

amckinley added a comment to T12847: A Pathway Towards Private Clusters.

If we're using ELBs for this (which it looks like we may be able to, per above?) I don't think (?) we can do any path-based stuff.

Jul 19 2017, 4:41 PM · Ops, Phacility

Jul 18 2017

amckinley added a comment to T12847: A Pathway Towards Private Clusters.

We should enumerate all the ports we're going to open up so we can do the security groups sooner rather than later. My list:

Jul 18 2017, 7:44 PM · Ops, Phacility
amckinley added a comment to T12927: Private Clusters: VPN Notes.

Skipping questions where you have the right answer:

Jul 18 2017, 5:09 PM · Clusters, Ops

Jul 14 2017

amckinley created P2066 output.txt.
Jul 14 2017, 7:59 PM
amckinley created P2065 subnetter.py.
Jul 14 2017, 7:58 PM
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

bastion host per subnet

Is "subnet" a typo or am I thinking of the wrong subnets? Not 4 bastions per customer, right? Don't the bastions have to be in the customer VPCs?

Jul 14 2017, 12:40 AM · Ops, Phacility
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

So having done a bunch more reading/thinking about private cluster isolation, I'm starting to lean towards "one VPC per customer per region" instead of "one subnet per customer per region".

Jul 14 2017, 12:18 AM · Ops, Phacility

Jul 13 2017

amckinley created Image Macro "howneatisthat".
Jul 13 2017, 7:34 PM
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

AWS doesn't have a single device which can both listen for TCP on 22 (only ELB) and terminate SSL for websockets (only ALB).

That's not correct. We are routing our websocket traffic for Aphlict through an ELB that has been set up using SSL instead of TCP as the protocol. In Terraform, it looks like this:

Jul 13 2017, 6:51 PM · Ops, Phacility
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

We should also make a decision about whether or not we want to use "dedicated" AWS instances: https://aws.amazon.com/ec2/purchasing-options/dedicated-instances/

Jul 13 2017, 4:31 PM · Ops, Phacility

Jul 12 2017

amckinley added a comment to T12856: Evaluate various "infrastructure-as-code" products.

Terraform Review
Pros:

  • Written in Python, which everyone knows is superior to Ruby in every way

Huh? Terraform is written in Golang, not Python.

Jul 12 2017, 10:24 PM · Ops, Phacility

Jul 5 2017

amckinley added a comment to T12856: Evaluate various "infrastructure-as-code" products.

@hach-que thanks for the suggestion. I've used CloudFormation with some success previously. My biggest problem is occasionally having a stack just get "stuck" while deploying, and taking forever to timeout on a failure. I've also seen very ambiguous error messages when some random resource fails to deploy, but these are all just anecdotes. I'm working on a demo CF config to give it a try.

Jul 5 2017, 10:49 PM · Ops, Phacility

Jul 3 2017

amckinley accepted D18180: Document the need to restart Phabricator after performing a restore.

I guess there's no way to detect this bad state and warn the user about it explicitly?

Jul 3 2017, 5:36 PM

Jun 30 2017

amckinley added a comment to T12856: Evaluate various "infrastructure-as-code" products.

Maybe we should just use CloudFormation ¯\_(ツ)_/¯

Jun 30 2017, 8:30 PM · Ops, Phacility
amckinley added a comment to T12856: Evaluate various "infrastructure-as-code" products.

First, a disclaimer that I'm not by any means wedded to terraform. From playing with the other tools, terraform looked like the clearest pathway towards my goal of "minimizing AWS UI clicks", so I decided to see what the terraform config for a private cluster would look like and play with the tool. See example terraform configs here: P2063 P2064. I agree with the general thrust of your comment, which I interpret as "this all sounds terrifying and we probably shouldn't do it", but I wanted to at least get a feeling for the tool and build a proof of concept.

Jun 30 2017, 7:58 PM · Ops, Phacility
amckinley edited P2064 example_customer.tf.
Jun 30 2017, 6:33 PM
amckinley updated the title for P2064 example_customer.tf from uber.tf to example_customer.tf.
Jun 30 2017, 6:29 PM
amckinley created P2064 example_customer.tf.
Jun 30 2017, 3:53 PM
amckinley created P2063 shared.tf.
Jun 30 2017, 3:53 PM

Jun 29 2017

amckinley added a comment to T12847: A Pathway Towards Private Clusters.

And here's the official AWS docs for connecting VPCs using OpenSWAN: https://aws.amazon.com/articles/5472675506466066

Jun 29 2017, 8:26 PM · Ops, Phacility
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

So it occurs to me that we could have one "shared" public subnet that contains: 1 shared Internet Gateway, 1 shared NAT Gateway, and then per-install VPN nodes. We should be able to set up routing tables for the private subnets that point to that install's VPN node. Since routing is enforced at the network layer (routing tables on actual EC2 nodes are bogus), that would enforce traffic separation. In theory, installs could saturate the IGW or NAT gateway, but I'm pretty sure those resources should autoscale since they're managed by EC2. A NAT gateway with no traffic going through it is $32/month, and as far as I can tell, an Internet Gateway is free, but eventually we'd be saving some money.

Jun 29 2017, 7:55 PM · Ops, Phacility
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

Another AWS gotcha learned the hard way: nodes in a "private" subnet can still be part of an "internet-facing" ELB, but the trick is that you have to attach the ELB to the "public" subnet that contains the IGW and the NAT Gateway, and the nodes in the "private" subnet need a route to the NAT Gateway.

Jun 29 2017, 6:11 PM · Ops, Phacility

Jun 28 2017

amckinley added a comment to T12847: A Pathway Towards Private Clusters.

Note that we cannot send SSH traffic over an ALB -- they only speak HTTP/HTTPS. However, we can send it over a v1.5 VPC ELB, apparently (v1 classic ELBs do not let you listen on 22).

Jun 28 2017, 11:45 PM · Ops, Phacility
amckinley created E1560: Jury Duty.
Jun 28 2017, 10:17 PM · Phacility
amckinley added a comment to T12879: Put crontabs in VCS and deploy them during provisioning.

On secure, I could only find a crontab for the ubuntu user, which is as follows:

0 6 * * * /core/bin/host backup
0 7 * * * /core/bin/host prune --force
0 8 * * * /core/conf/util/generate-documentation
0 9 * * * /core/conf/util/generate-symbols
Jun 28 2017, 9:40 PM · Phacility, Ops
amckinley added a comment to T12856: Evaluate various "infrastructure-as-code" products.

Having played with a few more of these tools and thinking about the problem, I'm starting to lean towards "use Terraform or CloudFormation for orchestrating AWS infrastructure, and leave the provisioning stuff in rCORE as-is". The basic flow for provisioning private instances could be:

Jun 28 2017, 9:37 PM · Ops, Phacility
amckinley created T12879: Put crontabs in VCS and deploy them during provisioning.
Jun 28 2017, 6:54 PM · Phacility, Ops

Jun 27 2017

amckinley committed rP8e5afb56af20: Drop interactive login from sshd example (authored by amckinley).
Drop interactive login from sshd example
Jun 27 2017, 7:51 PM
amckinley closed D18167: Drop interactive login from sshd example by committing rP8e5afb56af20: Drop interactive login from sshd example.
Jun 27 2017, 7:51 PM
amckinley created T12876: Improve initial arc land experience when destination repo is empty.
Jun 27 2017, 7:48 PM · Arcanist, Feature Request
amckinley created D18167: Drop interactive login from sshd example.
Jun 27 2017, 7:10 PM
amckinley accepted D18166: Convert cluster/projects config options to newer modular structure.
Jun 27 2017, 5:29 PM
amckinley accepted D18165: Convert Maniphest custom config to new config types.
Jun 27 2017, 3:26 PM
amckinley accepted D18164: Move "wild" config types to new code.
Jun 27 2017, 2:51 PM

Jun 26 2017

amckinley added inline comments to D18160: Move 'set' config option type to new structure.
Jun 26 2017, 9:37 PM
amckinley accepted D18160: Move 'set' config option type to new structure.

Just remove unused variable and this is g2g.

Jun 26 2017, 6:43 PM
amckinley accepted D18159: Convert 'class' config options to new validation.
Jun 26 2017, 6:14 PM
amckinley accepted D18158: Convert "bool" config values to new modular system.
Jun 26 2017, 5:10 PM
amckinley accepted D18157: Convert the "list<string>" and "list<regex>" Config option types.
Jun 26 2017, 5:05 PM
amckinley accepted D18156: Convert "enum" and "string" config options to new modular option types.
Jun 26 2017, 5:02 PM
amckinley accepted D18155: Begin modularizing config options in a more modern way.

Edited config.json to have an invalid value, verified that the error was detected and config was repaired.

Jun 26 2017, 4:33 PM

Jun 23 2017

amckinley accepted D18152: Degrade more gracefully when ProfileMenu dashboards fail to render.

Wording nit.

Jun 23 2017, 7:17 PM
amckinley accepted D18132: Let PhabricatorSearchCheckboxesField survive saved query data with mismatched types.
Jun 23 2017, 7:13 PM

Jun 21 2017

amckinley added a comment to T12857: Temporary directory fullness can cause daemon issues?.

I think we should have our crontabs in version control regardless of whether or not we add tmpreaper to them, so I'll make a task for that.

Jun 21 2017, 7:26 PM · Diffusion, Ops, Daemons, Phacility
amckinley added a comment to T12857: Temporary directory fullness can cause daemon issues?.

This should do the trick. It runs off atime by default. We could just set the time period to several days if we wanted to. Alternatively, if the filenames for extremely long-running jobs are predictable, there's a --protect '<shell_pattern>' argument we could use to avoid cleaning up those files.
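
To make the atime-plus-protect idea concrete, here is a minimal Python sketch of the same selection logic. It is not how tmpreaper is implemented; the function name and parameters are hypothetical, and it only lists candidate files rather than deleting anything.

```python
import fnmatch
import os
import time


def find_stale_files(root, max_age_seconds, protect_patterns=(), now=None):
    """List files under root whose last access is older than max_age_seconds.

    Hypothetical helper: names matching any shell-style pattern in
    protect_patterns are skipped, mirroring the idea of a protect list.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in protect_patterns):
                continue
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                # The file vanished or is unreadable; leave it alone.
                continue
            if now - atime > max_age_seconds:
                stale.append(path)
    return sorted(stale)
```

A several-day window would be something like `find_stale_files("/tmp", 3 * 86400)`; note that filesystems mounted with `noatime` make access times unreliable for this purpose.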

Jun 21 2017, 6:04 PM · Diffusion, Ops, Daemons, Phacility
amckinley added a comment to T12857: Temporary directory fullness can cause daemon issues?.

Should we just add a crontab entry to clean /tmp to paper this over until we get it fixed for real?

Jun 21 2017, 5:27 PM · Diffusion, Ops, Daemons, Phacility

Jun 20 2017

amckinley accepted D18142: Let "<select />" EditEngine fields canonicalize saved defaults.
Jun 20 2017, 11:47 PM

Jun 19 2017

amckinley updated the task description for T12856: Evaluate various "infrastructure-as-code" products.
Jun 19 2017, 7:55 PM · Ops, Phacility
amckinley added a comment to T12856: Evaluate various "infrastructure-as-code" products.

Just out of curiosity, why is Salt not a candidate? I think it is very comparable to the others.

Jun 19 2017, 7:55 PM · Ops, Phacility
amckinley accepted D18135: Improve validation errors for changing task priorities.
Jun 19 2017, 7:44 PM
amckinley accepted D18134: Allow numeric constants to act as aliases for task priorities in the web UI <select />.
Jun 19 2017, 7:34 PM
amckinley added a comment to T12856: Evaluate various "infrastructure-as-code" products.

Terraform Review
Pros:

  • Simple declarative syntax
  • Intended to be invoked first with a dry run, which shows in detail all the resources that will be rebuilt or altered
  • Very focused on "immutable" infrastructure. Instead of manipulating resources in place, prefers to spin up new resources with the new config and spin down the old ones. For the stateful hosts, we might need to be clever about mounting and unmounting EBS volumes when instances get rebuilt.
Jun 19 2017, 7:27 PM · Ops, Phacility
amckinley created T12856: Evaluate various "infrastructure-as-code" products.
Jun 19 2017, 7:14 PM · Ops, Phacility
amckinley added a comment to T12124: Counterintuitive priority setting via maniphest.edit conduit call.

Ahhh, that makes more sense. I parsed that as saying "we could migrate the code" instead of the saved objects. If you want to add the glue code now and then assign it back to me after, I'll work on the migration in my Copious Free Time™.

Jun 19 2017, 6:22 PM · Maniphest, Conduit, Bug Report
amckinley added a comment to T12124: Counterintuitive priority setting via maniphest.edit conduit call.

If you're happy long-term with accepting all three types, I'm ok with that as a permanent fix. I'm planning on working through this week, just with weird/inconsistent hours.

Jun 19 2017, 6:10 PM · Maniphest, Conduit, Bug Report
amckinley added a comment to T12124: Counterintuitive priority setting via maniphest.edit conduit call.

Are saved queries/EditEngine forms things we could write migrations for? That might be even more brittle (for example, what to do about saved queries for priorities that no longer exist on the install), but accepting three different priority specifications of two different types seems a little too Postel's Principle for me.

Jun 19 2017, 5:56 PM · Maniphest, Conduit, Bug Report

Jun 16 2017

amckinley added a comment to T12847: A Pathway Towards Private Clusters.

what network level safeguard

Jun 16 2017, 6:14 PM · Ops, Phacility
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

To back up a bit, how much human touching are you envisioning for bringing up a new private cluster? In my head I was thinking we'd at least have to manually deal with requesting the cert, but it looks like There's An API For That, so I think it's at least possible to make the whole thing work magically. It would be something like filling out the form triggers the creation of a new terraform (or whatever) config which then gets deployed.

Jun 16 2017, 4:56 PM · Ops, Phacility
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

That we have VPC <-> VPC VPNs instead of subnet <-> subnet VPNs?

Jun 16 2017, 4:12 PM · Ops, Phacility
chad awarded T12847: A Pathway Towards Private Clusters a Baby Tequila token.
Jun 16 2017, 4:32 AM · Ops, Phacility
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

Okie dokie, I'll start evaluating.

Jun 16 2017, 2:03 AM · Ops, Phacility
amckinley closed T12848: Currently at max EC2 instance launch count as Resolved.

All done.

Jun 16 2017, 1:52 AM · Ops, Phacility
amckinley added a comment to T12848: Currently at max EC2 instance launch count.

AWS support case ID is 2243430271.

Jun 16 2017, 1:48 AM · Ops, Phacility
amckinley created T12848: Currently at max EC2 instance launch count.
Jun 16 2017, 1:46 AM · Ops, Phacility

Jun 15 2017

amckinley added a comment to T12847: A Pathway Towards Private Clusters.

I'm concerned that this may turn a 2-month project into a 6-month project if we write something ourselves, or a 12-month project if we use Ansible, Chef or Puppet.

Jun 15 2017, 11:49 PM · Ops, Phacility
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

I'll also suggest that we should implement some kind of "infrastructure as code" tool before going too far down the road of setting this stuff up. I'd really like to get to the point where no one ever has to push buttons in the AWS console unless something is really broken. I don't have super strong feelings about Ansible vs Chef vs Puppet, but I'd recommend against CFEngine and SaltStack.

Jun 15 2017, 10:24 PM · Ops, Phacility
amckinley added a comment to T12847: A Pathway Towards Private Clusters.

As far as requesting certs goes, our "Request Certificate" button in the AWS console will work correctly by sending an email to the domain's owner:

Jun 15 2017, 9:35 PM · Ops, Phacility
amckinley edited the content of 2017 Week 23 (Early June).
Jun 15 2017, 5:16 PM

Jun 14 2017

amckinley added a comment to T12124: Counterintuitive priority setting via maniphest.edit conduit call.

@aurelijus, D18111 is landed now in HEAD if you'd like to test this. Let us know if you have any issues.

Jun 14 2017, 10:10 PM · Maniphest, Conduit, Bug Report
amckinley added a comment to T12846: arc unit uses the config of the current installation when running tests.

Philosophically I'm happy with that. Just as an exercise though, I came up with:

Jun 14 2017, 10:08 PM · Bug Report
amckinley committed rP8008ade9af46: Use keywords instead of ints to update task priority in ManiphestEditEngine (authored by amckinley).
Use keywords instead of ints to update task priority in ManiphestEditEngine
Jun 14 2017, 9:43 PM
amckinley closed T12124: Counterintuitive priority setting via maniphest.edit conduit call as Resolved by committing rP8008ade9af46: Use keywords instead of ints to update task priority in ManiphestEditEngine.
Jun 14 2017, 9:43 PM · Maniphest, Conduit, Bug Report
amckinley closed D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine by committing rP8008ade9af46: Use keywords instead of ints to update task priority in ManiphestEditEngine.
Jun 14 2017, 9:43 PM
amckinley updated the diff for D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.
  • correct collation
Jun 14 2017, 9:42 PM
amckinley added a comment to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.

@epriestley can you sanity check my migration real quick before I land this?

Jun 14 2017, 9:04 PM
amckinley added a comment to T12846: arc unit uses the config of the current installation when running tests.

That makes sense. Probably not much to be done for it (other than potentially doing a config override for maniphest.priorities prior to running ManiphestTaskTestCase).

Jun 14 2017, 8:31 PM · Bug Report
amckinley updated the diff for D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.
  • change validation, add migration
Jun 14 2017, 8:30 PM
amckinley added a comment to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.

As compelling as having status == 🐫🐪 would be, I'm also envisioning issues with parsing email/bot commands if we start allowing !, quote characters, etc. I'm going to change the validation to 1-64 alphanumeric characters that aren't purely numeric and leave it at that.
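
The rule described here (1-64 alphanumeric characters, not purely numeric) can be sketched as below. This is an illustration in Python, not the validation code from D18111; the function name and the choice of ASCII-only alphanumerics are assumptions.

```python
import re

# Hypothetical pattern: 1-64 ASCII letters or digits, nothing else.
_KEYWORD_RE = re.compile(r"[A-Za-z0-9]{1,64}")


def is_valid_priority_keyword(keyword: str) -> bool:
    """Accept 1-64 alphanumeric characters that are not purely numeric."""
    if not _KEYWORD_RE.fullmatch(keyword):
        return False
    # Reject all-digit keywords so they cannot be confused with the old
    # integer priority values.
    return not keyword.isdigit()
```

Under this sketch `"high"` and `"p1"` pass, while `"100"`, `"wont-fix"`, and emoji-only keywords are rejected.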

Jun 14 2017, 7:56 PM
amckinley created T12846: arc unit uses the config of the current installation when running tests.
Jun 14 2017, 7:51 PM · Bug Report
amckinley created T12845: Convert remaining ancient configuration type classes to modern modular type classes.
Jun 14 2017, 7:41 PM · Infrastructure, Config
amckinley added a comment to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.

And just to be clear, you're saying all three of these strings should be following the same validation rules, right?

Jun 14 2017, 7:37 PM
amckinley added a comment to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.

And I can't come up with any real reason not to let users use "🐐🐐🐫🐪" as their primary stored key in the database, so I think weakening the validation is otherwise fine.

Jun 14 2017, 7:30 PM

Jun 13 2017

amckinley added inline comments to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.
Jun 13 2017, 9:29 PM
amckinley added a comment to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.

I couldn't figure out how to test the changes to these classes (but I could probably get email working enough to test email commands and herald actions):

Jun 13 2017, 8:34 PM
amckinley added a comment to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.

This is every callsite that I could find (by grepping for ManiphestTaskPriorityTransaction::TRANSACTIONTYPE). I heavily tested the situation where a user wants to edit a task that's had its priority removed from the config. Also, see attached screenshot of the new keyword validation code in action:

Jun 13 2017, 8:14 PM
amckinley updated the diff for D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.
  • fixing more callsites
  • removing isNewObject() check in ManiphestTaskPriorityTransaction which was breaking assigning priorities to new tasks
Jun 13 2017, 8:12 PM
amckinley updated the diff for D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.
  • added new constant for representing priority keywords that don't exist any longer
  • added validation for priority keywords that's identical to status constant validation (alphanumeric, 1-12 characters)
  • marked priority keywords as required
Jun 13 2017, 7:44 PM

Jun 12 2017

amckinley awarded T12798: Decommission cluster host `repo012` a Baby Tequila token.
Jun 12 2017, 6:36 PM · Ops, Phacility
amckinley added a comment to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.

I'll probably go down the road of making the narrowest fix: new magic keyword constant that ManiphestTaskPriority treats as a no-op, with a nice comment explaining the weird behavior.

Jun 12 2017, 6:30 PM
amckinley added a comment to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.

We need a new UserShotOwnFootDetectorDaemon. How hard could it be to enumerate all possible bad states and test for them?

Jun 12 2017, 6:11 PM
amckinley added a project to E1541: Austin in Vegas: Phacility.
Jun 12 2017, 6:04 PM · Phacility
amckinley created E1541: Austin in Vegas.
Jun 12 2017, 6:03 PM · Phacility
amckinley added a comment to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.

Yeah, I'm not a fan of the way I changed the "current status no longer exists" code. In general I'm starting to feel like this change is too big and already too magical. Are there other transactions that work similarly to how ManiphestPriorityTransactions would work after this diff? Where constants get turned into different representations of the same constants when they get serialized?

Jun 12 2017, 5:37 PM

Jun 9 2017

amckinley updated the diff for D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.
  • more callsites for priority changing
Jun 9 2017, 11:31 PM
amckinley added a comment to D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.

There are a few other callsites that create task priority transactions that still need to be cleaned up. Hopefully nothing else sets task priorities without going through the transactions layer.

Jun 9 2017, 10:58 PM
amckinley created D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.
Jun 9 2017, 10:52 PM
amckinley added a revision to T12124: Counterintuitive priority setting via maniphest.edit conduit call: D18111: Use keywords instead of ints to update task priority in ManiphestEditEngine.
Jun 9 2017, 10:52 PM · Maniphest, Conduit, Bug Report
amckinley added a comment to T12124: Counterintuitive priority setting via maniphest.edit conduit call.

With only those fields:

{
  "data": [
    {
      "name": "Open",
      "value": "open",
      "special": "default",
      "closed": false
    },
    {
      "name": "Resolved",
      "value": "resolved",
      "special": "closed",
      "closed": true
    },
    {
      "name": "Wontfix",
      "value": "wontfix",
      "closed": true
    },
    {
      "name": "Invalid",
      "value": "invalid",
      "closed": true
    },
    {
      "name": "Duplicate",
      "value": "duplicate",
      "special": "duplicate",
      "closed": true
    },
    {
      "name": "Spite",
      "value": "spite",
      "closed": true
    }
  ]
}
Jun 9 2017, 7:44 PM · Maniphest, Conduit, Bug Report