==Overview and Structure==
We support "private" clusters, which are isolated installations of Phabricator that are unique to a specific customer. A "cluster" is a collection of resources in AWS, set up in a way that more-or-less mimics the SaaS environment. Customers are isolated from each other by creating unique [[ https://aws.amazon.com/vpc/ | VPCs ]] for each cluster.
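For reference, a minimal boto3 sketch of what "one VPC per customer cluster" looks like. The CIDR and the customer name are placeholders for illustration; the real provisioning flow also creates subnets, routing, and peering as described in the sections below.

```lang=python
import boto3

# Placeholder region, CIDR, and customer name; see the subnetting scheme below.
ec2 = boto3.client("ec2", region_name="us-west-1")

# Each customer cluster gets its own VPC, which is what provides isolation.
vpc = ec2.create_vpc(CidrBlock="10.1.2.0/24")
vpc_id = vpc["Vpc"]["VpcId"]

# Tag the VPC so it is identifiable as belonging to a specific customer.
ec2.create_tags(
    Resources=[vpc_id],
    Tags=[{"Key": "Name", "Value": "cluster-examplecustomer"}],
)
```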
==Features==
* Dedicated hardware
* Possibility of custom code (not currently implemented)
* Multi-region/multi-AZ installs for performance/availability
==Requirements==
Customers need a minimum install size of two AWS hosts in two different AZs within the same region. Scaling to more than two AZs doesn't make much sense compared to scaling to additional regions, so our infrastructure will always live in `some-region-1a` and `some-region-1b`.
==Admin VPCs==
In each supported AWS region, we have an "admin" VPC that contains a bastion host and has routes into each private cluster VPC via [[ http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-peering.html | AWS VPC peering ]]. This exists mostly to allow Phacility personnel access to private cluster resources, without requiring a dedicated bastion host in each cluster.
==Inter-VPC Connectivity==
AWS does not make this as easy as we would like. For VPCs that live within the same region, AWS provides "VPC peering" for establishing connectivity, but there is a hard limit of 125 peering connections per VPC. Some day we'll hit that limit as we peer admin VPCs with customer VPCs, but today is not that day. For VPCs that live in separate regions, we roll our own connectivity by creating VPN instances in each VPC and establishing routing between them.
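Roughly what establishing a same-region peering looks like with boto3. The VPC IDs, route table IDs, and CIDRs here are placeholders; the real ones come from the admin VPC and the customer VPC being attached.

```lang=python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-1")

admin_vpc_id = "vpc-0aaaaaaaaaaaaaaaa"     # admin VPC in this region (placeholder)
customer_vpc_id = "vpc-0bbbbbbbbbbbbbbbb"  # a customer cluster VPC (placeholder)
admin_cidr = "10.0.0.0/24"
customer_cidr = "10.1.2.0/24"

# Request a peering connection from the admin VPC to the customer VPC, then
# accept it (both VPCs live in the same account and region here).
peering = ec2.create_vpc_peering_connection(
    VpcId=admin_vpc_id,
    PeerVpcId=customer_vpc_id,
)
peering_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=peering_id)

# Each side also needs a route to the other VPC's CIDR via the peering
# connection; the route table IDs are placeholders.
ec2.create_route(
    RouteTableId="rtb-0aaaaaaaaaaaaaaaa",
    DestinationCidrBlock=customer_cidr,
    VpcPeeringConnectionId=peering_id,
)
ec2.create_route(
    RouteTableId="rtb-0bbbbbbbbbbbbbbbb",
    DestinationCidrBlock=admin_cidr,
    VpcPeeringConnectionId=peering_id,
)
```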
==Subnetting==
Because we already use `172.30.0.0/16` for SaaS customers, all private cluster infrastructure lives in `10.0.0.0/8`. In general, each VPC is assigned a distinct `/24`. A `/24` was chosen because 256 addresses should be enough for any customer, and because it's a nice round number. Each `/24` is further subdivided into public and private subnets, with most of the space reserved as spare, as a favor to our future selves who might some day need the space and prefer not to re-IP the entire universe.
This layout is somewhat inefficient, because the amount of gear that needs to live in the public subnet is fixed and small. However, I'd rather have equal-sized subnets than try to pack IPs as efficiently as possible.
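As a sketch of the carving scheme, here's the split computed with Python's `ipaddress` module on a hypothetical customer `/24`; the first four `/27`s become the public and private subnets, and the back half of the block stays reserved as spare.

```lang=python
import ipaddress

# Hypothetical customer /24 used for illustration.
block = ipaddress.ip_network("10.1.2.0/24")

# First four /27s: public-1, public-2, private-1, private-2.
public_1, public_2, private_1, private_2 = list(block.subnets(new_prefix=27))[:4]

# The second /25 (the back half of the /24) is the spare space.
spare = list(block.subnets(new_prefix=25))[1]

print(public_1, public_2, private_1, private_2, spare)
# 10.1.2.0/27 10.1.2.32/27 10.1.2.64/27 10.1.2.96/27 10.1.2.128/25
```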
===Customer Subnets===
Each customer gets a `/24` per region they want hardware in, with two public and two private subnets. The smallest subnet that can be created within a VPC is a `/28` (16 addresses), but that seems a little tight, so we use `/27`s with the following layout, given an example customer that receives the subnet `10.1.2.0/24` in `us-west-1`:
| Availability Zone | Name | Subnet
| ----------------- | ---- | ------
| `us-west-1a` | `public-1` | `10.1.2.0/27`
| `us-west-1b` | `public-2` | `10.1.2.32/27`
| `us-west-1a` | `private-1` | `10.1.2.64/27`
| `us-west-1b` | `private-2` | `10.1.2.96/27`
| N/A | Unused | `10.1.2.128/25`
Visually: {F5169361}
Note that we don't actually create the "spare" subnet, because why bother?
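A boto3 sketch of creating the four real subnets from the table above; the VPC ID is a placeholder, and the "spare" `/25` is intentionally never created.

```lang=python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-1")
vpc_id = "vpc-0bbbbbbbbbbbbbbbb"  # the customer VPC (placeholder)

# AZ / name / CIDR, matching the table above.
layout = [
    ("us-west-1a", "public-1", "10.1.2.0/27"),
    ("us-west-1b", "public-2", "10.1.2.32/27"),
    ("us-west-1a", "private-1", "10.1.2.64/27"),
    ("us-west-1b", "private-2", "10.1.2.96/27"),
]

for az, name, cidr in layout:
    subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock=cidr, AvailabilityZone=az)
    ec2.create_tags(
        Resources=[subnet["Subnet"]["SubnetId"]],
        Tags=[{"Key": "Name", "Value": name}],
    )
```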
===Admin VPCs===
Admin VPCs only have public subnets, and therefore have more unused space, but are otherwise identical. For improved bastion availability, we create one bastion per AZ. Admin VPC subnets are hard-coded and laid out as follows:
| Availability Zone | Subnet
| ----------------- | ------
| `us-west-1a` | `10.0.0.0/27`
| `us-west-1b` | `10.0.0.32/27`
| `us-east-1a` | `10.0.1.0/27`
| `us-east-1b` | `10.0.1.32/27`
==DNS==
==Load Balancing==
For each customer region, we create two load balancers: a "classic" Elastic Load Balancer for terminating SSH via TCP on port 22, and an "application" Load Balancer for terminating HTTPS on port 443 for web traffic, HTTP on port 80 for redirecting to HTTPS, and notifications on port 22280. We use AWS IAM certificates to terminate HTTPS at the edge, so we don't have to store our actual TLS certificates on the hosts.
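A rough boto3 sketch of the two load balancers. The names, subnet IDs, certificate ARN, and target group ARN are placeholders; the notifications listener on 22280 follows the same pattern as the others and is omitted here for brevity.

```lang=python
import boto3

# Placeholder public subnets for the customer's region.
subnets = ["subnet-0aaaaaaaaaaaaaaaa", "subnet-0bbbbbbbbbbbbbbbb"]

# Classic ELB: pass SSH through as plain TCP on port 22.
elb = boto3.client("elb", region_name="us-west-1")
elb.create_load_balancer(
    LoadBalancerName="examplecustomer-ssh",
    Listeners=[{
        "Protocol": "TCP",
        "LoadBalancerPort": 22,
        "InstanceProtocol": "TCP",
        "InstancePort": 22,
    }],
    Subnets=subnets,
)

# Application LB: HTTPS on 443 terminated with an IAM server certificate,
# plus HTTP on 80 redirected to HTTPS.
elbv2 = boto3.client("elbv2", region_name="us-west-1")
alb = elbv2.create_load_balancer(
    Name="examplecustomer-web",
    Subnets=subnets,
    Type="application",
)
alb_arn = alb["LoadBalancers"][0]["LoadBalancerArn"]

cert_arn = "arn:aws:iam::123456789012:server-certificate/examplecustomer"
web_target_group_arn = (
    "arn:aws:elasticloadbalancing:us-west-1:123456789012:"
    "targetgroup/examplecustomer-web/0123456789abcdef"
)

elbv2.create_listener(
    LoadBalancerArn=alb_arn,
    Protocol="HTTPS",
    Port=443,
    Certificates=[{"CertificateArn": cert_arn}],
    DefaultActions=[{"Type": "forward", "TargetGroupArn": web_target_group_arn}],
)
elbv2.create_listener(
    LoadBalancerArn=alb_arn,
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{
        "Type": "redirect",
        "RedirectConfig": {"Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"},
    }],
)
```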
==Instances==
A customer must have at least two instances in two distinct AZs within a single region. These are `m4.large` instances, running Ubuntu 16.04. Each instance runs the entire suite of Phabricator services, including MySQL, Phabricator, and Aphlict.
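A minimal boto3 sketch of launching the pair of hosts, one per AZ. The AMI ID is a placeholder for an Ubuntu 16.04 image, and the subnet IDs are placeholders for the customer's private subnets (each subnet pins the instance to its AZ).

```lang=python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-1")

# One instance per private subnet; the subnet determines the AZ.
for subnet_id in [
    "subnet-0ccccccccccccccc1",  # private-1 in us-west-1a (placeholder)
    "subnet-0ccccccccccccccc2",  # private-2 in us-west-1b (placeholder)
]:
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder Ubuntu 16.04 AMI
        InstanceType="m4.large",
        MinCount=1,
        MaxCount=1,
        SubnetId=subnet_id,
    )
```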
==Storage==
File data lives in S3; block storage is provided by EBS volumes attached to each instance.