
Build "Almanac", a service/host/device directory
Closed, Resolved, Public

Description

See https://secure.phabricator.com/chatlog/channel/6/?at=156708 for the naming discussion, in case anyone objects or comes up with something better before this lands. The leading alternatives were "Startchart", "Proclaim", and "Fleet"/"Armada".

Almanac is a service/host/device directory, similar to SMC at Facebook or the DNS system. Major objects are something like:

  • Hosts/Devices -- a physical (or maybe virtual) device which may have services on it, like box001.datacenter.company.com.
  • Services -- a named service, like www.company.com.

Roughly, a service is defined as a group of devices. For example, the www.company.com service might be the HTTP load balancers lb001.company.com, lb002.company.com, and lb003.company.com.

Clients can ask it DNS-like questions such as:

  • How do I connect to www.company.com?
  • How do I connect to mysql.phabricator.company.com, for writing?
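A minimal sketch of these two lookups, assuming the simple model above in which a service is just a named group of devices. The `Device` and `Directory` classes, the `role` tag, and the `db001`/`db002` hostnames are hypothetical and only illustrate the shape of a directory query; they are not Almanac's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    # Hypothetical role tag; "readwrite" devices can also serve reads.
    role: str = "readwrite"


@dataclass
class Directory:
    # Service name -> the group of devices that provide it.
    services: dict[str, list[Device]] = field(default_factory=dict)

    def resolve(self, service: str, writable: bool = False) -> list[str]:
        """Answer "how do I connect to <service>?", optionally for writing."""
        if service not in self.services:
            raise KeyError(f"unknown service: {service}")
        devices = self.services[service]
        if writable:
            devices = [d for d in devices if d.role == "readwrite"]
        return [d.name for d in devices]


directory = Directory({
    "www.company.com": [
        Device("lb001.company.com"),
        Device("lb002.company.com"),
        Device("lb003.company.com"),
    ],
    "mysql.phabricator.company.com": [
        Device("db001.company.com"),                   # hypothetical master
        Device("db002.company.com", role="readonly"),  # hypothetical replica
    ],
})

print(directory.resolve("www.company.com"))
print(directory.resolve("mysql.phabricator.company.com", writable=True))
```

Returning every device for a read but only `readwrite` devices for a write mirrors the master/slave distinction mentioned below; a real client would still have to pick one answer and fail over if it is unreachable.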

Administrators can use the web UI to change the answers to these questions (e.g., bring servers up / down, move hosts in/out of service pools, change host attributes to control masters/slaves).

In the short term, we want to be able to deploy Phabricator across multiple machines easily. Some installs are nearing the effective one-machine scalability limit when using hosted repositories, and we want to define a large service cluster for SAAS. Putting this information inside Phabricator will let us do this in a reasonable way, and solve tangential issues like allowing machines to identify one another for authentication.

In the mid term, we want to deploy SAAS, and this tooling can make it easier to manage hosts, let us build Phage, etc. Drydock and Harbormaster can also benefit from centralized host management, the ability to define pools, etc.

In the long term, this could be a general-purpose service.

A tangential but reasonable feature is to support host monitoring/detection (like Reticle) so we can show if a service is up or down and report basics like deployed software versions, at a minimum. Eventually this could be more full-featured.
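A rough sketch of how per-device reports could roll up into a service-level up/down status and a list of deployed versions. The `Heartbeat` record, the 60-second staleness window, and the rule that a service is up when at least one of its devices is up are all assumptions for illustration, not a described design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Hypothetical status report a device sends periodically."""
    device: str
    reported_at: float  # Unix timestamp of the report
    version: str        # deployed software version the device reported


def service_status(heartbeats: list[Heartbeat], now: float, max_age: float = 60.0) -> dict:
    """Summarize a service from the latest heartbeat of each of its devices.

    A device counts as up if it reported within `max_age` seconds of `now`;
    the service counts as up if at least one device is up.
    """
    fresh = [hb for hb in heartbeats if now - hb.reported_at <= max_age]
    return {
        "up": bool(fresh),
        "devices_up": sorted(hb.device for hb in fresh),
        "versions": sorted({hb.version for hb in fresh}),
    }


now = 1_000_000.0
beats = [
    Heartbeat("lb001.company.com", now - 5, "abc123"),    # recent: up
    Heartbeat("lb002.company.com", now - 600, "abc123"),  # stale: treated as down
]
print(service_status(beats, now))
```

Requiring every device (or a quorum) to be up instead would only change how the `up` field is computed.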

Event Timeline

epriestley claimed this task.
epriestley raised the priority of this task to Normal.
epriestley updated the task description.
epriestley added a project: Almanac.
epriestley added subscribers: epriestley, chad, btrahan, hach-que.

I already ordered "Almanac 2016 Launch" T-Shirts, so we're stuck on that name.

Phacility Grand Strategic Plan:

  1. Build new beta applications and increase scope.
  2. Never finish anything or make any money.
  3. ???
  4. Bankruptcy.
In T5833#5, @chad wrote:

I already ordered "Almanac 2016 Launch" T-Shirts, so we're stuck on that name.

Are you serious?

If not, I would recommend using the archaic English spelling "almanach" (still current in French), with an h, so that adding the h gives us the closest alternative to a -ph- name.

Some discussion in IRC starting here: https://secure.phabricator.com/chatlog/channel/6/?at=159258

A rough outline:

  • Device: A physical (or virtual or whatever, but approximately physical) device. Usually a server, but maybe a router or load balancer. (And maybe an employee cell phone or laptop if you just want a reasonable way to keep track of them.)
  • Network: Basically a namespace for addresses. Most stuff will probably be on "the public internet", but if you have VPNs/VPCs this can let Almanac be smart about two devices with IP address "10.0.0.3" not being a conflict if they're in the NYC and SF corpnets. Especially initially, the UI should smooth this over as much as possible.
  • Interface: Describes a way to connect to a device, usually a <network, address> pair. Devices have interfaces. Most devices will have one interface, but may have more than one if they have public / private IPs.
  • Service: Some high-level business service like "Phabricator MySQL".
  • Binding: Binds services to interfaces, letting you say "these are the places to connect to if you need to talk to Phabricator MySQL: A (read/write), B (readonly), C (readonly), D (disabled/maintenance)"
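One way to model this outline, as a hedged sketch: the field names, the three binding modes, and the `interface_key`/`connect_targets` helpers are assumptions made for illustration, not Almanac's schema. It shows the two points the outline stresses: an address is only unique within its network, and a binding's mode decides whether an interface is handed out for a given request.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    name: str  # e.g. "NYC corpnet"; a namespace for addresses


@dataclass(frozen=True)
class Interface:
    device: str
    network: Network
    address: str
    port: int


@dataclass(frozen=True)
class Binding:
    service: str
    interface: Interface
    mode: str  # hypothetical modes: "readwrite", "readonly", "disabled"


def interface_key(iface: Interface) -> tuple[str, str, int]:
    # Addresses are only unique within a network, so the network is part of the key.
    return (iface.network.name, iface.address, iface.port)


def connect_targets(bindings: list[Binding], service: str,
                    writable: bool = False) -> list[Interface]:
    """Return the interfaces to try for `service`; disabled bindings are skipped."""
    allowed = {"readwrite"} if writable else {"readwrite", "readonly"}
    return [b.interface for b in bindings
            if b.service == service and b.mode in allowed]


nyc, sf = Network("NYC corpnet"), Network("SF corpnet")
a = Interface("db-a", nyc, "10.0.0.3", 3306)
b = Interface("db-b", sf, "10.0.0.3", 3306)  # same IP as A, different network
c = Interface("db-c", nyc, "10.0.0.4", 3306)
d = Interface("db-d", nyc, "10.0.0.5", 3306)
bindings = [
    Binding("Phabricator MySQL", a, "readwrite"),
    Binding("Phabricator MySQL", b, "readonly"),
    Binding("Phabricator MySQL", c, "readonly"),
    Binding("Phabricator MySQL", d, "disabled"),  # maintenance
]

print(interface_key(a) != interface_key(b))
print([i.device for i in connect_targets(bindings, "Phabricator MySQL", writable=True)])
print([i.device for i in connect_targets(bindings, "Phabricator MySQL")])
```

In this model, bringing D back from maintenance means replacing its binding with one in a different mode, a data change rather than a code or config change, which fits the idea of administrators changing these answers from the web UI.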

The specific functionality I want in v0 is support for these SAAS use cases:

Instance Management: When a web machine hosting multiple colocated Phabricator instances receives a request to something.site.com, it should be able to identify which instance (if any) it should execute by querying Almanac. In particular, if a user requests xyz.site.com, the host should be able to query Almanac to answer these questions:

  • Does xyz.site.com exist, or should we 404?
  • Where are the bootstrapping services (notably, the database) for the xyz instance?
  • What additional settings should be configured for this install before executing the main codepath (e.g., readonly mode, service notices, other forced config settings)?

Roughly, this means that each instance will have an AlmanacService entry.
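A minimal sketch of that per-request lookup, assuming instances are single-label subdomains of one base domain. `InstanceService`, `InstanceNotFound`, `route_request`, the `db007.internal` host, and the `read-only` override key are hypothetical names; a real deployment would query Almanac instead of an in-memory dict.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InstanceService:
    """Hypothetical stand-in for the AlmanacService entry of one instance."""
    name: str
    database_host: str  # where the instance's bootstrapping database lives
    config_overrides: dict[str, object] = field(default_factory=dict)


class InstanceNotFound(Exception):
    """No instance exists for this hostname; the web tier should return a 404."""


def route_request(host: str, base_domain: str,
                  services: dict[str, InstanceService]) -> InstanceService:
    host = host.lower()
    suffix = "." + base_domain.lower()
    if not host.endswith(suffix):
        raise InstanceNotFound(host)
    instance = host[: -len(suffix)]
    # Accept only a single label such as "xyz" in front of the base domain.
    if not instance or "." in instance or instance not in services:
        raise InstanceNotFound(host)
    return services[instance]


services = {
    # "read-only" is a hypothetical override key, not a real config option name.
    "xyz": InstanceService("xyz", "db007.internal", {"read-only": True}),
}
svc = route_request("xyz.site.com", "site.com", services)
print(svc.database_host, svc.config_overrides)
```

Raising an exception for unknown hosts keeps the 404 decision in one place: the web tier catches `InstanceNotFound` and renders the error page before any instance code runs.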

Repository Location: When an instanced Phabricator or clustered Phabricator needs to make a Conduit service call, it should be able to use Almanac to figure out which host it should connect to. Specifically, something like:

  • The PhabricatorRepository lists a servicePHID.
  • The servicePHID points to an AlmanacService which describes which hosts a repository is located on.
  • (We probably need to push this down to installs rather than having them talk to the mothership, which might open up some sync questions.)
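The lookup chain above, sketched under assumptions: `Repository`, `AlmanacServiceRecord`, and `repository_hosts` are hypothetical stand-ins for PhabricatorRepository and AlmanacService, and `PHID-TEST-0001` is a placeholder rather than a real PHID. Per the last bullet, the `services` mapping is imagined as data already pushed down to the install.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Repository:
    name: str
    service_phid: str | None = None  # None: the repository is not clustered


@dataclass
class AlmanacServiceRecord:
    phid: str
    hosts: list[str] = field(default_factory=list)


def repository_hosts(repo: Repository,
                     services: dict[str, AlmanacServiceRecord]) -> list[str]:
    """Return the hosts to send a repository's Conduit calls to."""
    if repo.service_phid is None:
        return []  # not clustered: the caller handles the request locally
    service = services.get(repo.service_phid)
    if service is None:
        raise LookupError(f"{repo.name}: unknown service {repo.service_phid}")
    return list(service.hosts)


services = {
    "PHID-TEST-0001": AlmanacServiceRecord(
        "PHID-TEST-0001", ["repo001.company.com", "repo002.company.com"]),
}
print(repository_hosts(Repository("example", "PHID-TEST-0001"), services))
print(repository_hosts(Repository("local-only"), services))
```

Treating a dangling servicePHID as an error, rather than silently falling back to local access, is a choice made in this sketch: it surfaces exactly the kind of sync problem the last bullet worries about.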

Device Identity and Trust: This is partially implemented, but Almanac should provide a framework for managing device keys and establishing device trust in clusters and instanced environments. For example:

  • When a daemon makes a Conduit request for repository information, it should be able to sign the request with the calling device key and authenticate to the remote host.
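A stdlib-only sketch of signing and verifying such a request. It swaps in an HMAC over a shared per-device secret for the device key, which is simpler to show but means the remote host must hold the same secret; an asymmetric keypair would avoid that. `sign_request`, `verify_request`, the request shape, and the key table are all hypothetical, and the sketch has no replay protection (no timestamp or nonce).

```python
from __future__ import annotations

import hashlib
import hmac
import json


def _signed_bytes(device: str, params: dict) -> bytes:
    # Canonical JSON so both sides sign exactly the same bytes.
    body = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return (device + "\n" + body).encode()


def sign_request(device: str, secret: bytes, params: dict) -> dict:
    """Attach the calling device's name and an HMAC-SHA256 signature."""
    sig = hmac.new(secret, _signed_bytes(device, params), hashlib.sha256).hexdigest()
    return {"device": device, "params": params, "signature": sig}


def verify_request(request: dict, trusted_keys: dict[str, bytes]) -> bool:
    """Accept the request only if it is signed by a known device's key."""
    secret = trusted_keys.get(request["device"])
    if secret is None:
        return False  # unknown device: not trusted
    expected = hmac.new(secret, _signed_bytes(request["device"], request["params"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, request["signature"])


# Hypothetical key table shared between the daemon host and the remote host.
keys = {"daemon001.company.com": b"example-secret"}
req = sign_request("daemon001.company.com", keys["daemon001.company.com"],
                   {"repository": "example"})
print(verify_request(req, keys))
```

`hmac.compare_digest` is used for the comparison so that checking a forged signature does not leak timing information.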

We're planning to put this into production shortly. There's some followup work in T6741 but the remaining part is nonblocking.