Allow the Aphlict server to scale across multiple hosts
Closed, ResolvedPublic

Description

Currently there is no way to scale the Aphlict server... it has to be run on a single host.

joshuaspence updated the task description. (Show Details)
joshuaspence raised the priority of this task from to Needs Triage.
joshuaspence added a subscriber: joshuaspence.
epriestley renamed this task from The Aphlict server doesn't scale well to Allow the Aphlict server to scale across multiple hosts.Jan 9 2015, 2:31 PM
epriestley triaged this task as Low priority.

I believe no install will encounter issues with this for a very long time, but we can scale it like this:

  • Let the server connect to another Aphlict server and listen for all notifications, then re-publish them as thought it received them from Phabricator.
  • (Maybe) Add a config option to let clients connect to a random node from a list. All installs large enough to need this might just be, e.g., terminating SSL at an LB or reverse proxy which can handle this for free anyway, though. Assuming we don't have other tech in the stack, we'd randomly tell clients to connect to notification1.example.com, notification2.example.com, etc.
  • Then, to have N servers, launch one master and N sub-servers listening to it. Notifications flow from Phabricator to the master to the sub-servers to the clients.
  • When N is very small, the master could still handle clients too. For larger N, it could be dedicated.

While I haven't run Node in a real production scenario, I assume (based on some slides I saw once that assert Node is "web scale") that the scalability limit of a single server is on the order of a thousand or more simultaneous clients, which is a good bit larger than any install that exists today.

In the short term, I'm planning to improve support for scaling the server down (T7012).

Does that solve the scaling issue though? Essentially, all Aphlict servers need to be aware of (I.e. store in memory) all clients and all subscriptions. Realistically, I wouldn't expect this to cause any issues for any installs today, but possibly as Aphlict receives more usage it might?

As an aside, I still think it would be nice to send the notification contents rather than just the key (to avoid the extra request to /notification/individual/). Possibly this would be easier with WebSockets than it would have been with Flash.

The master server would only know about the subservers.

The subservers would only know about 1/Nth of the clients.

(Likely, the subservers would just listen to everything, not relay subscriptions to the master, so it wouldn't need to manage subscriptions.)

Notifications will still pass through every server, but I believe no install will ever emit so many notifications that it outscales a single server on the push pathway. If we do get there, likely the application or use case or design constraints have changed somehow and there will be some sensible way to partition the traffic.

Would this task be available for paid prioritization?

Yes, but are you sure you need it? What are you seeing which indicates a notification server scaling issue?

Actually, I think you are correct.

So I setup an elastic load balancer in front of our Phabricator install and I plan to setup multiple app servers behind the LB. Because I am sending the websocket traffic over port 443 this means that the notifications will be routed through the load balancer as well, which I figured meant that I need Aphlict to scale across all web hosts. A colleague pointed out to me, however, that the web hosts can just proxy the websocket traffic back to a single notifications host. Based on this, you can probably disregard my request.

Yeah, this is what we do in the Phacility cluster. We have every instance sharing one notification server with software instancing right now and the box is <1% utilized.

Oh, actually, our setup isn't quite like yours. There's one notification LB which sends traffic to one box. Instances just have nlb.phacility.com configured. So no web-tier proxying.

eadler added a project: Restricted Project.Jan 9 2016, 12:49 AM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
epriestley moved this task from Backlog to vNext on the Aphlict board.Apr 13 2016, 1:47 PM

I'm currently thinking about making this configuration push-based instead of pull-based: each server is told about other servers it should publish to when it starts up. When it receives a notification, it publishes to all those servers (with reasonable mechanisms in place to prevent cycles).

This makes it somewhat harder to change the shape of the notification cluster at runtime (you need to restart masters to add new repeater servers, if any install ever gets to that level) but easier to configure in general and much easier to configure when focused on availability, which I suspect is where ~100% of the focus will be for a very, very long time. Basically, you'd just list every server you run in the configuration:

"cluster": [
  {
    "host": "notify001.west.company.com",
    "port": ...
  },
  {
    "host": "notify002.east.company.com",
    "port": ...
  }
]

The server would figure out which of these hosts is the host itself, and not bother sending itself any notifications. Otherwise, this should just be a clean multi-master setup which is pretty hard to get wrong and easy to understand/configure.

epriestley raised the priority of this task from Low to Normal.Apr 14 2016, 4:13 PM
epriestley closed this task as Resolved.Apr 15 2016, 1:24 PM
epriestley claimed this task.

This is now in production and appears to be working properly.

This is a very low-budget cluster implementation with a focus on ease of configuration and improved availability over scalability, and will only work well up to a small number of servers: if you put 100 of them in production and follow the configuration instructions to create a fully connected mesh, you'll get message retransmissions on the order of O(N^3) and everything will burn itself out.

However, I expect all installs which pursue this today to run something like 2-4 servers (say, one per zone, in 2-4 zones), and the overhead is small at those levels.

If you have extreme availability requirements you can get up to a slightly larger N by adjusting the topology and putting the servers in a ring or ladder or some fancy 4-space array. If you don't have a fully connected mesh then losing multiple nodes can segment the network, leaving clients with two different views of notifications. But none of this actually matters: getting some notifications is still good, and even if the whole thing goes down, users just won't get notifications until they reload the page.

More than anything, this approach makes it really easy to run websocket traffic through the same loadbalancer you're running normal web traffic through and have it hit the same servers if you have 2-4 web nodes, which is common (and the setup we run here). If you're outscaling this and putting 8-10 webservers into production, separate notification servers first and run one per zone (or two if you're insanely paranoid about availability).