
Support Aphlict clustering
ClosedPublic

Authored by epriestley on Apr 14 2016, 4:10 PM.
Tags
None
Referenced Files
F11033220: D15711.id37854.diff
Sun, Aug 14, 1:21 PM
F11032321: D15711.id.diff
Sun, Aug 14, 7:49 AM
Subscribers
None

Details

Summary

Ref T6915. This allows multiple notification servers to talk to each other:

  • Every server has a list of every other server, including itself.
  • Every server generates a unique fingerprint at startup, like "XjeHuPKPBKHUmXkB".
  • Every time a server gets a message, it marks it with its personal fingerprint, then sends it to every other server.
  • Servers do not retransmit messages that they've already seen (already marked with their fingerprint).
  • A server learns a peer's fingerprint after sending that peer a message, and then stops sending it messages it has already seen.

This is pretty crude, and the first message to a cluster will transmit N^2 times, but N is going to be like 3 or 4 in even the most extreme cases for a very long time.

The fingerprinting stops cycles, and stops servers from sending themselves copies of messages.
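
As a rough illustration of the rules above, here is a minimal in-memory sketch. The `Server` class, its attribute names, and the idea that a receiver replies with its own fingerprint are assumptions made for this example, not the code in this diff. Network calls are replaced with direct method calls.

```python
import secrets


class Server:
    """Hypothetical model of one notification server in the cluster."""

    def __init__(self, name):
        self.name = name
        self.fingerprint = secrets.token_urlsafe(12)  # unique per startup
        self.peers = []      # every server in the cluster, including self
        self.known = {}      # peer name -> fingerprint learned from a reply
        self.delivered = []  # messages handed to locally connected clients

    def receive(self, body, touched, log):
        """Process one copy; reply with our fingerprint so the sender learns it."""
        if self.fingerprint in touched:
            return self.fingerprint  # already marked by us: do not retransmit
        touched = touched | {self.fingerprint}  # mark with our fingerprint
        self.delivered.append(body)
        for peer in self.peers:
            if self.known.get(peer.name) in touched:
                continue  # this peer is known to have seen the message
            log.append((self.name, peer.name))
            self.known[peer.name] = peer.receive(body, touched, log)
        return self.fingerprint


def make_cluster(names):
    """Build servers that each list every server, including themselves."""
    servers = [Server(name) for name in names]
    for server in servers:
        server.peers = list(servers)
    return servers


a, b = make_cluster(["a", "b"])
first, second = [], []
a.receive("hello", frozenset(), first)
a.receive("again", frozenset(), second)
print(len(first))   # 4: nothing learned yet, so N^2 sends including self-sends
print(len(second))  # 1: a skips itself, and b skips both a and itself
```

With two servers, the first message costs N^2 = 4 transmissions because no fingerprints are known yet. Once they are learned, the self-sends and the echo back to the origin disappear.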

We don't need to do anything more sophisticated than this because it's fine if some notifications get lost when a server dies. Clients will reconnect after a short period of time and life will continue.

Test Plan
  • Wrote two server configs.
  • Started two servers.
  • Told Phabricator about all four services.
  • Loaded Chrome and Safari.
  • Saw them connect to different servers.
  • Sent messages in one, got notifications in the other (magic!).
  • Saw the fingerprinting stuff work on the console, no infinite retransmission of messages, etc.

(This pretty much just worked when I ran it the first time so I probably missed something?)

Screen Shot 2016-04-14 at 9.09.55 AM.png (705×1 px, 120 KB)

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision to Support Aphlict clustering.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: chad.

For N > 2 we'll get some additional retransmission, where node 1 sends to nodes 2 + 3 and nodes 2/3 send to each other. The volume/cost of these messages is so low that I think that's fine for now, and at N=4 (which is huge) we only get 12 transmissions, or 3 extra transmissions per server.
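
The figures above can be reproduced by counting one copy per ordered pair of distinct nodes. The helper name below is made up for this example, and the count is only that simple model, not a measurement.

```python
def full_mesh_transmissions(n):
    """Copies sent when each of n nodes sends once to each of the other n - 1."""
    return n * (n - 1)


for n in (2, 3, 4):
    total = full_mesh_transmissions(n)
    print(n, total, total // n)  # nodes, total sends, sends per node
# prints:
# 2 2 1
# 3 6 2
# 4 12 3
```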

A simple improvement would be to just mark the message with all the nodes the origin is transmitting to and assume success.
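
A sketch of what that improvement might look like, under some loud assumptions: `Node` is a hypothetical class, every peer's fingerprint is already known, and a recipient still delivers a message that arrives pre-marked with its own fingerprint (the marks are only used to decide where to forward). None of this is in the diff.

```python
class Node:
    """Hypothetical node whose peers' fingerprints are already known."""

    def __init__(self, fingerprint):
        self.fingerprint = fingerprint
        self.peers = []     # the other nodes in the cluster
        self.delivered = 0  # copies handed to local clients

    def originate(self, log):
        self.delivered += 1
        # Pre-mark with every recipient's fingerprint and assume the sends succeed.
        touched = {self.fingerprint} | {p.fingerprint for p in self.peers}
        for peer in self.peers:
            log.append((self.fingerprint, peer.fingerprint))
            peer.relay(touched, log)

    def relay(self, touched, log):
        self.delivered += 1
        # Forward only to peers the message is not already marked for.
        for peer in self.peers:
            if peer.fingerprint not in touched:
                log.append((self.fingerprint, peer.fingerprint))
                peer.relay(touched | {peer.fingerprint}, log)


nodes = [Node(f"n{i}") for i in range(4)]
for node in nodes:
    node.peers = [p for p in nodes if p is not node]

log = []
nodes[0].originate(log)
print(len(log))                      # 3: one send per peer, no cross-traffic
print([n.delivered for n in nodes])  # [1, 1, 1, 1]
```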

To get a slightly bigger N, you can use this configuration option as-is to arrange nodes in a tree or a ring or whatever else instead of a big crosslinked mess.
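
For example, in a ring each node would list only its successor. The sketch below counts sends for one message in that layout. It assumes fingerprints have already been learned and uses node indexes in place of fingerprints; the function name is made up for this example.

```python
def ring_transmissions(n):
    """Count sends for one message entering node 0 of an n-node ring."""
    touched = {0}  # node indexes stand in for fingerprints here
    node, sends = 0, 0
    # Each node forwards to its successor unless the successor has marked the message.
    while (node + 1) % n not in touched:
        node = (node + 1) % n
        touched.add(node)
        sends += 1
    return sends


print(ring_transmissions(4))  # 3: the last node skips the origin
```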

At some value of N, where this is handling way more data than any current install ever has, we'll have to make it smarter, but I think this has vast headroom over what any install that exists today actually needs.

chad edited edge metadata.
This revision is now accepted and ready to land. Apr 14 2016, 8:21 PM
This revision was automatically updated to reflect the committed changes.