D15711.id37852.diff
diff --git a/src/applications/aphlict/management/PhabricatorAphlictManagementWorkflow.php b/src/applications/aphlict/management/PhabricatorAphlictManagementWorkflow.php
--- a/src/applications/aphlict/management/PhabricatorAphlictManagementWorkflow.php
+++ b/src/applications/aphlict/management/PhabricatorAphlictManagementWorkflow.php
@@ -76,6 +76,7 @@
array(
'servers' => 'list<wild>',
'logs' => 'optional list<wild>',
+ 'cluster' => 'optional list<wild>',
'pidfile' => 'string',
));
} catch (Exception $ex) {
@@ -193,7 +194,7 @@
'admin'));
}
- $logs = $data['logs'];
+ $logs = idx($data, 'logs', array());
foreach ($logs as $index => $log) {
PhutilTypeSpec::checkMap(
$log,
@@ -219,6 +220,54 @@
}
}
+ $peer_map = array();
+
+ $cluster = idx($data, 'cluster', array());
+ foreach ($cluster as $index => $peer) {
+ PhutilTypeSpec::checkMap(
+ $peer,
+ array(
+ 'host' => 'string',
+ 'port' => 'int',
+ 'protocol' => 'string',
+ ));
+
+ $host = $peer['host'];
+ $port = $peer['port'];
+ $protocol = $peer['protocol'];
+
+ switch ($protocol) {
+ case 'http':
+ case 'https':
+ break;
+ default:
+ throw new PhutilArgumentUsageException(
+ pht(
+ 'Configuration file specifies cluster peer ("%s", at index '.
+ '"%s") with an invalid protocol, "%s". Valid protocols are '.
+ '"%s" or "%s".',
+ $host,
+ $index,
+ $protocol,
+ 'http',
+ 'https'));
+ }
+
+ $peer_key = "{$host}:{$port}";
+ if (!isset($peer_map[$peer_key])) {
+ $peer_map[$peer_key] = $index;
+ } else {
+ throw new PhutilArgumentUsageException(
+ pht(
+ 'Configuration file specifies cluster peer "%s" more than '.
+ 'once (at indexes "%s" and "%s"). Each peer must have a '.
+ 'unique host and port combination.',
+ $peer_key,
+ $peer_map[$peer_key],
+ $index));
+ }
+ }
+
$this->configData = $data;
$this->configPath = $full_path;
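The hunk above validates each `cluster` peer: the protocol must be `"http"` or `"https"`, and each `host:port` pair may appear only once. As a rough standalone sketch (illustrative only, not part of this patch), the same checks look like this in JavaScript:

```javascript
// Sketch of the peer validation performed by the PHP hunk above:
// reject unknown protocols and duplicate host:port combinations.
function validateCluster(cluster) {
  var seen = {};
  for (var ii = 0; ii < cluster.length; ii++) {
    var peer = cluster[ii];
    if (peer.protocol !== 'http' && peer.protocol !== 'https') {
      throw new Error(
        'Cluster peer "' + peer.host + '" (at index ' + ii + ') has an ' +
        'invalid protocol, "' + peer.protocol + '". Valid protocols are ' +
        '"http" or "https".');
    }
    var key = peer.host + ':' + peer.port;
    if (seen.hasOwnProperty(key)) {
      throw new Error(
        'Cluster peer "' + key + '" is listed more than once (at indexes ' +
        seen[key] + ' and ' + ii + '). Each peer must have a unique host ' +
        'and port combination.');
    }
    seen[key] = ii;
  }
}
```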
diff --git a/src/applications/notification/client/PhabricatorNotificationClient.php b/src/applications/notification/client/PhabricatorNotificationClient.php
--- a/src/applications/notification/client/PhabricatorNotificationClient.php
+++ b/src/applications/notification/client/PhabricatorNotificationClient.php
@@ -19,6 +19,9 @@
public static function tryToPostMessage(array $data) {
$servers = PhabricatorNotificationServerRef::getEnabledAdminServers();
+
+ shuffle($servers);
+
foreach ($servers as $server) {
try {
$server->postMessage($data);
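The one-line `shuffle($servers)` above spreads publishing traffic randomly across the admin servers instead of always hammering the first one in the list. A standalone sketch of that strategy (illustrative only, and assuming a first-success delivery variant; `postToAnyServer` and `post` are hypothetical names, not part of this patch):

```javascript
// Shuffle the server list, then try servers in the shuffled order until
// one accepts the message. Returns the server that accepted, or null.
function postToAnyServer(servers, post) {
  // Fisher-Yates shuffle, so each caller favors a random server and
  // write traffic spreads roughly evenly across the cluster.
  for (var ii = servers.length - 1; ii > 0; ii--) {
    var jj = Math.floor(Math.random() * (ii + 1));
    var swap = servers[ii];
    servers[ii] = servers[jj];
    servers[jj] = swap;
  }

  for (var kk = 0; kk < servers.length; kk++) {
    try {
      post(servers[kk]);
      return servers[kk];
    } catch (err) {
      // Delivery is best-effort: ignore the failure and try the next
      // server in the shuffled order.
    }
  }
  return null;
}
```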
diff --git a/src/docs/user/cluster/cluster.diviner b/src/docs/user/cluster/cluster.diviner
--- a/src/docs/user/cluster/cluster.diviner
+++ b/src/docs/user/cluster/cluster.diviner
@@ -132,6 +132,11 @@
naturally somewhat resistant to data loss: every clone of a repository includes
the entire history.
+Repositories may become a scalability bottleneck, although this is rare unless
+your install has an unusually heavy repository read volume. Slow clones/fetches
+may hint at a repository capacity problem. Adding more repository hosts will
+provide an approximately linear increase in capacity.
+
For details, see @{article:Cluster: Repositories}.
@@ -146,6 +151,13 @@
at least one host remains alive. Daemons are stateless, so spreading daemons
across multiple hosts provides no resistance to data loss.
+Daemons can become a bottleneck, particularly if your install sees a large
+volume of write traffic to repositories. If the daemon task queue has a
+backlog, that hints at a capacity problem. If existing hosts have unused
+resources, increase `phd.taskmasters` until they are fully utilized. From
+there, adding more daemon hosts will provide an approximately linear increase
+in capacity.
+
For details, see @{article:Cluster: Daemons}.
@@ -157,11 +169,37 @@
With multiple web hosts, you can transparently survive the loss of any subset
of hosts as long as at least one host remains alive. Web hosts are stateless,
-so putting multiple hosts in service provides no resistance to data loss.
+so putting multiple hosts in service provides no resistance to data loss
+because no data is at risk.
+
+Web hosts can become a bottleneck, particularly if you have a workload that is
+heavily focused on reads from the web UI (like a public install with many
+anonymous users). Slow responses to web requests may hint at a web capacity
+problem. Adding more hosts will provide an approximately linear increase in
+capacity.
For details, see @{article:Cluster: Web Servers}.
+Cluster: Notifications
+======================
+
+Configuring multiple notification hosts is simple and has no prerequisites.
+
+With multiple notification hosts, you can survive the loss of any subset of
+hosts as long as at least one host remains alive. Service may be briefly
+disrupted immediately after the incident that destroys the other hosts.
+
+Notifications are noncritical, so this normally has little practical impact
+on service availability. Notifications are also stateless, so clustering this
+service provides no resistance to data loss because no data is at risk.
+
+Notification delivery normally requires very few resources, so adding more
+hosts is unlikely to have much impact on scalability.
+
+For details, see @{article:Cluster: Notifications}.
+
+
Overlaying Services
===================
@@ -172,14 +210,14 @@
In planning a cluster, consider these blended host types:
-**Everything**: Run HTTP, SSH, MySQL, repositories and daemons on a single
-host. This is the starting point for single-node setups, and usually also the
-best configuration when adding the second node.
+**Everything**: Run HTTP, SSH, MySQL, notifications, repositories and daemons
+on a single host. This is the starting point for single-node setups, and
+usually also the best configuration when adding the second node.
-**Everything Except Databases**: Run HTTP, SSH, repositories and daemons on one
-host, and MySQL on a different host. MySQL uses many of the same resources that
-other services use. It's also simpler to separate than other services, and
-tends to benefit the most from dedicated hardware.
+**Everything Except Databases**: Run HTTP, SSH, notifications, repositories and
+daemons on one host, and MySQL on a different host. MySQL uses many of the same
+resources that other services use. It's also simpler to separate than other
+services, and tends to benefit the most from dedicated hardware.
**Repositories and Daemons**: Run repositories and daemons on the same host.
Repository hosts //must// run daemons, and it normally makes sense to
@@ -208,8 +246,8 @@
This section provides some guidance on reasonable ways to scale up a cluster.
The smallest possible cluster is **two hosts**. Run everything (web, ssh,
-database, repositories, and daemons) on each host. One host will serve as the
-master; the other will serve as a replica.
+database, notifications, repositories, and daemons) on each host. One host will
+serve as the master; the other will serve as a replica.
Ideally, you should physically separate these hosts to reduce the chance that a
natural disaster or infrastructure disruption could disable or destroy both
@@ -230,7 +268,7 @@
onto its own host).
After separating databases, separating repository + daemon nodes is likely
-the next step.
+the next step to consider.
To improve **availability**, add another copy of everything you run in one
datacenter to a new datacenter. For example, if you have a two-node cluster,
diff --git a/src/docs/user/cluster/cluster_notifications.diviner b/src/docs/user/cluster/cluster_notifications.diviner
new file mode 100644
--- /dev/null
+++ b/src/docs/user/cluster/cluster_notifications.diviner
@@ -0,0 +1,172 @@
+@title Cluster: Notifications
+@group intro
+
+Configuring Phabricator to use multiple notification servers.
+
+Overview
+========
+
+WARNING: This feature is a very early prototype; the features this document
+describes are mostly speculative fantasy.
+
+You can run multiple notification servers. The advantages of doing this
+are:
+
+ - you can completely survive the loss of any subset so long as one
+ remains standing; and
+ - performance and capacity may improve.
+
+This configuration is relatively simple, but has a small impact on availability
+and does nothing to increase resistance to data loss.
+
+
+Clustering Design Goals
+=======================
+
+Notification clustering aims to restore service automatically after the loss
+of some nodes. It does **not** attempt to guarantee that every message is
+delivered.
+
+Notification messages provide timely information about events, but they are
+never authoritative and never the only way for users to learn about events.
+For example, if a notification about a task update is not delivered, the next
+page you load will still show the notification in your notification menu.
+
+Generally, Phabricator works fine without notifications configured at all, so
+clustering assumes that losing some messages during a disruption is acceptable.
+
+
+How Clustering Works
+====================
+
+Notification clustering is very simple: notification servers relay every
+message they receive to a list of peers.
+
+When you configure clustering, you'll run multiple servers and tell them that
+the other servers exist. When any server receives a message, it retransmits it
+to all the servers it knows about.
+
+When a server is lost, clients will automatically reconnect after a brief
+delay. They may lose some notifications while their client is reconnecting,
+but normally this should only last for a few seconds.
+
+
+Configuring Aphlict
+===================
+
+To configure clustering on the server side, add a `cluster` key to your
+Aphlict configuration file. For more details about configuring Aphlict,
+see @{article:Notifications User Guide: Setup and Configuration}.
+
+The `cluster` key should contain a list of `"admin"` server locations. Every
+message the server receives will be retransmitted to all nodes in the list.
+
+The server is smart enough to avoid sending messages in a cycle, or sending
+messages to itself, so you can safely list every server you run (including the
+server itself) in the `cluster` list of every other server. You do not need to
+configure servers in an acyclic graph or only list //other// servers: just
+list everything.
+
+A simple example with two servers might look like this:
+
+```lang=json, name="aphlict.json (Cluster)"
+{
+ ...
+ "cluster": [
+ {
+ "host": "notify001.mycompany.com",
+ "port": 22281,
+ "protocol": "http"
+ },
+ {
+ "host": "notify002.mycompany.com",
+ "port": 22281,
+ "protocol": "http"
+ }
+ ]
+ ...
+}
+```
+
+
+Configuring Phabricator
+=======================
+
+To configure clustering on the client side, add every service you run to
+`notification.servers`. Generally, this list will have twice as many entries
+as you have servers, since each server runs a `"client"` service and an
+`"admin"` service.
+
+A simple example with the two servers above (providing four total services)
+might look like this:
+
+```lang=json, name="notification.servers (Cluster)"
+[
+ {
+ "type": "client",
+ "host": "notify001.mycompany.com",
+ "port": 22280,
+ "protocol": "https"
+ },
+ {
+ "type": "client",
+ "host": "notify002.mycompany.com",
+ "port": 22280,
+ "protocol": "https"
+ },
+ {
+ "type": "admin",
+ "host": "notify001.mycompany.com",
+ "port": 22281,
+ "protocol": "http"
+ },
+ {
+ "type": "admin",
+    "host": "notify002.mycompany.com",
+    "port": 22281,
+ "protocol": "http"
+ }
+]
+```
+
+If you put all of the `"client"` servers behind a load balancer, you would
+just list the load balancer and let it handle pulling nodes in and out of
+service.
+
+```lang=json, name="notification.servers (Cluster + Load Balancer)"
+[
+ {
+ "type": "client",
+ "host": "notify-lb.mycompany.com",
+ "port": 22280,
+ "protocol": "https"
+ },
+ {
+ "type": "admin",
+ "host": "notify001.mycompany.com",
+ "port": 22281,
+ "protocol": "http"
+ },
+ {
+ "type": "admin",
+    "host": "notify002.mycompany.com",
+    "port": 22281,
+ "protocol": "http"
+ }
+]
+```
+
+Notification hosts do not need to run any additional services, although they
+are free to do so. The notification server generally consumes few resources
+and is resistant to most other loads on the machine, so it's reasonable to
+overlay these on top of other services wherever it is convenient.
+
+
+Next Steps
+==========
+
+Continue by:
+
+ - reviewing notification configuration with
+ @{article:Notifications User Guide: Setup and Configuration}; or
+ - returning to @{article:Clustering Introduction}.
diff --git a/src/docs/user/configuration/notifications.diviner b/src/docs/user/configuration/notifications.diviner
--- a/src/docs/user/configuration/notifications.diviner
+++ b/src/docs/user/configuration/notifications.diviner
@@ -77,6 +77,8 @@
- `servers`: //Required list.// A list of servers to start.
- `logs`: //Optional list.// A list of logs to write to.
+ - `cluster`: //Optional list.// A list of cluster peers. This is an advanced
+ feature.
- `pidfile`: //Required string.// Path to a PID file.
Each server in the `servers` list should be an object with these keys:
@@ -99,10 +101,20 @@
- `path`: //Required string.// Path to the log file.
+Each peer in the `cluster` list should be an object with these keys:
+
+ - `host`: //Required string.// The peer host address.
+ - `port`: //Required int.// The peer port.
+ - `protocol`: //Required string.// The protocol to connect with, one of
+ `"http"` or `"https"`.
+
+Cluster configuration is an advanced topic and can be omitted for most
+installs. For more information on how to configure a cluster, see
+@{article:Clustering Introduction} and @{article:Cluster: Notifications}.
+
The defaults are appropriate for simple cases, but you may need to adjust them
if you are running a more complex configuration.
-
Configuring Phabricator
=======================
diff --git a/src/view/page/PhabricatorStandardPageView.php b/src/view/page/PhabricatorStandardPageView.php
--- a/src/view/page/PhabricatorStandardPageView.php
+++ b/src/view/page/PhabricatorStandardPageView.php
@@ -539,8 +539,9 @@
if ($servers) {
if ($user && $user->isLoggedIn()) {
- // TODO: We could be smarter about selecting a server if there are
- // multiple options available.
+ // TODO: We could tell the browser about all the servers and let it
+ // do random reconnects to improve reliability.
+ shuffle($servers);
$server = head($servers);
$client_uri = $server->getWebsocketURI();
diff --git a/support/aphlict/server/aphlict_server.js b/support/aphlict/server/aphlict_server.js
--- a/support/aphlict/server/aphlict_server.js
+++ b/support/aphlict/server/aphlict_server.js
@@ -81,7 +81,8 @@
require('./lib/AphlictAdminServer');
require('./lib/AphlictClientServer');
-
+require('./lib/AphlictPeerList');
+require('./lib/AphlictPeer');
var ii;
@@ -173,7 +174,26 @@
}
}
+var peer_list = new JX.AphlictPeerList();
+
+debug.log(
+ 'This server has fingerprint "%s".',
+ peer_list.getFingerprint());
+
+var cluster = config.cluster || [];
+for (ii = 0; ii < cluster.length; ii++) {
+ var peer = cluster[ii];
+
+ var peer_client = new JX.AphlictPeer()
+ .setHost(peer.host)
+ .setPort(peer.port)
+ .setProtocol(peer.protocol);
+
+ peer_list.addPeer(peer_client);
+}
+
for (ii = 0; ii < aphlict_admins.length; ii++) {
var admin_server = aphlict_admins[ii];
admin_server.setClientServers(aphlict_clients);
+ admin_server.setPeerList(peer_list);
}
diff --git a/support/aphlict/server/lib/AphlictAdminServer.js b/support/aphlict/server/lib/AphlictAdminServer.js
--- a/support/aphlict/server/lib/AphlictAdminServer.js
+++ b/support/aphlict/server/lib/AphlictAdminServer.js
@@ -22,6 +22,7 @@
properties: {
clientServers: null,
logger: null,
+ peerList: null
},
members: {
@@ -79,8 +80,7 @@
++self._messagesIn;
try {
- self._transmit(instance, msg);
- response.writeHead(200, {'Content-Type': 'text/plain'});
+ self._transmit(instance, msg, response);
} catch (err) {
self.log(
'<%s> Internal Server Error! %s',
@@ -139,14 +139,32 @@
/**
* Transmits a message to all subscribed listeners.
*/
- _transmit: function(instance, message) {
- var lists = this.getListenerLists(instance);
+ _transmit: function(instance, message, response) {
+ var peer_list = this.getPeerList();
- for (var ii = 0; ii < lists.length; ii++) {
- var list = lists[ii];
- var listeners = list.getListeners();
- this._transmitToListeners(list, listeners, message);
+ message = peer_list.addFingerprint(message);
+ if (message) {
+ var lists = this.getListenerLists(instance);
+
+ for (var ii = 0; ii < lists.length; ii++) {
+ var list = lists[ii];
+ var listeners = list.getListeners();
+ this._transmitToListeners(list, listeners, message);
+ }
+
+ peer_list.broadcastMessage(instance, message);
}
+
+ // Respond to the caller with our fingerprint so it can stop sending
+ // us traffic we don't need to know about if it's a peer. In particular,
+ // this stops us from broadcasting messages to ourselves if we appear
+ // in the cluster list.
+ var receipt = {
+ fingerprint: this.getPeerList().getFingerprint()
+ };
+
+ response.writeHead(200, {'Content-Type': 'application/json'});
+ response.write(JSON.stringify(receipt));
},
_transmitToListeners: function(list, listeners, message) {
diff --git a/support/aphlict/server/lib/AphlictPeer.js b/support/aphlict/server/lib/AphlictPeer.js
new file mode 100644
--- /dev/null
+++ b/support/aphlict/server/lib/AphlictPeer.js
@@ -0,0 +1,80 @@
+'use strict';
+
+var JX = require('./javelin').JX;
+
+var http = require('http');
+var https = require('https');
+
+JX.install('AphlictPeer', {
+
+ construct: function() {
+ },
+
+ properties: {
+ host: null,
+ port: null,
+ protocol: null,
+ fingerprint: null
+ },
+
+ members: {
+ broadcastMessage: function(instance, message) {
+ var data;
+ try {
+ data = JSON.stringify(message);
+ } catch (error) {
+ return;
+ }
+
+ // TODO: Maybe use "agent" stuff to pool connections?
+
+ var options = {
+ hostname: this.getHost(),
+ port: this.getPort(),
+ method: 'POST',
+ path: '/?instance=' + instance,
+ headers: {
+ 'Content-Type': 'application/json',
+ 'Content-Length': data.length
+ }
+ };
+
+ var onresponse = JX.bind(this, this._onresponse);
+
+ var request;
+ if (this.getProtocol() == 'https') {
+ request = https.request(options, onresponse);
+ } else {
+ request = http.request(options, onresponse);
+ }
+
+ request.write(data);
+ request.end();
+ },
+
+ _onresponse: function(response) {
+ var peer = this;
+ var data = '';
+
+ response.on('data', function(bytes) {
+ data += bytes;
+ });
+
+ response.on('end', function() {
+ var message;
+ try {
+ message = JSON.parse(data);
+ } catch (error) {
+ return;
+ }
+
+ // If we got a valid receipt, update the fingerprint for this server.
+ var fingerprint = message.fingerprint;
+ if (fingerprint) {
+ peer.setFingerprint(fingerprint);
+ }
+ });
+ }
+ }
+
+});
diff --git a/support/aphlict/server/lib/AphlictPeerList.js b/support/aphlict/server/lib/AphlictPeerList.js
new file mode 100644
--- /dev/null
+++ b/support/aphlict/server/lib/AphlictPeerList.js
@@ -0,0 +1,86 @@
+'use strict';
+
+var JX = require('./javelin').JX;
+
+JX.install('AphlictPeerList', {
+
+ construct: function() {
+ this._peers = [];
+
+ // Generate a new unique identify for this server. We just use this to
+ // identify messages we have already seen and figure out which peer is
+ // actually us, so we don't bounce messages around the cluster forever.
+ this._fingerprint = this._generateFingerprint();
+ },
+
+ properties: {
+ },
+
+ members: {
+ _peers: null,
+ _fingerprint: null,
+
+ addPeer: function(peer) {
+ this._peers.push(peer);
+ return this;
+ },
+
+ addFingerprint: function(message) {
+ var fingerprint = this.getFingerprint();
+
+ // Check if we've already touched this message. If we have, we do not
+ // broadcast it again. If we haven't, we add our fingerprint and then
+ // broadcast the modified version.
+ var touched = message.touched || [];
+ for (var ii = 0; ii < touched.length; ii++) {
+ if (touched[ii] == fingerprint) {
+ return null;
+ }
+ }
+ touched.push(fingerprint);
+
+ message.touched = touched;
+ return message;
+ },
+
+ broadcastMessage: function(instance, message) {
+ var ii;
+
+ var touches = {};
+ var touched = message.touched;
+ for (ii = 0; ii < touched.length; ii++) {
+ touches[touched[ii]] = true;
+ }
+
+ var peers = this._peers;
+ for (ii = 0; ii < peers.length; ii++) {
+ var peer = peers[ii];
+
+ // If we know the peer's fingerprint and it has already touched
+ // this message, don't broadcast it.
+ var fingerprint = peer.getFingerprint();
+ if (fingerprint && touches[fingerprint]) {
+ continue;
+ }
+
+ peer.broadcastMessage(instance, message);
+ }
+ },
+
+ getFingerprint: function() {
+ return this._fingerprint;
+ },
+
+ _generateFingerprint: function() {
+ var src = '23456789abcdefghjkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ';
+ var len = 16;
+ var out = [];
+ for (var ii = 0; ii < len; ii++) {
+ var idx = Math.floor(Math.random() * src.length);
+ out.push(src[idx]);
+ }
+ return out.join('');
+ }
+ }
+
+});
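The `touched` fingerprint list in `AphlictPeerList` is what keeps messages from bouncing around the cluster forever. A minimal standalone demo of that loop-prevention scheme (not part of the patch; the fingerprints `"aaaa"` and `"bbbb"` are made up for illustration):

```javascript
// Each server stamps its fingerprint into the message's "touched" list
// before relaying, and drops any message it has already stamped.
function addFingerprint(message, fingerprint) {
  var touched = message.touched || [];
  for (var ii = 0; ii < touched.length; ii++) {
    if (touched[ii] === fingerprint) {
      // This server already relayed the message once; drop it.
      return null;
    }
  }
  touched.push(fingerprint);
  message.touched = touched;
  return message;
}

// Simulate a message bouncing between two servers, "aaaa" and "bbbb":
var message = {data: 'example'};
message = addFingerprint(message, 'aaaa'); // first hop: stamped, relayed
message = addFingerprint(message, 'bbbb'); // second hop: stamped, relayed
var again = addFingerprint(message, 'aaaa'); // back at "aaaa": null, dropped
```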
Attached to D15711: Support Aphlict clustering.