Page MenuHomePhabricator

Rewrite Aphlict to use Websockets
ClosedPublic

Authored by epriestley on Jan 2 2015, 7:59 AM.

Details

Summary

Fixes T6559. No more flash, use Websockets. This is less aggressive than the earlier version, and retains more server logic.

  • Support "wss".
  • Make the client work.
  • Remove "notification.user" entirely.
  • Seems ok?
Test Plan

In Safari, Firefox and Chrome, saw the browsers connect. Made a bunch of comments/updates and saw notifications.

Notable holes in the test plan:

  • Haven't tested "wss" yet. I'll do this on secure.
  • Notifications are too fast now, locally. I get them after I hit submit but before the page reloads.
  • There are probably some other rough edges, this is a fairly big patch.

Diff Detail

Repository
rP Phabricator
Branch
websockets2
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 3692
Build 3702: [Placeholder Plan] Wait for 30 Seconds

Event Timeline

joshuaspence retitled this revision from to [DRAFT] Rewrite Aphlict to use Websockets.
joshuaspence updated this object.
joshuaspence edited the test plan for this revision. (Show Details)

At least tentatively, I'd like to keep Flash around, and support WebSockets in addition to Flash. The underlying Flash protocol (i.e., really basic JSON with no weird websocket handshaking) is better for non-browser clients, and Flash has some advantages and desirable characteristics when multiple browser windows are open simultaneously.

I'm also much more interested in upstreaming this if the websocket implementation is <1K lines (and, ideally, like 300) than if it's, say, socket.io.

Provided the websocket implementation isn't an enormously gigantic dependency I'm open to upstreaming support, though.

My thinking along those lines were as follows:

  1. I wanted to keep the message format unaltered. This should make it easier to change between Flash and WebSockets.
  2. Although I am currently writing this in NodeJS, I want to rewrite it using PHP. phpwebsocket is one such implementation.

phpwebsocket is one such implementation.

This uses the old, more insane version of the protocol with Key1 and Key2 headers, and doesn't seem to work in modern browsers.

I was able to get this working pretty easily, though:

https://github.com/ghedipunk/PHP-Websockets

Oh, and per T6559 I'm now onboard with WebSockets-only.

epriestley added a reviewer: joshuaspence.

I am a sneaky revision thief.

Yeah, I'm working on this right now but it's hard to pull off any more small pieces.

epriestley edited edge metadata.
  • Less aggressive patch with fewer server-side changes.
  • Support "wss://" (probably).
  • Make the client stuff work (probably).
epriestley retitled this revision from [DRAFT] Rewrite Aphlict to use Websockets to Rewrite Aphlict to use Websockets.Jan 7 2015, 12:06 AM
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
.gitignore
35

Using npm install -g ws didn't work for me. No idea. This enables a reasonable workaround.

resources/celerity/packages.php
72–76

Package update, plus a couple of dashboard resources.

src/applications/aphlict/management/PhabricatorAphlictManagementWorkflow.php
83–84

Oh, unused -- I'll remove.

src/applications/feed/PhabricatorFeedStoryPublisher.php
128–130 ↗(On Diff #27052)

This fixes a bug where you would not receive notifications about T123 if you had it open unless you were also subscribed to it.

src/applications/notification/controller/PhabricatorNotificationIndividualController.php
11–12 ↗(On Diff #27052)

Actually, I think this violates policies and I need to revert it and move it to another diff. I'll get this out of here.

support/aphlict/server/lib/AphlictListener.js
52

WebSockets do framing now.

webroot/rsrc/externals/javelin/lib/Leader.js
91–93 ↗(On Diff #27052)

I needed an "always call after election" method.

webroot/rsrc/externals/javelin/lib/Websocket.js
124–128 ↗(On Diff #27052)

Fixes a bug where we didn't set the delay properly.

  • Remove unused code.
  • Revert subscription/policy stuff for now. This is still a bug, but needs a more nuanced approach.
  • Rebase to lose the minor tweaks to WebSocket and Leader.
btrahan edited edge metadata.

The "notifications are too fast" thing seems like a problem that you may want to fix before submitting / within a few hours of submitting. Also, does this mean there's a bunch of browsers we no longer support? (Yay in my book... :D) Otherwise though...

shipitquick

This revision is now accepted and ready to land.Jan 7 2015, 5:46 PM
joshuaspence edited edge metadata.

This looks good to me. One potential issue (I think) is that users will blindly update their install and wonder why Aphlict no longer works. We could feasibly add a setup check that checks whether ws is installed, but I'm not sure if this would provide much value because we don't really know on which host Aphlict is running (it doesn't have to be on the web app host). I think that the best that we can do here is add a try/catch to the require('ws') call and provide installation instructions if the package is missing.

We could possibly also add some more information to PhabricatorExtraConfigSetupCheck to say "Hey, you have `notification.user

src/docs/user/configuration/notifications.diviner
34

There's only one port now.

support/aphlict/server/aphlict_server.js
5

Maybe we should try/catch this and provide installation instructions if it is missing. Otherwise, users who blindly update may not understand why Aphlict stopped working.

16–17

Sort of ugly, maybe just cert and key would suffice.

99–102

Maybe include ws._socket.remotePort as well?

This looks good to me. One potential issue (I think) is that users will blindly update their install and wonder why Aphlict no longer works. We could feasibly add a setup check that checks whether ws is installed, but I'm not sure if this would provide much value because we don't really know on which host Aphlict is running (it doesn't have to be on the web app host). I think that the best that we can do here is add a try/catch to the require('ws') call and provide installation instructions if the package is missing.

We could possibly also add some more information to PhabricatorExtraConfigSetupCheck to say "Hey, you have notification.user set... you should unset this and also install ws. This won't provide any value for users who run the Flash-based Aphlict server as root.

Something else worth mentioning (although I'm not sure if/what we want to do with this) is that it is possible to route the WebSockets traffic through nginx. In my testing, I was using the following nginx configuration:

/etc/nginx/conf.d/connection_upgrade.conf
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}
/etc/nginx/conf.d/websockets.conf
upstream websockets {
    ip_hash;
    server 127.0.0.1:22280;
}
/etc/nginx/sites-enabled/phabricator.local.conf
server {
  listen *:80;
  server_name phabricator.local;

  root /usr/src/phabricator/webroot;
  index index.php;

  location /index.php {
    include /etc/nginx/fastcgi_params;

    fastcgi_pass php_pool;
    fastcgi_index index.php;
  }

  location = /favicon.ico {
    try_files $uri =204;
  }

  location /ws/ {
    proxy_pass http://websockets;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 999999999;
  }

  location / {
    root /usr/src/phabricator/webroot;
    index index.html index.htm index.php;

    rewrite ^/(.*)$ /index.php?__path__=/$1 last;
  }

}

I think that this solution is nice because we should be able to terminate SSL at the nginx layer. It is also a step in the right direction in terms of scaling the Aphlict server. On the downside, it is a bit more advanced and could be difficult to support (also, I have no idea about Apache).

Just some personal experience with ws vs. wss:
In our websites we've noticed that a few Anti-Virus programs which tend to proxy http sessions (e.g. Kaspersky) are unable to do the websocket upgrade.
Sometimes the HTTP polling fallback (don't know if you use something like that) works, sometimes it'll break completely.

Because of this we've enforced wss everywhere (with a proxy/lb like nginx or similar in front).
Unfortunately this also forces the original site to be https.
See: https://developer.mozilla.org/en-US/docs/WebSockets/Writing_WebSocket_client_applications

WebSockets should not be used in a mixed content environment; that is, you shouldn't open a non-secure WebSocket connection from a page loaded using HTTPS or vice-versa. In fact, some browsers explicitly forbid this, including Firefox 8 and later.

Yeah, let me sneak in a fix before this for the self-notifications. I'm just going to try this rule:

  • If you took the action, ignore the notification.

That will get this case wrong:

  • You have two tabs open, both showing T123. You make a comment on one tab.
  • I think the ideal behavior is that the first tab shows nothing, and the second tab shows the "page updated, click to reload" button but not the "you made a comment" notification.
  • I can't come up with an easy way to do that offhand without also showing the "page updated, click to reload" button on the first tab.
  • Solution is maybe something like "don't show the 'click to reload' bubble while there are any requests active", but that's a little bit involved.

I think I may also need to fix T6713 before this can land, since it's pretty bad locally.

I think there miiight be an argument for trying to install ws ourselves if it's missing, but for now I'll just try/catch and we can see how much of a headache that causes.

epriestley edited edge metadata.
  • Fix documentation.
  • Try/Catch the "ws" module.
  • Expand documentation and walk through the "ws" install.

A couple of other notes:

  • notification.debug currently has no effect. I'll plan to either make it do something or remove it, eventually.
  • Deleting aphlict_test_client.php might make sense? It's nearly useless now.

I just applied this on my dev install and noticed an issue. It seems that with the default value for notification.client-uri and notification.server-uri, the Aphlict server will only listen for WebSocket connections on localhost. I was able to fix this by listening on 0.0.0.0 instead.

I just applied this on my dev install and noticed an issue. It seems that with the default value for notification.client-uri and notification.server-uri, the Aphlict server will only listen for WebSocket connections on localhost. I was able to fix this by listening on 0.0.0.0 instead.

A couple of other notes:

  • notification.debug currently has no effect. I'll plan to either make it do something or remove it, eventually.
  • Deleting aphlict_test_client.php might make sense? It's nearly useless now.

+1 to removing both.

We can kill off the externals/vegas directory now as well.

  • Remove notification.debug.
  • Remove aphlict_test_client.php.
  • Remove externals/vegas/.
  • Add a --test flag to test configuration.
  • Run in test mode before forking.
  • Fail explicitly when started as root.
  • Remove some more sudo bin/aphlict advice.
  • Don't bind the notification server to a specific host (i.e., listen on any interface). The push server remains bound.
  • Now that we hide your own notifications, send the test notification from the Notifications application instead of from the viewer.

Yeah, let me sneak in a fix before this for the self-notifications. I'm just going to try this rule:

  • If you took the action, ignore the notification.

That will get this case wrong:

  • You have two tabs open, both showing T123. You make a comment on one tab.
  • I think the ideal behavior is that the first tab shows nothing, and the second tab shows the "page updated, click to reload" button but not the "you made a comment" notification.
  • I can't come up with an easy way to do that offhand without also showing the "page updated, click to reload" button on the first tab.
  • Solution is maybe something like "don't show the 'click to reload' bubble while there are any requests active", but that's a little bit involved.

I think I may also need to fix T6713 before this can land, since it's pretty bad locally.

Looks like its not too bad to tell if a given tab is the focused tab? (So silence stuff on focused tab...)

http://www.thefutureoftheweb.com/blog/detect-browser-window-focus

  • Add check for unwritable PID file.
  • Default PID file to less-rooty location.

eh, maybe that's a more complicated mess than I thought. See http://greensock.com/forums/topic/9059-cross-browser-to-detect-tab-or-window-is-active-so-animations-stay-in-sync-using-html5-visibility-api/ for a more modern, much crazier looking "solution" to knowing which tab has focus.

Looks like its not too bad to tell if a given tab is the focused tab? (So silence stuff on focused tab...)

If no tab is focused (e.g., you have Phabricator open but Facebook is the active tab) we still want to play notification sounds (depending on user pref / product design choices) and send desktop notifications.

I believe tabs lose focus when you switch to another application, as well. For example, if I have Phabricator open and then switch to email, the tab blurs.

These events are useful for "pause a game the user is playing" but all tabs are blurred at the time when desktop notifications / sounds are most important (to alert the user to a background event).

Oh, just for the bubble.

The problem is we don't know which tab the notification came from. If we get a notification that T2 is updated, and T2 is in the current tab, we do want to show it if the notification came from a different tab. We also want to show it in open tabs which don't happen to be active.

One possible fix is just to have each tab generate a random ID, and send it whenever the user does anything, and attach it to the notification. Then when a tab gets a notification back, it just checks if it was the original responsible tab. That will require a fair chunk of work, but seems like the only approach that doesn't involve a lot of race conditions, at last that I can come up with.

There are definitely some other product behaviors we can refine here, too.

A big one is that right now, you don't get a notification about something that you're looking at unless you're also subscribed to it. Fixing this isn't too tough but cascaded enough that I ripped it out of an early version of this diff.

In general, this diff shouldn't actually change any behaviors, but I expect things will be faster and more reliable now, so any sketchy behaviors which previously went unnoticed by being slow/unreliable may be sucky/painful now.

This revision was automatically updated to reflect the committed changes.