Page MenuHomePhabricator

Live notifications do not work behind apache SSL reverse proxy
Closed, ResolvedPublic

Description

This report contains several issues that might be caused by a common problem so I refrained from opening multiple bug reports for it. I will split it up if requested.

Version Information

phabricator 3f9e8e675bd94a481974c65cbf4933a6a97e907c (Mon, Apr 11)
arcanist 737f5c0df976fe2b3178aac6ab7feb3d3621d99e (Sat, Apr 9)
phutil a46f7b7e5d8073a91e89f7facc9fea62baaca7e8 (Sun, Apr 10)
sprint df6e9dee03e4dd0da551199df33b6d04bf096d32 (Mon, Apr 11)

Setup (Steps to Reproduce?)

I am running this docker image of phabricator behind an apache reverse proxy to have a unified address with SSL/TLS for my phabricator and other services. As you can see above, I also added the sprint extension but I don't think that this matters.

notification.enabled is set to true.

Initially, I had problems with the settings of notification.client-uri and notification.server-uri that I solved as follows:

Now I started aphlict in debug mode with parameter --client-port=22280 as it would not bind to that port due to notification.client-uri (see T9293#134595) and watched the debug output there when browsing/sending test notifications.

Firefox does not connect

On Firefox the notification area says "Setting Up Client" forever and nothing is visible in aphlict's output.
On this site the notifications work in the same Firefox.

Listeners accumulate (-> memory leak?)

On Chromium the notification area says "Connected" and aphlict says:

[Tue Apr 12 2016 12:10:40 GMT+0000 (UTC)] <Listener/1/> Connected from 172.18.0.1.
[Tue Apr 12 2016 12:10:40 GMT+0000 (UTC)] <Listener/1/> Received message: {"command":"subscribe","data":["PHID-USER-daleu2naqv4ajmptqerr"]}
[Tue Apr 12 2016 12:10:40 GMT+0000 (UTC)] <Listener/1/> Subscribed to: ["PHID-USER-daleu2naqv4ajmptqerr"]

where 172.18.0.1 is the IP of my apache proxy as seen by the docker container.

When I reload, I get

[Tue Apr 12 2016 12:11:24 GMT+0000 (UTC)] <Listener/1/> Disconnected.
[Tue Apr 12 2016 12:11:27 GMT+0000 (UTC)] <Listener/2/> Connected from 172.18.0.1.
[Tue Apr 12 2016 12:11:27 GMT+0000 (UTC)] <Listener/2/> Received message: {"command":"subscribe","data":["PHID-USER-daleu2naqv4ajmptqerr"]}
[Tue Apr 12 2016 12:11:27 GMT+0000 (UTC)] <Listener/2/> Subscribed to: ["PHID-USER-daleu2naqv4ajmptqerr"]

so the first listener is disconnected and a new one is opened. This leads to an increasing listener number and to the warning:

(node) warning: possible EventEmitter memory leak detected. 11 close listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at WebSocketServer.addListener (events.js:239:17)
    at WebSocketServer.<anonymous> (/srv/phabricator/phabricator/support/aphlict/server/lib/AphlictClientServer.js:87:13)
    at emitOne (events.js:77:13)
    at WebSocketServer.emit (events.js:169:7)
    at /srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:79:14
    at completeHybiUpgrade2 (/srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:262:5)
    at completeHybiUpgrade1 (/srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:287:13)
    at WebSocketServer.handleHybiUpgrade (/srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:315:3)
    at WebSocketServer.handleUpgrade (/srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:152:26)
    at Server.<anonymous> (/srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:77:12)
(node) warning: possible EventEmitter memory leak detected. 11 error listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at WebSocketServer.addListener (events.js:239:17)
    at WebSocketServer.<anonymous> (/srv/phabricator/phabricator/support/aphlict/server/lib/AphlictClientServer.js:92:13)
    at emitOne (events.js:77:13)
    at WebSocketServer.emit (events.js:169:7)
    at /srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:79:14
    at completeHybiUpgrade2 (/srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:262:5)
    at completeHybiUpgrade1 (/srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:287:13)
    at WebSocketServer.handleHybiUpgrade (/srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:315:3)
    at WebSocketServer.handleUpgrade (/srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:152:26)
    at Server.<anonymous> (/srv/phabricator/phabricator/support/aphlict/server/node_modules/ws/lib/WebSocketServer.js:77:12)

Subscribers are not counted

When visiting /notification/status/ I can see that notifications have been issued (messages.in > 0) but none have been delivered (messages.out == 0). Furthermore, even when there are Listeners in the aphlict.log that have been subscribed and have not yet disconnected, the clients.active and clients.total stay at zero.

Live notifications are not displayed

Now when I send a test notification I get this output from aphlict:

[Tue Apr 12 2016 12:10:45 GMT+0000 (UTC)] Received notification (/ws/): {"key":"6272641019226701868","type":"notification","subscribers":["PHID-USER-daleu2naqv4ajmptqerr"]}

I don't see any notification in the browser (just testing Chromium here as Firefox does not connect).
When I reload the page the notification area shows that I have new notifications.

Cause?

As the notification is working on this site with the same browser that does not even connect to the web socket on mine, I believe that there is an error in my setup. I'd like to find it and probably that should become part of the documentation as the setup was quite a hassle so far.

Event Timeline

Yeah, you'll want to open one bug report per issue. Also, installations via third party installers are not supported and aren't considered a valid means to reproduce the issue. You may use a Phacility test instance if you want an easy, clean, reproducible installation quickly built.

As the notification is working on this site with the same browser that does not even connect to the web socket on mine, I believe that there is an error in my setup. I'd like to find it and probably that should become part of the documentation as the setup was quite a hassle so far.

Our documentation isn't meant to be a detailed howto for every single os/distribution or setup, just cover the basics as they relate to Phabricator itself. If the issue isn't specific to Phabricator, it's likely better covered by documentation elsewhere on the web.

Yeah, you'll want to open one bug report per issue.

That would be one ticket for each of

  • Firefox does not connect
  • listeners accumulate
  • subscribers are not counted
  • live notifications are not displayed

Should each of them have the (same) version and steps to reproduce introduction or should I link to this issue for that?
IMHO, apart from the accumulating listeners the other issue are probably only worth one.

Also, installations via third party installers are not supported and aren't considered a valid means to reproduce the issue. You may use a Phacility test instance if you want an easy, clean, reproducible installation quickly built.

I was under the impression that the docker image actually does a default installation as per installation guide?
I think that image is maintained by @hach-que.

Our documentation isn't meant to be a detailed howto for every single os/distribution or setup, just cover the basics as they relate to Phabricator itself. If the issue isn't specific to Phabricator, it's likely better covered by documentation elsewhere on the web.

There were several issues that have been reported as bugs and are fixed but not published or being worked around but are not mentioned in the documentation. I'd hoped to at least find mentions of these issues in the documentation especially since they sometimes arise directly from steps taken as described in the documentation. I can very well imagine that this is a corner case that is specific to Phabricator's or Aphlict's handling of apache proxied websockets.

Yeah that Docker instance is maintained by me. It's not intended to sit behind any kind of load balancing though (unless you're routing WebSockets directly to the client). Otherwise I have no idea if your Apache load balancer is interfering with the connection.

Otherwise I have no idea if your Apache load balancer is interfering with the connection.

I presume it is. I setup apache to proxy /ws/ to the exposed port 22280 of the phabricator docker container.

Some packet sniffing on the (docker network) bridge between apache and the docker container showed that the websocket is connected and the subscribe request is sent (as documented by the log excerpts above) but no response is made from aphlict even after triggering a test notification that is shown in the log with the correct subscriber ID.

How should I proceed in diagnosing this?

You probably need to configure Apache to correctly route the web sockets.

Alternatively, add another IP to your host and just route the Phabricator container out of a dedicated IP. That's the simplest way of getting this working.

Okay, I found the culprit in the apache configuration after debugging what happens in aphlict.

The problem was this:

<Location "/ws/">
ProxyPass        ws://phabricator.container:22280/ nocanon
</Location>

which should have been

<Location "/ws/">
ProxyPass        ws://phabricator.container:22280/ws/ nocanon
</Location>

(Note the additional ws/ at the end of the URL.)

This resulted in listeners requesting path / while the admin side added the path /ws/ so they got two different listener lists. I did not notice this discrepancy in the logs up until now, maybe it should be more prominent or the log format should be explained in the documentation?

After changing this I get live notifications on Chrome and the warning about too many listeners is gone and the subscribers show up on the /notification/status/ page.

However, on Firefox I still get "Setting Up Client". I will investigate that later.

Today the live notifications work in Firefox as well. I believe that it did not work yesterday due to caching.

I've marked D15705 as fixing this, since it makes our interpretation of the path unambiguous: we would no longer interpret /ws/ as a request for namespace "ws".

It sounds like everything else got sorted out or resolved on its own.