
Update preamble documentation to consider cluster mode and loadbalancers in series
Closed, Resolved · Public

Description

I'm trying to put together a clustered setup but have run into an issue: the Diffusion application is unable to retrieve repository information in the web UI. It's a simple cluster with only 3 hosts. I registered all the host devices in Almanac, created a cluster repository service, and bound the interfaces to that service. The Cluster Repository Server shows that all three hosts are in sync with each other, and a client can pull from and push to the clustered repo, which seems to work, but nothing shows up in Diffusion when I click on the specific repo. I have whitelisted the cluster hosts in cluster.addresses. In the Daemons application, I see that the PhabricatorRepositoryPullLocalDaemon task has failed 40-ish times so far. All the EC2 hosts in the cluster are behind a load balancer.

Is there something else that I'm missing in configuring the cluster setup? In particular, what should the cluster.instance setting hold? The description for this setting wasn't very clear.

phabricator f3d8e3832c96040b6319f5f599aa877c2d773c25 (Sun, Aug 14)
arcanist c9337c2ade7c76edc98d27c216ab97fc1e40d01c (Sat, Aug 6)
phutil b6f4e866fdb2c41de23c8e635c7803a09a44e9f4 (Sun, Aug 14)

Unable to Retrieve Paths
[HTTP/500] Internal Server Error Exception: Unable to test remote address against cluster whitelist: REMOTE_ADDR is not defined.
Unable to Retrieve History
[HTTP/500] Internal Server Error Exception: Unable to test remote address against cluster whitelist: REMOTE_ADDR is not defined.
Unable to Load Tags
[HTTP/500] Internal Server Error Exception: Unable to test remote address against cluster whitelist: REMOTE_ADDR is not defined.
Unable to Load Branches
[HTTP/500] Internal Server Error Exception: Unable to test remote address against cluster whitelist: REMOTE_ADDR is not defined.

Event Timeline

cluster.instance is only used when serving multiple separate instances from a single cluster (for example, when hosting Phabricator for multiple clients), and should normally be left alone. This setting is used in the Phacility cluster to make x.phacility.com, y.phacility.com, etc., work correctly. This is sort of confusing because we use "cluster" to refer both to instancing and redundancy/scale; both are "clusters" in some sense but we don't generally expect users to care about instancing.

REMOTE_ADDR stores the address of the connecting host, and should be provided by PHP:

http://php.net/manual/en/reserved.variables.server.php

I am not sure how it is possible for it to become undefined, except possibly if you are meddling with $_SERVER in, e.g., preamble.php, or if you are using nginx but have omitted this line from your configuration:

fastcgi_param  REMOTE_ADDR        $remote_addr;

We need to be able to determine the address of the connecting host in order to test if it's in the cluster address block, so we need to figure out why REMOTE_ADDR isn't defined.

  • Which webserver are you using?
  • Is it nginx, without that fastcgi_param?
  • Are you mangling REMOTE_ADDR in phabricator/support/preamble.php?

We are using the Apache webserver, and yes, we are mangling REMOTE_ADDR in the preamble, as recommended in the "Adjusting Client IPs" section of the configure_preamble article, because we are behind a load balancer.

<?php

$_SERVER['REMOTE_ADDR'] = $_SERVER['HTTP_X_FORWARDED_FOR'];
$_SERVER['HTTPS'] = true;

(T8850 is vaguely related.)

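In a cluster configuration, intracluster requests won't go through the load balancer, so they arrive without an X-Forwarded-For header, and the unconditional assignment above leaves REMOTE_ADDR undefined for them. You'll likely need to do something more like this to use X-Forwarded-For conditionally: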

if (isset($_SERVER['HTTP_X_FORWARDED_FOR'])) {
  $forwarded_for = $_SERVER['HTTP_X_FORWARDED_FOR'];
  if ($forwarded_for) {
    // This may be a list of IPs, like "1.2.3.4, 4.5.6.7", if the
    // request the load balancer received also had this header. In
    // particular, this happens routinely with requests received through
    // the CloudFront CDN.

    // We only care about or trust the last IP in the list: the others are
    // controlled by the client.
    $forwarded_for = explode(',', $forwarded_for);
    $forwarded_for = end($forwarded_for);
    $forwarded_for = trim($forwarded_for);
    $_SERVER['REMOTE_ADDR'] = $forwarded_for;
  }
}

We should recommend something more in this vein in the documentation.

(Note that this is only safe if you're sure that the request is coming from a load balancer, although that may always be true in your environment.)
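
If some requests can reach the web hosts without passing through the load balancer, one way to act on that caveat is to trust X-Forwarded-For only when the direct peer is a known load balancer address. The following is a rough sketch, not something Phabricator ships: the helper name resolve_client_address() and the address 10.0.0.5 are placeholders invented for illustration, and the check is an exact string match (no CIDR ranges), which may not suit load balancers whose addresses change.

```php
<?php

// Hypothetical helper (not part of Phabricator): choose the client address,
// trusting X-Forwarded-For only when the direct peer is a known load
// balancer.
function resolve_client_address(array $server, array $trusted_proxies) {
  $remote = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : null;

  // Intracluster requests and requests from unknown peers: keep whatever
  // address the webserver reported.
  if ($remote === null || !in_array($remote, $trusted_proxies, true)) {
    return $remote;
  }

  // The load balancer sent no usable header: keep its own address.
  if (empty($server['HTTP_X_FORWARDED_FOR'])) {
    return $remote;
  }

  // Only the last entry was appended by our load balancer; any earlier
  // entries are controlled by the client.
  $parts = explode(',', $server['HTTP_X_FORWARDED_FOR']);
  $last = trim(end($parts));

  return ($last !== '') ? $last : $remote;
}

// Assumed placeholder: replace with the address(es) of your load balancer.
$trusted_proxies = array('10.0.0.5');
$_SERVER['REMOTE_ADDR'] = resolve_client_address($_SERVER, $trusted_proxies);
```

With this shape, a request arriving directly from another cluster host keeps its real REMOTE_ADDR, so the cluster.addresses whitelist check still sees the connecting host.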

Thanks Evan, that did the trick!

epriestley renamed this task from "diffusion unable to retrieve repo information when in cluster repo mode" to "Update preamble documentation to consider cluster mode and loadbalancers in series". (Aug 16 2016, 10:40 PM)
epriestley claimed this task.
epriestley triaged this task as Normal priority.
epriestley edited projects, added Documentation; removed Bug Report.

This should be fixed in HEAD of master, and in the documentation on this host whenever it regenerates (every other rainy Tuesday during a full moon).

Thanks for the report, and let us know if you run into anything else.