See PHI1737. I'm running into an issue in production that I'm having difficulty reproducing locally. Roughly, submitting a particular form generates a CSRF exception for a recently-imported instance. The same workflow works fine on other instances and locally. Debugging the parts of the workflow that are easily reachable from the CLI hasn't turned anything up.
Much of this workflow isn't easily inspectable from the CLI, but there's currently no way to run Phacility production in a debuggable context. This is mostly by design, but it makes the tiny fraction of problems that are data-dependent and resist local reproduction harder to understand.
I'd like to provide a workflow to pull a reproduction case into a debuggable environment, like this:
- the environment is some phantom web-debug host which is not in any LB pool;
- the ports are glued together with ssh -L 80:web-debug001:80 via a bastion host;
- then you can stop the local webserver, start the tunnel, poke your hosts file, and should be able to use an actual browser to review behavior and nano on the web-debug host to affect behavior.
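The tunnel and hosts-file steps above can be sketched as follows. This is a minimal Python sketch, not an existing tool: `tunnel_argv` and `hosts_line` are hypothetical helpers, and `bastion.example.com` and `example.phacility.com` are placeholder names. It only builds the ssh command and the hosts entry; it does not run anything.

```python
import shlex


def tunnel_argv(bastion: str, target: str = "web-debug001",
                local_port: int = 80, target_port: int = 80) -> list:
    """Build the ssh invocation for the tunnel (hypothetical helper).

    -L local_port:target:target_port makes the local ssh client listen on
    local_port and forward each connection through the bastion, which opens
    the onward connection to target:target_port inside the cluster.
    """
    return ["ssh", "-L", f"{local_port}:{target}:{target_port}", bastion]


def hosts_line(instance_domain: str) -> str:
    """Hosts-file entry pointing the instance's domain at the local tunnel."""
    return f"127.0.0.1\t{instance_domain}"


# Placeholder names; substitute the real bastion and instance domain.
print(shlex.join(tunnel_argv("bastion.example.com")))
print(hosts_line("example.phacility.com"))
```

Remove the hosts entry afterward, or the browser will keep sending the instance's traffic to 127.0.0.1.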
Notes:
- AllowTcpForwarding on the bastion must allow local forwarding. Enabling this allows any user who can connect to a bastion to forward through to any port on any cluster host, and effectively grants them permission to make outbound connections from the bastion to any host the bastion can reach. Today, this is fine (all users with access to the bastion are already allowed to establish sessions on it and initiate outbound connections), but in the future it might be appropriate to tie this permission more tightly to user role permissions. This can be accomplished by specifying per-key options in AuthorizedKeys.
- If AllowTcpForwarding prevents forwarding, the failure seems to be implicit (the local end listens and accepts the connection, but immediately resets it) rather than an explicit error like "You aren't allowed to forward ports because AllowTcpForwarding is off." Compare no tunnel, a tunnel with forwarding disabled, and a working tunnel:
$ curl http://127.0.0.1:17000/
curl: (7) Failed to connect to 127.0.0.1 port 17000: Connection refused
$ curl http://127.0.0.1:17000/
curl: (56) Recv failure: Connection reset by peer
$ curl 127.0.0.1:17000
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...
This is a little weird, but whatever?
- There's no UI feedback from ssh that a connection is carrying a tunnel. As a UI affordance to reduce surprises, bin/remote tunnel (or whatever) could perhaps exec sh -c 'echo "this is a tunnel" ; sleep 86400 ; exit' or similar, to make the connection an explicit tunnel with a convenience timeout.
- To forward local port 80 (a privileged port), we need to sudo ... -- ssh ..., and the messaging when you don't is a bit weird. This may also require an explicit --identity, since under sudo the default identity is root's rather than yours. Phabricator could also be modified to respect HTTP port numbers arriving in Host: ... headers, which would allow forwarding an unprivileged local port instead, but I'm not wildly excited about this.
- The web-debug role should enable opcache.validate_timestamps, so that edits made on the host take effect without restarting PHP.
- The web-debug role should probably enable Phabricator application-level debug flags (darkconsole, etc.).
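The tunnel-related notes above (the sudo and identity requirement, the explicit keep-alive session, and the "refused" versus "reset" failure modes) could be folded into a helper along these lines. This is a hypothetical Python sketch, not the real bin/remote: `sudo_tunnel_argv`, `probe_forward`, and every user, host, and key name are assumptions.

```python
import shlex
import socket

# Remote command from the note above: marks the session as a tunnel and ends
# it after a day. Quoted into one string because ssh joins remote arguments
# with spaces before handing them to the remote shell.
KEEPALIVE = shlex.join(["sh", "-c", 'echo "this is a tunnel" ; sleep 86400 ; exit'])


def sudo_tunnel_argv(user: str, bastion: str, identity: str,
                     target: str = "web-debug001", port: int = 80) -> list:
    """Hypothetical tunnel command. Under sudo, ssh runs as root, so its
    default key and default login name would be root's; pass both explicitly."""
    return [
        "sudo", "--", "ssh",
        "-i", identity,
        "-L", f"{port}:{target}:{port}",
        f"{user}@{bastion}",
        KEEPALIVE,
    ]


def probe_forward(port: int, host: str = "127.0.0.1", timeout: float = 5.0) -> str:
    """Classify what a local forwarded port does with a plain HTTP request.

    "refused":   nothing is listening (e.g. no tunnel running).
    "dropped":   something accepted, then reset or closed without replying
                 (what was observed with forwarding disabled on the bastion).
    "responded": bytes came back through the tunnel.
    A timeout propagates as an exception.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    with conn:
        try:
            conn.sendall(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
            data = conn.recv(1)
        except (ConnectionResetError, BrokenPipeError):
            return "dropped"
    return "responded" if data else "dropped"
```

Treating a reset and a silent close the same way is deliberate: depending on timing, the client may see either one when the bastion rejects the forward.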
HTTPS
Phacility web hosts run with security.require-https. Normally, HTTPS is terminated by the LB, and the preamble marks the request as having been HTTPS on the client side. When forwarding raw 80:80, the client is not using HTTPS and the request is not marked as HTTPS (the request is still secure, since the external hop is over SSH and the internal hop stays inside the VPC).
This creates a problem when Phabricator tries to decide whether it can set the secure flag on the cookie: with security.require-https enabled, it refuses to set a cookie without that flag. The "real" fix here is probably to configure web-debug hosts in a special way that disables security.require-https.
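To make the failure concrete, here is a deliberately simplified Python model of the decision described above. It is not Phabricator's actual (PHP) logic; `Request`, `marked_https`, `cookie_secure_flag`, and `InsecureCookieError` are hypothetical names. It shows why the tunnel path fails while security.require-https is on, and why disabling that option on web-debug hosts would avoid it.

```python
from dataclasses import dataclass


class InsecureCookieError(Exception):
    """Raised instead of setting a cookie without the secure flag."""


@dataclass
class Request:
    # True when the request came through the LB and the preamble marked it
    # as HTTPS on the client side; False for a raw 80:80 tunnel.
    marked_https: bool


def cookie_secure_flag(request: Request, require_https: bool) -> bool:
    """Simplified model of the cookie decision (not Phabricator's code)."""
    if request.marked_https:
        return True  # safe to set the cookie with the secure flag
    if require_https:
        # security.require-https forbids falling back to a non-secure cookie.
        raise InsecureCookieError(
            "request is not HTTPS but security.require-https is enabled")
    return False  # e.g. a web-debug host with security.require-https disabled
```

With `marked_https=False` (the tunnel) and `require_https=True`, the model raises, mirroring the refusal described above; with `require_https=False` it falls back to a cookie without the secure flag.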
Since I'm just using an extra web host as a web-debug host for now, I'm going to fake my way through this for the moment.