Details
We're currently using an AWS ALB (the newer LB) to terminate SSL for HTTPS. Since the same hostname is used for git, the load balancer forwards git (SSH) traffic as well:
- 22 (TCP) forwarding to 22 (TCP)
- idle timeout: 3600 seconds
- cross-zone load balancing: enabled
We use Jenkins for builds, and we get git pull errors sporadically. My current guess is that we trigger a bunch of parallel builds, and there's some connection management weirdness going on:
ERROR: Error fetching remote repo 'origin'
09:35:21 hudson.plugins.git.GitException: Failed to fetch from git@ourphabricator.com:diffusion/AST/aurelia-staging.git
09:35:21     at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:797)
09:35:21     at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1051)
09:35:21     at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1082)
09:35:21     at hudson.scm.SCM.checkout(SCM.java:495)
09:35:21     at hudson.model.AbstractProject.checkout(AbstractProject.java:1278)
09:35:21     at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
09:35:21     at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
09:35:21     at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
09:35:21     at hudson.model.Run.execute(Run.java:1720)
09:35:21     at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
09:35:21     at hudson.model.ResourceController.execute(ResourceController.java:98)
09:35:21     at hudson.model.Executor.run(Executor.java:401)
09:35:21 Caused by: hudson.plugins.git.GitException: Command "git fetch --tags --progress git@ourphabricator.com:diffusion/AST/aurelia-staging.git +refs/heads/*:refs/remotes/origin/*" returned status code 128:
09:35:21 stdout:
09:35:21 stderr: ssh_exchange_identification: Connection closed by remote host
09:35:21 fatal: Could not read from remote repository.
I suspected idle timeout at first, but bumped that to 3600s, and we're still seeing it. Has anyone else run git over ssh and behind ELB/ALB with success? Is there anything else that needs to be configured?
Answers
Empirically, it seems like MaxStartups 1024 in /etc/ssh/sshd_config helps, but we're still seeing errors when a bunch of builds start at once:
ERROR: Error fetching remote repo 'origin'
00:00:00.788 hudson.plugins.git.GitException: Failed to fetch from git@phab-host.com:diffusion/R/our-repo.git
00:00:00.788     at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:797)
00:00:00.788     at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1051)
00:00:00.788     at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1082)
00:00:00.788     at hudson.scm.SCM.checkout(SCM.java:495)
00:00:00.789     at hudson.model.AbstractProject.checkout(AbstractProject.java:1278)
00:00:00.789     at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
00:00:00.789     at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
00:00:00.789     at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
00:00:00.789     at hudson.model.Run.execute(Run.java:1720)
00:00:00.789     at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
00:00:00.789     at hudson.model.ResourceController.execute(ResourceController.java:98)
00:00:00.789     at hudson.model.Executor.run(Executor.java:401)
00:00:00.790 Caused by: hudson.plugins.git.GitException: Command "git fetch --tags --progress git@phab-host.com:diffusion/R/our-repo.git +refs/heads/*:refs/remotes/origin/*" returned status code 128:
00:00:00.790 stdout:
00:00:00.790 stderr: ssh_exchange_identification: Connection closed by remote host
00:00:00.790 fatal: Could not read from remote repository.
00:00:00.790
00:00:00.790 Please make sure you have the correct access rights
00:00:00.790 and the repository exists.
It does look different, though!
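Some background on why MaxStartups could matter here. As I understand the sshd_config man page, the setting takes the form start:rate:full (commonly 10:30:100 by default). Once `start` unauthenticated connections are pending, sshd refuses new ones with probability rate/100, rising linearly until every new connection is refused at `full`. A single value such as 1024 sets both thresholds to that number. The sketch below only models that documented ramp; the function name and the floating-point arithmetic are my own illustration, not sshd's actual code.

```python
def drop_probability(pending, start=10, rate=30, full=100):
    """Approximate chance that sshd refuses a new connection.

    pending: unauthenticated connections currently open
    start, rate, full: the three MaxStartups fields (assumed defaults)
    """
    if pending < start:
        return 0.0  # below the threshold nothing is refused
    if pending >= full:
        return 1.0  # at or above the hard limit everything is refused
    # Linear ramp from rate% at `start` up to 100% at `full`.
    ramp = (100 - rate) * (pending - start) / (full - start)
    return (rate + ramp) / 100.0


if __name__ == "__main__":
    for pending in (5, 10, 55, 100):
        print(pending, drop_probability(pending))
```

Under this model, a burst of parallel Jenkins builds that pushes pending connections past `start` would see some fetches refused at random, which would match sporadic failures. With MaxStartups 1024, refusals would only begin once 1024 unauthenticated connections are open at the same time.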
I don't know of any other magic config options.
Since you're still seeing ssh_exchange_identification I think sshd is still killing you for some reason. I would not expect that error to be associated with idle timeouts or anything we're doing beyond sshd, except maybe initial auth stuff.
You could try running sshd in the foreground with -d -d -d (or maybe there are equivalent sshd_config options for the normal logs; I just don't know them offhand), which might give you more insight. Be careful about stopping sshd or running it in the foreground if you rely on SSH to administer the machine, though.
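To check whether connections are being dropped before the SSH identification exchange (which is what ssh_exchange_identification: Connection closed by remote host suggests), a small probe can open many connections at once and count how many never receive a banner. This is a rough sketch, not a definitive diagnostic: the function names, the `hold` parameter, and the defaults are my own assumptions, and it should only be pointed at a server you operate.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def read_banner(host, port=22, timeout=5.0, hold=0.0):
    """Return the server's SSH identification line, or None if the
    connection failed, timed out, or was closed before any banner arrived."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(256)
            if data.startswith(b"SSH-"):
                # Optionally keep the unauthenticated connection open so
                # that parallel probes overlap on the server side.
                time.sleep(hold)
    except OSError:
        return None
    if not data.startswith(b"SSH-"):
        return None
    return data.split(b"\n", 1)[0].decode("ascii", "replace").strip()


def count_failures(host, port=22, attempts=50, hold=0.0):
    """Open `attempts` connections in parallel; return how many got no banner."""
    with ThreadPoolExecutor(max_workers=attempts) as pool:
        banners = list(pool.map(
            lambda _: read_banner(host, port, hold=hold), range(attempts)))
    return sum(1 for banner in banners if banner is None)
```

Running `count_failures` against the load balancer hostname and then directly against the sshd host (both hypothetical targets here) would hint at whether the drops come from sshd or from something in front of it.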