We are experiencing slow cloning of large repositories on Phacility. We are based in Australia and have a symmetric 1Gbps down / 1Gbps up internet connection.
Reproduction steps:
- Create a new repository on a Phacility instance (for this test, I created the test instance test-fiycs7awfp4z.phacility.com).
- Obviously, set up your SSH keys and everything.
- Wait like, 5 minutes apparently for Phacility to actually create the new repo before you can clone it.
- Use the attached script in the repository to generate some large commits with big files.
- For comparison purposes, optionally push the repository to GitHub over SSH. I have pushed the same repo to both the Phacility instance above and GitHub here: https://github.com/hach-que/BigRepo.
- Push the repository to Phacility over SSH.
- Clone a new copy from Phacility over SSH and observe the slow speeds.
- For comparison, do the same with the GitHub repository, over both SSH and HTTPS.
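For anyone repeating the comparison, here is a minimal sketch of a timing helper. The name `elapsed_seconds` and the destination directory names are made up for illustration and are not part of the original test. It only reports total wall-clock time, whereas the rates quoted in this report come from Git's own progress output.

```bash
# elapsed_seconds is a hypothetical helper: it runs a command, discards its
# output, and prints the wall-clock time it took in whole seconds.
elapsed_seconds() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# Usage (not run here; destination directory names are arbitrary):
#   elapsed_seconds git clone git@github.com:hach-que/BigRepo.git clone-github-ssh
#   elapsed_seconds git clone https://github.com/hach-que/BigRepo clone-github-https
```

Because Git's output is discarded, drop the redirection if you also want to watch the transfer rate while the clone runs.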
Actual results:
Keep in mind these results are from Australia. If you are not based in Australia, you should probably spin up an EC2 instance in the Sydney region to do comparison tests.
Test repository size based on push: 400.13 MiB
Actual repository size that I was attempting to clone prior to filing this report: 1.2 GiB
Push speed (not important as we don't often push large files, but do need to clone them; here for informational purposes):
| Phacility (SSH) | GitHub (SSH) |
| 8.31 MiB/s (wat?) | 2.46 MiB/s |
Pull speed:
| Phacility (SSH) | GitHub (SSH) | GitHub (HTTPS) |
| 99 KiB/s | 98 KiB/s | 3.87 MiB/s |
I did notice that the GitHub HTTPS pull got faster as the transfer went on, starting out at 64 KiB/s and gradually accelerating over the whole transfer up to around 2-5 MiB/s. This was not the case for SSH, which stayed at roughly the same speed throughout.
Additionally for reference, here are some results from speedtest.net which show the connectivity speed:
| Melbourne Server | California Server |
| 94.46 Mbps down | 20.77 Mbps down |
| 93.51 Mbps up | 96.65 Mbps up |
This demonstrates the actual internet connection to either region is not the limiting factor in clone speed.
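To put the speedtest figures in the same units as Git's progress output, here is a rough conversion sketch. It assumes 1 Mbps = 1,000,000 bits/s and 1 MiB = 1,048,576 bytes; `mbps_to_mibs` is a name made up for illustration.

```bash
# mbps_to_mibs is a hypothetical helper, not part of any tool mentioned here.
# Assumes 1 Mbps = 1,000,000 bits/s and 1 MiB = 1,048,576 bytes.
mbps_to_mibs() {
  awk -v mbps="$1" 'BEGIN { printf "%.2f\n", mbps * 1000000 / 8 / 1048576 }'
}

mbps_to_mibs 20.77   # California download, prints 2.48
mbps_to_mibs 94.46   # Melbourne download, prints 11.26
```

Even the slower California figure works out to roughly 2.5 MiB/s, which is about 25 times the 99 KiB/s seen when cloning over SSH.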
Expected results:
I expected Phacility to be able to serve repository data at least as fast as GitHub. For some reason HTTPS appears to be much faster on GitHub, but Phacility doesn't offer HTTPS cloning. The difference is pretty drastic too: we're talking a couple of minutes vs. hours in clone time.
I can't explain why HTTPS got faster though. This was the first clone, and the data is random, so it's unlikely that caching or cached requests played any role here. It's possible that Git's HTTPS protocol is just naturally faster at transferring large files, but that also seems unlikely.
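The "couple of minutes vs. hours" estimate can be checked with some quick arithmetic. This is only a sketch that assumes the average rates reported by Git hold for the whole transfer; `transfer_seconds` is a hypothetical helper name, and 1.2 GiB is taken as 1228.8 MiB.

```bash
# transfer_seconds is a hypothetical helper: size in MiB, rate in KiB/s.
# It assumes a constant transfer rate, which is a simplification.
transfer_seconds() {
  awk -v size_mib="$1" -v rate_kib="$2" \
    'BEGIN { printf "%.0f\n", size_mib * 1024 / rate_kib }'
}

transfer_seconds 400.13 99        # test repo at the Phacility SSH rate: 4139
transfer_seconds 400.13 3962.88   # test repo at 3.87 MiB/s (GitHub HTTPS): 103
transfer_seconds 1228.8 99        # 1.2 GiB repo at the SSH rate: 12710
```

That is about 69 minutes for the test repository and about 3.5 hours for the 1.2 GiB repository at the SSH rate, against under 2 minutes for the test repository at the HTTPS rate.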
Attached script:
#!/bin/bash
# 20 commits, each rewriting ten 2 MiB files of random data.
for ((a = 0; a < 20; a++)); do
  for ((i = 0; i < 10; i++)); do
    dd if=/dev/urandom of="$i.bin" bs=1048576 count=2
    git add "$i.bin"
  done
  git commit -m "Change binary files (commit #$a)"
done
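As a sanity check on the size of the test repository, the amount of data the script generates can be worked out from its loop bounds and `dd` arguments (20 commits, 10 files per commit, 2 blocks of 1 MiB per file). The variable names below are only for illustration.

```bash
# Values taken from the script above: 20 commits, 10 files each,
# and dd writing 2 blocks of 1048576 bytes (2 MiB) per file.
commits=20
files_per_commit=10
mib_per_file=2

total_mib=$((commits * files_per_commit * mib_per_file))
echo "$total_mib MiB of random data"   # prints "400 MiB of random data"
```

This lines up with the 400.13 MiB reported on push, presumably because random data barely compresses and the remainder is object and pack overhead.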
Command raw output for reference:
Push GitHub SSH:
jrhod@DESKTOP-4MQ2MPG /d/Projects/big-repo (master)
$ git push git@github.com:hach-que/BigRepo.git master:master
Counting objects: 240, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (240/240), done.
Writing objects: 100% (240/240), 400.13 MiB | 2.46 MiB/s, done.
Total 240 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), done.
To github.com:hach-que/BigRepo.git
 * [new branch] master -> master
Push Phacility SSH:
jrhod@DESKTOP-4MQ2MPG /d/Projects/big-repo (master)
$ git push ssh://test-fiycs7awfp4z@vault.phacility.com/diffusion/1/big-repo.git master:master
# Push received by "web.phacility.net", forwarding to cluster host.
# Waiting up to 120 second(s) for a cluster write lock...
# Acquired write lock immediately.
# Waiting up to 120 second(s) for a cluster read lock on "repo007.phacility.net"...
# Acquired read lock immediately.
# Device "repo007.phacility.net" is already a cluster leader and does not need to be synchronized.
# Ready to receive on cluster host "repo007.phacility.net".
Counting objects: 240, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (240/240), done.
Writing objects: 100% (240/240), 400.13 MiB | 8.31 MiB/s, done.
Total 240 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), done.
# Released cluster write lock.
To ssh://vault.phacility.com/diffusion/1/big-repo.git
 * [new branch] master -> master
Pull GitHub HTTPS:
jrhod@DESKTOP-4MQ2MPG /d/Projects
$ git clone https://github.com/hach-que/BigRepo big-repo-github
Cloning into 'big-repo-github'...
remote: Counting objects: 240, done.
remote: Compressing objects: 100% (239/239), done.
remote: Total 240 (delta 1), reused 240 (delta 1), pack-reused 0
Receiving objects: 100% (240/240), 400.13 MiB | 3.87 MiB/s, done.
Resolving deltas: 100% (1/1), done.
Pull GitHub SSH:
jrhod@DESKTOP-4MQ2MPG /d/Projects
$ git clone git@github.com:hach-que/BigRepo.git big-repo-github-2
Cloning into 'big-repo-github-2'...
remote: Counting objects: 240, done.
remote: Compressing objects: 100% (239/239), done.
remote: Total 240 (delta 1), reused 240 (delta 1), pack-reused 0
Receiving objects: 100% (240/240), 400.13 MiB | 98.00 KiB/s, done.
Resolving deltas: 100% (1/1), done.
Pull Phacility SSH:
jrhod@DESKTOP-4MQ2MPG /d/Projects
$ git clone ssh://test-fiycs7awfp4z@vault.phacility.com/diffusion/1/big-repo.git big-repo-phacility
Cloning into 'big-repo-phacility'...
# Fetch received by "web.phacility.net", forwarding to cluster host.
# Waiting up to 120 second(s) for a cluster read lock on "repo007.phacility.net"...
# Acquired read lock immediately.
# Device "repo007.phacility.net" is already a cluster leader and does not need to be synchronized.
# Cleared to fetch on cluster host "repo007.phacility.net".
remote: Counting objects: 240, done.
remote: Compressing objects: 100% (239/239), done.
remote: Total 240 (delta 1), reused 240 (delta 1)
Receiving objects: 100% (240/240), 400.13 MiB | 99.00 KiB/s, done.
Resolving deltas: 100% (1/1), done.