
Support Git Large File Storage
Closed, ResolvedPublic

Assigned To
Authored By
hach-que
Apr 8 2015, 8:27 PM
Referenced Files
F1209330: lfs_trace.log
Apr 7 2016, 11:25 AM
F1182199: Screen Shot 2016-03-19 at 12.21.40 PM.png
Mar 19 2016, 8:20 PM
F1182530: Screen Shot 2016-03-19 at 1.19.31 PM.png
Mar 19 2016, 8:20 PM

Description

Main article: https://github.com/blog/1986-announcing-git-large-file-storage-lfs

Basically, this is a new implementation of large file support for Git, similar to git-media, git-annex or git-fat in the past. At a glance, this implementation and specification look better than any previous implementation. Importantly, this implementation has no external dependencies on client binaries because it's written in Go, which is likely to lead to quick adoption. In contrast, git-media requires Ruby, git-fat requires Python and git-annex uses symlinks, which makes all three implementations difficult or impossible to use on Windows.

Another nicety of this implementation is that, provided the server has the right endpoints, it requires no URL configuration for the large file storage. For HTTPS URLs, it appends "/info/lfs" to the remote URL, and for SSH URLs it calls "git-lfs-authenticate" to determine authentication information (this allows the SSH authentication to provide an OAuth token which can then be used to download files over HTTPS from the same system). More details on this implementation are in the api.md reference document.
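As a rough illustration of the discovery rule described above, here is a minimal Python sketch. It is not the client's actual code: the function name lfs_discovery is hypothetical, the ".git" suffix handling is inferred from the HTTP trace later in this thread (remote .../diffusion/LFS, request to .../diffusion/LFS.git/info/lfs/objects/batch), the SSH argument order is copied from the "ssh: ... git-lfs-authenticate <path> <operation>" trace lines, and passing the port with "-p" is an assumption.

```python
from urllib.parse import urlsplit


def lfs_discovery(remote_url, operation="download"):
    """Sketch of how a client might locate the LFS API for a Git remote.

    Returns ("http", endpoint_url) or ("ssh", argv_list).
    """
    parts = urlsplit(remote_url)

    if parts.scheme in ("http", "https"):
        base = remote_url.rstrip("/")
        # Inferred from the trace in this thread: ".git" is added if missing.
        if not base.endswith(".git"):
            base += ".git"
        return ("http", base + "/info/lfs")

    if parts.scheme == "ssh":
        host = parts.hostname
        user_host = f"{parts.username}@{host}" if parts.username else host
        argv = ["ssh"]
        if parts.port:
            # Assumption: a non-default port is forwarded with "-p".
            argv += ["-p", str(parts.port)]
        # The repository path is passed without its leading slash.
        argv += [user_host, "git-lfs-authenticate",
                 parts.path.lstrip("/"), operation]
        return ("ssh", argv)

    raise ValueError(f"unsupported remote scheme: {parts.scheme!r}")
```

For an SSH remote, the command's output is what supplies the HTTPS endpoint and credentials, so the sketch stops at building the command rather than guessing at the response.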

Important documents for implementation:

As someone who works in games development, this would be extremely useful in Phabricator, as large files often accumulate quickly and drastically increase the size of repositories.

Event Timeline


(This is paused since we're waiting on feedback -- if you give it a shot, let us know how things go.)

I'm on latest stable (fea2389), and I just tried the following

$ git clone ssh://git@phabricator.example.com:222/diffusion/18/lfs-test.git                                                                     [10:54:36]
Cloning into 'lfs-test'...
warning: You appear to have cloned an empty repository.
Checking connectivity... done.                                                                                                                       
$ cd lfs-test/                                                                                                                                  [10:56:39]
$ git lfs install                                                                                                                               [10:56:42]
Updated pre-push hook.
Git LFS initialized.                                                                                                         
$ git lfs track "*.jar"                                                                                                                         [10:57:05]
Tracking *.jar                                                                                                  
$ cp ../some/path/antlr-3.2.jar .                                                                                                               [10:57:19]
$ ls                                                                                                                                            [10:57:52]
antlr-3.2.jar                                                                                                       
$ git add antlr-3.2.jar                                                                                                                         [10:57:54]
$ git commit -m "testing"                                                                                                                       [10:57:57]
[master (root-commit) 2690d02] testing
 1 file changed, 3 insertions(+)
 create mode 100644 antlr-3.2.jar
$ git push -u origin master                                                                                                                     [10:58:04]
Git LFS: (1 of 1 files) 1.84 MB / 1.84 MB                                                                                                                
Counting objects: 3, done.
Delta compression using up to 6 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 329 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ssh://git@phabricator.example.com:222/diffusion/18/lfs-test.git
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.

So far so good. However...

$ cd ..                                                                                                                                         [10:58:18]
$ rm -rf lfs-test/                                                                                                                              [10:59:45]
$ git clone ssh://git@phabricator.example.com:222/diffusion/18/lfs-test.git                                                                     [10:59:50]
Cloning into 'lfs-test'...
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.
Checking connectivity... done.                                                                                                                     
$ cd lfs-test/                                                                                                                                  [11:00:15]
$ ls                                                                                                                                            [11:00:19]
antlr-3.2.jar
$ cat antlr-3.2.jar                                                                                                                             [11:00:45]
version https://git-lfs.github.com/spec/v1
oid sha256:4c8737014e7ca0d2c85171edf37f5a26b2d8d8237c283357b81a3269b6848d38
size 1928009                                                                                         
$ git lfs fetch origin                                                                                                                          [11:05:19]
Fetching master
Git LFS: (2 of 1 files) 0 B / 1.84 MB                                                                                                                    
Repository or object not found: http://phabricator.example.com/file/data/w5mihitt7ijile3b7n44/PHID-FILE-kbhl75cnfm6nwks7ojmf/dosd5iqevvn3rune/lfs-4c8737014e7ca0d2c85171edf37f5a26b2d8d8237c283357b81a3269b6848d38
Check that it exists and that you have proper access to it
Warning: errors occurred                 

We have enabled file storage as local storage on disk. Any ideas on what I am doing wrong?
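For readers unfamiliar with the format, the cat output above is a Git LFS pointer file: the clone contains only this small text stub, and the real bytes are fetched separately by object ID. A minimal Python sketch of reading such a pointer and checking a downloaded blob against it follows; the helper names parse_pointer and matches_pointer are hypothetical, and only the "sha256" hash method seen in this thread is handled.

```python
import hashlib

# Pointer text as printed by `cat antlr-3.2.jar` in the transcript above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4c8737014e7ca0d2c85171edf37f5a26b2d8d8237c283357b81a3269b6848d38
size 1928009
"""


def parse_pointer(text):
    """Split "key value" lines into a dict with a typed size and split oid."""
    fields = dict(
        line.strip().split(" ", 1)
        for line in text.splitlines()
        if line.strip()
    )
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """True if the bytes have the size and SHA-256 digest the pointer names."""
    if pointer["algo"] != "sha256":
        raise ValueError("only sha256 is handled in this sketch")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["oid"]
    )
```

The size field here (1928009) is the same byte count that 7za reports for the stored file in the addendum below, which is consistent with the upload itself having succeeded.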

As an addendum, I can definitely find the zip file on the box...

$ 7za l 8468df03d4cb6e204267ac7ef67b | head

7-Zip (a) [64] 15.09 beta : Copyright (c) 1999-2015 Igor Pavlov : 2015-10-16
p7zip Version 15.09 beta (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (206D7),ASM,AES-NI)

Scanning the drive for archives:
1 file, 1928009 bytes (1883 KiB)

Listing archive: 8468df03d4cb6e204267ac7ef67b

--
Path = 8468df03d4cb6e204267ac7ef67b
Type = zip
Physical Size = 1928009

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2009-09-23 12:02:26 D....            0            0  META-INF
2009-09-23 12:02:24 .....          151          125  META-INF/MANIFEST.MF
2009-09-23 11:51:26 D....            0            0  org
2009-09-23 11:51:28 D....            0            0  org/antlr
2009-09-23 11:51:28 D....            0            0  org/antlr/misc
2009-09-23 11:51:28 D....            0            0  org/antlr/codegen

We don't have an alternate file domain set up for serving files, and we're using nginx + php-fpm.

Hmm, I'm not immediately sure -- that all looks correct on your side to me. Let me add some kind of basic LFS administration page that shows Phabricator's view of LFS objects in the repository -- hopefully that will shed some light on where things are breaking down.

Oh, actually -- if you browse to the object page in Diffusion (that is, use the "Browse Repository" view to navigate to lfs-test/antlr-3.2.jar), can you download the file from there? That might at least provide a hint about how far things got.

After T10262, we have a slightly simpler model for file access, so it's vaguely possible that this is simply fixed in master if it's some kind of access-token-related issue. But I can't really come up with a plausible theory for what sort of issue it might be.

Can confirm that navigating to the file in diffusion will let me download it, and it really does give me back the right file.

If nothing obvious stands out to you, I'm happy to keep my eye out for when more code lands in stable. Very excited for this to land!

Oh, I think I might know what's wrong. If you repeat the same process with a viewable file (like a JPEG or PNG) does that work?

If it does, I'm pretty sure it's an issue with the POST check we have on file downloads to prevent users from doing <applet src="http://your.install.com/file/data/blah/thing.jar" /> and having it work. A fix would be to add a header to let LFS skip the check, which is simple and safe.

That could explain why it worked for me locally: I was mostly testing with images locally, and the check doesn't happen if the content is "viewable in the browser" (as with images) or if alternate file domains are configured (as on this host).

Running GIT_TRACE=1 git lfs fetch origin may also be illuminating.

Unfortunately, no dice. I removed the jar file, and added a jpeg. I have attached a trace of running GIT_TRACE=1 git lfs fetch origin.

Hmm, odd. That definitely isn't the POST issue (although I suspect that may be an issue in some cases). The behavior will hopefully be more clear once master promotes, and definitely easier to debug. I'll dig into it a bit on my side in the meantime, I have a few ideas of things to try now. Thanks for your help!

I'm using S3 as the Files storage backend, Phabricator runs behind nginx, and Diffusion is hosted over SSH (diffusion.allow-http-auth=false; HTTP is not configured anyway). I'm having an issue:

  • pushing and pulling small files (<2MB) tracked as LFS -> works flawlessly
  • pushing 11x 1 GB files works, but I can't do git clone / git lfs fetch (I get the same issue as @benjumanji; the lfs_trace log is similar, with 404 errors when getting the lfs-* files)
  • pushing an 11GB file doesn't work. I forgot the error message and will add it later if needed

Do I need to set up HTTP? From the (shortened) nginx log, I see that git lfs fetch is making a GET request with auth: none and no cookie:

"GET /file/data/[shortened]/PHID-FILE-[shortened]/lfs-[shortened] HTTP/1.1" 404 "-"  "Basic bm9uZQ=="

This is the error log when doing a checkout:

Downloading [FILE] (220.06 MB)
Error downloading object: [FILE](3a142ce520db80fa69c0bb72cb2a67e5b749923a94adc20f9bcf9c28f98d6934)

Errors logged to /home/salvian/lfs-test/.git/lfs/objects/logs/20160422T163800.925027437.log
Use `git lfs logs last` to view the log.
error: external filter git-lfs smudge -- %f failed 2
error: external filter git-lfs smudge -- %f failed
fatal: [LFS_FILE]: smudge filter lfs failed
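A side note on the "Basic bm9uZQ==" value in the nginx log line above: assuming that field is the logged Authorization header, the token is plain base64 and can be decoded directly. The short Python sketch below (the helper name decode_basic_auth is hypothetical) shows that it carries the literal string "none" rather than a user:password pair, which matches the "auth: none" reading.

```python
import base64


def decode_basic_auth(header_value):
    """Decode a "Basic <base64>" header value.

    Returns (decoded_text, (user, password)) when the decoded text has the
    usual "user:password" shape, otherwise (decoded_text, None).
    """
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization value")
    decoded = base64.b64decode(token).decode("utf-8")
    user, sep, password = decoded.partition(":")
    return decoded, ((user, password) if sep else None)


print(decode_basic_auth("Basic bm9uZQ=="))  # ('none', None)
```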

You shouldn't need to configure diffusion.allow-http-auth -- the auth token is basically the first [shortened] in the URI, so that appears to be working properly.

If it works for one file, my expectation is that it should work for everything. I can't immediately come up with a reason it would work for small files and fail for larger ones.

  • If you visit the GET URI directly in your browser, does it give you a more useful message? Or just a normal 404?
  • If you browse to one of the failing files in the web UI, does the "download from LFS" link work? If so, how does that link differ in structure from the one the CLI attempts to download?

Ah, is the auth token (first [shortened]) the same for every user? It is in my case.

  1. If I visit the URI while being logged in to the app, the browser redirects to the correct file, the download link works
  2. The download URI is the same as the logged URI
  3. Visiting the URI with private window returns normal 404 page

  • Is security.alternate-file-domain configured on your install?
  • When you say "redirect to the correct file", do you mean you get taken to the file detail page /F1234?
  • Also, do you have rP37b93f4 locally (April 7)? (You can grab the hashes in ConfigVersions if you aren't immediately sure and I can figure it out from there.)

  1. It's not configured
  2. Yes
  3. I'm not sure, these are my install versions (stable branch):

(edit, it has rP37b93f4)

I think I know what's going on, although I'm unsure why the <2MB files would work, unless you happened to push those to a public ("Visible To: Public (No Login Required)") repository and pushed the large files to a private repository?

Let me see if I can reproduce things locally.

I think D15784 should fix the issue. It should land in master later today and be available in stable in about 24 hours. Let me know if you're still seeing issues once you update past that.

A workaround today is likely to configure security.alternate-file-domain, which is a Really Great Idea anyway, but we shouldn't have weird behavior like this that depends on your configuration settings.

Thanks!
Yes, we will consider setting up an alternate file domain.

Can confirm that on latest stable my lfs-test repository now succeeds with both jars and jpegs. Huzzah!

Great, I'm glad to hear things are working for you. Thanks for your help testing this out!

ktxxt removed a subscriber: ktxxt.
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:09 PM

We've been testing the LFS support for use with our game repository. Using a relatively small test repo we've had great results and will be going ahead and trying to use this for our main git repo (a new copy with an empty history that is). So thanks for getting support in! This is a killer feature as far as game dev is concerned.

Not sure if you're aware of this, so I'll mention it: one interesting thing I ran into is that it doesn't work via HTTP at all. We've been using SSH since day 1 because of the scalability limits in T4369, but when testing HTTP on a new repo with a single LFS-tracked binary file of a couple hundred kilobytes, git push hangs immediately after entering the password and refuses to push anything:

E:\dev\lfs>git push
11:31:12.300794 git.c:350               trace: built-in: git 'push'
11:31:12.301296 run-command.c:336       trace: run_command: 'git-remote-http' 'origin' 'http://brandon@10.21.7.10/diffusion/LFS'
11:31:12.857238 run-command.c:336       trace: run_command: 'bash' '-c' 'cat >/dev/tty && read -r -s line </dev/tty && echo "$line" && echo >/dev/tty'
Password for 'http://brandon@10.21.7.10':
11:31:15.770164 run-command.c:336       trace: run_command: '.git/hooks/pre-push' 'origin' 'http://brandon@10.21.7.10/diffusion/LFS'
11:31:15.859170 git.c:564               trace: exec: 'git-lfs' 'pre-push' 'origin' 'http://brandon@10.21.7.10/diffusion/LFS'
11:31:15.859672 run-command.c:336       trace: run_command: 'git-lfs' 'pre-push' 'origin' 'http://brandon@10.21.7.10/diffusion/LFS'
trace git-lfs: run_command: 'git' version
trace git-lfs: run_command: git rev-list --objects --stdin --
trace git-lfs: run_command: git cat-file --batch-check
trace git-lfs: run_command: git cat-file --batch
trace git-lfs: run_command: 'git' config -l
trace git-lfs: tq: running as batched queue, batch size of 100
trace git-lfs: tq: sending batch of size 1
trace git-lfs: api: batch 1 files
trace git-lfs: HTTP: POST http://brandon@10.21.7.10/diffusion/LFS.git/info/lfs/objects/batch
trace git-lfs: HTTP: 200
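For context on the last two trace lines: the client's first step is a POST to the batch endpoint, a JSON request listing the objects it wants to transfer. The Python sketch below only builds such a request (it sends nothing). The helper name build_batch_request is hypothetical, the field names and media type follow the Git LFS batch API as I understand it, and the oid/size pair is reused from the pointer file earlier in this thread purely as sample data.

```python
import json

LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def build_batch_request(lfs_endpoint, operation, objects):
    """Build (url, headers, body) for an LFS batch API call.

    `objects` is an iterable of (oid, size) pairs; `operation` is
    "upload" or "download".
    """
    if operation not in ("upload", "download"):
        raise ValueError("operation must be 'upload' or 'download'")
    url = lfs_endpoint.rstrip("/") + "/objects/batch"
    headers = {"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE}
    body = json.dumps({
        "operation": operation,
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    })
    return url, headers, body


url, headers, body = build_batch_request(
    "http://brandon@10.21.7.10/diffusion/LFS.git/info/lfs",
    "upload",
    [("4c8737014e7ca0d2c85171edf37f5a26b2d8d8237c283357b81a3269b6848d38",
      1928009)],
)
```

A 200 response to this call normally carries per-object upload actions, so a trace that stops right after "HTTP: 200" suggests the hang is in the step that follows the batch call rather than in the call itself; that reading is a guess from the trace, not something confirmed in this thread.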

The driving install dropped off the grid a while ago, so this is no longer prioritized.

Unknown Object (User) added a subscriber: Unknown Object (User).Nov 16 2016, 10:24 AM

This doesn't add any compatibility with mirroring, does it?

I.e., have LFS files in a repo and mirror it to GitHub.

No, there's no specific support for LFS mirroring yet.

Is this going to be enabled in Phacility soon? We just realised that Phacility doesn't appear to have it on, and this is a major blocker for us moving from our own instance to Phacility:

$ GIT_CURL_VERBOSE=1 GIT_TRACE=1 git push phacility master:master
15:48:27.736074 git.c:350               trace: built-in: git 'push' 'phacility' 'master:master'
15:48:27.742075 run-command.c:336       trace: run_command: 'C:\Program Files (x86)\GitExtensions\PuTTY\plink.exe' 'redpoint@vault.phacility.com' 'git-receive-pack '\''/source/minute-of-mayhem.git'\'''
# Push received by "web.phacility.net", forwarding to cluster host.
# Waiting up to 120 second(s) for a cluster write lock...
# Acquired write lock immediately.
# Waiting up to 120 second(s) for a cluster read lock on "repo003.phacility.net"...
# Acquired read lock immediately.
# Device "repo003.phacility.net" is already a cluster leader and does not need to be synchronized.
# Ready to receive on cluster host "repo003.phacility.net".
15:48:52.371819 run-command.c:336       trace: run_command: '.git/hooks/pre-push' 'phacility' 'ssh://redpoint@vault.phacility.com/source/minute-of-mayhem.git'
15:48:52.440816 git.c:563               trace: exec: 'git-lfs' 'pre-push' 'phacility' 'ssh://redpoint@vault.phacility.com/source/minute-of-mayhem.git'
15:48:52.441816 run-command.c:336       trace: run_command: 'git-lfs' 'pre-push' 'phacility' 'ssh://redpoint@vault.phacility.com/source/minute-of-mayhem.git'
trace git-lfs: run_command: 'git' version
trace git-lfs: run_command: 'git' config -l
trace git-lfs: tq: running as batched queue, batch size of 100
trace git-lfs: pre-push: refs/heads/master d95ee882970158aa93e35c9dac0ff79e03efeeff refs/heads/master 0000000000000000000000000000000000000000
trace git-lfs: run_command: git rev-list --objects d95ee882970158aa93e35c9dac0ff79e03efeeff --not --remotes=phacility --
trace git-lfs: run_command: git cat-file --batch-check
trace git-lfs: run_command: git cat-file --batch
trace git-lfs: tq: sending batch of size 100
trace git-lfs: ssh: redpoint@vault.phacility.com git-lfs-authenticate source/minute-of-mayhem.git upload
trace git-lfs: ssh: redpoint@vault.phacility.com failed, error: exit status 1, message: Exception: Git LFS is not enabled for this repository.
trace git-lfs: tq: sending batch of size 100
trace git-lfs: ssh: redpoint@vault.phacility.com git-lfs-authenticate source/minute-of-mayhem.git upload
FATAL ERROR: Server unexpectedly closed network connection

We have no plans to move this forward or enable it in the Phacility cluster in the near future.

So our only option is to convert all our LFS objects into normal Git objects, increasing both our download times, and increasing your storage and bandwidth costs? Like our Git LFS objects for the entire repository total around 800MB, and without Git LFS, our build server will have to download that from Phacility every single time it does a build.

This has been working and stable in non-Phacility instances for almost a year, I don't understand the reason to not enable it in Phacility.

Some feedback on this in case it's ever prioritised and picked up again: we've been using this internally for an Unreal Engine 4 based game project's binary assets for about 7 months now, and we haven't run into any issues. Works great for us.

A customer is hitting an issue where pushes to staging areas do not push LFS objects. I'd guess this is an LFS upstream problem based on a hazy recollection of events here.


See Planning for information on planning and timelines.

If you'd like to discuss Phabricator, please do so in the community forum (see: Discourse), not here.

@epriestley I've pinged you via pm about prioritization.

I think the current state of things is:

  • We have an outstanding support issue (PHI204) about timeouts with large files. This may be https://github.com/git-lfs/git-lfs/issues/2636. I currently can't reproduce this, but I'd like to resolve this before unprototyping.
  • Per above, pushes to staging areas did not push LFS in the past. We should figure out what the state of the world is now and try to resolve it in the upstream, and document anything we can't work around or resolve.
  • Drydock operations should become LFS-aware, or limitations should be documented.
  • I'm uneasy about enabling this in the Phacility production cluster without limits, particularly for the various instance types that pay us nothing or nearly nothing. This implies at least a way to disable LFS through configuration, and maybe some more sophisticated quota/limit system. Or we could get rid of the cheap tiers, accept the cost of the existing instances on the "money fire" pricing plans, and just enable this globally. My inclination is to make some effort to quantify the behavior of the $5 tier and get rid of it unless there's evidence that it's truly converting at >4x the rate of the $20 tier. (But this was launched without any A/B testing and at the same time as other major changes so we'll never really know from the data we have currently.)
  • @Grimeh, above, reports that this doesn't work over HTTP. I believe it does, but we should verify this.
  • https://discourse.phabricator-community.org/t/git-lfs-fails-with-large-images/584 suggests that we may be mishandling large image files in the general case. We should be applying limits similar to those in PhabricatorFileImageTransform to dimension queries, and executing them in a memory-efficient way (e.g., with getimagesize(), not imagecreatefromstring()) if we aren't already. But the particulars of that report are also odd, and this isn't exactly an LFS issue.
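To illustrate the memory point in the last bullet (a header-only dimension query versus decoding the whole image), here is a Python sketch of the general technique. It is an analogy only, not Phabricator's PHP code: the function name png_dimensions is hypothetical, and it handles only PNG, where the width and height sit at fixed offsets in the first 24 bytes.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header):
    """Return (width, height) from the first 24 bytes of a PNG file.

    Only the signature and the start of the IHDR chunk are inspected, so
    the cost does not grow with the size of the image data.
    """
    if (len(header) < 24
            or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        raise ValueError("not a PNG header")
    # Width and height are big-endian 32-bit integers at offsets 16 and 20.
    return struct.unpack(">II", header[16:24])
```

A caller would read just the first 24 bytes of the file (for example, open(path, "rb").read(24)) and could then apply a pixel-count limit before deciding whether to load the image at all.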

These are also concerns, but I think we can deal with them after this leaves prototype:

  • Limited admin/monitoring tools. Most of these can be built on top of existing data later, and aren't necessarily important.
  • Limited debugging tools (e.g., generate a PUT'able URI for an arbitrary upload to debug issues like PHI204) and logging of the actual LFS traffic.
  • No Herald support, but fine to build later if we need it.
  • UI in Diffusion and Differential could be improved, but is serviceable for now.

I don't see this increasing storage usage - just shifting it. Right now because Phacility doesn't support LFS, all those large files are going straight into Git, which not only ends up on an EBS SSD, but also has to be cloned every single time which increases bandwidth usage. At least with LFS those costs are going to be from S3 (which I believe is cheaper than EBS SSD per GB), and bandwidth would be down as new clones no longer need to pull every version of a large file in the Git history.

The reason GitHub can get away with charging additional for LFS is that they also have a hard 1GB size limit on repositories, so you can't just put all your large files directly into the repo on their service. Phacility has no such repository limit though, so if you limit LFS to only paying customers then I think the free tier users will just end up storing them in Git directly which will cost you more.

For the record we already offload LFS to our own server in Australia, which dramatically speeds up upload and download of large files for us. So while we're on the free tier, our LFS usage does not cost Phacility anything as we pay for the cloud storage ourselves.

Our only interest in this support is that if Phacility enables this, then the SSH idle timeout will need to be lifted dramatically to support direct LFS users (because git push holds the SSH connection open while LFS pushes files over HTTPS). This would resolve the issue we have pushing large files on slow connections, as SSH would no longer time out (we have a workaround for now but it's kludgy).

epriestley added a revision: Restricted Differential Revision.Dec 13 2017, 1:45 PM

@Grimeh, above, reports that this doesn't work over HTTP. I believe it does, but we should verify this.

For completeness, I can not reproduce this.

$ git push --force http://hector@local.phacility.com/source/locktopia.git HEAD:lfs
Password for 'http://hector@local.phacility.com': 
Password for 'http://hector@local.phacility.com': 
Git LFS: (1 of 1 files, 2 skipped) 126.94 KB / 126.94 KB, 114.00 KB skipped                                                                                          
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 367 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
To http://local.phacility.com/source/locktopia.git
   1d29259..f8738ba  HEAD -> lfs

Per above, pushes to staging areas did not push LFS in the past. We should figure out what the state of the world is now and try to resolve it in the upstream, and document anything we can't work around or resolve.

This still appears to be a problem with modern LFS (git-lfs/2.3.4). D18829 did not push LFS blob data to the staging area.

We have an outstanding support issue (PHI204) about timeouts with large files.

As general context here, this has proven resistant to reproduction and I'm not sure it's a problem with our implementation rather than a configuration/environment problem. We're still working on figuring it out, but I don't think it needs to block anything here.

Drydock operations should become LFS-aware, or limitations should be documented.

Drydock is still not LFS-aware but I think we can cross this bridge when we come to it.

I'm uneasy about enabling this in the Phacility production cluster

D18826 explicitly documents this as not supported in the production cluster.

https://discourse.phabricator-community.org/t/git-lfs-fails-with-large-images/584 suggests that we may be mishandling large image files in the general case.

D18830 should improve this.

T10604 and T4369 do not impact LFS (broadly, they apply only to POST requests, and LFS uses PUT to upload data).

See T13032 for a summary of upgrade guidance and known issues.