Page MenuHomePhabricator

Detect LFS by looking for tracks in ".gitattributes" instead of using "ls-tree"
ClosedPublic

Authored by epriestley on Apr 29 2020, 11:21 PM.
Tags
None
Referenced Files
F14073782: D21190.diff
Thu, Nov 21, 2:25 AM
Unknown Object (File)
Sat, Nov 16, 5:05 PM
Unknown Object (File)
Thu, Nov 7, 7:28 AM
Unknown Object (File)
Oct 21 2024, 4:16 PM
Unknown Object (File)
Oct 13 2024, 8:54 PM
Unknown Object (File)
Oct 9 2024, 1:23 PM
Unknown Object (File)
Oct 2 2024, 2:05 AM
Unknown Object (File)
Sep 29 2024, 6:39 AM
Subscribers

Details

Summary

See PHI1718. See also https://discourse.phabricator-community.org/t/arc-diff-fails-due-to-git-cmd-fails/3680/.

Currently, arc diff detects Git LFS with git ls-files -z -- ':(attr:filter=lfs)' magic. This is an accurate test, but does not work on older Git.

Try a simpler, dumber test and see if that will work. If this also has issues, we can try this stuff:

  • do version detection;
  • pipe the whole tree to git check-attr;
  • try a command like git lfs ls-files instead, which is probably a wrapper on one of these other commands.
Test Plan
  • In a non-LFS repository, ran "arc diff" and saw the repository detect as non-LFS.
  • In an LFS repository, ran "arc diff" and saw the repository detect as LFS.

Diff Detail

Repository
rARC Arcanist
Branch
land1
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 24280
Build 33446: Run Core Tests
Build 33445: arc lint + arc unit

Event Timeline

This revision was not accepted when it landed; it landed in state Needs Review.Apr 29 2020, 11:21 PM
epriestley requested review of this revision.
This revision was automatically updated to reflect the committed changes.

@ptarjan, just a heads up about this in case this modified test won't work in your environment for some reason.

Thanks for the heads up. Indeed this will break us (if I'm reading this right). .gitattributes is not required to be at the root level and we actually don't have one there, we put it in lots of subdirectories. Can you do a find or something like that? And not every one of our .gitattributes will contain lfs files so I guess you have to cat them all and return true if any have it?.

I think we can do this to get the correct answer across Git versions:

$ git ls-files -z -- | git check-attr --stdin -z [--all | filter] --

I'd guess this is significantly cheaper than find, but I'm slightly concerned that any approach which requires listing an entire tree may not scale terribly well. Even the :(attr:filter=lfs) approach seems like it's asking for trouble in millions-of-files repositories, but perhaps it's not really an issue. It's fast in phabricator/, but the repository only has ~10K files.

Conceivably you could add *.this-is-an-lfs-repository filter=lfs to /.gitattributes (or just put filter=lfs in a comment to trick this test) although that's not great. Does upstream git lfs support writing subdirectory .gitattributes files, or are you writing them manually? I don't immediately see a way to make git lfs track write to a non-root attributes file.

Bleh.

I think we can do this to get the correct answer across Git versions:

This seems to take ~200ms in phabricator/ to do safely/correctly/portably, and is horribly complex:

// Try this more complex test. We're going to run this command:
//
//   $ git ls-files -z -- | git check-attr --stdin -z filter --
//
// ...and then stop if we find a file using "filter=lfs". This is
// complicated to do in a way that checks errors properly and works
// on Windows.

$files_future = $this->buildLocalFuture(
  array(
    'ls-files -z --',
  ));

$attr_future = $this->buildLocalFuture(
  array(
    'check-attr --stdin -z filter --',
  ));

$files_future->setResolveOnError(false);
$attr_future->setResolveOnError(false);

// Keep the stdin pipe open.
$attr_future->write('', true);

$futures = array(
  $files_future,
  $attr_future,
);

$output = '';

$iterator = id(new FutureIterator($futures))
  ->setUpdateInterval(0.01);

// The output format is "<path> \0 <attribute> \0 <value> \0". We are
// looking for any file with "lfs" as the value, so we only care about
// every third element.

$idx = 0;

foreach ($iterator as $future) {
  // Pipe any output prouduced by "ls-files" to "check-attr".
  if ($files_future) {
    $stdout = $files_future->readStdout();
    if (strlen($stdout)) {
      $attr_future->write($stdout, true);
    }
  }

  if ($attr_future) {
    $stdout = $attr_future->readStdout();
    $output .= $stdout;

    while (true) {
      $pos = strpos($output, "\0");

      if ($pos === false) {
        break;
      }

      // echo sprintf(
      //   "READ: <%s>\n",
      //   substr($output, 0, $pos + 1));

      if ($idx === 2) {
        if (!strncmp($output, "lfs\0", 4)) {
          return true;
        }
      }

      $output = substr($output, $pos + 1);
      $idx = ($idx + 1) % 3;

      $pos = strpos($output, "\0");
    }
  }

  if ($future === $attr_future) {
    // If the "check-attr" subprocess exits, there are no LFS files in
    // the repository and we're done.
    return false;
  }

  if ($future === $files_future) {
    // Close stdin on the "check-attr" subprocess.
    $attr_future->write('', false);
    $files_future = null;
  }
}

So I'm likely to do this:

  • Try the original test.
  • Fall back to the .gitattributes test.
  • Everyone else can choose between upgrading git or adding a dummy filter=lfs comment to .gitattributes.

D21192 restores the old (accurate) test with fallback-on-error to "assume no LFS", since the install in PHI1718 doesn't currently have a need for this feature, so disabling it at least unbreaks everyone.

I expect arc to have more caching and configuration primitives fairly soon, so there may be other ways out of this, by using configuration to compensate for accuracy problems and/or caching to compensate for performance problems.