Page MenuHomePhabricator

Add a `phutil_fnmatch` function
ClosedPublic

Authored by joshuaspence on Jan 20 2015, 11:15 AM.
Tags
None
Referenced Files
F12825767: D11444.id27594.diff
Thu, Mar 28, 9:07 AM
F12821815: D11444.id27500.diff
Thu, Mar 28, 6:56 AM
F12821813: D11444.id27596.diff
Thu, Mar 28, 6:56 AM
F12821811: D11444.id27513.diff
Thu, Mar 28, 6:56 AM
F12821810: D11444.id27595.diff
Thu, Mar 28, 6:56 AM
F12821803: D11444.id27597.diff
Thu, Mar 28, 6:55 AM
Unknown Object (File)
Mon, Mar 25, 11:34 AM
Unknown Object (File)
Sun, Mar 24, 10:35 AM
Subscribers

Details

Summary

Ref T5105. This function is an improved version of fnmatch. Internally, this function converts a glob pattern into a regular expression and is based on Symfony\Component\Finder\Glob, with influence from editorconfig-core-js.

The PHP fnmatch() is inadequate for our use case for a variety of reasons:

  • Lack of error handling (fnmatch('{', '') does not throw an exception).
  • Lack of support for {} groupings (fnmatch('{a,b}', 'a') returns false).
  • **/* only matches in subdirectories (fnmatch('**/file', $x) matches $x = 'subdir/file' but does not match $x = 'file').
NOTE: At the moment, phutil_fnmatch() doesn't support $flags, whereas fnmatch does. In the future, we may wish to implement these flags. Currently, $flags is basiically equivalent to 0.
Test Plan

Added unit tests.

Diff Detail

Repository
rPHU libphutil
Branch
master
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 3995
Build 4008: [Placeholder Plan] Wait for 30 Seconds

Event Timeline

joshuaspence retitled this revision from to Add a `glob_to_regex` function.
joshuaspence updated this object.
joshuaspence edited the test plan for this revision. (Show Details)
joshuaspence added a reviewer: epriestley.
joshuaspence retitled this revision from Add a `glob_to_regex` function to Add a `phutil_glob_to_regex` function.Jan 20 2015, 8:23 PM
joshuaspence updated this object.
joshuaspence updated this object.
  • Allow ** to matches paths not in a subdirectory
  • More tests

Alternatively, we could create a phutil_fnmatch function and hide the whole glob -> regex conversion.

src/utils/utils.php
1133–1134

This was based on editorconfig-core-js, although their implementation is different. To me, this seems more correct.

epriestley edited edge metadata.

I think I found at least some stuff. I was just reasoning through things so I might be wrong about some/most/all of it.

src/utils/__tests__/PhutilUtilsTestCase.php
619

Although technically "less pure", I think it would be helpful for these tests to show the expected regexp in addition to having pass/fail test patterns. That would, e.g., either have caught the ** issue or made it easy for me to convince myself I was mistaken.

src/utils/utils.php
1133–1134

Their implementation seems very odd, I don't understand it either.

(This approach arguably gets \**/log.txt wrong, I guess? Probably not worth fixing.)

1143

Maybe it would be cleaner to structure this like:

if ($escaping) {
  $escape = $long_list; // Full list with everything that gets escaped.
} else {
  $escape = $short_list; // Just the stuff that's always escaped.
}

if (in_array($char, $escape)) { ... }

Then all of the subclauses wouldn't need additional conditions.

This would apply unnecessary escaping to commas, but that seems fine.

1144

I would expect [ and ] to also get escaped.

The delimiter, # must be escaped too (or you can use ( and ) as delimiters without additional escaping).

1150–1154

Won't this convert ** to .*[^/]*? (That is, process both stars?) While I don't think it produces the wrong behavior, I'd expect an $i++ after the .* case.

1184

There's no check for $in_curlies being 0 here, so I'd expect this to produce an invalid regular expression for an input like {. It would be cleaner to throw an error here.

1184

There's no check for $escaping here, so I'd expect this to produce the wrong result for an input with a terminal backslash, like asdf\.

This revision now requires changes to proceed.Jan 22 2015, 8:50 PM
src/utils/utils.php
1144

Oh! Does [a-b] have the same behavior in globs as it does in regexps? Ignore me, then.

Yeah, so this function was basically lifted out of Symfony... I didn't try to understand it too much. Let me play around a bit.

src/utils/__tests__/PhutilUtilsTestCase.php
619

Are you suggesting that we make assertions on the actual regex string? Or just output it in the failure message?

src/utils/utils.php
1133–1134

I suppose we could do preg_replace('@((?<!\\)\*){2}@', '{,*/,**/}', $glob) instead?

1143

Seems legit

src/utils/__tests__/PhutilUtilsTestCase.php
619

I'd make assertions, e.g. have the "expect" data be:

array(
  '/expected regexp/',
  array('matching', 'cases'),
  array('nonmatching', 'cases'),
)

This is less pure and I definitely don't feel strongly about it, but I think it would make the behavior of the function easier to understand by looking at the tests.

Generally, I just like having as much visibility into the behavior as I can reasonably get for functions like this, and it seems straightforward and reasonable to test the regexps themselves, even though they're technically an implementation detail and can theoretically vary without making the function implementation incorrect.

src/utils/__tests__/PhutilUtilsTestCase.php
619

Yuck! But okay, I can see where you are coming from.

What are your thoughts on changing this function t be phutil_fnmatch()?

I like that it returns the regexp (but my preference for that is because I like writing those yucky raw-regexp sorts of tests). So my mild preference would be to leave this as is, and implement phutil_fnmatch() in terms of it (just call this, then apply the regexp)? But I really don't feel strongly about this and am happy with just an fnmatch() version if you don't want to test the regexps.

joshuaspence edited edge metadata.

Change to phutil_fnmatch

joshuaspence retitled this revision from Add a `phutil_glob_to_regex` function to Add a `phutil_fnmatch` function.Jan 22 2015, 9:24 PM
joshuaspence edited edge metadata.

Use preg_replace instead of str_replace

Attempting to consolidate some of this madness

Use (...) instead of #...# for regex

epriestley edited edge metadata.
epriestley added inline comments.
src/utils/utils.php
1164

Consider the explicit {$char} form for codebase consistency.

This revision is now accepted and ready to land.Jan 28 2015, 7:27 PM
joshuaspence edited edge metadata.

Change $char to {$char} in string

This revision was automatically updated to reflect the committed changes.