Ref T5105. Add a PhutilEditorConfig class for parsing EditorConfig files. Depends on D11443 and D11444.
Details
- Reviewers
epriestley - Group Reviewers
Blessed Reviewers - Maniphest Tasks
- T5105: Ensure that `ArcanistTextLinter` respects `.editorconfig` settings
- Commits
- rPHU797a903e2a42: Add an `.editorconfig` parser
Wrote some unit tests.
Diff Detail
- Repository
- rPHU libphutil
- Branch
- master
- Lint
Lint Passed Severity Location Code Message Advice src/parser/PhutilEditorConfig.php:103 XHP16 TODO Comment - Unit
Tests Passed - Build Status
Buildable 4136 Build 4149: [Placeholder Plan] Wait for 30 Seconds
Event Timeline
This isn't quite ready to be landed, but some feedback would be appreciated.
The idea is that PhutilEditorConfig provides a simple API around a hierarchy of .editorconfig files. An example of the intended use of this class is as follows:
$editorconfig = PhutilEditorConfig($root); $editorconfig->getConfig($path, 'trim_trailing_whitespace');
src/parser/PhutilEditorConfig.php | ||
---|---|---|
41–47 | I did this a while ago so I'm not 100% sure where it comes from but I am pretty sure this logic was copied from editorconfig-core-js. | |
52–53 | I'd also like to check here that the keys are valid, although this could get messy. | |
src/utils/utils.php | ||
1075–1083 ↗ | (On Diff #23226) | Ugly conditional to avoid writing a TempFile unnecessarily. |
1117 ↗ | (On Diff #23226) | This is messy, but I think ultimately we need something like this in order to handle the .editorconfig format properly. EditorConfig uses globs (see fnmatch.js) whereas PHP seems to work better with regular expressions. It might be possible to use fnmatch instead, which is probably cleaner. |
- Is src/parser/__tests__/editorconfig/.editorconfig empty? I think I have a bug in character set detection (unable to detect that an empty file is UTF8?).
- .ini parsing and glob-to-regex seem like reasonable approaches to me.
- The intended API seems reasonable/desirable/correct.
One big-picture issue outside of this diff is that it would be nice to be able to do this for repositories in Diffusion, too, so we could (for example) show the correct tab width when rendering files in the web UI. I suspect there's little we can do in this diff to prepare for that, though.
src/parser/PhutilEditorConfig.php | ||
---|---|---|
20–23 | One vague concern with this is that issuing find against a large working copy can be slow. Particularly, Facebook has very large working copies (many GB) with huge numbers of files, and has spent a great deal of engineering effort attempting to mitigate the costs of this. One cheap way to avoid paying this cost (at the cost of accuracy) would be to check for $root/.editorconfig before searching, under the theory that any project which uses these files will have a default at the root. I'm not sure if this is a reasonable assumption or not. We could also fill the cache lazily (maybe falling back to a full fill at, say, 2048 entries, under the assumption that if we're doing anything with more than 2048 paths we're probably doing something with the whole repository) but this is significantly complex. We could also let installs enumerate the locations of their .editorconfig files elsewhere in configuration, I guess. (I don't think any of this is worth pursuing for now.) Once this works, though, it would be nice to capture tab width information when sending source to Phabricator, which implies we might start running it all the time. | |
52–53 | Yeah, definitely feels messy. | |
81–82 | This is O(N^2) if you run, e.g., arc lint --everything, I think, since we have to iterate over each path in the repository for each path we're examining. It's probably worth the complexity cost to build a tree instead, by splitting the paths apart into components: src applications differential => array(...) diffusion => array(...) Then split the path being looked up apart, and walk down the tree. | |
src/utils/utils.php | ||
1086 ↗ | (On Diff #23226) | (Would be nice to have the path.) |
1167 ↗ | (On Diff #23226) | This should probably use \z (end of input) not $ (end of input, or before-newline-at-end-of-input). That is, /^abc$/ matches both "abc" and "abc\n". >>> orbital ~/devtools/arcanist $ php -r 'var_dump(preg_match("/^abc$/", "abc\n"));' int(1) >>> orbital ~/devtools/arcanist $ php -r 'var_dump(preg_match("/^abc\z/", "abc\n"));' int(0) |
src/parser/PhutilEditorConfig.php | ||
---|---|---|
20–23 | Yeah, I don't like the idea of only looking for $root/.editorconfig. | |
52–53 | I think that (possibly) a better approach would be to extract values that we understand from the .editorconfig and discard anything else. This means that we can safely support a subset of EditorConfig. | |
81–82 | Yeah, this seems better. | |
src/utils/utils.php | ||
1086 ↗ | (On Diff #23226) | Agreed.... I guess this function is probably overkill. I can't think of a case in which we would be parsing an INI string over an INI file (and parse_ini_file is available in all PHP versions that we support whereas parse_ini_string is only available from PHP 5.3). Maybe I'll just change this to phutil_parse_ini_file which can behave similarly to parse_ini_file but with better exceptions? |
1167 ↗ | (On Diff #23226) | You're probably right... I guess the assumption was that a glob wouldn't contain newlines. |
src/utils/utils.php | ||
---|---|---|
1086 ↗ | (On Diff #23226) | Oh, right. I think making it file oriented, or adding a $what_are_we_parsing parameter, or throwing some parser-specific exception are all reasonable. Or leave it as-is for now. Although I don't have any specific designs for it, it is nice to have a string-oriented API. |
Remove cacheing stuff. From some brief testing, it didn't seem to make much difference.
One edge-case inline.
src/parser/PhutilEditorConfig.php | ||
---|---|---|
85–93 | I'd expect this not to work -- I think switch in PHP just does ==, so the case 0: is unreachable. $ cat test.php <?php foreach (array(0, false) as $test) { switch ($test) { case 0: echo "Zero.\n"; break; case false: echo "False.\n"; break; } } $ php -f test.php Zero. Zero. |
A couple of thoughts that I am considering:
- Should this class be named PhutilEditorConfig or PhutilEditorConfigParser? It feels a little bit more than a parser since it has some sort of state-based logic as well.
- How should unknown properties be handled? For example, if charset is misspelled as charste should an exception be thrown or should it just be silently ignored? If this were truly a parser then I feel that the correct approach would be to throw an exception. In our use case, however, I feel that "silently ignore" is a better option.
- How should illegal values be handled? As above, but for illegal values. For example, what to do if end_of_line = derp is specified. I think that in this case, throwing an exception is possibly more appropriate.
Use "property" instead of "config" for consistency with http://editorconfig.org/#file-format-details