Page MenuHomePhabricator

Add an `.editorconfig` parser
ClosedPublic

Authored by joshuaspence on Jun 23 2014, 2:23 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Nov 28, 4:19 PM
Unknown Object (File)
Thu, Nov 24, 2:40 AM
Unknown Object (File)
Thu, Nov 24, 2:39 AM
Unknown Object (File)
Thu, Nov 24, 2:39 AM
Unknown Object (File)
Thu, Nov 24, 2:39 AM
Unknown Object (File)
Thu, Nov 24, 2:39 AM
Unknown Object (File)
Thu, Nov 24, 2:39 AM
Unknown Object (File)
Thu, Nov 24, 2:39 AM
Tokens
"Love" token, awarded by johnny-bit.

Details

Summary

Ref T5105. Add a PhutilEditorConfig class for parsing EditorConfig files. Depends on D11443 and D11444.

Test Plan

Wrote some unit tests.

Diff Detail

Repository
rPHU libphutil
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

joshuaspence retitled this revision from to Add an `.editorconfig` parser..
joshuaspence updated this object.
joshuaspence edited the test plan for this revision. (Show Details)
joshuaspence added a reviewer: epriestley.

This isn't quite ready to be landed, but some feedback would be appreciated.

The idea is that PhutilEditorConfig provides a simple API around a hierarchy of .editorconfig files. An example of the intended use of this class is as follows:

$editorconfig = PhutilEditorConfig($root);
$editorconfig->getConfig($path, 'trim_trailing_whitespace');
src/parser/PhutilEditorConfig.php
41–47

I did this a while ago so I'm not 100% sure where it comes from but I am pretty sure this logic was copied from editorconfig-core-js.

52–53

I'd also like to check here that the keys are valid, although this could get messy.

src/utils/utils.php
1075–1083 ↗(On Diff #23226)

Ugly conditional to avoid writing a TempFile unnecessarily.

1117 ↗(On Diff #23226)

This is messy, but I think ultimately we need something like this in order to handle the .editorconfig format properly. EditorConfig uses globs (see fnmatch.js) whereas PHP seems to work better with regular expressions.

It might be possible to use fnmatch instead, which is probably cleaner.

  • Is src/parser/__tests__/editorconfig/.editorconfig empty? I think I have a bug in character set detection (unable to detect that an empty file is UTF8?).
  • .ini parsing and glob-to-regex seem like reasonable approaches to me.
  • The intended API seems reasonable/desirable/correct.

One big-picture issue outside of this diff is that it would be nice to be able to do this for repositories in Diffusion, too, so we could (for example) show the correct tab width when rendering files in the web UI. I suspect there's little we can do in this diff to prepare for that, though.

src/parser/PhutilEditorConfig.php
20–23

One vague concern with this is that issuing find against a large working copy can be slow. Particularly, Facebook has very large working copies (many GB) with huge numbers of files, and has spent a great deal of engineering effort attempting to mitigate the costs of this.

One cheap way to avoid paying this cost (at the cost of accuracy) would be to check for $root/.editorconfig before searching, under the theory that any project which uses these files will have a default at the root. I'm not sure if this is a reasonable assumption or not.

We could also fill the cache lazily (maybe falling back to a full fill at, say, 2048 entries, under the assumption that if we're doing anything with more than 2048 paths we're probably doing something with the whole repository) but this is significantly complex.

We could also let installs enumerate the locations of their .editorconfig files elsewhere in configuration, I guess.

(I don't think any of this is worth pursuing for now.)

Once this works, though, it would be nice to capture tab width information when sending source to Phabricator, which implies we might start running it all the time.

52–53

Yeah, definitely feels messy.

81–82

This is O(N^2) if you run, e.g., arc lint --everything, I think, since we have to iterate over each path in the repository for each path we're examining.

It's probably worth the complexity cost to build a tree instead, by splitting the paths apart into components:

src
  applications
    differential => array(...)
    diffusion => array(...)

Then split the path being looked up apart, and walk down the tree.

src/utils/utils.php
1086 ↗(On Diff #23226)

(Would be nice to have the path.)

1167 ↗(On Diff #23226)

This should probably use \z (end of input) not $ (end of input, or before-newline-at-end-of-input).

That is, /^abc$/ matches both "abc" and "abc\n".

>>> orbital ~/devtools/arcanist $ php -r 'var_dump(preg_match("/^abc$/", "abc\n"));'
int(1)
>>> orbital ~/devtools/arcanist $ php -r 'var_dump(preg_match("/^abc\z/", "abc\n"));'
int(0)
src/parser/PhutilEditorConfig.php
20–23

Yeah, I don't like the idea of only looking for $root/.editorconfig.

52–53

I think that (possibly) a better approach would be to extract values that we understand from the .editorconfig and discard anything else. This means that we can safely support a subset of EditorConfig.

81–82

Yeah, this seems better.

src/utils/utils.php
1086 ↗(On Diff #23226)

Agreed.... I guess this function is probably overkill. I can't think of a case in which we would be parsing an INI string over an INI file (and parse_ini_file is available in all PHP versions that we support whereas parse_ini_string is only available from PHP 5.3).

Maybe I'll just change this to phutil_parse_ini_file which can behave similarly to parse_ini_file but with better exceptions?

1167 ↗(On Diff #23226)

You're probably right... I guess the assumption was that a glob wouldn't contain newlines.

In D9678#9, @epriestley wrote:
  • Is src/parser/__tests__/editorconfig/.editorconfig empty? I think I have a bug in character set detection (unable to detect that an empty file is UTF8?).

Yeah, these files are empty.

src/utils/utils.php
1086 ↗(On Diff #23226)

Oh, right. I think making it file oriented, or adding a $what_are_we_parsing parameter, or throwing some parser-specific exception are all reasonable. Or leave it as-is for now. Although I don't have any specific designs for it, it is nice to have a string-oriented API.

joshuaspence retitled this revision from Add an `.editorconfig` parser. to Add an `.editorconfig` parser.Oct 12 2014, 10:04 PM
joshuaspence edited edge metadata.

Remove cacheing stuff. From some brief testing, it didn't seem to make much difference.

epriestley edited edge metadata.

One edge-case inline.

src/parser/PhutilEditorConfig.php
85–93

I'd expect this not to work -- I think switch in PHP just does ==, so the case 0: is unreachable.

$ cat test.php 
<?php

foreach (array(0, false) as $test) {
  switch ($test) {
   case 0: echo "Zero.\n"; break;
   case false: echo "False.\n"; break;
  }
}
$ php -f test.php 
Zero.
Zero.
This revision is now accepted and ready to land.Jan 22 2015, 9:08 PM
joshuaspence edited edge metadata.
  • Rebase
  • Use if instead of switch

A couple of thoughts that I am considering:

  • Should this class be named PhutilEditorConfig or PhutilEditorConfigParser? It feels a little bit more than a parser since it has some sort of state-based logic as well.
  • How should unknown properties be handled? For example, if charset is misspelled as charste should an exception be thrown or should it just be silently ignored? If this were truly a parser then I feel that the correct approach would be to throw an exception. In our use case, however, I feel that "silently ignore" is a better option.
  • How should illegal values be handled? As above, but for illegal values. For example, what to do if end_of_line = derp is specified. I think that in this case, throwing an exception is possibly more appropriate.

More tests, general tidying and minor refactoring

joshuaspence edited edge metadata.

Just wanted another look at this as I refactored a bit. I'm happy with it now.

epriestley edited edge metadata.
This revision is now accepted and ready to land.May 17 2015, 2:41 PM
This revision was automatically updated to reflect the committed changes.