Changeset View
Standalone View
src/parser/PhutilEditorConfig.php
- This file was added.
<?php | |||||
/** | |||||
* Parser for [[http://editorconfig.org/ | EditorConfig]] files. | |||||
*/ | |||||
final class PhutilEditorConfig { | |||||
private static $knownProperties = array( | |||||
'charset' => array('latin1', 'utf-8', 'utf-8-bom', 'utf-16be', 'utf-16le'), | |||||
'end_of_line' => array('lf', 'cr', 'crlf'), | |||||
'indent_size' => 'int|string', // integer or "tab" | |||||
'indent_style' => array('space', 'tab'), | |||||
'insert_final_newline' => 'bool', | |||||
'max_line_length' => 'int', | |||||
'tab_width' => 'int', | |||||
'trim_trailing_whitespace' => 'bool', | |||||
); | |||||
private $root; | |||||
/** | |||||
* Constructor. | |||||
* | |||||
epriestley: One vague concern with this is that issuing `find` against a large working copy can be slow. | |||||
Not Done Inline ActionsYeah, I don't like the idea of only looking for $root/.editorconfig. joshuaspence: Yeah, I don't like the idea of only looking for `$root/.editorconfig`. | |||||
* @param string The root directory. | |||||
*/ | |||||
public function __construct($root) { | |||||
$this->root = $root; | |||||
} | |||||
/** | |||||
* Get the specified EditorConfig property for the specified path. | |||||
* | |||||
* @param string | |||||
* @param string | |||||
* @return wild | |||||
*/ | |||||
public function getProperty($path, $key) { | |||||
if (idx(self::$knownProperties, $key) === null) { | |||||
throw new InvalidArgumentException(pht('Invalid EditorConfig property.')); | |||||
} | |||||
$props = $this->getProperties($path); | |||||
switch ($key) { | |||||
case 'indent_size': | |||||
if (idx($props, 'indent_size') === null && | |||||
idx($props, 'indent_style') === 'tab') { | |||||
Not Done Inline ActionsI did this a while ago so I'm not 100% sure where it comes from but I am pretty sure this logic was copied from editorconfig-core-js. joshuaspence: I did this a while ago so I'm not 100% sure where it comes from but I am pretty sure this logic… | |||||
return 'tab'; | |||||
} else if (idx($props, 'indent_size') == 'tab' && | |||||
idx($props, 'tab_size') !== null) { | |||||
return idx($props, 'tab_size'); | |||||
} | |||||
break; | |||||
Not Done Inline ActionsI'd also like to check here that the keys are valid, although this could get messy. joshuaspence: I'd also like to check here that the keys are valid, although this could get messy. | |||||
Not Done Inline ActionsYeah, definitely feels messy. epriestley: Yeah, definitely feels messy. | |||||
Not Done Inline ActionsI think that (possibly) a better approach would be to extract values that we understand from the .editorconfig and discard anything else. This means that we can safely support a subset of EditorConfig. joshuaspence: I think that (possibly) a better approach would be to extract values that we understand from… | |||||
case 'tab_width': | |||||
if (idx($props, 'indent_size') !== null && | |||||
idx($props, 'tab_width') === null && | |||||
idx($props, 'indent_size') !== 'tab') { | |||||
return idx($props, 'indent_size'); | |||||
} | |||||
break; | |||||
} | |||||
return idx($props, $key); | |||||
} | |||||
/** | |||||
* Get the EditorConfig properties for the specified path. | |||||
* | |||||
* @param string | |||||
* @return map<string, wild> | |||||
*/ | |||||
public function getProperties($path) { | |||||
$configs = $this->getEditorConfigs($path); | |||||
$matches = array(); | |||||
foreach ($configs as $config_pair) { | |||||
list($path_prefix, $config) = $config_pair; | |||||
foreach ($config as $glob => $options) { | |||||
if (!$glob) { | |||||
continue; | |||||
Not Done Inline ActionsThis is O(N^2) if you run, e.g., arc lint --everything, I think, since we have to iterate over each path in the repository for each path we're examining. It's probably worth the complexity cost to build a tree instead, by splitting the paths apart into components: src applications differential => array(...) diffusion => array(...) Then split the path being looked up apart, and walk down the tree. epriestley: This is `O(N^2)` if you run, e.g., `arc lint --everything`, I think, since we have to iterate… | |||||
Not Done Inline ActionsYeah, this seems better. joshuaspence: Yeah, this seems better. | |||||
} | |||||
if (strpos($glob, '/') === false) { | |||||
$glob = '**/'.$glob; | |||||
} else if (strncmp($glob, '/', 0)) { | |||||
$glob = substr($glob, 1); | |||||
} | |||||
$glob = $path_prefix.'/'.$glob; | |||||
if (!phutil_fnmatch($glob, $path)) { | |||||
continue; | |||||
Not Done Inline ActionsI'd expect this not to work -- I think switch in PHP just does ==, so the case 0: is unreachable. $ cat test.php <?php foreach (array(0, false) as $test) { switch ($test) { case 0: echo "Zero.\n"; break; case false: echo "False.\n"; break; } } $ php -f test.php Zero. Zero. epriestley: I'd expect this not to work -- I think `switch` in PHP just does `==`, so the `case 0:` is… | |||||
} | |||||
foreach ($options as $option => $value) { | |||||
$option = strtolower($option); | |||||
if (idx(self::$knownProperties, $option) === null) { | |||||
continue; | |||||
} | |||||
// TODO: Maybe we should validate the data type of `$value` here. | |||||
if (is_string($value)) { | |||||
$value = strtolower($value); | |||||
} | |||||
$matches[$option] = $value; | |||||
} | |||||
} | |||||
} | |||||
return $matches; | |||||
} | |||||
/** | |||||
* Returns the EditorConfig files which affect the specified path. | |||||
* | |||||
* Find and parse all `.editorconfig` files between the specified path and | |||||
* the root directory. The results are returned in the same order that they | |||||
* should be matched. | |||||
* | |||||
* return list<pair<string, map>> | |||||
*/ | |||||
private function getEditorConfigs($path) { | |||||
$configs = array(); | |||||
$found_root = false; | |||||
$root = $this->root; | |||||
do { | |||||
$path = dirname($path); | |||||
$file = $path.'/.editorconfig'; | |||||
if (!Filesystem::pathExists($file)) { | |||||
continue; | |||||
} | |||||
$contents = Filesystem::readFile($file); | |||||
$config = phutil_ini_decode($contents); | |||||
if (idx($config, 'root')) { | |||||
$found_root = true; | |||||
} | |||||
unset($config['root']); | |||||
array_unshift($configs, array($path, $config)); | |||||
if ($found_root) { | |||||
break; | |||||
} | |||||
} while ($path != $root && Filesystem::isDescendant($path, $root)); | |||||
return $configs; | |||||
} | |||||
} |
One vague concern with this is that issuing find against a large working copy can be slow. Particularly, Facebook has very large working copies (many GB) with huge numbers of files, and has spent a great deal of engineering effort attempting to mitigate the costs of this.
One cheap way to avoid paying this cost (at the cost of accuracy) would be to check for $root/.editorconfig before searching, under the theory that any project which uses these files will have a default at the root. I'm not sure if this is a reasonable assumption or not.
We could also fill the cache lazily (maybe falling back to a full fill at, say, 2048 entries, under the assumption that if we're doing anything with more than 2048 paths we're probably doing something with the whole repository) but this is significantly complex.
We could also let installs enumerate the locations of their .editorconfig files elsewhere in configuration, I guess.
(I don't think any of this is worth pursuing for now.)
Once this works, though, it would be nice to capture tab width information when sending source to Phabricator, which implies we might start running it all the time.