Page MenuHomePhabricator

Arcanist filename linter's regex should be configurable
Closed, WontfixPublic

Description

There is a linter for file names. This is good.

It tests 'Name files using only letters, numbers, period, hyphen and underscore.' using a regex of ^[a-zA-Z0-9./\\\\_-]+$.

This requirement is too strict for certain development environments. It would be useful if we could set a custom regex, via configuration options, to test filenames against.

Example:
In iOS / OS X development as we use:

  • OurFancyLogo@2x.png // Double resolution version for high DPI devices. See also @3x triple resolution files.
  • JetpackInterface~iPad.xib // iPad specific UI. See also ~iPhone.

Note that the @ and the ~ above are not rare exceptions, they are standard in iOS / OS X development.

I have a small patch prepared that adds the described functionality. Please let me know if I can send it in.

Event Timeline

iangordon updated the task description. (Show Details)
iangordon added a project: Arcanist.
iangordon added a subscriber: iangordon.
joshuaspence renamed this task from Arcanist filename linter's regex should be configurable. to Arcanist filename linter's regex should be configurable.Dec 1 2015, 7:20 AM
joshuaspence updated the task description. (Show Details)

To configure the regexp in an .arclint option for the linter seems valuable, with a clear use case.

I'm concerned that it's difficult to give users a descriptive error message if the pattern is an arbitrary regular expression. What does your patch do when a filename fails a custom regular expression? Specifically, how does it help the user fix their filename?

You also probably have the implicit rules "@ is allowed, but only if followed by a numeral and 'x' in a resource file", and "~ is allowed, but only if followed by a device identifier on certain files". Maybe you don't care about these too much, but in the general case they might be important to enforce. While you could encode these in a regexp, the best error message you could show to the user would be something like:

File names must match regular expression: /^[a-zA-Z0-9./\\\\_-]+((@[1-9]\d*x\.png)|(~(iPad|iPhone)\.xib))?$

This is totally incomprehensible and not helpful in resolving the error.

I think a better pathway forward here is to make third-party linters much easier to write and distrubte, broadly via changes connected to T5055. Then the builtin linter would serve as a template for writing your own linter with several rules, which could emit multiple, tailored error messages:

The "~" character should only be used in filenames to distinguish device resource files. It is only valid on ".xib" files, and should be followed by "iPhone" or "iPad", like "example~iPhone.xib".

Overall, this should be one of the very simplest linters to write. It isn't particularly easy to write today, but I think the root problem here is that writing linters is hard, not that the builtin linters aren't powerful enough.

I wouldn't expect users to attach filetype specific filtering via the regex. The include portion of the linter config seems like a much better place to do such things.

epriestly is absolutely correct about the error message. However, in our case it ends up being the ever-so-slightly more comprehensible:
File names must match the following regular expression ^[a-zA-z0-9./\\\\_-@~]$
The existing error message is maintained if you do not explicitly configure your own custom regex.

I considered making the error message configurable as well, but that would open up the possibility of the error message and regex getting out of sync.

I think the vast majority of the filename linter users want filenames that only use ascii upper and lowercase alphabet, numbers and common symbols. This allows users to expand common symbols.

T5055 is a great goal, but well beyond the scope of this change.

Attached is the patch. It changes less than 50 lines.
CLA has been signed.

epriestley claimed this task.

I don't think telling users that their filenames must match a complicated regexp and expecting them to figure it out is a reasonable behavior for a linter to have. I think I'm fairly comfortable with regular expressions and not confident I could figure out what the message meant quickly, and I know at least some engineers just don't have regexp literacy as part of their toolset.

See also T9710, where a user had difficulty with the current message because it doesn't highlight the specific character causing a problem. Although I'd like to see more users running into issues with this before specifically changing the upstream behavior, we can't start highlighting characters if we accept an arbitrary regexp. (We could test each prefix of the filename until we found the first character which created a problem, but since the an arbitrary regexp may include end-of-filename tests like my pattern above, this may produce results which are very misleading.)

In theory, we could let you provide multiple different patterns with messages for each of them and some kind of tailoring flag to tell us about whether the regexp can safely match prefixes, but I broadly want to get out of the business of writing increasingly complicated JSON DSLs for configuring linters.

If we had T5055 I wouldn't accept this patch, so I don't want to accept it now and be left with it after we do have T5055.

See Contributing Code for some general discussion of why we are primarily concerned about the maintenance cost of the patch in the long term. Quick, small patches are often less likely to make it upstream because they don't account for the long-term plans. This patch would lock us into bad error messages and limit our ability to change the behavior of the linter, as well as increasing the complexity of testing and maintaining the linter.

The patch itself is also technically wrong because it does not use pht() in a way that supports extraction, so it can't be pulled out for translation. See:

https://secure.phabricator.com/book/phabcontrib/article/internationalization/

(If you submit with arc diff, the project's linters can catch this error, among others, and warn you about it.)

(I think you also have made two small errors in the expression above, in case you're trying to use it elsewhere -- z should be Z, and _-@, where - is not the last character in the character class, is an invalid character range of all characters between _ and @, not the literal characters _, - and @ as intended. The original regexp puts - in the last character so it means "hyphen, literally" instead of "character range".)

For what it's worth, I think that this is a reasonable request. I think that the only way to implement this would be to use a custom message which would be specified in the .arclint file. For example:

.arclint
{
  "linters": {
    "file": {
      "type": "file",
      "file.regex": "(^[^\w]+$)",
      "file.message": "FIlename must consist only of punctuation characters, alphanumeric characters are forbidden."
    }
  }
}

For what it's worth, I think that this is a reasonable request. I think that the only way to implement this would be to use a custom message which would be specified in the .arclint file. For example:

.arclint
{
  "linters": {
    "file": {
      "type": "file",
      "file.regex": "(^[^\w]+$)",
      "file.message": "FIlename must consist only of punctuation characters, alphanumeric characters are forbidden."
    }
  }
}

Hmmm... Now, for some more magick:

.arclint
{
  "linters": {
    "file-normal": {
     "type": "file",
     "exclude" : [
       "(\\.xib$)",
       "(x\\.png$)"
      ] 
      "file.regex": "(^[^\w]+$)",
      "file.message": "FIlename must consist only of punctuation characters, alphanumeric characters are forbidden."
    },
    "file-xib": {
     "type": "file",
     "include" : [
       "(\\.xib$)"
      ] 
      "file.regex": "(^[^\w]+(~(iPad|iPhone))?\.xib$)",
      "file.message": "FIlename for xib files must consist only of punctuation characters, alphanumeric characters are forbidden. '~' character can only be used if followed by device name (iPad or iPhone)"
    },
    "file-resourcepng": {
     "type": "file",
     "include" : [
       "(x\\.png$)"
      ] 
      "file.regex": "(^[^\w]+(@[1-9]\d*x)?\.png$)",
      "file.message": "FIlename for resource files must consist only of punctuation characters, alphanumeric characters are forbidden, unless prefixed by '@' and filename ends with 'x'."
    }
  }
}

This way this request seems VERY reasonable, albeit making configuration extremely verbose ๐Ÿ˜‰

Seriously - with this way of configuring filename linter, we could enforce file naming schemes! This would be awesome ๐Ÿ‘ What do You think @epriestley ?

I just think this isn't very good:

"file.regex": "/^[a-zA-Z0-9./\\\\_-]+((@[1-9]\d*x\.(png|jpg))|(~(iPad|iPhone)\.xib))?$",
"file.message': "Filename must consist only of letters, numbers periods, hyphens, and underscores. The '@' and '~' characters are allowed in special cases: if a file is an image file (with extension '.png' or '.jpg') it may end in '@2x.png', where "2x" may be any multiplier followed by an "x"; if a file is a UI resource file (with extension '.xib') it may end in "~iPad.xib", where "iPad" may be any recognized device name (exhaustively: 'iPad', 'iPhone')."

And even after going through all that, we still can't point at the problem character in the error message.

If we were writing this linter in the upstream, it would have several different checks which are each tailored to point at a specific problem, with a smaller amount of more helpful language.

It should be easy enough to write, maintain and distribute linters that we don't need to compromise on linter quality. It isn't right now, but I don't want to gum up the upstream linters with changes that we wouldn't make if packaging was in a place we were happy with.

This comment was removed by johnny-bit.

Custom linters for stuff like that are way to go, that's true: for example some projects may have very complicated naming scheme, where "files in views must be all lowercase with .template extension, placed into dirs corresponding to name of controller, every view should either correspond to action in said controller or be named "all" or stard with underscore [...]" ๐Ÿ˜‰

As for managing and distribution of linters - while there isn't yet a packaging solution present, I find composer + private satis repo very adequate for everything we (as in me, company I work for etc) need, despite composer having obvious problems...