Page MenuHomePhabricator

Improve warning/error severities in default linters and explanations in help text
Open, LowPublic

Description

Dan from Disqus came into IRC today asking for help to configure Arcanist to only run the linter on new changes. The context is the existing codebase has many lint errors, so it would be best if the linter only ran against lint errors the change in question introduces.

See chatlog here -- https://secure.phabricator.com/chatlog/channel/6/?at=108388

I jumped in the code and the --only-new flag is pretty complicated. I came away thinking it would voraciously lint any file with changes... Overall, I was not too helpful in getting resolution here.

Halp?

Event Timeline

btrahan assigned this task to epriestley.
btrahan raised the priority of this task from to Needs Triage.
btrahan updated the task description. (Show Details)
btrahan added a project: Arcanist.
btrahan added subscribers: btrahan, dan-disqus.

Just some links to things that might be helpful:
.arcconfig: https://gist.github.com/dan-disqus/bd3de21c6e84bbbc6a62
Our custom arc stuff: https://github.com/disqus/disqus-arcanist/tree/master/src. It doesn't modify ArcanistLintWorkflow, so probably not the cause of the issues.

Essentially, this problem came about a couple of months ago, when we replaced PyFlakes + Pep8 with flake8. I've noticed that pep8 on its own does what we want (only lints new changes), even without the --only-new flag. From my various tests, I couldn't find a real difference between running arc lint, arc lint --only-changes and arc lint --only-new 1.

Can you show me some examples of messages you don't want to appear when you run arc lint <something something> and walk me though why not?

These options are probably not actually what you want.

The idea is that we have some files that already have a bunch of lint errors. We've then modified the files, and ideally we only want 'new' errors to show up. My reading of --only-new made me think that it'd do exactly that. Essentially, run lint twice - once on a previous revision (for some definition of previous that we might have to specify), and once on the current head (including uncommited files). It'd then compare the two sets of errors, and only print ones that weren't there before.

So for example, I add

import sys

to the top of my file, but it's unused. So when I run arc lint --only-new 1, I expect to see only:

Error  (F401) flake8 1
 'sys' imported but unused

            3 import datetime
            4 from collections import defaultdict
            5 from itertools import combinations, chain
 >>>        6 import sys
            7 
            8 
            9 import lxml.objectify

But I actually get a bunch of other lint that's unrelated to my change.

This may be overcomplicated. I'd also be happy if it just linted the modified lines in my file (and with a bit of tweaking, checked the imports too).

Okay. I think the root issue is that this should be a warning, not an error. The default behavior of warnings and error is this:

  • Errors are expected to be severe (e.g., syntax error) and are always reported, under the theory that they should never exist in the codebase.
  • Warnings are not expected to be severe (e.g., unnecessary import, unused variable, incorrect spacing/formatting) and are reported only on modified lines by default (when run as arc lint), or all lines by default (when run as arc lint <file>). The --lintall and --only-changed flags modify these behaviors and always show warnings when linting a commit range (arc lint --lintall) or show only changed lines when linting a file (arc lint --only-changed <file>).

Particularly, the language on --only-changed and --lintall which discusses "lint warnings" means warnings (those messages with severity level "warning") as distinct from messages (all messages emitted by lint at any severity level).

(I would still expect --only-new to hide all old messages; it may just be buggy. This feature is probably not what you actually want, and is basically specific to garbage messes of unfixable code.)

If messages like "module imported but unused" are raised as warnings instead of errors, you'll get this behavior out of arc lint by default:

  • If you modified the line (e.g., changed import sys to import sys, other), you'll get a warning. This is generally desirable.
  • If you didn't modify the line, you won't.

So the upshot here is that Arcanist has generally sensible definitions of what error severities mean, which are consistent with real-world usage.

Unfortunately, not all the linters we support have severities, and when they do have severities they don't necessarily jive with the Arcanist severities. So we're stuck between doing things that don't make much sense, things which are fragile / not future proof, and things which are counterintuitive.

In this case, we raise this as an error by default because Flake8 does not distinguish between error severities for PyFlakes errors (in turn, this is because PyFlakes does not distinguish between error severities). In contrast, in the PyFlakes linter, we have a hard-coded regexp which raises only these messages as errors:

/(^undefined|^duplicate|before assignment$)/

These are both bad in different ways (the Flake8 linter is too severe; the PyFlakes linter is unintuitive and not future-proof), and having two different approaches is also bad. There isn't a clear way forward here in general, although I think we'll probably move toward doing the "right" thing by default (hard-coded regexps) over doing the intuitive or consistent thing (trust the underlying linter / raise severities consistently).

Regardless, the short-term fix is to adjust the severity of these errors. The easiest way to do this (assuming you're running only Flake8 and not otherwise customizing it) is to destroy any linter customizations (including lint.engine in .arcconfig) and create a .arclint file with this in it:

{
  "linters" : {
    "flake8" : {
      "type" : "flake8",
      "include" : "(\\.py$)",
      "severity" : {
        "F401" : "warning"
      }
    }
  }
}

Then add all the lint codes that Flake8 produces which are too severe to the "severity" map to raise them as warnings instead.

You can also use severity.rules to adjust all rules matching an expression in one go, see https://github.com/epriestley/arclint-examples/blob/master/.arclint

This stuff is new and not widely deployed (see T2039) but should get more traction somewhat soon. When it does, it will get real documentation and I'll probably adjust the default severities by adding some regexp junk. So you could also wait for that and this will fix itself.

Thanks for all the info, I have a much better idea of how/why things work now. I've implemented this using a .arclint with severity.rules and it works perfectly with little tweaking.

We're using jshint as well. I've set this up to run using .arclint, but it relies on the .arcconfig values lint.jshint.bin and lint.jshint.prefix, which I am unable to specify at the moment. I dug around in the source a bit, and at the moment it seems these values are retrieved using getConfigFromAnySource, which doesn't seem to read from the .arclint. Is there any way I can work around this?

That one hasn't been fully ported over to ArcanistExternalLinter yet. Once it is, you'll be able to specify the paths in .arclint and we'll eventually deprecate the config versions. For the moment, leaving those values in .arcconfig should work OK, I think.

(All of this could generally be far clearer and more consistent than it is. There's some work on this stuff coming up soon which should smooth things out a bit.)

Oh sorry, my mistake on having those values in the .arcconfig. It works fine, there was just an unrelated problem on my end.

Thanks so much for all your help, things are working great now!

epriestley renamed this task from arcanist: --only-new and --changed-only confusing and not working to Improve warning/error severities in default linters and explanations in help text.Jan 15 2014, 8:05 PM
epriestley triaged this task as Low priority.