diff --git a/src/docs/user/userguide/arcanist_lint.diviner b/src/docs/user/userguide/arcanist_lint.diviner index 0e1b9cd20a..95db89d30c 100644 --- a/src/docs/user/userguide/arcanist_lint.diviner +++ b/src/docs/user/userguide/arcanist_lint.diviner @@ -1,417 +1,419 @@ @title Arcanist User Guide: Lint @group userguide Guide to lint, linters, and linter configuration. This is a configuration guide that helps you set up advanced features. If you're just getting started, you don't need to look at this yet. Instead, start with the @{article:Arcanist User Guide}. This guide explains how lint works when configured in an `arc` project. If you haven't set up a project yet, do that first. For instructions, see @{article:Arcanist User Guide: Configuring a New Project}. Overview ======== "Lint" refers to a general class of programming tools which analyze source code and raise warnings and errors about it. For example, a linter might raise warnings about syntax errors, uses of undeclared variables, calls to deprecated functions, spacing and formatting conventions, misuse of scope, implicit fallthrough in switch statements, missing license headers, use of dangerous language features, or a variety of other issues. Integrating lint into your development pipeline has two major benefits: - you can detect and prevent a large class of programming errors; and - you can simplify code review by addressing many mechanical and formatting problems automatically. When arc is integrated with a lint toolkit, it enables the `arc lint` command and runs lint on changes during `arc diff`. The user is prompted to fix errors and warnings before sending their code for review, and lint issues which are not fixed are visible during review. There are many lint and static analysis tools available for a wide variety of languages. Arcanist ships with bindings for many popular tools, and you can write new bindings fairly easily if you have custom tools. Available Linters ================= To see a list of available linters, run: $ arc linters Arcanist ships with bindings for a number of linters that can check for errors or problems in JS, CSS, PHP, Python, C, C++, C#, Less, Puppet, Ruby, JSON, XML, and several other languages. Some general purpose linters are also available. These linters can check for cross-language issues like sensible filenames, trailing or mixed whitespace, character sets, spelling mistakes, and unresolved merge conflicts. If you have a tool you'd like to use as a linter that isn't supported by default, you can write bindings for it. For information on writing new linter bindings, see @{article:Arcanist User Guide: Customizing Lint, Unit Tests and Workflows}. Configuring Lint ================ To configure lint integration for your project, create a file called `.arclint` at the project root. This file should be in JSON format, and look like this: ```lang=js { "linters": { "sample": { "type": "pep8" } } } ``` Here, the key ("sample") is a human-readable label identifying the linter. It does not affect linter behavior, so just choose something that makes sense to you. The `type` specifies which linter to run. Use `arc linters` to find the names of the available linters. **Including and Excluding Files**: By default, a linter will run on every file. This is appropriate for some linters (like the Filename linter), but normally you only want to run a linter like **pep8** on Python files. To include or exclude files, use `include` and `exclude`: ```lang=js { "linters": { "sample": { "type": "pep8", "include": "(\\.py$)", "exclude": "(^third-party/)" } } } ``` The `include` key is a regular expression (or list of regular expressions) identifying paths the linter should be run on, while `exclude` is a regular expression (or list of regular expressions) identifying paths which it should not run on. Thus, this configures a **pep8** linter named "sample" which will run on files ending in ".py", unless they are inside the "third-party/" directory. In these examples, regular expressions are written in this style: "(example/path)" They can be specified with any delimiters, but using `(` and `)` means you don't have to escape slashes in the expression, so it may be more convenient to specify them like this. If you prefer, these are all equivalent: "(example/path)i" "/example\\/path/i" "@example/path@i" You can also exclude files globally, so no linters run on them at all. Do this by specifying `exclude` at top level: ```lang=js { "exclude": "(^tests/data/)", "linters": { "sample": { "type": "pep8", "include": "(\\.py$)", "exclude": "(^third-party/)" } } } ``` Here, the addition of a global `exclude` rule means no linter will be run on files in "tests/data/". **Running Multiple Linters**: Often, you will want to run several different linters. Perhaps your project has a mixture of Python and Javascript code, or you have some PHP and some JSON files. To run multiple linters, just list them in the `linters` map: ```lang=js { "linters": { "jshint": { "type": "jshint", "include": "(\\.js$)" }, "xml": { "type": "xml", "include": "(\\.xml$)" } } } ``` This will run JSHint on `.js` files, and SimpleXML on `.xml` files. **Adjusting Message Severities**: Arcanist raises lint messages at various severities. Each message has a different severity: for example, lint might find a syntax error and raise an `error` about it, and find trailing whitespace and raise a `warning` about it. Normally, you will be presented with lint messages as you are sending code for review. In that context, the severities behave like this: - `error` When a file contains lint errors, they are always reported. These are intended to be severe problems, like a syntax error. Unresoved lint errors require you to confirm that you want to continue. - `warning` When a file contains warnings, they are reported by default only if they appear on lines that you have changed. They are intended to be minor problems, like unconventional whitespace. Unresolved lint warnings require you to confirm that you want to continue. - `autofix` This level is like `warning`, but if the message includes patches they will be applied automatically without prompting. - `advice` Like warnings, these messages are only reported on changed lines. They are intended to be very minor issues which may merit a note, like a "TODO" comment. They do not require confirmation. - `disabled` This level suppresses messages. They are not displayed. You can use this to turn off a message if you don't care about the issue it detects. By default, Arcanist tries to select reasonable severities for each message. However, you may want to make a message more or less severe, or disable it entirely. For many linters, you can do this by providing a `severity` map: ```lang=js { "linters": { "sample": { "type": "pep8", "severity": { "E221": "disabled", "E401": "warning" } } } } ``` Here, the lint message "E221" (which is "multiple spaces before operator") is disabled, so it won't be shown. The message "E401" (which is "multiple imports on one line") is set to "warning" severity. If you want to remap a large number of messages, you can use `severity.rules` and specify regular expressions: ```lang=js { "linters": { "sample": { "type": "pep8", "severity.rules": { "(^E)": "warning", "(^W)": "advice" } } } } ``` This adjusts the severity of all "E" codes to "warning", and all "W" codes to "advice". **Locating Binaries and Interpreters**: Normally, Arcanist expects to find external linters (like `pep8`) in `$PATH`, and be able to run them without any special qualifiers. That is, it will run a command similar to: $ pep8 example.py If you want to use a different copy of a linter binary, or invoke it in an explicit way, you can use `interpreter` and `bin`. These accept strings (or lists of strings) identifying places to look for linters. For example: ```lang=js { "linters": { "sample": { "type": "pep8", "interpreter": ["python2.6", "python"], "bin": ["/usr/local/bin/pep8-1.5.6", "/usr/local/bin/pep8"] } } } ``` When configured like this, `arc` will walk the `interpreter` list to find an available interpreter, then walk the `bin` list to find an available binary. If it can locate an appropriate interpreter and binary, it will execute those instead of the defaults. For example, this might cause it to execute a command similar to: $ python2.6 /usr/local/bin/pep8-1.5.6 example.py **Additional Options**: Some linters support additional options to configure their behavior. You can run this command get a list of these options and descriptions of what they do and how to configure them: $ arc linters --verbose This will show the available options for each linter in detail. **Running Different Rules on Different Files**: Sometimes, you may want to run the same linter with different rulesets on different files. To do this, create two copies of the linter and just give them different keys in the `linters` map: ```lang=js { "linters": { "pep8-relaxed": { "type": "pep8", "include": "(^legacy/.*\\.py$)", "severity.rules": { "(.*)": "advice" } }, "pep8-normal": { "type": "pep8", "include": "(\\.py$)", "exclude": "(^legacy/)" } } } ``` This example will run a relaxed version of the linter (which raises every message as advice) on Python files in "legacy/", and a normal version everywhere else. **Example .arclint Files**: You can find a collection of example files in `arcanist/resources/arclint/` to use as a starting point or refer to while configuring your own `.arclint` file. Advanced Configuration: Lint Engines ==================================== If you need to specify how linters execute in greater detail than is possible with `.arclint`, you can write a lint engine in PHP to extend Arcanist. This is an uncommon, advanced use case. The remainder of this section overviews how the lint internals work, and discusses how to extend Arcanist with a custom lint engine. If your needs are met by `.arclint`, you can skip to the next section of this document. The lint pipeline has two major components: linters and lint engines. **Linters** are programs which detect problems in a source file. Usually a linter is an external script, which Arcanist runs and passes a path to, like `jshint` or `pep8`. The script emits some messages, and Arcanist parses the output into structured errors. A piece of glue code (like @{class@arcanist:ArcanistJSHintLinter} or @{class@arcanist:ArcanistPEP8Linter}) handles calling the external script and interpreting its output. **Lint engines** coordinate linters, and decide which linters should run on which files. For instance, you might want to run `jshint` on all your `.js` files, and `pep8.py` on all your `.py` files. And you might not want to lint anything in `externals/` or `third-party/`, and maybe there are other files which you want to exclude or apply special rules for. By default, Arcanist uses the @{class@arcanist:ArcanistConfigurationDrivenLintEngine} engine if there is an `.arclint` file present in the working copy. This engine reads the `.arclint` file and uses it to decide which linters should be run on which paths. If no `.arclint` is present, Arcanist does not select an engine by default. You can write a custom lint engine instead, which can make a more powerful set of decisions about which linters to run on which paths. For instructions on writing a custom lint engine, see @{article:Arcanist User Guide: Customizing Lint, Unit Tests and Workflows} and @{class@arcanist:ExampleLintEngine}. To name an alternate lint engine, set `lint.engine` in your `.arcconfig` to the name of a class which extends @{class@arcanist:ArcanistLintEngine}. For more information on `.arcconfig`, see @{article:Arcanist User Guide: Configuring a New Project}. You can also set a default lint engine by setting `lint.engine` in your global user config with `arc set-config lint.engine`, or specify one explicitly with `arc lint --engine `. This can be useful for testing. There are several other engines bundled with Arcanist, but they are primarily predate `.arclint` and are effectively obsolete. Using Lint to Improve Code Review ================================= Code review is most valuable when it's about the big ideas in a change. It is substantially less valuable when it devolves into nitpicking over style, formatting, and naming conventions. The best response to receiving a review request full of style problems is probably to reject it immediately, point the author at your coding convention documentation, and ask them to fix it before sending it for review. But even this is a pretty negative experience for both parties, and less experienced reviewers sometimes go through the whole review and point out every problem individually. Lint can greatly reduce the negativity of this whole experience (and the amount of time wasted arguing about these things) by enforcing style and formatting rules automatically. Arcanist supports linters that not only raise warnings about these problems, but provide patches and fix the problems for the author -- before the code goes to review. Good linter integration means that code is pretty much mechanically correct by the time any reviewer sees it, provides clear rules about style which are especially helpful to new authors, and has the overall effect of pushing discussion away from stylistic nitpicks and toward useful examination of large ideas. It can also provide a straightforward solution to arguments about style, if you adopt a policy like this: - If a rule is important enough that it should be enforced, the proponent must add it to lint so it is automatically detected or fixed in the future and no one has to argue about it ever again. - If it's not important enough for them to do the legwork to add it to lint, they have to stop complaining about it. This may or may not be an appropriate methodology to adopt at your organization, but it generally puts the incentives in the right places. Philosophy of Lint ================== Some general thoughts on how to develop lint effectively, based on building lint tools at Facebook: - Don't write regex-based linters to enforce language rules. Use a real parser or AST-based tool. This is not a domain you can get right at any nontrivial complexity with raw regexes. That is not a challenge. Just don't do this. - False positives are pretty bad and should be avoided. You should aim to implement only rules that have very few false positives, and provide ways to mark false positives as OK. If running lint always raises 30 warnings about irrelevant nonsense, it greatly devalues the tool. - Move toward autocorrect rules. Most linters do not automatically correct the problems they detect, but Arcanist supports this and it's quite valuable to have the linter not only say "the convention is to put a space after comma in a function call" but to fix it for you. = Next Steps = Continue by: - integrating and customizing built-in linters and lint bindings with @{article:Arcanist User Guide: Customizing Existing Linters}; or + - use a linter that hasn't been integrated into Arcanist with + @{article:Arcanist User Guide: Script and Regex Linter}; or - learning how to add new linters and lint engines with @{article:Arcanist User Guide: Customizing Lint, Unit Tests and Workflows}. diff --git a/src/docs/user/userguide/arcanist_lint_script_and_regex.diviner b/src/docs/user/userguide/arcanist_lint_script_and_regex.diviner new file mode 100644 index 0000000000..6fe5db2a2f --- /dev/null +++ b/src/docs/user/userguide/arcanist_lint_script_and_regex.diviner @@ -0,0 +1,153 @@ +@title Arcanist User Guide: Script and Regex Linter +@group userguide + +Explains how to use the Script and Regex linter to invoke an existing +lint engine that is not integrated with Arcanist. + +The Script and Regex linter is a simple glue linter which runs some +script on each path, and then uses a regex to parse lint messages from +the script's output. (This linter uses a script and a regex to +interpret the results of some real linter, it does not itself lint +both scripts and regexes). + +Configure this linter by setting these keys in your configuration: + + - `script-and-regex.script` Script command to run. This can be + the path to a linter script, but may also include flags or use shell + features (see below for examples). + - `script-and-regex.regex` The regex to process output with. This + regex uses named capturing groups (detailed below) to interpret output. + +The script will be invoked from the project root, so you can specify a +relative path like `scripts/lint.sh` or an absolute path like +`/opt/lint/lint.sh`. + +This linter is necessarily more limited in its capabilities than a normal +linter which can perform custom processing, but may be somewhat simpler to +configure. + +== Script... == + +The script will be invoked once for each file that is to be linted, with +the file passed as the first argument. The file may begin with a "-"; ensure +your script will not interpret such files as flags (perhaps by ending your +script configuration with "--", if its argument parser supports that). + +Note that when run via `arc diff`, the list of files to be linted includes +deleted files and files that were moved away by the change. The linter should +not assume the path it is given exists, and it is not an error for the +linter to be invoked with paths which are no longer there. (Every affected +path is subject to lint because some linters may raise errors in other files +when a file is removed, or raise an error about its removal.) + +The script should emit lint messages to stdout, which will be parsed with +the provided regex. + +For example, you might use a configuration like this: + + "script-and-regex.script": "/opt/lint/lint.sh --flag value --other-flag --" + +stderr is ignored. If you have a script which writes messages to stderr, +you can redirect stderr to stdout by using a configuration like this: + + "script-and-regex.script": "sh -c '/opt/lint/lint.sh \"$0\" 2>&1'" + +The return code of the script must be 0, or an exception will be raised +reporting that the linter failed. If you have a script which exits nonzero +under normal circumstances, you can force it to always exit 0 by using a +configuration like this: + + "script-and-regex.script": "sh -c '/opt/lint/lint.sh \"$0\" || true'" + +Multiple instances of the script will be run in parallel if there are +multiple files to be linted, so they should not use any unique resources. +For instance, this configuration would not work properly, because several +processes may attempt to write to the file at the same time: + + COUNTEREXAMPLE + "script-and-regex.script": "sh -c '/opt/lint/lint.sh --output /tmp/lint.out \"$0\" && cat /tmp/lint.out'" + +There are necessary limits to how gracefully this linter can deal with +edge cases, because it is just a script and a regex. If you need to do +things that this linter can't handle, you can write a phutil linter and move +the logic to handle those cases into PHP. PHP is a better general-purpose +programming language than regular expressions are, if only by a small margin. + +== ...and Regex == + +The regex must be a valid PHP PCRE regex, including delimiters and flags. + +The regex will be matched against the entire output of the script, so it +should generally be in this form if messages are one-per-line: + + /^...$/m + +The regex should capture these named patterns with `(?P...)`: + + - `message` (required) Text describing the lint message. For example, + "This is a syntax error.". + - `name` (optional) Text summarizing the lint message. For example, + "Syntax Error". + - `severity` (optional) The word "error", "warning", "autofix", "advice", + or "disabled", in any combination of upper and lower case. Instead, you + may match groups called `error`, `warning`, `advice`, `autofix`, or + `disabled`. These allow you to match output formats like "E123" and + "W123" to indicate errors and warnings, even though the word "error" is + not present in the output. If no severity capturing group is present, + messages are raised with "error" severity. If multiple severity capturing + groups are present, messages are raised with the highest captured + serverity. Capturing groups like `error` supersede the `severity` + capturing group. + - `error` (optional) Match some nonempty substring to indicate that this + message has "error" severity. + - `warning` (optional) Match some nonempty substring to indicate that this + message has "warning" severity. + - `advice` (optional) Match some nonempty substring to indicate that this + message has "advice" severity. + - `autofix` (optional) Match some nonempty substring to indicate that this + message has "autofix" severity. + - `disabled` (optional) Match some nonempty substring to indicate that this + message has "disabled" severity. + - `file` (optional) The name of the file to raise the lint message in. If + not specified, defaults to the linted file. It is generally not necessary + to capture this unless the linter can raise messages in files other than + the one it is linting. + - `line` (optional) The line number of the message. + - `char` (optional) The character offset of the message. + - `offset` (optional) The byte offset of the message. If captured, this + supersedes `line` and `char`. + - `original` (optional) The text the message affects. + - `replacement` (optional) The text that the range captured by `original` + should be automatically replaced by to resolve the message. + - `code` (optional) A short error type identifier which can be used + elsewhere to configure handling of specific types of messages. For + example, "EXAMPLE1", "EXAMPLE2", etc., where each code identifies a + class of message like "syntax error", "missing whitespace", etc. This + allows configuration to later change the severity of all whitespace + messages, for example. + - `ignore` (optional) Match some nonempty substring to ignore the match. + You can use this if your linter sometimes emits text like "No lint + errors". + - `stop` (optional) Match some nonempty substring to stop processing input. + Remaining matches for this file will be discarded, but linting will + continue with other linters and other files. + - `halt` (optional) Match some nonempty substring to halt all linting of + this file by any linter. Linting will continue with other files. + - `throw` (optional) Match some nonempty substring to throw an error, which + will stop `arc` completely. You can use this to fail abruptly if you + encounter unexpected output. All processing will abort. + +Numbered capturing groups are ignored. + +For example, if your lint script's output looks like this: + + error:13 Too many goats! + warning:22 Not enough boats. + +...you could use this regex to parse it: + + /^(?Pwarning|error):(?P\d+) (?P.*)$/m + +The simplest valid regex for line-oriented output is something like this: + + /^(?P.*)$/m