The Problem
In the Glasgow Haskell Compiler project, we have a number of ideas for small lint checks which we'd like to fold into Arcanist's linter. Here are a few:
- Checking whether testsuite results are updated in a consistent manner (e.g. if a test's expected output on 32-bit platforms is changed, we would also expect the expected output for 64-bit platforms to change).
- Checking that submodules are updated in a consistent way
- Checking that AST unique identifiers don't overlap
- Checking that build artifacts aren't accidentally committed to the repository
We already have scripts which check these invariants and would like to integrate them into the standard Arcanist workflow.
Current options
Currently, users who want to write project-specific linters have two options:
1. Write an ArcanistLinter implementation in PHP
2. Define an ad-hoc format for producing linter messages which can be parsed with ArcanistScriptAndRegexLinter
Approach (1) can be a substantial hurdle for projects with little experience with (or desire to work with) PHP. Approach (2) is essentially a workaround for the fact that there is no standardized way of serializing Arcanist's lint messages. Such a standard serialization would make Arcanist's linting mechanism substantially more flexible at little cost.
JSON is a widely used format for serializing structured data, and for good reason: it's extensible, reasonably normalizing, and self-describing. It would be nice if Arcanist provided a linter type similar to ArcanistScriptAndRegexLinter but accepting JSON-serialized messages from the invoked script.
Specification
Similar to ArcanistScriptAndRegexLinter, the linter type would have a minimal configuration surface, consisting solely of a command line for invoking an external linter.
The external tool would write a JSON array to standard output. The array's elements would be JSON objects with a small vocabulary of attributes (following the model of the matches currently accepted by ArcanistScriptAndRegexLinter):
- message (required) Text describing the lint message. For example, "This is a syntax error.".
- name (optional) Text summarizing the lint message. For example, "Syntax Error".
- severity (optional) The word "error", "warning", "autofix", "advice", or "disabled", in any combination of upper and lower case.
- file (optional) The name of the file to raise the lint message in. If not specified, defaults to the linted file. It is generally not necessary to specify this unless the linter can raise messages in files other than the one it is linting.
- line (optional) The line number of the message.
- char (optional) The character offset of the message.
- offset (optional) The byte offset of the message. If provided, this supersedes line and char.
- original (optional) The text the message affects.
- replacement (optional) The text that the range captured by original should be automatically replaced by to resolve the message.
- code (optional) A short error type identifier which can be used elsewhere to configure handling of specific types of messages. For example, "EXAMPLE1", "EXAMPLE2", etc., where each code identifies a class of message like "syntax error", "missing whitespace", etc. This allows configuration to later change the severity of all whitespace messages, for example.
- throw (optional) If set to a string, arc will throw an exception with that string as its message. You can use this to fail abruptly if you encounter unexpected output. All processing will abort.
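To make the proposed format concrete, here is a minimal sketch of an external linter that emits messages using the attribute vocabulary above, together with a small validator for that vocabulary. It is only an illustration under stated assumptions, not part of Arcanist: the names lint_paths, validate, and ARTIFACT_SUFFIXES, the code "ARTIFACT1", the list of artifact suffixes, and the convention that the paths to lint arrive as command-line arguments are all hypothetical.

```python
import json
import sys

# Attribute vocabulary taken from the specification above.
ALLOWED_KEYS = {
    "message", "name", "severity", "file", "line", "char",
    "offset", "original", "replacement", "code", "throw",
}
SEVERITIES = {"error", "warning", "autofix", "advice", "disabled"}

# Hypothetical suffixes that this example treats as build artifacts.
ARTIFACT_SUFFIXES = (".o", ".hi", ".dyn_o", ".dyn_hi")


def validate(msg):
    """Return a list of problems with one message object (empty if valid)."""
    problems = []
    if not isinstance(msg.get("message"), str):
        problems.append("'message' is required and must be a string")
    unknown = set(msg) - ALLOWED_KEYS
    if unknown:
        problems.append("unknown attributes: " + ", ".join(sorted(unknown)))
    severity = msg.get("severity")
    if severity is not None:
        # Severity is matched case-insensitively, as the specification allows.
        if not isinstance(severity, str) or severity.lower() not in SEVERITIES:
            problems.append("invalid severity: %r" % (severity,))
    return problems


def lint_paths(paths):
    """Build one lint message per path that looks like a build artifact."""
    return [
        {
            "message": "Build artifact '%s' should not be committed." % path,
            "name": "Build Artifact",
            "severity": "error",
            "file": path,
            "code": "ARTIFACT1",
        }
        for path in paths
        if path.endswith(ARTIFACT_SUFFIXES)
    ]


if __name__ == "__main__":
    # Assumption: the paths to lint are passed as command-line arguments.
    json.dump(lint_paths(sys.argv[1:]), sys.stdout, indent=2)
    sys.stdout.write("\n")
```

Because the output is plain JSON, the same script could be written in any language a project already uses, and attributes can be added to the vocabulary later without changing how existing messages are parsed.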