Page MenuHomePhabricator

Provide a more structured way of interfacing with external linters
Closed, WontfixPublic

Description

The Problem

In the Glasgow Haskell Compiler, we have a number of ideas for small linting checks which we'd like to fold into Arcanist's linter. Here are a few,

  1. Checking whether testsuite results are updated in a consistent manner (e.g. if a test's expected output on 32-bit output is changed, we would also expect that the output for 64-bit platforms should change).
  2. Checking that submodule are updated in a consistent way
  3. Ensure that AST unique identifiers don't overlap
  4. Checking that build artifacts aren't accidentally committed to the repository

We already have scripts which check these invariants and would like to integrate them into the standard Arcanist workflow.

Current options

Currently users who want to write project-specific linters have two options

  1. Write an ArcanistLinter implementation in PHP
  2. Define an ad-hoc format for producing linter messages which can be parsed with ArcanistScriptAndRegexLinter

Approach (1) can be a substantial hurdle for projects with little experience (or desire to work with) PHP. Approach (2) is essentially a workaround to the fact that there exists no standardized way of serializing Arcanist's linting messages. Such a standard serialization would make the Arcanist's linting mechanism substantially more flexible at little cost.

JSON is a widely used format for serializing structured data and for good reason: it's extensible, reasonably normalizing, and self-describing. It would be nice if Arcanist would provide a linter type similar to ArcanistScriptAndRegexLinter but accepting JSON-serialized messages from the invoked script.

Specification

Similar to ArcanistScriptAndRegexLinter the linter type would have minimal configuration surface, consisting solely of a command line for invoking an external linter.

The external tool would produce a standard JSON array on standard output. The array's elements would be JSON objects with a small vocabulary of attributes (following the model of the matches currently accepted by ArcanistScriptAndRegexLinter),

  • message (required) Text describing the lint message. For example, "This is a syntax error.".
  • name (optional) Text summarizing the lint message. For example, "Syntax Error".
  • severity (optional) The word "error", "warning", "autofix", "advice", or "disabled", in any combination of upper and lower case.
  • file (optional) The name of the file to raise the lint message in. If not specified, defaults to the linted file. It is generally not necessary to specify this unless the linter can raise messages in files other than the one it is linting.
  • line (optional) The line number of the message.
  • char (optional) The character offset of the message.
  • offset (optional) The byte offset of the message. If provided, this supersedes line and char.
  • original (optional) The text the message affects.
  • replacement (optional) The text that the range captured by original should be automatically replaced by to resolve the message.
  • code (optional) A short error type identifier which can be used elsewhere to configure handling of specific types of messages. For example, "EXAMPLE1", "EXAMPLE2", etc., where each code identifies a class of message like "syntax error", "missing whitespace", etc. This allows configuration to later change the severity of all whitespace messages, for example.
  • throw (optional) If set with a string error message arc will throw the given message. You can use this to fail abruptly if you encounter unexpected output. All processing will abort.

Revisions and Commits

Abandoned

Event Timeline

bgamari updated the task description. (Show Details)
bgamari added a project: Arcanist.
bgamari added a subscriber: bgamari.

See Contributing Feature Requests. Please describe the specific, concrete problem you're encountering.

@epriestley, the amended text should clarify my goals here.

epriestley closed this task as Wontfix.EditedNov 22 2015, 10:39 PM
epriestley claimed this task.

Thanks!

We haven't seen other requests in this vein from other users, and are not aware of a more general interest in using JSON bindings to write linters. From your problem description, it sounds like you have a fairly unusual and specific need within your project. We haven't seen other projects report similar needs, and I think this is isn't a good candidate for bringing upstream: we don't want to upstream code which will only be used by one install, and I have no reason to believe that these bindings will see wider use.

In particular, this approach will always be less powerful than writing "real" (liner-specific) bindings: for example, you can't do things like version checks or specialized linter configuration easily. Many (perhaps most? nearly all?) of the bindings we see getting wider use make use of features like this. If a linter is seeing use by multiple installs, it is likely worth our time to write proper bindings for it, which is better for everyone.

While this binding could be made far more powerful so it could support these features, the complexity of writing PHP linters is ultimately not large and writing a JSON-based DSL in PHP so no one has to write PHP linters doesn't seem promising to me. I'm also broadly not sympathetic to the argument that "PHP is icky" -- PHP isn't a very good language, and it is icky, but total unwillingness to write ~50-100 lines of string parsing in PHP strikes me as an excessive level of language zealotry in a general sense. Many other projects have successfully waded into the syrupy, stinking mire that is PHP and emerged successfully, so this isn't a need we're seeing more generally.

Doing this could certainly be easier, and is something we intend to improve over time (see T5447). Installing and distributing extensions is also messy right now, and we likewise have plans to improve it (T5055).

The ScriptAndRegex linter is partly a relic of an earlier time when we accepted a lot of stuff upstream that we shouldn't have (in hindsight long-term cost to us of accepting these features clearly outweighed any value they can ever provide), and we probably would not accept it into the upstream today.

Instead of bringing this upstream, you might consider releasing it yourself and adding it to the Community Resources page.

Fair enough. Thanks for your thoughtful reply.

Instead of bringing this upstream, you might consider releasing it yourself and adding it to the Community Resources page.

Sure, this sounds like an excellent compromise.

Thanks again!