Page MenuHomePhabricator

Auto-fix patch is wrong when multiple issue affect the same line
Open, Needs TriagePublic

Description

I'm writing a new linter for C#, and I'm encountering this problem:

June@JUNE-WINDOWS-PC ~/Documents/Projects/games-project (debug-log)
$ arc lint Assets/Code/test.cs
>>> Lint for Assets\Code\test.cs:


   Error  (CSINDENT1) Code is not indented correctly
    This token is not indented correctly.

               4     {
               5         public void Method()
               6         {
    >>> -      7           Debug.Log("test");
        +                    Debug.Log("test");
               8         }
               9     }

   Error  (CSDEBUGLOG1) Debug.Log should be removed
    Debug.Log should be removed as it clutters the Unity console while
    playing the game.  If the message is important enough, use Debug.Warning
    or Debug.Error instead.  If you really want to just log, use
    Debug.LogFormat as that method call is not detected by this lint rule.

               4     {
               5         public void Method()
               6         {
    >>> -      7           Debug.Log("test");
        +
               8         }
               9     }
--- C:\Users\June\Documents\Projects\games-project\Assets\Code\test.cs  Mon Jul 27 16:08:36 2015
+++ C:\Users\June\AppData\Local\Temp\7vjpuokcyds8ws44\2966BC3.tmp       Mon Jul 27 16:08:55 2015
@@ -4,7 +4,7 @@
     {
         public void Method()
         {
-          Debug.Log("test");
+            Debug.Log("test");
         }
     }
 }


    Apply this patch to Assets\Code\test.cs? [y/N]

The cumulation of both fixes should be to remove Debug.Log, but that doesn't happen. Instead only the indentation fix is suggested in the patch.

Is there some way of fixing this (without making every lint policy detect if every other lint policy would affect it)?

Event Timeline

hach-que assigned this task to epriestley.
hach-que raised the priority of this task from to Needs Triage.
hach-que updated the task description. (Show Details)
hach-que added a project: Arcanist.
hach-que added a subscriber: hach-que.

I tested running the lint policies in the opposite order, and it didn't make a difference to the suggested patch.

The indentation is suggesting to change the whitespace between the start of the line and the start of Debug.Log, while the removal lint policy is suggesting to remove the line entirely.

I've temporarily worked around this by making it remove from the start of Debug.Log, but this means that the line itself doesn't get removed (the indentation policy adds indentation to a line which then has all the contents after that removed).

When a linter emits two strictly conflicting patches ("Change A to B", "Change A to C"), what behavior do you expect as a user?

I'd expect the change that wholly encompasses the other to take precedence. For example a patch removing characters 20 characters at offset 10 should take precedence over a patch that adds 5 characters at offset 15.

The alternative is to provide linters a "run again after patching" flag, allowing multiple passes for linters where some fixes might introduce secondary issues (because traversing and applying fixes to an AST is extremely non-trivial).

Suppose the patches literally suggest changing "A" to "B":

  • Neither patch affects more characters.
  • Running the linters again will never exit (they'll keep swapping "B -> C", then "C -> B").

With the first case (pick the bigger patch), what would you expect to happen when the patches are the same?
With the second case (run again), how would you expect us to deal with loops?

When the patches are the same I'd expect this to be considered a flaw in the linter; there should never be two suggested patches which yield a result of the same importance.

(We could also attach a priority property to lint messages which Arcanist could then use to determine which patch to apply when they conflict in any scenario)

The same goes for loops; it should the responsibility of the linter to ensure this doesn't happen since there's no way Arcanist can accurately detect or safely handle it.

Okay, so suppose we decide this represents a flaw in the linter and a user does accidentally write two linters which are "flawed" by suggesting strictly conflicting patches. Maybe they just made a mistake, or maybe they enabled two different linters which they didn't realize have conflicting behaviors, or maybe they intuitively think the rules for deciding which message wins should be different (for example: use the stronger severity; or use the first message raised; or use the message from the first declared linter; or use the message with the shorter replacement text). Or we add a special priority field for this and both messages have the same priority. What should the behavior be?

In the second case, suppose we add a flag or a default that makes arc run lint again after messages are raised, and there's a flawed linter that causes a loop swapping "B" and "C" over and over again? Your expectation is that arc will just run forever and the user will be responsible for figuring out what they did wrong?

Oh hm, I didn't consider two different linters suggesting similar patches; I was just considering the scenario of the same linter raising similar patches.

I think giving the lint messages a priority is the best way forward as it should alleviate issues at least within the same linter. If two message patches conflict and have the same priority, then Arcanist could either give the user a choice of what patch to apply (hard), or throw an exception (easy) or continue the current behaviour of picking one at seemingly random (also easy).

I guess the loop case we could hash the file contents before and after each patch run and make sure the same hash doesn't occur twice? That would detect of a second or more patch returns the file back to a previous state (and thus inducing a loop).

angie added a project: Restricted Project.Nov 14 2015, 4:31 AM

This appears to still be open.

Is there any better approach than adding custom patch application prompts inside all of our own custom linters?

Your desired behavior is to emit multiple conflicting patches, then prompt the user to choose between them?

Here's an example:

clang-format and clang-tidy are 2 different tools, which both can emit fixes, and can operate on the same file. If I write a linter around them that sets up the original and replacement text, so that patches are suggested by arc lint, the linter does NOT behave as anyone would expect.

What I want, is this capability coupled with some kind of intuitive behavior.

Some options:

  1. detect the conflicts (via hash, overhapping lines/chars, etc.) and emit a warning about it, such as "This patch conflicts with another suggested autofix. Applying it may mean that you need to re-run lint a second time. Apply this patch? [Y/n]"
  2. re-archetect the patch application to operate per linter or per issue rather than after completing all analysis
  3. re-run lint after applying patch to verify that it is now clean, instead of just saying it's okay after application
  4. documentation or examples that suggest any other workaround that meets the "capable and intuitive" criterion. It's even ok if that comes at runtime cost...

re-archetect the patch application to operate per linter or per issue rather than after completing all analysis

Maybe we're talking about different things here, but I believe this should already happen: we only run into abnormal behavior here if the actual byte ranges of two or more lint patches overlap. When linters emit messages which do not conflict, my belief is that we apply them cleanly.

Do both of these tools emit a list of minimal, distinct patches in your environment (e.g.: change if () into if()), or do one or both emit patches which rewrite entire files ("change entire file A into entire file B")?

If one or both of the linters emits entire-file-long patches and you currently just prompt users to accept or reject all the automatic fixes made by, e.g., clang-tidy, you'll get bad behavior, but the prompt isn't really meaningful in the first place.

The Phabricator environment has ~100 linter rules which behave like distinct linters but operate harmoniously in practice, and the original report here is specifically about two linters raising messages which affect the same bytes in a file.


Broadly, I don't want to adopt any solution which prevents us from parallelizing linters in the general case, particularly if we need to do serial <lint + interactive prompt + lint + interactive prompt> phases. I think the user experience cost of these approaches is too high in the overwhelming majority of cases today, and significant planned work in this area focuses on reducing these costs (increasing parallelism and reducing or consolidating interactive prompting), e.g. see T12996, and that resolving this case by requiring users to sit and watch the prompt more often/for longer would be harmful on the balance.

If you want to pursue a serial lint pipeline (where you pass files through clang-format, apply patches, then pass the output of that through clang-tidy, or vice versa) I think your best bet is to do it inside your linter today (i.e., write a clang-format-then-clang-tidy wrapper) -- but you'll have to automatically detect and resolve conflicting fixes yourself. It's possible we might let you build serial lint pipelines in arc some day, but this would be an explicit opt-in, not the default behavior, and is far on the other side of major work in T13098, T12996, T10038, etc.

Likewise, I don't want to pursue a solution which requires re-running linters again in the general case. This punishes every arc diff user because someone might have configured a linter pipeline which can not resolve in one pass, and there's no guarantee a linter pipeline can resolve in finite passes anyway. It's possible we might eventually let you opt into this behavior, but I think it is vanishingly unlikely that this will ever be part of the upstream.


To step back here, the ideal behavior from the upstream viewpoint is:

  • Linters emit minimal, well-labeled patches: this lets us track lint issues in a repository, raise them meaningfully in Differential, gives users meaningful decisions, etc.
  • Linters resolve in one pass: this should generally be achievable with well-designed linters, even if they aren't directly cooperating.
  • Linters rarely/never emit conflicting patches: this lets us avoid prompting users to make decisions that they shouldn't have to make.

The ideal behavior for users configuring linters is sometimes:

  • Linters rewrite entire files: because, say, clang-whatever does this and doesn't have flags for individual messages and you don't want to modify it.
  • Linters just work, taking as many retries and serial steps to resolve as necessary: since you don't want to adjust how the linters work and don't really care how long people have to sit in arc diff answering prompts as long as beautified code comes out the other end.
  • Linters frequently emit conflicting patches and arc does something reasonable: so you don't have to adjust linters.

These behaviors make linters easier to configure, but harm end users by running more slowly, requiring them to make more and/or less meaningful decisions, potentially requiring them to manually resolve lint conflicts, and preventing individual lint issues from being tracked through review into the repository.

Assuming I'm on the right track here and this is mostly a "whole file linters" issue, I think two possible approaches are:

  • someone wraps or modifies these linters to be well-behaved after T10038 and this issue goes away on its own in 99% of cases; or
  • we introduce a separate beautifier phase (I think this is touched on in T5064) which is similar to the planned "generate code" phase and runs beautifier scripts which just rewrite entire files and aren't really linters (particularly, they do not raise granular issues tied to specific narrow byte ranges of source code). If you just want to run beautifiers that don't point at individual issues and don't give users any meaningful decisions, you run them in this phase instead.

It feels like the basis of the problem is that linting was written well without autofix in mind and then the feature was added later. It wasn't designed in from the beginning and that results in limitations where compromises were/are necessary. Unfortunately, this means users sometimes end up with unexpected or unintuitive behaviors. I think we're all just trying to find ways to mitigate them.

Do both of these tools emit a list of minimal, distinct patches in your environment (e.g.: change if () into if()), or do one or both emit patches which rewrite entire files ("change entire file A into entire file B")?

The linters we have (multiple authors, often written to best leverage the command line of the tool) either emit a diff or copy the file, modify it, and we diff it against the original. Then, the diff is pulled in as a generic error with original and replacement text populated. The text is often multi-line, so they can be somewhat large byte ranges rather than just a few characters.

If one or both of the linters emits entire-file-long patches and you currently just prompt users to accept or reject all the automatic fixes made by, e.g., clang-tidy, you'll get bad behavior, but the prompt isn't really meaningful in the first place.

Yes, I agree completely agree on the meaningful point. The patches are usually big, so they do not careful review even with the prompt. That also arrises, independent from multi-line individual fixes, from any file with a lot of issues since they are aggregated into a single patch per file.

To step back here, the ideal behavior from the upstream viewpoint is:

  • Linters emit minimal, well-labeled patches: this lets us track lint issues in a repository, raise them meaningfully in Differential, gives users meaningful decisions, etc.
  • Linters resolve in one pass: this should generally be achievable with well-designed linters, even if they aren't directly cooperating.
  • Linters rarely/never emit conflicting patches: this lets us avoid prompting users to make decisions that they shouldn't have to make.

I like this advice. Is it documented anywhere?

The rarely conflicts part sounds reasonable but I'm sure everyone has varying opinions on what "rarely" means. If, for example, one tool suggests an indent on a line and another tool suggests capitalizing the first word on the line the byte ranges do not overlap (they're just adjacent). If that is not considered conflicting, I'd be impressed with that behavior. I'm (hopefully incorrectly) assuming that the 1st replacement to insert the indent will shift the character offset of the word and the 2nd replacement will now no longer match original text and cannot be applied. This should at least be easily detectable.

Another aside: Can/does Arcanist verify original text matches what it has prior to applying replacement text? That would help sanity check user-written linters.

Assuming I'm on the right track here and this is mostly a "whole file linters" issue, I think two possible approaches are:

  • someone wraps or modifies these linters to be well-behaved after T10038 and this issue goes away on its own in 99% of cases; or
  • we introduce a separate beautifier phase (I think this is touched on in T5064) which is similar to the planned "generate code" phase and runs beautifier scripts which just rewrite entire files and aren't really linters (particularly, they do not raise granular issues tied to specific narrow byte ranges of source code). If you just want to run beautifiers that don't point at individual issues and don't give users any meaningful decisions, you run them in this phase instead.

I think the beautifier phase is exactly what I and others are asking for--be it a little more specific rather than enabling multiple generic "lint" phases. One thing to watch out for is if this kind of feature is just another bandaid when an autofix-aware re-design may be warranted.

It feels like the basis of the problem is that linting was written well without autofix in mind and then the feature was added later.

Although autofix was added later, I wouldn't say this is true in a broad sense. Autofix only adjusts prompt behavior and lint has always included many beautification rules since day one, and has always assumed that the majority of rules are beautification rules which users will almost always answer "y" to.

(A sort of hidden assumption around beautification rules is that new users will read the messages the first few times, learn the coding standard, begin writing their code to that standard, and rarely see the messages thereafter, so that they won't always be answering "y" a lot of times in arc lint. Although we have a large number of beautification rules in this codebase, I rarely actually encounter one day to day personally. This may not be true of all engineers, editors, or environments.)

Lint is written assuming the linter is a smart, surgical, AST-based analyzer which can emit tailored messages, since I believe these kinds of linters are most valuable, and assumed all linters would emit this kind of message when I wrote the pipeline, since this kind of message is generaly easiest for an AST-based linter to emit (it's more work to actually apply all the changes to the file). The desire for wrapped external "beautifier"-style "linters" which emit huge replacement blocks or rewrite entire files arrived later, but is not what "autofix" was aimed at.

Generally, arc lint is written to make it very hard for arc ... to damage your working copy by surprise. When users run arc and it does something confusing, it can cause a big loss of trust in the tool and a feeling that it's unreliable, and the user may go on Twitter immediately and say "arc and phabricator are a steaming pile of garbage and whoever wrote them is an incompetent clown", even if the bad behavior was configured by someone at their company. They can't tell, it feels like "arc" burned them, and they reasonably develop a strong aversion to "arc".

Many users currently find "arc" to be a perplexing, unpredictable mess despite how heavily we err on the side of explaining behavior carefully and explicitly in many cases. I think users often have different expectations about what a given arc command will do when they run it, and can find it frustrating when it does something else, but often aren't aware of the other sets of expectations that different users might have and the tool's need to accommodate those (in some cases, because they have limited fluency in git).

To my knowledge, we've actually destroyed data in working copies only once, in ~2010, because of a bug in HPHP with substr() that wasn't out fault. I've also become more comfortable with the reality that a vanity search for your project on Twitter is always going to bias heavily toward complaints, and more comfortable that the "rule from an eternal iron throne" model of open source is the one that I'm most comfortable with as a project lead. I'd expect to make it easier to make arc lint automatically mutate the working copy after T13098.

I'm (hopefully incorrectly) assuming that the 1st replacement to insert the indent will shift the character offset of the word and the 2nd replacement will now no longer match original text and cannot be applied.

These patches will "merge" properly and apply correctly: they do not affect the same byte range as far as the patch application algorithm is concerned. There should be unusual behavior only when two different linters actually say that you need to make two different, conflicting edits to the same byte range.

Here are the two linters you propose:

.arclint
$ cat .arclint 
{
  "linters": {
    "the-one-that-indents": {
      "type": "script-and-regex",
      "script-and-regex.script": "cat indent.lint #",
      "script-and-regex.regex": "/^(?P<offset>\\d+):(?P<original>[^>]*)>(?P<replacement>[^:]*)::(?P<message>.*)$/m"
    },
    "the-one-that-capitalizes": {
      "type": "script-and-regex",
      "script-and-regex.script": "cat capitalize.lint #",
      "script-and-regex.regex": "/^(?P<offset>\\d+):(?P<original>[^>]*)>(?P<replacement>[^:]*)::(?P<message>.*)$/m"
    }
  }
}

Here is the first edit:

indent.lint
12:>    ::Quack should be indented

Here is the second edit:

capitalize.lint
12:quack>QUACK::Quack should be capitalized

Here is the subject:

quack.text (Input)
if (duck) {
quack();
}

Here is the UX in arc:

$ arc lint quack.text
>>> Lint for quack.text:


   Error  (S&RX) Lint
    Quack should be indented

               1 if (duck) {
    >>> -      2 quack();
        +            quack();
               3 }

   Error  (S&RX) Lint
    Quack should be capitalized

               1 if (duck) {
    >>> -      2 quack();
        +        QUACK();
               3 }
--- /Users/epriestley/scratch/spellbook/quack.text	2018-08-09 11:59:09.000000000 -0700
+++ /private/var/folders/p3/9wkc1nw50292mwz0zsw3g7j40000gn/T/9pt2iwm9ro4c8kk0/68520-YyguBr	2018-08-09 12:05:24.000000000 -0700
@@ -1,3 +1,3 @@
 if (duck) {
-quack();
+    QUACK();
 }


    Apply this patch to quack.text? [Y/n]

Note that the proposed patch includes both changes. Here is the output file after applying both patches with "y":

quack.text (Output)
if (duck) {
    QUACK();
}

Note the call to quack(); has been both indented and capitalized.

The algorithm is relatively smart and I believe it has very good behavior when messages actually describe surgical edits (capitalize this, indent that, add space here) rather than "replace entire file A with entire file B" or "replace gigantic block A with gigantic block B, reverse-engineered from diff output".

(If you find a set of surgical messages which you believe should be automatically mergeable but which are not, I'd be happy to consider upstream behavioral changes to improve our ability to merge them. Note that we should already be able to minimize patches by discarding common prefix and suffix characters: if you patch quack into quAck, it's the same as patching a into A. But we can not safely minimize patches with internal similarities in the general case, like those arising from reverse-engineering diff output.)

Another aside: Can/does Arcanist verify original text matches what it has prior to applying replacement text? That would help sanity check user-written linters.

I think it currently does not, but this is a reasonable guard rail it probably should include. This error is relatively hard to make if you're writing a "proper" linter with an AST, but it's easy to imagine it occurring with a less-formal linter.

I like this advice. Is it documented anywhere?

Not currently. We've seen beautifier-style linters only somewhat recently and don't have any formal guidance on them today.

I do think this intent should be somewhat obvious from the shape of the API, and how it accepts messages at particular offsets with original and replacement text (not before/after files), and the apparent need to insert a diff step in the pipeline (our design is awful if this is the expectation, right?), and the poor behavior of arc when you write a "linter" like this (huge meaningless prompts), all the upstream linters being message oriented, and the documentation's description of what a linter is as "a general class of programming tools which analyze source code and raise warnings and errors about it", rather than "...and rewrite files", etc. That said, it's reasonable to include guidance about beautifier-style linters which do not emit messages, and reasonable to imagine that, say, different people may build different parts of a beautifier-style lint pipeline, assume everyone else made reasonable decisions, and end up far afield from what things were intended to look like.


I think the best fix is to turn these linters into surgical message-emitters somehow, and that doing so will really resolve essentially all of this behavior except in cases where the tools legitimately disagree about how code should be formatted. Users will also get much better behavior, lint messages will make sense in Differential and Diffusion, the linter pipeline will teach the coding standard, etc. I also imagine that larger projects will eventually reach a point where being able to customize AST-based linter behavior is clearly beneficial, and if you're modifying the tool and writing your own rules anyway, it's normally trivial to emit messages surgically.

But, for a given external-binary, that may be a lot of work, and may not be possible or realistic. You also need to be very surgical for best behavior. For example, this:

quack( TRUE ) -> quack(true)

...should be three messages to apply cleanly: remove the leading whitespace, write TRUE as true, remove the trailing whitespace. Even relatively surgical linters may emit the whitespace messages as a single edit, quack( TRUE ) -> quack(TRUE) (which is probably better for human consumers, but worse for automated patch appliers), which will conflict with a TRUE -> true edit.

XHPAST emits this kind of message surgically, and this sort of thing is generally relatively easy to do if you have a mutable AST-based linter, but I think the reality is that almost no one who wants arc lint to run lint rules is actually writing those rules themselves by interacting with an AST.

In the case of beautifiers, where the path between here and surgical messages is a very long one, a separate beautifier step is probably more realistic.

In the case of third-party linters which emit surgical-but-not-minimal messages or reasonable-but-conflicting messages I think the behavior referenced by this task probably should be improved somehow, but I believe this is really only a very small handful of edge cases which rarely occur in practice where two rules truly do propose conflicting edits to the same bytes in the file.

I'm going to try to rewrite our linters to emit minimal fixes and try that
out. I do think that will make the conflicts nearly nonexistent.

I think adding the "requirements of a custom linter" and possibly also
adding verification of original text matching the byte range for fix
application should still be addressed, though.

I'll plan to make these changes:

  • Add a separate "beautifier" phase and/or include "beautifiers" as a supported use case in the planned "code generation" phase. For our purposes, "beautifiers" are lint-like programs which rewrite files in place without giving users meaningful decisions, usually to make the files conform to a style guideline (even if the actual edits are more correctness-oriented and less presentation-oriented).
  • Update the documentation to discuss "beautifiers" vs "linters", the relevant behavioral distinctions, and how to choose which phase is better for a given program.
  • Add byte range validation as a guardrail for lint.

This stuff should cascade under T13098 somewhere.

Oh, and also to the original use case, I'll at least add some kind of warning for this since "silently skip some patches" isn't a good behavior. I'm open to adding some other hint mechanisms on top of this later so linters can better pick which patch wins, but would like to motivate it with some specific, surgical examples. The original example, where one linter removes debugging code and one linter reformats it, is a good starting point: in this case, "remove" should have greater strength than "reformat" and we'd prefer to discard the reformat patch and apply only the removal patch.

One additional perspective on this:

We could also look at re-running part of the linter pipeline after auto-fixes are applied (against the updated files). We have a use case where the result of one "fixer" can produce output that a later "beautifier" would rewrite based on style rules.

The current solution (as noted above) is to combine these two linters into one so they work more closely in concert.

... but I think it would also be reasonable to re-run the proceeding (by priority) linters after an auto-fix was produced by an earlier linter. The resulting user experience would involve multiple rounds of diff prompts, eventually arriving as the final file state, but it would be unambiguous.

I think that workflow is pretty strongly undesirable: it suggests users make an incorrect change (e.g., suggest a patch which violates the style guideline) and they're expected to answer "Y" to it.

If the "fixer" is impossible to modify, a possible approach is to simulate this cycle by just turning "beautify + lint, apply + beautify" into a single meta-beautifier step and running it in the beautify phase. The user won't be prompted, but the prompt is sort of bogus anyway, so maybe this is fine.

The "build" workflow may end up getting access to separate build phases on either side of lint for other reasons so you could just explicitly apply the beautifier twice if it does, but I don't plan to make arc ever run the same step twice on its own: it's on you to make sure all your beautifiers/fixers/whatever generate the correct state in a single pass, or implement your own phase gymnastics if it's more practical to send users around in circles than correct the linter or granularize the beautifier so it can act as a linter.