Page MenuHomePhabricator

Script-and-regex linter doesn't handle whole file lint correctly.
Closed, ResolvedPublic

Description

Hi. We hit this over at Dropbox and I believe there's a small bug in the code. Essentially our script outputs:

advice:0 You're editing this file, remember to also edit this other file.

Our regex is "/^(?P<severity>advice):(?P<line>\\d+) (?P<message>.*)$/m"

However when this code runs:

private function getMatchLineAndChar(array $match, $path) {
  if (!empty($match['offset'])) {
    list($line, $char) = $this->getEngine()->getLineAndCharFromOffset(
      idx($match, 'file', $path),
      $match['offset']);
    return array($line + 1, $char + 1);
  }

  $line = idx($match, 'line');
  if ($line) {
    $line = (int)$line;
  } else {
    $line = 1;    <------------------------------
  }

Instead of the line being 0 (meaning whole file) the line becomes 1 which means that you'll only see the lint if you happened to edit the first line.

There's a simple work around which is to change the regex to "/^(?P<severity>advice):(?P<line>.+) (?P<message>.*)$/m" This way it uses a '0' instead of a 0.

Event Timeline

This is very similar to T6303.

Instead of the line being 0 (meaning whole file)

Is this established somewhere (like, the documentation says to do that so the behavior is inconsistent with what we've documented), or is it just the behavior you desire?

(From T6303, it sounds like 0 didn't mean "the whole file" in Oct 2014 either?)

Hmm. I thought I had seen that somewhere. Oh I see what I did. I followed the filename linter code and noticed that it called:

final public function raiseLintAtPath($code, $desc) {
  return $this->raiseLintAtLine(null, null, $code, $desc, null, null);
}

And the only way to do that from the other linter was to pass in 0. Anyway I'm quite happy to keep using .+ but I just wanted you to know.
Oh and this behavior actually changed with D14307

I believe D14307 changed the behavior from "fatal horribly" to "work". It's possible that capturing 0 meant "whole file" at some point (although I'm confused by T6303 if it did) but I think that was accidental and don't believe we suggested/recommended/documented that anywhere.

I'm OK with letting you capture something to mean "no line", but I'd like to make it explicit. For example, you'd capture:

... (?:(?P<wholeFile>0)|(?P<line>[1-9]\d*))) ...

...or make your script output some more meaningful symbol like "whole-file", if you reasonably control the output.

(Or, if you can find something somewhere that documents that "0" means "whole file", I'll just make that the law of the land.)

Well, maybe that's overkill.

D15000 will make capturing no text mean "the whole file", and documents the behavior. You can either change your regexp to not capture "0":

(?:0|(?P<line>[1-9]\d+))

...or change the underlying script to not output 0 (or output something else, for clarity to humans), if you have control over it.