Page MenuHomePhabricator

Also coerce "char" for lint messages
ClosedPublic

Authored by epriestley on Jul 27 2015, 4:41 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 4, 8:40 PM
Unknown Object (File)
Wed, Jan 1, 4:43 AM
Unknown Object (File)
Tue, Dec 24, 11:36 PM
Unknown Object (File)
Sun, Dec 15, 7:35 AM
Unknown Object (File)
Dec 12 2024, 4:22 AM
Unknown Object (File)
Dec 6 2024, 11:37 PM
Unknown Object (File)
Dec 6 2024, 5:01 PM
Unknown Object (File)
Dec 6 2024, 5:02 AM

Details

Summary

Ref T8921. See similar change in D13695. This expands the scope to also coerce setChar().

Test Plan

arc unit --everything

Diff Detail

Repository
rARC Arcanist
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

epriestley retitled this revision from to Also coerce "char" for lint messages.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: btrahan.
This revision was automatically updated to reflect the committed changes.
csilvers added inline comments.
src/lint/ArcanistLintMessage.php
239

I think this should be if (!$value), or some other construct that also matches the empty string. Maybe?

This is what I'm seeing:
Our script-and-regexp regexp is this:

\/^((?P<file>[^:]*):(?P<line>\\d+):((?P<char>\\d+):)? (?P<name>((?P<error>E)|(?P<warning>W))\\S+) (?P<message>[^\\x00\n]*)(\\x00(?P<original>[^\\x00]*)\\x00(?P<replacement>[^\\x00]*)\\x00)?)|(?P<ignore>SKIPPING.*)$\/m

The important part is that the P<char> is in a ?. And in fact sometimes we have lint errors that don't include a char:

foo:3: E512 Unexpected end of file

It seems that this causes 'char' to be set to the empty string? getMatchLineAndChar has this:
$char = idx($match, 'char');. I don't know enough php to know what idx returns in this case, but it doesn't seem to be null, because we're getting a validateInteger failure.

I think it's correct to keep the base class strict, but we can make the ScriptAndRegex linter handle this case. It's reasonable for the pipeline overall to have the behavior you want, I just want the accommodation to be narrow (in ScriptAndRegex) instead of broad (in the base message class).

src/lint/ArcanistLintMessage.php
239

Makes sense. I'm still kinda confused what's going on though -- I would expect preg_match_all to be returning null in this case.

Sensible behavior? In PHP? Get outta here!

$ php -r 'preg_match_all("/(?P<x>a)?b/", "b", $matches); var_dump($matches);'
array(3) {
  [0]=>
  array(1) {
    [0]=>
    string(1) "b"
  }
  ["x"]=>
  array(1) {
    [0]=>
    string(0) ""
  }
  [1]=>
  array(1) {
    [0]=>
    string(0) ""
  }
}

I'm actually not sure either, since we don't do this -- we use PREG_SET_ORDER, which has more reasonable behavior:

$ php -r 'preg_match_all("/(?P<x>a)?b/", "b", $matches, PREG_SET_ORDER); var_dump($matches);'
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(1) "b"
  }
}

Oh, nevermind. Look, PHP!

$ php -r 'preg_match_all("/(?P<x>a)?(?P<y>b)/", "b", $matches, PREG_SET_ORDER); var_dump($matches);'
array(1) {
  [0]=>
  array(5) {
    [0]=>
    string(1) "b"
    ["x"]=>
    string(0) ""
    [1]=>
    string(0) ""
    ["y"]=>
    string(1) "b"
    [2]=>
    string(1) "b"
  }
}

In conclusion: PHP.

Here's my (untested) patch, does this fix it for you?

diff --git a/src/lint/linter/ArcanistScriptAndRegexLinter.php b/src/lint/linter/ArcanistScriptAndRegexLinter.php
index 70ce944..ca8883c 100644
--- a/src/lint/linter/ArcanistScriptAndRegexLinter.php
+++ b/src/lint/linter/ArcanistScriptAndRegexLinter.php
@@ -324,7 +324,7 @@ final class ArcanistScriptAndRegexLinter extends ArcanistLinter {
    * Get the line and character of the message from the regex match.
    *
    * @param dict Captured groups from regex.
-   * @return pair<int,int> Line and character of the message.
+   * @return pair<int,int|null> Line and character of the message.
    *
    * @task parse
    */
@@ -336,8 +336,19 @@ final class ArcanistScriptAndRegexLinter extends ArcanistLinter {
       return array($line + 1, $char + 1);
     }
 
-    $line = idx($match, 'line', 1);
+    $line = idx($match, 'line');
+    if ($line) {
+      $line = (int)$line;
+    } else {
+      $line = 1;
+    }
+
     $char = idx($match, 'char');
+    if ($char) {
+      $char = (int)$char;
+    } else {
+      $char = null;
+    }
 
     return array($line, $char);
   }

I just tried the patch, and it does indeed fix things for us!