Page MenuHomePhabricator

Provide more complete guidance on using regular expressions in Herald
Open, LowPublic

Description

Reproduction steps

  • Attempt to use Herald as a new user
  • Become extremely confused and frustrated

Version Information

  • The current live docs

Struggle with this yesterday and today until I figured out what was going wrong.

When creating a Herald rule to mail me when certain code pattern appear I couldn't for the life of me get it to run against a commit in the Test Console that I knew to contain this pattern. I was using matches regex (for both the affected files and the diff contents) and it was unclear as to whether I should of been writing:

  1. /\.js$/
  2. "/\.js$/"
  3. "/\\.js$/"
  4. @/\.js/@
  5. @/\.js/@
  6. @/\\.js/@
  7. @\.js@ <- The correct answer for matches regexp

Telling me that I need to use the enclosing character "@" is fine. Giving me an example (in the error messages) that says the following is fine as well.

throw new HeraldInvalidConditionException(
  pht(
    'The regular expression "%s" is not valid. Regular expressions '.
    'must have enclosing characters (e.g. "@/path/to/file@", not '.
    '"/path/to/file") and be syntactically correct.',
    $condition_value));

https://github.com/phacility/phabricator/blob/87f663ef77b4120debb1c1de48d24650776d6a60/src/applications/herald/adapter/HeraldAdapter.php#L540

But it wasn't clear to me that I needed to use option 7 as listed above.

Could you please place some more information about the finer points of using this feature in the docs. I struggled to figure out how to use the matches regexp pairs as well. Overall there should be some real examples of things people would want to do... like me wanting to hone in on just certain files types... and then honing in on a particular word/pattern.

There is an example in there for matches regexp pairs which is what I managed to use in the end, not without a lot of hassle of back and forth to the Test Console with no clear explanation as to why it wasn't finding a pattern that definitely in the code.

Event Timeline

Form (1) is expected to work, which is why we don't document a need for form (7): one is not expected to exist. If form (1) does not work, this is a bug with Herald itself, not with the Herald documentation.

How can I reproduce the issue you encountered with form (1)?

Here's what I tried. I wrote this rule:

Screen Shot 2016-08-31 at 5.23.38 AM.png (1×2 px, 377 KB)

Then I created this revision:

Screen Shot 2016-08-31 at 5.23.58 AM.png (1×2 px, 401 KB)

It appeared to work correctly:

Screen Shot 2016-08-31 at 5.23.33 AM.png (1×2 px, 357 KB)

@epriestley

I think I'm going insane. I was wrangling with that for ages and as you pointed out /\.js$/ does in fact work. I have no idea how I did every permutations of that regex except that one.

I have to say though going from:

\.js$ to @\.js$@ or /\.js$/ still caught me by surprise.

I think the error message:

(e.g. "@/path/to/file@", not "/path/to/file")

Should be tweaked to represent a real regex (as those forward slashes should be escaped no?).

(e.g. "@\/path\/to\/file@", not "/path/to/file")

Maybe I'm just a regexp noob but I do think there is value in at least throwing in a few examples on the docs page. What is obvious to you isn't necessarily the experience for everyone else.

The reason we suggest @ as a delimiter is that you don't have to escape forward slashes if you use a delimiter other than /, so you get a more readable regexp.

That is, the regexp grammar allows you to to select from a wide range of delimiter characters. / is the most common delimiter character, but choosing it gives you pretty messy regexps when you're matching paths, since you often want to include / as a literal slash in the regexp.

I think a lot of users don't necessarily know the full rules for delimiters (roughly: pick any symbol; start and end with it, then add flags; escape it inside the regexp; more magic for (...)), and reasonably believe / is the only valid delimiter (which is arguably somewhat true in Javascript because of the syntax magic built into the language, but not true in Perl, PHP, or any other major delimiter-using context I'm aware of) and/or haven't run into the the connection between delimiters and internal escaping before. So it's possible that suggesting @ is more harmful on the balance (by increasing confusion about delimiters) than helpful (by reducing errors/confusion from additional required escaping).

A more confusing thing we could do is suggest (...) as delimiters, which are valid, but which look like a capturing group and which everyone assumes mean "capturing group, no delimiters required" when they actually serve as valid-but-magic delimiters. The advantage of these delimiters is that no symbols need to be escaped internally.

Ultimately, we aren't trying to do anything special/weird here and I think our behavior is the least-magic one for anyone familiar with Perl or PHP regexps, at least. Apparently this specific situation is common enough to have a Wiki page, although it feels a little bit made up to me:

https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome

A possible alternate attack on this is to try to teach users more about regexps and, e.g., explain the delimiter rules. I personally resisted really getting comfortable with them for a long time since I knew how * and + worked and it seemed like all the bizarre syntax wasn't useful very often and I could always go look it up and sort of muddle through, but I feel like eventually making a bit more of an effort to figure out how most of the behaviors work has actually been quite a bit more valuable than I expected.

I'll leave this open for now since I'm not exactly sure where to take it yet. I'm going to file it under T8644 and may just merge it into that (e.g., write a "here's a simple regexp that works" recipe) when the time comes.

epriestley edited projects, added Documentation, Herald; removed Bug Report.
epriestley renamed this task from Herald docs require better information around how to use contains, matches regexp, matches regexp pairs...etc to Provide more complete guidance on using regular expressions in Herald.Sep 1 2016, 2:26 PM

Hey @epriestley,

Sorry for the late reply. What an incredibly well written and thoughtful response! Thanks!

I guess there is a lot about regexp that I didn't fully understand before trying to use Herald to do what I was doing. I had thought I'd known quite a lot but you definitely opened my eyes a lot with your answer. It would be great for the less experience to have a touch more guidance. Perhaps at the very least they might stumble on this bug/your response and it'll click for them like it did me.

Thanks again,
Matthew