Page MenuHomePhabricator

Herald Differential Diffs rule for "Added files".
Closed, WontfixPublic

Description

Context

I'd like to be able to create a Herald rule that enforces the existence of open source license headers on specific file types during arc diff.

My Herald rule thus far

  • Repository is any of rMyRepo
  • Affected files matches regexp (\.(c|cpp|h)$)
  • Added file content does not contain Snippet of license text

The problem: this herald rule leads to false-positives on legitimate source changes. A new function added to a .c file will cause this Herald rule to fail, even if the file is already properly licensed. This makes sense and is a reasonable outcome of the above rules.

What I think might solve my problem

An additional Herald rule for Diffs called Added files with the standard "contains/regexp" options, I would be able to change the Affected files constraint to:

  • Added files matches regexp (\.(c|cpp|h)$)

This would allow me to introduce enforcement of the license for all new source without affecting legitimate changes to existing files that don't already have the license.

I plan to address any source that is missing the license in a separate change.


Perhaps not part of this feature request, but related

I can imagine a sibling rule called Removed files being added as well. This would bring the "file" rules in line with the "content" rules with respect to supporting all of Added, Removed, and Changed.

Event Timeline

eadler added a project: Restricted Project.Mar 5 2016, 7:28 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.

I believe this wouldn't solve your problem: I could add two files, A.c and B.c, and add a header to only one. This would pass the rule but fail what you intend to enforce, I think.

Broadly, Herald is fairly poorly suited for this class of problem, and I'm very hesitant to build out its capabilities here because we already have a pipeline for addressing this class of issue: lint. Lint is the intended tool for detecting and enforcing source rules. Lint will let you write detailed rules, raise inline errors, track existing errors in the codebase, test for errors before submitting code for review, automatically correct errors at arc diff time, etc, none of which will ever realistically be possible with Herald. See T7617 for additional discussion of issues with implementing lint using Herald rules.

On this issue in particular (copyright headers), T2274 is the story of these headers in this codebase. Briefly:

  • We once included them, and shipped with a linter which enforced them.
  • We were able to fight the good fight and get legal to consent to their removal.
  • We removed them in T2035.
  • We removed the linters in T2274, per reasoning there.
  • We lived happily ever after.

Of course, this avenue may not be available to you, or may not be worth pursuing, depending on your legal environment or appetite for and assessment of the underlying risk.

You may also be working on a project where distributing or updating linters is inconvenient, or writing linters in the first place is daunting. These should improve after T10038, via T5055.

Or, you may be working on a project where not everyone uses arc, and you can't or don't want to mandate its usage. It is possible to run lint on the server side, but not currently very convenient. This is something I plan to improve eventually, but I don't think there's a dedicated task for it right now (interest has been mixed; I think installs dedicated enough to invest in linters are generally also invested in arc?). If this is interesting, I can at least start up a task to consolidate that.

Finally, you can write a custom Herald field like "Any added .c, .cpp or .h file is missing a copyright header" by subclassing HeraldField or some subclass of it. DifferentialDiffAffectedFilesHeraldField might be a starting point. This would let you write this rule:

When:
[Any added .c, .cpp or .h file is missing a copyright header][is true]
Take these actions:
[Block diff with message:]["One of the files you added is a .c, .cpp or .h file but doesn't have the right copyright header."]

Even this is fairly bad/hostile since it doesn't tell you which file was a problem, although you could get that into the transcript.

Alternatively, you can write a custom Herald action like "Block diff with a custom message if any added .c, .cpp or .h file is missing a copyright header", then write this rule:

When:
[Always]
Take these actions:
[Block diff with a custom message if any added .c, .cpp, or .h file is missing a copyright header]

But this pathway leads into madness. Lint is a far better solution to this class of problem on every dimension except initial setup cost.

Upshot:

  • You may be able to moot this issue by removing the copyright headers entirely.
  • If at all possible, don't use Herald for this; use lint instead.
  • Linters will be easier to write, distribute and update after T10038.
  • Linters will be easier to run server-side at some point in the future.
  • If you're intent on using Herald, you can write a custom, hard-coded field or action that does exactly what you want.
  • I'm strongly resistant to making Herald better at linting in the upstream, since I believe there's a huge mismatch between Herald's capabilities and the lint/source analysis problem domain.

You may be able to moot this issue by removing the copyright headers entirely.

Just read through T2035. I'll take all the points mentioned there into consideration, though some of the points in favor of license headers may apply to our case.

If at all possible, don't use Herald for this; use lint instead.

Will do.

I had written a linter for this previously but โ€” as you correctly identified โ€” we haven't solidified on a way to deploy custom linters across our team and maintain version consistency. I fell back to Herald as a result of exploring the possibility of enforcing it server-side.

Will keep an eye on T5055. In the meantime, I may look into creating a git repo for the team that maintains our arc extensions as submodules and add the license linter there.

Thank you for unwinding my assumptions here. Much appreciate the detailed response as always!

epriestley claimed this task.

Sounds good. I'm going to close this out since we don't plan to pursue the particular rule it describes and I think other related work is already covered elsewhere.