Page MenuHomePhabricator

permit wildcards in OWNERs paths
Open, Needs TriagePublic

Description

One of the installs I maintain, we requires ownership of */.directory. In another org we have diffs that sometimes come from the root of the repo head/foo/bar and other times come from branches branches/baz/foo/bar and still other times come from a partial clone of head (foo/bar). We currently either have to specify all the paths but having wildcards or general regular expressions would be great.

Event Timeline

eadler added a project: FreeBSD.
eadler moved this task from Backlog to Nice To Have on the FreeBSD board.

I suspect it isn't going to get very far because we lose our ability to optimize lookups (I wrote this in the general case as "regexp", but mean "regexp/wildcard" -- wildcards aren't as complex but I think the general case still holds).

Particularly, when a user pushes a commit affecting N paths and we have packages containing M regexp rules, I don't see a way to avoid evaluating O(N*M) regexps in the general case (in some cases we can reduce this "a bit", but since we often need to know what a package's longest match is -- not just whether it matches or not -- there probably aren't many great shortcuts). While M will be small to start with and N is usually small, they're both unbounded and values like N=1000000, M=5000 aren't unimaginable, which would require us to evaluate 5B regexps. I believe our current matching cost is approximately O(N + M) on all reasonable, non-degenerate inputs.

Evaluating O(N*M) regexps isn't necessarily impossible since this evaluation happens in the daemons and 1-million-path commits are rare, but it's not attractive.

In the specific case of */.directory, you should be able to just write a Herald rule for this that adds a project (or two rules for revisions + commits) instead of using Owners. That seems reasonable if the number of cases like this is small, which it presumably is, and the difficulty of doing this makes it hard for M to get out of control.

In the specific case of a partial clone, we should be automatically "repairing" paths to be absolute to the real working copy already. I'm not sure if we currently do this successfully in whatever case you're looking at, but the expectation is that we'll hand Owners full paths from the root of the repository if we have a reasonable way to figure this out. In SVN with diffs submitted by arc, this "should" be happening automatically. If it isn't, I'd prefer to fix that.

That doesn't leave a particularly good solution for branches/x vs trunk/x (nothing better than "list them both", at least).

@pingpongboss, in T11787 it looks like you actually want to wildcard the repository names, not the paths?

A fix for that specific problem might be something like letting you specify either repositories or projects as containers. If you specify a project, it means "any repository tagged with this project". Then you'd tag all your team1-xyz with #team1 and could use that to trigger other behaviors too (e.g., Herald) -- I believe Herald already supports "Repository projects include: ...".

Of course, you could still end up with the same problem (forgetting to tag the repository with a project).

Since other installs haven't expressed an interest in this (and the UI needs some adjustments, I think) I imagine we won't get to it any time soon, although I think it's conceptually reasonable.

We could also add a special token like "Any Repository" for owning /LICENSE or something, although I think neither the original use case or your use case would benefit.

You should also be able to use the Conduit API to automatically populate rules, although that's a bit involved. But it could query repositories, do the string matching, and then write all the ownership rules into Owners and you could just cron that to run every night or whatever.

Yes, you are correct, I want to wildcard the repository names. Thanks for your alternative fixes. The low priority is not a problem.

I don't suppose there's been any interest in adding wildcards/function matchers to Owners in the last couple of years?

I currently have a bunch of herald rules that I'd love to convert to Owners packages that rely on the "Repository's Projects Include..." functionality; I would absolutely love it if Owners had the ability to wildcard repository names.

I would also specifically like "Any Repository" specifically for the "LICENSE" / "LICENSE.TXT" / "COPYING" case that you mentioned.

After D19191 it would be relatively easy to support projects and wildcard functions (like "Any Repository") but there's currently no interest in this from customers.

Yeah, that was the commit that reminded me how I wanted this. Unfortunately, it looks like the database still stores literal repositoryPHIDs, so the schema would need to change if it were to handle functions (which is why I bumped the ticket instead of taking a stab at implementing it myself first). I figured I'd add my voice to the ticket for now and continue toiling along with complex Herald rules for now.