Page MenuHomePhabricator

Consolidate readers of "differential.revisionID" property

Authored by epriestley on Apr 23 2019, 5:28 PM.



Depends on D20467. Ref T13277. Currently, the "MessageParserWorker" writes this property on commits, then Herald and Audit both read it.

Make them share code so this property has one writer and one reader. This property isn't great, but at least now the badness is hidden.

Currently, we can't just use edges because they may not have been written yet. I am likely to just do this, soon:

  • Just write the edges (in "MessageParserWorker").
  • Hide the edges from mail.

However, we'll sort-of lose the "revisionMatchData" explanation thing if I do that. Maybe this is fine? But when commits match because hashes match, it legitimately isn't obvious.

For now, just reduce the amount of harm/badness here.

Test Plan
  • Ran bin/repository reparse --publish ....
  • Ran a Herald "Audit" rule using the "Accepted Differential revision" field.

Diff Detail

rP Phabricator
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

epriestley created this revision.Apr 23 2019, 5:28 PM
epriestley requested review of this revision.Apr 23 2019, 5:30 PM
epriestley edited the summary of this revision. (Show Details)Apr 23 2019, 5:30 PM

The "how do we store this data" question kind of relates back to T13056. This is a case where we have edge-like data (Commit Has Related Revision) but we also have metadata (why the two are associated). This currently isn't stored as edge data, but a possibly cleaner approach would be:

  • The "MessageParserWorker" writes these edges, with explanations in edge data.
  • Audit/Herald can then read these edges without worrying about parse sequence issues.
  • The UI can show "commit X is linked to revision Y because reason Z".

One reason not to do this is that I'd like to get rid of edge data (T13056). There are almost no remaining readers or writers of edge data, and essentially all historical use cases eventually outgrew edge data into a separate table.

Maybe "object relationships" should not be edges. Arguments for moving them to something else:

  • We could store this metadata on a "Relationship" object. Particularly, the useful metadata here is "Commit X has related revision Y because commit X has hash Q, and related revision Y also has a diff with hash Q". This is not obvious from context.
  • The "Relationship" object could enforce T3577, where we want to let revision X have exactly one type of relationship to task Y, i.e. it can't have both a "fixes" and a "reopens" relationship.
  • We could do drive "Fixes" in the commit pipeline using an "uncommitted relationship" edge or similar which would respect web UI edits, rather than parsing magic strings from commit messages.

Arguments against:

  • These relationships sure are a whole lot like edges.

I guess I lean slightly toward maybe moving these to some other datastore, but hopefully not in the scope of this set of changes.

One note here is that HeraldPreCommitContentAdapter has some similar logic, although it's more low-level.

amckinley accepted this revision.May 1 2019, 1:58 PM
amckinley added inline comments.

Move this comment to loadRevisionForCommit?

This revision is now accepted and ready to land.May 1 2019, 1:58 PM

The comment is pretty specific to Herald (where $viewer is omnipotent) and loadRevisionForCommit() has other callers where $viewer is legitimate, so I think moving the comment would reduce clarity on the balance. But let me update it a little bit, since it's less clear now than it was before.

Specifically, the attack this comment is describing is this:

  • You do not have permission to see D123.
  • You want to learn about D123.
  • You write an "if any condition matches" Herald rule using every "Related revision..." field, matching against an impossible value, with "Do Nothing" as an action.
  • You push a commit containing Differential Revision: https://.../D123.
  • You review the transcript for the commit.

This discloses some information about the revision, even though you can not see it. Today, it discloses:

  • The existence of the revision.
  • The accepted state of the revision.
  • The build state of the revision.
  • Packages affected by the revision.
  • Reviewers on the revision.
  • Subscribers on the revision.

This is a nontrivial amount of leaked information, and this is not information which we let you learn in other ways.

For now, we accept that this is reasonable: none of this information is that interesting, and you have to jump through a lot of hoops and leave a lot of evidence to get it.

I think the only "fix" for this issue would be to have a rule like "a commit may only be associated with objects that the user who pushed it can see", but this can't realistically be a default rule since it won't work with observed repositories.

epriestley updated this revision to Diff 48874.May 1 2019, 4:07 PM
  • Clarify the comment.
  • Adjust the cache behavior slightly.