Page MenuHomePhabricator

Plans: 2018 Week 17/18 Bonus Content
Closed, ResolvedPublic

Description

These are discussed here, but didn't make it into either release:

See PHI592. A diff with 600MB of videos required more than two hours to turn into an email patch, which exceeded the task lease duration and cascaded into various other kinds of problems. This is somewhat adjacent to T11767.

See PHI580. An install would like write API access to object relationships (depends on, fixes, etc). This is complicated, but perhaps plannable, and we may be able to put a reasonable facade with partial support into the upstream.


Completed

See PHI617. This appears to report a trivial bug in the new Harbormaster log UI, where clicking the chevron icon has different behavior from clicking the text ("Show More Above") next to the chevron icon. The chevron behavior is broken.

See PHI619. This appears to report a trivial bug where executing a Herald test console run when an active rule uses a "Content Source" field fails because test runs from the console don't have a content source.

See PHI615. This reports a bug where "Lease Working Copy" build steps incorrectly report "Built Instantly" even when they didn't. This is probably not harder than it looks.

See PHI483. The available actions from "Changes Planned + Draft" (and maybe "Abandoned + Draft") are currently too permissive. Fewer actions should be available; particularly, "Accept" should not be available and the actions should generally be more similar to the actions available from "Draft".

See PHI285. An install would like a way to monitor changes to Herald rules. This mostly needs planning. The actual implementation may not be difficult, it's just not clear if "Herald rules for Herald rules" are the best approach.

See PHI285. Mail about Herald rules does not actually include a link to the rule.

See PHI590. See T13128. The policy tag in the header of Phriction documents could be more useful than it currently is. This is potentially the tip of a nest of vipers, but improving the Phriction behavior narrowly would be a good first step.

See PHI598. See T13110. An install hunted down some specific performance issues with very large changesets in Differential. At least some of the big ones are likely trivial to get rid of.

See https://discourse.phabricator-community.org/t/unable-to-create-owners-package-with-same-path-in-multiple-repositories/1400. I was able to reproduce this.

See T13132, an issue with arc download and CSP changes.

See T13135. See PHI633. An install encountered an issue with playing certain "video/quicktime" videos in Chrome.

See PHI604. See also T13126. An install is hitting the "large blame does not degrade gracefully" issue mentioned in that task, and would also like a way to selectively disable blame (which I was leaning toward implementing but don't think got written down anywhere).

Internally, it would be nice to better document the Phabricator query layer. (Ongoing; see T13133.)

See PHI251. This requests an "Ignore generated changes." option for Owners. There may not be a direct path to this today, but this is almost certainly a plannable feature. This is a bit rough.

Revisions and Commits

rARC Arcanist
D19408
D19407
rP Phabricator
D19428
D19427
D19426
D19425
D19420
D19418
D19416
D19414
D19413
D19412
D19411
D19410
D19403
D19400
D19398

Related Objects

Mentioned In
T13189: Plans: 2018 Week 35 Bonus Content
T13187: Plans: 2018 Week 34 Bonus Content
T13164: Plans: 2018 Week 31 - 33 Bonus Content
T784: Allow Differential changesets to be marked with various attributes
T13151: Plans: 2018 Week 23 - Week 30 Bonus Content
T13137: Plans: 2018 Week 19 - Week 22 Bonus Content
T2312: Modifying PHP
T13128: Phriction document header doesn't render policy strength variations
Mentioned Here
T13137: Plans: 2018 Week 19 - Week 22 Bonus Content
T13133: Phabricator Query Layer Overview
T13135: Chrome does not play some videos with type "video/quicktime", but does play them with type "video/mp4"
T13132: `arc download` broken for non-CDN Phabricator
T13110: Plans: Hefty Differential revisions, draft state transitions, and exotic interactions
T13126: Plans: Diffusion cleanup for document engine
T13128: Phriction document header doesn't render policy strength variations
T13065: Move storage for `mailKey` to the Mail application
T8644: Herald template / recipe book / examples / quick-create tool
T8661: Unclear distinction between Herald rule and Owners package
T13056: Remove "data" from edge infrastructure
D19372: Add a rough "!history" email command to get an entire object history via email
D19019: When a change removes recipients from an object, send them one last email
T6994: Write a general "Security guidelines" document
T784: Allow Differential changesets to be marked with various attributes
T11767: Unify email patch byte/line/time limit behaviors across Differential/Diffusion

Event Timeline

epriestley triaged this task as Normal priority.Apr 19 2018, 9:45 AM
epriestley created this task.

See PHI251. This requests an "Ignore generated changes." option for Owners.

Historically, an issue with this was my hesitance to add more "Add Owner As Reviewer If Moon is Full" builtin rules to Owners: Herald already allows you to implement arbitrary rules with Owners, and an implementation of Herald where every possible rule is a dropdown option obviously isn't very good.

However, we shipped the "Non-Owner Author" stuff and the sky didn't fall, and there's some amount of room to give Owners more default behavior -- I just want to make sure there's some basic barrier of reasonableness that rules need to satisfy to become default behaviors.

I think this meets that barrier: it's easy to understand, it's a generally reasonable behavior, and it's not a behavior which can be written with Herald.

A point against it is that you can encode this behavior with "Exclude" rules: this doesn't do anything you can't already do, and there's already an API for doing it. However, there are reasonable cases where enumerating and synchronizing all generated files is difficult, and if we imagine a future which handles "Generated" tags and other classes of tags more cleanly this would quickly start to become a DRY issue.


The future of "@generated" is expanded support for tagging files with metadata, described in T784. Although use cases for other types of categories ("Third Party", "folded/less-important but not actually generated") are currently somewhat weak, I think this class of use cases is broadly a good one and that some other problems may be solved by such a system. One example is PHI582, where files with more than 4MB of text changes currently get marked as "binary". Although we need to degrade these changes out of normal text diff handling at some threshold, marking them as "binary" is misleading and a barely-satisfactory solution to the problem. A "huge text file" mark using the same system might be a much better approach.

PHI64 also discusses supporting overrides for "encoding" and "highlight", which I think are reasonable.

I think this just means that [ ] Ignore generated changes. may become Ignore changes with attributes: [ generated ] in the future. Possibly we just build this as the UI today, especially to support the API.


Another issue is that this rule allows attackers to skip triggering review/audit by marking a file as @generated. For now, we can probably just accept this: if you're concerned about this attack, enumerate @generated files with "Exclude" rules instead.

In the future, we could provide an option to let repositories disable client tags. This would stop any runtime-detected attributes from working (for example, arc could no longer mark a file as @generated based purely on the file content) and apply only server-side rules instead. To change the rules, you'd need to either edit the repository or commit to the repository, depending on how this is actually implemented. Under this scheme, you'd make a tradeoff to make categorizing files more difficult in order to prevent attackers from abusing file categories from the client.


A practical technical issue is that files are currently marked @generated only at render time, so Owners evaluation can not cheaply exclude generated files: to know if a file is generated, you must load all the data and look for @generated today.

For Differential, I think this can move to a diff step easily enough. A cost is that dirtying this cache will be difficult/impossible, e.g. after a change to differential.generated-paths. I expect this to be less important in the future, and that it's reasonable to say "changes to this option are not retroactive", just like you would not expect editing some .arc-generated-rules file in a repository to be retroactive.

There is no convenient place to cache attributes in Diffusion today, but it's probably reasonable not to cache this for now.

I think we can store this on the existing changeset metadata property without a schema change.


So I think this looks like:

  • Add an "attribute extraction" phase when building a Diff. This can probably go immediately above the existing copied code detection, which is fundamentally similar.
  • Extract @generated here.
  • Store it in metadata. Today, distinguish between sources (trusted vs untrusted).
  • Provide an accessor.
  • Add an Ignore changes with attributes: [ generated ] control to Owners.
  • Implement the actual ignore logic in a proper, general way.
  • Wrestle this into approximately similar shape in Diffusion, and/or call the option "Ignore revision changes with attributes: ..." for now.

This should extend gracefully in to T784, PHI582, etc.

See PHI285. An install would like a way to monitor changes to Herald rules.

One option is to write "Herald Rules for Herald Rules". This is probably not very difficult technically, but it seems excessively clever.

One option is to write something like an Application-level "watchlist", where you "Subscribe" to an application and automatically get notified about all changes. For example, you could Subscribe to Phame or Phriction to easily get all changes to blogs or wikis without having to get involved with Herald. This is nicely general, and seems like it could be useful, especially for smaller installs. A problem is that it's not very flexible -- and the original request in PHI285 is for a way to monitor global rules, not all rules. It also only "seems" like it could be useful: I'm not really aware of any particular problem that this is actually a good tool for attacking today.

So I'm inclined to actually just implement "Herald Rules for Herald Rules". Rules themselves aren't special, and this feature isn't re-entrant or conceptually difficult. The only problem is that if you use a Herald rule like "Send me an email any time a global rule is edited.", and someone edits that rule, the version of the rule which applies will be the post-edit version, not the pre-edit version. Thus, an attacker can disable this watchrule by editing it away first, then making the other changes they want to make.

After D19019, a reasonable way around this is to also subscribe to the rule itself. So a complete approach would be to write a rule like this:

When:
[ Rule type ] [ is ] [ Global ]
Send an email to:
[ Watchful Eye ]

Then subscribe "Watchful Eye" to the rule itself.

This isn't necessarily obvious/intuitive, but seems easy to explain if we have a couple of paragraphs, and basically reasonable and like something we can make obvious enough through documentation, guidance (e.g. T6994), and maybe a UI callout with a reminder hint: users will rarely be in this interface, so it's not exceptionally burdensome to encumber it with some hint text ("If you are viewing a global Herald rule for Herald rules that you aren't subscribed to, show a reminder that edits to the rule can defuse it.").

We can also probably back away from this cheaply if it turns out to be a bad idea or some other system later surpasses it: adding Herald Rules for Herald Rules doesn't create any sort of complicated technical or product burden.

I'm inclined to just pursue this for now: even though it feels too clever by half and has the rule-defusal gotcha, I think it may reasonably be the least-bad solution, and it isn't too much of a commitment.

See PHI592. A diff with 600MB of videos required more than two hours to turn into an email patch, which exceeded the task lease duration and cascaded into various other kinds of problems.

This runs into a chain of related issues, plus more in T11767.

A 600MB patch should not take 3 hours to encode. PHP is a toy language for babies but it's not that ridiculous. We should reproduce this and fix whatever the problem is (buffer appends or otherwise), or clearly document why we actually can't fix it.

Doing that might be good enough on its own practically, since this problem is rare to start with.

After encoding the patch, we immediately throw it away. The byte limit should be passed down to the patch generator and it should abort early if it detects the patch will exceed the limit.

Arguably, we should not include binary changes in these patches at all. This seems especially debatable for patches generated for inline-patches, which I believe are mostly consumed by humans, not applied on the CLI. It seems less clear for attach-patches, which feel like they should make a more authentic effort to actually apply the entire change. But if we pushed byte limiting down into the generator this would just be a refinement, to let us include some amount of the most useful patch information in email more often.

Conceptually, I'm broadly not excited about either of these configuration options (inline patches or attach patches). You can always click once to see the changes, and click twice to get a patch. GitHub doesn't support either an "inline" nor an "attach" option and I've never seen anyone complain about it or ask for support, nor laud Phabricator for its support of these features. I can't find a corresponding issue anywhere in isaacs/github either. I don't hate these enough to try to remove them, but it does make it hard to get excited about really pushing through and solving all the issues here and in T11767. This is perhaps somewhat adjacent to the "subversive options to help engineers get around VPNs" discussion from D19372, although I think these options are not primarily VPN subversion options.

See PHI580. An install would like write API access to object relationships (depends on, fixes, etc). This is complicated, but perhaps plannable, and we may be able to put a reasonable facade with partial support into the upstream.

This is difficult because different object types can have several different types of relationships. The most complicated relationship is between revisions/tasks/commits, where <Verbs> Txxx as <status> (e.g., Closes T123 as wontfix) in the text of a revision causes the corresponding commit to close with the specified status.

This relationship is not fully represented in edges today, and is not representable with edges after T13056, since an install can define an arbitrarily large number of task states and each edge can only represent one type of relationship. I don't think this use case is compelling enough to motivate retaining edge data so I'd ideally like to rule that out as an approach.

A wrench in the gears is that these relationships can already be read as edges with edge.search. When a revision Fixes T123, we write a "Revision is related to task" edge, and then separately process the exact terms of that relationship ("...as <status>") when parsing the corresponding commit.

Some of these relationships are also (possibly?) related to one another in more complex ways. For example, X reverts Y and X resurrects Y accept either a revision or a commit as "Y", and should probably treat either as "the group of revisions and commits associated with the specified commit or revision", i.e. "reverting" a commit should also "revert" the corresponding revision. Maybe. Perhaps a better example is that X depends on Y and Y reverts X is nonsensical? (Although probably not particularly harmful.)

In an ideal world, <verbs> <object> as <status> should have a complete database representation and be readable/writable, but this probably implies a very-nearly-like-edges table for every object which supports relationships, and like any flavor of it is pretty much walking back the plans in T13056, which feels bad. On the other hand, moving away from edges breaks edge.search and also feels bad, since this works well in most cases.

Perhaps we could also just abandon this feature, and say that there are two types of relationships only ("Ref" and "Closes/Fixes"). I sort of like this? I suspect Closes Txxx as wontfix is virtually unused. The full "...as <status>" is also challenging to internationalize.

I'll think about this some more, I still don't have a clear pathway forward here.

See PHI251. This requests an "Ignore generated changes." option for Owners.

Historically, an issue with this was my hesitance to add more "Add Owner As Reviewer If Moon is Full" builtin rules to Owners: Herald already allows you to implement arbitrary rules with Owners, and an implementation of Herald where every possible rule is a dropdown option obviously isn't very good.

This goes back to T8661/T8644 - Owners is more or less "Simplified, Limited, Herald."
Maybe it should officially be that: A GUI for building complicated Herald rules:

  • Each enabled Owners Rule (Option?) becomes a real Herald Rule
  • This Herald rule has an Edit policy of "The Owners object" (Effectively "Nobody")
  • The UI says "This rule is managed via Owners Package 23: Kernel Psychosis"
  • Changes to the package are automagically reflected to the Herald rules

Offhand, two possible issues with that:

  • At least one install has an enormous number of packages (tens of thousands? Hundreds of thousands? Seventy-two billion?) and the minimum cost for a package is much, much smaller than the minimum cost of a Herald rule. If packages actually created Herald rules this install would probably explode.
  • [ ] Ignore generated files. can't actually be expressed as a Herald rule. This is surmountable, but we'd have to add some field like "Affected packages, ignoring generated files". Maybe doable but doesn't feel especially clean to me, especially if this is later followed by "Ignore files with attributes: [ third-party ]" or similar, too.

This would bridge the two sets of capabilities a little more cleanly from a UX perspective, but I think the leap between the two isn't causing too many problems today. Some recent-ish iteration of the documentation also made it much more clear that Herald is how to evolve out of Owners, e.g.:

The intent of this feature is to make it easy to configure simple, reasonable behaviors. If you want more tailored or specific triggers, you can write more powerful rules by using Herald.

One option is to write "Herald Rules for Herald Rules".

Before these can send mail, they need a mailKey. I think I'm going to do T13065 first, so they don't need a mailKey, since I don't want them to need a mailKey.

Offhand, two possible issues with that:

  • At least one install has an enormous number of packages (tens of thousands? Hundreds of thousands? Seventy-two billion?) and the minimum cost for a package is much, much smaller than the minimum cost of a Herald rule. If packages actually created Herald rules this install would probably explode.

13,636.

A 600MB patch should not take 3 hours to encode. We should reproduce this and fix whatever the problem is (buffer appends or otherwise), or clearly document why we actually can't fix it.

I can't immediately reproduce this, so I may have been making bad assumptions about some of the behavior here.

Locally, a 10MB patch takes 13.5 seconds to encode. A 100MB patch takes 130 seconds to encode -- so this is pretty much an exact match for linear cost.

These aren't fast by any means, but a projected ~13 minutes to encode a 600MB patch is way more reasonable than 3+ hours.

I'm going to see if I can improve this anyway, but this leaves the general question of the 3-hour hang sort of up in the air.

epriestley renamed this task from Plans: 2018 Week 17 Bonus Content to Plans: 2018 Week 17/18 Bonus Content.Apr 27 2018, 11:42 PM

An install is hitting the "large blame does not degrade gracefully" issue mentioned in that task

I'm not actually able to reproduce this -- it seems like blame is timing out and degrading properly for me locally. It could theoretically be better, but the current behavior seems reasonable. I'll follow up with the reporting install.

Some of this moved forward to T13137.