Page MenuHomePhabricator

Threaded comments
Open, Needs TriagePublic


Problem: As a participant in a conversation on a revision in Differential it is difficult for me to gather all the context of one of the many conversations related to the revision.

  • Why? Because I have to search through all the conversations to see which ones apply to the specific conversation I am interested in.
    • Why? Because the individual conversations are not grouped together.

Related to: T11450

Event Timeline

This is a duplicate task, but is open for internal tracking reasons.

eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Aug 9 2016, 5:06 PM

I'd prefer it if another way were found to track internal issues.

Mostly concerned this method (public tasks on our tracker) is not a scalable way for every company that uses Phabricator manage it's interests.

(For now, I'm inclined to let this and T11450 run their courses and then talk about how we can improve meta-issues afterward. I think both of these tasks will serve as good examples of our concerns.)

To start out here, let me just clarify the request: you want threaded top-level comments, versus threaded inline comments? Basically something along the lines of a "Reply" action adjacent to the existing "Quote" action, but which creates formal strong threading instead of a weak indirect reference?

A major part of the goal in asking for root problems is to explore different approaches to an issue. Often, the most obvious approach isn't the best, or a similar approach is about as good but much easier to implement technically, or much easier to maintain, or solves a lot of other problems too. I think this task has sort of cycled back on itself in describing the root problem, since this:

Because the individual conversations are not grouped together. really just restating the feature request ("group individual conversations together in threads").

This doesn't give us much to go on in exploring different ways to solve the problem. Maybe a better exercise is something like this:

Suppose we agree 100% that this feature is desirable, but we've previously signed a contract promising not to develop this specific feature. What are some other features we could develop instead which might also help solve the problem?

To this point in particular:

Because I have to search through all the conversations to see which ones apply to the specific conversation I am interested in.

My expectation is that you always read all of the unread comments when performing a review. The software tries to make this fairly easy: comments are sorted chronologically, so all of the newest comments are in one place, in order. if you previously left a comment, we fold older activity above that last action so the default display is just that last comment (to remind you about the context) and then new things. My personal experience, at least, is that when I arrive on a review page it's easy to find and read all the activity since my last interaction.

I generally expect that the total amount of discussion on a revision is small enough that you can reasonably read it (i.e., in all cases, reading and reviewing the code takes significantly more time than reading the comments).

I also generally expect that the discussion on a revision is fairly focused, and that it's difficult or impossible to really respond in a useful way without reading all of it in most cases. That is, I expect that it is very rare that there are multiple, completely independent lines of discussion about a single change which different reviewers are responding to.

I'd like to better understand how situations arise where there is (a) enough feedback that you can't reasonably read all of it (i.e., the time cost of reading the feedback is on the order of the time cost of reading the actual change) and (b) most of that feedback is sufficiently irrelevant to you don't need to read all of it.

These expectations ("there is less feedback than code", "feedback is relatively cohesive") hold for the overwhelming majority of revisions in my experience.

In cases I can recall where the "total amount of feedback" assumption break down, the review basically got converted into a discussion about some bigger issue which should ideally take place in some other forum but is more convenient to discuss on the review. Offhand, one example of this is D13957.

I think that discussion usually obsoletes the review (e.g. greatly exceeds the scope of the particular change). I think that this is all generally a reasonable workflow and use case (you can mention "see discussion in Dxxx" as easily as "see discussion in Txxx", for example) but that no one is really reviewing the code anymore in almost all cases I've seen once the discussion spirals into a larger topic.

It is harder for me to recall cases of noncohesive feedback. Here are some types of feedback which might be noncohesive, and how I've had success approaching them:

Feedback about style issues: essentially any cosmetic change with no material impact on runtime behavior.

Automatically enforce as much style information as you can in lint or with other automated beautification tools. Write a consistent style guide which applies across your entire organization. Write codebases in a consistent style so that users can mirror it. Review new users' first few changes carefully to help them understand conventions. If a style rule isn't important enough to write enforcement for in lint, it isn't important. Put the onus on reviewers to either write automatic enforcement rules in lint or concede that particular style issues are not important enough to merit discussion.

These approaches aren't easy fixes and require technical investment, but they will almost completely eliminate discussion of style issues. They pay for themselves many times over in the long run.

Examples in Phabricator:

Feedback about API issues: unsafe use of sensitive APIs, incorrect use of APIs in general, etc.

Rewrite your APIs to be safer by default and make unsafe usage obvious. Document proper usage. Use APIs consistently. Limit API discussion on unrelated changes: if linking to the documentation doesn't resolve the issue, treat the problem as a separate problem with the API which needs to be resolved.

Examples in Phabricator:

  • The PhutilSafeHTML system makes XSS difficult to write and easy to detect in almost all cases (requires nonstandard usage of unusual calls).
  • The CSRF system makes writes to any storage engine safe by default.
  • The Query API makes queries safe (apply proper policies) by default so it is relatively difficult to violate policies, and easy to review how policies are being applied and enforced.
  • Internationalization discusses proper use of the pht() API and rationale.

Some API have clever solutions. Offhand, two that I've found useful:

  • PhutilOpaqueEnvelope, combined with APIs which accept and emit these objects, makes manipulating sensitive blobs safe by default.
  • Although we don't fully implement it today, a system like PhabricatorAuthHighSecurityToken where one call returns a token and a later call requires a valid, previously issued token can help make APIs safe by default: it's obvious to the engineer what they need to do, it's easier to do the right thing than do the wrong thing, and it's extremely clear if they work around the issue by artificially constructing a token.

Changes that are too large: Any change which covers multiple different topics and does more than one thing.

Discuss large topics in a different venue before writing code. The cost of deciding what to build and pointing out hazards is usually lower before writing code than in review. Make smaller changes. When users send you changes which are too large, suggest a way to divide the changes and immediately reject them.

There are many techniques for dividing large changes into small changes. Although a large API or storage cutover is occasionally unavoidable, these changes should be exceptionally rare. Structure systems so that gradual change is easy.

Too many cooks: Changes which trigger responses from ~4+ reviewers.

This is likely a result of all of the other stuff. An effective review process should not require numerous reviewers to respond routinely. Make smaller changes which touch less stuff. Move style to lint. Move design/architecture concerns to a different venue: everyone should be mostly on the same page before writing code. Seek feedback about particular concerns earlier. Make APIs harder to misuse. Make deploys safer to reduce the cost of mistakes. Use audit for nonblocking issues. Improve onboarding. Onboard or mentor new engineers until they are making few mistakes. Diffuse knowledge about APIs so fewer reviewers can cover more ground. Discourage use of automatic reviewers and look for automated, technical, or process solutions instead.

A lot of these changes are organizational, process, or infrastructure improvements. They require investment and time, but fixing all of this stuff is possible -- and worthwhile, especially for large organizations. When I joined Facebook in 2007, there was barely a style guide, no lint, unsafe-by-default XSS, unsafe-by-default queries, unsafe-by-default policy checks, ignore-by-default error handling, etc. When I left in 2011, much of this had improved substantially, largely because of the efforts of a relatively small number of engineers (I wrote a style guide, Marcel Laverdet wrote XHP, etc).

I don't know exactly what issues you're seeing, but if they're in these veins I contend that the root problem is not a technical or tooling problem but an organizational, process, or infrastructure problem. There is nothing we can ever do in Phabricator to make 8 reviewers providing feedback on 1,500-line changes streamlined and effective: if the comments take longer to read than the code, I think you have a fundamental process inefficiency which you should examine and remedy.

We could make Phabricator better at dealing with inefficient environments under assumptions of certain types of failure, and it's possible that we should. But I'd like to understand why this is occurring, why fixing the underlying issue isn't possible (or is possible, but will take time, or is possible but too difficult, or whatever else) and structure any response in Phabricator to try to directly attack these problems and drive toward eliminating them, not making them easier to manage.

I am broadly very hesitant to make changes which make Phabricator better for organizations with major opportunities to improve process but worse (harder to use, more complex, more difficult to maintain) for organizations with good attention to process. There are grey areas here, but my current expectation is that any environment where review routinely involves a huge amount of discussion relative to change complexity has big opportunities to improve process, and that making those improvements is far higher-leveraged than papering over them in tooling.

Put another way, we're routinely asked to work around bugs in other software by applying patches to Phabricator, mostly because we're usually pretty responsive and have a short release cycle. For example, T11362 asks us to work around an unusual behavior in some GUI VCS tool, rather than asking that tool to adjust the behavior. Sometimes putting a workaround in Phabricator is reasonable, but I think that in most cases the onus should be on that other software to work correctly. If we took this to an extreme, we'd end up shouldering a huge complexity burden for bugs in other software.

I want to make sure this request isn't working around a set of "bugs" in organizational processes that have much more effective process or infrastructure solutions than tooling solutions. Although we can work around a lot of issues with tooling, many of these workarounds are imperfect and solving the underlying problems would be far more effective overall. Even if we do implement these workarounds, I want to implement them in a way that positions them as workarounds and helps organizations move toward improving processes, not in a way that is indifferent toward or accepting of process which we know can work more smoothly.