Add a "Duplicates" tab to the "Related Objects" section on task page
Closed, Resolved | Public

Description

The root problem is that when dup'ing a number of tasks against a single one, I can't see at a glance all the dups but have to scan the page's long history which is error prone.

Event Timeline

Why is it important to see all the duplicates at a glance?

Maybe a more useful form of this would be some form of being able to view "how all related objects are related to this one". You can currently see related objects but you cannot easily tell the nature of that relationship for some of them. The task graph solves part of this for dependencies but not for other relation types.

As to why it's important, I think it depends on the style of project management being employed. Some places keep track of the dupe count as a way to gauge how often an issue is being seen or how many separate requests for a feature have shown up. A task with a lot of dupes might end up getting higher priority because of the number of dupes.

I'd rather solve that problem with Facts.

Wouldn't that split up important data? A project manager will normally always be looking at the project board and moving tasks around there. Would it be cumbersome to have to go to some other application in order to do a query to figure out the relations on this one object? Not sure how you guys are thinking about Facts but going somewhere else for that info might be too cumbersome to do often. Especially if I'm jumping around a group of tasks in projects already. I would want that information presented there as well.

Maybe I'm not seeing the bigger picture with Facts but other task trackers/managers provide this data on the tasks/boards for this very reason. Facts aside, this seems more of a request to better display the information already in Maniphest not add any additional information.

I was referring to "A task with a lot of dupes might end up getting higher priority because of the number of dupes.". This kind of data should be pulled by facts.

Anyways, I don't know what the root problem of this task is, like @epriestley mentioned above. Knowing the problem this information solves is what we're most interested in. It might be Facts, it might be something else. Basically, what do you do with this information? Why is it important? How often do you need it? What decisions do you make differently with it? Can subscribers be a better proxy for this information because it includes both hard duplicates and people who proactively found and followed the task?

We've taken a few passes at building some sort of "interesting-ness" filter with tasks using different weights and data points, but at the end of the day it didn't change or alter our roadmap. Just because something had high point value didn't mean it was going to have the most impact. YMMV 🚙

Yeah: if it's important to identify duplicates because you use duplicates as a priority signal, making it easier to see a list of duplicates on the task detail isn't a very powerful approach.

For example, if we believed that surfacing duplicate counts for use as a prioritization signal was important/valuable, we might do these sorts of things instead:

  • automatically count duplicates;
  • put a large "35 duplicates" badge in the task header for tasks with duplicates (and/or search results, workboard cards);
  • let you order tasks by duplicate count;
  • expose duplicate counts over the API;
  • let you query for tasks with "at least X duplicates";
  • expose duplicate counts to Herald so you could automatically tag or prioritize tasks once they reached some threshold;
  • ...and so on.

These are all better and more powerful ways to surface duplicate counts as a priority signal than adding them to "Related Objects" is.

We can only make product decisions like this once we understand the root problem we are trying to solve. When you file a task like "put duplicates in related objects", we only have one solution available to us, and we can't change the feature in an intelligent way if we eventually reorganize Maniphest and remove "Related Objects" because we have no record of why we added the feature.

If you file a task describing a desire to make duplicate count more available as a prioritization signal, we can consider a dozen different product changes, pick the ones that make the most sense (given our roadmap, other features, other problems, implementation difficulty, product fit, etc), and evolve the tools we provide to solve the problem as the product changes (for example, with a discussion of how duplicates serve as a priority signal, we can make intelligent choices about what related features to build when we build a new application like Facts).

In this case, to step even further back, we'd want to see discussion of why duplicates are a useful prioritization signal in your environment so we can understand if there's an even bigger issue. My experience here and in other environments is that duplicates mostly point at highly-visible bugs (if something is obviously broken, it tends to get duplicate reports), not necessarily high-priority issues, so it isn't self evident that high duplicate counts are a good priority signal (if it was, we'd probably already surface them).

If duplicate count is a good signal in your environment, perhaps it's because you have unusual practices or workflows which make your duplicates different than our duplicates. If you do, and you've successfully imbued duplicates with a useful prioritization signal, we might want to change how our workflow works. We can't know any of this if you file a "put duplicates in related objects" request.

Our request for root problems is a thoroughly considered request based on many years of developing this software and learning from mistakes with handling user feedback, not a dark plot to waste users' time.

@epriestley I was just jumping on here to provide some simple examples and reasoning from my experience on our internal bug tracker (we don't use Maniphest). Note this isn't my bug. Just trying to help out someone by giving an example of reasons they could want this. I'm sure your response was well intentioned but it does come off a bit user hostile. For instance, nowhere did anyone mention wasting time by having a constructive conversation about this. It gives the impression that you guys consider dealing with the community a waste of your time. And no one is arguing against the root problem concept.

I don't think I've ever experienced anything like this in any other open source communities. It may be tough to handle a large community but you need to accept that you will never ever be in a place to always get perfect bug reports or feature requests. So you should try and encourage a process that allows the community to take a given idea and build it up until it's in a good enough state to be reviewed (maybe look at how projects like Swift are handling this with their discussions and evolution proposals).

Instead this only gives the impression of stifling discussion because people haven't been able to properly express themselves yet. As if it's somehow just wrong in the phabricator community to discuss potential features first and then arrive at a root problem through discussion.

Please don't take this comment in a hostile way. I am merely giving my impressions so far just observing the community and getting involved minimally. I would really like to see this community improve.

Why is it important to see all the duplicates at a glance?

Real-life case: we start having a bunch of bug reports that ultimately are consequences of the same underlying problem:

  • friend requests have this bug in this part of the UI
  • friend requests have this bug in this other part of the UI
  • and on, and on

Some of these bug reports may be perfect dups but that's rare.

At some point, it's starting to be a lot of noise in the project, so we dup all these reports against the task tracking the root cause.

Finally, when someone tackles the root cause, it's quite convenient to see the list of dups in one place, for instance to do the test plan!

From a technical standpoint, we did not record these relationships in a normalized way prior to D16196 (June 2016, about 7 months ago) -- they only existed in the transaction log, and we just wrote a "close", extra_data="as duplicate of X" transaction that can't be queried efficiently.

After D16196 and related work connected to T4788, we write edges on merges, so if we shipped this feature today it should be accurate for the last ~7 months without doing a migration. That's probably good enough that we don't necessarily need to migrate, although it's also not really that far in the past, so it's potentially misleading if we do this without migrating (we may show only some of the duplicates).
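
To illustrate why a normalized edge is easier to work with than a record buried in the transaction log, here is a minimal sketch using an in-memory SQLite table. The `edge` table layout, the `EDGE_TYPE_MERGED_INTO` constant, and the PHIDs are all assumptions made up for illustration, not Phabricator's actual schema or values.

```python
import sqlite3

# Hypothetical edge type constant; the real value is not given in this thread.
EDGE_TYPE_MERGED_INTO = 1


def duplicates_of(conn, task_phid):
    """List the tasks merged into task_phid with a single query."""
    rows = conn.execute(
        "SELECT src FROM edge WHERE dst = ? AND type = ? ORDER BY src",
        (task_phid, EDGE_TYPE_MERGED_INTO),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
# Simplified, assumed layout: one row per (source, type, destination) edge.
conn.execute(
    "CREATE TABLE edge (src TEXT, type INTEGER, dst TEXT, "
    "PRIMARY KEY (src, type, dst))"
)
conn.execute("CREATE INDEX edge_dst ON edge (dst, type)")
conn.executemany(
    "INSERT INTO edge (src, type, dst) VALUES (?, ?, ?)",
    [
        ("PHID-TASK-dup-a", EDGE_TYPE_MERGED_INTO, "PHID-TASK-root"),
        ("PHID-TASK-dup-b", EDGE_TYPE_MERGED_INTO, "PHID-TASK-root"),
        ("PHID-TASK-dup-c", EDGE_TYPE_MERGED_INTO, "PHID-TASK-other"),
    ],
)

print(duplicates_of(conn, "PHID-TASK-root"))
```

With only the transaction log, the same question would mean scanning every task's history for "closed as duplicate of X" records, which is why merges from before the edge existed need separate handling.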


A possible workflow change might be to use "Edit Parent Tasks" instead of "Merge", to collect separate bugs under a single parent task but not actually close them. You can close them later when you fix the issue, or by copying the specific issue to the parent task's description and integrating it into the rest of the information in a cohesive way. Here in the upstream, I think I tend to use three different strategies:

  • If the subtask has no unique information, just merge.
  • If the subtask has a little bit of unique information but basically isn't a separate task, merge and then add a comment summarizing the interesting bit of the merged task (Txxx is similar, but asked for a way to...), or update the description to discuss the case from the subtask.
  • If the subtask is related but kind of its own thing, or I'm not sure or don't want to deal with it, make it a subtask.

Two product changes I could imagine might be:

  • Let you add a comment to the surviving task at the same time you perform a merge.
  • Have some kind of automatic way to copy one task's description into the other task at merge time (e.g., add a "From Txxx: blah blah" at the bottom), or give you some kind of "description merge" interface, I suppose.

Offhand, I don't particularly like either of these. At least personally, I don't find separate comment / merge operations onerous. I rarely want to merge descriptions in any sense, so it's hard to imagine anything in that vein feeling very useful.


Upshot:

  • Are there particular reasons that using "Edit Parent Tasks" instead of "Merge" to accomplish this (grouping/relating similar tasks) doesn't feel as good as merging?

At some point, it's starting to be a lot of noise in the project, so we dup all these reports against the task tracking the root cause.

For example, if this noise is mostly something like "lots of extra cards on the workboard that aren't really a good reflection of the state of the world", could you use the "Open Parents: Show Only Tasks Without Open Parents" filter to hide these subtasks and cut through the noise? (This probably won't work well if you sometimes use subtasks to legitimately break up large blocks of work, though, since they'd also be hidden.)

A possible workflow change might be to use "Edit Parent Tasks" instead of "Merge", to collect separate bugs under a single parent task but not actually close them.

It would keep the noise in the projects since all these tasks remain open. I'd rather dup.

Here in the upstream, I think I tend to use three different strategies:

I've noticed your approach indeed. It's certainly do-able but 1) it's somewhat more maintenance and 2) you still have information spread out in the task's page (some of it hidden since the page only shows recent updates by default on top of that), so we're sort of back to square one.

we write edges on merges, so if we shipped this feature today it should be accurate for the last ~7 months without doing a migration.

No idea if realistic, but what if this "feature" is only enabled on installs of less than 6 months (or with the right db schema or something)?

This probably won't work well if you sometimes use subtasks to legitimately break up large blocks of work, though, since they'd also be hidden

Yes, it wouldn't work for us for that very reason.

BTW I know it's not a justification to implement this feature (not the root issue), but it does feel quite strange that parent / subtasks have their convenient all-in-one display at the top of the page, commit objects have it, Pholio objects have it, etc... and even mentioned tasks (!), but not duplicate tasks. That I do find most puzzling from a UI perspective.

No idea if realistic, but what if this "feature" is only enabled on installs of less than 6 months (or with the right db schema or something)?

please no I am so young and have so much to live for


I generally think this feature is reasonable, and I added the edge in D16196 roughly anticipating that we might build something along these lines (I'd also like to make the other side -- "This task was closed as a duplicate of X" -- more prominent on the page of the nuked task now that we have the edge). I just wanted to check that there wasn't some easier/different approach available and that we weren't attacking some different/weird problem (like the "prioritize by counting duplicates" case, which also got a mention in T9390; I think this is a fairly poor attack on that use case).

I do think this workflow is a little bit lazy or maybe un-thorough -- ideally, you probably "should" update the parent to completely reflect the information in the subtask when you close it -- but in practice I think it's a perfectly reasonable shortcut and not an inherently flawed perilous awful mess that leads to inevitable ruin or anything.

But I think we really pretty much have to migrate if we ship it today, and that's always a bit of a pain. Not a huge deal, just that the work will break down as like 5% UI and 95% migration.

For completeness, there's some prior discussion in T9390, T8345 (particularly T8345#199577), and a mention in D16196. This is probably more or less a duplicate of T8345 now, practically, since the other changes there seem to be well-received and not need further iteration. I couldn't immediately turn up the task I referred to there about the other side of this (showing "this was merged into X" more clearly) so maybe I made that up or it came to me in a dream or it already got merged somewhere else or something.

I expect to:

  • Implement this more or less as described here and in T8345.
  • Write a migration to at least try to populate this data correctly for merges from before June 2016.
  • See if I can sneak some kind of "This was closed as a duplicate of X." UI past @chad on closed-as-duplicate tasks to make the other side of the merge more clear (I think this is already more clear than it used to be, since we have a colored "Closed, Duplicate" badge now, but want to try strengthening it slightly).
  • Resolve this and T8345 after that stuff ships.

please no I am so young and have so much to live for

😆

I do think this workflow is a little bit lazy or maybe un-thorough -- ideally, you probably "should" update the parent to completely reflect the information in the subtask when you close it

I certainly agree, it would be good practice. If I were to do that though, I'd put this information in the task description so it can be found at a glance vs scanning the page and having some of it hidden.

From a technical standpoint, we did not record these relationships in a normalized way prior to D16196 (June 2016, about 7 months ago) -- they only existed in the transaction log, and we just wrote a "close", extra_data="as duplicate of X" transaction that can't be queried efficiently.

Would anyone be willing to provide an example of these transaction rows so I can start working on the migration?

Here's an example of the old style:

https://secure.phabricator.com/T1356#13162

Here are the rows for it (you can fish around on secure001 to turn up more):

*************************** 8. row ***************************
             id: 13162
           phid: PHID-XACT-TASK-mw65eg7tltdptxa
     authorPHID: PHID-USER-ba8aeea1b3fe2853d6bb
     objectPHID: PHID-TASK-gajqopcpybe6xxsl4j3q
     viewPolicy: public
     editPolicy: PHID-USER-ba8aeea1b3fe2853d6bb
    commentPHID: 
 commentVersion: 0
transactionType: status
       oldValue: "open"
       newValue: "duplicate"
  contentSource: 
       metadata: []
    dateCreated: 1339621619
   dateModified: 1339621636
*************************** 9. row ***************************
             id: 13163
           phid: PHID-XACT-TASK-mh5hgcujwvldsjk
     authorPHID: PHID-USER-ba8aeea1b3fe2853d6bb
     objectPHID: PHID-TASK-gajqopcpybe6xxsl4j3q
     viewPolicy: public
     editPolicy: PHID-USER-ba8aeea1b3fe2853d6bb
    commentPHID: PHID-XCMT-TASK-kwvc2gvsw2tqwb5
 commentVersion: 1
transactionType: core:comment
       oldValue: "0"
       newValue: 4
  contentSource: 
       metadata: []
    dateCreated: 1339621619
   dateModified: 1339621636

And the comment:

mysql> select * from maniphest_transaction_comment where phid = 'PHID-XCMT-TASK-kwvc2gvsw2tqwb5'\G
*************************** 1. row ***************************
             id: 3266
           phid: PHID-XCMT-TASK-kwvc2gvsw2tqwb5
transactionPHID: PHID-XACT-TASK-mh5hgcujwvldsjk
     authorPHID: PHID-USER-ba8aeea1b3fe2853d6bb
     viewPolicy: public
     editPolicy: PHID-USER-ba8aeea1b3fe2853d6bb
 commentVersion: 1
        content: ✘ Merged into T1216.
  contentSource: 
      isDeleted: 0
    dateCreated: 1339621619
   dateModified: 1339621636
1 row in set (0.00 sec)

So I think the logic will go something like this:

  • For each task transaction, if it's a "status" transaction with newValue "duplicate"...
  • ...and the next transaction is a comment on the same task by the same author (maybe within a few seconds)...
  • ...and the text of the comment matches "Merged into <another task>"...
  • ...try adding an edge between the tasks.
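
The steps above could be sketched roughly as follows. This is only an illustration working on rows that have already been fetched into dictionaries: the `find_merge_edges` name, the 5-second window, and the exact comment pattern are assumptions, and the actual edge write is left out.

```python
import json
import re

# Assumed comment format, based on the single example row above.
MERGED_INTO = re.compile(r"Merged into T(\d+)")
# Assumed reading of "within a few seconds".
MAX_GAP_SECONDS = 5


def find_merge_edges(transactions, comments):
    """Recover (duplicate task PHID, surviving task ID) pairs from old rows.

    transactions: dicts keyed by the column names shown above, sorted by id.
    comments: dict mapping a commentPHID to that comment's content.
    """
    edges = []
    for status_xact, next_xact in zip(transactions, transactions[1:]):
        # A "status" transaction with newValue "duplicate" (stored as JSON)...
        if status_xact["transactionType"] != "status":
            continue
        if json.loads(status_xact["newValue"]) != "duplicate":
            continue
        # ...followed by a comment on the same task by the same author...
        if next_xact["transactionType"] != "core:comment":
            continue
        if next_xact["objectPHID"] != status_xact["objectPHID"]:
            continue
        if next_xact["authorPHID"] != status_xact["authorPHID"]:
            continue
        gap = abs(next_xact["dateCreated"] - status_xact["dateCreated"])
        if gap > MAX_GAP_SECONDS:
            continue
        # ...whose text matches "Merged into <another task>"...
        content = comments.get(next_xact["commentPHID"], "")
        match = MERGED_INTO.search(content)
        if match is None:
            continue
        # ...yields a candidate edge between the two tasks.
        edges.append((status_xact["objectPHID"], int(match.group(1))))
    return edges
```

Fed the two example rows and the comment shown above, this would pair `PHID-TASK-gajqopcpybe6xxsl4j3q` with task 1216. Anything that fails a check is skipped rather than guessed at, so the result may still miss some older duplicates.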

This is now deployed here, to secure.

See T196 for an example of it in action.