Page MenuHomePhabricator

Support bulk transmission of notification frames in internal Aphlict protocol
Open, NormalPublic

Description

It's common for a user action to emit several notifications in quick succession. For instance, pushing a commit will often trigger a notification for the commit, a notification for the revision where it was reviewed, and notifications for one or more tasks which the commit closes. These events happen in rapid succession.

Currently, we'll make a request to the server for each event. This is inefficient, as we pay the per-page startup cost every time. By doing more batching in the pipeline, we can reduce the cost of multiple simultaneous notifications, which are usually responsible for the peak loads the system currently experiences.

Some steps might include:

  1. Allow the endpoint to take a list of IDs.
  2. In the client, when receiving lower-urgency notifications (not chat, not objects on the current page), buffer them (for, say, one second) instead of requesting them immediately. After 1 second, send a bulk request for any notifications we've learned about in the meantime.
  3. Change the Aphlict protocol to support multiple notifications in a single message.
  4. Change the server to batch the messages it sends (i.e., make only one request at the end of a task or page).

Doing (1) and (2) is likely to have the greatest impact. It might be nice to do (3) and (4) eventually.

Event Timeline

epriestley raised the priority of this task from to Normal.
epriestley updated the task description. (Show Details)
epriestley added projects: Conpherence, Aphlict.
epriestley added a subscriber: epriestley.

This means that users will also less bugmail, right? Our advanced users are noticing a big increase of email notifications compared to Bugzilla, with a detrimental effect. Maybe we want to wait more than a second in order to merge sequential user actions. There is a related request in Wikimedia Phabricator: Email notifications should bundle events as the web interface does

No, this refers to real-time notifications only. Users will see no difference in behavior (except an imperceptible delay in some lower-priority real-time notifications). This only improves scalability.

We do not currently plan to ever batch email.

No, this refers to real-time notifications only. Users will see no difference in behavior (except an imperceptible delay in some lower-priority real-time notifications). This only improves scalability.

We do not currently plan to ever batch email.

For product or technical reasons? One of our most common internal complaints is 'phabricator sends too much email!!!'. And yeah we should be better about education, and there are a bunch of fine grained settings, and mail filters are easy to setup in most clients and take 5 minutes, and there is probably another ticket for cleaning up what can be done from 'edit task' vs the comment box... but if I look at my recent emails from phabricator.corp something like 'batch emails for the same object within ~30 seconds' would cut the number in half. Getting three emails in the same minute for a changed columnraised prioritycomment cycle does not feel terrible useful or delightful.

Product reasons normally, although technical reasons if metamta.one-mail-per-recipient is enabled (we can't batch mail which must be delivered to multiple distinct recipients).

In general, if we batch actions within a period of time, we must always delay the first notification about the actions by the batch duration.

If immediacy is not very important, can you just turn off email and use in-app notifications? Why would one email delayed by several minutes about all three changes be better than reviewing them at your leisure from the web UI?

Broadly, I have a hard time solving this problem because I don't really understand the problem. Users often complain about "too much email", but I don't think this is actually the problem anyone experiences (since you can turn it off in a few seconds, and complaints about not being able to selectively filter email are rare).

At Facebook, I checked the data and saw that I had ~11x the daily email volume of the most vocal user complaining about getting "too much email", and I've never had any difficulty managing my mail (I have spent perhaps 10 minutes writing mail rules in my life). I fundamentally do not understand how I can handle 10x the mail volume of another user with zero problems, while they are overwhelmed.

It might be helpful to ask users to be more specific. The problem presumably isn't that all mail is bad (it's easy to disable?), or that it's hard to filter the good stuff out (settings + mail rules make it pretty easy?). What mail are they receiving that they don't want to receive? Why aren't they able to fix it? Batching is a possible approach, but I haven't seen much discussion about it, all considered.

I could also write a command to let you see how much mail Phabricator is sending people. (If you learned that you receive 10x the mail volume that your most agitated user do, would that change your thinking about this issue?)

In this vein:

  • T5791 will let you write default mail rules for users (and let users write rules for themselves), and provide more powerful filtering options. This primarily addresses Gmail's lack of header-based filtering and deals with narrow use cases related to wanting, e.g., mail about repository X but not about repository Y.
  • T4103 will let you set default mail preferences for users.

Purely from an administrator/support point of view, maybe we'll make all (non-password-reset) mail default to off for all users after T5791/T4103, and require users or administrators to opt into the kinds of actions they want to send mail about. I think this is a worse default (and maybe much worse), but it would presumably reduce the support cost of these settings for us as the upstream, and reduce administrator frustration with users complaining about this stuff.

But my experience is basically that users complain about getting too much email no matter what, so I suspect we'd need to do something that extreme to really lay this problem to rest.

Put another way, the one case of this which I've conclusively pinned on a root cause was "user is wildly sensitive about email and also would rather complain about it over and over again than spend much less energy fixing the problem", and the fix was "look up volume statistics, roll eyes out of sockets".

If that's often the root cause, there's probably no technical fix, and disabling mail by default seems reasonable. There's no real technical or product reason we need to send mail by default now that in-app notifications are fairly sophisticated, it just seems kind of bad not to email people when they're asked to review code since the author may be partly blocked for a long time.

@chad, do you have any particular objections to disabling all mail (other than welcome / password reset mail, which we obviously need to send) by default-default after we give administrators tools to selectively adjust the defaults in T5791 / T4103? It feels a little extreme to me, but I think we're going to be dealing with this forever if we don't do it.

The only exception I can come up with is that "Mailing List" users probably need to have mail enabled by default. This is a little tricky, but probably manageable.

Another interesting possible solution would looking at T4316, T6027, and M1430. If the specific issue if a user has to go to 5 places to do a basic edit/comment, reducing that into 1 action, 1 email benefits everyone who uses Phabricator.

Yeah -- we can pursue that stuff, but I'm strongly suspicious that user complaints about mail volume aren't meaningfully correlated to actual mail volume and that there is no level we can drive it down to which will meaningfully reduce user frustration.

Yeah, I'd group the problems into two spaces. The "email storm", when one action triggers multiple emails (closed a revisions, which closes a task, which unblocks other tasks). Storms also occur when a users wants to take multiple actions on an object, but needs to use multiple interfaces (columns, comments, priority, etc).

Then there is just "Phabricator sends emails for everything" which is more T5791/T4103. I think some some sort of NUX for new users might be helpful here too.

Some perspectives from what my users complain about to me w.r.t. mail:

  • Being subscribed to both parts of a an action that affects multiple objects causes duplicate notifications
    • Notification for task and subtask when subtask is closed
    • Notification for task and diff when diff is attached
    • Notification for task and commit when commit is attached
  • Some things there's no way to do in one action
    • Workboard + assign + comment
  • Some email/notification settings aren't fine-tuned enough
    • Don't really care about all subscriber changes; only care about removals (and even then probably only care about myself being removed)
    • Same with reviewers
  • Conpherence settings are in a different location than all other email settings (/settings/panel/conpherence/ vs /settings/panel/emailpreferences/).
    • Also unclear if the dropdown at top of main email preferences page (self actions, etc.) apply to conpherence
  • Things which should have been one action accidentally done as two in quick succession
    • Comment on a task, realize right after that you meant to claim it too
    • Edit -> notice typo immediately after -> fix
    • Even a 30 second batch would alleviate most of those

Also, to address a particular point above:

If immediacy is not very important, can you just turn off email and use in-app notifications? Why would one email delayed by several minutes about all three changes be better than reviewing them at your leisure from the web UI?

A lot of my users don't live on phabricator. They only visit it if they get a poke (email) telling them to.

There are maybe a few items of interest for them per day, but there tend to be a lot of notifications about the item in quick succession. Not sure what the ideal behavior really is there though

I agree that too much email is often a mostly cultural problem. I'm personally only moderately annoyed by email volume, and usually only in cases where a bunch of things happened at around the same time. But I also have 30k unread emails sorted from mailing lists to read on train trips and I suspect most would view that as a bizarre edge case. Batching felt like it might be a salve of sorts for legitimately annoying cases, but I'm not particularly attached to it. Anecdotally some users have reported things along the lines of 'I have used a bunch of other systems and phabricator is the only one I have felt overwhelmed by email with' (I know that feedback is too general to be of much use here.)

I think ~0% of the people who turned off email then looked at web notifications, or wrote a nodejs crawler to snapchat things they might be interested in, or did anything other than ignore their colleagues. Since that's toxic behavior that spirals into 'stuff that managers need to talk about' I'm worried about considering 'just turn off email' to be a reasonable out. (As an admin being able to control the default settings would be good so I don't have a preference for the upstream defaults.) From that point of view tools/data/facts to help show that the volume of email is reasonable to your colleagues would help.

In bullet points:

  • A tool to show email volume from phabricator would be helpful.
  • As an admin being able to see people's email settings would be nice, but probably has thorny security issues and I think I can just go figure this out from the DB anyway without too much trouble anyway.
  • The Storm issues that Chad mentioned seem like good improvements for several reasons and would reduce email volume.
  • When people complains, I'll corner them and make them say something more specific.

We don't actually store the outcome of preference filtering at the time we sent the mail right now (we will after T5791) so I can only report an "Unfiltered" column, which is the maximum number of messages we may have sent (I could do a little better than this now, but it's probably better to hold it until T5791 for a more complete picture). Users with any settings which disable mail will have received less mail than the "unfiltered" number.

(This command may take a while to run since I was extremely lazy about loading the data.)

Here are selected numbers for this install, if the context is helpful:

ubuntu@secure001:/core/lib/phabricator$ ./bin/mail volume
+=======================+============+
| User                  | Unfiltered |
+=======================+============+
| epriestley            | 7380       |
| chad                  | 7127       |
| joshuaspence          | 2366       |
| eadler                | 1737       |
| hach-que              | 1100       |
...
| cburroughs            | 469        |
...
| sshannin              | 148        |
...
  • As an admin being able to see people's email settings would be nice, but probably has thorny security issues and I think I can just go figure this out from the DB anyway without too much trouble anyway.

Oh, json. I think this is a not misleading example in-case others are speliunking

SELECT user_preferences.userPHID,user.userName FROM user_preferences JOIN user ON user_preferences.userPHID = user.phid WHERE REPLACE(REPLACE(SUBSTRING_INDEX(SUBSTRING_INDEX(preferences, "maniphest-comment", -1), "," ,1), "\"", ""), ":", "") = 0 ORDER BY user.userName;

D13878 makes bin/mail volume look like this instead:

+==============+============+===========+
| User         | Unfiltered | Delivered |
+==============+============+===========+
| admin        | 145        | 4         |
| example-list | 8          | 1         |
| dog          | 31         | 0         |
| epriestley   | 24         | 0         |
| ducksey      | 18         | 0         |
| saurus       | 8          | 0         |
| squeakybirdo | 3          | 0         |
| nnn          | 3          | 0         |
| facebooker   | 2          | 0         |
+==============+============+===========+

The "Delivered" column is the actual amount of mail we sent the user. However, mail generated before D13878 won't report properly (we didn't record the information), so to get the real number you need to:

  • Wait for D13878 to land.
  • Update to it.
  • Wait 30 days.
  • Then run bin/mail volume.
epriestley renamed this task from Support message batching in the notification pipeline to Support bulk transmission of notification frames in internal Aphlict protocol.Mar 18 2016, 3:27 PM

This task has nothing to do with any user-visible behavior and is purely a performance/network optimization, despite all the discussion about batching emails here. :)

Sorry, I picked a really bad name for it originally.