Page MenuHomePhabricator

No longer receiving emails from Phacility
Closed, ResolvedPublic

Description

I'm no longer receiving notifications from https://dnanexus.phacility.com/ at akislyuk@dnanexus.com.

Event Timeline

kislyuk updated the task description. (Show Details)
kislyuk added a project: Phacility Support.
kislyuk added a subscriber: kislyuk.

Looking into this now, this is likely related to key cycling (see T7399 for details). I suspect something is just out of sync or needs to be restarted, let me figure out what's up.

On meta at least, I receive email fine.

There's a specific issue with a subset of commit-related mail not using the right credentials to hit the API locally. Here's a characteristic trace:

Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700] [2015-09-22 12:26:39] EXCEPTION: (PhutilProxyException) Error while executing Task ID 255683. {>} (ConduitClientException) ERR-INVALID-AUTH: User account is not permitted to use the API. at [<phutil>/src/conduit/ConduitFuture.php:58]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700] arcanist(head=stable, ref.master=7d15b85a1bc0, ref.stable=b37b2c30345e), libcore(), phabricator(head=stable, ref.master=ee8de4a9ec55, ref.stable=22e60f9fbd49), phutil(head=stable, ref.master=92882eb9404d, ref.stable=0c7b526c9eeb), services(head=stable, ref.master=5b51b63027b7, ref.stable=3558eff5c07a)
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #0 <#2> ConduitFuture::didReceiveResult(array) called at [<phutil>/src/future/FutureProxy.php:58]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #1 <#2> FutureProxy::getResult() called at [<phutil>/src/future/FutureProxy.php:35]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #2 <#2> FutureProxy::resolve() called at [<phutil>/src/conduit/ConduitClient.php:58]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #3 <#2> ConduitClient::callMethodSynchronous(string, array) called at [<phabricator>/src/applications/diffusion/conduit/DiffusionQueryConduitAPIMethod.php:123]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #4 <#2> DiffusionQueryConduitAPIMethod::execute(ConduitAPIRequest) called at [<phabricator>/src/applications/conduit/method/ConduitAPIMethod.php:113]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #5 <#2> ConduitAPIMethod::executeMethod(ConduitAPIRequest) called at [<phabricator>/src/applications/conduit/call/ConduitCall.php:135]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #6 <#2> ConduitCall::executeMethod() called at [<phabricator>/src/applications/conduit/call/ConduitCall.php:85]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #7 <#2> ConduitCall::execute() called at [<phabricator>/src/applications/repository/customfield/PhabricatorCommitTagsField.php:34]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #8 <#2> PhabricatorCommitTagsField::updateTransactionMailBody(PhabricatorMetaMTAMailBody, PhabricatorAuditEditor, array) called at [<phabricator>/src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php:2636]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #9 <#2> PhabricatorApplicationTransactionEditor::addCustomFieldsToMailBody(PhabricatorMetaMTAMailBody, PhabricatorRepositoryCommit, array) called at [<phabricator>/src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php:2566]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #10 <#2> PhabricatorApplicationTransactionEditor::buildMailBody(PhabricatorRepositoryCommit, array) called at [<phabricator>/src/applications/audit/editor/PhabricatorAuditEditor.php:646]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #11 <#2> PhabricatorAuditEditor::buildMailBody(PhabricatorRepositoryCommit, array) called at [<phabricator>/src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php:2331]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #12 <#2> PhabricatorApplicationTransactionEditor::buildMailForTarget(PhabricatorRepositoryCommit, array, PhabricatorMailTarget) called at [<phabricator>/src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php:2288]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #13 <#2> PhabricatorApplicationTransactionEditor::buildMail(PhabricatorRepositoryCommit, array) called at [<phabricator>/src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php:1048]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #14 <#2> PhabricatorApplicationTransactionEditor::publishTransactions(PhabricatorRepositoryCommit, array) called at [<phabricator>/src/applications/transactions/worker/PhabricatorApplicationTransactionPublishWorker.php:21]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #15 <#2> PhabricatorApplicationTransactionPublishWorker::doWork() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:91]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #16 <#2> PhabricatorWorker::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php:162]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #17 <#2> PhabricatorWorkerActiveTask::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php:22]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #18 PhabricatorTaskmasterDaemon::run() called at [<phutil>/src/daemon/PhutilDaemon.php:183]
Daemon 38179 STDE [Tue, 22 Sep 2015 12:26:39 -0700]   #19 PhutilDaemon::execute() called at [<phutil>/scripts/daemon/exec/exec_daemon.php:125]

This is definitely related to key cycling as part of T7399, I'm just not immediately sure which keys didn't get cycled properly. Digging into that now.

Actually, I'm less sure that this is credential-related.

@kislyuk, I'm only seeing a subset of mail (mail about commits, which includes a disabled user on the recipient list, possibly via project memberhip) impacted on our side. Is this consistent with what you're seeing?

Effectively, I think this is a variant of T9067. I'm going to patch this narrowly since it's straightforward.

epriestley triaged this task as Normal priority.

I've deployed this to repo001 (where this instance's daemons are located) and the queue appears to be flushing. I'd expect that you'll start receiving missing mail shortly (there are about 100 affected commits in the backlog). I'll follow up in a few minutes once things flush or otherwise stabilize.

The queue has flushed so I think this is now resolved (there are no longer any publish actions backed up in the queue), can you let me know if you're still seeing issues when you have a chance?

The root problem was that commit mail which included a disabled user on the recipient list could fail to generate for any recipient, because the "TAGS" field in the mail body could not be built for the disabled user. D14142 resolved this by degrading this mail (which is dropped a little later in the pipeline anyway).

I'm not exactly sure what triggered this: it could have been a configuration change, or disabling the user, or changing Herald or audit rules so that they (or a project they were a member of) got added to more commits. But all affected mail should be delievered, now.

@epriestley, I got a flood of emails starting at 1 PM, thanks! I'm not aware of any further issues with this.

Your assessment is right; all relevant emails are about commits, and are copied to one or more disabled users.

Also, we certainly disable and re-enable users occasionally, so hopefully that will continue to be a supported use case.

Cool, glad things are working again.

Also, we certainly disable and re-enable users occasionally, so hopefully that will continue to be a supported use case.

This is definitely supported, this was just a bug on our side.

I've issued your instance a 24-hour service credit for the disruption.

Thanks for the report, and let us know if you run into anything else.