
Add timeouts to service calls to external mailers (was: PhabricatorMetaMTAWorker may hang indefinitely if "sendmail" hangs indefinitely)
Open, Low, Public

Description

If sendmail does not exit, the sendmail-based mailers hang indefinitely and block the queue. We should put a timeout on sendmail execution.
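
A minimal sketch of the idea, written in Python rather than the PHP Phabricator uses: run the mailer command under a hard timeout and treat expiry as a delivery failure, so the worker can give the task back instead of blocking the queue. The helper name `run_mailer` and the 120-second default are illustrative assumptions, not Phabricator's actual code.

```python
import subprocess


def run_mailer(argv, message, timeout_seconds=120.0):
    """Pipe `message` (bytes) to the mailer command.

    Returns True only if the command exits with status 0 before the
    timeout. Hypothetical helper: the name and default are assumptions.
    """
    try:
        result = subprocess.run(
            argv,
            input=message,
            capture_output=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so a hung
        # mailer cannot hold the worker indefinitely.
        return False
    except OSError:
        # The mailer binary is missing or not executable.
        return False
    return result.returncode == 0
```

A worker could call something like `run_mailer(["sendmail", "-t"], raw_message)` (an assumed invocation) and, on `False`, fail the task so it is retried later rather than holding its lease forever.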


I've no idea why, but there always seems to be a bunch of PhabricatorMetaMTAWorker tasks that daemons pick up, that have large expiries and don't seem to go away in a short timeframe. I'd expect sending email to be a reasonably quick operation, and that these workers shouldn't just sit there for ages.

There are no exceptions in the daemon logs.

Event Timeline

hach-que created this task. Aug 26 2014, 8:35 AM
hach-que raised the priority of this task from to Needs Triage.
hach-que updated the task description.
hach-que added projects: Daemons, Phabricator.
hach-que added subscribers: hach-que, epriestley.

Have you confirmed that sending mail really is a quick operation -- for example, by using bin/mail send-test? We've seen environments where this operation takes longer than one might expect.

I ran bin/mail send-test and then bin/mail list-outbound almost immediately after; this is what I saw:

14652 Queued for Delivery [Phabricator] Email Verification                                                                                                       
14653 Queued for Delivery [Phabricator] Email Verification                                                                                                       
14654 Queued for Delivery [Phabricator] Email Verification                                                                                                       
14655 Void                T1195: Investigate "Exclude Duplicate External Projects"                                                                               
14656 Void                T1196: Investigate "Resync option loses local changes if sync disabled"                                                                
14657 Void                T1196: Investigate "Resync option loses local changes if sync disabled"                                                                
14658 Sent                rTd7ade5f7f40d: Rename all of the top-level directories to reflect the game name                                                       
14659 Sent                rTd7ade5f7f40d: Rename all of the top-level directories to reflect the game name                                                       
14660 Queued for Delivery rPBc3cc0b7029ca: Enable "PlatformSpecificOutputFolder" by default                                                                      
14661 Queued for Delivery rPR9a583d22f007: Upgrade MonoGame and recompile assets                                                                                 
14662 Queued for Delivery rPRaa671b2b404c: Load IAssetCompiler and ILoadStrategy implementations in Protogame asset tool                                         
14663 Queued for Delivery rPBc3cc0b7029ca: Enable "PlatformSpecificOutputFolder" by default                                                                      
14664 Queued for Delivery rPR0373a308112c: Implement draw circle, various fixes for Android                                                                      
14665 Queued for Delivery rPR03ed7099323a: Fix up touch event handling                                                                                           
14666 Queued for Delivery rPR5ffa5bc3c8e4: Make button and textbox UI elements work on Android                                                                   
14667 Queued for Delivery rPR79675a34fc5b: Allow button draw to be overridden                                                                                    
14668 Queued for Delivery rPR033b38b2d446: Update Protobuild                                                                                                     
14669 Void                T1195: Investigate "Exclude Duplicate External Projects"                                                                               
14670 Sent                hi

I've no idea what "Void" means.

This is what the leased tasks show as well:

You can use bin/mail show-outbound --id <id> to inspect a "Void" message for an explanation as to why it was voided. Usually it means the recipient declined to receive the notification as an email, so we dropped the mail and did not deliver it.

bin/mail send-test sends the mail in-process and doesn't exit until the handoff is complete, so if that ran quickly it didn't experience a slow MTA handoff.

Are there queued tasks ahead of these tasks with expired leases? We always execute never-leased-before tasks before we execute tasks which have been tried at least once.
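
The ordering described above can be sketched roughly as follows. This is an illustration in Python, not Phabricator's actual lease query; the `Task` fields, the `next_task` function, and the tie-breaking rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    id: int
    lease_expires: Optional[float]  # epoch seconds; None means never leased


def next_task(tasks: List[Task], now: float) -> Optional[Task]:
    """Pick the next task: never-leased tasks first, then expired leases."""
    never_leased = [t for t in tasks if t.lease_expires is None]
    if never_leased:
        # Assumed tie-break: lowest ID (oldest task) first.
        return min(never_leased, key=lambda t: t.id)
    expired = [
        t for t in tasks
        if t.lease_expires is not None and t.lease_expires < now
    ]
    if expired:
        # Assumed tie-break: the lease that expired longest ago first.
        return min(expired, key=lambda t: t.lease_expires)
    # Everything left is still under an active lease.
    return None
```

Under this model, a task with an expired lease only runs once no never-leased tasks remain, so a backlog of fresh tasks can delay retries.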

There are 14 queued tasks; none of them have been processed since I originally made the task. The MetaMTA tasks are now sitting on an expires of -39245.

Backtrace from daemons:

#0 /srv/phabricator/libphutil/src/future/Future.php(146): __phutil_signal_handler__(1)
#1 /srv/phabricator/libphutil/src/future/Future.php(58): Future::waitForSockets(Array, Array, 0.99988913536072)
#2 /srv/phabricator/libphutil/src/daemon/PhutilDaemonOverseer.php(239): Future->resolve(1)
#3 /srv/phabricator/phabricator/scripts/daemon/launch_daemon.php(27): PhutilDaemonOverseer->run()
#4 {main}

All of the daemons have a similar backtrace when I SIGHUP them. I've noticed that this is occurring for other tasks, not just MetaMTA (but MetaMTA is the most frequent, which is probably why the issue looked specific to it).

hach-que renamed this task from PhabricatorMetaMTAWorker always takes ages to send email using default email adapter to Phabricator daemons hang forever in Future::waitForSockets. Aug 30 2014, 2:04 PM
hach-que updated the task description.

This is the backtrace on an actual child PID:

#0 /srv/phabricator/libphutil/src/future/Future.php(146): __phutil_signal_handler__(1)
#1 /srv/phabricator/libphutil/src/future/Future.php(58): Future::waitForSockets(Array, Array, 1)
#2 /srv/phabricator/libphutil/src/future/exec/ExecFuture.php(394): Future->resolve(NULL)
#3 /srv/phabricator/phabricator/externals/phpmailer/class.phpmailer-lite.php(540): ExecFuture->resolvex()
#4 /srv/phabricator/phabricator/externals/phpmailer/class.phpmailer-lite.php(499): PHPMailerLite->SendmailSend('Date: Sat, 30 A...', 'hach-que create...')
#5 /srv/phabricator/phabricator/src/applications/metamta/adapter/PhabricatorMailImplementationPHPMailerLiteAdapter.php(100): PHPMailerLite->Send()
#6 /srv/phabricator/phabricator/src/applications/metamta/storage/PhabricatorMetaMTAMail.php(653): PhabricatorMailImplementationPHPMailerLiteAdapter->send()
#7 /srv/phabricator/phabricator/src/applications/metamta/PhabricatorMetaMTAWorker.php(26): PhabricatorMetaMTAMail->sendNow()
#8 /srv/phabricator/phabricator/src/infrastructure/daemon/workers/PhabricatorWorker.php(87): PhabricatorMetaMTAWorker->doWork()
#9 /srv/phabricator/phabricator/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php(124): PhabricatorWorker->executeTask()
#10 /srv/phabricator/phabricator/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php(19): PhabricatorWorkerActiveTask->executeTask()
#11 /srv/phabricator/libphutil/src/daemon/PhutilDaemon.php(91): PhabricatorTaskmasterDaemon->run()
#12 /srv/phabricator/libphutil/scripts/daemon/exec/exec_daemon.php(111): PhutilDaemon->execute()
#13 {main}
hach-que renamed this task from Phabricator daemons hang forever in Future::waitForSockets to PhabricatorMetaMTAWorker always takes ages to send email using default email adapter. Aug 30 2014, 2:16 PM
hach-que updated the task description.

From IRC:

  • Looks like sendmail, executed via PHPMailerLite, is hanging indefinitely.
  • Likely fix is to add a timeout and fail after a few minutes.
  • But see also T5956.
  • This may be a "good" time to merge PHPMailerLite and PHPMailer.
hach-que closed this task as Invalid. Aug 30 2014, 2:49 PM
hach-que claimed this task.

This was caused by Docker killing off the SETGID bit on postqueue / postdrop executables. See https://gist.github.com/porjo/35ea98cb64553c0c718a.
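
For anyone hitting the same symptom, a quick way to check for this condition is to test whether the setgid bit is still set on those executables. The sketch below is in Python; the helper names are hypothetical, and the paths in the usage note are only typical Postfix locations, which vary by distribution.

```python
import os
import stat


def mode_has_setgid(mode: int) -> bool:
    """Return True if the setgid bit is set in a file mode."""
    return bool(mode & stat.S_ISGID)


def missing_setgid(paths):
    """Return the paths that exist but lack the setgid bit."""
    missing = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            # Not installed at this location; nothing to report.
            continue
        if not mode_has_setgid(mode):
            missing.append(path)
    return missing
```

For example, `missing_setgid(["/usr/sbin/postdrop", "/usr/sbin/postqueue"])` (assumed paths) returns whichever of the two has lost the bit inside the container.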

I'm going to repurpose this for fixing the hang; we definitely shouldn't stall the queue over mail issues.

epriestley renamed this task from PhabricatorMetaMTAWorker always takes ages to send email using default email adapter to PhabricatorMetaMTAWorker may hang indefinitely if "sendmail" hangs indefinitely. Aug 30 2014, 2:50 PM
epriestley reopened this task as Open.
epriestley claimed this task.
epriestley triaged this task as Low priority.
epriestley updated the task description.
epriestley edited projects, added Mail; removed Phabricator.

Support Impact: This is wildly difficult to diagnose because it manifests as a mysterious hang.

epriestley moved this task from Backlog to v3 on the Mail board. Jul 16 2016, 2:09 PM
epriestley renamed this task from PhabricatorMetaMTAWorker may hang indefinitely if "sendmail" hangs indefinitely to Add timeouts to service calls to external mailers (was: PhabricatorMetaMTAWorker may hang indefinitely if "sendmail" hangs indefinitely). Jan 2 2019, 5:14 PM
epriestley moved this task from v3 to Soon? on the Mail board.