Page MenuHomePhabricator

Remove MetaMTA-specific retry/rate limiting code
Closed, ResolvedPublic

Description

I'm seeing these maybe once a day on my daemon logs:

[2013-12-04 18:15:20] EXCEPTION: (PhutilProxyException) Error while executing task ID 419064 from queue.
 {>} (Exception) Trying to send an email before next retry! now=1386180920, next=1386180921 at 
 [phabricator/src/applications/metamta/storage/PhabricatorMetaMTAMail.php:357]
#0 PhabricatorMetaMTAMail::sendNow() called at [phabricator/src/applications/metamta/PhabricatorMetaMTAWorker.php:25]
#1 PhabricatorMetaMTAWorker::doWork() called at [phabricator/src/infrastructure/daemon/workers/PhabricatorWorker.php:84]
#2 PhabricatorWorker::executeTask() called at [phabricator/src/infrastructure/daemon/workers/storage/PhabricatorWorkerActiveTask.php:122]
#3 PhabricatorWorkerActiveTask::executeTask() called at [phabricator/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php:19]
#4 PhabricatorTaskmasterDaemon::run() called at [libphutil/src/daemon/PhutilDaemon.php:81]
#5 PhutilDaemon::execute() called at [libphutil/scripts/daemon/exec/exec_daemon.php:112]

(I've added the timestamp in there to get a better look).
The time difference is almost always around 1-5 milliseconds too soon.

I've only ever seen it with regards to emails.

Event Timeline

avivey raised the priority of this task from to Needs Triage.
avivey updated the task description. (Show Details)
avivey added a project: Daemons.
avivey added a subscriber: avivey.

My guess is that this is a > which should be a >= or something. Mail is the only thing that has this additional supplemental check. It's no longer really necessary (long ago, mail wasn't part of the task queue, but we got that cleaned up and merged at some point) so a cleaner fix might be to just remove it entirely, although sending mail is a heavy enough side effect that it's not terrible to retain it as an extra check either, as long as it's not making a mess.

Hmm, is your MySQL clock about 20 seconds off of your PHP clock, maybe? You can check the former with:

SELECT UNIX_TIMESTAMP();

You can check the latter with:

$ php -r 'echo time()."\n";'

From a cursory read of the code, it looks like we're putting at least 15 seconds of buffer into the queue, so it's easier to imagine that this could be a clock skew issue than a race.

On reviewing the code, I'm leaning toward removing the redundant mechanism in MetaMTA, but verifying that it's a clock skew issue would give me far greater confidence about doing that.

No, my mysql and php (and everything else) are on the same machine, so there's no time difference there:

$ mysql -Be 'select unix_timestamp()'; php -r 'echo time()."\n";'
unix_timestamp()
1386268344
1386268344

...

epriestley renamed this task from Daemon wakes up a little early to Remove MetaMTA-specific retry/rate limiting code.Dec 8 2013, 4:33 PM
epriestley updated the task description. (Show Details)

I'm just going to nuke the old code anyway, the new stuff is better on a lot of dimensions.