Page MenuHomePhabricator

incoming email handler replies multiple times for a single unrecognized command
Closed, ResolvedPublic

Description

steps to reproduce:

  1. reply to an email from Audit with !accept (trying to accept a commit, like you would with Differential) (the email is triggered by Herald rules)

expect:

  • commit accepted, but I'll take "unrecognized command" once too

actual:

  • multiple failure replies over several hours

reproduced on phacility instance, discovered by coworker, confirmed by me

Event Timeline

epriestley triaged this task as Normal priority.
epriestley added a project: Mail.

I think this is happening:

  • The upstream mail provider is submitting mail to us via webhook.
  • We're processing it, but failing to handle/resolve the exception arising from the invalid command properly.
  • Because the exception escapes, we don't return HTTP 200.
  • The upstream provider interprets this as a failed delivery and re-transmits the mail after a delay.

I think this is fairly straightforward to resolve.

We have an additional safety check which is supposed to prevent this kind of issue: we refuse to process messages with the same "Message-ID" more than once.

However, I think MailGun either changed their API or we never implemented this correctly in the first place, as we don't seem to be reading headers properly. Since we never read the "Message-ID", the check for duplicates fails. I'll resolve this, too.

This should now be resolved in production.

Although this isn't a hugely critical issue and I probably wouldn't hotfix it normally, it's an issue that's hard to be sure about the fix for without verifying it in production, and it's still fairly early on Saturday morning, and it would have to wait a whole week to deploy normally. So:

  • I cherry-picked this to stable (as rPfea2389066ed).
  • I deployed stable to production.
  • I verified that sending an invalid command responds with exactly one reply.
  • I verified that inbound mail now gets proper "Message-ID" headers parsed.
  • (And I verified that normal inbound mail works properly.)

If you have existing threads that are still stuck retransmitting, you may receive one additional copy of the error mail (the queued retry on the upstream side), but it should stop after that. New mail should behave correctly.

Thanks for the report! Let us know if you run into anything else.