Page MenuHomePhabricator

ApplicationTransactionPublishWorker can retry indefinitely, sending email as a side effect, if a transaction takes more than 2 hours to publish
Closed, ResolvedPublic

Description

Workers obtain 2-hour leases by default.

If a TransactionPublishWorker runs for more than two hours, it will lose its lease and be unable to update the task after completion.

In the original case, a very large change was pushed which touched approximately 100 packages and projects and presumably had a recipient list of several hundred users. From a --trace, it appears that much of the time was spent inefficiently rebuilding the Diff, probably during patch rendering. This can probably be brought down to nearly zero: recipients need to individually survive a visibility check, but do not need to individually rebuild the diff content.

More broadly, we can verify and/or renew the lease before moving from mail construction to mail queueing. This would reduce the bad case (mail loop) into a minor nuisance (worker loop with no side effects).

Event Timeline

epriestley added a subscriber: jmeador.
epriestley added subscribers: jcox, jessjohnson, cspeckmim, chad.

T11708 has almost nothing to do with this, but the fix for this will rewrite the code that's running into issues and probably moot them.

epriestley claimed this task.

I believe we haven't seen more of this in two years, and "make the worker always exit in less than 2 hours" is a more-or-less reasonable remedy. Getting one extra email every two hours also isn't a huge problem even if we do get this wrong.

A little bit of this survives to some degree in T11767.