On Wikimedia's Phabricator instance, we have several broken transactions
lingering in the queue. This change handles the case where a task has no
associated data, so the task is recorded as a permanent failure instead of
being needlessly retried when it is hopelessly broken and has no chance of
ever succeeding.
Details
- Reviewers: epriestley
- Group Reviewers: Blessed Reviewers
Test Plan
Restart phd and see that pending tasks with null data get recorded as
permanent failures.
Diff Detail
- Branch: production
- Lint: Lint Passed
- Unit: Tests Passed
- Build Status: Buildable 7665
  - Build 8368: [Placeholder Plan] Wait for 30 Seconds
  - Build 8367: arc lint + arc unit
Event Timeline
src/applications/transactions/worker/PhabricatorApplicationTransactionPublishWorker.php
- Line 37: Without the check that I added on line 32, this line would fail with an exception when $data is not an array. Maybe $data should always be an array, but some real-world circumstances result in broken transaction data. I believe this handles the broken transactions more gracefully.
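The guard described in that comment might look something like the following minimal sketch. In Phabricator, workers signal non-retryable failures by throwing PhabricatorWorkerPermanentFailureException; a stand-in exception class and function name are used here so the sketch is self-contained, and this is illustrative rather than the actual patch:

```php
<?php
// Stand-in for PhabricatorWorkerPermanentFailureException, which worker
// code throws to mark a task as permanently failed (no more retries).
class PermanentFailureException extends Exception {}

// Hypothetical helper: validate decoded task data before using it.
// On broken transactions, json_decode() of the stored data can yield a
// non-array value (e.g., false or null) instead of the expected array.
function loadAndValidateData($data) {
  if (!is_array($data)) {
    throw new PermanentFailureException(
      'Task has no valid data; failing permanently instead of retrying.');
  }
  return $data;
}
```

With this guard in place, a later call like idx($data, 'key') only ever sees an array, and hopeless tasks leave the queue instead of retrying forever.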
This is T8672. The exact chain of events for the most common case of this is:
1. A commit includes non-UTF-8 data.
2. We try to pass it to the worker so the worker can add a patch to any mail it generates, if configuration requires it.
3. We json_encode() it, which fails, returning false. We silently ignore this error.
4. In the worker, we json_decode() it, which also fails. We ignore this error too.
5. We finally try to idx(...) it, which explodes.
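Step (3) is easy to reproduce in isolation: json_encode() on a string containing invalid UTF-8 returns false rather than throwing, so the failure is silent unless json_last_error() is consulted.

```php
<?php
// A truncated two-byte UTF-8 sequence: \xC3 expects a continuation byte,
// but \x28 ('(') is not one, so the string is invalid UTF-8.
$bytes = "\xC3\x28";

// With default flags, json_encode() cannot represent invalid UTF-8 and
// returns false instead of a JSON string. Nothing is thrown.
$encoded = json_encode(array('patch' => $bytes));

// $encoded is boolean false; json_last_error() reports JSON_ERROR_UTF8.
```

This is why detecting the error at (3) requires an explicit check of the return value or of json_last_error() after encoding.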
I want to:
- Detect the error at (3).
- Implement something in this vein before (5).
- Actually do the right thing at (2) [convert it to utf8 before passing it].
- Ideally, catch (4), but this is tricky because the data for some workers is not "real" JSON (e.g., it's just an ID or whatever).
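A sketch of "do the right thing at (2)": scrub the bytes into valid UTF-8 before encoding. Phabricator has its own helper for this (phutil_utf8ize()); here PHP's mbstring extension stands in as an assumption, replacing invalid sequences with U+FFFD so json_encode() succeeds:

```php
<?php
// Invalid UTF-8 as it might arrive from a commit (see previous example).
$bytes = "\xC3\x28";

// Replace invalid byte sequences with U+FFFD (the replacement character)
// rather than the mbstring default of '?'. Requires the mbstring extension.
mb_substitute_character(0xFFFD);

// Converting UTF-8 to UTF-8 is the standard mbstring idiom for scrubbing
// a string: invalid sequences are swapped for the substitute character.
$clean = mb_convert_encoding($bytes, 'UTF-8', 'UTF-8');

// Encoding the sanitized string now yields a JSON string, not false.
$encoded = json_encode(array('patch' => $clean));
```

With the input sanitized before step (2), the encode at (3) and the decode at (4) both succeed, and the worker never reaches the explosion at (5).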
This has been on my short list to fix for a while, particularly in connection with T9187. Let me fix the other cases and then I'll bring some version of this upstream.
I folded this into D13939, which also fixes the root problem (hopefully). It ended up with a LiskDAO change so I want to wait until after we cut the stable branch on Saturday morning.