Differentiate timeouts from deadlocks
ClosedPublic

Authored by cburroughs on Oct 9 2014, 4:12 PM.

Details

Reviewers

epriestley

Group Reviewers

Blessed Reviewers

Commits

rPHUa055c511ae3c: Differentiate timeouts from deadlocks

Summary

Currently AphrontDeadlockQueryException is thrown for two different
error codes. Roughly:

A lock timeout occurred. Maybe your server is a teeny bit slow, or you should adjust a config value. Things might deadlock if it keeps going, but we don't know.
Deadlock. Something fundamentally bad has happened, or your query is about to lead to a Stack Overflow session.

The resolution in each case is likely different, and the raw error
code isn't passed along to tell them apart, so this is confusing.
Instead, this commit creates a new exception for the timeout case.
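
As an illustration only, the split described above might look roughly like the following. This is a Python sketch, not the libphutil PHP code from this commit; the class and function names are hypothetical. The two numeric codes are MySQL's ER_LOCK_WAIT_TIMEOUT (1205) and ER_LOCK_DEADLOCK (1213).

```python
# MySQL server error codes for the two cases discussed in the summary.
ER_LOCK_WAIT_TIMEOUT = 1205
ER_LOCK_DEADLOCK = 1213


class QueryError(Exception):
    """Hypothetical base class for query failures; keeps the raw error code."""

    def __init__(self, errno, message):
        super().__init__(f"#{errno}: {message}")
        self.errno = errno


class LockTimeoutQueryError(QueryError):
    """Hypothetical: a lock wait timed out (may be slowness or configuration)."""


class DeadlockQueryError(QueryError):
    """Hypothetical: the server detected a deadlock."""


def exception_for_error(errno, message):
    """Map a raw error code to a distinct exception type."""
    if errno == ER_LOCK_WAIT_TIMEOUT:
        return LockTimeoutQueryError(errno, message)
    if errno == ER_LOCK_DEADLOCK:
        return DeadlockQueryError(errno, message)
    return QueryError(errno, message)
```

With separate types, a caller can catch the timeout case without also swallowing deadlocks, and the raw code remains available on the exception.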

Test Plan

I tried setting innodb_lock_wait_timeout to a variety of
absurdly small values. Nothing broke, but I was unable to induce
a timeout error.

I eventually (hours later) got a deadlock exception and confirmed with
SHOW ENGINE INNODB STATUS; that it really was a deadlock.

Diff Detail

Repository

rPHU libphutil

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

cburroughs updated this revision to Diff 25625. Oct 9 2014, 4:12 PM

cburroughs retitled this revision from to Differentiate timeouts from deadlocks.

cburroughs updated this object.

cburroughs edited the test plan for this revision.

Herald added a reviewer: Blessed Reviewers. Oct 9 2014, 4:12 PM

Herald added a subscriber: epriestley.

Harbormaster completed remote builds in B2789: Diff 25625. Oct 9 2014, 4:12 PM

cburroughs mentioned this in T6281: Custom Field indexes are rebuilt on every ticket change even if the fields did not. Oct 9 2014, 8:29 PM

cburroughs edited the test plan for this revision. Oct 10 2014, 1:03 PM

cburroughs edited edge metadata.

I've done a bit of research on exactly these two MySQL errors in order to create a deadlock-proof subsystem for an application of ours. In my experience, the causes you attribute to these two errors are the wrong way around:

Deadlock: this happens when MySQL detects two transactions each holding a specific lock and one of the two requesting the other's lock. This is not a fundamentally bad thing: when developing large applications, it is often impractical or infeasible to make sure all your locks are in the right order. We decided that we just should keep transactions short but effective, and restart them whenever a deadlock occurs.
Lock wait timeout: slow servers could cause this, but I've often seen this error when a transaction was kept open for far too long. These kinds of errors are almost always resolved by finding the transaction that holds the locks for too long and revising it.

The MySQL documentation contains a nice read on coping with deadlocks.
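
The "keep transactions short and restart on deadlock" approach described above could be sketched as follows. This is a minimal Python illustration under stated assumptions, not code from the commenter's application or from libphutil; `DeadlockError`, `LockTimeoutError`, and `run_with_deadlock_retry` are hypothetical names.

```python
class DeadlockError(Exception):
    """Hypothetical: raised when the database reports a deadlock."""


class LockTimeoutError(Exception):
    """Hypothetical: raised when a lock wait times out."""


def run_with_deadlock_retry(transaction, max_attempts=3):
    """Run `transaction` (a callable), restarting it only on deadlock.

    Lock wait timeouts are deliberately not retried: per the comment above,
    they usually point at a long-running transaction that needs revising.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            # Give up and surface the deadlock once attempts are exhausted.
            if attempt == max_attempts:
                raise
```

This only works if the callable is safe to run again from the start, which is one reason to keep each transaction short.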

I suppose I would consider toning down the language around deadlocks. They seem to be easy to induce with single-threaded scripts (thanks, MySQL!), which is different from how the word deadlock is used in programming languages or VMs.

In this application, your original interpretation is correct -- we never expect to deadlock and I consider deadlocks to be severe and the result of a substantive problem in locking code. We have other lock types (like PhabricatorGlobalLock) that can allow the application to guarantee acquisition order, but processes locking resources should generally always be locking them in the same way anyway.

This likely needs an arc liberate.

This revision is now accepted and ready to land. Nov 23 2015, 4:00 PM