
Differentiate timeouts from deadlocks
Closed, Public

Authored by cburroughs on Oct 9 2014, 4:12 PM.
Tags
None
Referenced Files
F13084903: D10669.diff

Details

Summary

Currently AphrontDeadlockQueryException is thrown for two different
error codes. Roughly:

  • A lock timeout occurred. Maybe your server is a teeny bit slow, or you should adjust a config value. It's possible things will deadlock if it keeps going, but we don't know.
  • Deadlock. Something fundamentally bad has happened, or your query is about to lead to a long session on Stack Overflow.

Since the resolution in each case is likely different, and the raw
error code isn't passed along to tell the two apart, this is
confusing. Instead, this commit creates a new exception for the
timeout case.
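
The split described above amounts to dispatching on the MySQL error code instead of collapsing both codes into one exception. Below is a minimal Python sketch of that idea, not the PHP code in this diff: the class names are hypothetical stand-ins. The numeric codes are MySQL's own, 1205 (lock wait timeout exceeded) and 1213 (deadlock found). The sketch also keeps the raw code on the exception, which addresses the complaint that the code isn't passed along.

```python
# Sketch only: hypothetical Python stand-ins, not libphutil's PHP classes.
ER_LOCK_WAIT_TIMEOUT = 1205  # MySQL: "Lock wait timeout exceeded; try restarting transaction"
ER_LOCK_DEADLOCK = 1213      # MySQL: "Deadlock found when trying to get lock; try restarting transaction"


class QueryError(Exception):
    """Base class for query failures; keeps the raw MySQL error code."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"#{code}: {message}")
        self.code = code


class DeadlockQueryError(QueryError):
    """InnoDB found a lock cycle and rolled this transaction back."""


class LockTimeoutQueryError(QueryError):
    """Waited longer than innodb_lock_wait_timeout for a lock."""


# Previously both codes ended up as the deadlock exception; now each has its own class.
_ERROR_CLASSES = {
    ER_LOCK_DEADLOCK: DeadlockQueryError,
    ER_LOCK_WAIT_TIMEOUT: LockTimeoutQueryError,
}


def exception_for(code: int, message: str) -> QueryError:
    """Build the most specific exception for a MySQL error code."""
    cls = _ERROR_CLASSES.get(code, QueryError)
    return cls(code, message)
```

Because both classes share a base, callers that only care that "a lock problem happened" can still catch QueryError, while callers that want to react differently can catch the specific subclass.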

Test Plan

I tried setting innodb_lock_wait_timeout to a variety of
absurdly small values. Nothing broke, but I was unable to induce
a timeout error.

I eventually (hours later) got a deadlock exception and confirmed with
SHOW ENGINE INNODB STATUS that it really was a deadlock.

Diff Detail

Repository
rPHU libphutil
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

cburroughs retitled this revision to Differentiate timeouts from deadlocks.
cburroughs updated this object.
cburroughs edited the test plan for this revision.
cburroughs edited edge metadata.

I've done a bit of research on exactly these two MySQL errors in order to create a deadlock-proof subsystem for an application of ours. In my experience, the causes you attribute to these two errors are the wrong way around:

  • Deadlock: this happens when MySQL detects that two transactions each hold a lock the other is waiting for. This is not a fundamentally bad thing: in large applications it is often impractical or infeasible to make sure every lock is acquired in the right order. We decided to keep transactions short but effective, and to restart them whenever a deadlock occurs.
  • Lock wait timeout: slow servers could cause this, but I've often seen this error when a transaction was kept open for far too long. These kinds of errors are almost always resolved by finding the transaction that holds the locks for too long and revising it.
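
The restart-on-deadlock policy from the first bullet can be sketched as a small retry wrapper. This is a minimal Python illustration, assuming a hypothetical DeadlockError that the database layer raises for MySQL error 1213, and a callable that opens, runs, and commits one complete transaction. It is not code from libphutil or from the commenter's application.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver's 'deadlock found' error (MySQL 1213)."""


def run_with_deadlock_retry(txn: Callable[[], T], max_attempts: int = 3,
                            backoff_seconds: float = 0.05) -> T:
    """Run a whole transaction, restarting it if it is picked as a deadlock victim.

    txn must begin, execute, and commit the transaction itself, so every
    attempt starts from scratch. Other errors, including lock wait
    timeouts, propagate immediately: per the second bullet, those usually
    point at a long-running transaction that should be fixed instead.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return txn()
        except DeadlockError:
            if attempt >= max_attempts:
                raise  # give up and surface the deadlock to the caller
            # Back off a little longer each time so the competing transaction can finish.
            time.sleep(backoff_seconds * attempt)
            attempt += 1
```

Capping the number of attempts keeps a genuinely broken locking pattern from retrying forever; it still surfaces as an error, just less often.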

The MySQL documentation contains a nice read on coping with deadlocks.

I suppose I would consider toning down the language around deadlocks. They seem to be easy to induce with single-threaded scripts (thanks, MySQL!), which is different from how the word deadlock is used for programming languages or VMs.

epriestley added a reviewer: epriestley.

In this application, your original interpretation is correct: we never expect to deadlock, and I consider deadlocks to be severe and the result of a substantive problem in locking code. We have other lock types (like PhabricatorGlobalLock) that can allow the application to guarantee acquisition order, but processes locking resources should generally always be locking them in the same order anyway.

This likely needs an arc liberate.

This revision is now accepted and ready to land. (Nov 23 2015, 4:00 PM)
cburroughs edited edge metadata.
  • arc liberating
  • rebased
This revision was automatically updated to reflect the committed changes.

(I used the "Land Revision" button this time and as far as I can tell no wild mutations occurred.)