Until MySQL 5.7, each MySQL connection may hold only one simultaneous lock
Closed, Resolved (Public)

Assigned To

Authored By

	epriestley
	Mar 2 2021, 7:27 PM

Description

See PHI2009. Until MySQL 5.7, GET_LOCK(...) releases other locks held on the same connection as a side effect:

Before MySQL 5.7, only a single simultaneous lock can be acquired and GET_LOCK() releases any existing lock.
https://dev.mysql.com/doc/refman/5.7/en/locking-functions.html#function_get-lock

That is:

mysql> SELECT GET_LOCK('A', 0);
mysql> SELECT GET_LOCK('B', 0); # Releases lock "A"!
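
To make the difference concrete without needing two MySQL servers, here is a toy in-memory model of the two behaviors. It is only a sketch of the semantics described in the manual, not MySQL's implementation; `ToyLockServer`, `demo`, and the connection names are made up for illustration.

```python
class ToyLockServer:
    """Toy in-memory model of named locks. Not MySQL's implementation."""

    def __init__(self, legacy):
        self.legacy = legacy  # True models the pre-5.7 behavior
        self.owner = {}       # lock name -> connection id

    def get_lock(self, conn, name):
        holder = self.owner.get(name)
        if holder is not None and holder != conn:
            # Held by another connection; with a timeout of 0 this fails at once.
            return 0
        if self.legacy:
            # Pre-5.7: acquiring any lock drops whatever this connection held.
            for held in [n for n, c in self.owner.items() if c == conn]:
                del self.owner[held]
        self.owner[name] = conn
        return 1

    def held_by(self, conn):
        return sorted(n for n, c in self.owner.items() if c == conn)


def demo(legacy):
    """Acquire A and then B on one connection; return the locks still held."""
    server = ToyLockServer(legacy)
    server.get_lock("conn-1", "A")
    server.get_lock("conn-1", "B")
    return server.held_by("conn-1")
```

With `legacy=True` the connection ends up holding only "B"; with `legacy=False` it holds both.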

I think this has escaped notice through a combination of factors:

The behavior is wildly surprising.
Most development/testing occurs against newer versions of MySQL that don't have this behavior.
We currently only stack locks in one specific workflow (when accepting writes against non-leader repository nodes).
This is only observable in a multi-master writable cluster with a fairly high write rate.
(See Below) The code accidentally attempts to prevent it.
(See Below) The reproduction case is even more complicated and subtle than I initially believed, and requires an external connection be improperly returned to the connection pool after a failure to acquire a write lock.

Most test environments don't have this behavior, and secure doesn't have a high enough write rate to hit it.

In the short term, the fix is:

Never issue GET_LOCK() on a connection already holding a lock.
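
A minimal sketch of that rule, assuming a wrapper that tracks which connections already hold a lock and refuses a second acquisition outright. This is not the PhabricatorGlobalLock code; `GuardedLocks`, `LockAlreadyHeldError`, and the two callables are hypothetical names, and the callables stand in for whatever actually runs `GET_LOCK()` and `RELEASE_LOCK()`.

```python
class LockAlreadyHeldError(Exception):
    """Raised instead of silently stacking a second lock on a connection."""


class GuardedLocks:
    def __init__(self, get_lock, release_lock):
        # Hypothetical callables: get_lock(conn, name) -> truthy on success,
        # release_lock(conn, name) -> None.
        self._get_lock = get_lock
        self._release_lock = release_lock
        self._held = {}  # connection -> name of the lock it holds

    def acquire(self, conn, name):
        if conn in self._held:
            # Refuse before touching the server: on old MySQL, issuing
            # GET_LOCK() here would drop the lock we already hold.
            raise LockAlreadyHeldError(
                "connection already holds lock %r" % self._held[conn])
        if not self._get_lock(conn, name):
            return False
        self._held[conn] = name
        return True

    def release(self, conn):
        name = self._held.pop(conn, None)
        if name is not None:
            self._release_lock(conn, name)
```

Raising rather than quietly picking another connection keeps the failure loud, which seems preferable for a bug that was this hard to notice.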

In the longer term, perhaps:

Require MySQL 5.7 or newer, or apply this logic only on old versions of MySQL, since the workaround has a small performance cost and the old GET_LOCK() behavior is wholly ridiculous.
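
If the logic were conditioned on the server version, the check could look roughly like this. It is a sketch only: `supports_stacked_locks` is a hypothetical helper, it assumes the input is the string returned by `SELECT VERSION()`, and it makes no deliberate attempt to handle forks whose version numbering differs from MySQL's.

```python
import re


def supports_stacked_locks(version):
    """Return True if `version` looks like MySQL 5.7 or newer.

    `version` is assumed to be the output of SELECT VERSION(),
    for example "5.6.51-log" or "8.0.23".
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if not match:
        # Unknown format: assume the old one-lock-per-connection behavior,
        # which is the safe choice.
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (5, 7)
```

Anything that fails to parse is treated as old, so the worst case is paying the small cost of the workaround on a server that did not need it.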

Revisions and Commits

rP Phabricator
	D21585	rP629429b28354 (stable) Never return external connections to the GlobalLock connection pool
	D21586	rP4aecb6f25d12 (stable) Refuse to acquire a second GlobalLock on a connection
	D21583	rPaa2d89f1d4c5 (stable) When a GlobalLock with an external connection is released, don't…
	D21584	rP84049ed4793c (stable) Prevent external connections from being mutated on held locks
	D21583	rP15dbf6bdf0a5 When a GlobalLock with an external connection is released, don't return it to…
	D21586	rP2b473558c2b3 Refuse to acquire a second GlobalLock on a connection
	D21585	rP33bce22ef2ad Never return external connections to the GlobalLock connection pool
	D21584	rP466013f11a6d Prevent external connections from being mutated on held locks

Related Objects

Mentioned In: 2021 Week 10 (Early March)
T13624: Provide an error log for `sshd` subprocesses
Mentioned Here: T11908: Support an "overlay" database connection mode where multiple applications share a single connection
D21369: When acquiring a GlobalLock, put good connections that just got unlucky back in the pool

Event Timeline

epriestley triaged this task as Normal priority. Mar 2 2021, 7:27 PM

epriestley created this task.

epriestley added a revision: D21583: When a GlobalLock with an external connection is released, don't return it to the pool. Mar 2 2021, 8:49 PM

epriestley added a revision: D21584: Prevent external connections from being mutated on held locks. Mar 2 2021, 8:56 PM

This is actually very subtle.

In addition to the above, we usually dodged this because PhabricatorGlobalLock attempts to always use new connections. This is desirable given MySQL's behavior, but also entirely by accident, because the first version of PhabricatorGlobalLock (in D2864) just held a transaction open instead of using GET_LOCK(), which would have required a unique connection.

This means that it's actually fairly difficult to accidentally acquire two simultaneous locks on the same connection. However, we can do it like this:

1. Set an external connection.
2. Attempt to acquire lock A.
3. Lock acquisition fails.
4. Attempt to acquire lock A.
5. Lock acquisition succeeds.
6. Attempt to acquire lock B.

In step (3), the external connection is incorrectly returned to the connection pool, since D21369. In step (4), we acquire the first lock on the connection. In step (6), we acquire a second lock on the same connection. In MySQL versions older than 5.7, this releases the lock from step (4).

Since we use an external connection to guarantee that the ephemeral and durable write locks are on the same host, this particular perfect storm of conditions can occur under high write volume on a multi-master cluster.
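
The sequence can be replayed against a toy model. The sketch below is not Phabricator code: `ToyServer`, `try_lock`, `reproduce`, the `buggy` flag, and the connection names are all invented for illustration, the server models only the pre-5.7 semantics, and the first failure is assumed to come from a competing writer that holds lock A.

```python
class ToyServer:
    """Toy pre-5.7 semantics: at most one named lock per connection."""

    def __init__(self):
        self.owner = {}  # lock name -> connection id

    def get_lock(self, conn, name):
        if self.owner.get(name) not in (None, conn):
            return False  # held by someone else
        for held in [n for n, c in self.owner.items() if c == conn]:
            del self.owner[held]  # the surprising side effect
        self.owner[name] = conn
        return True

    def release_lock(self, conn, name):
        if self.owner.get(name) == conn:
            del self.owner[name]


def try_lock(server, pool, name, external=None, buggy=True):
    """Return the connection now holding `name`, or None on failure."""
    conn = external or (pool.pop() if pool else "fresh-%s" % name)
    if server.get_lock(conn, name):
        return conn
    # On failure, a good connection goes back to the pool for reuse.
    # The buggy variant also does this for external connections.
    if external is None or buggy:
        pool.append(conn)
    return None


def reproduce(buggy):
    server, pool = ToyServer(), []
    server.get_lock("other", "A")                           # competing writer
    ext = "external"                                        # step 1
    assert try_lock(server, pool, "A", ext, buggy) is None  # steps 2-3
    server.release_lock("other", "A")
    assert try_lock(server, pool, "A", ext, buggy) == ext   # steps 4-5
    conn_b = try_lock(server, pool, "B", None, buggy)       # step 6
    return conn_b, server.owner.get("A")
```

With `buggy=True`, lock B lands on the external connection and nobody holds A afterwards. With `buggy=False`, B gets a fresh connection and A stays with the external connection.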

epriestley added a revision: D21585: Never return external connections to the GlobalLock connection pool. Mar 2 2021, 9:33 PM

epriestley added a revision: D21586: Refuse to acquire a second GlobalLock on a connection. Mar 2 2021, 9:38 PM

epriestley added a commit: rP2b473558c2b3: Refuse to acquire a second GlobalLock on a connection. Mar 2 2021, 9:44 PM

epriestley added a commit: rP466013f11a6d: Prevent external connections from being mutated on held locks.

epriestley added a commit: rP33bce22ef2ad: Never return external connections to the GlobalLock connection pool.

epriestley added a commit: rP15dbf6bdf0a5: When a GlobalLock with an external connection is released, don't return it to….

epriestley updated the task description. Mar 2 2021, 9:50 PM

epriestley added a commit: rP84049ed4793c: (stable) Prevent external connections from being mutated on held locks. Mar 3 2021, 3:26 AM

epriestley added a commit: rP4aecb6f25d12: (stable) Refuse to acquire a second GlobalLock on a connection.

epriestley added a commit: rP629429b28354: (stable) Never return external connections to the GlobalLock connection pool.

epriestley added a commit: rPaa2d89f1d4c5: (stable) When a GlobalLock with an external connection is released, don't….

I deployed this to the hosts affected by PHI2009 yesterday, and it appears to have resolved the problem.

It may still be worthwhile to try to navigate the MySQL 5.7 issue as part of some future change, like T11908 (in MySQL 5.7 or newer, we do not need a unique connection per lock), but the impact is small and the code is generally in a reasonable state after these changes.

epriestley mentioned this in T13624: Provide an error log for `sshd` subprocesses. Mar 4 2021, 12:08 AM

epriestley mentioned this in 2021 Week 10 (Early March). Mar 5 2021, 9:29 PM
