See PHI364. An install is seeing lock failures on one repository cluster host.
Global locks are currently difficult to debug remotely. In particular:
- MySQL limits on lock name length make lock names difficult to read. It would be better to standardize the lock parameterization so the exception can clearly show exactly which lock failed.
- When a lock fails, nothing indicates which process is holding it.
I expect to do this:
- Implement a new PhabricatorParameterLock, which builds on PhabricatorGlobalLock.
- This lock takes parameters and automatically hashes them into a scalar lock name, but provides the full parameter list in lock exceptions.
- Add a new lock log table.
- Add a GC for the lock log table, which defaults to collecting after 0 seconds.
- When the lock log GC is configured to retain log rows (a nonzero retention period), write a row to the lock log table after acquiring a lock, with additional metadata about the host and process holding the lock.
- When a lock is released, update the lock log if we wrote a row.
- When we raise a lock exception, check the lock log table for information about processes which recently held the lock and report it.
- Provide bin/lock commands to inspect the table in more detail.
Then installs can configure this GC to enable additional lock debugging information.
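
A minimal sketch of the plan above, written in Python for illustration only and not the actual PHP implementation. Every name here (`lock_name`, `LockError`, `LockLog`, `ttl_seconds`) is hypothetical, and the choice of SHA-256 truncated to 20 hex characters is an assumption. It shows parameters being hashed into a short scalar lock name, the exception carrying the full parameter list, and the log only being written when retention is nonzero.

```python
import hashlib
import json
import os
import socket
import time


def lock_name(parameters, prefix="lock"):
    """Hash a parameter dict into a short, fixed-length lock name."""
    # Sort keys so the same parameters always produce the same name.
    encoded = json.dumps(parameters, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(encoded.encode("utf-8")).hexdigest()
    # Truncation length is an arbitrary choice for this sketch.
    return "%s-%s" % (prefix, digest[:20])


class LockError(Exception):
    """Lock failure that reports the readable parameters, not just the hash."""

    def __init__(self, parameters, recent_holders=()):
        self.name = lock_name(parameters)
        self.parameters = dict(parameters)
        self.recent_holders = list(recent_holders)
        super().__init__(
            "Failed to acquire lock %s with parameters %s; recent holders: %s"
            % (
                self.name,
                json.dumps(self.parameters, sort_keys=True),
                self.recent_holders,
            )
        )


class LockLog:
    """In-memory stand-in for the proposed lock log table."""

    def __init__(self, ttl_seconds=0):
        # A retention of 0 seconds means logging is effectively disabled.
        self.ttl_seconds = ttl_seconds
        self.rows = []

    def record_acquire(self, parameters):
        if self.ttl_seconds <= 0:
            return None
        row = {
            "name": lock_name(parameters),
            "parameters": dict(parameters),
            "host": socket.gethostname(),
            "pid": os.getpid(),
            "acquired": time.time(),
            "released": None,
        }
        self.rows.append(row)
        return row

    def record_release(self, row):
        # Only update the log if a row was written at acquire time.
        if row is not None:
            row["released"] = time.time()

    def recent_holders(self, parameters):
        name = lock_name(parameters)
        return [(r["host"], r["pid"]) for r in self.rows if r["name"] == name]
```

With this shape, a failed acquisition could raise `LockError(params, log.recent_holders(params))`, so the message contains both the readable parameters and whatever holder information the log retained.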