Page MenuHomePhabricator

Implement a global lock log to make it easier to debug global locks
Closed, ResolvedPublic

Description

See PHI364. An install is seeing lock failures on one repository cluster host.

Global locks are currently difficult to debug remotely. In particular:

  • MySQL lock length limits make them difficult to read. It would be better to standardize the lock parameterization so the exception can clearly show exactly which lock failed.
  • There's no guidance around which process is holding the locks.

I expect to do this:

  • Implement a new PhabricatorParameterLock, which builds on PhabricatorGlobalLock.
  • This lock takes parameters and automatically hashes them into a scalar lock name, but provides the full parameter list in lock exceptions.
  • Add a new lock log table.
  • Add a GC for the lock log table, which defaults to collecting after 0 seconds.
  • When the lock log GC is configured to retain locks, write a row to the lock log table after acquiring a lock with additional metadata about the host and process which is holding the lock.
  • When a lock is released, update the lock log if we wrote a row.
  • When we raise a lock exception, check the lock log table for information about processes which recently held the lock and report it.
  • Provide bin/lock commands to inspect the table in more detail.

Then installs can configure this GC to enable additional lock debugging information.

Event Timeline

epriestley triaged this task as Normal priority.Mar 1 2018, 5:24 PM
epriestley created this task.

This log is now available at HEAD of master.

Because this log performs a lot of writes and is generally only useful for diagnostics/debugging, it is off by default. To enable the log, run:

phabricator/ $ ./bin/lock log --enable

To disable the log, run:

phabricator/ $ ./bin/lock log --disable

After enabling or disabling the log, restart the daemons to propagate the new configuration.

When the log is enabled, bin/lock log will show information about recent locks:

phabricator/ $ ./bin/lock log
+======+============================================+=====================+=====================+======+===================================+======================================+
| Lock | Name                                       | Acquired            | Released            | Held | Parameters                        | Context                              |
+======+============================================+=====================+=====================+======+===================================+======================================+
| 2798 | ph:local-jrQG43Acc2.i:gc(coll-M7X9NoL0bQvI | Tue, Mar 6, 2:15 AM | Tue, Mar 6, 2:15 AM | +0   | collector=cache.general           | pid=97929, host=Orbital.local, argv= |
| 2799 | ph:local-jrQG43Acc2.i:gc(coll-qRWtIsymKjKf | Tue, Mar 6, 2:15 AM | Tue, Mar 6, 2:15 AM | +0   | collector=cache.markup            | pid=97929, host=Orbital.local, argv= |
...
| 2896 | ph:local-jrQG43Acc2.i:gc(coll-W7L0ZOgcc0zm | Tue, Mar 6, 2:22 AM | Tue, Mar 6, 2:22 AM | +0   | collector=system.destruction.logs | pid=98181, host=Orbital.local, argv= |
| 2897 | ph:local-jrQG43Acc2.i:calenda-sD4JjsyZLryp | Tue, Mar 6, 2:22 AM | Tue, Mar 6, 2:22 AM | +0   |                                   | pid=98181, host=Orbital.local, argv= |
+======+============================================+=====================+=====================+======+===================================+======================================+

If you know the name of the lock you want information about (for example, because you just received a LockException), you can use the --name argument to filter the log:

phabricator/ $ ./bin/lock log --name <lock-name>

In the output table:

  • Rows are written after a lock is acquired.
  • "Released" shows when the lock was released. If this column is empty, the lock was not explicitly released (and may still be held).
  • "Held" shows how many seconds the lock was held for.
  • "Parameters" contains additional information about exactly what the lock was locking.
  • "Context" contains additional information about the process and host which acquired the lock.