See PHI1980.
A user pushing a large change over HTTP from Windows was able to put a clustered repository into an incorrect lock-held state at least twice.
The log has sets of these errors, although I think these are likely symptoms rather than causes:
[2021-01-21 17:04:44] EXCEPTION: (Exception) An previous write to this repository was interrupted; refusing new writes. This issue requires operator intervention to resolve, see "Write Interruptions" in the "Cluster: Repositories" in the documentation for instructions. at [<phabricator>/src/applications/diffusion/protocol/DiffusionRepositoryClusterEngine.php:374] arcanist(head=stable, ref.master=fad85844314b, ref.stable=ac54d61d7af2), libcore(), phabricator(head=stable, ref.stable=86ad69863930), services(head=stable, ref.master=b5cef1ac31ff, ref.stable=a2c67587b5df) #0 <#2> DiffusionRepositoryClusterEngine::synchronizeWorkingCopyBeforeWrite() called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:589] #1 <#2> DiffusionServeController::serveGitRequest(PhabricatorRepository, PhabricatorUser) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:462] #2 <#2> DiffusionServeController::serveVCSRequest(PhabricatorRepository, PhabricatorUser) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:395] #3 <#2> DiffusionServeController::serveRequest(AphrontRequest) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:99] #4 <#2> DiffusionServeController::handleRequest(AphrontRequest) called at [<phabricator>/src/aphront/AphrontController.php:71] #5 <#2> AphrontController::delegateToController(DiffusionServeController) called at [<phabricator>/src/applications/diffusion/controller/DiffusionController.php:27] #6 <#2> DiffusionController::willBeginExecution() called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:275] #7 phlog(Exception) called at [<phabricator>/src/aphront/handler/PhabricatorDefaultRequestExceptionHandler.php:41] #8 PhabricatorDefaultRequestExceptionHandler::handleRequestThrowable(AphrontRequest, Exception) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:752] #9 AphrontApplicationConfiguration::handleThrowable(Exception) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:300] #10 AphrontApplicationConfiguration::processRequest(AphrontRequest, PhutilDeferredLog, AphrontPHPHTTPSink, MultimeterControl) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:211] #11 AphrontApplicationConfiguration::runHTTPRequest(AphrontPHPHTTPSink) called at [<phabricator>/webroot/index.php:35] [2021-01-21 17:04:44] EXCEPTION: (PhutilAggregateException) Encountered a processing exception, then another exception when trying to build a response for the first exception.\n - PhabricatorDataNotAttachedException: Attempting to access attached data on PhabricatorUser (via getUserSetting()), but the data is not actually attached. Before accessing attachable data on an object, you must load and attach it.\n \n Data is normally attached by calling the corresponding needX() method on the Query class when the object is loaded. You can also call the corresponding attachX() method explicitly.\n - Exception: An previous write to this repository was interrupted; refusing new writes. This issue requires operator intervention to resolve, see "Write Interruptions" in the "Cluster: Repositories" in the documentation for instructions. at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:320] arcanist(head=stable, ref.master=fad85844314b, ref.stable=ac54d61d7af2), libcore(), phabricator(head=stable, ref.stable=86ad69863930), services(head=stable, ref.master=b5cef1ac31ff, ref.stable=a2c67587b5df) #0 <#4> DiffusionRepositoryClusterEngine::synchronizeWorkingCopyBeforeWrite() called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:589] #1 <#4> DiffusionServeController::serveGitRequest(PhabricatorRepository, PhabricatorUser) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:462] #2 <#4> DiffusionServeController::serveVCSRequest(PhabricatorRepository, PhabricatorUser) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:395] #3 <#4> DiffusionServeController::serveRequest(AphrontRequest) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:99] #4 <#4> DiffusionServeController::handleRequest(AphrontRequest) called at [<phabricator>/src/aphront/AphrontController.php:71] #5 <#4> AphrontController::delegateToController(DiffusionServeController) called at [<phabricator>/src/applications/diffusion/controller/DiffusionController.php:27] #6 <#4> DiffusionController::willBeginExecution() called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:275] #7 <#3> PhabricatorUser::requireCacheData(string) called at [<phabricator>/src/applications/people/storage/PhabricatorUser.php:375] #8 <#3> PhabricatorUser::getUserSetting(string) called at [<phabricator>/src/applications/people/storage/PhabricatorUser.php:432] #9 <#3> PhabricatorUser::getTimezoneIdentifier() called at [<phabricator>/src/applications/people/storage/PhabricatorUser.php:565] #10 <#3> PhabricatorUser::getTimeZone() called at [<phabricator>/src/applications/people/storage/PhabricatorUser.php:569] #11 <#3> PhabricatorUser::getTimeZoneOffset() called at [<phabricator>/src/view/page/PhabricatorStandardPageView.php:247] #12 <#3> PhabricatorStandardPageView::willRenderPage() called at [<phabricator>/src/view/page/AphrontPageView.php:46] #13 <#3> AphrontPageView::render() called at [<phabricator>/src/applications/base/controller/PhabricatorController.php:286] #14 <#3> PhabricatorController::willSendResponse(AphrontDialogResponse) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:304] #15 <#2> AphrontApplicationConfiguration::processRequest(AphrontRequest, PhutilDeferredLog, AphrontPHPHTTPSink, MultimeterControl) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:211] #16 <#2> AphrontApplicationConfiguration::runHTTPRequest(AphrontPHPHTTPSink) called at [<phabricator>/webroot/index.php:35] #17 phlog(PhutilAggregateException) called at [<phabricator>/src/aphront/response/AphrontUnhandledExceptionResponse.php:32] #18 AphrontUnhandledExceptionResponse::setException(PhutilAggregateException) called at [<phabricator>/webroot/index.php:46]
This is probably the cause:
[Thu Jan 21 00:52:23.597848 2021] [:error] [pid 3595] [client 172.30.0.14:43024] >>> UNRECOVERABLE FATAL ERROR <<<\n\nMaximum execution time of 30 seconds exceeded\n\n/core/lib/arcanist/src/utils/PhutilRope.php:133
We currently set_time_limit(0) before processing LFS requests, but not before processing normal writes. Remedies are likely:
- Fix the handling of the lock-held exception.
- Disengage the time limit before entering the write-locked repository section.
- The repository UI could be more clear that a repository currently has a long-held write lock.