Page MenuHomePhabricator

Large HTTP repository pushes may leave clustered repository write lock held
Closed, ResolvedPublic

Description

See PHI1980.

A user pushing a large change over HTTP from Windows was able to put a clustered repository into an incorrect lock-held state at least twice.

The log has sets of these errors, although I think these are likely symptoms rather than causes:

[2021-01-21 17:04:44] EXCEPTION: (Exception) An previous write to this repository was interrupted; refusing new writes. This issue requires operator intervention to resolve, see "Write Interruptions" in the "Cluster: Repositories" in the documentation for instructions. at [<phabricator>/src/applications/diffusion/protocol/DiffusionRepositoryClusterEngine.php:374]
arcanist(head=stable, ref.master=fad85844314b, ref.stable=ac54d61d7af2), libcore(), phabricator(head=stable, ref.stable=86ad69863930), services(head=stable, ref.master=b5cef1ac31ff, ref.stable=a2c67587b5df)
  #0 <#2> DiffusionRepositoryClusterEngine::synchronizeWorkingCopyBeforeWrite() called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:589]
  #1 <#2> DiffusionServeController::serveGitRequest(PhabricatorRepository, PhabricatorUser) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:462]
  #2 <#2> DiffusionServeController::serveVCSRequest(PhabricatorRepository, PhabricatorUser) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:395]
  #3 <#2> DiffusionServeController::serveRequest(AphrontRequest) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:99]
  #4 <#2> DiffusionServeController::handleRequest(AphrontRequest) called at [<phabricator>/src/aphront/AphrontController.php:71]
  #5 <#2> AphrontController::delegateToController(DiffusionServeController) called at [<phabricator>/src/applications/diffusion/controller/DiffusionController.php:27]
  #6 <#2> DiffusionController::willBeginExecution() called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:275]
  #7 phlog(Exception) called at [<phabricator>/src/aphront/handler/PhabricatorDefaultRequestExceptionHandler.php:41]
  #8 PhabricatorDefaultRequestExceptionHandler::handleRequestThrowable(AphrontRequest, Exception) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:752]
  #9 AphrontApplicationConfiguration::handleThrowable(Exception) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:300]
  #10 AphrontApplicationConfiguration::processRequest(AphrontRequest, PhutilDeferredLog, AphrontPHPHTTPSink, MultimeterControl) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:211]
  #11 AphrontApplicationConfiguration::runHTTPRequest(AphrontPHPHTTPSink) called at [<phabricator>/webroot/index.php:35]
[2021-01-21 17:04:44] EXCEPTION: (PhutilAggregateException) Encountered a processing exception, then another exception when trying to build a response for the first exception.\n    - PhabricatorDataNotAttachedException: Attempting to access attached data on PhabricatorUser (via getUserSetting()), but the data is not actually attached. Before accessing attachable data on an object, you must load and attach it.\n      \n      Data is normally attached by calling the corresponding needX() method on the Query class when the object is loaded. You can also call the corresponding attachX() method explicitly.\n    - Exception: An previous write to this repository was interrupted; refusing new writes. This issue requires operator intervention to resolve, see "Write Interruptions" in the "Cluster: Repositories" in the documentation for instructions. at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:320]
arcanist(head=stable, ref.master=fad85844314b, ref.stable=ac54d61d7af2), libcore(), phabricator(head=stable, ref.stable=86ad69863930), services(head=stable, ref.master=b5cef1ac31ff, ref.stable=a2c67587b5df)
  #0 <#4> DiffusionRepositoryClusterEngine::synchronizeWorkingCopyBeforeWrite() called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:589]
  #1 <#4> DiffusionServeController::serveGitRequest(PhabricatorRepository, PhabricatorUser) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:462]
  #2 <#4> DiffusionServeController::serveVCSRequest(PhabricatorRepository, PhabricatorUser) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:395]
  #3 <#4> DiffusionServeController::serveRequest(AphrontRequest) called at [<phabricator>/src/applications/diffusion/controller/DiffusionServeController.php:99]
  #4 <#4> DiffusionServeController::handleRequest(AphrontRequest) called at [<phabricator>/src/aphront/AphrontController.php:71]
  #5 <#4> AphrontController::delegateToController(DiffusionServeController) called at [<phabricator>/src/applications/diffusion/controller/DiffusionController.php:27]
  #6 <#4> DiffusionController::willBeginExecution() called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:275]
  #7 <#3> PhabricatorUser::requireCacheData(string) called at [<phabricator>/src/applications/people/storage/PhabricatorUser.php:375]
  #8 <#3> PhabricatorUser::getUserSetting(string) called at [<phabricator>/src/applications/people/storage/PhabricatorUser.php:432]
  #9 <#3> PhabricatorUser::getTimezoneIdentifier() called at [<phabricator>/src/applications/people/storage/PhabricatorUser.php:565]
  #10 <#3> PhabricatorUser::getTimeZone() called at [<phabricator>/src/applications/people/storage/PhabricatorUser.php:569]
  #11 <#3> PhabricatorUser::getTimeZoneOffset() called at [<phabricator>/src/view/page/PhabricatorStandardPageView.php:247]
  #12 <#3> PhabricatorStandardPageView::willRenderPage() called at [<phabricator>/src/view/page/AphrontPageView.php:46]
  #13 <#3> AphrontPageView::render() called at [<phabricator>/src/applications/base/controller/PhabricatorController.php:286]
  #14 <#3> PhabricatorController::willSendResponse(AphrontDialogResponse) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:304]
  #15 <#2> AphrontApplicationConfiguration::processRequest(AphrontRequest, PhutilDeferredLog, AphrontPHPHTTPSink, MultimeterControl) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:211]
  #16 <#2> AphrontApplicationConfiguration::runHTTPRequest(AphrontPHPHTTPSink) called at [<phabricator>/webroot/index.php:35]
  #17 phlog(PhutilAggregateException) called at [<phabricator>/src/aphront/response/AphrontUnhandledExceptionResponse.php:32]
  #18 AphrontUnhandledExceptionResponse::setException(PhutilAggregateException) called at [<phabricator>/webroot/index.php:46]

This is probably the cause:

[Thu Jan 21 00:52:23.597848 2021] [:error] [pid 3595] [client 172.30.0.14:43024] >>> UNRECOVERABLE FATAL ERROR <<<\n\nMaximum execution time of 30 seconds exceeded\n\n/core/lib/arcanist/src/utils/PhutilRope.php:133

We currently set_time_limit(0) before processing LFS requests, but not before processing normal writes. Remedies are likely:

  • Fix the handling of the lock-held exception.
  • Disengage the time limit before entering the write-locked repository section.
  • The repository UI could be more clear that a repository currently has a long-held write lock.

Event Timeline

epriestley triaged this task as Normal priority.Jan 21 2021, 6:49 PM
epriestley created this task.

The affected user in PHI1980 reported that this appeared to be effective in resolving the issue.