Page MenuHomePhabricator

Give the commitownersparser a little more time
ClosedPublic

Authored by sowedance on Apr 14 2014, 10:20 PM.
Tags
None
Referenced Files
F14392546: D8773.id20818.diff
Sat, Dec 21, 10:23 PM
Unknown Object (File)
Thu, Dec 19, 1:03 PM
Unknown Object (File)
Thu, Dec 19, 10:55 AM
Unknown Object (File)
Mon, Dec 16, 1:48 AM
Unknown Object (File)
Fri, Dec 13, 12:29 AM
Unknown Object (File)
Wed, Dec 11, 5:12 AM
Unknown Object (File)
Tue, Dec 10, 7:06 AM
Unknown Object (File)
Mon, Dec 9, 3:15 AM
Subscribers

Details

Summary

Recently we see issues with huge commits (branch cuts for www) where people received hundreds of emails for the same commit. By checking all the active and archived tasks related to such commits, I saw the following pattern:

  • The commit itself is marked as importStatus = 15 which means all the processing was actually done;
  • In archived tasks, I see one PhabricatorRepositorySvnCommitMessageParserWorker, one PhabricatorRepositorySvnCommitChangeParserWorker, followed by many PhabricatorRepositoryCommitHeraldWorker, which means that the PhabricatorRepositoryCommitOwnersWorker (who schedule those herald tasks) was never done;
  • PhabricatorRepositoryCommitOwnersWorker is always active (for days) with failureCount = 0;
  • In daemon log I see a lot of lease expire exception for PhabricatorRepositoryCommitOwnersWorker.

So to me it looks like the following happened:

  • Everything is fine until we schedule the PhabricatorRepositoryCommitOwnersWorker
  • PhabricatorRepositoryCommitOwnersWorker actually successfully finished but its running time exceed 60s. Before it finishes, it scheduled the PhabricatorRepositoryCommitHeraldWorker task
  • When we try to archive it, the lease expiration exception happened. As a result, it stayed active and will be picked up immediately since it is in the head of the queue
  • The two steps above repeat forever until we kill it

I am not sure why we want to check lease expiration when we are archiving the task. For now I am giving the worker a little more time since parsing half million affected path needs some time..

Test Plan

Patched in our production and it worked.

Diff Detail

Repository
rP Phabricator
Branch
hugecommit
Lint
Lint Passed
Unit
Tests Skipped

Event Timeline

sowedance retitled this revision from to Give the commitownersparser a little more time.
sowedance updated this object.
sowedance edited the test plan for this revision. (Show Details)
sowedance edited edge metadata.

Add a little checking into the parser workers. Just return in case the commit is already imported to that level.

epriestley edited edge metadata.

Nice work!

I am not sure why we want to check lease expiration when we are archiving the task.

If we just archive the task, another worker may be working on it already, and it would fail when it tried to archive the task. This would lead to an error far away from the original error.

Particularly, this should have been logging an error every time it expired.

That said, the behavior you describe isn't very good. Let me think about ways to improve it. An immediate, indirect fix we can apply is to move followup task scheduling to be conditional on success.

We should revert the isPartiallyImported() checks. They will make these parsers exit immediately when re-run explicitly, which will make debugging much harder.

src/applications/repository/worker/PhabricatorRepositoryCommitHeraldWorker.php
15–18 ↗(On Diff #20816)

We should revert this change, it prevents debugging with scripts/repository/reparse.php --herald.

This revision now requires changes to proceed.Apr 14 2014, 10:41 PM
sowedance edited edge metadata.

Updated based on comments.

This revision is now accepted and ready to land.Apr 14 2014, 10:51 PM
epriestley updated this revision to Diff 20818.

Closed by commit rP6a4f12600034 (authored by @sowedance, committed by @epriestley).