Page MenuHomePhabricator

Importing repositories may incorrectly queue some tasks at "PRIORITY_COMMIT"
Closed, ResolvedPublic

Description

When Phabricator imports a repository for the first time (and in some other cases), the intended behavior is that import tasks are queued at a very low priority ("PRIORITY_IMPORT", currently the lowest priority available).

In some cases, import tasks are being queued at "PRIORITY_COMMIT" instead. This can interfere with import of other active repositories.

  • See PHI1874. An install encountered import queue delays that led to tasks in the queue at the wrong priority.
  • See PHI1953. An install encountered this issue while importing the Linux repository.
  • See PHI1979. An install anticipates importing a large repository soon, and things would likely go more smoothly with this bug fixed.
  • See also PHI1935, a tangential request for "pause a repository import".

These tasks are almost certainly being queued by the setCloseFlagOnCommits(...) pathway in PhabricatorRepositoryRefEngine. It's not entirely clear how this is being reached.

A natural mechanism would be to begin repository import with some refs marked as non-permanent, then later mark them as permanent before import completes. Commits reachable from those refs would be re-queued by setCloseFlagOnCommits(...). However, there's no real reason to believe this occurred in PHI1874, PHI1953, or elsewhere.

Since a natural pathway exists anyway, a narrow fix to setCloseFlagOnCommits(...) is reasonable, but this may not be a complete fix. In particular, if a separate bug is causing publishable commits to be initially queued as unpublishable, that could result in repository imports which take ~2X longer to complete with no strong indicators that such a bug exists.


Possible actions:

  • The code pathway in setCloseFlagOnCommits(...) should be unified with the pathway in DiscoveryEngine->getImportTaskPriority().
  • Is a bug where commits double-import reproducible?
  • Does the Change step still need to occur for Git repositories?
  • Make sure all commit tasks are inserting with objectPHID populated.
  • Consider adding more source/context data to the message tasks (and possibly propagating it down the chain), since they now have multiple reachability pathways.
  • Add a containerPHID to tasks and populate it with the repository PHID.
  • Support bulk priority actions against tasks with particular container PHIDs.
  • Update the daemon UI to break tasks down by object/container.

Event Timeline

epriestley triaged this task as Normal priority.Jan 22 2021, 6:05 PM
epriestley created this task.
epriestley added a project: Daemons.

I'm hoping to land at least a narrow fix for this today to support an import in PHI1979 tomorrow. However, I'd like to have a better understanding of how we're reaching this state, and I'm not satisfied that these repositories are going down the "natural" pathway (of changing ref definitions after the import starts) and suspect there is some more complicated interaction at play here.

Thus, I'm starting by attempting to reproduce this locally. An initial barrier to getting a short reproducibility loop here is that bin/repository discover has a --repair flag (from a long, long time ago in D2850), but it does not currently function correctly. The intended behavior of this flag is to rediscover all commits in the repository, to allow you to repair a repository in cases where you have (for example) run bin/remove destroy rXYZabc by mistake and destroyed a commit object.

The --repair flag doesn't work because discoverGitCommits() (and discoverMercurialCommits()) fills the cache with the head refs explicitly. This is an optimization from D8781.

Moving the test for repair mode into fillCommitCache() seems to be sufficient and non-harmful. No callers appear to interact with this cache in a surprising way or do anything unusual, the change in D8781 should just have been slightly more aggressive in lifting the body of fillCommitCache() out of isKnownCommit().

However, I'd like to have a better understanding of how we're reaching this state, and I'm not satisfied that these repositories are going down the "natural" pathway (of changing ref definitions after the import starts) and suspect there is some more complicated interaction at play here.

As it turns out, this yielded easily. While importing, PhabricatorRepositoryPublisher always returns false from shouldPublishRef() because the repository is importing.

So every commit is initially discovered as non-closable, and then all closable commits (usually "every commit") are later marked closable.

This arises from a confusion of concerns: the "closable" flag effectively has two separate meanings:

  • Is this commit reachable from a permanent ref?
  • Should this commit be published?

The meaning of this flag should only be "is this commit reachable from a permanent ref". To effect this change, I'm going to:

  • Rename IMPORTED_CLOSEABLE to IMPORTED_PERMANENT or similar, to make it more clear what this flag represents in modern Phabricator.
  • Make the DiscoveryEngine set this flag on commits reachable from a permanent ref even if they are discovered in an importing repository.

After that, I'll fix the "natural" (but hard to hit) pathway where you change permanent refs during import.

This also relates slightly to T13580, but I believe the two issues are addressable independently.

I have a change to add containerPHID locally, but it ends up having relatively high complexity because several other patches (including 20190909.herald.01.rebuild.php) call PhabricatorRebuildIndexesWorker::rebuildObjectsWithQuery(...), which does not work if executed in sequence prior to a worker queue schema change.

There's a mechanism for sequencing patches, but "autopatches" do not have access to it because nothing has needed it for many years.

The sequencing mechanism is also currently simple ("patch Y after patch X") but a better capability would be marking these patches as "apply after all schema changes". This gets somewhat messy.

Given my intent to promote these changes today and deploy them tomorrow, I'm going to leave this alone for now.

These parts seem likely resolved once I convince myself the patches so far actually work:

The code pathway in setCloseFlagOnCommits(...) should be unified with the pathway in DiscoveryEngine->getImportTaskPriority().

Done, in D21516.

Is a bug where commits double-import reproducible?

Yes, trivially.

Does the Change step still need to occur for Git repositories?

Maybe, maybe not. But it's not trivial to remove and thus probably not a good fit for scope here.

Make sure all commit tasks are inserting with objectPHID populated.

Done, in D21516.

Consider adding more source/context data to the message tasks (and possibly propagating it down the chain), since they now have multiple reachability pathways.

Done, in D21516.


Still open:

  • Add a containerPHID to tasks and populate it with the repository PHID.

Not in scope for this release because of database migration sequencing issues above.

  • Support bulk priority actions against tasks with particular container PHIDs.
  • Update the daemon UI to break tasks down by object/container.

Dependent on containerPHID.

After D21518:

  • Ran bin/worker execute --active to process all Message tasks (21m 26s).
  • Ran bin/worker execute --active to process all Change tasks (29m 29s).
  • Ran bin/worker execute --active to process all Publish tasks (4m 27s).
  • Ran bin/repository update ... to finish the import, got a correct-looking repository.

Update the daemon UI to break tasks down by object/container.

Since T13561 already discusses other changes related to the task queue that will require UI adjustments, I just kicked this to that task.

I think everything else here is reasonably settled.