Page MenuHomePhabricator

Reduce initial discovery from O(branches * commits) to O(commits)
ClosedPublic

Authored by epriestley on Feb 28 2014, 8:23 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 18, 7:30 AM
Unknown Object (File)
Fri, Jan 17, 2:19 AM
Unknown Object (File)
Fri, Jan 17, 1:21 AM
Unknown Object (File)
Wed, Jan 8, 7:28 PM
Unknown Object (File)
Tue, Dec 24, 12:09 PM
Unknown Object (File)
Sat, Dec 21, 5:36 PM
Unknown Object (File)
Sat, Dec 21, 3:00 PM
Unknown Object (File)
Dec 20 2024, 6:12 AM
Subscribers
Tokens
"Mountain of Wealth" token, awarded by btrahan.

Details

Summary

Fixes T4414. Currently, when we discover a new repository, we do something like this:

foreach (branch) {
  foreach (commit on this branch) {
    do_something();
  }
}

In cases where there are a lot of branches which mostly just branch master, this leads to us doing roughly O(branches * commits) work.

We have a commitCache to prevent this, but it has two problems:

  • It only fills out of the DB, and we do this whole thing before writing to the DB, which is the entire point.
  • It has a fixed size (2048) and on initial discovery we're usually dealing with way more commits than that.

Instead, also stop doing work if we hit a commit which is known already.

Test Plan
  • Added print on the number of discovered refs and number of unique refs.
  • Ran bin/repository discover --repair X on a repo with several branches.
  • Before the patch, got 397 refs and 135 unique refs.
  • After the patch, got 135 refs and 135 unique refs.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

btrahan edited edge metadata.
This revision is now accepted and ready to land.Feb 28 2014, 8:58 PM

"Mountain of wealth" because I think this is a surprisingly important diff to help folks who want to use Phabricator for repos that were / are active on github. :D

Yeah -- I think this worked better before, but that I accidentally broke this at some point during all the cleanup / refactoring from about a month ago. The foreach (branch) might have happened outside of the discovery engine a while ago, maybe? It's been cropping up fairly often recently and I'd never seen it before one report immediately before T4414, so I think this is a recent-ish regression. It would have been easy for me to miss in refactoring, since I'm usually focused on correctness more than asymptotic runtime.