Reduce initial discovery from O(branches * commits) to O(commits)
Summary:
Fixes T4414. Currently, when we discover a new repository, we do something like this:
foreach (branch) { foreach (commit on this branch) { do_something(); } }
In cases where there are a lot of branches which mostly just branch master, this leads to us doing roughly O(branches * commits) work.
We have a commitCache to prevent this, but it has two problems:
- It only fills out of the DB, and we do this whole thing before writing to the DB, which is the entire point.
- It has a fixed size (2048) and on initial discovery we're usually dealing with way more commits than that.
Instead, also stop doing work if we hit a commit which is known already.
Test Plan:
- Added print on the number of discovered refs and number of unique refs.
- Ran bin/repository discover --repair X on a repo with several branches.
- Before the patch, got 397 refs and 135 unique refs.
- After the patch, got 135 refs and 135 unique refs.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4414
Differential Revision: https://secure.phabricator.com/D8374