Page MenuHomePhabricator

Allow Diffusion repositories to disable "push" Herald Rules to improve support for importing large repositories by pushing them
Open, Needs TriagePublic

Description

When importing a super-mega-ultra-huge repository into Phabricator, while the import chugs along, it is incredibly slow in two distinct ways:

  1. The initial git push seemingly hangs at the end of transferring all the objects - but it is not hung, it is simply waiting on commit_hooks.php to run over every single commit first before finally accepting the push (because Herald rules can reject a push as well as a regular hook).
  2. The actual import of all the commits after the hooks have finished takes an absurdly long time. At 533,000 commits, a rate of 1000 commits per hour means it will take nearly 3 weeks to finish the repository import.

Note that you *will not* experience this unless you have a really big repository. Even GHC, the Glasgow Haskell Compiler, can import in less than 6 hours, and it has ~45k commits and ~400kLOC. In my case, I'm importing the Linux kernel for some changes (533,000 commits, over 15-16mil LOC at this point), so Phabricator can manage the private tree and push to mirrors.

As @epriestley noted on IRC, it would probably substantially improve both of these pain points if a particular repository could have Herald rules disabled, so that the hooks and import scripts can completely skip these queries for every single commit. Of course, Linux is still going to be slow - the git repository is so huge some operations can take noticeable seconds anyway, so some slowdown at this scale is probably unavoidable.

Also, this is a one time occurrence (it's pretty unlikely it'll be this much of a problem later, as a few thousand commits here or there should go through at least within a reasonable timeframe), but it particularly hurts on the initial import. I may actually want Herald rules on later, but I probably don't want them on an original clean import.

I'll also note some of the numbers might be off; in particular my machine is running btrfs which may be particularly slow on this workload. So I may kill it and try again on a new filesystem and report back.

Here are some steps to reproduce:

  1. Start up a VM, make a brand new Phabricator instance.
  2. Create a new, hosted repository.
  3. Push the Linux kernel into the new repository
  4. Go play a video game, since git push will take about 4-5 hours for commit_hooks.php to finish.
  5. After that's done, go ahead and take a vacation for 2-3 weeks. Hawaii is pretty sweet.
  6. OPTIONAL Return to work and realize you now have to git push several thousand more changes back in.

Event Timeline

thoughtpolice raised the priority of this task from to Needs Triage.
thoughtpolice updated the task description. (Show Details)
thoughtpolice added projects: Herald, Diffusion.
eadler added a project: Restricted Project.Jan 8 2016, 10:35 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jan 8 2016, 10:40 PM

Hasn't this been done for a long time? In the current form on the page diffusion/edit/.../page/actions/ ?

Disabling Herald there only disables the post-push, at-discovery-time rules, not the pre-push pre-receive-commit-hook rules.

(Although we do formally support and have documentation for the the "import, then make hosted" and "copy the whole working directory into place" methods of repository import now, both of which skip the pre-receive rules.)

eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:06 PM

The fix for T13031 may have made this worse by default, since we now extract all pushed changesets and hold them in memory by default, even if no Herald content-based rules which use them actually exist. This may create too much memory pressure if you push a very large number of commits.

Toggling the repository into "Allow Enormous Changes" mode will stop this from happening. This doesn't address the broader issue; Herald will still activate on these changes.

@epriestley Would you guys be open for a patch for this? How would you like to see this implemented in order to be acceptable for you guys?

We needed to fork some Android repos (2,5 GB with 250K commits) which track upstream and also ran in to this issue.

Sorry, the only way to influence feature development is through a support pact. See also Planning.

For almost all changes, the majority of the cost of the change is in design, testing, and maintenance. Usually, only a very small part of the cost of a change is in actually writing the patch, so offering to write the patch rarely decreases the total cost of a change (and can even increase it). In this case, I think it would probably take me longer to help you write the patch than to write the patch myself -- and in either case, I'm working on this instead of something a paying customer has asked us for.

As the old joke goes:

graphic-designer-price-list-client-helps-digital-synopsis-2.jpg (911×700 px, 93 KB)

epriestley renamed this task from Allow Diffusion repositories to disable Herald Rules to Allow Diffusion repositories to disable "push" Herald Rules to improve support for importing large repositories by pushing them.Apr 15 2019, 4:03 PM