Adding big existing repository not successful
Closed, Resolved · Public

Description

We have a huge HG repo (it takes 24G of disk space and has a couple hundred branches) that I'm not able to successfully add to Phabricator. I get the following error after a couple of hours:

[2014-01-08 20:05:29] EXCEPTION: (PhutilProxyException) Error while fetching changes to the 'M' repository. {>} (Exception) Failed to proc_open(): proc_open(): fork failed - Cannot allocate memory {>} (Exception) Failed to proc_open(): proc_open(): fork failed - Cannot allocate memory at [/home/sadmin/phabricator/libphutil/src/future/exec/ExecFuture.php:649]
  #0 phlog(Object PhutilProxyException) called at [/home/sadmin/phabricator/phabricator/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:181]
  #1 PhabricatorRepositoryPullLocalDaemon::run() called at [/home/sadmin/phabricator/libphutil/src/daemon/PhutilDaemon.php:85]
  #2 PhutilDaemon::execute() called at [/home/sadmin/phabricator/libphutil/scripts/daemon/exec/exec_daemon.php:112]

I've also checked, and it seems the import (stuck at 0%) never started, so I presume the failure happens during discovery. I ran discovery with verbose output, and it seems to iterate over all branches even though I selected only one to be tracked (it's also unsuccessful, ending with the same error).

I have the latest version of Phabricator (updated on 2014-01-08).

There's only 4G of memory on my machine; is there a requirement that physical memory match the size of the repo?

Event Timeline

artek raised the priority of this task from to Needs Triage.
artek updated the task description.
artek added a project: Diffusion.
artek added a subscriber: artek.

How many total commits does the repository contain?

You could try stopping the daemons and running this, although it may not work much better (but the output might give us more information):

phabricator/ $ ./bin/repository discover M --trace

When you initially import a repository, we load the entire commit graph into memory so we can process it in a resumable fashion. Currently, we assume it will fit, but if you have >50M commits it might not. We might create other types of memory pressure which lowers the effective ceiling, too; the largest repositories we've run into in practice so far are in the 10M commit range.
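
As a rough illustration of where the memory goes (a sketch, not the actual discovery code), the graph stream reads something like git log --format='%H%x01%P%x01%ct' and keeps one record per commit:

  <?php

  // Sketch only, not the real Phabricator code: shows why discovery
  // memory scales with the number of commits. Each output line is
  // "<hash>\x01<parent hashes>\x01<commit epoch>".
  $pipe = popen("git log --format='%H%x01%P%x01%ct' HEAD --", 'r');

  $graph = array();
  while (($line = fgets($pipe)) !== false) {
    list($hash, $parents, $epoch) = explode("\x01", rtrim($line, "\n"), 3);
    $graph[$hash] = array(
      // A root commit has no parents; a merge commit has several.
      'parents' => strlen($parents) ? explode(' ', $parents) : array(),
      'epoch'   => (int)$epoch,
    );
  }
  pclose($pipe);

  // One array entry per commit: at tens of millions of commits,
  // $graph alone can exhaust several GB of PHP heap.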

That said, I'm surprised by the exact nature of this error. So let's do this:

  • Try running the repository discover command above, and if that fails, give me the last few hundred lines of output.
  • Hopefully that will give me a better idea about exactly what's going on.
  • If the root cause is memory pressure from the size of the commit graph, it should be easy to resolve.

Generally, once the initial import completes we don't perform any operations which depend on the entire repository, so I'd anticipate that we'll be OK if we can get past this.

Thanks for the quick response. There are slightly over 90k changesets in the repo (there are also a couple of SVN subrepos, if that makes any difference). I ran the discovery command you suggested; it's generally a bunch of messages like this:

>>> [2120852] <query> SELECT * FROM `repository_commit` WHERE repositoryID = 18 AND commitIdentifier = '6325fbd6a86c3268999f6d3c45d20860c32b5255'
<<< [2120852] <query> 231 us

Memory usage of the PHP process is steadily growing, so it will probably fail once all memory is exhausted. When it does, I'll post the last lines.

Hmm, I'm surprised we're hitting issues at only 90K changesets. But yeah, let me know where it fails. Thanks!

I checked the source code (discoverMercurialCommits). In my case (the repo was never successfully imported), it seems commitCache will only store the commits at the tips of branches, so all the duplicated commits from unmerged branches (ones that share a common ancestor), and apparently also from inactive branches that are processed before the branch they were merged into, will be held in memory (I guess in refs[]). That might be my problem: with 300 branches and 90k commits I can easily get into the millions. Also, it seems shouldTrackBranch returns false only for Git repos; removing this constraint might actually help me, since in the end I'm only interested in one branch. I'll try it in my local copy of Phabricator. Is there any danger (data corruption, instability) in doing so?

discover failed, and here are the last lines:

EXCEPTION: (Exception) Failed to proc_open(): proc_open(): fork failed - Cannot allocate memory {>} (Exception) Failed to proc_open(): proc_open(): fork failed - Cannot allocate memory at [/home/sadmin/phabricator/libphutil/src/future/exec/ExecFuture.php:649]
  #0 ExecFuture::isReady() called at [/home/sadmin/phabricator/libphutil/src/filesystem/linesofalarge/LinesOfALargeExecFuture.php:103]
  #1 LinesOfALargeExecFuture::readMore() called at [/home/sadmin/phabricator/libphutil/src/filesystem/linesofalarge/LinesOfALarge.php:186]
  #2 LinesOfALarge::next() called at [/home/sadmin/phabricator/libphutil/src/filesystem/linesofalarge/LinesOfALarge.php:107]
  #3 LinesOfALarge::rewind() called at [/home/sadmin/phabricator/phabricator/src/applications/repository/daemon/PhabricatorMercurialGraphStream.php:26]
  #4 PhabricatorMercurialGraphStream::__construct(Object PhabricatorRepository) called at [/home/sadmin/phabricator/phabricator/src/applications/repository/engine/PhabricatorRepositoryDiscoveryEngine.php:184]
  #5 PhabricatorRepositoryDiscoveryEngine::discoverMercurialAncestry(Object PhabricatorRepository, 5d6c5d5a8434e649d92a59c8849c20cb0d505804) called at [/home/sadmin/phabricator/phabricator/src/applications/repository/engine/PhabricatorRepositoryDiscoveryEngine.php:166]
  #6 PhabricatorRepositoryDiscoveryEngine::discoverMercurialCommits() called at [/home/sadmin/phabricator/phabricator/src/applications/repository/engine/PhabricatorRepositoryDiscoveryEngine.php:44]
  #7 PhabricatorRepositoryDiscoveryEngine::discoverCommits() called at [/home/sadmin/phabricator/phabricator/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php:236]
  #8 PhabricatorRepositoryPullLocalDaemon::discoverRepository(Object PhabricatorRepository) called at [/home/sadmin/phabricator/phabricator/src/applications/repository/management/PhabricatorRepositoryManagementDiscoverWorkflow.php:44]
  #9 PhabricatorRepositoryManagementDiscoverWorkflow::execute(Object PhutilArgumentParser) called at [/home/sadmin/phabricator/libphutil/src/parser/argument/PhutilArgumentParser.php:396]
  #10 PhutilArgumentParser::parseWorkflowsFull(Array of size 9 starting with: { PhabricatorRepositoryManagementDeleteWorkflow => Object PhabricatorRepositoryManagementDeleteWorkflow }) called at [/home/sadmin/phabricator/libphutil/src/parser/argument/PhutilArgumentParser.php:292]
  #11 PhutilArgumentParser::parseWorkflows(Array of size 9 starting with: { PhabricatorRepositoryManagementDeleteWorkflow => Object PhabricatorRepositoryManagementDeleteWorkflow }) called at [/home/sadmin/phabricator/phabricator/scripts/repository/manage_repositories.php:22]
artek claimed this task.

I made a dirty change in my local copy to allow branch filtering for Mercurial as well, and discovery finally passed. The repo started importing; issue solved :)
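
Roughly, the change looks like this (a sketch from my local copy; the exact method on PhabricatorRepository and the constant names may differ in current Phabricator):

  // Upstream only applies the configured branch filter for Git, so
  // Mercurial discovery walks every branch. Sketch of my dirty change:
  public function shouldTrackBranch($branch) {
    $vcs = $this->getVersionControlSystem();
    $is_git = ($vcs == PhabricatorRepositoryType::REPOSITORY_TYPE_GIT);

    // Before: $use_filter = ($is_git);
    // After: apply the branch filter to Mercurial repositories too.
    $use_filter = true;

    if ($use_filter) {
      $filter = $this->getDetail('branch-filter', array());
      if ($filter && empty($filter[$branch])) {
        return false;
      }
    }

    return true;
  }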

I have the same problem: 30K commits and about 400 branches. I only have 8GB RAM; I configured PHP to allow 8GB and watched phd grow to 8GB and then fail. Here's the log:

>>> [6100346] <query> SELECT * FROM `repository_commit` WHERE repositoryID = 2 AND commitIdentifier = '6fd842d0bb48ec712d49b3011c3d0c80f4d89c8c'
<<< [6100346] <query> 296 us
>>> [6100347] <query> SELECT * FROM `repository_commit` WHERE repositoryID = 2 AND commitIdentifier = 'd51b9737a4872ede46755be27800475135f748ed'
<<< [6100347] <query> 569 us
>>> [6100348] <exec> $ git log --format='%H%x01%P%x01%ct' 'd51b9737a4872ede46755be27800475135f748ed' --
>>> [6100349] <exec> $ git log --format='%H%x01%P%x01%ct' 'd51b9737a4872ede46755be27800475135f748ed' --
[2014-02-12 08:49:25] EXCEPTION: (Exception) Failed to proc_open(): proc_open(): fork failed - Cannot allocate memory {>} (Exception) Failed to proc_open(): proc_open(): fork failed - Cannot allocate memory at [/home/ubuntu/libphutil/src/future/exec/ExecFuture.php:649]
  #0 ExecFuture::isReady() called at [/home/ubuntu/libphutil/src/filesystem/linesofalarge/LinesOfALargeExecFuture.php:103]
  #1 LinesOfALargeExecFuture::readMore() called at [/home/ubuntu/libphutil/src/filesystem/linesofalarge/LinesOfALarge.php:186]
  #2 LinesOfALarge::next() called at [/home/ubuntu/libphutil/src/filesystem/linesofalarge/LinesOfALarge.php:107]
  #3 LinesOfALarge::rewind() called at [/home/ubuntu/phabricator/src/applications/repository/daemon/PhabricatorGitGraphStream.php:25]
  #4 PhabricatorGitGraphStream::__construct(Object PhabricatorRepository, d51b9737a4872ede46755be27800475135f748ed) called at [/home/ubuntu/phabricator/src/applications/repository/engine/PhabricatorRepositoryDiscoveryEngine.php:117]
  #5 PhabricatorRepositoryDiscoveryEngine::discoverGitCommits() called at [/home/ubuntu/phabricator/src/applications/repository/engine/PhabricatorRepositoryDiscoveryEngine.php:47]
  #6 PhabricatorRepositoryDiscoveryEngine::discoverCommits() called at [/home/ubuntu/phabricator/src/applications/repository/management/PhabricatorRepositoryManagementDiscoverWorkflow.php:45]
  #7 PhabricatorRepositoryManagementDiscoverWorkflow::execute(Object PhutilArgumentParser) called at [/home/ubuntu/libphutil/src/parser/argument/PhutilArgumentParser.php:396]
  #8 PhutilArgumentParser::parseWorkflowsFull(Array of size 11 starting with: { PhabricatorRepositoryManagementDeleteWorkflow => Object PhabricatorRepositoryManagementDeleteWorkflow }) called at [/home/ubuntu/libphutil/src/parser/argument/PhutilArgumentParser.php:292]
  #9 PhutilArgumentParser::parseWorkflows(Array of size 11 starting with: { PhabricatorRepositoryManagementDeleteWorkflow => Object PhabricatorRepositoryManagementDeleteWorkflow }) called at [/home/ubuntu/phabricator/scripts/repository/manage_repositories.php:22]

Artek, what exactly is the fix? I'm not well versed enough in PHP to dig in...

@chang888, I think your issue is covered in T4414. I'll CC you there and add a more detailed workaround. (That might actually have been the root issue here, too.)

Thank you! I tried tracking only master and now the PHP processes are busy importing. It's been a few hours and I'm seeing it go from 0% to about 40% (with CPU near 100% utilization). I do notice that to-be-sent emails are sitting in the queue (bin/mail list-outbound); is that because the PHP processes are busy importing?