Phabricator Mercurial Repo “too many files open” error
Open, Needs TriagePublic

Description

We have a staging server that I threw Phabricator on to test it out and see how it would function. I like it for the most part and despite this error it seems to be working ok.

Since this is our staging server, I already had a working Mercurial repo that is locally available. I looked around and couldn't find anywhere where I could tell Phabricator to point to a local repo, so I did a hack by using an ssh key to localhost (ssh://user@127.0.0.1//path/to/repo) as a work around and that seems to have done the job. It's only in Observe Mode, so I figured it should be safe to mess with.

After the repo fully imported, I now get the following error when I view the status of the repo:

Error updating working copy: Failed to proc_open(): proc_open(): unable to create pipe Too many open files

Here's a screenshot, just because I can :)

That error has been there ever since and hasn't gone away.

When I view what's going on with the server with htop, I just see a wall of hg commands (that's it, just hg) and a bunch of /usr/bin/python /usr/bin/hg log --template={node} --rev 'some rev hash' The cpus all peg at 100% while this happens (every 15 seconds or so like the setting in the screenshot) and then it goes away after about 5-10 seconds. I could push the code to bitbucket and then just pull from bitbucket but wanted to see if I could resolve this first this way.

My question ultimately is, is my workaround causing this or is this a bug that needs to be looked at? Is this expected behavior on a Phabricator server?

Some other details that may be helpful:

CentOS 6.8

PHP 5.6.24

Mercurial 3.9.1

Staging Repo has about 10,000 commits and 1,000 or so branches.

phabricator b521f2349e46888514ce55d06ace1d8c5fa6df03 (Mon, Aug 22)
arcanist 89e8b48523844cc3eff8b775f8fae49e85f8fc22 (Fri, Aug 19)
phutil 237549280f08feb8852857b1d109dfb0aba345f0 (Mon, Aug 22)

n0nag0n created this task.Aug 26 2016, 3:10 PM
avivey added a subscriber: avivey.Aug 26 2016, 3:15 PM

You're work-around is fine.
What does cat /proc/sys/fs/file-max say? Is it possible you just have too many open files?

cat /proc/sys/fs/file-max
1620069

Than probably not a reasonable case of "too many open files".

Having CPU at 100% for a while while importing is expected, but failing to proc_open isn't; I suspect that the pull phase has already completed (/projects/phabricator/2 should contain a full clone of the repo), so switching to BB now will not resolve anything.

Yeah I just double checked, it is a full clone of the original repo at the time it was at, but looking at it now, it hasn't picked up a new commit for 3 days.

n0nag0n added a comment.EditedAug 26 2016, 8:15 PM

Well scratch that last comment. It looks like it has pulled in commits from the last 3 days. It looks like under recent commits when I'm viewing the repo doesn't have the most recent commits from 3 days ago, but I just installed a panel on the dashboard showing the most recent commits, and all my commits area there. The panel runs off Audit, and the repo Diffusion.....maybe there's a disconnect? idk.

If you try to run /usr/bin/python /usr/bin/hg log --template={node} --rev 'some rev hash' manually in /projects/phabricator/2, does it break or do you get anything useful back?

I suspect this is something to do with your setup, which we can't really solve - see Contributing Bug Reports for some common issues / troubleshooting tips.

n0nag0n added a comment.EditedAug 26 2016, 9:05 PM

Well I typed in one of those...

/usr/bin/python /usr/bin/hg log --template={node} --rev 'fe18e54bbeb9fde1d57ff881986172c64958ccf'

and I get this back

abort: unknown revision 'fe18e54bbeb9fde1d57ff881986172c64958ccf'!

Now that I think about it, those revisions are looking a little long. Any ideas on what I can check to make sure that I did something dumb as opposed to it being an actual bug? I'll look again at the Contributing Bug Reports link. I just barely upgraded Phabricator per the instructions here

n0nag0n added a comment.EditedAug 26 2016, 9:33 PM

/var/log/httpd/error_log shows nothing with proc_open()

I just did ./bin/repository update --trace <repository> and I saw what it was doing....

>>> [26] <query> SELECT `r`.*, r.* FROM `repository` r  WHERE (r.phid IN ('lots of rnadom hashids' and it cut off so I don't know the rest of the query...))
<<< [26] <query> 713 us
>>> [27] <exec> $ hg log --template '{node}\1{branch}\2' --rev 'head()'
>>> [28] <exec> $ hg log --template '{node}\1{branch}\2' --rev 'head() and not closed()'
<<< [27] <exec> 350,224 us
<<< [28] <exec> 405,551 us

Then we go into the hg log stuff....

>>> [29] <exec> $ hg log --template='{node}' --rev ''\''6f4eb4b4ed856ec0faeeb2ae18afec6cbef089f9'\'''
<<< [30] <exec> $ hg log --template='{node}' --rev ''\''afcc57f0ef52bdc5aa20cde8a8048c2f91148a77'\'''
# etc until 
>>> [536] <exec> $ hg log --template='{node}' --rev ''\''caac1231de800ccaa18c9698ae72f6cb7dfca637'\'''

Then it has

<<< [29] <exec> 22,239,233 us
<<< [30] <exec> 22,236,037 us
# which match until 
<<< [535] <exec> 803,143 us

There is no [536], it skips to [537] which is a query to insert status message

>>> [537] <query> INSERT INTO `repository_statusmessage`
          (repositoryID, statusType, statusCode, parameters, epoch)
          VALUES (2, 'fetch', 'error', '{\"message\":\"Error updating working copy: Failed to `proc_open()`: proc_open(): unable to create pipe Too many open files\"}', 1472246455)

And a stack trace it looks like:

[2016-08-26 15:20:55] EXCEPTION: (Exception) Failed to `proc_open()`: proc_open(): unable to create pipe Too many open files at [<phutil>/src/future/exec/ExecFuture.php:671]
arcanist(head=master, ref.master=9e82ef979e81), phabricator(head=master, ref.master=067d12d7168a), phutil(head=master, ref.master=5fd1af8b4f2b)
  #0 ExecFuture::isReady() called at [<phutil>/src/future/FutureIterator.php:198]
  #1 FutureIterator::next() called at [<phutil>/src/future/FutureIterator.php:165]
  #2 FutureIterator::rewind() called at [<phabricator>/src/applications/diffusion/query/lowlevel/DiffusionLowLevelResolveRefsQuery.php:267]
  #3 DiffusionLowLevelResolveRefsQuery::resolveMercurialRefs() called at [<phabricator>/src/applications/diffusion/query/lowlevel/DiffusionLowLevelResolveRefsQuery.php:44]
  #4 DiffusionLowLevelResolveRefsQuery::executeQuery() called at [<phabricator>/src/applications/diffusion/query/lowlevel/DiffusionLowLevelQuery.php:23]
  #5 DiffusionLowLevelQuery::execute() called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryRefEngine.php:185]
  #6 PhabricatorRepositoryRefEngine::removeMissingCommits(array) called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryRefEngine.php:71]
  #7 PhabricatorRepositoryRefEngine::updateRefs() called at [<phabricator>/src/applications/repository/management/PhabricatorRepositoryManagementUpdateWorkflow.php:141]
  #8 PhabricatorRepositoryManagementUpdateWorkflow::updateRepositoryRefs(PhabricatorRepository) called at [<phabricator>/src/applications/repository/management/PhabricatorRepositoryManagementUpdateWorkflow.php:78]
  #9 PhabricatorRepositoryManagementUpdateWorkflow::execute(PhutilArgumentParser) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:438]
  #10 PhutilArgumentParser::parseWorkflowsFull(array) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:330]
  #11 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/repository/manage_repositories.php:22]

Is it possible this could be a problem if there are a lot of branches that haven't closed?

I don't know a lot about hg internals, but 1,000 branches and 10,000 commits doesn't sound like too much to process.

>>> [536] <exec> $ hg log --template='{node}' --rev ''\''caac1231de800ccaa18c96
98ae72f6cb7dfca637'\'''

Is that a space in the commit hash?

Yeah sorry a " " snuck in there while I was copy/pasting it.

Do you have any other repositories to try this on? Smaller ones, or some public repo from BitBucket?

TiTi added a subscriber: TiTi.Nov 10 2016, 2:01 PM

Hello,

Just had this issue today on our mercurial repository. Exact same exception when executing ./bin/repository update --trace <repository>
I've managed to solve it.

# maximum number of file handles in kernel
sudo sysctl fs.file-nr

Last number is really high on my machine (>300 000) so limit not likely reached. [Explanation of other numbers in first link at bottom]

# log as user running phabricator daemons
sudo su <user>

# display hard limit for open files
ulimit -Hn

# display soft limit for open files
ulimit -Sn

Explanation in this article, but basically the soft limit applies, has displayed by ulimit -n
Mine was 1024

# number of current open file handles for user <user>
lsof -u <user> | wc -l

# you can also compare that to all open files
lsof | wc -l

i've notice the current value was > 2000. Therefore > 1024

# I've upgraded the soft limit
ulimit -S -n 4096

And it solve the issue, no more repository update error.

Note: The limit seems to have switch back to 1024 after a while.

Articles:

TiTi added a comment.Feb 7 2017, 5:20 PM

You can enforce the setting in /etc/security/limits.d/<user>.conf, example :

<user> hard nofile 8192
<user> soft nofile 8192

Replace <user> with the service username