Don't die horribly when large commit is pushed
Closed, Invalid
Public

Description

Phabricator tends to go completely down when a large commit is pushed, which forces us to reboot Phabricator.

Even if the tasks that Phabricator needs to do get slowed down, at the very least the web UI/API should not break.

Event Timeline

jhance_ raised the priority of this task from to Needs Triage.
jhance_ updated the task description.
jhance_ added a project: Restricted Project.
jhance_ added subscribers: jhurwitz, angie, jhance_.

Do you have any more specific details about what happens here?

We haven't observed this behavior on this install, or on other installs. The web processes (which handle the web UI / API) are isolated from the daemon processes and they shouldn't normally impact one another, except by totally exhausting resources on the machine. Importing an individual commit is primarily a linear process which should only consume one CPU core, so I wouldn't expect this to be able to kill a machine. Further, rebooting it should just reprocess the failed work anyway.

Do you have, e.g., OOM killer log entries? CPU/disk activity/memory monitoring? A process snapshot of what the machine looks like when the issue occurs?
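
As a rough illustration of the kind of data asked for above, here is a minimal Python sketch, assuming a Linux host. It filters kernel log text for lines that look like OOM killer activity and takes a small load/memory snapshot from `/proc/meminfo`. The function names (`find_oom_lines`, `parse_meminfo`, `snapshot`) are hypothetical helpers for this example and are not part of Phabricator, and the matched strings are only the phrases the kernel commonly logs, so treat a miss as inconclusive.

```python
import os
import re
import time

# Phrases the Linux kernel commonly logs when the OOM killer fires.
# This is a heuristic for illustration, not an exhaustive list.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return the log lines that look like OOM killer activity."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


def parse_meminfo(meminfo_text):
    """Parse /proc/meminfo-style text into a {field: integer} dict.

    Most fields are reported in kB; a few (e.g. huge page counts) are
    plain counts.
    """
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def snapshot(meminfo_path="/proc/meminfo"):
    """Collect a small point-in-time snapshot (Linux only)."""
    with open(meminfo_path) as handle:
        mem = parse_meminfo(handle.read())
    load1, load5, load15 = os.getloadavg()
    return {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
        "mem_available_kb": mem.get("MemAvailable"),
        "swap_free_kb": mem.get("SwapFree"),
    }
```

Calling `snapshot()` in a loop while a large push is being processed, and running `find_oom_lines` over the kernel log afterwards, would show whether memory is being exhausted before the web UI stops responding.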

Generally, since we don't have access to your hosts or to any monitoring or logs, we're relatively powerless to help you resolve operational issues which can't be directly reproduced and which other installs don't experience. We can chip away at this stuff, but there's a lot of fishing involved and progress may be extremely slow (T8588 is a similar issue, where we've gone through multiple rounds of investigation and back-and-forth without making much real progress).

I understand that your experience is probably extremely frustrating and I want to fix it, but I don't have access to anything that can help me figure out the root cause of the problem, so I'm stuck just guessing. To my knowledge, other installs at and above your scale do not experience this sort of problem (or the other kinds of operational problems you've encountered). The software is architected to prevent this problem, and all evidence points to the implementation adhering faithfully to the architecture, so there is no theoretical issue here and no practical issue on other installs.

Even if you don't have any operational details, it would be helpful to know what "a large commit" is, and exactly what you've experienced. Does it have a lot of files (how many)? Is it a single commit, or a series of commits? Or does it have a lot of data (how much)? Are you pushing it to a hosted repository, or to an imported repository? Which VCS are you using? Are you pushing over HTTP or SSH? Does pushing large commits always require a restart, or only some of the time (how often)? How long does it take between the push and the service going down? Does "completely down" mean that connections hang, or time out, or that they fail with an error?

You're also welcome to try pushing a large commit to this server (there should be writable test repositories available in Diffusion) to see if you can lock it up. If you can, it's likely that will give me enough information to understand and fix the underlying problem.

(If you try this, you should probably generate a large synthetic commit rather than pushing proprietary code.)
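
One way to build such a synthetic commit is sketched below in Python. It fills a directory with files of random bytes so that both the file count and the total data size can be tuned. The function name `generate_synthetic_tree`, the directory layout, and the default sizes are assumptions made for this example; nothing in this task says which dimension (file count or data volume) triggers the problem, so both are left as parameters.

```python
import os


def generate_synthetic_tree(root, num_files=1000, bytes_per_file=1024):
    """Fill `root` with random-content files; return total bytes written."""
    total = 0
    for index in range(num_files):
        # Spread files across subdirectories so no single directory gets huge.
        subdir = os.path.join(root, "dir%03d" % (index % 100))
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, "file%06d.bin" % index)
        with open(path, "wb") as handle:
            # Random bytes do not compress, so the pushed data stays large.
            handle.write(os.urandom(bytes_per_file))
        total += bytes_per_file
    return total
```

Assuming Git, running this inside a checkout of a test repository and then `git add -A`, `git commit`, and `git push` would produce a single large commit containing no proprietary content. Varying `num_files` and `bytes_per_file` separately should help distinguish "many files" from "a lot of data".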

We don't have enough information to reproduce this or otherwise move forward on it.

The person who reported this is no longer around to provide more context.

angie moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board. Sep 10 2015, 4:55 PM