
parsing giganto-normous commits causes admin headaches
Open, WishlistPublic


  • User creates a large ~5MiB commit
  • This explodes into a 250 MiB unified diff
  • The commit cannot be parsed with a "reasonable" amount of memory, and the script/daemon gets killed by the OS. The task then backs off, retries, and gets killed again later. I'm not sure how much memory this particular commit would need, but ~8 GiB was insufficient.
  • User is sad that their diffs are not getting closed.

Ideally this would take less memory, but there will probably always be bigger diffs on the horizon. One possibility is some way to skip parsing the diff (automatically?) if it is 'Too Big'.
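A minimal sketch of that "skip if Too Big" idea, as a pre-parse guard. The names, the threshold, and the return values are all illustrative assumptions, not Phabricator's actual API:

```python
# Hypothetical pre-parse guard: refuse to parse a raw diff whose byte
# size exceeds a configurable threshold, rather than letting the parser
# exhaust memory and get killed by the OS.
MAX_PARSEABLE_DIFF_BYTES = 64 * 1024 * 1024  # 64 MiB, an arbitrary cutoff


def should_parse(raw_diff: bytes) -> bool:
    """Return False for diffs considered too large to parse safely."""
    return len(raw_diff) <= MAX_PARSEABLE_DIFF_BYTES


def process_commit(raw_diff: bytes) -> str:
    if not should_parse(raw_diff):
        # Record the commit as skipped so the worker task completes
        # instead of retrying and being OOM-killed repeatedly.
        return "skipped: diff of %d bytes exceeds limit" % len(raw_diff)
    # ...actual parsing would happen here...
    return "parsed"
```

The point of the guard is that the task finishes (with a "skipped" marker an admin can inspect) instead of cycling through back-off and OOM kills.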

Event Timeline

cburroughs raised the priority of this task from to Needs Triage.
cburroughs updated the task description. (Show Details)
cburroughs added a subscriber: cburroughs.

FWIW, the particular commit that was causing trouble was eventually (!?!) processed. Since I could reproduce the failure multiple times with 4x the RAM, this feels spooky. Maybe the memory consumption is non-deterministic? Or somehow very different when running in the daemon instead of the CLI?


eadler added a project: Restricted Project.Sep 15 2016, 6:08 PM
epriestley triaged this task as Wishlist priority.Feb 15 2019, 3:21 AM
epriestley added a subscriber: epriestley.

As of D19748, I'm not aware of any change of size X that requires more than 8X bytes of memory to parse. This isn't ideal, but it's a fair bit better than the 32X in the original report.
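Back-of-the-envelope arithmetic for the 250 MiB diff from this report under those two ratios (my inference from the numbers above, not a measurement):

```python
# Peak-memory estimates for a 250 MiB unified diff under the ratios
# mentioned in this task: ~32X before D19748, ~8X after.
MIB = 1024 * 1024
diff_size = 250 * MIB

old_peak = 32 * diff_size  # original report's ratio
new_peak = 8 * diff_size   # ratio after D19748

print(old_peak / MIB)  # 8000.0 (MiB), i.e. ~7.8 GiB
print(new_peak / MIB)  # 2000.0 (MiB), i.e. ~2 GiB
```

The ~7.8 GiB figure under the old ratio is at least consistent with the original observation that ~8 GiB was insufficient.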

It's likely fairly easy to build a diff which will fail (e.g., just make a text diff with 50GB of random nonsense; at some point, we aren't going to be able to fit it in memory). But since this rarely arises, and isn't much of a big deal when it does (you can usually just bin/worker cancel the task), it's hard to prioritize going out of the way to break stuff on purpose and then fix the stuff we broke.