Improve some behaviors around memory pressure when pushing many and/or large changes
ClosedPublic
Actions

Authored by epriestley on May 18 2018, 8:47 PM.

Details

Reviewers

Maniphest Tasks

T13142: Enormous change detection considers commits individually and can fail when many large commits are pushed simultaneously

Commits

rPde999af61422: Improve some behaviors around memory pressure when pushing many and/or large…

Summary

Ref T13142. When commits are pushed, we try to handle them on one of two pathways:

Normal changes: we load these into memory and potentially apply Herald content rules to them.
"Enormous" changes: we don't load these into memory and skip content rules for them.

The goal is to degrade gracefully when users push huge changes: they should work, just not support all the features.

However, some changes can slip through the cracks right now:

If you push a lot of commits at once, we'll try to cache all of the changes smaller than 1GB in memory. This can require an arbitrarily large amount of RAM.
We calculate sizes by just looking at the strlen() of the diff, but a changeset takes more RAM in PHP than the raw diff does. So even if a diff is "only" 500MB, it can take much more memory than that. On systems with relatively little memory available, this may result in OOM while processing changes that are close to the "enormous" limit.

This change makes two improvements:

Instead of caching everything, cache only 64MB of things.
- For most pushes, this is the same, since they have less than 64MB of diffs.
- For pushes of single very large changes, this is a bit slower (more CPU) since we have to do some work twice.
- For pushes of many changes, this is slower (more CPU) since we have to do some work twice, but, critically, doesn't require unlimited memory.
Instead of flagging changes as "enormous" at 1GB, flag them as "enormous" at 256MB.
- This reduces how much memory is required to process the largest "non-enormous" changes.
- This also gets us under Git's hard-coded 512MB "always binary" cutoff; see T13143.
- This is still completely gigantic and way larger than any normal change should be.

An additional improvement would be to try to reduce the amount of memory we need to use to hold a change in process memory. I think the other changes here alone will fix the immediate issue in PHI657, but it would be nice if the "largest non-enormous change" required only a couple gigs of RAM.

Test Plan

Used ini_set('memory_limit', '1G') to artificially limit memory to 1GB.
Pushed a series of two commits which add two 550MB text files (Temporarily, I added a --binary flag to trick Git into showing real diffs for these, see T13143.)
Got a memory limit error.
Applied the "cache only 64MB of stuff" and "consider 256MB, not 1GB, to be enormous" changes.
Pushed again, got properly rejected as enormous.
Added memory_get_usage() calls to measure how actual memory size and reported "size" estimate compare. For these changes, saw a 639MB diff require 31,479MB of memory, i.e. a factor of about 50x. This is, uh, pretty not great.
Allowed enormous changes, pushed again, push went through.