Page MenuHomePhabricator

String overflow errors in Herald commit processing
Closed, ResolvedPublic

Description

@frgtn is reporting string overflow errors in ExecFuture when processing some Herald commits. Specifically, what's happening here is probably:

  • git diff or similar of the commit is >2GB, the maximum size of a string in PHP.
  • The stdout buffer of ExecFuture fails once it reads 2GB of data.
  • The timeout on the git diff is excessively huge right now, but there's no guarantee we won't read 2GB of data in an arbitrarily short amount of time.

The immediate fix is probably:

  • Put a hard cap just below 2GB on ExecFuture's auto-buffer size and throw + kill when reading more data than that. Processes like ssh-exec already stream output and will not be affected by this. Processes which do not stream output should be streaming it if they expect to need to parse more than 2GB of data, since it won't work anyway.

This creates an issue, in that you can evade Herald by making a change like this:

  • Add some evil patch.
  • Add a 2GB text file.

Maybe the best fix for this is to add a "Change is too large to process" field. Installs that care could write paranoid rules or blacklist these changes from being pushed by most users.

We could also try to process these changes anyway, but we fundamentally can't regex a string >2GB and probably no one expects us to anyway. We'd have to change all the other Herald APIs to return iterators and streams, too, which seems like a big pain.

Event Timeline

epriestley claimed this task.
epriestley raised the priority of this task from to Normal.
epriestley updated the task description. (Show Details)
epriestley added projects: Herald, Diffusion.
epriestley added subscribers: epriestley, frgtn.

D7885 is a higher-level but probably more-useful fix than described in the summary. Specifically, we'll only handle the first 1GB of a change now, and set an "enormous" flag otherwise. So:

  • All the Herald stuff should just start working.
  • Changes with more than 1GB of content text will never match content rules, since we'll fail when pulling the content. If you care about this (e.g., you have important content rules like "legal really really wants no GPL code"), write a "Change is enormous" rule. This will trigger for any change we can't pull the content for, and the affected parties can review these manually. Presumably, they are exceedingly rare. You can use an "Another Rule Matches" rule to create a "Content matches, or change is enormous" rule which has additional predicates if necessary.
  • On this install, we'll probably write a Global pre-content rule which rejects these changes categorically, thereby cutting the issue off at the pass. I think this is reasonable for most installs using Phabricator hosting: normal commits should never include more than 1GB of text changes, and if we ever need to make such a commit the small hassle of disabling or poking a hole in the reject rule wouldn't be a big deal, while the peace of mind that no one is doing this the rest of the time is valuable.

Upshot: after D7885, let me know if you're still seeing this? If you are, give me a fresh stack trace?

After deploying this I no longer see String Overflow errors again and again and only a couple of (Exception) The raw text of this change is enormous.

Thanks!

Cool. Those should stop happening once the backlog clears up, except when people make new enormous commits -- let us know if you're still seeing them regularly after a day or two.