Umbrella task for discussion of PHP extensions and patches.
**Reuse Interpreter State**: We currently run a modified version of php-fpm on secure.phabricator.com which loads all the class and function definitions before handling requests. This has been stable for a year or more and makes includes free to the user (in terms of page load time), but we still pay the CPU costs. (This was a herculean achievement at Facebook because of side effects on include, but I avoided those from the start with libphutil so it was comparatively easy for us.) At some point, it would be nice to reuse these interpreters and run more than one request through each of them so we don't pay the CPU costs either. This is slightly dangerous because anything cached in a static variable would survive into the next request, which means we can't cache any user information in static variables. As far as I know, we've been consistent about avoiding this.
**Increase Strictness**: I have a patch (P664) which raises warnings for comparisons of unlike, non-boolean types, for comparisons of strings using binary operators, and for overflows on implicit string -> integer conversion (we can convert these warnings into exceptions in the error handler). I haven't tested this much, but it might be worth running more broadly. PHP's default behaviors in all of these cases are pretty much garbage.
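To make the overflow case concrete, here is a rough C sketch of the kind of check involved. This is not the code in P664 and not how the PHP engine converts strings; `strict_str_to_long()` is a hypothetical helper that only shows that the standard `strtol()` already reports out-of-range input, so a stricter conversion can surface it rather than silently returning a clamped value.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from P664): convert a decimal string to a long.
 * Returns 0 on success, -1 if the string has no leading digits, and -2 if
 * the value does not fit in a long (the case we would want to warn about).
 */
int strict_str_to_long(const char *s, long *out) {
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);

    if (end == s) {
        return -1; /* nothing numeric at the start of the string */
    }
    if (errno == ERANGE) {
        return -2; /* strtol clamped the result; report instead of hiding it */
    }

    *out = value;
    return 0;
}
```

A caller would treat the `-2` result as the point to raise a warning, which the error handler could then convert into an exception.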
**Extensions**: We have a few pieces of code which might be nice to port to extensions, mostly for perf reasons. Maybe:
- The intraline diff algorithm.
- XHPAST might run far faster as an extension. There are "light" (export directly to a PHP data structure, just skipping the JSON step) and "heavy" (implement the class itself) flavors of this.
- Unicode stuff; things like UTF-8 validation can be performed much, much faster than with mbstring/iconv.
- Date stuff? I'm not sure where the bottleneck is, but we spend a lot of time in this on some pages.
- Rendering stuff? Things like phutil_render_tag() / javelin_render_tag() might gain enough to be worthwhile.
- Core stuff like id(), idx(), pht(), phutil_escape_html(), mpull(), etc.
- The base85 algorithm (see T13130, D19407, D19408) is an ideal candidate for porting to C, albeit relatively low-impact.
Porting any of this increases complexity, but some of it is pretty simple and could give us big performance wins.
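As an example of how small some of these could be, here is a minimal sketch of the UTF-8 validation item as plain C. It is an illustration under assumptions, not an existing libphutil function: the name `utf8_is_valid()` is hypothetical, and a real extension would still need the usual PHP extension boilerplate around it. It rejects stray continuation bytes, truncated sequences, overlong encodings, surrogates, and code points above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the `len` bytes at `s` are well-formed
 * UTF-8 and 0 otherwise.
 */
int utf8_is_valid(const unsigned char *s, size_t len) {
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;   /* number of continuation bytes expected */
        uint32_t cp;   /* code point being decoded */
        uint32_t min;  /* smallest code point allowed for this length */
        size_t k;

        if (c < 0x80) {
            i++;       /* plain ASCII byte */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;  /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need) {
            return 0;  /* sequence is cut off by the end of the input */
        }

        for (k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) {
                return 0;  /* expected a 10xxxxxx continuation byte */
            }
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min) {
            return 0;  /* overlong encoding */
        }
        if (cp > 0x10FFFF) {
            return 0;  /* beyond the Unicode range */
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;  /* UTF-16 surrogate, not valid in UTF-8 */
        }

        i += need + 1;
    }

    return 1;
}
```

This is a single pass over the bytes with no allocation, which is where the speed would come from. Whether it beats mbstring/iconv by enough to justify the extra complexity would need measuring.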