Page MenuHomePhabricator

Modifying PHP
Open, WishlistPublic

Description

Umbrella task for discussion of PHP extensions and patches.

Reuse Interpreter State: We currently run a modified version of php-fpm on secure.phabricator.com which loads all the class and function definitions before handling requests. This has been stable for a year or more and makes includes free to the user (in terms of page load time) but we still pay the CPU costs. (This was a herculean achievement at Facebook because of side effects on include, but I avoided those from the start with libphutil so it was comparatively easy for us.) At some point, it would be nice to reuse these interpreters and run more than one request through them so we don't pay the CPU costs either. This is slightly dangerous because it means we can't cache any user information in static variables, but as far as I know we've been consistent about avoiding this.

Increase Strictness: I have a patch (P664) which raises warnings for comparisons of unlike, non-boolean types and comparisons of strings using binary operators, and for overflows on implicit string -> integer conversion (we can convert these warnings into exceptions in the error handler). I haven't tested this much, but it might be worth running more broadly. These behaviors are all pretty much garbage.

Extensions: We have a few pieces of code which might be nice to port to extensions, mostly for perf reasons. Maybe:

  • The intraline diff algorithm.
  • XHPAST might run far faster as an extension. There are "light" (export directly to a PHP datastructure, just skipping the JSON step) and "heavy" (implement the class itself) flavors of this.
  • Unicode stuff; things like utf8 validation can be performed much much faster than mbstring/iconv.
  • Date stuff? I'm not sure where the bottleneck is, but we spend a lot of time in this on some pages.
  • Rendering stuff? Things like phutil_render_tag() / javelin_render_tag() might gain enough to be worthwhile.
  • Core stuff like id(), idx(), pht(), phutil_escape_html(), mpull(), etc.
  • The base85 algorithm (see T13130, D19407, D19408) is an ideal candidate for porting to C, albeit relatively low-impact.

Porting any of this increases complexity, but some of it is pretty simple and could give us big performance wins.

Event Timeline

epriestley triaged this task as Wishlist priority.
epriestley added subscribers: epriestley, btrahan, vrana.

We can call this project GORILLA and in a couple of years build our own "Hyper Phabricator PHP" (HPHPHP).

chad changed the visibility from "All Users" to "Public (No Login Required)".Nov 4 2015, 9:16 PM

PHP allows and evaluates these constructions:

switch (array()) { ... }
switch ((object)array()) { ... }

That is, switch is not required to accept a scalar. These constructions are technical valid and meaningful, but almost certainly never intended. D16356/PHI261 are a specific example of a bug that would have been detected with stricter language behavior.

I don't see a reasonable way to fix this at runtime without rewriting all code into switch(assert_scalar(...)) { ... } which I'm not excited about.

If we do modify PHP to increase strictness, requiring that the argument to switch (...) { ... } be either an integer or string seems reasonable. In the rare (almost completely nonexistent?) cases where there's a legitimate intent to switch on some other type, the construct can be written as a series of if (... === ...) { ... } statements without any real loss. This is perhaps even a net gain in readability because no one expects switch ($object) { ... } to be what code actually does.

(In some imaginary world where we push strictness upstream, maybe objects which implement __toString() would still be OK to switch() on, but we never do this.)

See https://discourse.phabricator-community.org/t/strpos-throwing-in-phabricatorauthpasswordengine/4335, and D21477.

In PHP, $x["12345"] performs an implicit cast to an integer (if the key is numeric). Retrieving the value later produces an integer, so the original type is lost and unrecoverable (if we read 123 out of an array key, we can not tell if 123 or "123" was inserted). Treating this as an error might be appropriate, although the correct fix isn't obvious (and might be something like a new StringMap type).