Although PHP does a lot of good for Phabricator, it also creates some tough problems. These are documented elsewhere, but for completeness:
- It's hard for users to install PHP, and they don't want to install it anyway. macOS will stop shipping with PHP soon, and Windows has never shipped with it.
- Phabricator would benefit from having access to certain services (full-text search, full-codebase search, repository graph storage) that very likely aren't practical to write in PHP because they are too sensitive to performance, control over data in memory, or both.
- Likewise, Phabricator would probably benefit from having a native webserver (no Apache dependency) and notification server (no Node dependency).
- Various extensions and runtime changes (see T2312) could benefit Phabricator.
Hypothetically, Arcanist can be compiled into a native binary which has a statically linked PHP runtime and is hard-coded to run Arcanist.
A simple version of this is to replace bin/arc with a copy of php which just hard-codes the runtime arguments -f path/to/arcanist.php -- $@. This is obviously kind of goofy, but then we get this pathway forward:
- build the compiler toolchain required to produce a static bin/arc, which is just "PHP in an Arcanist costume" and make it work on macOS and Windows;
- decorate the PHP/C FFI stuff into easy C extension support;
- precompile all the native code into a single binary to sweep PHP under the rug.
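The "simple version" above (a bin/arc that just hard-codes the runtime arguments) can be sketched as a small C launcher. This is only a sketch under assumptions: `php` is resolvable on the PATH, `path/to/arcanist.php` is the placeholder path from the text, the helper names `build_php_argv` and `run_arcanist` are made up for illustration, and it relies on POSIX `execvp`, so it says nothing about Windows.

```c
/* Sketch of a bin/arc launcher: a tiny native program that re-executes PHP
 * with hard-coded arguments. All names and paths here are illustrative. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define PHP_BINARY "php"                       /* assumption: php is on PATH */
#define ARCANIST_SCRIPT "path/to/arcanist.php" /* placeholder path from the text */

/* Build: php -f path/to/arcanist.php -- <user args...>
 * Returns a NULL-terminated, heap-allocated array (the caller frees it),
 * or NULL if allocation fails. */
char **build_php_argv(int argc, char **argv) {
    assert(argc >= 1);
    /* 4 fixed slots + (argc - 1) user arguments + terminating NULL */
    char **out = malloc(sizeof(char *) * (size_t)(argc + 4));
    if (out == NULL) {
        return NULL;
    }
    int n = 0;
    out[n++] = PHP_BINARY;
    out[n++] = "-f";
    out[n++] = ARCANIST_SCRIPT;
    out[n++] = "--";
    for (int i = 1; i < argc; i++) {
        out[n++] = argv[i]; /* forward the user's arguments, like "$@" */
    }
    out[n] = NULL;
    return out;
}

/* Replace the current process with PHP running Arcanist.
 * Only returns (with a nonzero status) if something went wrong. */
int run_arcanist(int argc, char **argv) {
    char **php_argv = build_php_argv(argc, argv);
    if (php_argv == NULL) {
        perror("malloc");
        return 1;
    }
    execvp(PHP_BINARY, php_argv);
    /* execvp only returns on failure */
    perror("execvp");
    free(php_argv);
    return 127;
}
```

A real launcher's `main(argc, argv)` would just return `run_arcanist(argc, argv)`. The static-binary step would swap the `execvp` call for a linked-in PHP runtime, which this sketch does not attempt.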
A personal motivation here is that I want to make a robot that has a blinking light, and that might be simpler if I could just build an MQTT server on top of Phabricator. But I want my robot and Phabricator to call some of the same MQTT code, and PHP is bad for robots.
A general challenge is that I have no idea how building things works. Here are things I generally believe to be true or true-ish, mixed with questions I can't answer:
- A ".c" file can be compiled into an object file (.o), which can then go into a "static library" (.a) or a "dynamic library" (.so on Linux), I think? What's the difference? How does this work on Windows (.dll ~= .so)? What is .dylib? What are the differences between Linux and macOS?
- Which symbols in a ".c" file are present in the library? How can you control which symbols are emitted?
- Can you enumerate symbols in an object file? How? Can you easily do this at runtime?
- How much information about symbol names is preserved? Can you meaningfully enumerate types, e.g., subclasses of X, at runtime?
- Can a binary enumerate its own symbols?
- Why does the linker (or compiler?) need ".h" files? What happens if the definition in the ".h" file isn't the same as the definition in the object file?
- Binaries generally load symbols automatically at startup time by loading dynamic libraries, I think?
- The arguments for dynamic libraries over static libraries are mostly: security and memory usage? Do these really matter in 2022, at least in desktop environments? Doesn't a single Electron app take 85GB of RAM? Why isn't more stuff compiled statically?
- Can binaries load objects at runtime? Is this rare? Why?
- How can you tell what symbols a binary depends on? How can you tell what libraries it will try to load at startup?
- What happens if a binary depends on f(int x) and loads f(float x)? Or, what prevents this?
- What happens if a binary loads x.o and y.o and they each define a symbol with the same name?
- Can we build a single binary with a bunch of data in it (e.g., a picture of a cat) without breaking anything?
- Does the system always load an entire binary into memory at startup, motivating separation of large chunks of data?
- If I compile a binary (or a .o, or a .so) on one system, how can I tell which systems it will work correctly on?
- What happens if I try to use it on the "wrong" system?
- What are the practical limits of multi-system or multi-architecture binaries?
- Can a binary built on Ubuntu 14 run on Ubuntu 20 on the same hardware? Can it run on Debian? In what cases will it be unable to run?
- How can PHP be built statically? How hard is this?
- Why does ./configure spend 15 minutes compiling 800 programs to figure out if my system supports integers in 2022?
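On the "picture of a cat" question above: one common way to put data inside a binary is to compile it in as a constant byte array, which tools like `xxd -i` can generate from a file. The sketch below is hypothetical throughout: the eight bytes are just the PNG file signature standing in for a real image, and `embedded_file` and `find_embedded` are names invented for illustration. As far as I understand, a constant array like this normally ends up in the read-only data of the executable and is paged in on demand rather than read up front, but treat that as an assumption to verify per platform.

```c
/* Sketch: embedding a data file in a binary as a constant byte array. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a real asset: the 8-byte PNG file signature.
 * A real build would generate this array from the file. */
static const unsigned char cat_png[] = {
    0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'
};

/* Compile-time sanity check on the embedded size (C11). */
static_assert(sizeof(cat_png) == 8, "unexpected embedded asset size");

/* One entry per embedded resource. */
struct embedded_file {
    const char *name;
    const unsigned char *data;
    size_t len;
};

/* Table of everything baked into this binary. */
static const struct embedded_file embedded_files[] = {
    {"cat.png", cat_png, sizeof(cat_png)},
};

/* Look up an embedded resource by name; returns NULL if it is not present. */
const struct embedded_file *find_embedded(const char *name) {
    size_t count = sizeof(embedded_files) / sizeof(embedded_files[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(embedded_files[i].name, name) == 0) {
            return &embedded_files[i];
        }
    }
    return NULL;
}
```

The same table could, in principle, hold the Arcanist PHP sources, so that a single bin/arc carries its own scripts. Whether that works well with a statically linked PHP runtime is not something this sketch shows.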