Phabricator currently uses "Staging Areas" to transfer code between arc on the client and Build/CI/Deployment systems. "Staging Areas" are temporary refs in some git repository. At time of writing, they are tags named refs/tags/phabricator/diff/123.
Some advantages of staging areas are:
- they correctly transmit the entire client state without any additional work; and
- third party build systems are generally ONLY able to build refs (and often not even refs; only tags/branches), so state must be represented as a ref for third-party build systems to work.
Some disadvantages of staging areas are:
- They are cumbersome to configure, particularly if you have many repositories.
- The client needs to push them. In large repositories, this is slow. It is also messy (we currently must push a base ref to trick Git into being moderately efficient).
- They are visible to users with the git client, although users do not care about them directly. This is worse with a branch representation than a tag representation.
- The lifetime of staging areas is unclear and GC'ing them isn't straightforward.
Historically, we shipped patches around instead. This is better on all these dimensions, but we normally can't send a patch to external CI, and our "ship patches around" infrastructure has some limitations in accurately transmitting changes.
See PHI1454. This is a large install reporting that staging areas are fundamentally slow in their monorepo and suggesting using git bundle instead.
See PHI1462. This is a large install reporting that staging areas are slow (here, the visible symptom is waiting for write locks).
See PHI1141. This is an install which would like a way to retrieve client state including merge ancestry.
Related upstream issues are:
- I want to build arc save and arc load. arc load is just a fancy version of arc patch. arc save is a surgical version of "send this state somewhere", intended to replace "git push to save changes".
- T13278 is approximately an earlier version of this task which focuses more narrowly on improving staging areas.
- T8238 is an ancient version of this.
Here's the current plan I'm considering:
Introduce a new "Saved State" Object. This is visible in /differential/saves/ or something. You initially write this object with arc save and read it with arc load. If you want to "back up your work" or "share your work with someone", you arc save --as tmp123 and they arc load saves/epriestley/tmp123 or similar.
The actual object is a commit hash identifying what state it has saved, and a list of resources to build that state. These resources might be:
- Pointer to a Ref: Means the state is already pushed somewhere, and the client should fetch it. So a "Saved State" can be built from a staging area or an existing published commit trivially.
- List of base commits + patches: This is a formal way to say "check out master, then apply these patches". Here, the patches would reconstruct the original commits accurately (see T1508), not just the commit range. This is easiest to generate with git format-patch --stdout, although this may not be suitable for merge commits. git bundle is another possible alternative. This will have a long tail of mess, but nothing intractable.
From here:
- Drydock learns to build working copies from saved states.
- Drydock learns to turn a saved state into a staging area ref if downstream build systems can only work on branches/tags. This simplifies all the configuration/user-facing stuff.
- Replace arc patch with arc load.
We can still do the ref virtualization stuff, and still support direct construction of staging areas from the client if installs don't want to deal with Drydock, although I'd imagine deprecating this eventually once Drydock is easier to use.