
Decide the fate of FutureGraph
Closed, ResolvedPublic

Description

One limitation I'm concerned about with the new Arcanist Hardpoint infrastructure is that it makes parallelizing service calls difficult. I'm generally satisfied with it from a power/extensibility standpoint and I think the complexity is manageable, but I'm uncomfortable about performance.

A 2013 series of changes starting with D5104 built FutureGraph, which may be a reasonable approach here. That stuff never moved forward because the motivation was parallelizing queries (which we generally have less to gain from) and we didn't get anywhere I was satisfied with on the call chaining API. However, the performance motivation is now more material and call chaining is less of a concern with the simpler Conduit futures.

Revisions and Commits

rARC Arcanist
D21083
D21082
D21080
D21079
D21078
D21075
D21074
D21072
D21071
D21070
D21058
D21053
D21046
D21038
D21036
D21035
D21034
D21033
D21031
D21032
rP Phabricator
D21054

Event Timeline

avivey added a subscriber: avivey.Dec 8 2016, 6:41 PM
epriestley moved this task from Backlog to v3 on the Diffusion board.Jan 18 2017, 6:59 PM
epriestley edited projects, added Diffusion (v3); removed Diffusion.
epriestley edited projects, added Diffusion; removed Diffusion (v3).Feb 2 2017, 3:54 PM

A problem in moving forward here is that we ultimately do very little complex data access in Phabricator, and what complex data access we do perform can often be faked. We likely have more use cases in Arcanist and provisioning code: API calls are slower, workflows are more parallel/interactive, and we can't just fake it all with AJAX.

The basic "future graph" problem is that if we have a block of code like this today:

$list_a = id(new AQuery())
  ->needB(true)
  ->execute();

$list_x = id(new XQuery())
  ->needY(true)
  ->execute();

It will execute four queries in sequence:

SELECT * FROM a ...
SELECT * FROM b WHERE id IN (...)
SELECT * FROM x ...
SELECT * FROM y WHERE id IN (...)

Query "B" depends on query "A", and query "Y" depends on query "X", but the two pairs are otherwise independent of one another and could execute in parallel.

An ideal API would let the code above execute in parallel (so queries "A" and "X" were issued first, then "B" and "Y" as the first queries came back) without making the block itself significantly more complex -- and it should be easy to add a new thing which also executes in parallel without changing the existing code. That ("make it easy to use") is the hard part. We also "can't" use yield. This constraint is somewhat artificial, but I'd like to preserve it.
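The interleaving a FutureGraph would produce can be illustrated with plain PHP generators (this is just a toy scheduler to show the desired ordering, not the proposed API -- all names here are invented):

```php
// Each chain yields the name of the query it wants to issue next. A trivial
// round-robin scheduler drains the chains together, so "A" and "X" are
// issued before "B" and "Y" -- the parallel ordering described above.

function chain_ab() {
  yield 'SELECT ... FROM a';   // query "A"
  yield 'SELECT ... FROM b';   // query "B", depends on "A"
}

function chain_xy() {
  yield 'SELECT ... FROM x';   // query "X"
  yield 'SELECT ... FROM y';   // query "Y", depends on "X"
}

function run_interleaved(array $chains) {
  $issued = array();
  while ($chains) {
    foreach ($chains as $key => $chain) {
      if (!$chain->valid()) {
        unset($chains[$key]);
        continue;
      }
      $issued[] = $chain->current();
      $chain->next();
    }
  }
  return $issued;
}

$issued = run_interleaved(array(chain_ab(), chain_xy()));
// "A" and "X" are issued in the first round; "B" and "Y" in the second.
```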


A related issue is the "need/attach" pattern. Currently, callers use needX() on Query classes to request that properties be loaded, then attachX()/getX() on the objects. This is mostly fairly good. Two weaknesses are:

  • There's a moderate amount of boilerplate.
  • There's no nice way to say "load and attach this thing if it isn't already loaded and attached", and no way to easily build this given how the code is structured today. This leads to a low background level of "// Reload the objects to get such-and-such property." in the codebase, and public static function loadAndAttach(...).

It should be reasonable to say "this object has some named list of attachable properties", and to have a single definition of how to load-and-attach a property that can serve both needX() and loadAndAttachX(), where needX() would just become a shorthand for calling loadAndAttachX() on query result pages.
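A minimal sketch of that single load-and-attach definition might look like this (all names here are hypothetical, not the real Phabricator API):

```php
// A storage object declares attachable properties by key, and
// loadAndAttachProperty() is idempotent: "load and attach this thing if it
// isn't already loaded and attached". needX() on a Query would then just
// call this across each page of results.

final class DemoObject {
  private $attached = array();

  public function hasAttachedProperty($key) {
    return array_key_exists($key, $this->attached);
  }

  public function attachProperty($key, $value) {
    $this->attached[$key] = $value;
    return $this;
  }

  public function getProperty($key) {
    if (!$this->hasAttachedProperty($key)) {
      throw new Exception("Property '{$key}' is not attached.");
    }
    return $this->attached[$key];
  }

  // Only invokes the loader if the property is not already attached.
  public function loadAndAttachProperty($key, $loader) {
    if (!$this->hasAttachedProperty($key)) {
      $this->attachProperty($key, $loader());
    }
    return $this->getProperty($key);
  }
}
```

This would eliminate the "// Reload the objects to get such-and-such property." pattern: a second load-and-attach of the same property is a no-op.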

In the experimental / wilds branch, there's some "Hardpoint" code which accomplishes some of these goals. A "Ref" is like a storage object, a "HardpointLoader" is like a "Query" object. I think most of this code is largely reasonable. I'd like to move toward a world which looks like this, maybe with different names:

  • Ref: A storage object with hardpoints.
    • DAO: A database-backed storage object with hardpoints and query-based data access.
  • Loader: Defines how to look up data related to a ref.
    • Query: A database-backed looker-upper.

In particular, "Ref" is pretty overloaded and "[Lisk]DAO" isn't really connected to anything else these days, so I'm less excited about those names.

(A theoretical weakness is that there's currently no pattern to return some arbitrarily constrained variation of an attachment, e.g. "load and attach the first 10 diffs" or "load and attach all subprojects that start with 'q'". We never (?) need to do this today and I'm not sure we ever will -- in particular, MySQL generally cannot reasonably execute this kind of query against a list of things being attached to -- but it is maybe something to keep in mind.)

So: can we build "Ref + Hardpoint + Loader" in a way that works with Futures without making them substantially more complicated?

We expect that Loader ultimately should have some function that looks like this:

public function loadHardpoints(array $objects, $hardpoint_key) {
  $phids = mpull($objects, 'getPHID');

  $data = id(new Query())
     ->withParentPHIDs($phids)
     ->execute();

  foreach ($objects as $object) {
    $data_for_object = idx($data, $object->getPHID());
    $object->attachHardpoint($hardpoint_key, $data_for_object);
  }
}

With yield, this can be made future-oriented like this:

$data_query = id(new Query())
   ->withParentPHIDs($phids);

yield $data_query;
 
$data = $data_query->resolve();

The advantage of yield is that the rest of the function state is preserved "for free". We can make this work without yield like this:

if (!$this->dataQuery) {
  $phids = mpull($objects, 'getPHID');
 
  $data_query = id(new Query())
    ->withParentPHIDs($phids);

  $this->dataQuery = $data_query;

  return $data_query;
}

But this means we need to manually store all function state in object properties, and explicitly write a function which rebuilds function state. Prior to yield, we made this work at Facebook, but it was pretty messy.

For example, if the Loader has three steps and steps 1 and 3 need $phids, you either compute it twice (slow) or store it (boilerplate/mess). The boilerplate is also all totally unique so it's super error prone.
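A minimal sketch of this state-in-properties style, with hypothetical names, shows the problem: locals like $phids have to be promoted to object properties so later steps can rebuild their state.

```php
// Each call to step() either returns a "query" for the caller to execute in
// parallel, or finishes the work. State that would normally live in locals
// ($phids here) must be stored on the object so step 2 can see it.

final class ManualStateLoader {
  private $phids;
  private $pendingQuery;
  private $result;

  public function step(array $objects) {
    if ($this->pendingQuery === null) {
      // Step 1: build the query and hand it back for parallel execution.
      $this->phids = array_keys($objects);
      $this->pendingQuery = 'query-for-'.implode(',', $this->phids);
      return $this->pendingQuery;
    }
    // Step 2: rebuilt from object properties, not local variables.
    $this->result = array_fill_keys($this->phids, 'attached');
    return null;  // done
  }

  public function getResult() {
    return $this->result;
  }
}
```

Every loader ends up with its own ad-hoc property soup like this, which is the "totally unique boilerplate" problem.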

This also overloads the meaning of return. We can fix the overload of return by replacing return $data_query with throw new YieldToQueryException($data_query), but this isn't much cleaner.

Another way to make this work looks more like this:

  $data = id(new Query())
     ->withParentPHIDs($phids)
     ->thenCall(array($this, 'continueLoadingStep2'));

  return $data;
}

public function continueLoadingStep2(...) {
  foreach ($objects as $object) {
    ...
  }
}

This is roughly the direction the original FutureGraph series went in. This allows you to potentially use argument passing to transfer state, but some of the arguments you're interested in passing ($this, the resolved result of the Future) can't be referenced when you declare the call. D5028 used this style and is okay, but there's a whole lot of legwork there.


I think complex loading sequences don't need to be definable in a procedural way. For example: load all objects of type X, then load all sub-objects of those objects with unusual constraint Q. We basically never actually do this. If we do, it's fine for the API to require that "unusual constraint Q" can be held in the state of some Query object and the API looks like needSubobjects($matching_this_subquery), and you just have to do some kind of weird special case if your subquery really depends on your primary object results.

That is, we might want to load all highly-utilized volumes on all downed nodes behind all west-coast load balancers. The non-future version of this looks like:

$balancers = load_balancers('coast = west');

$nodes = load_nodes($balancers, 'state = down');

$volumes = load_volumes($nodes, 'fullness > 95');

The future version has an issue where it's difficult to pass $balancers to load_nodes(...). We can do this in the thenCall() style, which is approximately just de-sugared continuations:

$balancers_future = load_balancers('coast = west');

$balancers_future->thenCall('loadNodesForTheseBalancers', 'state = down', ...);

Argument passing is a mess. Argument passing is a huge mess if we don't know the state we want to load ahead of time.

We can do the continuation style (except: not supported by PHP versions we support):

$balancers_future = load_balancers('coast = west');

$nodes_future = $balancers_future->thenCall(function($future) {
  $balancers = $future->resolve();
  return load_nodes($balancers, 'state = down');
});

We can do the yield style (except: not supported by PHP versions we support):

$balancers_future = load_balancers('coast = west');

yield $balancers_future;

$balancers = $balancers_future->resolve();

$nodes_future = load_nodes($balancers, 'state = down');

yield $nodes_future;

All of these are quite messy and we essentially never actually do any of them. The far messier versions are when you want to load all highly-utilized volumes on downed nodes if more than 16 load balancers are in service, and something else if fewer than 16 load balancers are in service. This happened sometimes at Facebook, I believe, but we literally have zero queries that do this in Phabricator, I think. (There are likely a lot of reasons, but at least some of those reasons are rooted in Phabricator having much more flexibility to mutate storage, far more rarely needing to follow an Edge as the first step in a query, and almost always being able to order objects reachable via Edges by Edge age instead of "machine-learned friendship score" or whatever.)

In particular, we have zero cases like "get all of the user's friends, and all of the user's events, and all of the RSVP'd users to those events, and then intersect the attending users with the user's friends, and then order that result list by friendship score, and then load the full profiles for those friends and events, and then suggest that list to the user as a reminder that you're attending Joe's Birthday tonight with Amy and Ryan." And if we do have cases like that in the future, it's fine to write an AmyAndRyanFriendshipAttenderQuery to deal with it.


yield seems like the obviously-least-bad of these options, but requires a minimum PHP version of 5.5, which introduced generators. This is probably realistic to require. Raising the minimum version also gets us traits (introduced in PHP 5.4, released March 1, 2012), which would simplify some things.

How closely can we emulate yield without yield? Suppose this:

$data = id(new Query())
   ->withParentPHIDs($phids)
   ->execute();

...becomes this:

$query = $this->getQuery('xyz', new WhateverQuery());
if ($query->isUnresolved()) {
  $query->withParentPHIDs(...);
}
$data = $query->resolve();

The "trick" here is that resolve() throws some YieldException() the first time it executes. Maybe this isn't completely terrible?
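A toy version of the trick, with invented names, might look like this: resolve() throws on the first pass, the engine catches the exception, executes the query, and re-enters the loader from the top; the second pass finds the query resolved and proceeds.

```php
final class DemoYieldException extends Exception {}

final class DemoQuery {
  private $result;

  public function isUnresolved() {
    return ($this->result === null);
  }

  public function setResult($result) {
    $this->result = $result;
  }

  public function resolve() {
    if ($this->isUnresolved()) {
      throw new DemoYieldException();
    }
    return $this->result;
  }
}

function run_loader(DemoQuery $query, $loader) {
  while (true) {
    try {
      return $loader($query);
    } catch (DemoYieldException $ex) {
      // "Execute" the query, then re-enter the loader from the top.
      $query->setResult('data');
    }
  }
}

$passes = 0;
$result = run_loader(new DemoQuery(), function($query) use (&$passes) {
  $passes++;
  $data = $query->resolve();  // throws on the first pass only
  return $data;
});
// The loader body runs twice: once up to the throw, once to completion.
```

The cost is the same as the manual-state version: any work the loader does before resolve() is repeated on re-entry, so anything expensive still has to be cached on the object.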

Even with yield I think we're stuck with $data = (yield $result)->resolve(); which isn't great.

So that might leave us with:

  • Follow the outline in D5104 + D5105 + D5107 + D5108-ish to turn FutureIterator into FutureGraph and resolve all futures via FutureGraph.
  • Follow the outline on experimental to structurally define Ref and Loader APIs, but with a mind toward integration with DAO and Query in the future. I have a separate HardpointList change which is likely closer in various ways.
  • Make the core loading workflow future-aware.
  • Probably require PHP 5.4 regardless.
  • Use whichever of yield or throw feels less messy.

> Probably require PHP 5.4 regardless.

Looking at this more closely: PHP 5.5 gets us finally, which is a fairly meaningful piece of sugar, so I'm inclined to make that the minimum version instead.

Somewhere in experimental or wilds, I introduced ArcanistConduitEngine. This has some weird fake future stuff going on, so this is probably now ripe.

One adjacent issue is T13177, which amounts to "move service profiler integration to Future". This makes good sense.

The core idea in D5104 + D5105 is that $future->resolve() and id(new FutureIterator(array($future)))->next() (like, roughly) execute meaningfully different code paths.

An adjacent issue is that the Future->resolve() prototype takes $timeout, which is "maximum amount of time to spend in resolution". The entire codebase appears to have no nontrivial callers of this parameter (all callers are in unit tests to make sure the parameter works, etc). Generally, any caller which wants to return control to the caller prior to resolution can reasonably bear the burden of wrapping the future in an explicit Iterator. I'm going to remove this parameter to simplify things.

In the specific case of the Hardpoints, we currently often have code which loads objects but doesn't do anything with them. For example, most Query classes use didFilterResults() to fill things-that-sure-look-like-hardpoints, but few do anything with the results.

In this particular case, we can avoid the whole yield issue by reentering the graph inside the call. That is, if some callsite says:

Loader::loadSynchronous(X);

The only costs to continuing resolution inside Loader::loadSynchronous() are that, until we resolve the future:

  1. we can't reach the code after this call; and
  2. we can't reach code after any parent call in the callstack; and
  3. eventually, we'll blow the callstack.

(1) doesn't matter, since the call is explicitly a synchronous barrier anyway.

I'm fine with putting a hard depth cap in place to prevent (3), callers "should never" be resolving futures thousands-deep. The two times you truly need to write "resolve all ancestors in this arbitrarily large graph" code in the codebase, you can use futures directly.
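A hard depth cap might be as simple as this (names hypothetical; finally requires PHP 5.5, consistent with the minimum version discussed above):

```php
// Each nested synchronous resolution bumps a depth counter and fails
// loudly rather than blowing the callstack.

final class DemoResolver {
  const MAX_DEPTH = 64;

  private $depth = 0;

  public function loadSynchronous($work) {
    if ($this->depth >= self::MAX_DEPTH) {
      throw new Exception('Synchronous future resolution is too deep.');
    }
    $this->depth++;
    try {
      return $work($this);
    } finally {
      $this->depth--;
    }
  }
}
```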

Condition (2) isn't universally free. We can have code like this:

$iterator = new FutureIterator(array($a, $b));
foreach ($iterator as $future) {
  if ($future === $a) {
    $iterator->addFuture($c);
  }
}

If B uses a synchronous barrier, we won't return execution to the top level and queue C until B resolves, even if A resolves much sooner.

However, maybe we have very little of this code, particularly after restructuring into Hardpoints. I think this kind of code is generally hard to read, understand, and maintain and probably not desirable in most cases.

After D21036, FutureIterator is approximately ready to become FutureGraph, so I'm going to take a closer look at how Hardpoint can be structured to work with Future.

Here's an actual example of loadHardpoints($objects, $hardpoint):

public function loadHardpoints(array $refs, $hardpoint) {
  $api = $this->getQuery()->getRepositoryAPI();

  $futures = array();
  foreach ($refs as $ref_key => $ref) {
    $hash = $ref->getCommitHash();

    $futures[$ref_key] = $api->execFutureLocal(
      'log -n1 --format=%C %s --',
      '%s%n%n%b',
      $hash);
  }

  $iterator = $this->newFutureIterator($futures);

  $results = array();
  foreach ($iterator as $ref_key => $future) {
    list($stdout) = $future->resolvex();
    $results[$ref_key] = $stdout;
  }

  return $results;
}

Without any changes, it's possible for this to be restructured so that:

  • newFutureIterator() adds the futures to an existing pool of futures, then returns a "View" object into a subset of that pool.
  • foreach ($iterator ... works on all the futures in the entire pool, but only returns execution here when a future which is part of the given "View" resolves.

This isn't perfect: foreach ($iterator ... becomes a barrier for additional work that could otherwise be done in parallel using futures that were queued higher in the callstack. But it does mean that the default behavior would be fairly good. If this piece of code was blocked by something lower in the callstack inside iteration, it wouldn't affect its behavior in any significant way.
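The pool-plus-view idea can be sketched in miniature (all names invented; "futures" are simulated as countdown timers so the sketch is self-contained):

```php
// All futures live in one shared pool. A view only waits on its own keys,
// but stepping the view advances every future in the pool, so work queued
// elsewhere in the callstack keeps making progress.

final class DemoPool {
  private $futures = array();  // key => remaining "ticks" until resolution
  private $done = array();

  public function addFutures(array $futures) {
    $this->futures += $futures;
    return new DemoView($this, array_keys($futures));
  }

  // Advance every future in the pool by one tick.
  public function step() {
    foreach ($this->futures as $key => $ticks) {
      if ($ticks <= 1) {
        unset($this->futures[$key]);
        $this->done[$key] = true;
      } else {
        $this->futures[$key] = $ticks - 1;
      }
    }
  }

  public function isDone($key) {
    return isset($this->done[$key]);
  }
}

final class DemoView {
  private $pool;
  private $keys;

  public function __construct(DemoPool $pool, array $keys) {
    $this->pool = $pool;
    $this->keys = $keys;
  }

  // Block until every future in this view resolves; the whole pool keeps
  // stepping in the meantime.
  public function resolveAll() {
    do {
      $this->pool->step();
      $ready = true;
      foreach ($this->keys as $key) {
        if (!$this->pool->isDone($key)) {
          $ready = false;
        }
      }
    } while (!$ready);
  }
}
```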

Some of the other styles we could use to express this are:

Yield an Iterator
  $iterator = $this->newFutureIterator($futures);

+ yield $iterator;

  foreach ($iterator as ...

We can yield the iterator to mean "resolve all futures on this iterator before returning execution here".

This isn't perfect either: it allows us to continue higher in the call stack, but means we can not continue on these futures until they all resolve.

We could fix that with something like this:

Yield an Iterator + Working Set
while ($iterator->isWorking()) {
  yield $iterator;
  foreach ($iterator->getReadyFutures() as $future) {
    // ...
  }
}

Since we can $iterator->addFuture() and hold state in the method I think this allows us to resolve any sequence of futures "perfectly", but this is rapidly getting pretty icky.

We can use callbacks, which is the approach that D5111 + D5112 originally pursued:

Callbacks
$future->thenCall($this, 'processLog', $object, ...);

I stylistically dislike callbacks; elsewhere, JavaScript has moved from "callback hell" to futures to async/await.

A "light" version of callbacks is to give loadHardpoints() separate "generate futures" and "resolve/attach one future" calls. This isn't necessarily bad but means hardpoints can only have one round of future dispatch in the general case. Maybe this is fine?

We can sort of implement a variant of "await" but since "await" is just sugar on futures and we have to desugar it, I think there's no benefit and we end up with syntax that looks almost exactly the same as callbacks.

We could also move the processing logic into FutureProxy objects; the vast majority of post-resolution processing can likely be handled by a few templates. But this is really another flavor of callbacks.

I'd like to choose a design here by building a general-purpose system and then sugaring it for the common cases, but there are about five different reasonable ways to build the general-purpose system and it's not totally clear that any of them would actually have any callers.


We do actually have some nontrivial loaders from arc browse in experimental. The rough logic for one is this (the others are largely part of this sequence):

  • You type arc browse master.
  • To figure out which revision master corresponds to, we load the (local, client) commit identified by the name master.
    • Then, we load revisions associated with that commit's hash and tree hash.
    • Also, we load that commit's message.
      • Then, we load revisions identified by the commit's message.

The longest path may result in this call sequence:

  • git log -n1 ... --format=X master (commit hash)
  • git log -n1 ... --format=Y master (commit message -- this currently uses %s%n%n%b but should be updated to %B). This could be done in one step if the first loader was slightly smarter.
  • differential.query (to pull the revision object)

Each service call currently blocks and resolves synchronously, although we execute all the service calls of the same kind in parallel.

In this particular case, this is probably actually near-optimal, since we expect differential.query to dominate the cost of resolution here and don't want to dispatch a separate call per-object.

The code generally looks like this:

$this->newQuery($refs)
  ->needHardpoints(
    array(
      'commitRef',
    ))
  ->execute();

$commit_refs = array();
foreach ($refs as $ref) {
  $commit_refs[] = $ref->getCommitRef();
}

$this->newQuery($commit_refs)
  ->needHardpoints(
    array(
      'message',
    ))
  ->execute();

$map = array();
foreach ($refs as $ref_key => $ref) {

...where each execute() is synchronous.

This particular case could be simplified slightly by letting needHardpoints() express more complex queries (load hardpoint A, then load hardpoint B on those hardpoints).

This whole mess is also probably mooted by the realization that looping over git log appears to be expressible with a single execution of git show -q --format=X ... A B -- (except that if one name is bad, this entire command is poisoned and fails, and we would need to fall back to individual commands).

I have some code which runs and looks plausible (i.e., not covered in piles of callback garbage), at least:

$engine->requestHardpoints(
  array($object),
  array('A', 'B'));

public function loadHardpoint(array $objects, $hardpoint) {
  $future = new ExecFuture('sleep 5 && echo %s', $hardpoint);

  yield $this->yieldFuture($future);

  list($stdout) = $future->resolvex();
  $stdout = trim($stdout);

  $result = array();
  foreach ($objects as $object_key => $object) {
    $result[$object_key] = $stdout;
  }

  return $result;
}
$ time ./test.php  --trace
>>> [0] (+0) <exec> $ sleep 5 && echo 'A'
>>> [1] (+1) <exec> $ sleep 5 && echo 'B'
<<< [0] (+5,006) <exec> 5,006,335 us
<<< [1] (+5,006) <exec> 5,005,445 us
string(1) "A"
string(1) "B"

real	0m5.048s
user	0m0.031s
sys	0m0.017s

This appears to cleanly address the "complex dependencies" use case, where you want some piece of data which exists in a well-known place but is complicated to build, like the revisions which correspond to some symbol when the user types arc browse quack.

On its own, this doesn't cover the other major use case -- where you don't have direct access to the place where the data should be attached. For example, you have a User and want to load all of their ExternalAccount objects and all the AccountIdentifier objects on all those objects. You can't directly request AccountIdentifier objects since this isn't a hardpoint on User and you don't have references to the ExternalAccount objects yet.

This use case is fairly rare in Phabricator and can likely be addressed by making the request more complex -- instead of requesting hardpoint "A", you make a structured request for "A" and hardpoint "B" on the results of that request. This runs into additional complexities (if we load some "A", should we queue "B" loads immediately, or wait for the entire set to finish? The answer depends on the nature of "A" and "B" and the query) but I think there's nothing here that's very hard to overcome.
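A structured request like this could plausibly be expressed as plain data and resolved level by level (everything here is invented for illustration). The property we care about is that each level issues one batched load for the whole set, rather than one load per object:

```php
// Resolve "load hardpoint A, then hardpoint B on the results of that
// request": each level invokes its loader ONCE over the combined set.

function resolve_request(array $objects, array $request, array $loaders) {
  $loader = $loaders[$request['hardpoint']];
  $children = $loader($objects);  // one batched load per level
  if (isset($request['then'])) {
    resolve_request($children, $request['then'], $loaders);
  }
  return $children;
}

$calls = array();
$loaders = array(
  'externalAccounts' => function(array $users) use (&$calls) {
    $calls[] = 'externalAccounts';
    $accounts = array();
    foreach ($users as $user) {
      $accounts[] = $user.':account';
    }
    return $accounts;
  },
  'accountIdentifiers' => function(array $accounts) use (&$calls) {
    $calls[] = 'accountIdentifiers';
    $ids = array();
    foreach ($accounts as $account) {
      $ids[] = $account.':id';
    }
    return $ids;
  },
);

$request = array(
  'hardpoint' => 'externalAccounts',
  'then' => array(
    'hardpoint' => 'accountIdentifiers',
  ),
);

$accounts = resolve_request(array('alice', 'bailey'), $request, $loaders);
```

The "queue 'B' immediately vs. wait for all of 'A'" question from above would live inside resolve_request(): this sketch waits for the entire level to finish before descending.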

I'm also not dealing with a bunch of other complexities here, like error propagation when a particular hardpoint fails to load. But the abstraction has room to represent hardpoint state and I suspect none of this is terribly difficult to navigate.

Authors can make one notable error here -- omitting the yield keyword. If you omit yield, your code works correctly but may not run as quickly as it could. We could reasonably add a lint warning about this ("result of yield...() method unused") since it seems fair to define a convention that methods beginning "yield..." expect to be used with yield.

ftdysa added a subscriber: ftdysa.Feb 29 2020, 10:48 PM

XHPAST currently can't build an AST for $result = yield ..., even though this is a valid construct. This is probably a straightforward fix.

> XHPAST currently can't build an AST for $result = yield ..., even though this is a valid construct. This is probably a straightforward fix.

See discussion in T13492. This isn't actually a valid construct in PHP 5.5, where the syntax must be $result = (yield ...);.

Generators can't "return" until PHP 7:

https://wiki.php.net/rfc/generator-return-expressions

Prior to PHP 7, return from a generator fatals:

PHP Fatal error: Generators cannot return values using "return" in /core/data/drydock/workingcopy-75/repo/arcanist/src/query/ArcanistCommitUpstreamHardpointQuery.php on line 19

This is definitely not great, although probably not exceptionally hard to work around.
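One plausible workaround (not necessarily what the final code does): yield the final result wrapped in a sentinel object instead of returning it, and have the engine recognize the sentinel.

```php
// Before PHP 7, "return $value;" inside a generator fatals. Yielding a
// wrapper as the last value stands in for the return.

final class DemoResult {
  public $value;

  public function __construct($value) {
    $this->value = $value;
  }
}

function demo_query() {
  yield 'future-1';              // engine resolves this
  yield 'future-2';
  yield new DemoResult('done');  // stands in for "return 'done';"
}

function drive($generator) {
  foreach ($generator as $yielded) {
    if ($yielded instanceof DemoResult) {
      return $yielded->value;
    }
    // ...otherwise, resolve the yielded future and continue...
  }
  return null;
}
```

(On PHP 7+, Generator::getReturn() makes this wrapper unnecessary.)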

epriestley closed this task as Resolved.Apr 11 2020, 2:41 PM

"FutureGraph" is dead. Long live "HardpointEngine".

The current implementation does not cover these things:

Declarative, Plan-Based Queries / "Query Language": You can't currently translate "load all users, then load all their revisions with the word 'node', then load all failing unit tests for those revisions" into a "Request" for "HardpointEngine" and run it in parallel with other queries which may need to load similar (or identical) resources. The API imagines supporting some of this eventually, but I haven't prototyped it and it may be fairly complicated. This is currently omitted mostly because there are no use cases in workflows running on it today, and it's not clear this is really a use case which needs much support anywhere in Phabricator.

(HardpointEngine can execute this request plan perfectly fine today, the code to do it just isn't a tidy, declarative "plan"-like piece of query language that can go at the application top level.)

Hardpoint Fetch Deduplication: If you issue two separate requests for ref A's hardpoint H at the same time, we may currently execute the resource fetch twice (and then discard the second result): hardpoints currently do not get marked into a "pending" state. This is probably an easy fix but I'm waiting until it shows up on a profile so I don't need to build a fake test case.
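The shape of the fix is probably just keying in-flight fetches, sketched here with invented names (a real version would store a pending future under the key rather than executing the loader synchronously):

```php
// A second request for the same (ref, hardpoint) pair reuses the in-flight
// fetch instead of issuing it twice.

final class DemoFetchCache {
  private $pending = array();
  private $fetches = 0;

  public function fetch($ref, $hardpoint, $loader) {
    $key = $ref.'/'.$hardpoint;
    if (!array_key_exists($key, $this->pending)) {
      $this->fetches++;
      $this->pending[$key] = $loader();
    }
    return $this->pending[$key];
  }

  public function getFetchCount() {
    return $this->fetches;
  }
}
```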

Request Deduplication: If you issue a request for branch A's commit and a separate request for branch B's commit at the same time, this currently resolves as two separate requests which can run in parallel but can't batch together. In theory, the engine can sometimes merge incoming requests for hardpoint H on object O into any similar request which has not been started yet. We likely don't issue any of these today and I'm sort of waiting for a profile here, too.

Ref Deduplication: If you issue a request for branch A's commit and symbol B's commit, and they happen to be the same commit, we currently build two separate refs for this commit. In theory, refs are append-only and the engine could generate and attach the same object in both cases. This circumstance probably never happens today. Duplicating refs like this likely doesn't matter much -- the cost of the ref duplication itself is very small. The most likely path to real costs is probably that we later load hardpoints for both refs when we don't strictly need to. If the "fetch deduplication" was fixed, most cases of this might be mooted.

Hardpoint Extensions: Hardpoints aren't currently modular, but the API imagines they will become modular.

Debugging Tools: You can use arc inspect and --trace, but there's no way to dump the request execution graph out of a HardpointEngine today. It's likely easy to add something simple, at least, but this feels somewhat premature today.

Yielding Database Queries / Phabricator Support: There's currently no yieldable database query and PhabricatorQuery does not extend ArcanistHardpointQuery. I expect to do this sooner or later (part of the motivation behind this change was to formalize "make sure hardpoint X is loaded on object O") but this is mostly aspirational for now.

Exceptions and Failures: There's no explicit support for propagating exception/failures states out of the Query layer today, but HardpointList can freely store arbitrary amounts of hardpoint metadata so I anticipate this being reasonable to deal with once the code gets there. Today, all queries in Arcanist just fail the request entirely if they fail, which I believe is a sort of broadly reasonable thing for them to do. Needs are more nuanced in Phabricator (where things can fail for policy reasons) but in Arcanist we generally don't expect git show X to fail for any reason other than "catastrophe occurred" or "X does not exist".