Page MenuHomePhabricator

Make working-copy operations service-oriented
Closed, ResolvedPublic

Description

We currently rely on being able to execute git, svn and hg operations directly from the web. This won't work with a "web tier" model, because not every web machine will have every working copy. (We could mount the disks on every machine as readonly, but I have awful experiences with network mounts and am reasonably confident that mounting 1000 disks on 100 machines will explode in a ball of fire.)

Instead, we can bind installs to specific machines in a repository pool, and have the web tier make service calls (via Conduit) to the repository tier.

This basically means turning all the Diffusion*Query classes into Conduit methods, and making Diffusion use ConduitCall to invoke them.

Then we shove some kind of routing magic into ConduitCall so it can resolve selected calls remotely.

Revisions and Commits

rPHU libphutil
D11043
D10960
rP Phabricator
D11874
D14969
D11253
D11477
D11476
D10403
D11159
D11158
D11099
D11102
D11100
D11003
D11001
D10990
D10982
D10959
D10424
D10399
D7808

Related Objects

Event Timeline

epriestley added a project: Phacility.
epriestley added a subscriber: epriestley.
epriestley edited this Maniphest Task.

The description here is still accurate. T2784 fixed many of the problems here, but we have some remaining callsites outside of the web code that task focused on.

Roughly, we have code in various places that relies on having a repository working copy on disk. We'd like to allow the working copy to be on another machine instead and still have everything work. To do this, we need to convert direct repository access to Conduit calls. Once this is done, we'll be able to have web frontends make service requests to the host which actually has the repository in order to satisfy repository queries.

It's OK to directly access repositories in these cases:

  • Anything on the PullLocal daemon (notably: DiscoveryEngine, RefEngine).
  • From Conduit calls.
  • From Queries which are only called by Conduit calls.
  • From CommitHookEngine.
  • From HeraldPreCommitContentAdapter.
  • A couple of special cases.

Other uses are improper, and need to be converted to Conduit calls so they'll work once we split repositories across machines. It looks like the state of the world today is:


Improper uses of DiffusionLowLevelCommitQuery:

  • PhabricatorRepositoryManagementLookupUsersWorkflow uses DiffusionLowLevelCommitQuery.
  • PhabricatorRepositoryGitCommitMessageParserWorker uses DiffusionLowLevelCommitQuery.
  • PhabricatorRepositoryMercurialCommitMessageParserWorker uses DiffusionLowLevelCommitQuery.
  • PhabricatorRepositorySvnCommitMessageParserWorker uses DiffusionLowLevelCommitQuery.
  • (DiffusionRequest has one too, but it's conditional and OK.)

To deal with these, we could either expand diffusion.querycommits to include this information, or we could introduce a new API method. Expanding diffusion.querycommits seems like a better approach, although it will be a little messy because this query is invoked very early in the parse process. We could add a bypassCache flag or something, although this doesn't feel great. But, at least tenatively, that's a plan of attack:

  • Add all the information needed to build DiffusionCommitRef objects to diffusion.querycommits: author name, author email, committer name, committer email, hashes.
  • Add a bypassCache flag to the call. When set, load the information using DiffusionLowLevelCommitQuery. When not set, load the information from the caches on Commit/CommitData.
  • Convert improper callers from DiffusionLowLevelCommitQuery to using new ConduitCall() to invoke diffusion.querycommits, setting the bypassCache flag as appropriate.

Improper uses of direct access:

  • PhabricatorRepositoryGitCommitChangeParserWorker makes direct git calls.
  • PhabricatorRepositoryMercurialCommitChangeParserWorker makes direct hg calls.
  • (DifferentialReleephRequestFieldSpecification makes direct calls, but can be ignored for now.)
  • (PhabricatorRepositorySvnCommitChangeParserWorker makes direct svn calls, but we do not need to fix these, since they're remote anyway.)
  • (The DifferentialLanding... classes make direct calls, but can be ignored.)

These need to be dealt with individually and may require the introduction of new methods. In some cases, we probably have an appropriate method already that will work as-is or with slight modifications (for example, the hg cat call can probably be replaced with diffusion.filecontentquery.


To get started overall, I'd tackle DiffusionLowLevelCommitQuery first. Specifically:

  • Open up DiffusionQueryCommitsConduitAPIMethod. We want to add these keys to the result dictionary (so it can be used to populate a DiffusionCommitRef):
    • authorName
    • authorEmail
    • authorPHID
    • committerName
    • committerEmail
    • committerPHID
    • hashes
  • You can start by just making it return those keys with empty values. You should be able to use the "Conduit Console" from the web UI (at /conduit/) to verify that your changes have an effect.
  • Add a bypassCache flag. When this flag is passed to the call, it should make a call to DiffusionLowLevelCommitQuery and use those results to fill all the fields (and, if required, the message field). The web UI should show your changes having an effect and populating all this data.
  • When the bypassCache flag is not set, fill in the data as best you can. Some of it is available on DiffusionRepositoryCommit or DiffusionRepositoryCommitData. For now, it's not important that this work with bypassCache off.
  • (As a followup, we can make PhabricatorRepositoryCommitMessageParserWorker fill in the rest of the data too.)
  • One at a time, convert the DiffusionLowLevelCommitQuery callsites listed above to use diffusion.querycommits instead. PhabricatorRepositoryManagementLookupUsersWorkflow might be a good place to start with, since it's easy to test by running bin/repository lookup-users.

I attempted to start converting PhabricatorRepositoryManagementLookupUsersWorkflow over to use ConduitCall, but there's no way for it to make a call as the omnipotent user at this time, until "Support Host Identity and Authentication" from T4209 gets done. This almost certainly affects conversion of the commit parsing workers as well.

My plan of attack here is to get an absolutely bare minimum system for host identification / signing working. Hosts will register themselves in the database on startup with:

bin/almanac register

This generates a private / public key pair, storing the public key in the database, and the private key on disk under conf/local/HOSTKEY (since this needs to be located in a consistent location that other areas can access). The alternative is to have a host value stored in the local.conf and use that to lookup the private key path, but this seems like overkill.

I've managed to get PhabricatorRepositoryManagementLookupUsersWorkflow using ConduitCall by implementing host identification.

D10400, D10401 and D10402 cover the implementation of the host identification / signature verification.

D10403 covers the conversion of the workflow.

I'm going to hold off converting any more workflows until I know the host identification / signature verification is a suitable solution.

  • We've fully separated the web process, which is the major issue here. Repositories no longer need to exist on web machines (given eldritch knowledge of secret, undocumented configuration).
  • We haven't fully separated the daemons yet, so we can't put daemons and repositories on different machines (notably, we can't have several repository machines for a single instance). I'm going to push this out until after Phacility because we don't need it until we run into an install which needs more than one host worth of daemons or repositories. The resources these workloads use are dissimilar (daemons use mostly CPU + memory, repos use mostly disk IO + network IO) and I don't anticipate running into scaling issues for a while.

As a heads up, when Harbormaster is available for Phacility customers, I highly recommend running those daemon tasks on a different tier. Long running tasks can easily queue up all available taskmaster daemons, and at work we have to run about 50 or so daemons to keep up with peak load (because of the way taskmasters work at the moment, there's no way we can shift only and all of the Harbormaster tasks onto a different set of machines).

epriestley closed subtask Restricted Maniphest Task as Resolved.Jan 28 2015, 10:41 PM
chad changed the visibility from "All Users" to "Public (No Login Required)".Jul 23 2015, 4:42 AM
eadler added a project: Restricted Project.Feb 18 2016, 6:31 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Mar 9 2016, 10:12 PM
epriestley claimed this task.

I believe D11874 covers the last of this for Git.

I've filed T10753 and T10754 as Mercurial / Subversion followups. We currently have no installs that I'm aware of with an interest in HA that use anything other than Git. Improving availability in the Phacility cluster will probably drive these use cases eventually, if specific interest doesn't appear before that.

I think the PullLocal daemon is still too dumb to figure out which repositories it should act on so this isn't hugely useful on its own unless you want to manually bin/phd launch things. I'll sort this out elsewhere (in T3145 and stuff attached to that), so that bin/phd start does the right thing on hosts by default.