Refactor Conduit auth to be stateless, token-based, and support wire encodings
Closed, Resolved, Public

Assigned To

Authored By

	epriestley
	Aug 25 2014, 2:19 AM

Description

Conduit authentication currently supports two mechanisms:

A session-based mechanism, where you use conduit.connect to establish a session.
An undocumented stateless mechanism likely used only by Facebook.

The first mechanism is unnecessarily complex and makes Conduit slow (extra round trips) and hard to use (can't use CURL, pain to write clients). We should move away from it and deprecate it.

The actual wire token is needlessly complicated. We transmit a proof-of-token, not the token itself. The proof we transmit is not a request signature, so all this does is make replay attacks moderately more difficult. In practice, it just causes a bunch of issues for users with bad client or server timestamps or goofy environmental problems. No one has ever expressed interest in upgrading to a request signature scheme.

Because the wire token is needlessly complicated, the token/handshake/certificate UIs are also needlessly complicated and users hit a bunch of issues using them. These UIs should just be "generate session", which gives you a durable token, with attendant session management/review capabilities.

Upshot:

Deprecate conduit sessions and conduit.connect.
Support direct token-based auth (?token=abdef123) and make this the standard.
Leave room for a proof-of-token + request-signature flavor of this eventually.
Support SSH auth.
Support multiple request encodings (likely BSON, protobuf, or messagepack). Leave JSON as the default, but in cases where messages can not be represented in JSON this gives us a plausible way forward.
Fix the UIs to make handshakes and session management straightforward.
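
As a rough illustration of what direct token-based auth buys clients, here is a minimal sketch of building a stateless Conduit request with no conduit.connect handshake. The description above writes the parameter as ?token=...; the `api.token` form field used here is an assumption about how the parameter is spelled on the wire, and the host name, token value, and helper name are placeholders. The helper only builds the request, so nothing is sent.

```python
import urllib.parse
import urllib.request


def build_conduit_request(base_url, method, token, params=None):
    """Build (but do not send) a token-authenticated Conduit POST request.

    Assumption: the token travels as an ordinary form field named
    "api.token" next to the method's own flat parameters.
    """
    fields = dict(params or {})
    fields["api.token"] = token  # assumed parameter name
    body = urllib.parse.urlencode(fields).encode("utf-8")
    url = base_url.rstrip("/") + "/api/" + method
    return urllib.request.Request(url, data=body, method="POST")


# Placeholder host and token, for illustration only.
req = build_conduit_request(
    "https://phabricator.example.com", "user.whoami", "api-abcdef123")
# urllib.request.urlopen(req) would perform the call in a single round trip.
```

Because every request carries the token, there is no session state to establish or expire, which is what makes plain curl-style clients workable.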

Revisions and Commits

rPHU libphutil
	D10987	rPHU103dc7e39093 Support Conduit tokens in ConduitClient
rARC Arcanist
	D12750	rARC6c5d12d83993 Make --conduit-token work without requiring .arcrc
	D12717	rARC111b9b035aec Add a --conduit-token parameter to `arc`
	D10988	rARC511898775788 Support simpler, token-based Conduit authentication in Arcanist
rP Phabricator
	D14780	rP0692115953fa Remove all references to the Conduit ConnectionLog
	D12770	rP3a34d948b9d3 Show how to call Conduit API methods from clients
	D10990	rPf18ee5c237fe Generate and use "cluster" Conduit API tokens
	D10989	rP288498f8d099 Add conduit.getcapabilities and a modern CLI handshake workflow
	D10986	rP0507626f01e5 Accept Conduit tokens as an authentication mechanism
	D10985	rP39f2bbaeea1b Add Conduit Tokens to make authentication in Conduit somewhat more sane

Related Objects

Status	Assigned	Task
Resolved	epriestley	T8089 Unprototype Harbormaster (v1)
Resolved	epriestley	T8097 Allow external systems to report results into Harbormaster
Resolved	None	T7419 Provide a straightforward way to create and update CI test statuses for a revision
Wontfix	epriestley	T4591 Allow Harbormaster to source GitHub pull requests as buildables
Resolved	epriestley	T5479 Unbeta Phragment
Duplicate	None	T7869 Support CircleCI webhooks for Test results (so that one can run unit tests asynchronously)
Resolved	epriestley	T1049 Implement Harbormaster
Open	None	T6139 Allow designation of "summary" build logs which are sorted to the top or otherwise promoted
Resolved	epriestley	T9456 Evaluate upstream support for third-party build systems
Resolved	epriestley	T9123 Build Phabricator in Harbormaster (v2)
Open	None	T11402 Garbage collect and/or compress/archive harbormaster build unit messages
Open	None	T5822 Implement garbage collection / automatic archiving for Harbormaster logs
Open	None	T9124 Support uploading build log data via the Harbormaster API
		Restricted Maniphest Task
Open	epriestley	T550 Build an SSH conduit client
		Restricted Maniphest Task
Resolved	epriestley	T4377 Long-running bots which don't make Conduit calls for 24 hours fail all future calls
		Restricted Maniphest Task
		Restricted Maniphest Task
Open	None	T6633 'differential.creatediff' exception when repository paths contain SHIFT-JIS characters
Duplicate	None	T7288 configuring certificates for bots is hard/confusing
Duplicate	None	T10803 PhabricatorApplicationTransactionPublishWorker task failing due to malformed UTF-8
Resolved	epriestley	T5955 Refactor Conduit auth to be stateless, token-based, and support wire encodings
Resolved	btrahan	T5603 Implement a separate "Can Leave Project" policy

Event Timeline

There are a very large number of changes, so older changes are hidden.

epriestley added a revision: D10989: Add conduit.getcapabilities and a modern CLI handshake workflow.Dec 13 2014, 12:45 AM

epriestley added a revision: D10990: Generate and use "cluster" Conduit API tokens.Dec 13 2014, 6:23 PM

At least for now, this no longer blocks Phacility or T2783.

epriestley mentioned this in T4571: Allow Phabricator to run in Read-Only Mode.Dec 14 2014, 12:00 PM

epriestley added a commit: rPHU103dc7e39093: Support Conduit tokens in ConduitClient.Dec 15 2014, 7:12 PM

epriestley added a commit: rARC511898775788: Support simpler, token-based Conduit authentication in Arcanist.

epriestley added a commit: rP39f2bbaeea1b: Add Conduit Tokens to make authentication in Conduit somewhat more sane.

epriestley added a commit: rP0507626f01e5: Accept Conduit tokens as an authentication mechanism.Dec 15 2014, 7:14 PM

epriestley added a commit: rP288498f8d099: Add conduit.getcapabilities and a modern CLI handshake workflow.

epriestley added a commit: rPf18ee5c237fe: Generate and use "cluster" Conduit API tokens.

epriestley mentioned this in T5726: Add more information to story POSTs during the feed.http-hooks hook.Dec 28 2014, 4:27 PM

epriestley mentioned this in D11120: Remove support for old Conduit versions.Jan 5 2015, 7:36 PM

The utf8 stuff hasn't been coming up much; the token-based stuff is just blocked on docs and also hasn't come up much; I touched the protocol enough that I'm pretty confident we can move forward without backward compatibility breaks.

btrahan merged a task: T5022: "arc diff" crashes when using against hg repo with non-ascii characters in commit messages on Windows.Feb 13 2015, 12:46 AM

btrahan added a subscriber: • ulrik.johansson.

epriestley added a parent task: T7288: configuring certificates for bots is hard/confusing.Feb 17 2015, 12:13 AM

epriestley mentioned this in Starmap.Apr 15 2015, 11:01 AM

epriestley mentioned this in T7909: Allow ascending order for Conduit's maniphest.query's "order" parameter (order-created, order-modified).Apr 24 2015, 9:27 PM

epriestley mentioned this in T7343: arc install-certificate doesn't work with bot users.May 4 2015, 3:12 PM

epriestley merged a task: T7343: arc install-certificate doesn't work with bot users.

epriestley added subscribers: mbishopim3, avivey, nchammas.

epriestley added a revision: D12717: Add a --conduit-token parameter to `arc`.May 5 2015, 1:33 PM

epriestley added a commit: rARC111b9b035aec: Add a --conduit-token parameter to `arc`.May 5 2015, 9:02 PM

Is the above commit usable yet, or are there still a few steps to go? I gave it a quick try and it still wanted my bot user to install a certificate despite providing an API token with the above parameter.

I'd expect it to work on relatively up-to-date Phabricator. You may still need to have some valid-ish ~/.arcrc file, which we might be reading and then ignoring. I'll check that and fix it if it's the case -- we shouldn't require a ~/.arcrc if you specify --conduit-token.

Great, thanks. It appears that it does still need the certificate in a .arcrc file somewhere, as we removed that file as part of the above test and received the certificate prompt.

As a goofy workaround you can probably write some dummy file

dummy.arcrc

{
  "hosts": {
    "http://blah.yourcompany.com": {}
  }
}

...and specify that with --arcrc-file, but this is stupid; I'll just fix the actual problem.

Yeah, that's what we were doing, actually. But given our environment and Arcanist's checks on the file's permissions (that it's 600, etc.), it was messy because of shell restrictions and other constraints where our scripts run. This will be great for build agents posting back Harbormaster statuses.

epriestley added a revision: D12750: Make --conduit-token work without requiring .arcrc.May 7 2015, 1:10 PM

epriestley added a commit: rARC6c5d12d83993: Make --conduit-token work without requiring .arcrc.May 7 2015, 6:10 PM

epriestley added a revision: D12770: Show how to call Conduit API methods from clients.May 8 2015, 1:35 PM

epriestley added a commit: rP3a34d948b9d3: Show how to call Conduit API methods from clients.May 8 2015, 7:19 PM

epriestley merged a task: T8669: Conduit can't transfer diffs that have invalid UTF8 characters, because json_encode returns FALSE.Jun 25 2015, 11:34 AM

epriestley added a subscriber: hach-que.

epriestley mentioned this in D13444: Provide phutil_json_encode().Jun 25 2015, 5:56 PM

epriestley mentioned this in rPHU3753a09dfc7e: Provide phutil_json_encode().Jul 7 2015, 12:56 PM

epriestley merged a task: T8857: Make Git commit / message parsers eventually fail or de-prioritize on failure.Jul 16 2015, 7:42 PM

@epriestley Is there an ETA on this? It's been 7 days since I cleared out failing Git parser tasks and there's another 20 I need to clear out now. This issue has a significant impact on our background processing because commits without UTF8 data in them effectively block the queue.

Herald added a subscriber: eadler.Jul 22 2015, 5:42 AM

commits without UTF8 data in them effectively block the queue.

This is unexpected. What are you seeing that makes you think they're blocking the queue?

Tasks with any failures are automatically deprioritized and execute after tasks with no failures. The expectation is that the screenshot in T8857 does not show a blocked queue, just a queue with some cruft at the end of it.

Generally, we find that the commit / message parser tasks tend to get done first and take up taskmasters that could be processing Harbormaster builds. This delays builds because we have fewer taskmasters available.

If the commit / message parser tasks were being deprioritized, I would expect the currently leased tasks to be quickly replaced entirely with Harbormaster tasks when a build with 50+ targets starts. But we don't see that; instead we see the same commit / message parser tasks (with failure counts >2000) being leased again, with the remaining slots being used for builds (and search and other tasks).

Okay, can you file a separate issue for that and I'll try to reproduce it? That's bad and not the expected queue behavior.

devurandom added a subscriber: devurandom.Aug 19 2015, 5:49 AM

epriestley mentioned this in T6193: Allow Arcanist to allocate resources through Drydock.Aug 24 2015, 4:49 PM

epriestley mentioned this in T9319: Accessing image file in Diffusion must not generate error (unhandled exception).Sep 2 2015, 2:30 PM

epriestley mentioned this in T9456: Evaluate upstream support for third-party build systems.Sep 21 2015, 9:12 PM

epriestley mentioned this in D14250: Allow editing hosting policies via command line.Oct 9 2015, 6:59 PM

epriestley moved this task from Backlog to vNext on the Conduit board.Dec 8 2015, 10:53 PM

epriestley added a revision: D14780: Remove all references to the Conduit ConnectionLog.Dec 14 2015, 8:52 PM

epriestley added a commit: rP0692115953fa: Remove all references to the Conduit ConnectionLog.Dec 14 2015, 11:25 PM

eadler added a project: Restricted Project.Jan 8 2016, 10:27 PM

eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jan 8 2016, 10:42 PM

epriestley mentioned this in T10148: diffusion.rawdiffquery can not encode/transport all diffs (e.g. to non-utf8 files).Jan 14 2016, 1:30 PM

epriestley added a parent task: T9124: Support uploading build log data via the Harbormaster API.Mar 3 2016, 11:27 PM

epriestley mentioned this in T9124: Support uploading build log data via the Harbormaster API.

epriestley merged a task: T10557: callConduitWithDiffusionRequest failed with non-latin characters parameters.Mar 10 2016, 2:54 PM

epriestley added a subscriber: • clark.woo.12.

How can I use DiffusionBrowseController.php to browse my files with non-Latin character names correctly?

nchammas removed a subscriber: nchammas.Mar 10 2016, 3:45 PM

hach-que mentioned this in T10803: PhabricatorApplicationTransactionPublishWorker task failing due to malformed UTF-8.May 3 2016, 12:53 AM

hach-que added a parent task: T10803: PhabricatorApplicationTransactionPublishWorker task failing due to malformed UTF-8.

chad merged a task: T10803: PhabricatorApplicationTransactionPublishWorker task failing due to malformed UTF-8.May 3 2016, 1:35 AM

chad added subscribers: chad, nickhutchinson.

erich added a subscriber: erich.May 20 2016, 9:29 PM

scode added a subscriber: scode.Jun 3 2016, 4:27 PM

• joakim.plate added a subscriber: • joakim.plate.Jun 7 2016, 3:22 PM

fcoelho added a subscriber: fcoelho.Jun 8 2016, 5:39 PM

epriestley mentioned this in T4631: Allow Differential to raise warnings on the server side via Conduit.Jun 17 2016, 2:48 PM

hach-que mentioned this in T4788: Allow "Edit Dependencies" both ways (blocking and depending).Jul 4 2016, 3:34 AM

• bjorngi added a subscriber: • bjorngi.Jul 4 2016, 8:28 AM

jasonfsmitty added a subscriber: jasonfsmitty.Jul 14 2016, 10:54 AM

epriestley mentioned this in T11524: Import-time diff generation fails for particular commits in cluster mode.Aug 24 2016, 11:09 AM

epriestley mentioned this in T4045: Store diffs as binary, not UTF-8.Dec 13 2016, 5:44 PM

avivey mentioned this in Q567: JSON Example for Conduit method maniphest.edit? (Answer 510).Feb 15 2017, 4:24 PM

epriestley mentioned this in T12447: Missing documentation for crafting raw Conduit API requests.Mar 25 2017, 12:38 AM

urzds added a subscriber: urzds.Jul 12 2017, 11:13 AM

epriestley mentioned this in T13088: Plans: Harbormaster UI usability and interconnectedness.Feb 21 2018, 3:43 PM

epriestley mentioned this in T13152: Storage on `secure` was filled by binlogs from looping (?) rebuild-identities script.Jun 7 2018, 4:02 PM

epriestley mentioned this in T12164: Put an indirection layer between author/committer strings and user accounts.Jun 8 2018, 1:52 PM

scp added a subscriber: scp.Jun 20 2018, 7:05 PM

• yingshu added a subscriber: • yingshu.Jan 16 2019, 4:37 AM

epriestley mentioned this in T13257: Plans: Unbeta New Harbormaster Log UI.Feb 26 2019, 12:36 PM

Support multiple request encodings (likely BSON, protobuf, or messagepack). Leave JSON as the default, but in cases where messages can not be represented in JSON this gives us a plausible way forward.

I am increasingly coming to the viewpoint that we shouldn't do this.

Much of the binary data we potentially transmit over the API is also arbitrarily gigantic. Examples include diffs (like the output of git diff) and file content. JSON is clearly not a great format for uploading or downloading gigantic binary blobs, and the API has generally moved away from this anyway: methods like diffusion.filecontentquery return a pointer to a File object and support timeouts and size limits.

In most cases, I think this model is probably the right one for downloading binary blobs: we stream the data into Files, then return a pointer to a File object. Recent APIs which need to do this generally work this way anyway.

For uploading binary blobs, we'd do the reverse: have the client stream the blob into Files, then call an API method with a reference to the object. For convenience, we could continue supporting providing the data inline in cases where it is small and can be represented in JSON. There is generally not much need for this today. This has been discussed elsewhere.

That leaves us with a small number of cases where data can not be represented in JSON and is also reasonable to want to transmit over the API as a field in some sort of datastructure. An example of this kind of data is repository path names: a repository may have binary path names that can not be represented in JSON, but it's obviously silly to have an API call like diffusion.browse return a list of file pointers with each path name stored in an individual file.

In these cases, I think it's probably reasonable to just encode the data inline:

{
  "path": {
    "readable": "path/to/fil<?>e.txt",
    "raw.base64": "ba!nfl1namef!NLF1f13"
  }
}

The handful of clients that care about correctness in the face of binary paths can then implement handling correctly, and the majority of clients that don't can use the transcribed name and get the right behavior almost always.
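
To make the inline-encoding idea concrete, here is a minimal sketch of both sides. The `readable` and `raw.base64` keys come from the example above; the function names, the use of U+FFFD as the replacement character, and the choice to omit `raw.base64` when the path is already valid UTF-8 are assumptions for illustration, not a description of any shipped API.

```python
import base64


def encode_path(raw: bytes) -> dict:
    """Server side: describe a possibly-binary path in JSON-safe form."""
    # Bytes that are not valid UTF-8 become U+FFFD in the readable form.
    readable = raw.decode("utf-8", errors="replace")
    entry = {"readable": readable}
    if readable.encode("utf-8") != raw:
        # The readable form is lossy, so also carry the exact bytes.
        entry["raw.base64"] = base64.b64encode(raw).decode("ascii")
    return entry


def path_bytes(entry: dict) -> bytes:
    """Client side: recover the exact path, preferring the raw form."""
    raw = entry.get("raw.base64")
    if raw is not None:
        return base64.b64decode(raw)
    return entry["readable"].encode("utf-8")


binary_path = b"path/to/fil\xffe.txt"
entry = encode_path(binary_path)
```

A client that doesn't care about binary paths just reads `entry["readable"]`; one that does calls `path_bytes(entry)` and gets the original bytes back.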

I think the only real trick we're left with is that we need to be mindful of this in exposing possibly-binary data to the API, but I think there are fairly few sources which really matter very much outside of paths: basically path names, branch names, and committer/author names. And then maybe some day stuff like "imported JIRA task titles", but I think this kind of case is unlikely to be important.

Herald added a subscriber: amckinley.Jun 25 2019, 4:55 PM

epriestley mentioned this in T13337: Resolve structural problems with Conduit API methods related to large input/output sizes and binary data.Jul 12 2019, 4:23 PM

Deprecate conduit sessions and conduit.connect.
Support direct token-based auth (?token=abdef123) and make this the standard.

Although the session-based mechanism isn't entirely dead, we've effectively moved to token-based authentication.

Leave room for a proof-of-token + request-signature flavor of this eventually.

This has been supported internally for some time. Particularly, JSON/HTTP has supported authentication via signing with SSH keys for a while, and this is how we authenticate in the Phacility cluster (ConduitClient::AUTH_ASYMMETRIC). There's little external interest in this today so it's not well documented, but it works fine.

Support SSH auth.

This has been supported for a while, although T550 is a more focused followup. The major limitation today is that calls are one-shot so you pay connection overhead for each call.

Support multiple request encodings (likely BSON, protobuf, or messagepack). Leave JSON as the default, but in cases where messages can not be represented in JSON this gives us a plausible way forward.

I no longer plan to support request or response encodings. See T13337 for the anticipated pathway forward.

Fix the UIs to make handshakes and session management straightforward.

This got cleaned up a while ago and no longer seems to be much of a pain point.

For uploading binary blobs, we'd do the reverse: have the client stream the blob into Files, then call an API method with a reference to the object.

Does anything actually use a flow like this today? Just looking for an example to see how hard it would be to create similar methods for Harbormaster blob content.

Yeah, diffusion.rawdiffquery and diffusion.filecontentquery both do this.

They have some mild weirdness (timeout/limit handling feels kind of ad-hoc) but the actual blob part is in pretty good shape, I think.

Am I going crazy, or do those methods only handle the "blob fetching" part of this equation? I was looking for examples of the "create a File, do a chunked upload, attach said File to an existing object" flow.

Oh, sorry, I misread which half you were asking about.

arc upload does the upload part via ArcanistFileUploader, which is basically file.allocate to create the file, then file.uploadchunk calls in a loop to stream it up. There's some additional code that uses file.querychunks to resume uploads, and falls back to file.upload for old server versions.

It doesn't attach files, but they don't need to be attached by the client if the same user is making a subsequent call to some other api.do-something-with-a-file method, since they'll be the author of the file and always be able to read it.
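
For anyone looking for the shape of that upload flow, here is a rough sketch of the allocate-then-stream loop described above. It is not ArcanistFileUploader itself: `call` is a hypothetical function that performs one Conduit call and returns the decoded result, and the parameter and result field names (`contentLength`, `filePHID`, `upload`, `byteStart`, `byteEnd`, `complete`, `dataEncoding`, `data_base64`) are written from memory and should be checked against the Conduit console before relying on them.

```python
import base64


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def upload_file(call, name: str, data: bytes) -> str:
    """Upload `data` and return the file PHID.

    `call(method, params)` is a hypothetical callable that makes a single
    Conduit API call; all field names below are assumptions.
    """
    alloc = call("file.allocate", {"name": name, "contentLength": len(data)})
    phid = alloc.get("filePHID")

    if phid is None:
        # Server did not allocate a chunked file: fall back to one-shot upload.
        return call("file.upload", {"name": name, "data_base64": _b64(data)})

    if not alloc.get("upload"):
        # Server already has the data; nothing to send.
        return phid

    # Ask which chunks are still missing, then stream only those. Re-running
    # this after a failure resumes where the previous attempt stopped.
    for chunk in call("file.querychunks", {"filePHID": phid}):
        if chunk["complete"]:
            continue
        start, end = int(chunk["byteStart"]), int(chunk["byteEnd"])
        call("file.uploadchunk", {
            "filePHID": phid,
            "byteStart": start,
            "data": _b64(data[start:end]),
            "dataEncoding": "base64",
        })
    return phid
```

The returned PHID is what a follow-up method would take as its file reference; as noted above, no explicit attach step is needed when the same user makes that call.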
