
Distribution mechanism for arc extensions
Open, Wishlist, Public

Assigned To
None
Authored By
champo
May 14 2014, 6:59 PM

Description

The idea is to distribute libphutil libraries for arcanist in a simpler way than emailing a tarball to all users.

Discussion at https://secure.phabricator.com/chatlog/channel/6/?at=137189

The two big use-cases are:

  • Get the company's extensions/configuration to all users in an easy way (Without adding them to each repo)
  • Distribute 3rd party extensions, just like apt-get/npm/etc.

A more formal list of requirements (Mostly gathered from epriestley's comments around):

  • It should support installing arcanist extensions, Phabricator applications, and libphutil libraries.
    • i.e., it should handle configuration for phabricator and arcanist
    • Maybe it should even support installing third-party stuff like linters.
    • Maybe it should even support installing third-party dependencies like Node?
  • Packages should be signed by the author, and you should only need to trust the author to trust the package.
    • totally compromising a Phabricator install should be insufficient to compromise users of that install by tainting packages. If you (@avivey) sign a package, I (@epriestley) should be unable to taint it, even if you distribute it through secure.phabricator.com.
  • Packages should be able to define dependencies, and it should handle installing them.
  • For arcanist, packages may be specified either by the project (.arcconfig) or by global configuration (.arcrc).
  • It should handle running different versions of the same package in different projects.
  • Have a way to require/alert users it's time to upgrade a package
  • Should not require the phab-marketplace to know about my extension (Because it's internal to my company and has all my secrets).
  • Support Linux, Mac OS X, and Windows.
  • "List all the things I have loaded/installed"
  • Should work in an environment where arc is mounted in a read-only location.

Important challenges:

  • Organization: Dumping directories next to things won't last very long and will run into issues with everything else here, as well as making it hard for us to do things like "list all the stuff that is installed". We would quickly need to have better rules about where stuff goes.
  • Versioning: How do we know something needs to be updated? How do we organize, store, and include multiple versions of a package?
  • Dependencies: How do we manage dependencies? How do we deal with cases like "diamond dependencies", where A depends on B and C, and B and C depend on different versions of D?
  • Security: How do we make sure that compromised user accounts don't lead to remote code execution on all users' machines? Code signing is probably the solution here, but it's complicated.
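
The "diamond dependency" case above can be made concrete with a small sketch. It assumes a hypothetical in-memory package index with exact version pins; the package names, versions, and function names are illustrative and do not come from Arcanist. It only detects the conflict and does not attempt to resolve it.

```python
from collections import defaultdict

# Hypothetical index: name -> {version: {dependency: required_version}}.
# It mirrors "A depends on B and C, and B and C depend on different
# versions of D"; none of these entries are real packages.
INDEX = {
    "A": {"1.0": {"B": "1.0", "C": "1.0"}},
    "B": {"1.0": {"D": "1.0"}},
    "C": {"1.0": {"D": "2.0"}},
    "D": {"1.0": {}, "2.0": {}},
}


def collect_requirements(index, name, version, found=None, seen=None):
    """Walk the dependency graph and record every version requested per package."""
    if found is None:
        found = defaultdict(set)
    if seen is None:
        seen = set()
    if (name, version) in seen:
        return found
    seen.add((name, version))
    found[name].add(version)
    for dep_name, dep_version in index[name][version].items():
        collect_requirements(index, dep_name, dep_version, found, seen)
    return found


def find_conflicts(index, name, version):
    """Return the packages that are requested at more than one version."""
    found = collect_requirements(index, name, version)
    return {pkg: sorted(vers) for pkg, vers in found.items() if len(vers) > 1}


conflicts = find_conflicts(INDEX, "A", "1.0")
print(conflicts)
```

Whether a conflict like this should fail the install or load both versions of D side by side is exactly the open question in the "Versioning" and "Dependencies" items.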

Event Timeline


Changes connected to T8116 implement the initial server-side version of this. It's still very skeletal, but we probably need to make some client changes to move forward. In particular, the next object to implement is probably Signature, but signature algorithms should live in the client since the client will need to be able to verify signatures.


Roughly, arc will get new/expanded workflows:

arc upgrade: Today, this means "upgrade Arcanist". In the future, it will potentially mean several things:

  • Upgrade Arcanist, the client.
  • Upgrade extensions installed in Arcanist (very rare?).
  • Upgrade the current working directory (impossible/never?).
  • Upgrade extensions installed in libraries in the current working directory.
  • Upgrade global system software (future?)

I expect these to all live in the upgrade command. The default behavior will either become "upgrade everything" or "prompt, asking the user what to upgrade".

arc install: New command. This now gets several meanings:

  • Install a new extension into Arcanist.
  • Install a new extension into the Arcanist configuration for the current project.
  • Install a new extension or application into a library in the current working directory.
  • Install software on the system globally.
  • Download configured extensions for the current project or library.

There's some ambiguity here too, but arc install with no arguments probably means "synchronize everything so it is up to date", while arc install <package> probably means either "guess" or "prompt".

We can narrow down what arc install <package> means by giving packages types, like "Arcanist Extension", "Phabricator Application", "Library", "System Package", etc. It can then select a narrower range of reasonable install behaviors.
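
As a rough illustration of how a package type could narrow the install behavior, here is a minimal sketch. The type names are the ones listed above; the target names and the mapping from type to targets are assumptions made for this example, not a decided design.

```python
from enum import Enum


class PackageType(Enum):
    ARCANIST_EXTENSION = "Arcanist Extension"
    PHABRICATOR_APPLICATION = "Phabricator Application"
    LIBRARY = "Library"
    SYSTEM_PACKAGE = "System Package"


# Hypothetical mapping from package type to the places it could be installed.
# The target names loosely paraphrase the meanings of "arc install" above.
INSTALL_TARGETS = {
    PackageType.ARCANIST_EXTENSION: ["arcanist", "project-arcanist-config"],
    PackageType.PHABRICATOR_APPLICATION: ["working-copy-library"],
    PackageType.LIBRARY: ["working-copy-library"],
    PackageType.SYSTEM_PACKAGE: ["system"],
}


def plan_install(package_type):
    """Install directly when only one target makes sense, otherwise prompt."""
    targets = INSTALL_TARGETS[package_type]
    if len(targets) == 1:
        return ("install", targets[0])
    return ("prompt", targets)


print(plan_install(PackageType.SYSTEM_PACKAGE))
print(plan_install(PackageType.ARCANIST_EXTENSION))
```

With a mapping like this, "guess" becomes safe whenever the type leaves a single target, and "prompt" is only needed for the ambiguous types.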

arc sign: New command. Sign a publisher, package, or version. This is used when publishing or attesting to the correctness of packages.

arc version: Today, this means "show Arcanist version". In the future, it will likely mean "show versions of all installed stuff" instead.

We also probably need these capabilities, but can figure them out in the future:

  • Search packages? arc search jslint?
  • Add a new package source (URI of an installed version of Packages)? Just arc set-config for now?
  • Remove a package -- some --remove flag on arc install? Separate arc uninstall?

Making major additions and changes to arc workflows dovetails heavily with T10329 and adjacent tasks. I expect to pursue that first, provide a more solid foundation for arc workflow to build upon, and then implement the new workflows.

Specifically, the next pieces I expect to build are:

  • Package types ("Arcanist Extension", "Phabricator Application", "Library", "System Software" (future)).
  • PackageSignature on the server, and arc sign on the client (initially, only for Publishers and Packages, probably).
  • Additional properties on Versions so they can actually point at a Git repository to clone (this will be modular in the future, but only support "clone a Git repo" for now).
  • Client-side support for cloning repos (arc install, arc version) and loading extensions.
  • Some sense of an upgrade channel / pathway and arc upgrade. Currently, "Versions" are not related to one another, so there's no way to specify how to upgrade version X. We can limp along without this initially since "look it up, then arc install" is fine for administrators while this is a prototype and I don't expect anyone to publish and sign 200 versions of an extension in the first week.

Upshot:

  • To move forward, Packages needs a mixture of client and server changes.
  • Arcanist workflows are getting modernized before the client changes (T10329).
  • After that, Packages can move forward on both the client and server.

It might be beneficial to generally support GPG-signed commits and tags in Phabricator's Git repos and then use the same mechanism for the Arcanist packages.
Since GitHub recently started pushing this feature a bit (https://github.com/blog/2144-gpg-signature-verification), quite a few library maintainers have started signing their release tags.
I also recently spoke to a library maintainer about extending Composer to verify all packages (on install or upgrade) against a list of authors trusted by the user (or an enterprise-wide list).

I've not experimented with this stuff yet, and setting up GPG is still a pain in the ass. But the integration into Git afterwards is quite straightforward.
https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work

You could optionally use the user's GPG keyring in addition to an Arcanist-specific one.

I generally expect all signing to be external, at least for the foreseeable future.

One of the major concerns I have with Composer is that it conflates the software developer and the software packager, often assuming they are one and the same (and generally having no mechanism to identify or verify the packager).

For example, I think having a strong trust mechanism for the developer is of limited use (and potentially quite misleading) if the packager can release an "update" from v2.7.3 to v2.7.4 which actually reverts to v2.7.2, re-opening a widely disclosed and easily exploited security hole. If the developer removes the v2.7.2 tag from their repository as dangerous, the publisher can copy the repository elsewhere, restore the tag (which will have a valid signature!), and then point the package at the new repository. Generally, there's no way that I'm aware of to "unpublish" a GPG signature, but over time many signed versions of software become trivially unsafe as vulnerabilities are discovered and disclosed.

I want to primarily focus on trusting the publisher, and making it clear to the user that this is who they are trusting, and that they are trusting the publisher more or less completely. It's possibly even desirable not to show any developer-signature information to the user, as this implies that the barrier of trust the publisher must meet is lower: seeing that the package is signed by qmysteryman but the code is signed by Facebook "seems" trustworthy, but is not actually much different from only seeing that the package is signed by qmysteryman. I think a sophisticated attacker with complete control of a package is not made substantially less dangerous by only being able to undo security fixes vs being able to deploy arbitrary code.

Showing this kind of information to the publisher at the time they sign a version (to make it easier for publishers to perform due diligence before signing a package) could be useful, but it's probably some ways away.

This specific attack may not be entirely possible in practice (I'm not familiar with the Composer workflows), but I think it's broadly difficult to establish a clear capability gap between an attacker who can deploy any code at all and an attacker who can "only" deploy any code which a particular developer ever signed. At any given time, most code which a particular developer has ever signed is probably unsafe to run. I think the cryptographic assertion that the developer considered it safe to run at one time is not a very strong one: I would have made this assertion about all versions of Phabricator as I released them in the past, but would no longer make this assertion about those versions because users have discovered and reported security issues since then.

The whole "identify the packager" problem with Composer is indeed a big one, and I'm not suggesting you use Composer in any way for the Arcanist/Phabricator packages! Generally I agree that I want to primarily trust the packager and not the developer. This is how OS packages (rpm/deb) are handled, and companies are used to the workflow: get new packages from a somewhat trusted/signed source (e.g. Red Hat/Canonical), test them yourself, sign them with your own key as well, and then distribute them to the internal repositories, so that "end users" only trust your own internal key.

I don't know of any package manager that currently supports "revoking" a package signature, but it would be a cool feature.
GPG generally allows for "unpublishing" a key or your signature of a key by using revocation certs. I think you would have to use a different key (maybe a subkey) for every release and publish your trust in each one with your "publisher" key. When a new version is released, you would publish a revocation cert for the old key stating that you no longer trust it, and signature validation should fail. But as GPG is a big box of black magic, I'd need to test this properly. I might be wrong: validation may still succeed, and revocation may only prevent the key from being used for future signatures, which wouldn't help much.

Going the other way and using an X.509 CA with a new key/cert for every release, with CRLs / OCSP for revocation, should work just as well.

I think I'm going to start working on the Arcanist side of this soon...

Here's the high-level of what I'm planning:

  • Distribution would just be zip/tgz files for each package
  • Signatures would be separate objects, signing the zip file after-the-fact. So anyone can sign any package by downloading it and signing.
    • The public key won't necessarily be available to Phabricator (because it's kinda funny to have the public key and signature in the same place). We'll just register the fingerprints I guess?
    • Arc will have some mechanism to install a signature file from a side-channel
  • Packages will have a manifest file, with enough information to import them into a Phabricator install
  • Signature verification would (only) happen during the "install" phase.
  • "installing" a package is basically just extracting it to ~/.cache/arcanist/<publisher>-<package>-<version>/.
    • We'll select which installed package to load at run-time using relevant configuration.
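
A minimal sketch of the last two points, assuming the directory layout given above. The rule that a project-level pin (.arcconfig) wins over a global one (.arcrc) is an assumption for illustration, as are the function names and the dict-shaped configuration.

```python
from pathlib import Path


def install_dir(publisher, package, version, cache_root=None):
    """Build the <publisher>-<package>-<version> directory under the cache root."""
    root = Path(cache_root) if cache_root else Path.home() / ".cache" / "arcanist"
    return root / f"{publisher}-{package}-{version}"


def select_version(package, project_config, global_config):
    """Pick the version to load at run-time.

    Assumption: the project's pin takes precedence over the global pin.
    Both arguments are plain dicts of package name -> version.
    """
    for config in (project_config, global_config):
        if package in config:
            return config[package]
    return None


version = select_version("lint", {"lint": "2.0"}, {"lint": "1.0"})
print(install_dir("acme", "lint", version, cache_root="/tmp/arc-cache"))
```

One thing this layout leaves open: joining the three parts with "-" is ambiguous if a publisher or package name may itself contain a hyphen, so a real implementation would need either a restricted character set or a different separator.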

Update: I just found the old discussion, where we talked at length about using git for distribution and package URI rather than publisher.package naming.
Let's see if 2015 avivey can convince 2020 avivey...

Distribution would just be zip/tgz files for each package

I haven't thought about this in too much detail, but I suspect a package version should have multiple possible variants (e.g., a zip file, a git repository, a mercurial repository, a .tgz, etc). A "package format" is some collection of methods like "get the data for this package reference", "check this signature against the reference you downloaded", "convert the wire data into disk data [e.g., decompress it]", etc.
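
To make the "collection of methods" idea concrete, here is a hedged sketch of such an interface with one zip-backed variant. The class and method names are hypothetical, the "download" is an in-memory dict, and `check` only compares a SHA-256 digest; it stands in for real signature verification, which would also need a public key.

```python
import hashlib
import hmac
import io
import tempfile
import zipfile
from abc import ABC, abstractmethod


class PackageFormat(ABC):
    """Hypothetical interface; the method names are illustrative only."""

    @abstractmethod
    def fetch(self, reference):
        """Get the wire data for a package reference."""

    @abstractmethod
    def check(self, wire_data, expected_digest):
        """Check downloaded data against an expected digest."""

    @abstractmethod
    def unpack(self, wire_data, destination):
        """Convert wire data into files on disk."""


class ZipFormat(PackageFormat):
    def __init__(self, blobs):
        # In-memory stand-in for a remote package store.
        self._blobs = blobs

    def fetch(self, reference):
        return self._blobs[reference]

    def check(self, wire_data, expected_digest):
        actual = hashlib.sha256(wire_data).hexdigest()
        return hmac.compare_digest(actual, expected_digest)

    def unpack(self, wire_data, destination):
        with zipfile.ZipFile(io.BytesIO(wire_data)) as archive:
            archive.extractall(destination)
            return archive.namelist()


# Build a tiny zip in memory so the example needs no network or fixtures.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("manifest.json", "{}")
blob = buffer.getvalue()

fmt = ZipFormat({"acme/lint@1.0": blob})
wire = fmt.fetch("acme/lint@1.0")
digest_ok = fmt.check(wire, hashlib.sha256(blob).hexdigest())
with tempfile.TemporaryDirectory() as destination:
    unpacked = fmt.unpack(wire, destination)
print(digest_ok, unpacked)
```

A Git or Mercurial variant would implement the same three methods differently, for example by cloning in `fetch` and comparing a commit hash in `check`.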

Signatures would be separate objects, signing the zip file after-the-fact. So anyone can sign any package by downloading it and signing.

(This assumes that "hashing" and "signing" are distinct operations, e.g. you transform distribution data into a hash, then sign the hash.)

Yeah, the signature should be fully computable locally. Exactly what input data you're hashing would depend on the distribution format, but if it's a zip file you just sign the file content, presumably.

This specific signature is possibly dangerous: can two zip files with the same, say, SHA1 decompress to have different data?

In PDFs, the attack is:

  • Find two inputs with the same hash, X1 and X2.
  • Build two versions of the PDF. One looks like this:
good.pdf
if (X1 === X1) {
  print good/safe content
} else {
  print evil/bad content
}
evil.pdf
if (X1 === X2) {
  print good/safe content
} else {
  print evil/bad content
}

These files differ only in the X1/X2 bytes so they have the same hash (under some hashing algorithms) if X1 and X2 have the same hash, but they have different content. I think there's a specific example of this attack here:

I'm not sure if zip files are conceptually vulnerable to the same attack or not, but it seems like they might be: perhaps there is some length field which you can make valid in the "good.zip" and invalid in the "evil.zip".

Even if this is possible, finding collisions is still hard in SHA256 and there's probably no need to be more paranoid about how things are signed.

I suspect the best approach here in general is to say "a signature is a type, like 'sha256-of-raw-files-on-disk' or 'sha256-of-zip', plus a value" and "distribution objects have zero or more signatures". Then signers can sign the wire format (".zip"), or the entire directory of raw files on disk, or a Git or Mercurial hash, or all of them, using whatever hash algorithms they prefer, and clients can accept or reject signature types. If a SHA256 collision is discovered, clients can eventually be updated to reject SHA256 signatures, etc.
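
A minimal sketch of "a signature is a type plus a value", assuming the two type names used above. It models only the typed digest and the client's accept/reject policy; a real signature would additionally sign the digest with the publisher's private key. The hashing of raw files (sorted paths, length-prefixed contents) is one arbitrary choice made for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    type: str   # e.g. "sha256-of-zip"
    value: str  # hex digest of the input that the type names


def sha256_of_zip(zip_bytes, files):
    return hashlib.sha256(zip_bytes).hexdigest()


def sha256_of_raw_files(zip_bytes, files):
    # files: dict of relative path -> bytes. Paths are sorted and contents
    # are length-prefixed so the digest does not depend on ordering.
    digest = hashlib.sha256()
    for path in sorted(files):
        content = files[path]
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(len(content).to_bytes(8, "big"))
        digest.update(content)
    return digest.hexdigest()


HASHERS = {
    "sha256-of-zip": sha256_of_zip,
    "sha256-of-raw-files-on-disk": sha256_of_raw_files,
}


def verify(signatures, accepted_types, zip_bytes, files):
    """Accept only if every signature of an accepted type matches,
    and at least one such signature exists."""
    checked = 0
    for signature in signatures:
        hasher = HASHERS.get(signature.type)
        if hasher is None or signature.type not in accepted_types:
            continue  # unknown or rejected types are ignored, not trusted
        if hasher(zip_bytes, files) != signature.value:
            return False
        checked += 1
    return checked > 0
```

Dropping a type from `accepted_types` is how a client would stop honoring an algorithm after a collision is found: packages carrying only that type of signature then fail verification.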

The public key won't necessarily be available to Phabricator (because it's kinda funny to have the public key and signature in the same place). We'll just register the fingerprints I guess?

I think the public key has to be available -- you can't verify signatures otherwise.

Arc will have some mechanism to install a signature file from a side-channel

I'd expect this to all happen over HTTP with the package directory app, e.g. the only thing users install is a distribution channel.

There might be plumbing-level commands to trust a specific public key but I don't think users need to do this in general.

Signature verification would (only) happen during the "install" phase.

It might be nice-to-have to support explicit verification later, but, yeah, I wouldn't expect to verify-on-execution in the general case.

The public key won't necessarily be available to Phabricator (because it's kinda funny to have the public key and signature in the same place). We'll just register the fingerprints I guess?

I think the public key has to be available -- you can't verify signatures otherwise.

The scenario I'm worried about:

  • Users expect to download public key for verification from the repository
  • Repository packages.dino.com is compromised by an attacker
  • Attacker replaces the public key file for epriestley
  • Attacker publishes new version of, say, dinosaurs package and signs it with new fake key
  • New user is told "install dinosaurs, you can trust epriestley."
  • New user downloads new (fake) public key, latest (evil) package, and signature from packages.dino.com, and they all match.

To cut off this vector, I suggest making users conscious of the question "can I really trust this public key?" by requiring the key to be installed through some other channel.

New user is told "install dinosaurs, you can trust epriestley."

I imagine they aren't. They're told some version of this instead:

Trust newly discovered public key "ab:cd:..." which "Package Server" claims is owned by "Dinosaurs, Inc"?

  • No entities you trust have signed this key.

Really trust this key? [y/N]

The trust mechanism is a local web of trust. The server facilitates building that web, but the client does not trust the server.

In most cases, I imagine you will bootstrap trust for a small number of keys (e.g., "Phacility, Inc" and "Your Employer, Inc") through other channels and use their signatures to decide whether or not to trust other publishers and extensions.

This is basically a "CA" + Certificates system, similar to the SSL system.

In practice, I imagine this works out as a small set of extensions which Phacility would sign through the "Phacility Developer Program" (similar to Apple or Microsoft signing applications in their app stores), and a relatively straightforward way for "Your Employer, Inc" to distribute extensions and/or audit and then approve third-party extensions, and then a wild west of random people publishing left_pad.js.exe. But that seems sort of reasonable?

In the base case (there are no signatures you know), this degrades to "use some other channel to verify the key". But in the most common case this should be much better: whoever is deploying Phabricator at Acme, Inc can sign packages with the Acme, Inc key to approve them for employees and not have to worry about maintaining a fingerprint list somewhere or trying to convince users to actually verify that packages are on the list.
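
The decision described above can be sketched roughly as follows. This assumes endorsements have already been cryptographically verified and are presented as a dict of key fingerprint -> set of signer fingerprints; the fingerprints and function name are made up, and only direct (one-hop) endorsements are considered.

```python
def trust_decision(key, trusted_keys, endorsements):
    """Decide how to treat a newly discovered public key.

    trusted_keys: fingerprints the user bootstrapped through other channels.
    endorsements: key fingerprint -> set of fingerprints that signed it
                  (assumed to be verified signatures, not server claims).
    """
    if key in trusted_keys:
        return "trusted"
    signers = endorsements.get(key, set()) & trusted_keys
    if signers:
        return "trusted-via:" + ",".join(sorted(signers))
    # Nothing the user trusts vouches for this key: fall back to asking.
    return "prompt"


trusted = {"fp:acme", "fp:phacility"}
endorsements = {
    "fp:dinosaurs": {"fp:acme"},
    "fp:left-pad": {"fp:stranger"},
}
print(trust_decision("fp:dinosaurs", trusted, endorsements))
print(trust_decision("fp:left-pad", trusted, endorsements))
```

The important property is that the server only supplies candidate endorsements; the set of trusted keys lives on the client, so a compromised server can at worst cause a prompt.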