HomePhabricator

Allow applications to require a High Security token without doing a session…

Description

Allow applications to require a High Security token without doing a session upgrade

Summary:
Ref T13222. See PHI873. Currently, when applications prompt users to enter MFA, their session upgrades as a side effect.

In some cases (like managing your email addresses) it makes sense to upgrade your session for a little while since it's common to make multiple edits in sequence (add a new address, make it primary, remove an old address). We generally want MFA to stay out of the way and not feel annoying.

In other cases, we don't expect multiple high-security actions in a row. Notably, PHI873 looks at more "one-shot" use cases where a prompt is answering a specific workflow. We already have at least one of these in the upstream: answering an MFA prompt when signing a Legalpad document.

Introduce a "token" workflow (in contrast to the existing "session") workflow that just does a one-shot prompt without upgrading your session statefully. Then, make Legalpad use this new workflow.

Note that this workflow has a significant problem: if the form submission is invalid for some other reason, we re-prompt you on resubmit. In Legalpad, this workflow looks like:

  • Forget to check the "I agree" checkbox.
  • Submit the form.
  • Get prompted for MFA.
  • Answer MFA prompt.
  • Get dumped back to the form with an error.
  • When you fix the error and submit again, you have to do another MFA check.

This isn't a fatal flaw in Legalpad, but would become a problem with wider adoption. I'll work on fixing this (so the MFA token sticks to the form) in the next set of changes.

Roughly, this is headed toward "MFA sticks to the form/workflow" instead of "MFA sticks to the user/session".

Test Plan:

  • Signed a legalpad document with MFA enabled.
  • Was prompted for MFA.
  • Session no longer upgraded (no purple "session in high security" badge).
  • Submitted form with error, answered MFA, fixed error, submitted form again.
    • Bad behavior: got re-prompted for MFA. In the future, MFA should stick to the form.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13222

Differential Revision: https://secure.phabricator.com/D19843

Event Timeline

This isn't a fatal flaw in Legalpad, but would become a problem with wider adoption. I'll work on fixing this (so the MFA token sticks to the form) in the next set of changes.

IMO -> this really should be fixed by client side validation prior to submission of the form. Depending on the scope of the problem requiring fixes, you may still need to ask for a new MFA token if the current one is outside of the allowable time window (usually +/- 1 min, but not sure what Phabricator has configured it too), so you it may just be adding complexity and a little bit of risk for very limited gain.

(If you missed it, see also some discussion in D19843.)

this really should be fixed by client side validation prior to submission of the form

It's impossible for us to perform client-side validation in the general case.

The most common reason we can't validate on the client is that the error is a UNIQUE KEY collision on insert: you try to create a project named "XYZ", but between the time you load the form and save the form, someone else creates a project named "XYZ" first. No matter when we test to see if a project named "XYZ" exists, it's possible for it to be created between the time we check on the client and the time we actually issue the INSERT. (And we have enough users that they do occasionally actually hit races when we aren't totally race-proof, e.g. see T13054, so I was only able to sweep this problem under the rug for a few years.)

Another case where we can't easily validate on the client is when the client is not a web browser. For example, users can interact with Phabricator via email or the CLI or the API. Although we don't MFA these interactions today, and will probably never MFA email, I anticipate possibly building CLI MFA eventually and possibly building some flavor of API MFA. (In theory, the CLI could validate to some degree, of course.)

In cases where we could validate on the client, I generally prefer avoiding client validation where there isn't a huge usability benefit. We need server-side validation no matter what (so the user can't just disable Javascript and submit invalid objects) so client validation needs to be in addition to server validation. At least in this codebase, this kind of supplemental client validation generally has a large maintenance cost. We can't cross-compile server-side code to the client so the client validation is inevitably a copy/pasted version of whatever's happening on the server, ported to Javascript. We don't have good test infrastructure for Javascript interactions. This would also make third-party extensions more difficult to write in the future, since authors would need to write both PHP and Javascript. Some kinds of checks that are relatively easy to perform on the server (e.g., checks that need to load related data) but very difficult to perform on the client (we'd potentially need to expose weird internals to the API, or have some weird AJAX endpoints purely for validation). Generally, there are a lot of reasons that client-side validation is burdensome to write and maintain. In most cases, I believe this cost isn't worthwhile: if you submit a form and are told "Device name must contain only lowercase letters" or whatever, that's usually about exactly as good as having the field highlighted in red after it loses focus for most users. In cases like Legalpad where the client control is literally just a checkbox, the client-side and server-side flavors of this validation behave identically to the user (control is highlighted after they press "submit"). (Of course, in some cases, client validation provides a dramatically better UX, and can justify the additional cost of the validation, but I currently believe these are quite rare.)

Perhaps more importantly, we don't need client validation at all. We just need to validate on the server before we prompt for MFA. Currently, the MFA prompt is very early in the workflow (since it was built as a separate component and the "edit" and "MFA" flows don't currently know about one another), but D19843 discusses moving it later in the flow, into the TransactionEditor. This is desirable, probably necessary for much of what we want to do anyway, and should give us the same behavior as client-side validation without writing any new code. The only times when you'll submit and hit a validity issue are when we couldn't have performed the validation on the client anyway: usually, UNIQUE KEY collisions at the last moment. And these validations will all apply to the CLI/API for free.

Depending on the scope of the problem requiring fixes, you may still need to ask for a new MFA token if the current one is outside of the allowable time window (usually +/- 1 min, but not sure what Phabricator has configured it too), so you it may just be adding complexity and a little bit of risk for very limited gain.

Partly, we need much of this complexity anyway for T9770 (which has some additional context but is resolving more like "bind challenges to sessions to defuse spyglass attacks against MFA").

Per D19843, the major case I want to avoid here isn't "enter another code", it's "sit here and wait for 60 seconds until your TOTP rotates and shows you another code". I don't think occasionally prompting the user with "enter another code" is too bad, but "sit and wait for 60 seconds" feels really really awful to me. The actual "wait" UX also would be quite bad naively: we can't easily give the user a countdown that turns into an input after 60 seconds without writing a bunch of new Javascript, so it would be a dialog with a button that says "(Try Again)" and clicking the button refreshes the countdown ("Wait 17 more seconds") or finally shows the control.

So I'm actually trying to avoid writing code or adding complexity around the "wait for a new token to be available" use case. We could reasonably just give up on it (users will almost never hit it once MFA executes later in the flow) but I think it's such an unwelcome and hostile UX that I want to avoid it wherever possible.

We're still going to end up with some version of this anyway (if you submit MFA form A, then try to submit MFA form B within ~60 seconds on a TOTP factor) but I'm not sure if/when that will arise, and maybe we can work around it on a workflow-by-workflow basis.

We could also just show the control and accept the same code again. I think this is sort of okay, especially for TOTP, but likely unintuitive for push factors like SMS (and push-to-app use cases may throw the token away, e.g. the user presses "authorize session X" and doesn't have an option to press the button again), and it won't fix the "users/researchers believe tokens are reusable in violation of the RFC" part of T9770. Part of the problem I want to solve in dealing with T9770 is that security researchers report this is an issue, even though no one can describe a practical attack.

Anecdotally, I also recently had to wait 7 minutes for an FedEx SMS factor to arrive on my phone.

Certainly, there are a lot of tradeoffs here and I don't see a single clearly correct way forward. I'm trying to balance flexibility (particularly, gracefully accommodating future factor types with different characteristics around factory delivery and reuse), a UX which "obviously" doesn't allow token reuse (so we don't have to deal with reports in the vein of T9770 anymore), not giving users a "wait X seconds/minutes" dialog, accommodating future cases where you may have multiple MFA factors in the same prompt (already possible today, just somewhat silly since only TOTP is supported), and actual security.

Of these, "actual security" is at the forefront and driving the design, but a naive implementation of "actual security" leaves users sitting at "wait a bunch of seconds for another factor" dialogs when I think no plausible attack is being mitigated, which I'd like to avoid.

The code to store challenges, mark them responded to, bind them to sessions/workflows, and test their expiration needs to exist either way: these components help defuse the "security theater" part of T9770, and defuse, as much as possible, the actual "attacks" in T9770 (effectively: look at the victim's phone using a spyglass, read the code off, then quickly pull their ethernet cable and submit a form yourself). The additional code to let responses remain valid for a short period of time if used by the same session on the same workflow should be small, but I believe it will have a large UX impact in cases where users would otherwise hit "wait for 60 seconds".

Outside of the "wait for 60 seconds" UX, I think the "multiple factors" UX (where you have several different MFA factors configured and must answer multiple prompts to pass the challenge) will also be significantly improved in the future. If you configure two MFA factors (say, a push code and a fingerprint), the groundwork here lets us re-prompt you for just the missing ones if you get some of them wrong. I'm not sure multiple MFA factors is ever terribly useful, but there's at least one good use case (changing/upgrading factors) and this is the kind of "0 or N" architecture decision which is usually easy to make upfront and very hard to fix later (see also T6703).

I missed the discussion in by @amckinley in D19843, which addressed my concern (same as you did above)... sorry for the duplicate question.

FYI -> I was never advocating for JS only validation... just for catching people being stupid. I agree that doing validation earlier in the chain would likely be the best compromise between usability, and development complexity. (Statement of the obvious: not a good choice in authentication flows, but just fine when we are talking about these use-cases – and I doubt you would ever allow it to be implement as such).

not a good choice in authentication flows, but just fine when we are talking about these use-cases – and I doubt you would ever allow it to be implement as such

Just as a note here, this is how the authentication flow currently works: you login first, your credentials are validated, and then you answer an MFA prompt to upgrade to a full session.

In most cases, users are not logging into Phabricator directly (they're using some kind of OAuth instead), so this is the only way authentication can reasonably work, since we can't couple the credential and MFA checks together because the credential check happens on a third-party server, and, in the general case, there's no way for OAuth flows to always succeed to the user and secretly tell the server whether the user actually entered the right password or not.

In cases where the credential check is first party (upstream: LDAP or Password), we could couple the checks, but currently do not. It's not necessarily obvious to me that coupling the checks is desirable in all cases or even by default. Decoupled checks allow attackers to try guessing passwords, but we have other mitigations against that (these are somewhat imperfect today, but I believe password authentication is also very rare in the wild). Coupled checks allow attackers to try guessing MFA tokens, and also may allow attackers to DOS MFA by invalidating every token. At least with SMS MFA, it seems fairly bad to send a challenge before credential validation, since it lets you text anyone an annoying MFA message and there's nothing they can do about it. Even if these challenges are good from a security perspective, I also suspect users would feel like the flow wasn't very robust if they get SMS/push auth when anyone types in their username.

We could perhaps let factors have a "pretend to challenge" mode where we secretly validate the credential and don't actually issue a real challenge if the credential does not validate. In the case of SMS, we'd prompt the user for a code but not send them a text. In the case of TOTP, we'd prompt the user for a code but not bind the challenge to their session. I think this sort of thing tends to be very leaky, but maybe it's practical here.

One consideration is that password installs are more likely to be public / open source, and it's fairly bad if a random user can send hundreds of text messages by "security researching" the login form with a scanner.

I'm beginning to consider biting the bullet on T6703 as part of this series of changes since much of the testing overlaps and the planned architecture for factor configuration is similar to the desired architecture for auth providers. Depending on how far this goes, I'll take a closer look at coupling the validations and figure out if we can reasonably issue "pretend" challenges so we can more practically couple these checks in the first-party cases without too much collateral damage. If we can't pretend to challenge without leaking that we're just pretending, my inclination is that coupling the checks may actually do more harm than good in the general case of SMS / push factors (major UX costs for an extremely small security benefit).