Page MenuHomePhabricator

When accepting a TOTP response, require it respond explicitly to a specific challenge
ClosedPublic

Authored by epriestley on Dec 17 2018, 5:19 PM.

Details

Summary

Depends on D19890. Ref T13222. See PHI873. Currently, we only validate TOTP responses against the current (realtime) timestep. Instead, also validate them against a specific challenge.

This mostly just moves us toward more specifically preventing responses from being reused, and supporting flows which must look more like this (SMS/push).

One rough edge here is that during the T+3 and T+4 windows (you request a prompt, then wait 60-120 seconds to respond) only past responses actually work (the current code on your device won't). For example:

  • At T+0, you request MFA. We issue a T+0 challenge that accepts codes T-2, T-1, T+0, T+1, and T+2. The challenge locks out T+3 and T+4 to prevent the window from overlapping with the next challenge we may issue (see D19890).
  • If you wait 60 seconds until T+3 to actually submit a code, the realtime valid responses are T+1, T+2, T+3, T+4, T+5. The challenge valid responses are T-2, T-1, T+0, T+1, and T+2. Only T+1 and T+2 are in the intersection. Your device is showing T+3 if the clock is right, so if you type in what's shown on your device it won't be accepted.
  • This may get refined in future changes, but, in the worst case, it's probably fine if it doesn't. Beyond 120s you'll get a new challenge and a full [-2, ..., +2] window to respond, so this lockout is temporary even if you manage to hit it.
  • If this doesn't get refined, I'll change the UI to say "This factor recently issued a challenge which has expired, wait N seconds." to smooth this over a bit.
Test Plan
  • Went through MFA.
  • Added a new TOTP factor.
  • Hit some error cases on purpose.
  • Tried to use an old code a moment after it expired, got rejected.
  • Waited 60+ seconds, tried to use the current displayed factor, got rejected (this isn't great, but currently expected).

Diff Detail

Repository
rP Phabricator
Branch
mfa11
Lint
Lint OK
Unit
Unit Tests OK
Build Status
Buildable 21355
Build 29064: Run Core Tests
Build 29063: arc lint + arc unit

Event Timeline

epriestley created this revision.Dec 17 2018, 5:19 PM
epriestley requested review of this revision.Dec 17 2018, 5:21 PM
epriestley updated this revision to Diff 47501.Dec 17 2018, 5:26 PM
  • Actually, just provide the clearer text ("recently issued a challenge which has expired") for now. We can refine this later if it makes sense, but this doesn't leave us with a rough edge hanging around.
  • Requested a challenge, waited 60-120 seconds, answered it, was told to wait because it had expired, waited, answered the next challenge.
amckinley accepted this revision.Dec 18 2018, 12:11 AM

Your device is showing T+3 if the clock is right, so if you type in what's shown on your device it won't be accepted.
...
If this doesn't get refined, I'll change the UI to say...

This feels pretty darn rough, and I'd naively expect to have seen similar warnings in the wild when using MFA on other sites if they had a similar issue. On the other hand, I don't think I've regularly used any devices with enough clock skew, or typed slowly enough, to result in a 60s delay when entering MFA responses. So is this presumably just a rough edge in the spec and everyone has this problem, or do other MFA implementations handle this differently? Or have we just kind of painted ourselves into a corner by requiring this "response must match challenge" check for both time-based tokens and push/SMS tokens, when other sites only require an exact challenge/response match in the latter case?

src/applications/auth/factor/PhabricatorTOTPAuthFactor.php
303

Do I remember correctly that you don't believe in using UNIQUE keys at the DB level? Even if you don't, it seems like it might be a better idea to have the challenge-writing INSERT happen inside a transaction that first looks for existing challenges matching the session/factor pair. Throwing at this point seems less helpful from a potential future debugging standpoint.

313–317

Maybe move this logic into getValidResponseTimestep?

319–320

What's the attack where you respond with a response that's valid for the challenge but invalid for the current timestamp? I read the correct response off your phone with my telescope, immediately kill your internet access, and then fix a sandwich before using your now-expired code?

319–320

This part of the TODO is still accurate, right?

327

$valid_timestep_and_response? Or just $valid? I had to go read getValidResponseTimestep before being sure that this wasn't accidentally skipping one of the two checks. Or maybe just rename the method to getTimestepAndResponseValid.

This revision is now accepted and ready to land.Dec 18 2018, 12:11 AM

do other MFA implementations handle this differently?

I'm not exactly sure, but I generally believe almost no one is defusing the general set of attacks on the window between the time the user unlocks their phone and submits the MFA form because the RFC doesn't say you have to, and pretty much all the attacks in this class are roughly nonsense over HTTPS anyway and boil down to social engineering, spyglasses, spilling coffee on people, sending them really cute kittens that they can't possibly ignore via IM at the right time, etc.

src/applications/auth/factor/PhabricatorTOTPAuthFactor.php
303

I like UNIQUE keys (and we use tons of them). I'm not so hot on FOREIGN keys.

I think we can't put either type of key on the database because SMS/push factors can freely have multiple challenges for the same session/workflow with overlapping validity windows. Only TOTP has this restriction. (A lot of this logic is living in the TOTP factor instead of the base class because it's specific to how TOTP works.)

We could do transactions here, but then the persistence code would have to live in the factor, and you're only ever racing against yourself in the same session, which seems like a hard race to hit.

Or we could maybe add an optional unique key that only gave us uniqueness, and not populate it for SMS/push, but populate it for TOTP. I'd sort of like to see a user hit this first, though.

313–317

It can't easily move because we array_intersect_key() the two windows. getValidResponseTimestep() could have a signature like array $list_of_timesteps_and_the_response_must_belong_to_all_of_them but that feels less intuitive?

319–320

Something like that, yeah. You somehow interrupt the user and have a relatively longer time to steal the code (up to twice as much time-ish?). This is most useful if their device is also somehow skewed way backward since it means the oldest code we'd accept becomes unacceptable after 30 seconds instead of after 150 seconds if I did the math right. But, really, all the attacks connected to all of this are pretty hard for me to take very seriously.

319–320

Yes, until the next change (D19894).

327

Maybe getTimestepAtWhichResponseIsValid()?

epriestley updated this revision to Diff 47520.Dec 18 2018, 8:16 PM
  • Hopefully, improve clarity.
This revision was automatically updated to reflect the committed changes.