Page MenuHomePhabricator

When accepting a TOTP response, require it respond explicitly to a specific challenge

Authored by epriestley on Dec 17 2018, 5:19 PM.



Depends on D19890. Ref T13222. See PHI873. Currently, we only validate TOTP responses against the current (realtime) timestep. Instead, also validate them against a specific challenge.

This mostly just moves us toward more specifically preventing responses from being reused, and supporting flows which must look more like this (SMS/push).

One rough edge here is that during the T+3 and T+4 windows (you request a prompt, then wait 60-120 seconds to respond) only past responses actually work (the current code on your device won't). For example:

  • At T+0, you request MFA. We issue a T+0 challenge that accepts codes T-2, T-1, T+0, T+1, and T+2. The challenge locks out T+3 and T+4 to prevent the window from overlapping with the next challenge we may issue (see D19890).
  • If you wait 60 seconds until T+3 to actually submit a code, the realtime valid responses are T+1, T+2, T+3, T+4, T+5. The challenge valid responses are T-2, T-1, T+0, T+1, and T+2. Only T+1 and T+2 are in the intersection. Your device is showing T+3 if the clock is right, so if you type in what's shown on your device it won't be accepted.
  • This may get refined in future changes, but, in the worst case, it's probably fine if it doesn't. Beyond 120s you'll get a new challenge and a full [-2, ..., +2] window to respond, so this lockout is temporary even if you manage to hit it.
  • If this doesn't get refined, I'll change the UI to say "This factor recently issued a challenge which has expired, wait N seconds." to smooth this over a bit.
Test Plan
  • Went through MFA.
  • Added a new TOTP factor.
  • Hit some error cases on purpose.
  • Tried to use an old code a moment after it expired, got rejected.
  • Waited 60+ seconds, tried to use the current displayed factor, got rejected (this isn't great, but currently expected).

Diff Detail

rP Phabricator
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

  • Actually, just provide the clearer text ("recently issued a challenge which has expired") for now. We can refine this later if it makes sense, but this doesn't leave us with a rough edge hanging around.
  • Requested a challenge, waited 60-120 seconds, answered it, was told to wait because it had expired, waited, answered the next challenge.

Your device is showing T+3 if the clock is right, so if you type in what's shown on your device it won't be accepted.
If this doesn't get refined, I'll change the UI to say...

This feels pretty darn rough, and I'd naively expect to have seen similar warnings in the wild when using MFA on other sites if they had a similar issue. On the other hand, I don't think I've regularly used any devices with enough clock skew, or typed slowly enough, to result in a 60s delay when entering MFA responses. So is this presumably just a rough edge in the spec and everyone has this problem, or do other MFA implementations handle this differently? Or have we just kind of painted ourselves into a corner by requiring this "response must match challenge" check for both time-based tokens and push/SMS tokens, when other sites only require an exact challenge/response match in the latter case?


Do I remember correctly that you don't believe in using UNIQUE keys at the DB level? Even if you don't, it seems like it might be a better idea to have the challenge-writing INSERT happen inside a transaction that first looks for existing challenges matching the session/factor pair. Throwing at this point seems less helpful from a potential future debugging standpoint.


Maybe move this logic into getValidResponseTimestep?


What's the attack where you respond with a response that's valid for the challenge but invalid for the current timestamp? I read the correct response off your phone with my telescope, immediately kill your internet access, and then fix a sandwich before using your now-expired code?


This part of the TODO is still accurate, right?


$valid_timestep_and_response? Or just $valid? I had to go read getValidResponseTimestep before being sure that this wasn't accidentally skipping one of the two checks. Or maybe just rename the method to getTimestepAndResponseValid.

This revision is now accepted and ready to land.Dec 18 2018, 12:11 AM

do other MFA implementations handle this differently?

I'm not exactly sure, but I generally believe almost no one is defusing the general set of attacks on the window between the time the user unlocks their phone and submits the MFA form because the RFC doesn't say you have to, and pretty much all the attacks in this class are roughly nonsense over HTTPS anyway and boil down to social engineering, spyglasses, spilling coffee on people, sending them really cute kittens that they can't possibly ignore via IM at the right time, etc.


I like UNIQUE keys (and we use tons of them). I'm not so hot on FOREIGN keys.

I think we can't put either type of key on the database because SMS/push factors can freely have multiple challenges for the same session/workflow with overlapping validity windows. Only TOTP has this restriction. (A lot of this logic is living in the TOTP factor instead of the base class because it's specific to how TOTP works.)

We could do transactions here, but then the persistence code would have to live in the factor, and you're only ever racing against yourself in the same session, which seems like a hard race to hit.

Or we could maybe add an optional unique key that only gave us uniqueness, and not populate it for SMS/push, but populate it for TOTP. I'd sort of like to see a user hit this first, though.


It can't easily move because we array_intersect_key() the two windows. getValidResponseTimestep() could have a signature like array $list_of_timesteps_and_the_response_must_belong_to_all_of_them but that feels less intuitive?


Something like that, yeah. You somehow interrupt the user and have a relatively longer time to steal the code (up to twice as much time-ish?). This is most useful if their device is also somehow skewed way backward since it means the oldest code we'd accept becomes unacceptable after 30 seconds instead of after 150 seconds if I did the math right. But, really, all the attacks connected to all of this are pretty hard for me to take very seriously.


Yes, until the next change (D19894).


Maybe getTimestepAtWhichResponseIsValid()?

  • Hopefully, improve clarity.
This revision was automatically updated to reflect the committed changes.