Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigation of the One-and-Done side-channel attack (USENIX Security'18) #6276

Closed
wants to merge 3 commits into from
Closed

Conversation

milosprv
Copy link
Contributor

The One&Done attack, which is described in a paper to appear in the USENIX Security'18 conference, uses EM emanations to recover the values of the bits that are obtained using BN_is_bit_set while constructing the value of the window in BN_mod_exp_consttime. The EM signal changes slightly depending on the value of the bit, and since the lookup of a bit is surrounded by highly regular execution (constant-time Montgomery multiplications) the attack is able to isolate the (very brief) part of the signal that changes depending on the bit. Although the change is slight, the attack recovers it successfully >90% of the time on several phones and IoT devices (all with ARM processors with clock rates around 1GHz), so after only one RSA decryption more than 90% of the bits in d_p and d_q are recovered correctly, which enables rapid recovery of the full RSA key using an algorithm (also described in the paper) that modifies the branch-and-prune approach for a situation in which the exponents' bits are recovered with errors, i.e. where we do not know a priori which bits are correctly recovered.

The mitigation for the attack is relatively simple - all the bits of the window are obtained at once, along with other bits so that an entire integer's worth of bits are obtained together using masking and shifts, without unnecessarily considering each bit in isolation. This improves performance somewhat (one call to bn_get_bits is faster than several calls to BN_is_bit_set), so the attacker now gets one signal snippet per window (rather than one per bit) in which the signal is affected by all bits in the integer (rather than just the one bit). This forces the attacker to try to match the signal snippet against billions of possibilities (rather than just two) using approximately the same amount of side-channel signal. This not only dramatically increases the computational requirements for the attack, it also makes the attack much less likely to succeed because the signal snippet when using one bit and when using several bits exhibits value-dependent changes of similar magnitude, so wile that amount of change may be sufficient to distinguish between two levels (for 0 and for 1), it is not sufficient to distinguish between many levels where each is very similar to (practically indistinguishable from) many others.

This code has been tested on ARM and x86 systems (with -no-asm so this code actually ends up being used), and in all tested systems it improves performance slightly. We also verified that it indeed prevents the One&Done-style attacks (the attack's recovery of the exponent's bits is now around 50%, i.e. no better than random guessing).

Checklist
  • documentation is added or updated
  • tests are added or updated

@openssl-machine openssl-machine added the hold: cla required The contributor needs to submit a license agreement label May 16, 2018
@richsalz
Copy link
Contributor

Closing/re-opening to kick the CLA bot.

@richsalz richsalz closed this May 17, 2018
@richsalz richsalz reopened this May 17, 2018
@openssl-machine openssl-machine removed the hold: cla required The contributor needs to submit a license agreement label May 17, 2018
@kaduk
Copy link
Contributor

kaduk commented May 17, 2018

Thanks for this work! Please take a look at https://www.openssl.org/policies/codingstyle.html (especially the "spaces around =, %, +, etc." part).

@@ -1041,30 +1039,45 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p,
}

bits--;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In such case code would much more readable if bits is not biased by 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to preserve as much of the original code as makes sense. But I agree that the original bits-- before the loop and the new bits+1 when calling bn_get_bits can be refactored out without losing anything. Will test and update the code accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

* (the index is masked now so that only the actual window
* bits are used, not the entire word returned by bn_get_bits,
* and this masking should not be done earlier because that
* would facilitate One&Done-like side-channel attacks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you actually confirm that you get readable signal if you mask earlier? Incredible as it is, but reported reading is about calls with relatively long periods in between, and I can imagine that signal is caused rather by masking the resulting bit at the end of BN_is_bit_set, or difference between values as the bit gets merged into wvalue. Rather than actual value of the wvalue in each iteration. BTW, did you try to move call to bn_get_bits next to MOD_EXP_CTIME_COPY_FROM_PREBUF?

For reference, why question? For example in x86_64 assembly code path masking is integral part of window fetch, and I wonder if it might give a reading. Well, not that EM emission is very big concern in most typical x86_64 deployments, but figuring it out won't hurt. After all there are x86_64 IoT devices... Besides, I don't see the reason for compiler to not move the masking operation, so that it's probably not given that masking is separated by loop...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tried masking just after bn_get_bits returns and we were not able to extract any bits from such a signal. However, the only attack mode we tried is "simple" EM attack (SEMA), i.e. we try to extract bits from a signal that corresponds to a single run of RSA signing/decryption (one "use" of the secret key), and more sophisticated attacks might be able to resolve smaller differences among signals. So our reasoning was that more possibilities for wval is a good thing, bn_get_bits already gives us a full integer worth of bits, so let's not reduce that to fewer bits until we actually need to.

That said, I can see how the compiler might hoist the wvalue&wmask computation up, so that wvalue gets masked anyway as soon as bn_get_bits returns. In the code I submitted I figured that not doing so on purpose might prevent future attacks, but if it does happen due to compiler optimization it's still safe against our attack so no harm done.

If the consensus is that the code would be more readable if the masking is done right after bn_get_bits returns, when it also makes sense to move the call to bn_get_bits so it happens just before the wvalue is actually needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So our reasoning was that more possibilities for wval is a good thing, bn_get_bits already gives us a full integer worth of bits, so let's not reduce that to fewer bits until we actually need to.

My line of thought is that if it's possible to correlate EM reading with actual window value, then it shouldn't matter when exactly do you mask, right after fetch from memory or right before passing it further. Though I can imagine that holding a few-bit value for long time might give some kind of recognizable bias. Is this it? In such case having bn_get_bits call as close to final masking as possible should work just as well. Because actual window value would be kept for shortest possible time. Good thing is that compiler won't be in position to move the call, unlike masking that is. And predictability is always better than uncertainty. [Well, unless you allow compiler to perform link-time optimizations, but we shouldn't recommend that.] One probably should keep in mind that window value is effectively kept for whole MOD_EXP_CTIME_COPY_FROM_PREBUF time. Not as direct value, but as collection of masks. And one can wonder if it would give more reliable reading if you slow it down by flushing cache [in creatively favourable manner]. In other words calling bn_get_bits and masking just before MOD_EXP_CTIME_COPY_FROM_PREBUF can't be that much worse than calling bn_get_bits early and masking late.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. The latest commit to this pull request moves the bn_get_bits to just before the wvalue is used.

…d move bn_get_bits to just before calling MOD_EXP_CTIME_COPY_FROM_PREBUF
bits--;
for (wvalue = 0, i = bits % window; i >= 0; i--, bits--)
wvalue = (wvalue << 1) + BN_is_bit_set(p, bits);
/* The exponent may not have a whole number of fixed-size windows.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Styling nit. Preferred style for multi-line comment is to open with /* on line of its own. This naturally applies to another block comment further down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

if (!MOD_EXP_CTIME_COPY_FROM_PREBUF(&tmp, top, powerbuf, wvalue,
window))
goto err;

wmask= (1 << window) - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Styling nit, missing space after mask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

@dot-asm dot-asm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of styling nits.

@dot-asm
Copy link
Contributor

dot-asm commented May 27, 2018

Ping @milosprv on couple of styling nits.

@dot-asm dot-asm added the approval: review pending This pull request needs review by a committer label May 30, 2018
@richsalz richsalz added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels May 30, 2018
levitte pushed a commit that referenced this pull request May 30, 2018
The One&Done attack, which is described in a paper to appear in the
USENIX Security'18 conference, uses EM emanations to recover the values
of the bits that are obtained using BN_is_bit_set while constructing
the value of the window in BN_mod_exp_consttime. The EM signal changes
slightly depending on the value of the bit, and since the lookup of a
bit is surrounded by highly regular execution (constant-time Montgomery
multiplications) the attack is able to isolate the (very brief) part of
the signal that changes depending on the bit. Although the change is
slight, the attack recovers it successfully >90% of the time on several
phones and IoT devices (all with ARM processors with clock rates around
1GHz), so after only one RSA decryption more than 90% of the bits in
d_p and d_q are recovered correctly, which enables rapid recovery of
the full RSA key using an algorithm (also described in the paper) that
modifies the branch-and-prune approach for a situation in which the
exponents' bits are recovered with errors, i.e. where we do not know
a priori which bits are correctly recovered.

The mitigation for the attack is relatively simple - all the bits of
the window are obtained at once, along with other bits so that an
entire integer's worth of bits are obtained together using masking and
shifts, without unnecessarily considering each bit in isolation. This
improves performance somewhat (one call to bn_get_bits is faster than
several calls to BN_is_bit_set), so the attacker now gets one signal
snippet per window (rather than one per bit) in which the signal is
affected by all bits in the integer (rather than just the one bit).

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from #6276)
@dot-asm
Copy link
Contributor

dot-asm commented May 30, 2018

Merged. Thank you! I took liberty to copy most of accompanying description to commit message.

BTW, there were couple of spaces at the end of lines. I'm going to remove them in a follow-up request that would harmonize changes with assembly-supported code paths.

@dot-asm dot-asm closed this May 30, 2018
@milosprv
Copy link
Contributor Author

milosprv commented May 30, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: done This pull request has the required number of approvals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants