Mitigation of the One-and-Done side-channel attack (USENIX Security'18) #6276

milosprv · 2018-05-16T18:33:56Z

The One&Done attack, which is described in a paper to appear in the USENIX Security'18 conference, uses EM emanations to recover the values of the bits that are obtained using BN_is_bit_set while constructing the value of the window in BN_mod_exp_consttime. The EM signal changes slightly depending on the value of the bit, and since the lookup of a bit is surrounded by highly regular execution (constant-time Montgomery multiplications) the attack is able to isolate the (very brief) part of the signal that changes depending on the bit. Although the change is slight, the attack recovers it successfully >90% of the time on several phones and IoT devices (all with ARM processors with clock rates around 1GHz), so after only one RSA decryption more than 90% of the bits in d_p and d_q are recovered correctly, which enables rapid recovery of the full RSA key using an algorithm (also described in the paper) that modifies the branch-and-prune approach for a situation in which the exponents' bits are recovered with errors, i.e. where we do not know a priori which bits are correctly recovered.

The mitigation for the attack is relatively simple - all the bits of the window are obtained at once, along with other bits so that an entire integer's worth of bits are obtained together using masking and shifts, without unnecessarily considering each bit in isolation. This improves performance somewhat (one call to bn_get_bits is faster than several calls to BN_is_bit_set), so the attacker now gets one signal snippet per window (rather than one per bit) in which the signal is affected by all bits in the integer (rather than just the one bit). This forces the attacker to try to match the signal snippet against billions of possibilities (rather than just two) using approximately the same amount of side-channel signal. This not only dramatically increases the computational requirements for the attack, it also makes the attack much less likely to succeed because the signal snippet when using one bit and when using several bits exhibits value-dependent changes of similar magnitude, so wile that amount of change may be sufficient to distinguish between two levels (for 0 and for 1), it is not sufficient to distinguish between many levels where each is very similar to (practically indistinguishable from) many others.

This code has been tested on ARM and x86 systems (with -no-asm so this code actually ends up being used), and in all tested systems it improves performance slightly. We also verified that it indeed prevents the One&Done-style attacks (the attack's recovery of the exponent's bits is now around 50%, i.e. no better than random guessing).

Checklist

documentation is added or updated
tests are added or updated

richsalz · 2018-05-17T16:46:08Z

Closing/re-opening to kick the CLA bot.

kaduk · 2018-05-17T19:54:27Z

Thanks for this work! Please take a look at https://www.openssl.org/policies/codingstyle.html (especially the "spaces around =, %, +, etc." part).

dot-asm · 2018-05-17T21:30:24Z

crypto/bn/bn_exp.c

@@ -1041,30 +1039,45 @@ int BN_mod_exp_mont_consttime(BIGNUM *rr, const BIGNUM *a, const BIGNUM *p,
        }

        bits--;


In such case code would much more readable if bits is not biased by 1.

I tried to preserve as much of the original code as makes sense. But I agree that the original bits-- before the loop and the new bits+1 when calling bn_get_bits can be refactored out without losing anything. Will test and update the code accordingly.

dot-asm · 2018-05-17T22:07:37Z

crypto/bn/bn_exp.c

+             * (the index is masked now so that only the actual window
+             *  bits are used, not the entire word returned by bn_get_bits,
+             *  and this masking should not be done earlier because that
+             *  would facilitate One&Done-like side-channel attacks)


Can you actually confirm that you get readable signal if you mask earlier? Incredible as it is, but reported reading is about calls with relatively long periods in between, and I can imagine that signal is caused rather by masking the resulting bit at the end of BN_is_bit_set, or difference between values as the bit gets merged into wvalue. Rather than actual value of the wvalue in each iteration. BTW, did you try to move call to bn_get_bits next to MOD_EXP_CTIME_COPY_FROM_PREBUF?

For reference, why question? For example in x86_64 assembly code path masking is integral part of window fetch, and I wonder if it might give a reading. Well, not that EM emission is very big concern in most typical x86_64 deployments, but figuring it out won't hurt. After all there are x86_64 IoT devices... Besides, I don't see the reason for compiler to not move the masking operation, so that it's probably not given that masking is separated by loop...

We tried masking just after bn_get_bits returns and we were not able to extract any bits from such a signal. However, the only attack mode we tried is "simple" EM attack (SEMA), i.e. we try to extract bits from a signal that corresponds to a single run of RSA signing/decryption (one "use" of the secret key), and more sophisticated attacks might be able to resolve smaller differences among signals. So our reasoning was that more possibilities for wval is a good thing, bn_get_bits already gives us a full integer worth of bits, so let's not reduce that to fewer bits until we actually need to.

That said, I can see how the compiler might hoist the wvalue&wmask computation up, so that wvalue gets masked anyway as soon as bn_get_bits returns. In the code I submitted I figured that not doing so on purpose might prevent future attacks, but if it does happen due to compiler optimization it's still safe against our attack so no harm done.

If the consensus is that the code would be more readable if the masking is done right after bn_get_bits returns, when it also makes sense to move the call to bn_get_bits so it happens just before the wvalue is actually needed.

So our reasoning was that more possibilities for wval is a good thing, bn_get_bits already gives us a full integer worth of bits, so let's not reduce that to fewer bits until we actually need to.

My line of thought is that if it's possible to correlate EM reading with actual window value, then it shouldn't matter when exactly do you mask, right after fetch from memory or right before passing it further. Though I can imagine that holding a few-bit value for long time might give some kind of recognizable bias. Is this it? In such case having bn_get_bits call as close to final masking as possible should work just as well. Because actual window value would be kept for shortest possible time. Good thing is that compiler won't be in position to move the call, unlike masking that is. And predictability is always better than uncertainty. [Well, unless you allow compiler to perform link-time optimizations, but we shouldn't recommend that.] One probably should keep in mind that window value is effectively kept for whole MOD_EXP_CTIME_COPY_FROM_PREBUF time. Not as direct value, but as collection of masks. And one can wonder if it would give more reliable reading if you slow it down by flushing cache [in creatively favourable manner]. In other words calling bn_get_bits and masking just before MOD_EXP_CTIME_COPY_FROM_PREBUF can't be that much worse than calling bn_get_bits early and masking late.

Agreed. The latest commit to this pull request moves the bn_get_bits to just before the wvalue is used.

…d move bn_get_bits to just before calling MOD_EXP_CTIME_COPY_FROM_PREBUF

dot-asm · 2018-05-21T18:26:42Z

crypto/bn/bn_exp.c

-        bits--;
-        for (wvalue = 0, i = bits % window; i >= 0; i--, bits--)
-            wvalue = (wvalue << 1) + BN_is_bit_set(p, bits);
+        /* The exponent may not have a whole number of fixed-size windows.


Styling nit. Preferred style for multi-line comment is to open with /* on line of its own. This naturally applies to another block comment further down.

dot-asm · 2018-05-21T18:27:09Z

crypto/bn/bn_exp.c

        if (!MOD_EXP_CTIME_COPY_FROM_PREBUF(&tmp, top, powerbuf, wvalue,
                                            window))
            goto err;

+        wmask= (1 << window) - 1;


Styling nit, missing space after mask.

dot-asm

Couple of styling nits.

dot-asm · 2018-05-27T09:02:11Z

Ping @milosprv on couple of styling nits.

The One&Done attack, which is described in a paper to appear in the USENIX Security'18 conference, uses EM emanations to recover the values of the bits that are obtained using BN_is_bit_set while constructing the value of the window in BN_mod_exp_consttime. The EM signal changes slightly depending on the value of the bit, and since the lookup of a bit is surrounded by highly regular execution (constant-time Montgomery multiplications) the attack is able to isolate the (very brief) part of the signal that changes depending on the bit. Although the change is slight, the attack recovers it successfully >90% of the time on several phones and IoT devices (all with ARM processors with clock rates around 1GHz), so after only one RSA decryption more than 90% of the bits in d_p and d_q are recovered correctly, which enables rapid recovery of the full RSA key using an algorithm (also described in the paper) that modifies the branch-and-prune approach for a situation in which the exponents' bits are recovered with errors, i.e. where we do not know a priori which bits are correctly recovered. The mitigation for the attack is relatively simple - all the bits of the window are obtained at once, along with other bits so that an entire integer's worth of bits are obtained together using masking and shifts, without unnecessarily considering each bit in isolation. This improves performance somewhat (one call to bn_get_bits is faster than several calls to BN_is_bit_set), so the attacker now gets one signal snippet per window (rather than one per bit) in which the signal is affected by all bits in the integer (rather than just the one bit). Reviewed-by: Andy Polyakov <appro@openssl.org> Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from #6276)

dot-asm · 2018-05-30T21:12:51Z

Merged. Thank you! I took liberty to copy most of accompanying description to commit message.

BTW, there were couple of spaces at the end of lines. I'm going to remove them in a follow-up request that would harmonize changes with assembly-supported code paths.

milosprv · 2018-05-30T21:23:27Z

Thank you!

…

On Wed, May 30, 2018 at 5:13 PM Andy Polyakov ***@***.***> wrote: Merged. Thank you! I took liberty to copy most of accompanying description to commit message. BTW, there were couple of spaces at the end of lines. I'm going to remove them in a follow-up request that would harmonize changes with assembly-supported code paths. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#6276 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGAJhX-uvuHfZOmLjhKy_WBUaBcFPNq2ks5t3wt8gaJpZM4UB0l7> .

Mitigation of the One-and-Done side-channel attack (USENIX Security'18)

28a35d0

openssl-machine added the hold: cla required The contributor needs to submit a license agreement label May 16, 2018

richsalz closed this May 17, 2018

richsalz reopened this May 17, 2018

openssl-machine removed the hold: cla required The contributor needs to submit a license agreement label May 17, 2018

dot-asm reviewed May 17, 2018

View reviewed changes

Edits to conform to style, remove biasing-by-one of bits variable, an…

c4157f7

…d move bn_get_bits to just before calling MOD_EXP_CTIME_COPY_FROM_PREBUF

dot-asm reviewed May 21, 2018

View reviewed changes

Edits to address styling nits

f2e8f48

dot-asm approved these changes May 30, 2018

View reviewed changes

dot-asm added the approval: review pending This pull request needs review by a committer label May 30, 2018

richsalz approved these changes May 30, 2018

View reviewed changes

richsalz added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels May 30, 2018

dot-asm closed this May 30, 2018

real-or-random mentioned this pull request Apr 5, 2024

Signed-digit multi-comb ecmult_gen algorithm bitcoin-core/secp256k1#1058

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mitigation of the One-and-Done side-channel attack (USENIX Security'18) #6276

Mitigation of the One-and-Done side-channel attack (USENIX Security'18) #6276

milosprv commented May 16, 2018

richsalz commented May 17, 2018

kaduk commented May 17, 2018

dot-asm May 17, 2018

milosprv May 21, 2018

milosprv May 21, 2018

dot-asm May 17, 2018

milosprv May 21, 2018

dot-asm May 21, 2018

milosprv May 21, 2018

dot-asm May 21, 2018

milosprv May 29, 2018

dot-asm May 21, 2018

milosprv May 29, 2018

dot-asm left a comment

dot-asm commented May 27, 2018

dot-asm commented May 30, 2018

milosprv commented May 30, 2018 via email •

edited

		@@ -1041,30 +1039,45 @@ int BN_mod_exp_mont_consttime(BIGNUM rr, const BIGNUM a, const BIGNUM *p,
		}

		bits--;

Mitigation of the One-and-Done side-channel attack (USENIX Security'18) #6276

Mitigation of the One-and-Done side-channel attack (USENIX Security'18) #6276

Conversation

milosprv commented May 16, 2018

Checklist

richsalz commented May 17, 2018

kaduk commented May 17, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dot-asm left a comment

Choose a reason for hiding this comment

dot-asm commented May 27, 2018

dot-asm commented May 30, 2018

milosprv commented May 30, 2018 via email • edited

milosprv commented May 30, 2018 via email •

edited