Randomised Algorithms for Reducing Side Channel Leakage


Randomised Algorithms for Reducing Side Channel Leakage Colin D. Walter Abstract: Exponentiation is the algorithm which will be put under the spotlight here. Blinding of arguments provides only limited protection against side channel leakage. Such solutions will lead into the main topic of the talk, namely randomisation of the operations performed by an algorithm. Applications to RSA and ECC will be treated and will include addition/subtraction chains. www.comodo.com (Bradford, UK) colin.walter@comodo.com

Overview
Randomisation as a possible solution to DPA: Liardet/Smart; Oswald/Aigner; Ha/Moon; equalising ECC add and double code. Another side channel attack on exponentiation which defeats standard counter-measures: Big Mac. MIST Division Chains; Overlapping Windows (Itoh et al.).
We concentrate on five different randomised algorithms: three by Liardet/Smart, Oswald/Aigner and Ha/Moon respectively. These have limited applicability: other counter-measures, particularly exponent blinding, are still required. There are theoretical grounds for believing that single m-ary exponentiations can be cracked (the Big Mac attack). We finish with two stronger algorithms which seem to fit the bill so far. There are no known attacks on them yet (although I have a paper in preparation which includes a remark indicating that Itoh’s is weak). Marc Joye gave alternative algorithms in his (second) talk: the Montgomery Powering Ladder and Common SPA Atomicity methods. The former is a square-and-multiply-always method, and so is relatively slow (except for a co-ordinate trick when used in ECC). The latter looks fine but is very difficult to implement: the key is read bit by bit. In hardware this may mean that sometimes it is rotated by one bit and sometimes not at all, and DPA would distinguish these. Moreover, it is still subject to the Big Mac attack, which works with leaked Hamming weights.
Colin D. Walter, Comodo Research Lab, Bradford, UK Next Generation Digital Security Solutions Randomised Algorithms

Randomisation of Inputs
Using a blinded exponent $D + r\varphi(N)$ with random $r$ is one solution. For ECC it is expensive: $r$ needs perhaps 32 bits, which adds 32 extra bits to a typical key length of 160 or 192 bits, i.e. around 20%. Message blinding prevents known ciphertext attacks: $C$ is replaced by $r^E C$ before decryption, and the result $r^{DE} C^D = rM$ is multiplied by $r^{-1}$ to recover $M$. ($r^E$ and $r^{-1}$ are stored, and their squares used the next time.) Randomised point representations in ECC may also help: for random $r$ in the field the projective representation $(x, y, z)$ can be replaced with $(rx, ry, rz)$, which represents the same point.
Randomisation is a key concept for reducing side channel leakage. It helps prevent an attacker from knowing exactly what calculations are being performed on the input data and thereby deducing key bits so that his simulation matches the calculation being performed by the system under attack. There are two aspects to randomisation: inputs to the algorithms can be modified randomly, and the algorithms themselves can behave randomly (but correctly, of course) in their method of computation. Clearly, a key to this is having a good, working random number generator (RNG); see Werner Schindler’s talks. The RNG is essential not only for key generation but also for performing the usual message whitening and key blinding, and, in this case, the cryptographic operation of exponentiation. The main techniques for randomising the inputs are given on the slide. Blinding the exponent: the result of using exponent $D + r\varphi(N)$ is the same as if $D$ were used. Blinding of the message is done multiplicatively by a random number. Randomisation of the coordinates in ECC also helps.
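The two RSA blinding steps on the slide can be sketched as follows. This is a minimal illustration under stated assumptions, not a hardened implementation: the function name `blinded_rsa_decrypt`, the parameter `r_bits`, and the choice to draw a fresh message-blinding value on every call (rather than squaring a stored pair as the slide describes) are all choices made for this sketch, and Python integer arithmetic is not itself side-channel safe.

```python
import math
import secrets


def blinded_rsa_decrypt(c, d, e, p, q, r_bits=32):
    """Toy RSA decryption with exponent and message blinding.

    Hypothetical helper for illustration; p and q are the secret primes.
    """
    n = p * q
    phi = (p - 1) * (q - 1)

    # Exponent blinding: D + r*phi(N) gives the same result as D.
    d_blind = d + secrets.randbits(r_bits) * phi

    # Message blinding: choose a random r that is invertible mod N.
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            break

    c_blind = (pow(r, e, n) * c) % n       # r^E * C
    m_blind = pow(c_blind, d_blind, n)     # r^(DE) * C^D = r * M
    return (m_blind * pow(r, -1, n)) % n   # multiply by r^(-1) to recover M
```

With the textbook toy key p = 61, q = 53, E = 17, D = 2753, decrypting `pow(65, 17, 3233)` returns 65 whatever random values are drawn. The modular inverse via `pow(r, -1, n)` needs Python 3.8 or later.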

Randomisation of Algorithms
Randomisation of the algorithms themselves (as opposed to their inputs), particularly exponentiation, is used to prevent the averaging normally employed to perform SPA/DPA. A different (random) sequence of operations is designed to ensure the average has no key-dependent bias: squares and multiplies (adds and doubles in ECC) occur in different orders every time even for the same inputs. Randomised algorithms may help solve the problems of DPA, perhaps saving the cost of key blinding (which also randomly changes the sequence of squares and multiplies). Need & Seed for Random Number Generators (RNGs).
Randomisation of algorithms is in its infancy. There are few good, trustworthy such algorithms yet for the component operations of RSA or ECC. Exponentiation (or point multiplication) is the area where most development has taken place. The first property an exponentiation algorithm should have is a sequence of square and multiply operations which does not admit a strong connection with the secret key, i.e. there should be no function which maps the sequence of S & M operations into a computationally feasible search space of possible secret keys. Randomisation of algorithms may solve our problems without resorting to blinding of input arguments as well, but this is probably not true for any algorithms currently in circulation. These algorithms all require random number generators. Although NIST has specifications which define excellent RNGs in terms of one-way functions such as hashing functions or encryption functions, there is a real problem in starting such RNGs with a truly random “seed”.

The Liardet-Smart Exponentiation Algorithm
Decryption/Signing in ECC: $Q = kP$ for point $P$ and secret key $k$. $k$ is represented with randomly varied 2-power bases $m_i$ and corresponding digits $k_i$:
$k = ((\cdots((k_n) m_{n-1} + k_{n-1}) m_{n-2} + \cdots) m_1 + k_1) m_0 + k_0$
Digits are generated in the order $k_0, k_1, \ldots$ $kP$ is computed by processing digits from most to least significant:
$kP = m_0(m_1(\cdots m_{n-2}(m_{n-1}(k_n P) + k_{n-1} P) + \cdots) + k_1 P) + k_0 P$
Digits are used in the order $k_n, k_{n-1}, \ldots$
We will now look at two algorithms which implement randomisation. Undoubtedly they have some strength in particular contexts. However, we will see that the context needs to be clearly specified. There are situations in which they give no protection at all. This slide shows the notation for both. The Liardet-Smart algorithm simply produces an alternative representation of the key to the usual binary or m-ary (i.e. base m) representation. It is generated randomly according to the algorithm on the next slide. This gives many different representations for the secret key $k$. Each leads to a different evaluation scheme, but the same result. Its use might avoid the need to randomise one of the inputs (the key or the plaintext), which reduces the cost of generating random numbers.

Liardet-Smart Recoding (CHES 2001)
  While k > 0 do {
    If (k mod 2) = 0 then { m_i ← 2 ; k_i ← 0 ; }
    else { Choose base m_i ∈ {2^1, 2^2, ..., 2^R} randomly ; k_i ← k minmod m_i ; }   // minmod returns the least abs value residue.
    k ← (k − k_i) / m_i ; i ← i+1 ;
  } ;
The slide shows the precise choice for exponent digits. The random choice of base can be biased from a uniform distribution to change the efficiency and security properties. Here minmod returns the least absolute value residue rather than the usual least non-negative residue given by mod. This halves the space required for holding pre-computed multiples of $P$: $k_i$ is between $-m/2$ and $+m/2$ for base choice $m$ and so between $-2^{R-1}$ and $+2^{R-1}$. This is a standard trick in ECC for reducing the space occupied by the pre-computed table of plaintext powers, and is possible because point subtraction is easy. It doesn’t work in RSA because computing the inverses is expensive.
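The recoding and the most-to-least-significant evaluation can be tried out with the following sketch. It rests on some assumptions made only for illustration: plain integers stand in for elliptic curve points (so `m * q` plays the role of repeated doubling and `d * p` the role of a table lookup with an add or subtract), the base is drawn uniformly, and the names `minmod`, `liardet_smart_recode` and `evaluate` are not taken from the paper.

```python
import random


def minmod(k, m):
    """Least absolute value residue of k modulo m."""
    d = k % m
    return d - m if d > m // 2 else d


def liardet_smart_recode(k, R, rng):
    """Return (m_i, k_i) pairs, least significant digit first."""
    pairs = []
    while k > 0:
        if k % 2 == 0:
            m, d = 2, 0
        else:
            m = 1 << rng.randint(1, R)   # random base in {2^1, ..., 2^R}
            d = minmod(k, m)
        pairs.append((m, d))
        k = (k - d) // m
    return pairs


def evaluate(pairs, p):
    """Compute kP from most to least significant digit.

    Integers stand in for curve points (an assumption of this sketch).
    """
    q = 0
    for m, d in reversed(pairs):
        q = m * q + d * p
    return q
```

Running `liardet_smart_recode` several times on the same key with different random states gives different digit sequences, yet `evaluate` always returns the same multiple.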

Efficiency
The odd point multiples $k_i P$ ($1 \le k_i \le 2^{R-1}$) are pre-computed. For uniformly random choice of $m_i$, there is 1 double and 1/(R+1) adds per key bit. For R = 1 this is the usual “square-and-multiply” algorithm (“double-and-add”) with half an add per bit on average. If $m_i = 2^R$ always for odd digits $k_i$ and $m_i = 2$, $k_i = 0$ for even digits $k_i$, this becomes sliding window exponentiation. This is a derivative of m-ary exponentiation ($m = 2^R$) which halves the space necessary for pre-computed values (only odd multiples need be stored).
This slide is self-explanatory. The efficiency figures are easily calculated. Note that the bigger R is, the more efficient the scheme is timewise (but not spacewise). This is the only mention I will make of sliding windows as an exponentiation method. Usually the digits are the least non-negative odd residues on division by $m = 2^R$, except that if the remainder on division by 2 is 0, then the next base to choose is $m_i = 2$ and the digit is 0. The main advantage of sliding windows is that it halves the number of pre-computed multiples that have to be stored: only odd multiples need be stored. Overall, therefore, Liardet-Smart enables one to get away with storing only ¼ of the pre-computed multiples that one might expect, namely the odd multiples from 1 up to half of $2^R$. The negative multiples are not needed: a point subtraction is performed for them instead. In a good implementation the cost of point additions and point subtractions is the same, and they should be indistinguishable from the DPA viewpoint. Point blinding and exponent blinding might not be necessary when this method is used, so there would be savings from that.

Power Analysis Attacks
First, note that the negative digits mean an inverse must be computed. This is free for ECC (subtraction and addition of points cost the same) but requires work in RSA. So this algorithm is really only suitable for ECC. With standard projective or affine representation for points, adds and doubles in ECC are easily distinguished using timing differences in a single power trace because the code for them is different. Adds may represent any of many digits, so reconstructing the exponent from one trace creates an infeasibly large search space. Averaging does not produce useful information.
At first sight, this seems to fix our DPA worries without extra work. The re-coding is cheap and it doesn’t seem to require expensively longer code for doubles (so that they look like adds), nor the “add-always” fix, nor exponent blinding.

Blinding etc. is necessary
Unfortunately, if i) the same key is re-used, and ii) blinding of the key is omitted, and iii) doubles & adds can be distinguished (on a single exponentiation), then their patterns over several exponentiations do reveal the secret key. So either: i) blinding should be used, or ii) code used for adds and doubles must appear identical to SPA/DPA/SEMA/DEMA, or iii) the add should always be done, and the result chosen appropriately.
Under the given assumptions, it is possible to attack the implementation. Adds and doubles might be distinguished from side channel leakage because of the different number of field operations in optimised code. Suppose the patterns of adds and doubles are written as sequences of As and Ds. There is one D for each bit of the exponent. Lemma. If exactly $i$ sequential instances of D occur at some point, and the corresponding prefix $k'$ of $k$ satisfies $k' \equiv 1 \bmod 2^i$, then $k' \equiv 2^i + 1 \bmod 2^{i+1}$. This enables an attacker to find the maximal $i$ for which $k' \equiv 2^i + 1 \bmod 2^{i+1}$, from which he can deduce that the next $i+2$ bits of $k$ are 10…01. Repeating this at each point gives him all the bits of $k$. There is a probability of error if there are few traces or the measurements are inaccurate, but it can be computationally feasible to correct these. From the comments on the slide, one deduces that the Liardet-Smart algorithm does not enable one to avoid other tamper-resistant measures; some other protection is still necessary. Key blinding, with its cost, is probably still necessary, unless the algorithm is used in a context where the key is different each time. Reference: CDW, “Breaking the Liardet-Smart Randomized Exponentiation Algorithm”, Proc. CARDIS 2002.

Balanced Code
Code for which adds and doubles are indistinguishable appeared at CHES 2001 and CHES 2002. (See Nigel Smart’s talks for details.) Performing the add every time and selecting the required result can be done as follows (the required value is in R[0]):
  R[0] ← 2R[0] ;
  If k_i = 0 then x ← 1 else x ← 0 ;
  R[x] ← R[0] + P ;
While this minimises the power differences (if compiled suitably), it may be susceptible to EMA because of the different locations which emit EMR when assigning to R[x]. Use random register re-location for this.
In their talks, Nigel Smart and Marc Joye will have covered methods for writing code that minimises the differences between adds and doubles. Papers in the CHES conference proceedings also provide a number of ways for doing this. The slide above shows one of the possible alternatives to key blinding: a way to implement the “add-always” (= “square-and-always-multiply”) counter-measure. There are alternative formulations, such as the Montgomery powering ladder. The (left-to-right) method here simply writes the values used by the normal square-and-multiply algorithm into register R[0] and writes those unnecessary extra multiplications that are to be discarded into register R[1]. So R[0] eventually contains the desired result and R[1] contains junk. Actually, the code for the algorithm should be compiled with addresses for R[0] and R[1] which cannot be distinguished by looking at their Hamming weights as they are sent along the bus. Alternatively, random register renaming could be employed so that the addresses are unpredictable. Whatever the solution, there is the problem that fixed addresses might be distinguished by an attacker, enabling him to tell which argument is selected at each step of an exponentiation algorithm.
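The add-always loop can be written out as a short sketch. Integers stand in for curve points here, and the function name `add_always_mult` is chosen for illustration only. Python gives no control over timing, power or register addresses, so this shows only the data flow, not a protected implementation.

```python
def add_always_mult(key_bits, p):
    """Left-to-right double-and-add-always.

    key_bits: bits of k, most significant first.
    Integers stand in for points (an assumption of this sketch).
    """
    regs = [0, 0]                      # regs[0]: result, regs[1]: junk
    for bit in key_bits:
        regs[0] = 2 * regs[0]          # the double is always performed
        x = 1 if bit == 0 else 0       # select the destination register
        regs[x] = regs[0] + p          # the add is always performed
    return regs[0]
```

For the key 45, whose bits are 1, 0, 1, 1, 0, 1, the call `add_always_mult([1, 0, 1, 1, 0, 1], 3)` returns 135. Exactly one double and one add are executed per key bit, whatever the bit values.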

Oswald-Aigner (CHES 2001)
Finite Automaton: P and Q are points on the elliptic curve: P initialised to the point at infinity, Q initialised to the input point.
This is another randomised exponentiation algorithm aimed at ECC applications rather than RSA because of the need to compute inverses. Again, there is a random re-coding of the exponent from binary. The new digits are +1 or –1. (Strictly speaking, one might also consider 2 to be one of the choices, but this is just a technicality.) Essentially any string of consecutive 1s may be replaced by an initial 1, a string of 0s and a final –1, e.g. $111_2 = 7 = 8 - 1 = 100(-1)_2$. The new representation has the same value as before, so the result of the exponentiation is the same. The key is processed from least to most significant bit. States labelled 0 and 3 have processed a key suffix with most significant bit 0, those labelled 1 and 2 have processed a suffix with leading bit 1. At nodes 0, 1 and 3 the output key so far equals the input key suffix exactly. However, in state 2, there is a difference of 1 between the suffix processed so far and the re-coding which has been generated: outputting a 1 for the next bit without reading more input bits will make the input suffix equal the output key so far. This is because an output digit –1 was chosen to replace an input bit +1 (transition from state 1 to state 2 or from 3 to 2). Then a carry is owed to output. The balance is redressed if a 1 is output when a 0 is input (transition from 2 to 3), or a 2 when a 1 is input (transition from 2 to 1), but the difference continues if a 0 is output when a 1 is input (transition 2 to 2). rb is a random bit used to select a dotted transition for each key bit.

Oswald-Aigner (CHES 2001)
Key $k$ is represented with randomly varied digits $k_i \in \{-1, 0, 1, 2\}$:
$k = 2^n k_n + 2^{n-1} k_{n-1} + \cdots + 2 k_1 + k_0$
Digits are chosen in the order $k_0, k_1, \ldots$ and processed as they are generated, using the formulae on the FA transitions. After processing $i$ bits, Q contains the value $2^i P_0$ where $P_0$ is the initial input point. This is required for adding $2^i k_i P_0$: it is added to P if $k_i = 1$, doubled again first then added if $k_i = 2$, and subtracted if $k_i = -1$. The variable P in the FA contains the multiple of the initial point $P_0$ formed so far, i.e. $(2^i k_i + \cdots + 2 k_1 + k_0) P_0$, and so eventually contains the desired output $k P_0$.
Once the representation of $k$ has been randomly changed, the point multiple is created as described on the slide. In fact, the multiple can be created immediately as the new digits become available. The FA (finite automaton) diagram uses Oswald’s own notation. I hope the use of P here for the accumulating result is not confusing. We’ve usually followed standard notation of P for the initial point, but here it is renamed to $P_0$ to distinguish it.
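The flavour of this right-to-left randomised recoding can be illustrated with a simplified sketch. It is not the Oswald-Aigner automaton itself: it uses only the digits –1, 0 and 1 with an explicit carry, makes a fair coin toss wherever a choice exists, and uses integers in place of curve points. The names `random_signed_recode` and `scalar_mult_rtl` are hypothetical.

```python
import random


def random_signed_recode(k, rng):
    """Random digits in {-1, 0, 1}, least significant first, with value k.

    Simplified stand-in for the finite automaton, not a transcription of it.
    """
    digits = []
    carry = 0
    while k > 0 or carry:
        v = (k & 1) + carry            # value owed at this position: 0, 1 or 2
        k >>= 1
        if v == 0:
            digit, carry = 0, 0
        elif v == 2:
            digit, carry = 0, 1
        elif k > 0 and rng.getrandbits(1):
            digit, carry = -1, 1       # output -1 and owe a carry
        else:
            digit, carry = 1, 0        # forced once no key bits remain
        digits.append(digit)
    return digits


def scalar_mult_rtl(digits, p0):
    """Process digits as generated: acc plays P, q plays Q = 2^i * P0."""
    acc, q = 0, p0
    for d in digits:
        acc += d * q                   # add, subtract or skip
        q += q                         # double
    return acc
```

Different random states give different digit strings for the same key, but `scalar_mult_rtl` always yields the same multiple of the input.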

Efficiency
For most reasonable distributions of random bit choices, this gives 1 double and just over ½ add per key bit. With mild alterations, and suitable choice of random bits, the FA generates a “NAF”, a non-adjacent form in which no two successive digits are non-zero. Again, the generation cost for this representation is minimal, and the cost of computing $kP$ is similar to that of the standard square-and-multiply algorithm.
This is a very cheap way of introducing some randomness. For not much cost at the re-coding stage, a random representation is obtained which costs little more than the binary exponentiation algorithm to execute.
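For reference, the (deterministic) non-adjacent form mentioned on the slide can be computed with the standard textbook recoding below. The function name `naf` is chosen for this sketch; the routine is the usual NAF algorithm rather than anything specific to the Oswald-Aigner paper.

```python
def naf(k):
    """Non-adjacent form of k: digits in {-1, 0, 1}, least significant first."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)    # +1 if k = 1 mod 4, -1 if k = 3 mod 4
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits
```

For example, `naf(7)` returns `[-1, 0, 0, 1]`, i.e. 7 = 8 – 1, matching the example of replacing a run of 1s given earlier.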

Power Analysis Attacks
Again, this is for ECC not RSA because inverses must be found. With standard algorithms for point operations, adds and doubles are easily distinguished using timing differences in a single power trace because the code for them is different. But, again, adds may represent any of several digits, so reconstructing the exponent from one trace creates an infeasibly large search space. Again, averaging does not produce useful information.
As one would expect, the remarks on the possibility of mounting a DPA attack are the same as for the Liardet-Smart algorithm. If any timing differences are addressed, then power and EMR might be used instead to investigate Hamming weights of data and addresses in order to determine what is happening in each elliptic curve operation, and hence deduce the secret key $k$.

Blinding etc. is necessary
Unfortunately, if i) the same key is re-used, and ii) blinding of the key is omitted, and iii) doubles & adds can be distinguished (on a single exponentiation), then their patterns over several exponentiations reveal the secret key. So either: i) key blinding should be used, or ii) code used for adds and doubles must appear identical to SPA/DPA/SEMA/DEMA, or iii) the add should always be done, and the result chosen appropriately.
Thus, the same problem arises as with Liardet-Smart. The details of extracting the secret key are slightly different under the same assumptions, but the result is the same: some additional counter-measures are essential. The frequency of zero, one and two adds between consecutive doubles at a given point in the set of traces determines the bit pair of $k$ at the corresponding point in the binary representation of $k$. In fact, the same recommendations as before for maintaining side channel resistance are appropriate. Reference: CDW, “Issues of Security with the Oswald-Aigner Exponentiation Algorithm”, CT-RSA 2004, LNCS 2964, Springer, 2004, pp. 208–221 (see bibliography).

The Ha-Moon Exponentiation Algorithm
Decryption/Signing in ECC: $Q = kP$ for point $P$ and secret key $k$. $k$ is represented with random recoding to 2-power base $m$ and digits $k_i$:
$k = ((\cdots((k_n) m + k_{n-1}) m + \cdots) m + k_1) m + k_0$
Digits are generated in the order of use $k_n, k_{n-1}, \ldots$ Digit multiples $k'P$ are computed over the chosen digit range. $kP$ is computed by processing digits from most to least significant:
$kP = m(m(\cdots m(m(k_n P) + k_{n-1} P) + \cdots) + k_1 P) + k_0 P$
There are actually two Ha-Moon algorithms, published in Ha et al, CHES 2002, LNCS 2523, and in Yen et al, ICICS 2004, LNCS 3506. The second “improved” version is described here. Like the previous two algorithms, it is a re-coding from binary to a 2-power base $m$. It is certainly stronger than Liardet-Smart and Oswald-Aigner, but requires more execution time, and some space for pre-computed multiples of $P$.

Yen-Ha-Moon Recoding (ICICS 2004)
Assume binary coding $k = b_{2n-1} b_{2n-2} \ldots b_1 b_0$.
  Pre-compute & store R[i] ← iP for 0 ≤ i ≤ 14 ;
  r ← b_{2n−1}b_{2n−2} ;
  For i from n−2 downto 0 do {
    borrow ← 4r ;
    r ← Random ∈ {1,2,3} ;
    R[0] ← 4R[0] + R[borrow − r + b_{2i+1}b_{2i}]
  } ;
  R[0] ← R[0] + R[r]
This is one example of the possible choices for the improved Ha-Moon algorithm of Yen et al. The base is $m = 4$ and the digit range 1 to 14. Provided the leading pair $b_{2n-1} b_{2n-2}$ is non-zero, r is always 1, 2 or 3, so borrow is always 4, 8 or 12, and so borrow − r + $b_{2i+1} b_{2i}$ is in the range 1 to 14.
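The loop can be checked with a small sketch in which integers again stand in for points. The function name `yen_ha_moon_mult`, the separate accumulator variable `acc` (used in place of R[0]) and the way the base-4 digits are extracted are choices made for this illustration; the key is assumed to be positive so that its leading base-4 digit is non-zero.

```python
import random


def yen_ha_moon_mult(k, p, rng):
    """Randomised base-4 scalar multiplication with digits in 1..14.

    Integers stand in for curve points (an assumption of this sketch).
    Requires k >= 1 so that the leading base-4 digit is non-zero.
    """
    digits = []                               # base-4 digits, then MSB first
    while k > 0:
        digits.append(k & 3)
        k >>= 2
    digits.reverse()

    table = [i * p for i in range(15)]        # table[i] stands for iP
    acc = table[0]                            # accumulator, R[0] on the slide
    r = digits[0]
    for d in digits[1:]:
        borrow = 4 * r
        r = rng.choice((1, 2, 3))
        acc = 4 * acc + table[borrow - r + d] # index always lies in 1..14
    return acc + table[r]                     # repay the final borrow
```

The invariant is that `acc + r * p` equals the key prefix processed so far times `p`; the final addition of `table[r]` therefore yields `k * p` regardless of the random choices.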

Efficiency
Positive digits, so suitable for RSA as well as ECC. No zero digits, so time as for square-and-always-multiply. Re-coding can be done “on-the-fly”, i.e. the re-coded exponent does not need to be stored before use. Base and digit range are chosen to fit available resources, but more space is required than Oswald-Aigner to achieve greater security.
This slide is self-explanatory, except for the last point, which is apparent from the example case that requires storage for 13 multiples of $P$. Apart from improved resistance to side channel analysis, these efficiency points are the main improvements over the original version of the algorithm.

Power Analysis Attacks
The previous problems with Liardet-Smart and Oswald-Aigner have been solved: the pattern of adds and doubles is always the same. If the Hamming weight of operands can be observed, then digit values can be deduced and the secret key reconstructed. If the same key is re-used, weak or inconclusive leakage can be pooled over many scalar multiplications to recover the key.
D. J. Park and P. J. Lee published an attack on this algorithm in WISA 2005, LNCS 3786. It assumes known or chosen ciphertext. Here, only very weak leakage is enough to recover the key if it is re-used many times without blinding. Movement of data along a bus uses power which depends slightly on its Hamming weight. So it is possible to recognise an increased probability that two operands are the same. This means the digits in the re-coding might be the same. Suppose the first few bits of the secret key have been recovered. We consider how to recover the next pair of bits. From several scalar multiplications with the same key, it is possible to work out which digit might have been chosen for this iteration in each case. We also know from previous bits what digits were probably chosen and what the borrow might be. This should give a range of three digits being most likely, namely those corresponding to the three random choices for r. From this we can deduce the next pair of bits in $k$. The more scalar multiplications are performed without blinding of the key, the more information we can collect from side channel leakage, thereby increasing the likelihood of recovering the key without too much further computation to eliminate any errors in the deduction of bits. The “Big Mac” attack next shows how to make use of Hamming weight leakage for a single scalar multiplication.

Blinding etc. is necessary
Unfortunately, if i) the same key is re-used, and ii) blinding of the key is omitted, then weak Hamming weight leakage over several exponentiations reveals the secret key. So random blinding of the key should be used.
This is the obvious conclusion from the remarks on the previous slide.

“Big Mac” Attack on RSA & ECC
Summary: A variation of DPA is used to determine the secret exponent in an embedded RSA or ECC cryptosystem. Assumption: The implementation uses a small multiplier whose power consumption is data dependent and measurable. Properties: i) On average, a multiply-accumulate operation $a \times b + c$ has data dependent contributions roughly linear in the Hamming weights of $a$ & $b$; ii) Required random variation occurs because of the initial state set up by the previous multiply-accumulate operation; iii) No knowledge of I/O required. Reference: CHES 2001.
To show how leaked data can be combined in a less obvious way than in standard SPA or DPA attacks, here is the “Big Mac” attack. In fact, the title is more generic than this particular attack: it just means that the bits of the key are bitten off independently in any order (like the layers of a “Big Mac” from McDonald’s, because these burgers are too big for all layers to fit in one’s mouth at once). It is typical of the attacks that are now coming forward. It makes no assumptions about known or chosen ciphertexts or known “public” modulus or “public” key E. These are the type of assumptions that standard blinding techniques should now render obsolete. So, we will attempt to recover a private key D from a single power trace, assuming no knowledge of the I/O data, nor any control over it, nor any knowledge of the RSA modulus. All we have is the power trace and some experience from previous reading of literature and experiments on other similar chips. We also assume that the implementation performs the computation using a single small multiplier. Typically in a smart card this will be an 8- or 16-bit multiplier. We need the power trace to depend (mainly) on the data being fed into this multiplier. This is fine in the context of a RISC processor. However, if there is a processor/co-processor set-up, the processor might be performing calculations to hide what the cryptographic co-processor is doing. The attack may average this away if the key length is large enough. Otherwise, we might measure EMR (electro-magnetic radiation) instead, using a probe over the multiplier, as described by Marc Joye. Recall that, apart from memory and communications, the multiplier probably occupies more area than anything else, and it certainly makes the largest demands on power. The original paper on this is in CHES 2001.

Combining Traces I
The long integer product A×B in an exponentiation contains a large number of small digit multiply-accumulates a_i×b_j+c_k. Identify the power sub-traces of each a_i×b_j+c_k from the power trace of A×B. Average the power traces for fixed i as j varies: this gives a trace tr_i which depends on a_i but only on the average of the digits of B.
We view an exponentiation as if it were a “Big Mac”: the half buns at the top and bottom represent the pre- and post-processing, the layers in between (beefburger, tomato, lettuce or sauce) are the digits, with the filling type being the digit value.
Each long integer multiplication consists of a sequence of digit multiplications. We assume enough experience to be able to break the trace of each long integer multiplication into individual traces for digit multiplications. This is generally very easy because of things like clock edges being distributed, data being sent along buses etc. We will average the sub-traces which involve a fixed input digit a_i.

Combining Traces a0b0 a0b1 a0b2 a0b3 Randomised Algorithms Here is an illustration of how this might be done. (It’s a simulation, rather than an actual trace.) We start by breaking the original side channel trace into a concatenation of sub-traces, one for each execution of the multiplier-accumulate operation. Colin D. Walter, Comodo Research Lab, Bradford, UK Next Generation Digital Security Solutions Randomised Algorithms

Combining Traces a0b3 a0b2 a0b1 a0b0 Randomised Algorithms The sub-traces corresponding to each ai are added together: Here’s the first sub-trace. Add in the next sub-trace, and the next, and …. up to the last. Colin D. Walter, Comodo Research Lab, Bradford, UK Next Generation Digital Security Solutions Randomised Algorithms

Combining Traces We need to average them.

Combining Traces Average the traces: a_0(b_0+b_1+b_2+b_3)/4. There is the average.

Combining Traces [Figure: averaged trace tr_0 for a_0\bar{b}] \bar{b} is effectively an average random digit, so the trace is characteristic of a_0 only, not B. This averaged trace depends on both a_0 and something we can think of as an “average” digit, \bar{b}. If it is truly average, the trace is characteristic of a_0 and has little dependence on B or \bar{b}. Call this trace tr_0.

Combining Traces II
The dependence of tr_i on B is minimal if B has enough digits. Concatenate the average traces tr_i for each a_i to obtain a trace tr_A which reflects properties of A much more strongly than those of B. The smaller the multiplier or the larger the number of digits (or both), the more characteristic tr_A will be.
In the same way as for tr_0 we can obtain averaged traces tr_1, tr_2, … for the other digits a_1, a_2, … of A. These are concatenated together to give an averaged trace for A, tr_A.
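The averaging and concatenation steps can be illustrated with a toy simulation. This is only a sketch under stated assumptions: each multiply-accumulate is reduced to a single leakage sample equal to the sum of the Hamming weights of its two input digits (plus optional Gaussian noise), and the names `mac_subtrace` and `averaged_trace` are hypothetical, not taken from the referenced paper.

```python
import random

def hamming_weight(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")

def mac_subtrace(a, b, noise, rng):
    # Assumed toy leakage model: one sample per digit multiply-accumulate,
    # linear in the Hamming weights of the two input digits.
    return hamming_weight(a) + hamming_weight(b) + rng.gauss(0, noise)

def averaged_trace(A, B, noise=0.0, rng=None):
    """Return tr_A = [tr_0, tr_1, ...], where tr_i averages the
    sub-traces of a_i * b_j over all digits b_j of B."""
    rng = rng or random.Random(0)
    return [sum(mac_subtrace(a, b, noise, rng) for b in B) / len(B) for a in A]

# Digits of A and B (least significant first); values chosen for illustration.
print(averaged_trace([0x0F, 0x01], [0xFF, 0x00]))
```

In this model every entry of the result is the Hamming weight of a_i plus the same constant (the mean Hamming weight of the digits of B), which is why the averaged trace characterises A rather than B.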

Combining Traces [Figure: averaged sub-traces tr_0, tr_1, tr_2, tr_3 concatenated to form tr_A] Question: Is the trace tr_A sufficiently characteristic to determine repeated use of a multiplicand A in an exponentiation routine? Here is a simulated example. Take the averaged sub-trace tr_0 for digit a_0. Append the sub-trace tr_1 for digit a_1, the sub-trace tr_2 for digit a_2, and all the sub-traces tr_i up to the last digit, which is a_3 in this example. This gives the trace tr_A which we will use to determine uses of A as a multiplicand in the exponentiation. Is the processed trace sufficiently characteristic for this to be done?

Distinguish Digits?
Averaging over the digits of B has reduced the noise level. In m-ary exponentiation we only need to distinguish: squares from multiplies; and the multiplicands A^(1), A^(2), A^(3), …, A^(m–1) from one another. For small enough m (the radix of the exponent) and a large enough number of digits they can be distinguished in a simulation of clean data. There is a danger that this “theoretical” attack could be made practical.
The answer to the question on the previous page is yes: the trace is sufficiently characteristic of the multiplicand to identify it if enough leakage is present and noise levels are not too great. Some simulation figures are given in the referenced paper for values of m that we might find in practice. Next we show how to do this.

Distance between Power Traces
[Figure: two averaged traces tr_{A_0} and tr_{A_1}, current plotted against sample index i]
d(A_0,A_1) = \left( \sum_{i=0}^{n} \left( tr_{A_0}(i) - tr_{A_1}(i) \right)^2 \right)^{1/2}
We take the (vectors of) measured points from two (averaged) traces, and simply take the Euclidean distance between them. This is a good metric to use because it emphasises the contribution from points where the power consumption differs most. (In practice, the best results are obtained by identifying the points in each clock cycle where the data dependency is greatest and then throwing away the other points, where there is little or no data dependency. For all of the latter part of the clock cycle, all the gates should have settled so that the data dependency is minimal. For the first part of the clock cycle, the gates are switching along the various routes.) For the simulation, gate switch counts were used rather than actual power measurements. This gave cleaner data, without influences from noise such as other operations on the chip, inconsistently aligned measurements, etc.
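As a small illustration of the metric, the Euclidean distance between two equal-length traces could be computed as below. The function name `trace_distance` and the sample values are illustrative assumptions; real traces would first be aligned and reduced to the data-dependent sample points.

```python
import math

def trace_distance(tr_a0, tr_a1):
    """Euclidean distance between two averaged traces of equal length."""
    if len(tr_a0) != len(tr_a1):
        raise ValueError("traces must have the same number of sample points")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(tr_a0, tr_a1)))

print(trace_distance([0.0, 3.0], [4.0, 0.0]))
```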

Simulation Results
Equal exponent digits can be identified: their traces are close. Unequal exponent digit traces are not close. Squares can be distinguished from multiplications: their traces are not close to any other traces. There are very few errors for typical cases. The pre-computations A^(i+1) ← A × A^(i) mod N provide traces for known multiplicands. So: we can determine which multiplicative operations are squares; we can determine the exponent digit for each multiplication; there is minor extra detail for i = 0, 1 and m–1; and this can be done independently for each operation.
From a simulation we can deduce the results listed on the slide above. In an exponentiation, each occurrence of exponent digit i will involve the same multiplicand A^(i), say. So, by the observations above, we can tell from the traces which operations correspond to equal exponent digits. Assuming that every exponent digit turns up with roughly its expected frequency, there will be plenty of traces which are close together, all corresponding to the use of a common multiplicand. However, squaring operations are quite different. Their traces are not close to those of any other operation because they do not share their multiplicand with any other operation. So they should be easily identified. Moreover, they always occur in groups of log_2 m sequential operations. Finally, note that if the measurements can be made to detect even a minimal level of data dependency then, because of the quantity of data, it is likely that few errors will occur in determining which operations are which, and which multiplications share the same exponent digit.
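One way to picture the classification step is a nearest-reference search: each operation's averaged trace is compared with the traces recorded during the pre-computation of the known multiplicands, and an operation that is far from all of them is treated as a square. This is a hedged sketch, not the procedure of the referenced paper; the function name `classify_operation`, the dictionary layout and the threshold are all assumptions made for illustration.

```python
import math

def distance(t0, t1):
    """Euclidean distance between two equal-length traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(t0, t1)))

def classify_operation(op_trace, reference_traces, threshold):
    """Return the exponent digit whose reference trace is closest to
    op_trace, or None if no reference is within the threshold (the
    operation is then assumed to be a square).

    reference_traces maps digit -> averaged trace of that multiplicand.
    """
    digit = min(reference_traces,
                key=lambda i: distance(op_trace, reference_traces[i]))
    if distance(op_trace, reference_traces[digit]) <= threshold:
        return digit
    return None

# Hypothetical reference traces for digits 1 and 2.
refs = {1: [1.0, 2.0], 2: [5.0, 5.0]}
print(classify_operation([1.1, 2.0], refs, threshold=0.5))
print(classify_operation([9.0, 9.0], refs, threshold=0.5))
```

Because each operation is classified on its own, the work grows linearly with the number of operations, which matches the slide's remark that this can be done independently for each operation.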

Conclusions on “Big Mac”
The independence of deducing each digit value means the attack time is proportional to the secret key length. A longer modulus means better discrimination between traces. So there is no greater safety against this attack from longer keys! With the usual DPA averaging already done, it may be possible to use a single exponentiation to obtain the secret key. So blinding the exponent as D+rφ(N) with random r may be no defence.
With this approach, each exponent digit requires the same amount of work to recover. So the total work involved is proportional to the key length, assuming the modulus length is unchanged. However, if the modulus is increased there are more digit multiplications: n digits in the modulus means n^2 digit multiplications, and so n sub-traces to average for each multiplicand digit. Hence the average is less biased by the other long integer input B. Moreover, there are n averaged sub-traces to concatenate. As n increases, the trace tr_A becomes both longer and more accurate for characterising A. Hence the discrimination between multipliers becomes easier. This means exponent digits can be determined more accurately, and so a longer key length means greater weakness against this attack. (See my paper in SAC 2003.) Thus, unexpectedly, shorter keys are better!
Aside: In case you are interested, the attack is called “Big Mac” partly because I'm Scottish (the “Mac”), and partly after that hugely thick item from McDonald's which can only be eaten one layer at a time (but in any order!). The key here is consumed by the attacker one digit at a time, in any order.

MIST – Yet Another Exponentiation Algorithm
There are currently two exponentiation algorithms which offer some hope. They appear stronger than those described earlier. Both are described in CHES 2002. The first is MIST. It is based on the concept of division chains, which are a special type of addition chain (an addition chain being a means of describing an exponentiation scheme). The other is by Itoh et al. and is based on digit blinding of the secret key. Both are suitable for RSA since they do not require inverses: digits are non-negative. MIST was first described at CT-RSA 2002. Both CT-RSA and CHES have proceedings published in Springer LNCS.

m-ary Exponentiation (Reversed)
{ To compute: P = C^D }
Q ← C ; P ← 1 ;
While D > 0 do
Begin
   d ← D mod m ;
   If d ≠ 0 then P ← Q^d × P ;
   Q ← Q^m ;
   D ← D div m ;
   { Invariant: C^(D.Init) = Q^D × P }
End
{ Output: P = C^D for the initial value of D }
The example of 235_10 is given in the notes.
Here is m-ary exponentiation for a representation of exponent D with base m, but reversed to read and process the digits of D in the reverse of the usual order, i.e. from right to left. Successive values of d are the digits base m from least to most significant. Variable P accumulates the required product, while Q contains the input text raised to a pure power of m, namely C^(m^i) for some i.
Example: For D = 235 base 10, written 235_10, the successive divisions of D by 10 give 235, 23, 2 and 0. So the digits d are generated in the order 5, 3 and 2. Q takes the values C, C^10 and C^100, and so Q^d takes the values C^5, C^30 and C^200. Thus P takes the sequence of values 1, C^5, C^35 and C^235, the last of these being the output.
This direction for processing digits is not used in practice, except for m = 2, because of the cost of computing Q^d in each loop iteration. For the opposite order, these powers can be pre-computed and stored. Compare this slide with the next one to see how close it is to MIST.
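The right-to-left loop translates almost directly into runnable code. The sketch below is an illustration rather than a hardened implementation: it works modulo an assumed modulus N > 1, and it uses Python's built-in `pow` for Q^d and Q^m instead of the shared addition chains a real implementation would use.

```python
def mary_reversed(C, D, m, N):
    """Right-to-left m-ary exponentiation: returns C**D mod N (N > 1)."""
    Q, P = C % N, 1
    while D > 0:
        d = D % m                    # next digit, least significant first
        if d != 0:
            P = pow(Q, d, N) * P % N  # P <- Q^d * P
        Q = pow(Q, m, N)             # Q <- Q^m
        D //= m
    return P

# The notes' example: D = 235 processed in base 10 (digits 5, 3, 2).
print(mary_reversed(3, 235, 10, 1000003) == pow(3, 235, 1000003))
```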

The MIST Exponentiation Algorithm
{ To compute: P = C^D }
Q ← C ; P ← 1 ;
While D > 0 do
Begin
   Choose a random base m, e.g. from {2,3,5} ;
   d ← D mod m ;
   If d ≠ 0 then P ← Q^d × P ;
   Q ← Q^m ;
   D ← D div m ;
   { Invariant: C^(D.Init) = Q^D × P }
End
{ Output: P = C^D for the initial value of D }
All that MIST does, having set things up this way, is to choose m randomly in each loop, usually from a pre-determined set for which the computation of Q^d is cheap. So one might reasonably call this randomary exponentiation. Historically, m is called a divisor, and the sequence of pairs (m,d) a division chain for D. This is the terminology of the paper in the CHES proceedings. Here I have re-cast it in the traditional notation of cryptography and exponentiation, viewing it via a “change of base” algorithm.
Generally, m is selected from a small set, such as {2,3,5}, with known security strength and for which additional data has been pre-calculated. (Unlike the previous algorithms, the base choice need not be a power of 2.) As before, P accumulates the partial product and eventually contains the input base C raised to the power D. As before, Q contains C raised to the power of the product of the previous values of m. At any given point, D contains the unprocessed part of the initial exponent, which is its initial value divided (integer division) by the product of the previous values of m.
It is easy to prove the correctness of the algorithm: see the IEEE TC paper or the CT-RSA 2002 paper. It requires establishing the loop invariant. As can be seen from the loop invariant, D really does contain the unprocessed part of its initial value D.Init. Making a random choice for m at each loop iteration provides a different exponentiation scheme for each exponentiation.
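A minimal sketch of the MIST loop follows, under two assumptions that are not part of the slide: the base is drawn uniformly from the set (the lecture later describes a biased choice), and the result is reduced modulo an assumed modulus N > 1. The function name `mist_exp` is hypothetical. The returned list of (m, d) pairs is the division chain for D.

```python
import random

def mist_exp(C, D, N, bases=(2, 3, 5), rng=None):
    """MIST-style exponentiation: returns (C**D mod N, division chain)."""
    rng = rng or random.Random()
    Q, P = C % N, 1
    chain = []
    while D > 0:
        m = rng.choice(bases)        # assumption: uniform choice of base
        d = D % m
        if d != 0:
            P = pow(Q, d, N) * P % N  # P <- Q^d * P
        Q = pow(Q, m, N)             # Q <- Q^m
        D //= m
        chain.append((m, d))
    return P, chain

result, chain = mist_exp(7, 235, 1009, rng=random.Random(1))
print(result == pow(7, 235, 1009), chain)
```

Different seeds give different division chains for the same exponent, while the result is unchanged, which is the property the randomisation relies on.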

“Randomary” Exponentiation
The main computational part of the loop is: If d ≠ 0 then P ← Q^d × P ; Q ← Q^m. To provide the required efficiency, a set of possible values for m is chosen so that there is always an efficient addition chain for m which contains d, e.g. 1+1=2, 2+1=3, 2+3=5 is an addition chain for base m=5 suitable for digits d = 0, 1, 2 or 3. The method is comparable to the 4-ary method regarding time complexity when the base set is {2,3,5}.
The main problem with this algorithm is the expense of computing both Q^d and Q^m. These are not done sequentially and independently as implied by the above code: an addition chain for computing Q^m is selected to guarantee that Q^d is computed en route, so that it comes for free. Consequently, like the m-ary method of exponentiation, all the work required is contained in raising something to the power m (equivalent to the squarings of the binary method) plus an extra multiplication when the exponent digit d is non-zero.
The slide gives an example of an addition chain for 5: 1+1=2, 2+1=3, 2+3=5. Its key property is that in each equation the numbers on the left side appeared on the right side of earlier equations. So the 2 and 3 in the last equation were the values computed in the first two equations. (To start the process it is assumed that 0 and 1 are always available to use.) Each equation states the relationship between the exponents when two powers of the same number are multiplied together. Thus 2+3=5 corresponds to A^2×A^3 = A^5. So an addition chain defines a scheme for exponentiation, with the powers on the left side of each multiplication having been computed earlier and stored for subsequent use.
Since repeatedly raising to the power 2 provides the highest power for the fewest multiplications, it is not clear that this method will out-perform the standard binary algorithm. However, always taking m = 2 just gives one of the normal, efficient, square-and-multiply methods. If we have a bias towards m=2 and frequently pick other values of m for which d = 0, then we save a multiplication. This over-compensates for the extra cost of raising to the power m. As a result, the method becomes about as efficient as 4-ary exponentiation.
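The idea that Q^d "comes for free" on the way to Q^m can be sketched by evaluating an addition chain and keeping every intermediate power. The chains for 2, 3 and 5 (digits 0 to 3) follow the slide; the chain used for the pair m = 5, d = 4 (1+1=2, 2+2=4, 4+1=5) is an assumption added here so that every digit is covered. The names `CHAINS` and `powers_via_chain` are hypothetical.

```python
# Each step (x, y) means: multiply the stored powers Q^x and Q^y to get Q^(x+y).
CHAINS = {
    2: [(1, 1)],
    3: [(1, 1), (2, 1)],
    5: [(1, 1), (2, 1), (2, 3)],   # covers digits d = 0, 1, 2, 3
}
CHAIN_5_DIGIT_4 = [(1, 1), (2, 2), (4, 1)]  # assumed chain for m = 5, d = 4

def powers_via_chain(Q, m, d, N):
    """Return (Q^d mod N, Q^m mod N, number of multiplications used)."""
    chain = CHAIN_5_DIGIT_4 if (m, d) == (5, 4) else CHAINS[m]
    known = {0: 1, 1: Q % N}       # exponents 0 and 1 are always available
    for x, y in chain:
        known[x + y] = known[x] * known[y] % N
    return known[d], known[m], len(chain)

print(powers_via_chain(7, 5, 3, 1009))
```

Raising to the power 5 costs three multiplications whichever digit is needed, so the only extra cost of a non-zero digit is the single multiplication that updates P.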

Example
Fix the base set = {2, 3, 5}. Consider D = 235_10.
D   | m, d | Q (before) | Q^d               | Q^m (next Q)      | P (after)
235 | 3, 1 | C^1        | (C^1)^1 = C^1     | (C^1)^3 = C^3     | 1 × C^1 = C^1
78  | 2, 0 | C^3        | (C^3)^0 = 1       | (C^3)^2 = C^6     | C^1
39  | 5, 4 | C^6        | (C^6)^4 = C^24    | (C^6)^5 = C^30    | C^1 × C^24 = C^25
7   | 2, 1 | C^30       | (C^30)^1 = C^30   | (C^30)^2 = C^60   | C^25 × C^30 = C^55
3   | 3, 0 | C^60       | (C^60)^0 = 1      | (C^60)^3 = C^180  | C^55
1   | 2, 1 | C^180      | (C^180)^1 = C^180 | (C^180)^2 = C^360 | C^55 × C^180 = C^235
The exponent is pre-computed as 235_10 = 1_2 0_3 1_2 4_5 0_2 1_3 (each digit is subscripted by its base, most significant first).
Here is an example of the algorithm for computing C^235 (where 235 is in base 10). Each value of D introduces a new loop iteration in the algorithm, with a fresh random base m and the digit d = (D mod m). The values of D drop by a factor m down the column. On the other hand, the values of Q and P increase. They show the powers of C that are computed in the two registers at each point: Q contains C raised to a power equal to the product of all choices of m up to that point; P contains the product of all the Q^d values. The example uses 9 multiplications to create C^180 and just 3 more to create the final P. Thus 12 long integer multiplicative operations are performed in total, similar to the 1.5×log_2 235 ≈ 11.8 expected from the binary method. In fact, here 235_10 = 11101011_2, so that 12 multiplications are also required for the usual square and multiply algorithm.
For security, the code is not executed as written but re-ordered. The exponent is pre-computed and stored as 235_10 = 1_2 0_3 1_2 4_5 0_2 1_3 in order to avoid attacks using observations of the pauses between long integer multiplications while the next exponent digit is chosen. Then the digit string is used to drive the exponentiation phase.
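The pre-computation of the mixed-base digit string can be checked with a short sketch. The sequence of bases is passed in explicitly so that the slide's example is reproducible; the function names `division_chain` and `chain_value` are hypothetical.

```python
def division_chain(D, bases):
    """Return the (m, d) pairs obtained by dividing D by each base in turn."""
    chain = []
    for m in bases:
        chain.append((m, D % m))
        D //= m
    if D != 0:
        raise ValueError("the bases supplied did not consume the whole exponent")
    return chain

def chain_value(chain):
    """Rebuild the exponent from its division chain (least significant pair first)."""
    D = 0
    for m, d in reversed(chain):
        D = D * m + d
    return D

chain = division_chain(235, [3, 2, 5, 2, 3, 2])
print(chain)               # (m, d) pairs in processing order
print(chain_value(chain))  # reconstructs the exponent
```

Reading the pairs from last to first gives the stored digit string 1_2 0_3 1_2 4_5 0_2 1_3 of the example.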

Choice of Base Set
Security: Bases must be chosen so that sequences of squares and multiplies, or operand sharing, do not reveal m. (This is the information which Big Mac recovers from leakage.) Efficiency: Bases m must be chosen so that raising to the power m is (time) efficient enough. Space is required to store addition chains. As few registers as possible should be used for the exponentiation. One solution: take the set of bases {2,3,5}.
As noted earlier, there are two main problems to solve: one is time/space efficiency, the other is security. The pattern of squares and multiplies must not reveal the exponent. If we were to choose just {2} as the base set from which each m must be chosen, then we would obtain the usual right-to-left binary algorithm and the sequence of squares and multiplies would yield the exponent directly. This is insecure. We need at least two bases to provide the choice necessary to enable different addition chains to be generated. In this regard, the more the merrier. However, a large base may have a distinctive pattern for raising to the power m, so that it can be recognised. Hence small bases may be better. The study of how sequences of squares and multiplies can reveal the exponent is both fascinating and extensive. There was insufficient space in the CT-RSA 2002 paper to cover this important topic. We are going to look at it briefly next. It is covered in a paper at CHES 2002.
For efficiency, 2 and numbers of the form 2^n+1 are generally the most desirable choices for m. {2,3,5} turns out to be a good choice for further investigation. With a fixed set of bases to choose from, the addition sub-chains for raising to the powers m and d can be pre-determined to maximise efficiency and security. Other choices of m would need such details to be computed dynamically, and this would probably make the process too inefficient for intended applications and perhaps insecure.

Choice of Base
Example algorithm (see CT-RSA 2002):
m ← 0 ;
If Random(8) < 7 then
   If (D mod 2) = 0 then m ← 2
   else If (D mod 5) = 0 then m ← 5
   else If (D mod 3) = 0 then m ← 3 ;
If m = 0 then
Begin
   p ← Random(8) ;
   If p < 6 then m ← 2
   else If p < 7 then m ← 5
   else m ← 3
End
How do we select m randomly to provide an efficient exponentiation scheme? Efficiency is still a problem. With the base 3 we need two multiplications to raise to the power 3, and in 2 out of 3 cases an extra multiplication to update P. So we have, on average, 2⅔ multiplications for a reduction in the exponent by a factor close to 3. This means almost 2n multiplicative operations for an exponent of n bits. This is much less efficient than base 2, which requires an average of 3n/2 multiplicative operations. Similarly base 5 is inefficient, though not by quite as much. Hence the previous worries that the method might be too inefficient.
Thus we may wish to bias the choice of base. When the digit d is 0 we save a multiplication, so this would make an efficient preference. Hence, in most cases (7 out of 8, say) we select such a base, choosing the most efficient base of 2 in preference to the next most efficient, namely 5, and the least efficient, 3, last. As multiples of 3 and 5 are reasonably frequent, we save over the binary algorithm's choice of 2 when D is odd. (In the slide, Random(n) returns a random integer in the range 0..n–1.)
If a zero digit is not possible, we again bias the choice of bases towards efficiency. Surprisingly, 2 is still the most efficient choice, although it requires the most multiplications for the number of bits it removes from the exponent. This apparent contradiction arises because, when combined with the average number of multiplications to obtain the same overall bit reduction as would be given by choosing 3 or 5, it is actually better than either of these choices. The choice of m given here requires about 1.42n multiplications. Counter-intuitively, this is better than the binary algorithm.
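The slide's selection rule can be transcribed as follows. This is a direct sketch of the pseudocode, with `rng.randrange(8)` standing in for Random(8); the function name `choose_base` and the injectable `rng` parameter are assumptions added for testability.

```python
import random

def choose_base(D, rng=None):
    """Pick a base m from {2, 3, 5} using the biased rule on the slide."""
    rng = rng or random.Random()
    m = 0
    if rng.randrange(8) < 7:      # 7 times out of 8, prefer a zero digit
        if D % 2 == 0:
            m = 2
        elif D % 5 == 0:
            m = 5
        elif D % 3 == 0:
            m = 3
    if m == 0:                    # no zero digit chosen: biased random pick
        p = rng.randrange(8)
        if p < 6:
            m = 2
        elif p < 7:
            m = 5
        else:
            m = 3
    return m

print([choose_base(D, random.Random(0)) for D in (10, 15, 9, 7)])
```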

An Operand Re-Use Theorem
THEOREM: For MIST, the search space for exponents with the same operand sharing sequence as D has size approximately D^{1/3}. These are the hypotheses resulting from the assumption that a “Big Mac” attack is possible. It yields the knowledge about operand sharing which is what breaks m-ary exponentiation. Under the same hypotheses, the search space for m-ary exponentiation has size D^0, i.e. 1! Oswald reduces the time by performing a prioritized search. Exponent blinding is still recommended with this algorithm.
There are several papers on MIST which look at its efficiency and security properties. The CT-RSA 2002 paper looks at the efficiency, the CHES 2002 paper is a first look at the security issues, one at CT-RSA 2003 extended a standard attack to MIST, and several more papers are in the pipeline. These issues are somewhat more complex for this algorithm than for either Liardet-Smart or Oswald-Aigner.
In comparison with MIST, when the above operand sharing knowledge is assumed for m-ary exponentiation, the entire exponent is revealed immediately: equal operands means equal exponent digits. So MIST is much stronger in this regard. The theorem says that a 1024-bit exponent still has a search space involving approximately 340 bit choices (1024/3 ≈ 341). (However, knowing half the bits of an RSA secret key actually gives the ability to factor the modulus, so that only half the bits need to be “guessed”; see bibliography: CDW, Seeing through Mist given a Small Fraction of an RSA Private Key, CT-RSA 2003.)
Oswald has a web paper “Markov Model Side-Channel Analysis” which looks at the probability of strings of choices (m,d) and selects the most likely first in a search of the key space. This reduces the average time to search the space to less than that described in the theorem.

Overlapping Windows (Itoh et al.)
Windows of width k=4 bits and with overlap h=2 bits:
1 1 0 0 1 0 0 1 0 ……. 1 0 1 1 1 0   ← D (the secret key)
1 0 1 0   ← w_0 = 1100 – random 10
1 0 1 0   ← previous random 10, copy down 10
0 1 1 1   ← w_1 = 1010 – random 11
1 1 0 1   ← previous random 11, copy down 01
1 0 1 1   ← w_2 = 1101 – random 10
1 0 0 …….   ← previous random 10, copy down 0...
1 0 1 1
1 0 0 1   ← w_{n–1} = 1011 – random 10
1 0 1 0   ← previous random 10, copy down 10
1 0 1 0   ← w_n (= remaining value)
The greater complexity of MIST seems to provide increased security. However, the overlapping windows algorithm of Itoh et al. is perhaps more straightforward to understand and probably has similar strength to the (similar) Ha-Moon algorithm. Like Ha-Moon, we express it as a re-coding from left to right, so that exponentiation can be done simultaneously with the re-coding of the secret key.
To illustrate the algorithm, we consider the selection of the digit w_1. We have a randomly chosen 2-bit remainder of 10 (2 bits being the overlap width) which determined the choice of w_0. Copy down and append the next 2 bits 10 to make up the window width of 4 bits. This gives 1010. Each w_i is chosen by selecting a random number with a bit-length equal to the overlap. It is 11 for w_1. This is the value of the remainder used in the next iteration of the process, and so it determines w_1 = 1010 – 11 = 0111.
So far there has been little published material on the strength of this algorithm. Thus, as always, we warn that new algorithms should be given time to be thoroughly investigated before being used in practice. In fact, some forms of this algorithm (e.g. the unenhanced version given here) can be broken using Big Mac with m = 2^h, but allowing digits (and pre-computed values) up to 2^k.
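The re-coding shown in the diagram can be sketched as below. This is an illustrative reading of the slide, not the algorithm as specified by Itoh et al.: it assumes the random remainder r is drawn uniformly from the h-bit values not exceeding the current window value (so that every digit stays non-negative), and it pads D with leading zeros so that the bits split exactly into windows. The names `overlapping_windows` and `recombine` are hypothetical.

```python
import random

def overlapping_windows(D, k=4, h=2, rng=None):
    """Re-code D left to right into digits w_0..w_n with window width k
    and overlap h (assumed reading of the slide's diagram)."""
    rng = rng or random.Random()
    step = k - h                          # fresh bits copied down per window
    bits = bin(D)[2:]
    length = max(len(bits), k)
    length += (-(length - k)) % step      # pad so the windows fit exactly
    bits = bits.zfill(length)
    v = int(bits[:k], 2)                  # first window: top k bits of D
    digits = []
    for pos in range(k, length, step):
        # Assumption: r is an h-bit value with r <= v, keeping w non-negative.
        r = rng.randrange(min(v, (1 << h) - 1) + 1)
        digits.append(v - r)              # w_i = window value - random remainder
        v = (r << step) | int(bits[pos:pos + step], 2)  # remainder . copied bits
    digits.append(v)                      # w_n = remaining value
    return digits

def recombine(digits, k=4, h=2):
    """Evaluate (((w_0*m + w_1)*m + ...)*m + w_n) with m = 2^(k-h)."""
    D = 0
    for w in digits:
        D = (D << (k - h)) + w
    return D

D = 0b110010010111
ws = overlapping_windows(D, rng=random.Random(3))
print(ws, recombine(ws) == D)
```

Each run with a different random source gives a different digit string for the same key, while `recombine` always returns the original exponent.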

Efficiency
If the windows have a fixed width of k bits, say, the overlap is fixed at h bits, and m = 2^{k–h}, then D = ((…((w_0 m + w_1)m + w_2)m + …)m + w_{n–1})m′ + w_n for a suitable m′ which depends on the number of bits left. Evaluation is done as for m-ary exponentiation (left to right) with m = 2^{k–h}, where k–h is the non-overlap width, but the powers from 1 to 2^k–1 have to be pre-computed, i.e. all possible values for digits w_i. The space requirement is larger than for m-ary exponentiation because of the space needed to store 2^k pre-computed values. Time efficiency depends entirely on m, and so is equivalent to that of m-ary exponentiation.
Time efficiency is comparable to the m-ary method where m is determined by the difference between the window width and the overlap, but more memory is required for the larger number of pre-computed values. (Extra time is needed to compute these, of course, but this is negligible compared to the whole exponentiation.)
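To make the cost structure concrete, here is a hedged sketch of left-to-right evaluation driven by a string of window digits. It assumes every step uses the same multiplier m = 2^(k–h) (i.e. m′ = m), works modulo an assumed modulus N > 1, and pre-computes a table of all 2^k possible digit powers; the function name `exp_with_window_digits` is hypothetical.

```python
def exp_with_window_digits(C, digits, N, k=4, h=2):
    """Compute C**D mod N where D = (((w_0*m + w_1)*m + ...)*m + w_n),
    m = 2^(k-h), and digits = [w_0, ..., w_n] with 0 <= w_i < 2^k."""
    m = 1 << (k - h)
    # Pre-computed table: table[w] = C^w mod N for every possible digit.
    table = [1] * (1 << k)
    for w in range(1, 1 << k):
        table[w] = table[w - 1] * C % N
    P = 1
    for w in digits:
        P = pow(P, m, N) * table[w] % N   # k-h squarings, then one multiply
    return P

# Two different digit strings for the same exponent 201 (k = 4, h = 2).
print(exp_with_window_digits(5, [10, 7, 13], 1009) == pow(5, 201, 1009))
print(exp_with_window_digits(5, [12, 2, 1], 1009) == pow(5, 201, 1009))
```

The loop performs the same number of squarings and multiplications as 2^(k–h)-ary exponentiation; only the table, with 2^k entries instead of 2^(k–h), is larger.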

Enhancements
For security, h > ½k is recommended: an overlap of at least half the width of the window (overlap = amount of randomness). The windows have a fixed width (determined by the chosen digit range), but the overlap widths can be varied if desired. A further refinement of this is to offset the table of pre-computed values by C^r (C = ciphertext, r = random blinding factor of, say, b = 32 bits). In this refinement, for random w_i of log_2 k bits, the digit in the representation of D is 2^b w_i + r. This is the value subtracted from D in the first slide, and these are the values that are stored in the table as well. This leaves a bottom digit of b bits to process: does that leak?
It is early days yet in the assessment of the strength of this algorithm, but it seems at least to be very promising as a method against DPA. However, recommendation of it is subject to including enhancements which block the Big Mac attack described on earlier slides. The third enhancement above, namely applying a random offset to the pre-computed table, seems to suffice in this respect if the random offset has enough bits, but perhaps leakage from processing the bits in the last digit might help to reveal it.

Conclusions
Exponentiation leaks very easily. We are only beginning to understand the many ways the secret key might be reconstructed from side channel leakage. Randomisation has the potential to remove the predictability required for the current state of the art in DPA and DEMA attacks. There are a number of randomised exponentiation algorithms available to help solve the problems. Some are known to offer little, others are unproven, others seem to be secure, perhaps. Almost certainly these algorithms should be used only when thoroughly investigated, and only in conjunction with other standard counter-measures, such as message and key blinding, “always add”, balanced code for add/double, etc.
Above is a very short summary of the topics covered and the conclusions drawn during this lecture. It is worth remarking that I presented a paper at CHES 2004 in which I used a randomisation counter-measure to attack an implementation of ECC which has balanced code for double and add. The underlying hardware was assumed to use an insecure implementation of Montgomery modular multiplication, and randomisation generated cases which were sufficiently extreme to reduce the search space to a feasibly computable size. Thus, randomisation itself may be a threat to the security of a crypto-system.

Not an end – just a beginning!

Bibliography IV This is a list of key references which discuss the topics covered in more detail (see also those for the other talk). Those with CDW as an author are mostly available at http://www.comodogroup.com/research/crypto/publications.html E. Brier & M. Joye, Weierstraß Elliptic Curves and Side-Channel Attacks, PKC 2002, LNCS 2274, Springer, 2002, 335–345. P.-Y. Liardet & N.P. Smart, Preventing SPA/DPA in ECC Systems Using the Jacobi Form, CHES 2001, LNCS 2162, Springer, 2001, pp. 391–401. CDW, Breaking the Liardet-Smart Randomized Exponentiation Algorithm, Proc. Cardis 2002, Usenix Assoc, Berkeley, CA, 2002, 59–68. E. Oswald & M. Aigner, Randomized Addition-Subtraction Chains as a Countermeasure against Power Attacks, CHES 2001, LNCS 2162, Springer, 2001, pp. 39–50. CDW, Issues of Security with the Oswald-Aigner Exponentiation Algorithm, RSA 2004, LNCS 2964, Springer, 2004, 208–221. The papers in these two bibliographic slides cover the four randomised exponentiation algorithms of the lecture, and the Big Mac attack which shows that such algorithms are necessary. Classical exponentiation algorithms are described in excellent detail in Knuth’s book, The Art of Computer Programming, volume 2: “Semi-Numerical Algorithms”. The SAC 2003 conference proceedings (published in LNCS) contain an assessment of the weakness of longer keys under an implementation attack. The attacks on the first two randomised exponentiation algorithms are described in two of the papers of this slide. Successful attacks on particular combinations of SW and HW continue to appear. CHES 2004 contains another by CDW in which all the standard blinding techniques and the excellent balanced double and add code of Brier-Joye are used for ECC but on leaky hardware using square-and-multiply. Colin D. Walter, Comodo Research Lab, Bradford, UK Next Generation Digital Security Solutions Randomised Algorithms

Bibliography V CDW, Sliding Windows succumbs to Big Mac Attack, CHES 2001, LNCS 2162, Springer, 2001, pp. 286–299. CDW, MIST: An Efficient, Randomized Exponentiation Algorithm for Resisting Power Analysis, CT-RSA 2002, LNCS 2271, Springer, pp. 53–66. CDW, Some Security Aspects of the MIST Randomized Exponentiation Algorithm, CHES 02, LNCS 2523, Springer 2002, pp. 276–290. CDW, Longer Keys may facilitate Side Channel Attacks, SAC 2003, LNCS, vol. 3006, Springer-Verlag, pp. 42-57. CDW, Seeing through Mist given a Small Fraction of an RSA Private Key, CT-RSA 2003, LNCS 2612, Springer 2003, pp. 391–402. K. Itoh, J. Yajima, M. Takenaka & N. Torii, DPA Countermeasures by Improving the Window Method, CHES 02, LNCS 2523, pp. 303–317. E. Oswald, Markov Model Side-Channel Analysis, SCA-Lab Technical Report Series IAIK, TR 2004/03/01, at http://www.iaik.tugraz.at/aboutus/people/oswald/papers/TR2004-03-01-MarkovModelSCA.pdf Colin D. Walter, Comodo Research Lab, Bradford, UK Next Generation Digital Security Solutions Randomised Algorithms