Precise Bounds for Montgomery Modular Multiplication and Some Potentially Insecure RSA Moduli
Colin D. Walter
formerly: (Manchester, UK); future: (Bradford, UK)
RSA 2002, C.D. Walter, UMIST

Motivation

Modular multiplication is the foundation of most arithmetic-based cryptography, so both efficiency and security matter. Montgomery modular multiplication is one highly favoured method. To avoid full-length comparisons and timing attacks, the conditional modular reductions are skipped, but the price is a higher output bound, often 2M for modulus M, and perhaps extra iterations. For typical, standard key and word lengths, 2M overflows into the next word by just 1 bit, so an extra word may have to be processed, which is inefficient. The overflow bit might also be detectable and so enable a power analysis attack.
History

P. L. Montgomery, Modular multiplication without trial division, Maths of Comp 44 (1985), 519–521.
C. D. Walter, Montgomery Exponentiation Needs No Final Subtractions, Electronics Letters 35 (1999), 1831–1832.
G. Hachez & J.-J. Quisquater, Montgomery Exponentiation with No Final Subtractions: improved results, CHES 2000, LNCS 1965, 293–301.
Montgomery Modular Multiplication

{ Pre-condition: 0 ≤ A < r^n }
P ← 0 ;
For i ← 0 to n−1 do
Begin
   q ← (p_0 + a_i b_0)(−m_0^(−1)) mod r ;
   P ← (P + a_i B + qM) div r ;
   { Invariant: 0 ≤ P < M + B }
End ;
{ Post-conditions: P r^n ≡ A×B mod M,  A B r^(−n) ≤ P < M + A B r^(−n) }

Here a_i is digit i of A in base r, and p_0, b_0, m_0 are the lowest base-r digits of P, B and M.
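The loop above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function name `mont_mul` and its argument order are assumptions, M is assumed coprime to r, and `pow(x, -1, r)` needs Python 3.8 or later.

```python
def mont_mul(A, B, M, r, n):
    """Digit-serial Montgomery product with no final subtraction.

    Returns P with P * r**n congruent to A * B modulo M.
    Hypothetical helper: assumes gcd(M, r) == 1 and 0 <= A < r**n.
    """
    assert 0 <= A < r**n                      # pre-condition from the slide
    neg_m0_inv = (-pow(M % r, -1, r)) % r     # -m_0^(-1) mod r, computed once
    b0 = B % r                                # lowest digit of B
    P = 0
    for i in range(n):
        a_i = (A // r**i) % r                 # digit i of A in base r
        # q is chosen so that P + a_i*B + q*M is divisible by r
        q = ((P % r + a_i * b0) * neg_m0_inv) % r
        P = (P + a_i * B + q * M) // r        # exact division
        assert 0 <= P < M + B                 # loop invariant from the slide
    return P
```

The result is deliberately left unreduced, so it may exceed M; the post-conditions only promise that it lies in an interval of width M.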
Loop Invariants I

Suppose $P < M+B$ at the start of the loop. At the end of the loop, the new value of $P$ is
$(P + a_i B + qM) \,\mathrm{div}\, r < ((M+B) + (r-1)B + (r-1)M)/r = M+B$,
so the invariant holds.
If $B$ were bounded by $2M$, the output would be bounded only by $3M$. To keep inputs below $2M$ we must either perform a conditional subtraction or perform another iteration. The former is ruled out to avoid timing attacks. If the last $a_i$ is small enough, the bound becomes $M + B/2 < 2M$ and another iteration is unnecessary. That requires $a_i < r/2$ for the top digit, which is unlikely if $A \approx M$ and $M$ uses all bits of the top word.
Loop Invariants II

More accuracy is possible. Define $\alpha_i = \sum_{j<i} a_j r^{j-i}$, so that $\alpha_0 = 0$. Then $\alpha_{i+1} = (\alpha_i + a_i)/r < 1$ by induction. Suppose $P_i$ is the value of $P$ at the start of the iteration using $a_i$. Then it is easy to establish
$\alpha_{i+1} B \le P_{i+1} < M + \alpha_{i+1} B$
because
$\alpha_{i+1} B = (\alpha_i B + a_i B)/r \le (P_i + a_i B + q_i M)/r = (P_i + a_i B + q_i M) \,\mathrm{div}\, r = P_{i+1}$,
and similarly for the upper bound.
Post-Condition

At the end of the last iteration, the coefficient of $B$ in the loop invariant is $\alpha_n = Ar^{-n}$. So the loop invariant gives
$ABr^{-n} \le P < M + ABr^{-n}$.
This is the tightest interval possible since its width is only $M$. It improves on the previous upper bound $M+B$ since $Ar^{-n} < 1$. It is much better if $A$ is known to be smaller, e.g. less than $M$.
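The sharper invariant and the post-condition can be checked numerically with exact rational arithmetic. The sketch below is illustrative only: `check_tight_invariant` is a hypothetical name, `alpha` stands for the coefficient of B in the invariant (the fraction of A processed so far), and M is assumed coprime to r.

```python
from fractions import Fraction

def check_tight_invariant(A, B, M, r, n):
    """Run the Montgomery loop, asserting alpha*B <= P < M + alpha*B each step.

    Hypothetical helper: assumes gcd(M, r) == 1 and 0 <= A < r**n.
    Returns the final P and the final coefficient alpha.
    """
    neg_m0_inv = (-pow(M % r, -1, r)) % r
    P, alpha = 0, Fraction(0)
    for i in range(n):
        a_i = (A // r**i) % r
        q = ((P + a_i * B) * neg_m0_inv) % r
        P = (P + a_i * B + q * M) // r
        alpha = (alpha + a_i) / r              # recurrence for the coefficient
        assert alpha < 1
        assert alpha * B <= P < M + alpha * B  # sharper loop invariant
    assert alpha == Fraction(A, r**n)          # final coefficient is A * r**-n
    return P, alpha
```

Using `Fraction` avoids any rounding, so a failed assertion would indicate a real counterexample rather than floating-point error.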
Stability

Under what conditions is a bound on $A$ and $B$ preserved by the output, so that the output of one MMM can be re-used as input without adjustment? Suppose $A$ and $B$ are bounded by $(1+\varepsilon)M$. For such stability we always require
$M + ABr^{-n} \le (1+\varepsilon)M$, i.e. $M + (1+\varepsilon)^2 M^2 r^{-n} \le (1+\varepsilon)M$.
This means $(1+\varepsilon)^2 M r^{-n} \le \varepsilon$, which we can solve for suitable $\varepsilon$. It has real solutions exactly when $4M \le r^n$.
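The last step can be spelled out as a short worked derivation. Here ε denotes the slack parameter in the input bound (1+ε)M, and the shorthand c = M r^(-n) is introduced only for this illustration.

```latex
\[
(1+\varepsilon)^2 c \le \varepsilon
\iff c\,\varepsilon^2 + (2c-1)\,\varepsilon + c \le 0,
\qquad c = M r^{-n}.
\]
\[
\Delta = (2c-1)^2 - 4c^2 = 1 - 4c,
\qquad
\varepsilon_{\pm} = \frac{1 - 2c \pm \sqrt{1-4c}}{2c}.
\]
\[
\Delta \ge 0 \iff 4c \le 1 \iff 4M \le r^n .
\]
```

In particular, ε = 1 always satisfies the inequality when 4c ≤ 1, since then (1+1)^2 c = 4c ≤ 1; this corresponds to the input bound 2M.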
First Results

The condition $4M \le r^n$ for the I/O to remain bounded improves on those given by the papers cited earlier. When the condition is satisfied we can choose $\varepsilon$ so that $A$ and $B$ are bounded by $2M$ or by $\tfrac12 r^n$ as appropriate. Intermediate values of $P$ are bounded above by $\tfrac34 r^n$. For such $M$ with $n$ digits, no extra processing is required to compensate for removing the final subtraction. For standard key lengths, we need to take $n$ to be 1 more than the number of digits in $M$ in order to satisfy the bound.
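A small sketch of the stability test follows, using only integer arithmetic. The name `is_stable_bound` is hypothetical, and the test it applies is the sufficient condition M + x² r^(-n) ≤ x derived from the post-condition, not an exhaustive search over inputs.

```python
def is_stable_bound(x, M, r, n):
    """True if inputs A, B < x are guaranteed to give a Montgomery output P < x.

    Uses the sufficient condition M + x*x*r**-n <= x, multiplied through
    by r**n so that everything stays in exact integer arithmetic.
    """
    return M * r**n + x * x <= x * r**n

r = 2**16
M = 2**64 - 59          # an odd modulus that fills all four 16-bit words

# With n equal to the number of digits of M, 4M > r**n and 2M is not stable.
print(is_stable_bound(2 * M, M, r, 4))
# With one extra digit, both 2M and r**n / 2 are stable bounds.
print(is_stable_bound(2 * M, M, r, 5), is_stable_bound(r**5 // 2, M, r, 5))
```

This mirrors the remark about standard key lengths: the modulus uses every bit of its top word, so n has to be one larger than its digit count before the bound 2M is preserved.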
Standard Key Lengths

We have seen the need to increase n for standard key lengths. This means one more iteration than the number of digits in M, and it is the cost of deleting the final subtraction. How many bits of the corresponding extra digit are required? We know the bound 2M means at most one bit is needed. Is it necessary? Its occasional existence may provide a handle for a timing or power analysis attack. The frequency of the top bit being non-zero is different for squares and multiplies. This was reported at RSA. (This bit is what prompts the final conditional subtraction.)
The Extra Bit

The frequency of the top bit becoming set is around 25% to 30% when $n$ has not been increased. Increasing $n$ decreases the upper bound $M + ABr^{-n}$, making it less likely to set the topmost bit, i.e. the next bit after the top bit of $M$. We need to discover its frequency of being 1 to determine if a difference for squares and multiplies is measurable. We will see when it is always zero. Since $n$ is being increased by 1, we have $\tfrac14 r^{n-1} < M < r^{n-1}$ and want I/O to be less than $r^{n-1}$.
Conditions for no overflow bit

The condition of interest is $M + ABr^{-n} < r^{n-1}$ when $A, B < r^{n-1}$. So we need $M$ such that
$M + (r^{n-1})^2 r^{-n} < r^{n-1}$, i.e. $M < r^{n-1}(1 - r^{-1})$.
Thus the arguments and output of MMM will have the same number of words as $M$ unless the top word of $M$ is all 1s. Hence, when the final conditional subtraction is omitted from MMM, there is no "overflow" bit against which a power analysis attack can be mounted unless the top word of $M$ is all 1s.
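The condition can be exercised on a concrete modulus. The sketch below is an assumption-laden illustration: `mont_mul` and `overflow_impossible` are hypothetical names, a 16-bit word (r = 2^16) is assumed, and the modulus is an arbitrary odd value whose top word is 0xFFFE rather than all 1s.

```python
def mont_mul(A, B, M, r, n):
    """Montgomery product without final subtraction (assumes gcd(M, r) == 1)."""
    neg_m0_inv = (-pow(M % r, -1, r)) % r
    P = 0
    for i in range(n):
        a_i = (A // r**i) % r
        q = ((P + a_i * B) * neg_m0_inv) % r
        P = (P + a_i * B + q * M) // r
    return P

def overflow_impossible(M, r, n):
    """True when M < r**(n-1) * (1 - 1/r), i.e. the top word of M is not all 1s."""
    return M < r**(n - 1) - r**(n - 2)

r, n = 2**16, 5                      # one more digit than the 4 words of M
M = 0xFFFEFFFFFFFFFFFF               # top word is 0xFFFE, so no overflow expected
worst = r**(n - 1) - 1               # largest input that fits in 4 words

P = mont_mul(worst, worst, M, r, n)
print(overflow_impossible(M, r, n), P < r**(n - 1))
```

Even with both inputs at their maximum, the output still fits in the same four words as M, which is what the bound predicts.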
The Unlikely Event

The potentially dangerous case is therefore when the top word of $M$ is $r-1$, which is reassuringly uncommon, and the worst case is $M = r^{n-1}$. By solving our previous quadratic in $\varepsilon$, the best bound on the inputs to achieve stability in that worst case is
$(1+\varepsilon)M = \tfrac12 r^n \left(1 - (1 - 4r^{-1})^{1/2}\right) = r^{n-1} + r^{n-2} + 2r^{n-3} + 5r^{n-4} + \cdots$
With the reasonable assumption that residues mod $M$ are uniformly distributed, at most about $r^{-1}$ of outputs will exceed $r^{n-1}$. So, for a 16-bit architecture and limited smartcard life, the overflow bit is too rare to be of use in power analysis. One could safely re-introduce a conditional subtraction here to avoid the need for extra hardware.
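The series for the worst-case bound follows from the standard generating function of the Catalan numbers. The variable t = r^(-1) and the notation C_k are introduced here only for illustration.

```latex
\[
\tfrac12\left(1 - \sqrt{1 - 4t}\right)
  = \sum_{k \ge 0} C_k\, t^{k+1}
  = t + t^2 + 2t^3 + 5t^4 + 14t^5 + \cdots,
\qquad C_k = \frac{1}{k+1}\binom{2k}{k}.
\]
\[
t = r^{-1}:\qquad
\tfrac12 r^n\left(1 - \sqrt{1 - 4r^{-1}}\right)
  = r^{n-1} + r^{n-2} + 2r^{n-3} + 5r^{n-4} + 14r^{n-5} + \cdots
\]
```

So the stable input bound exceeds $r^{n-1}$ by roughly $r^{n-2}$, a fraction of about $r^{-1}$ of the range, which is consistent with the estimate that about $r^{-1}$ of outputs can overflow.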
Exponentiation

We end by noting that no final subtraction is needed in the case of MMM exponentiation. To compute $T^e \bmod M$, pre-processing generates $Tr^n \bmod M$, so that all subsequent products are larger than those from standard modular multiplication by a factor of $r^n \bmod M$. The result of the exponentiation loop is therefore $A \equiv T^e r^n \pmod M$. Post-processing removes the extra factor $r^n$ by an MMM multiplication by 1. The output is bounded above by $M + Ar^{-n}$, where $A < 2M \le \tfrac12 r^n$, so the output is at most $M$. Of course, equality with $M$ is impossible, since that could only arise from $T \equiv 0$, which would result in output 0. So no final modular reduction is needed for exponentiation.
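The whole exponentiation can be sketched as below, with no conditional subtraction anywhere. This is an illustrative sketch rather than the author's implementation: the names `mont_mul` and `mont_exp` are hypothetical, a plain left-to-right square-and-multiply schedule is assumed, and the pre-processing step uses an ordinary `%` reduction for brevity.

```python
def mont_mul(A, B, M, r, n):
    """Montgomery product without final subtraction (assumes gcd(M, r) == 1)."""
    neg_m0_inv = (-pow(M % r, -1, r)) % r
    P = 0
    for i in range(n):
        a_i = (A // r**i) % r
        q = ((P + a_i * B) * neg_m0_inv) % r
        P = (P + a_i * B + q * M) // r
    return P

def mont_exp(T, e, M, r, n):
    """Compute T**e mod M for e >= 1, assuming 4*M <= r**n and T**e != 0 mod M."""
    assert e >= 1 and 4 * M <= r**n
    X = (T * r**n) % M                   # pre-processing: T * r^n mod M
    acc = X
    for bit in bin(e)[3:]:               # exponent bits after the leading 1
        acc = mont_mul(acc, acc, M, r, n)
        if bit == "1":
            acc = mont_mul(acc, X, M, r, n)
        assert acc < 2 * M               # the bound 2M is preserved throughout
    out = mont_mul(acc, 1, M, r, n)      # post-processing: multiply by 1
    assert out <= M
    return out

r, n = 2**16, 5
M = 2**64 - 59
print(mont_exp(3, 65537, M, r, n) == pow(3, 65537, M))
```

The intermediate values are allowed to sit anywhere below 2M; only the final multiplication by 1 brings the result into the usual residue range.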
Conclusion

Precise output bounds have been obtained for Montgomery Modular Multiplication. This gives I/O bounds for MMM in the context of exponentiation when the final conditional subtraction is omitted. All numbers have the same word size as the modulus $M$ when $4M \le r^n$ and $M$ has $n$ words. Otherwise, MMM must perform another iteration, but overflow bits are then too rare to be in danger from power analysis attacks. No final modular subtraction is required for exponentiation.