1 Montgomery Multiplication David Harris and Kyle Kelley Harvey Mudd College Claremont, CA 91711 {David_Harris,

Slides:



Advertisements
Similar presentations
Cryptography and Network Security Chapter 9
Advertisements

Are standards compliant Elliptic Curve Cryptosystems feasible on RFID?
Computer Science CSC 405By Dr. Peng Ning1 CSC 405 Introduction to Computer Security Topic 2. Basic Cryptography (Part II)
OOP/Java1 Public Key Crytography From: Introduction to Algorithms Cormen, Leiserson and Rivest.
Public-key Cryptography Montclair State University CMPT 109 J.W. Benham Spring, 1998.
ECOMMERCE TECHNOLOGY SUMMER 2002 COPYRIGHT © 2002 MICHAEL I. SHAMOS Cryptographic Security.
Public Key Crytography1 From: Introduction to Algorithms Cormen, Leiserson and Rivest.
ECOMMERCE TECHNOLOGY FALL 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Cryptography.
Design of a Reconfigurable Hardware For Efficient Implementation of Secret Key and Public Key Cryptography.
A Dual Field Elliptic Curve Cryptographic Processor Laboratory for Reliable Computing (LaRC) Electrical Engineering Department National Tsing Hua University.
Dr. Lo’ai Tawalbeh Fall 2005 Chapter 10 – Key Management; Other Public Key Cryptosystems Dr. Lo’ai Tawalbeh Computer Engineering Department Jordan University.
Public Encryption: RSA
WS Algorithmentheorie 03 – Randomized Algorithms (Public Key Cryptosystems) Prof. Dr. Th. Ottmann.
Cryptography & Number Theory
Cryptography1 CPSC 3730 Cryptography Chapter 9 Public Key Cryptography and RSA.
Fall 2010/Lecture 311 CS 426 (Fall 2010) Public Key Encryption and Digital Signatures.
Public Key Algorithms 4/17/2017 M. Chatterjee.
1 Pertemuan 08 Public Key Cryptography Matakuliah: H0242 / Keamanan Jaringan Tahun: 2006 Versi: 1.
ELECTRONIC PAYMENT SYSTEMSFALL 2001COPYRIGHT © 2001 MICHAEL I. SHAMOS Electronic Payment Systems Lecture 6 Epayment Security II.
CSE 246: Computer Arithmetic Algorithms and Hardware Design Numbers: RNS, DBNS, Montgomory Prof Chung-Kuan Cheng Lecture 3.
Public Key Cryptography RSA Diffie Hellman Key Management Based on slides by Dr. Lawrie Brown of the Australian Defence Force Academy, University College,
Lecture 6: Public Key Cryptography
Introduction to Public Key Cryptography
Public Key Model 8. Cryptography part 2.
Public Key Encryption and the RSA Public Key Algorithm CSCI 5857: Encoding and Encryption.
Andreas Steffen, , 4-PublicKey.pptx 1 Internet Security 1 (IntSi1) Prof. Dr. Andreas Steffen Institute for Internet Technologies and Applications.
Montgomery multiplication Algorithm Mohammad Farmani Under supervision of : Dr. S. Bayat-sarmadi 2 nd. Semister, Sharif University of Technology.
1 Fluency with Information Technology Lawrence Snyder Chapter 17 Privacy & Digital Security Encryption.
Page 1 Secure Communication Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content of this presentation.
The RSA Algorithm Rocky K. C. Chang, March
Elliptic Curve Cryptography
1 Network Security Lecture 6 Public Key Algorithms Waleed Ejaz
Montgomery Multipliers & Exponentiation Units
Cryptography: RSA & DES Marcia Noel Ken Roe Jaime Buccheri.
CS 627 Elliptic Curves and Cryptography Paper by: Aleksandar Jurisic, Alfred J. Menezes Published: January 1998 Presented by: Sagar Chivate.
Prelude to Public-Key Cryptography Rocky K. C. Chang, February
1 Lecture 9 Public Key Cryptography Public Key Algorithms CIS CIS 5357 Network Security.
Public Key Encryption CS432 – Security in Computing Copyright © 2005, 2008 by Scott Orr and the Trustees of Indiana University.
Chapter 21 Public-Key Cryptography and Message Authentication.
1HMC VLSI Lab Very High Radix Montgomery Multiplication David Harris, Kyle Kelley and Ted Jiang Harvey Mudd College Claremont, CA Supported by Intel Circuit.
Cryptography and Network Security Chapter 10 Fifth Edition by William Stallings Lecture slides by Lawrie Brown.
Public Key Cryptography. symmetric key crypto requires sender, receiver know shared secret key Q: how to agree on key in first place (particularly if.
1 Public-Key Cryptography and Message Authentication.
Cryptography and Network Security Chapter 9 - Public-Key Cryptography
CS461/ECE422 Spring 2012 Nikita Borisov — UIUC1.  Text Chapters 2 and 21  Handbook of Applied Cryptography, Chapter 8 
PUBLIC-KEY CRYPTOGRAPH IT 352 : Lecture 2- part3 Najwa AlGhamdi, MSc – 2012 /1433.
Chapter 3 (B) – Key Management; Other Public Key Cryptosystems.
Chapter 3 – Public Key Cryptography and RSA (A). Private-Key Cryptography traditional private/secret/single-key cryptography uses one key shared by both.
Public Key Cryptosystems RSA Diffie-Hellman Department of Computer Engineering Sharif University of Technology 3/8/2006.
Elliptic Curve Cryptography
Public Key Algorithms Lesson Introduction ●Modular arithmetic ●RSA ●Diffie-Hellman.
CS Modular Division and RSA1 RSA Public Key Encryption To do RSA we need fast Modular Exponentiation and Primality generation which we have shown.
Cryptography and Network Security Chapter 4. Introduction  will now introduce finite fields  of increasing importance in cryptography AES, Elliptic.
CS 4803 Fall 04 Public Key Algorithms. Modular Arithmetic n Public key algorithms are based on modular arithmetic. n Modular addition. n Modular multiplication.
Ch1 - Algorithms with numbers Basic arithmetic Basic arithmetic Addition Addition Multiplication Multiplication Division Division Modular arithmetic Modular.
Great Theoretical Ideas in Computer Science.
An Optimized Hardware Architecture for the Montgomery Multiplication Algorithm Miaoqing Huang 1, Kris Gaj 2, Soonhak Kwon 3, Tarek El-Ghazawi 1 1 The George.
Introduction to Elliptic Curve Cryptography CSCI 5857: Encoding and Encryption.
RSA Pubic Key Encryption CSCI 5857: Encoding and Encryption.
CSCE 715: Network Systems Security Chin-Tser Huang University of South Carolina.
CSEN 1001 Computer and Network Security Amr El Mougy Mouaz ElAbsawi.
1 The RSA Algorithm Rocky K. C. Chang February 23, 2007.
Efficient Montgomery Modular Multiplication Algorithm Using Complement and Partition Techniques Speaker: Te-Jen Chang.
Motivation Basis of modern cryptosystems
Public Key Cryptography. Asymmetric encryption is a form of cryptosystem in which Encryption and decryption are performed using the different keys—one.
An Optimized Hardware Architecture for the Montgomery Multiplication Algorithm Miaoqing Huang Nov. 5, 2010.
Public Key Cryptosystem
Chapter -5 PUBLIC-KEY CRYPTOGRAPHY AND RSA
Presentation transcript:

1 Montgomery Multiplication David Harris and Kyle Kelley Harvey Mudd College Claremont, CA {David_Harris,

2 Outline Cryptography Overview Finite Field Mathematics Montgomery Multiplication Tenca-Koç Montgomery Multiplier Improved Montgomery Multiplier Very High Radix Implementation Results Summary

3 Cryptography Overview Encryption has become essential –E-commerce (SSL) –Communications / network processors –Smart cards / digital cash –Military Two major classes of algorithms –Symmetric cryptosystems (e.g. DES) –Public key cryptosystems (e.g. RSA)

4 Cryptographic Protocols Alice and Bob would like to communicate securely. Eve wants to listen in. –Symmetric key: Alice and Bob must share a key for encryption and decryption. If Eve hears it, she can read the messages. –Public key: Alice publishes her public key to the world. Bob encrypts with Alice’s public key. Alice can decrypt only with her private key. Eve can’t decrypt with the public key.

5 Digital Signatures Alice wants to sign a contract in a way that only she can do. –Alice publishes her public key and keeps the private key secret. –Encrypt the document with her secret key. –Anyone can decrypt the document with her public key –But nobody can forge her signature.

6 Key Exchange Public key encryption is slow Use it to share a symmetric key –Use symmetric key to encrypt large blocks of data

7 RSA Encryption Most widely used public key system. –Good for encryption and signatures. –Invented by Rivest, Shamir, Adleman (1978) Public e and private d keys are long #s –n = bits –Satisfy x de mod M = 1 for all x –Finding d from e is as hard as factoring M Encryption: B = A e mod M Decryption: C = B d mod M = A ed = A

8 Modular Exponentiation Critical operation in RSA and for –Digital signature algorithm –Diffie-Hellman key exchange –Elliptic curve cryptosystems Done with 2n modular multiplications –Ex: A 27 = ((((((A 2 ) * A) 2 ) 2 ) * A) 2 ) * A –Division required after each multiplication to compute modulo

9 Finite Field Mathematics +, * modulo prime p form a finite field –p elements –Additive identity: 0 –Multiplicitive identity: 1 –Each nonzero number has a unique inverse x -1 –Named GF(p) For Evariste Galois, a 19 th century number theorist killed in a duel at age 20

10 Binary Extension Fields Building blocks are polynomials in x –Operations performed modulo some irreducible polynomial f(x) of degree n –Arithmetic done modulo 2 –Called GF(2 n ) Example: GF(2 3 ) Computation is the same as GF(p) –Except that no carries are propagated ElementCode x010 x+1011 x2x2 100 x x2+xx2+x110 x 2 +x+1111

11 Montgomery Multiplication Faster way to do modular exponentation –Operate on Montgomery residues –Division becomes a simple shift –Requires conversion to and from residues only once per exponentiation

12 Montgomery Residues Let the modulus M be an odd n-bit integer Define r = 2 n Define the M-residue of an integer a < M as There is a one-to-one correspondence between integers and M-residues for 0 < a < M-1

13 M-Residue Examples M = 11, r = 16

14 Montgomery Multiplicaton Define Where r -1 is the inverse of r mod M: –r -1 r = 1 (mod M) This gives the Montgomery residue of –z = xy mod M

15 Mont. Multiplication Example It may not be obvious that this is easier to do than regular modular multiplication.

16 Montgomery Multiplier MM is an easier operation that requires no hard division, just shifting In radix 2, Z = 0 for i = 0 to n-1 Z = Z + x i Y if Z is odd then Z = Z + M Z = Z/2 if Z ≥ M then Z = Z – M

17 Example X = 7 = 0111 Y = 5 = 0101 M = 11 = 1011 Z initially 0 –Z = ( ) / 2 = 8 –Z = ( ) / 2 = 12 –Z = ( ) / 2 = 14 –Z = (14 + 0) / 2 = 7 (final result) Z = 0 for i = 0 to n-1 Z = Z + xiY if Z is odd then Z = Z + M Z = Z/2 if Z ≥ M then Z = Z – M

18 Conversion Conversion of integers to/from Montgomery residues takes one MM operation (if r 2 mod M is precomputed and saved): Modular exponentiation takes two conversion steps and 2n multiplication steps.

19 Cryptography Accelerators Hardware accelerators offer more speed at less power than software –Via announced x86 C5J core Montgomery Multiply opcode (May 04) 3COM Router 5000 Series Encryption Accelerator IBM PCI SSL Cryptography Accelerator

20 Break

21 Break

22 Break

23 Reconfigurable Hardware Building hardwired n-bit unit is limiting –Slow for large n –Not scalable to different n Better to design for w-bit words –Break n-bit operand into e w-bit words –This is called scalable Also handle both GF(p) and GF(2 n ) –Requires conditionally killing carries –Called unified

24 Unified Carry Gate Full adder modified for dual-field ops –fsel = 1: normal operation GF(p) –fsel = 0: kill carryGF(2 n ) Only changes majority gate Sum remains XOR

25 Tenca-Koç Montgomery Multiplier Z = 0 for i = 0 to n-1 (C A, Z w-1:0 ) = Z w-1:0 + X i × Y w-1:0 reduce = Z 0 (C B, Z w-1:0 ) = Z w-1:0 + reduce × M w-1:0 for j = 1 to e+1 (C A, Z (j+1)w-1:jw ) = Z (j+1)w-1:jw + X i × Y (j+1)w-1:jw + C A (C B, Z (j+1)w-1:jw ) = Z (j+1)w-1:jw + reduce × M (j+1)w-1:jw + C B Z jw-1:(j-1)w = (Z jw, Z jw-1:(j-1)w+1 ) IEEE Transactions on Computers, Sept. 2003

26 Processing Elements Keep Z in carry-save redundant form Simple processing element (PE)

27 Parallelism Two dimensions of parallelism: –Width of processing element w –Number of pipelined PEs p Multiply takes k = n/p kernel cycles

28 Pipeline Timing

29 Queue If full PEs cause stall, queue results Convert back to nonredundant form –Saves queue space –CPA needed for final result anyway

30 Improved Design Don’t wait two cycles for MSB Kick off dependent operation right away on the available bits Take extra cycle(s) at the end to handle the extra bits For p processing elements, cycle count reduces from 2p to p + (p/w)

31 Improved PE Left-shift M and Y rather than right- shifting Z Same amount of hardware

32 Pipeline Timing

33 Latency Tenca-Koç Improved Design

34 Very High Radix These designs are Radix-2 –1 bit of x per PE Higher radix designs reduce latency –Process more bits of x per PE –Require integer multiplication instead of AND gates

35 Montgomery’s Algorithm Multiply: Z = X × Y Reduce:reduce = Z × M’ mod R Z = [Z + reduce × M] / R Normalize:if Z ≥ M then Z = Z – M M’ satisfies RR -1 – MM ’ = 1 –Drives LSBs to 0

36 Scalable Very High Radix Algorithm w-bit words of M and Ye = n/w v-bit digits of Xf = n/vradix = 2 v Z = 0 for i = 0 to f-1 (C A, Z w-1:0 ) = Z w-1:0 + X (i+1)v-1:iv × Y w-1:0 reduce = (M' v-1:0 × Z w-1:0 ) v-1:0 (C B, Z w-1:0 ) = Z w-1:0 + reduce × M w-1:0 for j = 1 to e+1 (C A, Z (j+1)w-1:jw ) = Z (j+1)w-1:jw + X (i+1)v-1:iv × Y (j+1)w-1:jw + C A (C B, Z (j+1)w-1:jw ) = Z (j+1)w-1:jw + reduce × M (j+1)w-1:jw + C B Z jw-1:(j-1)w = (Z jw+v-1:jw, Z jw-1:(j-1)w+v )

37 Very High Radix PE

38 Very High Radix Pipeline Timing

39 Latency Tenca-Koç: k = n/p Very High Radix: k = n/pv

40 Implementation C and Verilog reference models –Parameterized by w, p, and v –Extensive testing up to n = 1024 Synthesized Verilog onto FPGA –Xilinx Virtex II Pro XC2V2000-6

41 Results DescriptionTechnologyHardwareClock Speed (MHz) Scalable256-bit time (ms) 1024-bit time (ms) T-K p = 40 w=8 0.5  m CMOS synthesized 28 Kgates80Yes3.888 Improved p = 16 w = 16Xilinx Virtex II1514 LUTs + ~5n RAM 144Yes1.159 Improved p = 64 w = 16Xilinx Virtex II5598 LUTs + ~5n RAM 144Yes1.016 p = 4 w = 16 v = 16 very high radix Xilinx Virtex II780 LUTs + 8 mults + ~5n RAM 102Yes p = 16 w = 16 v = 16 very high radix Xilinx Virtex II2847 LUTs + 32 mults + ~5n RAM 102Yes

42 Summary Modular exponentiation is key operation in cryptography Hardware accelerators getting popular –Reconfigurable in key length & field Developed improved MM –Half the latency for n ≤ w*p –Half the queue size Higher radix looks even better –Well-suited to FPGAs