IHP Im Technologiepark Frankfurt (Oder) Germany © All rights reserved
An Efficient Polynomial Multiplier in GF(2^m) and its Application to ECC Designs
Steffen Peter and Peter Langendörfer
Outline
  - Motivation and introduction to ECC
  - Basic polynomial multiplication approaches
  - Combinatorial polynomial multiplier
  - Iterative polynomial multiplier
  - Implications for the ECC design
Elliptic Curve Cryptography
  - Asymmetric cryptography
  - Trapdoor: Elliptic Curve Point Multiplication
    - one can compute: Q = kP
    - it is infeasible to determine k for given Q and P
  - Higher security with shorter keys than RSA
    - Recommended key lengths [Lenstra & Verheul, “Selecting Cryptographic Key Sizes”]
    - [Table: Year | RSA | ECC]
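To make Q = kP concrete, here is a minimal sketch of double-and-add point multiplication. It is an illustration under loud assumptions: it uses a toy prime-field curve (y^2 = x^3 + 2x + 2 over GF(17), base point (5, 1), a common textbook example), not the binary-field Montgomery/Lopez-Dahab method the slides refer to, and the function names `ec_add` and `ec_scalar_mult` are hypothetical.

```python
# Toy example only: curve y^2 = x^3 + a*x + b over GF(p) with tiny parameters.
# Real ECC uses fields of 163 to 571 bits; the slides target GF(2^m), not GF(p).
# Requires Python 3.8+ for pow(x, -1, p) (modular inverse).

def ec_add(P, Q, a, p):
    """Add two affine points; None represents the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p  # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def ec_scalar_mult(k, P, a, p):
    """Compute kP by scanning the bits of k from least to most significant."""
    result = None
    addend = P
    while k:
        if k & 1:
            result = ec_add(result, addend, a, p)
        addend = ec_add(addend, addend, a, p)  # point doubling
        k >>= 1
    return result

P = (5, 1)
print(ec_scalar_mult(2, P, a=2, p=17))  # (6, 3)
print(ec_scalar_mult(3, P, a=2, p=17))  # (10, 6)
```

Computing Q from k and P takes a number of point operations proportional to the bit length of k, whereas recovering k from Q and P is the hard direction the trapdoor relies on.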
ECC in Software or Hardware?
  - 233 bit ECC on MIPS (software) or an ECC hardware accelerator?
  - Time for one ECPM (elliptic curve point multiplication):
    - MIPS: 410 ms
    - HW: 0.4 ms
  - Energy for one ECPM:
    - MIPS: 16.5 mWs
    - HW: 0.03 mWs
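The ratios implied by these figures can be worked out directly. The snippet below is only arithmetic on the four numbers from the slide; the variable and function names are illustrative.

```python
# Figures from the slide: one 233 bit ECPM on a MIPS CPU vs. the hardware accelerator.
time_ms = {"MIPS": 410.0, "HW": 0.4}
energy_mws = {"MIPS": 16.5, "HW": 0.03}

def improvement(metric):
    """Ratio of the software figure to the hardware figure."""
    return metric["MIPS"] / metric["HW"]

speedup = improvement(time_ms)
energy_saving = improvement(energy_mws)
print(f"time: ~{speedup:.0f}x faster, energy: ~{energy_saving:.0f}x lower")
```

With the slide's numbers this comes to roughly three orders of magnitude in time and a factor of several hundred in energy.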
ECC Pyramid
  - Cryptographic Operations
  - EC Point Arithmetic
  - Finite Field Operations
  - Basic Field Operations
EC Cryptographic Operations
  - Cryptographic protocols
    - Signature generation/verification
    - Encryption/decryption
  - Executed on a CPU
    - May use an ECC accelerator for subroutines
  - [Figure: CPU (MIPS, ARM, LEON, …) and ECC co-processor]
EC Point Operations
  - Operations on points on the elliptic curve
    - Point addition: Point + Point
    - Point multiplication: integer · Point (Montgomery/Lopez-Dahab point multiplication)
  - Executed on the co-processor
  - [Figure: CPU and ECC co-processor]
Finite Field Operations
  - Operations in the finite field
    - Addition/subtraction (m-bit XOR)
    - Multiplication (m-bit · m-bit)
    - Squaring (much faster than multiplication)
    - Division (very expensive)
  - Each EC point operation requires operations in the finite field
    - E.g., one 233 bit EC point multiplication requires:
      - 1200 additions
      - 1500 multiplications (233 bit multiplication)
      - 800 squarings
      - 1 division
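A minimal sketch of why addition and squaring are cheap in GF(2^m), assuming field elements are stored as Python integers whose bit i is the coefficient of x^i. The function names are hypothetical, and the reduction modulo the field polynomial is deliberately left out.

```python
def gf2_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^m): coefficient-wise XOR, no carries."""
    return a ^ b

def gf2_square_unreduced(a: int) -> int:
    """Square a polynomial over GF(2) by moving bit i to bit 2*i.

    Cross terms cancel in characteristic 2, so no partial products are needed.
    The result is NOT reduced modulo a field polynomial (omitted in this sketch).
    """
    result = 0
    i = 0
    while a:
        if a & 1:
            result |= 1 << (2 * i)
        a >>= 1
        i += 1
    return result

print(bin(gf2_add(0b1011, 0b0110)))       # 0b1101
print(bin(gf2_square_unreduced(0b111)))   # 0b10101, i.e. (x^2+x+1)^2 = x^4+x^2+1
```

In hardware the squaring step is pure wiring before reduction, which is consistent with the slide calling it much faster than a general multiplication.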
Basic Field Operations
  - Prime fields (GF(p))
    - p is a very large prime (about 200 bits)
    - requires carries for additions
    - preferred for software implementations
  - Binary extension fields (GF(2^m))
    - m is the bit length of the field (typical … bit)
    - easy hardware representation (m-bit array)
    - no carries (additions are simple XOR operations)
    - preferred for hardware implementations
Utilization / Area of Functional Blocks
  - [Figure: utilization and area share per functional block]
  - Utilization: 95%, 15%, 50%
  - Area: 70%, 5%, 20%
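Classic (school) Polynomial Multiplication
  - Partial products: a(x) & b_0, a(x) & b_1, a(x) & b_2, a(x) & b_3, …, a(x) & b_{m-2}, a(x) & b_{m-1}, where b_i is the coefficient of x^i in b(x)
  - Partial product i is shifted by i positions and all of them are XORed:
    $c(x) = a(x) \cdot b(x) = \sum_{i=0}^{m-1} \bigl(a(x) \,\&\, b_i\bigr)\, x^i$ (sum over GF(2), i.e. XOR)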
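The school method can be sketched in a few lines, assuming polynomials over GF(2) are stored as Python integers (bit i is the coefficient of x^i). The function name `clmul_school` is hypothetical, and the loop is a software model of what the hardware does in parallel.

```python
def clmul_school(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials by shift-and-XOR.

    For every set coefficient b_i, the partial product a(x) & b_i is shifted
    by i positions and XORed into the result. No field reduction is applied.
    """
    result = 0
    shift = 0
    while b:
        if b & 1:                  # partial product a(x) & b_i is non-zero
            result ^= a << shift   # aggregate with XOR (addition in GF(2))
        b >>= 1
        shift += 1
    return result

print(bin(clmul_school(0b11, 0b11)))    # 0b101:  (x+1)^2 = x^2 + 1
print(bin(clmul_school(0b101, 0b11)))   # 0b1111: (x^2+1)(x+1)
```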
Classic Polynomial Multiplication
  - Gate count:
    - m^2 AND gates
    - (m-1)^2 XOR gates
  - Longest path: 1 AND + log2(m) XOR
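The gate counts can be checked by counting terms per output coefficient. This is a sketch under two assumptions: each output bit is built from one AND per contributing term followed by a balanced XOR tree, and the logarithm in the longest path is rounded up when m is not a power of two. The function name is hypothetical.

```python
def school_multiplier_gates(m: int):
    """Count AND gates, XOR gates and XOR depth of an m-bit school multiplier."""
    and_gates = 0
    xor_gates = 0
    max_terms = 0
    for k in range(2 * m - 1):                 # output coefficients c_0 .. c_{2m-2}
        # number of pairs (i, j) with i + j = k and 0 <= i, j <= m-1
        terms = min(k, m - 1) - max(0, k - (m - 1)) + 1
        and_gates += terms                     # one AND per a_i & b_j
        xor_gates += terms - 1                 # XOR tree joining `terms` inputs
        max_terms = max(max_terms, terms)
    xor_depth = (max_terms - 1).bit_length()   # ceil(log2(max_terms))
    return and_gates, xor_gates, xor_depth

print(school_multiplier_gates(4))     # (16, 9, 2)
print(school_multiplier_gates(233))   # (54289, 53824, 8)
```

The totals agree with m^2 and (m-1)^2, since the 2m-1 output bits together save 2m-1 XORs relative to m^2 terms.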
Classic Karatsuba Multiplication
  - Split the factors into halves: a(x) = A1, A0 and b(x) = B1, B0
  - Partial products: A0 ∙ B0, (A1 + A0) ∙ (B1 + B0), A1 ∙ B1
  - Aggregated with XORs to c(x) = a(x) ∙ b(x)
  - 4 additions (XOR) + 3 multiplications per level (CPM, the classic polynomial multiplication: 3 additions + 4 multiplications)
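A minimal recursive sketch of the Karatsuba split over GF(2), assuming the operand width is a power of two and polynomials are stored as integers. The names `clmul_school` and `karatsuba_gf2` and the 8 bit cut-over to the school method are illustrative choices, not taken from the slides.

```python
def clmul_school(a: int, b: int) -> int:
    """Shift-and-XOR carry-less product, used as the base case."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def karatsuba_gf2(a: int, b: int, m: int) -> int:
    """Multiply two m-bit GF(2) polynomials with 3 half-size products per level."""
    if m <= 8:                              # assumed cut-over point for this sketch
        return clmul_school(a, b)
    h = m // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h               # a(x) = A1 * x^h + A0
    b0, b1 = b & mask, b >> h               # b(x) = B1 * x^h + B0
    low = karatsuba_gf2(a0, b0, h)          # A0 * B0
    high = karatsuba_gf2(a1, b1, h)         # A1 * B1
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, h)  # (A1 + A0) * (B1 + B0)
    mid ^= low ^ high                       # leaves A1*B0 + A0*B1
    return low ^ (mid << h) ^ (high << (2 * h))

x, y = 0xDEADBEEF, 0x12345678
print(karatsuba_gf2(x, y, 32) == clmul_school(x, y))   # True
```

Because addition and subtraction coincide in GF(2), the middle term needs only XORs, which is what makes the three-product form attractive in hardware.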
Classic Karatsuba Multiplication
  - Gate count: … AND gates, … XOR gates
  - Longest path: 1 AND + 3 log2(m) XOR (3 XORs each)
Iterative Karatsuba Multiplication
  - Split factors into 4 segments
    - A(x) = a3…a0
    - B(x) = b3…b0
  - Perform 9 partial multiplications
  - Result is 8 segments
    - C(x) = c7…c0
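One way to arrive at 9 partial multiplications is to flatten two Karatsuba levels. The sketch below does exactly that and is an assumption about the structure, not the optimized aggregation plan of the talk; the names `clmul` and `karatsuba_4seg` are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Shift-and-XOR carry-less product; stands in for one partial multiplier."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def karatsuba_4seg(a: int, b: int, seg_bits: int) -> int:
    """Multiply two (4 * seg_bits)-bit GF(2) polynomials with 9 segment products."""
    s = seg_bits
    mask = (1 << s) - 1
    a0, a1, a2, a3 = ((a >> (i * s)) & mask for i in range(4))
    b0, b1, b2, b3 = ((b >> (i * s)) & mask for i in range(4))

    # The 9 partial multiplications
    p0, p1, p2, p3 = clmul(a0, b0), clmul(a1, b1), clmul(a2, b2), clmul(a3, b3)
    p01 = clmul(a0 ^ a1, b0 ^ b1)
    p23 = clmul(a2 ^ a3, b2 ^ b3)
    p02 = clmul(a0 ^ a2, b0 ^ b2)
    p13 = clmul(a1 ^ a3, b1 ^ b3)
    p0123 = clmul(a0 ^ a1 ^ a2 ^ a3, b0 ^ b1 ^ b2 ^ b3)

    # Aggregation: inner Karatsuba level for low, high and mixed halves
    low = p0 ^ ((p01 ^ p0 ^ p1) << s) ^ (p1 << (2 * s))
    high = p2 ^ ((p23 ^ p2 ^ p3) << s) ^ (p3 << (2 * s))
    mid = p02 ^ ((p0123 ^ p02 ^ p13) << s) ^ (p13 << (2 * s))
    # Outer Karatsuba level
    return low ^ ((mid ^ low ^ high) << (2 * s)) ^ (high << (4 * s))

x, y = 0xBEEF, 0x1234
print(karatsuba_4seg(x, y, 4) == clmul(x, y))   # True
```

This plain aggregation is written for readability; it makes no attempt to share intermediate XOR results.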
Iterative Karatsuba Multiplication (2)
  - Optimized aggregation plan
  - Reduces the number of XOR operations to 34 (instead of 40 for classic Karatsuba)
  - Without additional costs
    - constant number of ANDs
    - constant longest path
  - Can be applied recursively
    - 256 bit mul = 9 x 64 bit mul
    - 64 bit mul = 9 x 16 bit mul
    - 16 bit mul = 9 x 4 bit mul
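The recursion multiplies the number of base multipliers by 9 at every level. A small sketch of that count, with a hypothetical function name and the assumption that the width is divided by exactly 4 at each level:

```python
def raik_base_multiplications(width: int, base_width: int) -> int:
    """Number of base_width-bit multiplications in a recursive 4-segment split."""
    count = 1
    while width > base_width:
        if width % 4:
            raise ValueError("width must be divisible by 4 at every level")
        width //= 4      # each level splits the operands into 4 segments
        count *= 9       # and needs 9 partial multiplications
    if width != base_width:
        raise ValueError("base_width is not reached by repeated division by 4")
    return count

print(raik_base_multiplications(256, 64))   # 9
print(raik_base_multiplications(256, 4))    # 729
```

Carrying the recursion one level further, down to single-bit products, gives 9^4 = 6561 for 256 bit operands, compared with 256^2 = 65536 AND gates for the school method.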
Comparison
  - [Table: Bit size | Classic Polynomial (XOR, AND) | RAI Karatsuba (XOR, AND) | Hybrid RAIK (XOR, AND)]
  - Table values (partially preserved): 2144 (4) (24) (360) (3864) (12100) (37320) 6561 9x
  - Hybrid RAIK is the smallest polynomial multiplication unit
  - BUT: CPM is faster
  - [Table: Bit size | XOR gates in longest path (CPM, Hybrid RAIK)]
Recursive combinatorial multiplication units
  - Perform a multiplication within one clock cycle
  - Do not need state information
  - Technically feasible up to 256 bit
    - huge complexity
    - high latency
  - Practically questionable
    - Data transport/bus becomes the bottleneck
  - [Figure: 256 bit MUL unit, 16 ns, inputs A and B, output C = A·B]
Iterative multiplication units
  - More than one clock cycle per multiplication
  - Iterative unit embeds a smaller recursive unit
  - Highly regular structure
    - flexible
    - little overhead
  - [Figure: A, B → Selection → Partial Multiplier → Aggregation → C, with a Control block, 9 times; widths 256 bit, 64 bit, 128 bit, 511 bit]
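The selection / partial multiplier / aggregation loop can be modelled in software as one iteration per clock cycle. This is a behavioural sketch under stated assumptions: the schedule is derived from a flattened two-level Karatsuba split (not necessarily the talk's aggregation plan), the embedded multiplier is modelled by a simple shift-and-XOR routine, and all names (`SCHEDULE`, `embedded_mul`, `iterative_mul`) are hypothetical.

```python
# Each entry: (segments XORed by the selection stage,
#              output shifts, in units of one segment, used by the aggregation stage)
SCHEDULE = [
    ((0,), (0, 1, 2, 3)),
    ((1,), (1, 2, 3, 4)),
    ((2,), (2, 3, 4, 5)),
    ((3,), (3, 4, 5, 6)),
    ((0, 1), (1, 3)),
    ((2, 3), (3, 5)),
    ((0, 2), (2, 3)),
    ((1, 3), (3, 4)),
    ((0, 1, 2, 3), (3,)),
]

def embedded_mul(a: int, b: int) -> int:
    """Model of the small combinatorial multiplier (carry-less shift-and-XOR)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def iterative_mul(a: int, b: int, seg_bits: int = 64):
    """Return (product, cycles) for two (4 * seg_bits)-bit GF(2) polynomials."""
    mask = (1 << seg_bits) - 1
    seg_a = [(a >> (i * seg_bits)) & mask for i in range(4)]
    seg_b = [(b >> (i * seg_bits)) & mask for i in range(4)]
    acc = 0
    cycles = 0
    for segments, shifts in SCHEDULE:           # control: one entry per cycle
        sel_a = sel_b = 0
        for i in segments:                      # selection
            sel_a ^= seg_a[i]
            sel_b ^= seg_b[i]
        partial = embedded_mul(sel_a, sel_b)    # partial multiplier
        for sh in shifts:                       # aggregation
            acc ^= partial << (sh * seg_bits)
        cycles += 1
    return acc, cycles

a = (1 << 255) | 0x1234567890ABCDEF
b = (1 << 255) | 0x0FEDCBA987654321
product, cycles = iterative_mul(a, b)
print(cycles, product.bit_length())   # 9 511
```

Only the accumulator and the cycle counter carry state between iterations, which mirrors the point that the embedded recursive unit itself needs no state information.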
Iterative multiplication units
  - 256 bit polynomial multipliers
  - [Table: Configuration | Cycles per Multiplication | Size of embedded multiplier [bit] | Delay [ns] | Silicon Area [mm^2] | Energy per Multiplication [nWs]; rows: Combinatorial, … segment, … segment, … segment]
Set up an ECC accelerator design
  - 283 bit
    - Bus
    - Registers
    - ALU
  - Speed requirements
    - 4 segment multiplier (72 bit embedded)
  - Adapt control logic
ECC designs 163 – 571 bit
  - [Figure: Time per ECPM]
ECC designs 163 – 571 bit
  - [Figure: Energy per ECPM and silicon area (IHP 0.25 µm CMOS)]
Conclusions
  - Polynomial multiplication is the most challenging operation in the finite field:
    - executed 1500 times for one 233 bit ECPM
    - most silicon area (70%)
    - highest utilization (95%)
  - Large combinatorial multipliers are feasible
    - hRAIK is the smallest
    - Classic polynomial is the fastest
  - For ECC designs, iterative Karatsuba approaches are well suited
    - Adaptable
    - Small
    - Energy efficient
Thank You
  Questions?