BCRYPT ECC-Day 2008 Requirements, Algorithms, Architectures The design space of ECC hardware.


1 BCRYPT ECC-Day 2008 Requirements, Algorithms, Architectures The design space of ECC hardware

2 Contents
- Applications of ECC Hardware
- Existing Solutions
- Design of ECC Hardware
- Details of ECC Hardware

3 Motivation
ECC Hardware: What for?
- Acceleration
- Power efficiency
- Implementation security
- Side-channel resistance
Competitors of ECC hardware
- RSA hardware
- Software implementation
  - Very fast on PC
  - But very slow on 8-bit µC
Application: Server
- High throughput
- > 100 signatures / sec
Application: Smartcard
- Low latency
- 100 ms per signature
- Low die size
Application: RFID
- Low power consumption
- Low die size

4 ECC Hardware: Application
Different requirements for ECC applications
Smartcard
- Acceptable latency
- Implementation security
- One EC curve sufficient
Server acceleration
- Throughput (not latency)
- Complete offloading

5 ECC Hardware: Server Acceleration
GF(2^191) Hardware Accelerator
- No GF(2^m) support in processors (x86, PPC, ...)
- FPGA (programmable HW) as platform
- Optimized for one curve
- Complete EC operation in HW
GF(2^191), f_clk = 66 MHz

Multiplier radix | k·P [cycles] | f_clk,max [MHz] | k·P per sec [ops]
W = 8 bit        | 40,210       | 74.6            | 1,641
W = 16 bit       | 23,820       | 71.3            | 2,770
W = 32 bit       | 15,623       | 70.4            | 4,224
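The throughput figures on this slide follow from the cycle count per point multiplication and the 66 MHz operating clock. A minimal Python sketch of that arithmetic, using the slide's numbers (the function and constant names are illustrative and not part of the presentation):

```python
F_CLK_HZ = 66_000_000  # operating clock stated on the slide

# Cycles per point multiplication k·P from the slide,
# keyed by the multiplier digit width W in bits.
CYCLES_PER_KP = {8: 40_210, 16: 23_820, 32: 15_623}

def kp_per_second(cycles: int, f_clk_hz: int = F_CLK_HZ) -> int:
    """Whole point multiplications completed per second at the given clock."""
    return f_clk_hz // cycles

for width, cycles in CYCLES_PER_KP.items():
    print(f"W = {width:2d} bit: {kp_per_second(cycles)} k·P per second")
```

Going from 8-bit to 32-bit digits quadruples the digit width but raises throughput by only about 2.6 times, which suggests a fixed per-operation overhead outside the multiplier.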

6 ECC Hardware: Smartcards
Infineon SLE88CFX4000P
SLE 88
- 32-bit platform
- 1408-bit RSA coprocessor
RSA coprocessor
- Local memory (704 bytes)
- Scalable word width
- Support for ECC: GF(p), GF(2^m)
Photo © Infineon Technologies

7 ECC Hardware: Smartcards
NXP Smart MX P5CC072
Smart MX
- 8-bit smartcard
- FameXE coprocessor
FameXE
- RSA, ECC: GF(p), GF(2^m)
- 2.5 kB local RAM
- Word width < 4096 bits
Photo © NXP

8 ECC Hardware: RFID Authentication
Challenge-response authentication in RFID
- Minimization of power consumption
- Trading performance for power
  - Lower clock speed
  - Reduced word size

9 Hardware Design: CMOS Circuits
CMOS
- Complementary metal-oxide semiconductor
- Silicon circuit: up to 2·10^6 transistors per IC
- Digital hardware: standard-cell circuits
- Flip-flops, full adders, muxes, gates: xor, and, ...

10 Hardware Design: Top → Down
Top-down design methodology
- From specification
- To working silicon
- "First time right"
Design process
- Refinement of models
- Early estimates of area, power, performance
- Design iterations when constraints are not met

11 Hardware Design: Design Flow
Abstraction level and tools
1. System level: defining functionality and constraints
2. Algorithmic level: high-level model
3. Architectural level: paper + pencil
4. Register-transfer level: HDL description
5. Circuit level: schematic + layout

12 Challenges of ECC Hardware
EC algorithms (ladder, EC point operation, point representation)
- Define the number of multiplications
- Define storage requirements
- Define implementation security
Multiplication
- Determines performance
Storage
- Determines circuit size
Control
- Determines HDL complexity
Do's
- Fix EC parameters
- Fixed field size
- Separate storage and computation
Don'ts
- Trading increased storage for lower computation
- Optimization of negligible things
- Inversion
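The slide names the ladder as the top-level algorithm that fixes the number of field multiplications and the security properties. As a hedged illustration, the sketch below shows the control structure of a Montgomery ladder for k·P. The slide does not say which ladder variant the author used, and the group operations here are passed in as parameters (`add`, `dbl`, `zero` are illustrative names), so plain integer addition can stand in for elliptic-curve point addition.

```python
def ladder(k, point, add, dbl, zero):
    """Montgomery ladder: returns k*point using one add and one double per key bit.

    The invariant r1 = r0 + point holds after every iteration, so both
    branches perform the same operations regardless of the key bit.
    """
    r0, r1 = zero, point
    for i in range(k.bit_length() - 1, -1, -1):
        if (k >> i) & 1:
            r0, r1 = add(r0, r1), dbl(r1)
        else:
            r0, r1 = dbl(r0), add(r0, r1)
    return r0

# Stand-in group for illustration only: integers under addition.
result = ladder(1234567, 7, lambda x, y: x + y, lambda x: 2 * x, 0)
print(result == 1234567 * 7)
```

In hardware the data-dependent branch would typically become a multiplexer selection, which is what makes the regular add-and-double pattern attractive for implementation security. The Python branch above is only a functional model, not a constant-time implementation.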

13 Approaches to ECC Hardware
EC processor
- Computes the full point multiplication
- No external interaction necessary
Coprocessor
- Acceleration of finite-field operations
- (Limited local memory)
- External interaction needed
  - For point ladder and point operation
ISE (instruction-set extension)
- Enhancement of existing instruction set
- Acceleration of core operations
- Multiply-accumulate instructions
- Support of polynomial arithmetic

14 Algorithms for ECC
Bit-serial multiplication
- a in full precision; b bitwise
- Faster: digit-serial (w bits of b)
Modular reduction
- Without division: NIST reduction
- For trinomials / pentanomials
- For Mersenne-like primes
Montgomery multiplication
- Combines a*b and mod p
- For arbitrary moduli

MulSer(a, b) = a*b
    c = 0
    for i = n-1 to 0 do
        c = 2·c + a·b_i

Pre-comp: R = 2^(n+2) mod p, R^2 mod p, p' = (-p)^(-1) mod 2
MonMul(a, b) = a·b·R^(-1) mod p
    c = 0
    for i = 0 to n+1 do
        q = ((c_0 + a_0·b_i) mod 2)·p'
        c = (c + p·q + a·b_i) / 2
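The three techniques on this slide can be sketched in Python as follows. This is a functional model under stated assumptions, not the author's hardware. `mul_ser` and `mon_mul` follow the slide's pseudocode, with the halving step of radix-2 Montgomery multiplication written out explicitly. `reduce_p192` uses the NIST prime p = 2^192 - 2^64 - 1 as an example of a Mersenne-like prime; the choice of that prime is an assumption, since the slide names no specific modulus.

```python
def mul_ser(a: int, b: int, n: int) -> int:
    """Bit-serial product a*b: a in full precision, the n bits of b MSB first."""
    c = 0
    for i in range(n - 1, -1, -1):
        b_i = (b >> i) & 1
        c = 2 * c + a * b_i
    return c

def mon_mul(a: int, b: int, p: int, n: int) -> int:
    """Radix-2 Montgomery product a*b*R^-1 mod p with R = 2^(n+2), p odd.

    The result is congruent to a*b*R^-1 but may lie in [0, 2p).
    """
    c = 0
    for i in range(n + 2):
        b_i = (b >> i) & 1
        q = (c + a * b_i) & 1           # p' = (-p)^-1 mod 2 is 1 for odd p
        c = (c + p * q + a * b_i) >> 1  # numerator is even, so the shift is exact
    return c

P192 = 2**192 - 2**64 - 1  # assumed example of a Mersenne-like prime

def reduce_p192(c: int) -> int:
    """Reduce c < 2^384 modulo P192 without division, using 2^192 = 2^64 + 1."""
    mask = (1 << 64) - 1
    c0, c1, c2, c3, c4, c5 = [(c >> (64 * i)) & mask for i in range(6)]
    s1 = (c2 << 128) | (c1 << 64) | c0
    s2 = (c3 << 64) | c3
    s3 = (c4 << 128) | (c4 << 64)
    s4 = (c5 << 128) | (c5 << 64) | c5
    r = s1 + s2 + s3 + s4
    while r >= P192:                    # at most a few conditional subtractions
        r -= P192
    return r
```

Operands enter the Montgomery domain by multiplying with the precomputed R^2 mod p, and leave it by a final `mon_mul(x, 1, p, n)`.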

15 Modular Multiplication in HW
GF(2^191) Example
- Digit-serial multiplication
- c(x) = a(x)*m(x) mod f(x)
- a(x): full precision
- m(x): w-bit digits
  - Digit size w = 8, 16, 32
- Alignment of intermediate result
- Interleaved NIST reduction
- Small intermediate results
- Squaring as own operation
- Simple when irreducible polynomial f(x) is fixed
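A minimal Python model of the digit-serial scheme with interleaved reduction is sketched below. Polynomials over GF(2) are stored as integers, with bit i holding the coefficient of x^i. The slide does not state the irreducible polynomial, so the trinomial f(x) = x^191 + x^9 + 1 (the one used by the ANSI X9.62 191-bit binary curves) is an assumption. All function names are illustrative.

```python
M = 191
MASK = (1 << M) - 1
# Assumed irreducible trinomial f(x) = x^191 + x^9 + 1; not given on the slide.

def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product of a and b over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def reduce_trinomial(c: int) -> int:
    """Fold coefficients at positions >= 191 down using x^191 = x^9 + 1."""
    while c >> M:
        hi = c >> M
        c = (c & MASK) ^ (hi << 9) ^ hi
    return c

def mul_digit_serial(a: int, m: int, w: int = 8) -> int:
    """c(x) = a(x)*m(x) mod f(x), consuming m in w-bit digits, MSD first."""
    num_digits = -(-M // w)  # ceil(191 / w)
    c = 0
    for i in range(num_digits - 1, -1, -1):
        m_i = (m >> (i * w)) & ((1 << w) - 1)
        c = (c << w) ^ clmul(a, m_i)  # align, then accumulate the partial product
        c = reduce_trinomial(c)       # interleaved: c never exceeds 191 + w bits
    return c

def square(a: int) -> int:
    """Squaring over GF(2): spread the bits apart, then reduce once."""
    s = 0
    for i in range(M):
        if (a >> i) & 1:
            s |= 1 << (2 * i)
    return reduce_trinomial(s)
```

Because f(x) is fixed, the reduction is a constant pattern of shifts and XORs, which is why both reduction and squaring become pure wiring plus XOR gates in hardware.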

16 Multiplier in HW
Partial product generation
- a(x) * m_i
- Simply 191 AND gates
- Amplification of m_i crucial
Aligning intermediate results
- Simple: fixed shift operation
Accumulation of partial products
- Array or tree adder
Modular reduction
- 200 bits -> 191 bits

17 GF(p) Multiplier
Radix-4 multiplier
- A in full precision
- B: 2 bits / cycle
Montgomery multiplication
- Orup's optimization
Redundant number representation
- Carry-save (CS)
- More storage
- Shorter critical path
- Red2bin: CSA reuse
Booth recoding (Benc)
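The slide pairs the radix-4 multiplier with Booth recoding. The sketch below shows the standard radix-4 (modified) Booth recoding of an unsigned operand. Whether the Benc block on the slide uses exactly this encoding is an assumption, and the function name is illustrative.

```python
def booth_radix4(b: int, n: int) -> list:
    """Recode an unsigned n-bit integer b into radix-4 digits in {-2, ..., 2}.

    Digits are returned least significant first, so that
    b == sum(d * 4**j for j, d in enumerate(digits)).
    """
    digits = []
    prev = 0  # b_{-1} = 0
    for i in range(0, n + 1, 2):  # one extra step absorbs the last carry
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits

digits = booth_radix4(0b110111, 6)
print(digits, sum(d * 4**j for j, d in enumerate(digits)))
```

With digits restricted to {-2, -1, 0, 1, 2}, each partial product is a shifted or negated copy of A, so the multiplier never needs to form 3·A.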

18 Dual-field Support
Application: e.g. ECDSA
- ECC over GF(2^m)
- Protocol: GF(p)
- Mul, Add, Inv mod n
  - n: base point order
Architecture ~ GF(p) multiplier
- CSA for GF(p)
- XOR for GF(2^m)
- Carries blocked
GF(p) versus GF(2^m)
- GF(2^m) faster ...
- GF(p) needs reg. C
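The idea of reusing one datapath for both fields can be modelled as a carry-save adder whose carry output is gated by a field-select signal. The sketch below is a word-level Python model under that assumption; the signal name `gf2m_mode` and the function names are illustrative and not taken from the slide.

```python
def csa_dual_field(x: int, y: int, z: int, gf2m_mode: bool):
    """3:2 carry-save adder with blockable carries.

    GF(p) mode:   returns (sum, carry) with sum + carry == x + y + z.
    GF(2^m) mode: carries are blocked, so sum is the plain XOR and carry is 0.
    """
    s = x ^ y ^ z
    carry = 0 if gf2m_mode else ((x & y) | (x & z) | (y & z)) << 1
    return s, carry

def red2bin(s: int, carry: int) -> int:
    """Convert the redundant carry-save pair back to a binary integer."""
    return s + carry

s, c = csa_dual_field(13, 7, 9, gf2m_mode=False)
print(red2bin(s, c))                                # integer sum of the inputs
print(csa_dual_field(13, 7, 9, gf2m_mode=True)[0])  # XOR of the inputs
```

The XOR network is shared by both modes, so GF(2^m) support costs little more than the gating of the carries. The carry word is also the extra storage that only GF(p) needs.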

19 ECC for RFID
Problem: very constrained power budget
- P = E/t = I·U = f_clk·C_L·Vdd^2
- Problem analysis: where is power consumed?
- Mostly for storage: clocking of registers
New idea
- Fewer registers; more combinational logic
- Smaller datapaths
- No computation at full word size
- Adoption of ISE techniques
  - MAC operation
- Simple HDL implementation
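The power relation on this slide can be turned into a small calculator. The sketch below only evaluates P = f_clk·C_L·Vdd^2 and I = P/U. The numeric inputs in the example call are hypothetical values chosen for illustration; they are not measurements from the presentation.

```python
def dynamic_power_w(f_clk_hz: float, c_load_f: float, vdd_v: float) -> float:
    """Dynamic power P = f_clk * C_L * Vdd^2 in watts."""
    return f_clk_hz * c_load_f * vdd_v ** 2

def supply_current_a(power_w: float, vdd_v: float) -> float:
    """Average supply current I = P / U in amperes."""
    return power_w / vdd_v

# Hypothetical operating point, for illustration only.
f_clk = 1e6     # 1 MHz clock
c_load = 10e-12 # 10 pF switched capacitance
vdd = 1.0       # 1 V supply

p = dynamic_power_w(f_clk, c_load, vdd)
print(f"P = {p * 1e6:.1f} uW, I = {supply_current_a(p, vdd) * 1e6:.1f} uA")
```

The formula makes the slide's trade-offs explicit: power falls linearly with clock frequency and with switched capacitance (fewer clocked registers, narrower datapaths), and quadratically with supply voltage.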

20 Control
Task of control logic
- Generate control signals
- For 60,000 to 6 million clock cycles
Separation of control and datapath
- Registered control signals
- For performance and power efficiency
- Avoiding critical path
Hierarchical control
- Complex control
Options
- Hardwired: state machine
- Micro-program: counter + ROM
- Micro-controller: software
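Of the control options listed, the micro-program (counter + ROM) variant is easy to model in software. The sketch below is a toy model: a counter addresses a ROM of control words that steer a small register file. The micro-instruction format, the operation names, and the three-entry program are all hypothetical and chosen only to show the principle.

```python
# Hypothetical micro-instruction format: (operation, destination, source 1, source 2).
ROM = [
    ("mul", "t", "a", "b"),      # t = a * b mod p
    ("add", "t", "t", "c"),      # t = t + c mod p
    ("halt", None, None, None),  # stop the sequencer
]

def run(rom, regs, p):
    """Step a program counter through the ROM and apply each control word."""
    pc = 0  # the counter that addresses the ROM
    while True:
        op, dst, s1, s2 = rom[pc]
        if op == "halt":
            return regs
        if op == "mul":
            regs[dst] = regs[s1] * regs[s2] % p
        elif op == "add":
            regs[dst] = (regs[s1] + regs[s2]) % p
        else:
            raise ValueError(f"unknown micro-operation: {op}")
        pc += 1

regs = run(ROM, {"a": 3, "b": 4, "c": 5, "t": 0}, p=7)
print(regs["t"])
```

Changing the algorithm then means changing ROM contents rather than rewiring a state machine, which is one reason a micro-programmed controller suits the long, regular operation sequences of a point multiplication.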

21 Results
Server Acceleration
- For GF(2^191)
- Size: 1500 slices
- On Xilinx FPGA
- > 1000 EC ops / sec
- @ 66 MHz clock
Smartcard Coprocessor
- Dual-field capability
- 192-bit ECC: 23k GE
  - 400k to 700k cycles
- 256-bit ECC: 31k GE
  - 600k to 900k cycles
ECC for RFID
- 163-bit ECC: 12k GE
  - 400k cycles
- 192-bit ECC: 18k GE
  - 850k cycles
- Storage: 75% of area
- ISE datapath: 75% of power
- Realistic on <130 nm CMOS
- Power constraint ~15 µA

22 Conclusions
Different applications
- Require different ECC hardware
Fixed parameters (EC params, field)
- Allow more efficient implementation
- Squaring in GF(2^m)
- NIST reduction
ECC for RFID
- Seems possible

