
1 Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006), 13/10/2006
Superscalar Coprocessor for High-speed Curve-based Cryptography
K. Sakiyama, L. Batina, B. Preneel, I. Verbauwhede
Katholieke Universiteit Leuven / IBBT, Department Electrical Engineering - ESAT/COSIC

2 Overview
- Introduction
- Curve-based Cryptography
- HW/SW Partitioning
- Superscalar Coprocessor
- Results
- Conclusions

3 Introduction: Motivation
- High-speed curve-based cryptography in HW/SW co-design
  - How much instruction-level parallelism can we obtain from coprocessor instructions?
- Performance improvement for different operation forms in the datapath
  - AB+C mod P vs. A(B+D)+C mod P, where A, B, C, D, P are polynomials
- Performance comparison of three different curve-based cryptosystems
  - Which is fastest: ECC, HECC, or ECC over a composite field?
- Programmability and scalability
  - Programmable in order to support different cryptosystems?
  - Scalable in field sizes?

4 Introduction: Target Architecture
- Curve-based cryptography over binary fields
  - Hardware can be smaller and faster than for prime fields
- ECC over a binary field, e.g. GF(2^163)
- HECC of genus 2
  - Field length can be shorter by a factor of 2, e.g. GF(2^83)
- ECC over a composite field
  - Field length can be shorter by a factor of 2, e.g. GF((2^83)^2)
- The datapath can be shared
  - Programmable coprocessor supporting three curve-based cryptosystems by defining coprocessor instruction(s)
  - (Coprocessor) instruction-level parallelism through a superscalar design

6 Curve-based Cryptography: HW/SW partitioning (1)
- General hierarchy in a coprocessor for curve-based cryptography
  - Point/Divisor Multiplication
  - Point/Divisor Addition, Point/Divisor Doubling
  - Finite Field Addition, Finite Field Multiplication, Finite Field Inversion
[Diagram labels: HW Datapath; SW or HW controller]
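The top level of this hierarchy can be sketched in software. This is a minimal illustration, assuming a left-to-right double-and-add loop; the slides do not say which scalar multiplication algorithm the authors use, and `scalar_mult`, `add`, `double`, and `identity` are hypothetical names. A toy group (integers modulo 101 under addition) stands in for the curve so that the sketch runs on its own.

```python
def scalar_mult(k, point, add, double, identity):
    """Left-to-right double-and-add: returns k * point in any additive group."""
    result = identity
    for bit in bin(k)[2:]:               # scan scalar bits, most significant first
        result = double(result)          # one doubling per scalar bit
        if bit == "1":
            result = add(result, point)  # one addition per set bit
    return result


# Toy stand-in group (an assumption, not a curve): integers modulo 101.
N = 101


def toy_add(p, q):
    return (p + q) % N


def toy_double(p):
    return (2 * p) % N


print(scalar_mult(13, 7, toy_add, toy_double, 0))  # 13 * 7 mod 101
```

In a real implementation, `add` and `double` would be point or divisor operations built from finite field operations, which is the part the slides map onto the hardware datapath.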

7 Curve-based Cryptography: Proposed Hierarchy (1)
- Single instruction for all finite field operations
- Fixed-cycle execution enables efficient implementation
[Diagram, two hierarchies side by side]
Conventional:
  - Point/Divisor Multiplication
  - Point/Divisor Addition, Point/Divisor Doubling
  - Finite Field Addition, Finite Field Multiplication, Finite Field Inversion
Single Instruction (Datapath):
  - Point/Divisor Multiplication
  - Point/Divisor Addition, Point/Divisor Doubling
  - Finite Field Inversion
  - Finite Field Operation, e.g. AB+C mod P
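To make the single-instruction idea concrete, here is a minimal software sketch of AB+C mod P over a binary field. It is not the authors' datapath: polynomials are stored as Python integers (bit i is the coefficient of x^i), the loop is bit-serial, and the names `mul_add_mod` and `fused_mul_add_mod` are hypothetical. The toy field GF(2^3) with P = x^3 + x + 1 is chosen only to keep the numbers small.

```python
def mul_add_mod(a, b, c, p):
    """Return (a*b + c) mod p over GF(2)[x]; a, b, c must already be reduced."""
    k = p.bit_length() - 1           # degree of the field polynomial
    acc = 0
    for i in reversed(range(k)):     # scan the bits of b, most significant first
        acc <<= 1                    # multiply the accumulator by x
        if (acc >> k) & 1:
            acc ^= p                 # reduce when the degree reaches k
        if (b >> i) & 1:
            acc ^= a                 # add a (XOR in characteristic 2)
    return acc ^ c


def fused_mul_add_mod(a, b, d, c, p):
    """A(B+D)+C mod P: the extra field addition is folded into the operand."""
    return mul_add_mod(a, b ^ d, c, p)


P = 0b1011                           # x^3 + x + 1, toy field GF(2^3)
a, c = 0b110, 0b101
print(mul_add_mod(a, 1, c, P) == a ^ c)   # field addition as A*1 + C
print(mul_add_mod(a, 0b011, 0, P))        # field multiplication as A*B + 0
```

Addition and multiplication both reduce to the same call, which is the point of the slide; the A(B+D)+C form saves a separate addition instruction whenever an operand is itself a sum.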

8 Curve-based Cryptography: Modular Arithmetic Logic Unit (MALU)
- (a) Building block: regular XOR chains
- (b) Scalable in digit size (d) and field size (k) by interconnecting several building blocks
- We use MALU_83 (n=83, d=12) as the building block
- 2 x MALU_83 can be configured as 1 x MALU_163
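The role of the digit size d can be illustrated with a digit-serial variant of the AB+C mod P operation. This is a software model under stated assumptions, not the MALU circuit: `malu_digit_serial` is a hypothetical name, the returned step count is only the number of digit iterations in this sketch (not a measured cycle count), and the degree-163 polynomial used in the example is the NIST binary-field polynomial x^163 + x^7 + x^6 + x^3 + 1, which the slides do not specify.

```python
def malu_digit_serial(a, b, c, p, d):
    """Return ((a*b + c) mod p, digit_steps), consuming d bits of b per step."""
    k = p.bit_length() - 1               # field size
    steps = -(-k // d)                   # ceil(k / d) digit iterations
    mask = (1 << d) - 1
    acc = 0
    for s in reversed(range(steps)):     # most significant digit first
        digit = (b >> (s * d)) & mask
        for j in reversed(range(d)):     # hardware would handle these d bits together
            acc <<= 1
            if (acc >> k) & 1:
                acc ^= p
            if (digit >> j) & 1:
                acc ^= a
    return acc ^ c, steps


# Assumed example polynomial (NIST degree-163 pentanomial), not from the slides.
P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
result, steps = malu_digit_serial(2, 1 << 162, 0, P163, 12)
print(hex(result), steps)                # x * x^162 reduces to x^7 + x^6 + x^3 + 1
```

A larger d means fewer digit iterations per field operation at the cost of a wider XOR network, which is the trade-off the slide refers to as scalability in d.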

10 HW/SW Partitioning: TYPE I, smallest implementation (baseline)
[Block diagram labels: Main CPU; Program ROM; SRAM; Memory Mapped I/O; Instruction Bus (32-bit instructions); Data Bus (32-bit data); Coprocessor; IBC; DBC; MALU_83]

11 HW/SW Partitioning: TYPE II = TYPE I + μ-code RAM
[Block diagram labels: Main CPU; Program ROM; SRAM; Memory Mapped I/O; Instruction Bus (32-bit instructions); Data Bus (32-bit data); Coprocessor; IBC; DBC; FSM; μ-code RAM; MALU_83]

12 HW/SW Partitioning: TYPE III = TYPE I + Coprocessor Memory
[Block diagram labels: Main CPU; Program ROM; SRAM; Memory Mapped I/O; Instruction Bus (32-bit instructions); Data Bus (32-bit data); Coprocessor; IBC; DBC; Coprocessor Memory; MALU_83]

13 HW/SW Partitioning: TYPE IV = TYPE I + Coprocessor Memory and μ-code RAM
[Block diagram labels: Main CPU; Program ROM; SRAM; Memory Mapped I/O; Instruction Bus (32-bit instructions); Data Bus (32-bit data); Coprocessor; IBC; DBC; FSM; μ-code RAM; Coprocessor Memory; MALU_83]

14 HW/SW Partitioning: Co-design flow with GEZEL
[Flow diagram labels: C/C++ codes for PKCs; Partitioning of functions; C/C++ codes & H/W behavior blocks w/ interface; C/C++ codes w/ physical memory map; GEZEL FDL codes; ARM (SW); Co-processor (HW); Cycle-true simulation (GEZEL); Cross compile; Synthesis; Program codes; VHDL codes]

15 HW/SW Partitioning: Result, Vertical Exploration of System
- HECC performance for different HW/SW partitionings
[Chart not reproduced. Performance metric: point/divisor multiplication]

17 Superscalar Coprocessor: Proposed Hierarchy (2)
- Multiple Modular Arithmetic Logic Units (MALUs) in the coprocessor
[Diagram, two hierarchies side by side]
Single MALU:
  - Point/Divisor Multiplication
  - Point/Divisor Addition, Point/Divisor Doubling
  - Finite Field Inversion
  - Finite Field Operation, e.g. AB+C mod P
Multiple MALUs:
  - Point/Divisor Multiplication
  - Point/Divisor Addition, Point/Divisor Doubling
  - Finite Field Inversion
  - Finite Field Operation, e.g. AB+C mod P (several in parallel)

18 Superscalar Coprocessor: Parallel Processing Architecture (TYPE IV-based)
[Block diagram labels: Main CPU; Program ROM; SRAM; Memory Mapped I/O; Instruction Bus (32-bit instructions); Data Bus (32-bit data); Coprocessor; IBC; IQB; Buffer Full; DBC; FSM; μ-code RAM; Coprocessor Memory; MALU_83; MALU_83]

19 Superscalar Coprocessor: Horizontal Exploration of System
- Performance of ECC and HECC
[Chart not reproduced]

21 Results: Performance for ECC over GF(2^83)
- Fastest of the three
- 1.8x speed-up by 2-way superscaling (ILP D P=6) with A(B+D)+C
- Further improvement is still possible by adding MALUs
[Chart not reproduced. Series: AB+C, A(B+D)+C]

22 Results: Performance of HECC over GF(2^83)
- Faster than ECC over a composite field
- 2.7x speed-up by 4-way superscaling (ILP D P=5) with A(B+D)+C
- Diminishing improvement as the number of MALUs increases
[Chart not reproduced. Series: AB+C, A(B+D)+C]

23 Results: Performance for ECC over GF((2^83)^2)
- Slowest of the three
- 2.5x speed-up by 4-way superscaling (ILP D P=6) with A(B+D)+C
- Diminishing improvement as the number of MALUs increases
[Chart not reproduced. Series: AB+C, A(B+D)+C]
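A composite-field multiplication decomposes into subfield operations of the AB+C and A(B+D)+C forms, which is one way to see why the same datapath can serve this case. The sketch below is an illustration under assumptions, not the authors' formulas: elements of GF((2^k)^2) are pairs (a0, a1) meaning a0 + a1*y, the extension polynomial is taken to be y^2 + y + 1 (irreducible over GF(2^k) when k is odd), and the toy subfield GF(2^3) with P = x^3 + x + 1 replaces GF(2^83). The names `mul_add_mod` and `composite_mul` are hypothetical.

```python
def mul_add_mod(a, b, c, p):
    """Return (a*b + c) mod p over GF(2)[x]; polynomials are ints, inputs reduced."""
    k = p.bit_length() - 1
    acc = 0
    for i in reversed(range(k)):
        acc <<= 1
        if (acc >> k) & 1:
            acc ^= p
        if (b >> i) & 1:
            acc ^= a
    return acc ^ c


def composite_mul(a, b, p):
    """Multiply a = (a0, a1) and b = (b0, b1) in GF((2^k)^2) with y^2 = y + 1."""
    a0, a1 = a
    b0, b1 = b
    t = mul_add_mod(a0, b0, 0, p)        # AB+C form with C = 0
    c0 = mul_add_mod(a1, b1, t, p)       # a1*b1 + a0*b0
    u = mul_add_mod(a0, b1, 0, p)
    c1 = mul_add_mod(a1, b0 ^ b1, u, p)  # A(B+D)+C form: a1*(b0+b1) + a0*b1
    return c0, c1


P = 0b1011                               # toy subfield GF(2^3)
y = (0, 1)
print(composite_mul(y, y, P))            # y^2 = y + 1
```

The two products feeding `c0` and the two feeding `c1` are independent of each other, so a coprocessor with several MALUs could issue some of them in parallel.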

24 Results: Comparison of ECC/HECC implementations on FPGAs
[Comparison table not reproduced]
[11] T. Wollinger, PhD thesis, 2004.
[13] G. Orlando and C. Paar, CHES 2000.
[14] N. Gura et al., CHES 2002.
[29] N. A. Saqib et al., International Journal of Embedded Systems, 2005.

25 Conclusions
- Performance improvement / comparison
  - ECC was improved by a factor of 1.8 (2-way)
  - HECC (genus 2) was improved by a factor of 2.7 (4-way)
  - ECC over a composite field was improved by a factor of 2.5 (4-way)
  - A(B+D)+C offers better performance than AB+C
  - ECC is the fastest in this case study
- Programmability & flexibility
  - Supports three different curve-based cryptosystems over a binary field
  - Arbitrary irreducible polynomial
  - Field size up to 332 bits by using 4 x MALU_83

26 Thank you!

27 Parallel issue of instructions: Case of using 4 MALUs
- IF/D: Instruction Fetch & Decode
- R_: Read operands (dependent on the type of operation)
- EX: Execution (dependent on MALU configuration, k & d)
- W_: Write (dependent on # of instructions issued in parallel)
[Pipeline diagram not reproduced]

28 Parallel issue of instructions: Out-of-order Execution
- Check RAW (Read After Write) dependency for in-/out-of-order execution
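The RAW check can be sketched as a small issue model. This is a simplified illustration, not the coprocessor's issue logic: it covers only in-order issue, checks only the RAW hazard the slide names (other hazards are ignored), and `Instr`, `issue_in_order`, and the register names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str                # register written by the instruction
    srcs: Tuple[str, ...]    # registers read by the instruction


def issue_in_order(program, n_malus):
    """Pack consecutive instructions into issue groups of at most n_malus.

    A new group starts when the current one is full or when an instruction
    reads a register written earlier in the same group (RAW dependency).
    """
    groups, current, written = [], [], set()
    for ins in program:
        raw = any(src in written for src in ins.srcs)
        if raw or len(current) == n_malus:
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("t0", ("a", "b", "c")),    # t0 = a*b + c
    Instr("t1", ("d", "e", "f")),    # t1 = d*e + f, independent of t0
    Instr("t2", ("t0", "t1", "g")),  # reads t0 and t1: RAW on both
    Instr("t3", ("h", "i", "j")),    # independent
]
for width in (1, 2, 4):
    print(width, len(issue_in_order(program, width)))
```

In this model the dependency of `t2` on `t0` and `t1` forces a second issue group even with four MALUs available, which mirrors the diminishing returns reported on the results slides. An out-of-order scheme would additionally look past a stalled instruction for independent ones.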

