Download presentation
Presentation is loading. Please wait.
Published byBernice Newman Modified over 9 years ago
1
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 1/26 Superscalar Coprocessor for High-speed Curve-based Cryptography K. Sakiyama, L. Batina, B. Preneel, I. Verbauwhede Katholieke Universiteit Leuven / IBBT Department Electrical Engineering - ESAT/COSIC
2
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 2/26 Introduction Curve-based Cryptography HW/SW Partitioning Superscalar Coprocessor Results Conclusions Overview
3
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 3/26 Introduction Motivation High-speed curve-based cryptography in HW/SW co-design How much instruction-level parallelism can we obtain from coprocessor instructions? Performance improvement for different operation forms in datapath AB+C mod P vs A(B+D)+C mod P,A,B,C,D,P: polynomials Performance comparison three different curve-based cryptosystems Which one is faster between ECC, HECC, ECC over a composite field? Programmability and scalability Programmable in order to support different cryptosystems? Scalable in field sizes?
4
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 4/26 Introduction Target Architecture Curve-based cryptography over binary fields Hardware can be smaller and faster than prime field ECC over a binary field, e.g. GF(2 163 ) HECC of genus 2 Field length can be shorter with a factor of 2, e.g. GF(2 83 ) ECC over a composite field Field length can be shorter with a factor of 2, e.g. GF ((2 83 ) 2 ) The datapath can be shared Programmable coprocessor supporting three curve-based cryptography by defining coprocessor instruction(s) (Coprocessor) instruction-level parallelism by superscalar
5
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 5/26 Introduction Curve-based Cryptography HW/SW Partitioning Superscalar Coprocessor Results Conclusions Overview
6
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 6/26 Curve-based Cryptography HW/SW partitioning (1) General hierarchy in coprocessor for curve-based cryptography Point/Divisor Multiplication Point/Divisor Addition Point/Divisor Doubling Finite Field Addition Finite Field Multiplication Finite Field Inversion HW Datapath SW or HW controller
7
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 7/26 Single instruction for all finite field operations Fixed-cycle execution enables efficient implementation Point/Divisor Multiplication Point/Divisor Addition Point/Divisor Doubling Finite Field Addition Finite Field Multiplication Finite Field Inversion Point/Divisor Multiplication Point/Divisor Addition Point/Divisor Doubling Finite Field Operation E.g. AB+C mod P Finite Field Inversion Curve-based Cryptography Proposed Hierarchy (1) Single Instruction (Datapath) Conventional
8
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 8/26 (a) Building block: Regular XOR chains (b) Scalable in digit size (d) and field size (k) by interconnecting several building blocks We use MALU 83 (n=83, d=12) as building block 2xMALU 83 can be configured as 1xMALU 163 Curve-based Cryptography Modular Arithmetic Logic Unit (MALU)
9
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 9/26 Introduction Curve-based Cryptography HW/SW Partitioning Superscalar Coprocessor Results Conclusions Overview
10
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 10/26 HW/SW Partitioning TYPE I: Smallest implementation (baseline) 32-bit instructions 32-bit data Instruction Bus Program ROM Main CPU Memory Mapped I/O SRAM MALU 83 Data Bus DBC Coprocessor IBC
11
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 11/26 HW/SW Partitioning TYPE II: TYPE I + -code RAM IBC 32-bit instructions 32-bit data Instruction Bus Program ROM Main CPU Memory Mapped I/O SRAM -code RAM Data Bus DBC FSM Coprocessor MALU 83
12
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 12/26 HW/SW Partitioning TYPE III: TYPE I + Coprocessor Memory 32-bit instructions 32-bit data Instruction Bus Program ROM Main CPU Memory Mapped I/O Coprocessor Memory SRAM MALU 83 Data Bus DBC Coprocessor IBC
13
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 13/26 HW/SW Partitioning TYPE IV: TYPE I + Copro. Mem.& -code RAM 32-bit instructions 32-bit data Instruction Bus Program ROM Main CPU Memory Mapped I/O Coprocessor Memory SRAM MALU 83 Data Bus DBC IBC -code RAM FSM Coprocessor
14
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 14/26 HW/SW Partitioning Co-design flow with GEZEL Partitioning of functions C/C++ codes for PKCs C/C++ codes & H/W behavior blocks w/interface GEZEL FDL codes Cross compileSynthesis C/C++ codes w/physical memory map ARM (SW)Co-processor (HW) Cycle-true sim. (GEZEL) VHDL codesProgram codes
15
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 15/26 HW/SW Partitioning Result: Vertical Exploration of System HECC Performance for different HW/SW partitioning (Performance: Point/Divisor multiplication)
16
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 16/26 Introduction Curve-based Cryptography HW/SW Partitioning Superscalar Coprocessor Results Conclusions Overview
17
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 17/26 Multiple Modular Arithmetic Logic Units (MALUs) in coprocessor Finite Field Operation E.g. AB+C mod P Point/Divisor Multiplication Point/Divisor Addition Point/Divisor Doubling Finite Field Inversion Finite Field Operation E.g. AB+C mod P Finite Field Operation E.g. AB+C mod P Finite Field Operation E.g. AB+C mod P … Multiple MALUs Point/Divisor Multiplication Point/Divisor Addition Point/Divisor Doubling Finite Field Operation E.g. AB+C mod P Finite Field Inversion Single MALU Superscalar Coprocessor Proposed Hierarchy (2)
18
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 18/26 IBC 32-bit instructions 32-bit data Instruction Bus Program ROM Main CPU Memory Mapped I/O MALU 83 Coprocessor Memory SRAM MALU 83 IQB -code RAM Data Bus Buffer Full DBC FSM Coprocessor Superscalar Coprocessor Parallel Processing Architecture (TYPE IV-based)
19
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 19/26 Superscalar Coprocessor Horizontal Exploration of System Performance of ECC and HECC
20
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 20/26 Introduction Curve-based Cryptography HW/SW Partitioning Superscalar Coprocessor Results Conclusions Overview
21
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 21/26 Results Performance for ECC over GF(2 83 ) Fastest of three x1.8 speed-up by 2-way superscaling (ILP D P=6) with A(B+D)+C Still more improvement is possible by adding MALUs AB+CA(B+D)+C
22
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 22/26 Results Performance of HECC over GF(2 83 ) Faster than ECC over a composite field x2.7 speed-up by 4-way superscaling (ILP D P=5) with A(B+D)+C Less improvement as increasing # of MALU AB+CA(B+D)+C
23
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 23/26 Results Performance for ECC over GF((2 83 ) 2 ) Slowest of three x2.5 speed-up by 4-way superscaling (ILP D P=6) with A(B+D)+C Less improvement as increasing # of MALU AB+CA(B+D)+C
24
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 24/26 Results Comparison of ECC/HECC implementations on FPGAs [11] T. Wollinger, PhD thesis, 2004. [13] G. Orlando and C. Paar, CHES 00. [14] N. Gura et al., CHES02. [29] Nazar A. Saqib et al., International Journal of Embedded Systems 2005
25
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 25/26 Performance improvement / Comparison ECC was improved by a factor of 1.8 (2-way) HECC (genus 2) was improved by a factor of 2.7 (4-way) ECC over a composite field was improved by a factor of 2.5 (4-way) A(B+D)+C offers better performance than AB+C ECC is the fastest in this case study Programmability & flexibility Support three different curve-based cryptosystems over a binary field Arbitrary irreducible polynomial Field size up to 332 bits by using 4xMALU 83 Conclusions
26
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 26/26 Thank you!
27
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 27/26 Parallel issue of instructions Case of using 4 MALUs IF/D : Instruction Fetch & Decode R_ : Read operands (dependent on the type of operation) EX : Execution (dependent on MALU configuration, k & d) W_ : Write (dependent on # of instructions issued in parallel)
28
Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 28/26 Parallel issue of instructions Out-of-order Execution Check RAW (Read After Write Dependency) for in-/out-of- order execution
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.