Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 1/26 Superscalar Coprocessor for High-speed Curve-based Cryptography K.

Slides:



Advertisements
Similar presentations
Click to edit Master title style KATHOLIEKE UNIVERSITEIT LEUVEN | COSIC 1 Compact Implementations for RFID and Sensor Nodes L. Batina, K. Sakiyama and.
Advertisements

14. Aug Towards Practical Lattice-Based Public-Key Encryption on Reconfigurable Hardware SAC 2013, Burnaby, Canada Thomas Pöppelmann and Tim Güneysu.
Are standards compliant Elliptic Curve Cryptosystems feasible on RFID?
Commercial FPGAs: Altera Stratix Family Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
IHP Im Technologiepark Frankfurt (Oder) Germany IHP Im Technologiepark Frankfurt (Oder) Germany ©
CryptoBlaze: 8-Bit Security Microcontroller. Quick Start Training Agenda What is CryptoBlaze? KryptoKit GF(2 m ) Multiplier Customize CryptoBlaze Attacks.
TIE Extensions for Cryptographic Acceleration Charles-Henri Gros Alan Keefer Ankur Singla.
1 KU College of Engineering Elec 204: Digital Systems Design Lecture 9 Programmable Configurations Read Only Memory (ROM) – –a fixed array of AND gates.
A reconfigurable system featuring dynamically extensible embedded microprocessor, FPGA, and customizable I/O Borgatti, M. Lertora, F. Foret, B. Cali, L.
CMPE 421 Parallel Computer Architecture MEMORY SYSTEM.
LOGO HW/SW Co-Verification -- Mentor Graphics® Seamless CVE By: Getao Liang March, 2006.
A Handel-C Implementation of a Computationally Intensive Problem in GF(3) Joey C. Libby, Jonathan P. Lutes, and Kenneth B. Kent The Handel-C Language Handel-C.
1 EFFICIENT ADDERS TO SPEEDUP MODULAR MULTIPLICATION FOR CRYPTOGRAPHY Adnan Gutub Hassan Tahhan Computer Engineering Department KFUPM, Dhahran, SAUDI ARABIA.
Zheming CSCE715.  A wireless sensor network (WSN) ◦ Spatially distributed sensors to monitor physical or environmental conditions, and to cooperatively.
MEMOCODE 2007 HW/SW Co-design Contest Documentation of the submission by Eric Simpson Pengyuan Yu Sumit Ahuja Sandeep Shukla Patrick Schaumont Electrical.
A Dual Field Elliptic Curve Cryptographic Processor Laboratory for Reliable Computing (LaRC) Electrical Engineering Department National Tsing Hua University.
SCOTT MILLER, AMBROSE CHU, MIHAI SIMA, MICHAEL MCGUIRE ReCoEng Lab DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING UNIVERSITY OF.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
Define Embedded Systems Small (?) Application Specific Computer Systems.
Configurable System-on-Chip: Xilinx EDK
State Machines Timing Computer Bus Computer Performance Instruction Set Architectures RISC / CISC Machines.
Dynamically Reconfigurable Architectures: An Overview Juanjo Noguera Dept. Computer Architecture (DAC-UPC)
CHES20021 Scalable and Unified Hardware to Compute Montgomery Inverse in GF(p) and GF(2 n ) A. Gutub, A. Tenca, E. Savas, and C. Koc Information Security.
IHP Im Technologiepark Frankfurt (Oder) Germany IHP Im Technologiepark Frankfurt (Oder) Germany ©
1 An Elliptic Curve Processor Suitable for RFID-Tags L. Batina 1, J. Guajardo 2, T. Kerins 2, N. Mentens 1, P. Tuyls 2 and I. Verbauwhede 1 Katholieke.
Processor Types And Instruction Sets Barak Perelman CS147 Prof. Lee.
HW/SW CODESIGN OF THE MPEG-2 VIDEO DECODER Matjaz Verderber, Andrej Zemva, Andrej Trost University of Ljubljana Faculty of Electrical Engineering Trzaska.
HW/SW CODESIGN OF THE MPEG-2 VIDEO DECODER Matjaz Verderber, Andrej Zemva, Andrej Trost University of Ljubljana Faculty of Electrical Engineering Trzaska.
Digital signature using MD5 algorithm Hardware Acceleration
Chapter 1 Introduction. Computer Architecture selecting and interconnecting hardware components to create computers that meet functional, performance.
C.S. Choy95 COMPUTER ORGANIZATION Logic Design Skill to design digital components JAVA Language Skill to program a computer Computer Organization Skill.
Levels of Architecture & Language CHAPTER 1 © copyright Bobby Hoggard / material may not be redistributed without permission.
Computer Architecture and Organization Introduction.
COMPUTER SCIENCE &ENGINEERING Compiled code acceleration on FPGAs W. Najjar, B.Buyukkurt, Z.Guo, J. Villareal, J. Cortes, A. Mitra Computer Science & Engineering.
Paper Review: XiSystem - A Reconfigurable Processor and System
Automated Design of Custom Architecture Tulika Mitra
Computers organization & Assembly Language Chapter 0 INTRODUCTION TO COMPUTING Basic Concepts.
Advanced Computer Architecture 0 Lecture # 1 Introduction by Husnain Sherazi.
FPT 2006 Bangkok A Novel Memory Architecture for Elliptic Curve Cryptography with Parallel Modular Multipliers Ralf Laue, Sorin A. Huss Integrated Circuits.
Hyperelliptic Curve Coprocessors On a FPGA HoWon Kim ETRI, Korea.
Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative.
Array Synthesis in SystemC Hardware Compilation Authors: J. Ditmar and S. McKeever Oxford University Computing Laboratory, UK Conference: Field Programmable.
Computer Organization and Design Computer Abstractions and Technology
Computer Architecture Memory organization. Types of Memory Cache Memory Serves as a buffer for frequently accessed data Small  High Cost RAM (Main Memory)
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.
COARSE GRAINED RECONFIGURABLE ARCHITECTURES 04/18/2014 Aditi Sharma Dhiraj Chaudhary Pruthvi Gowda Rachana Raj Sunku DAY
Electronic Analog Computer Dr. Amin Danial Asham by.
Computer Organization CDA 3103 Dr. Hassan Foroosh Dept. of Computer Science UCF © Copyright Hassan Foroosh 2002.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Overview von Neumann Architecture Computer component Computer function
Ee314 Microprocessor Systems Dr. Mircea DABACAN Electrical Engineering & Computer Science Dept., Washington State University Office: EE/ME 504 Phone:
Lecture5 – Introduction to Cryptography 3/ Implementation Rice ELEC 528/ COMP 538 Farinaz Koushanfar Spring 2009.
Introduction to Computer Organization Pipelining.
Hardware Implementations of Finite Field Primitives
Winter-Spring 2001Codesign of Embedded Systems1 Essential Issues in Codesign: Architectures Part of HW/SW Codesign of Embedded Systems Course (CE )
1 TM 1 Embedded Systems Lab./Honam University ARM Microprocessor Programming Model.
BASIC COMPUTER ARCHITECTURE HOW COMPUTER SYSTEMS WORK.
Controller Implementation
Supported in part by NIST/U.S. Department of Commerce
Instructor: Dr. Phillip Jones
William Stallings Computer Organization and Architecture 7th Edition
CDA 3101 Spring 2016 Introduction to Computer Organization
Elliptic Curve Cryptography over GF(2m) on a Reconfigurable Computer:
Dynamically Reconfigurable Architectures: An Overview
EFFICIENT ADDERS TO SPEEDUP MODULAR MULTIPLICATION FOR CRYPTOGRAPHY
Computer Evolution and Performance
Chapter Four The Processor: Datapath and Control
CPU Structure CPU must:
Accelerating Regular Path Queries using FPGA
Presentation transcript:

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 1/26 Superscalar Coprocessor for High-speed Curve-based Cryptography K. Sakiyama, L. Batina, B. Preneel, I. Verbauwhede Katholieke Universiteit Leuven / IBBT Department Electrical Engineering - ESAT/COSIC

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 2/26  Introduction  Curve-based Cryptography  HW/SW Partitioning  Superscalar Coprocessor  Results  Conclusions Overview

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 3/26 Introduction Motivation  High-speed curve-based cryptography in HW/SW co-design  How much instruction-level parallelism can we obtain from coprocessor instructions?  Performance improvement for different operation forms in datapath  AB+C mod P vs A(B+D)+C mod P,A,B,C,D,P: polynomials  Performance comparison three different curve-based cryptosystems  Which one is faster between ECC, HECC, ECC over a composite field?  Programmability and scalability  Programmable in order to support different cryptosystems?  Scalable in field sizes?

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 4/26 Introduction Target Architecture  Curve-based cryptography over binary fields  Hardware can be smaller and faster than prime field  ECC over a binary field, e.g. GF(2 163 )  HECC of genus 2 Field length can be shorter with a factor of 2, e.g. GF(2 83 )  ECC over a composite field Field length can be shorter with a factor of 2, e.g. GF ((2 83 ) 2 )  The datapath can be shared  Programmable coprocessor supporting three curve-based cryptography by defining coprocessor instruction(s)  (Coprocessor) instruction-level parallelism by superscalar

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 5/26  Introduction  Curve-based Cryptography  HW/SW Partitioning  Superscalar Coprocessor  Results  Conclusions Overview

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 6/26 Curve-based Cryptography HW/SW partitioning (1)  General hierarchy in coprocessor for curve-based cryptography Point/Divisor Multiplication Point/Divisor Addition Point/Divisor Doubling Finite Field Addition Finite Field Multiplication Finite Field Inversion HW Datapath SW or HW controller

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 7/26  Single instruction for all finite field operations  Fixed-cycle execution enables efficient implementation Point/Divisor Multiplication Point/Divisor Addition Point/Divisor Doubling Finite Field Addition Finite Field Multiplication Finite Field Inversion Point/Divisor Multiplication Point/Divisor Addition Point/Divisor Doubling Finite Field Operation E.g. AB+C mod P Finite Field Inversion Curve-based Cryptography Proposed Hierarchy (1) Single Instruction (Datapath) Conventional

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 8/26  (a) Building block: Regular XOR chains  (b) Scalable in digit size (d) and field size (k) by interconnecting several building blocks  We use MALU 83 (n=83, d=12) as building block  2xMALU 83 can be configured as 1xMALU 163 Curve-based Cryptography Modular Arithmetic Logic Unit (MALU)

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/2006 9/26  Introduction  Curve-based Cryptography  HW/SW Partitioning  Superscalar Coprocessor  Results  Conclusions Overview

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 HW/SW Partitioning TYPE I: Smallest implementation (baseline) 32-bit instructions 32-bit data Instruction Bus Program ROM Main CPU Memory Mapped I/O SRAM MALU 83 Data Bus DBC Coprocessor IBC

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 HW/SW Partitioning TYPE II: TYPE I +  -code RAM IBC 32-bit instructions 32-bit data Instruction Bus Program ROM Main CPU Memory Mapped I/O SRAM  -code RAM Data Bus DBC FSM Coprocessor MALU 83

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 HW/SW Partitioning TYPE III: TYPE I + Coprocessor Memory 32-bit instructions 32-bit data Instruction Bus Program ROM Main CPU Memory Mapped I/O Coprocessor Memory SRAM MALU 83 Data Bus DBC Coprocessor IBC

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 HW/SW Partitioning TYPE IV: TYPE I + Copro. Mem.&  -code RAM 32-bit instructions 32-bit data Instruction Bus Program ROM Main CPU Memory Mapped I/O Coprocessor Memory SRAM MALU 83 Data Bus DBC IBC  -code RAM FSM Coprocessor

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 HW/SW Partitioning Co-design flow with GEZEL Partitioning of functions C/C++ codes for PKCs C/C++ codes & H/W behavior blocks w/interface GEZEL FDL codes Cross compileSynthesis C/C++ codes w/physical memory map ARM (SW)Co-processor (HW) Cycle-true sim. (GEZEL) VHDL codesProgram codes

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 HW/SW Partitioning Result: Vertical Exploration of System  HECC Performance for different HW/SW partitioning (Performance: Point/Divisor multiplication)

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26  Introduction  Curve-based Cryptography  HW/SW Partitioning  Superscalar Coprocessor  Results  Conclusions Overview

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26  Multiple Modular Arithmetic Logic Units (MALUs) in coprocessor Finite Field Operation E.g. AB+C mod P Point/Divisor Multiplication Point/Divisor Addition Point/Divisor Doubling Finite Field Inversion Finite Field Operation E.g. AB+C mod P Finite Field Operation E.g. AB+C mod P Finite Field Operation E.g. AB+C mod P … Multiple MALUs Point/Divisor Multiplication Point/Divisor Addition Point/Divisor Doubling Finite Field Operation E.g. AB+C mod P Finite Field Inversion Single MALU Superscalar Coprocessor Proposed Hierarchy (2)

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 IBC 32-bit instructions 32-bit data Instruction Bus Program ROM Main CPU Memory Mapped I/O MALU 83 Coprocessor Memory SRAM MALU 83 IQB  -code RAM Data Bus Buffer Full DBC FSM Coprocessor Superscalar Coprocessor Parallel Processing Architecture (TYPE IV-based)

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 Superscalar Coprocessor Horizontal Exploration of System  Performance of ECC and HECC

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26  Introduction  Curve-based Cryptography  HW/SW Partitioning  Superscalar Coprocessor  Results  Conclusions Overview

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 Results Performance for ECC over GF(2 83 )  Fastest of three  x1.8 speed-up by 2-way superscaling (ILP D P=6) with A(B+D)+C  Still more improvement is possible by adding MALUs AB+CA(B+D)+C

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 Results Performance of HECC over GF(2 83 )  Faster than ECC over a composite field  x2.7 speed-up by 4-way superscaling (ILP D P=5) with A(B+D)+C  Less improvement as increasing # of MALU AB+CA(B+D)+C

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 Results Performance for ECC over GF((2 83 ) 2 )  Slowest of three  x2.5 speed-up by 4-way superscaling (ILP D P=6) with A(B+D)+C  Less improvement as increasing # of MALU AB+CA(B+D)+C

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 Results Comparison of ECC/HECC implementations on FPGAs [11] T. Wollinger, PhD thesis, [13] G. Orlando and C. Paar, CHES 00. [14] N. Gura et al., CHES02. [29] Nazar A. Saqib et al., International Journal of Embedded Systems 2005

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26  Performance improvement / Comparison  ECC was improved by a factor of 1.8 (2-way)  HECC (genus 2) was improved by a factor of 2.7 (4-way)  ECC over a composite field was improved by a factor of 2.5 (4-way)  A(B+D)+C offers better performance than AB+C  ECC is the fastest in this case study  Programmability & flexibility  Support three different curve-based cryptosystems over a binary field  Arbitrary irreducible polynomial  Field size up to 332 bits by using 4xMALU 83 Conclusions

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 Thank you!

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 Parallel issue of instructions Case of using 4 MALUs  IF/D : Instruction Fetch & Decode  R_ : Read operands (dependent on the type of operation)  EX : Execution (dependent on MALU configuration, k & d)  W_ : Write (dependent on # of instructions issued in parallel)

Workshop on Cryptographic Hardware and Embedded Systems (CHES 2006) 13/10/ /26 Parallel issue of instructions Out-of-order Execution  Check RAW (Read After Write Dependency) for in-/out-of- order execution