Unified Architectures for Efficient and Compact Crypto-Processing


Unified Architectures for Efficient and Compact Crypto-Processing Erkay Savaş Sabancı University 11/15/2018 Erkay Savaş

Outline
- Research Motivation
- Public Key Cryptography
- Unified Arithmetic
- High-Radix Multiplication
- Dual-Radix Multiplication
- Support for GF(3^n) Arithmetic
- Implementation Results
- Future Research

Motivation
- Compatibility: support for fast arithmetic in different finite fields and groups
- Saving in area: improve the {time × area} metric
- Algorithm agility: e.g. NTRU and ECC

Public Key Cryptography (PKC)
- Each user has a pair of keys:
  - Private key: known only to the owner
  - Public key: known to everyone in the system, with assurance
- Encryption: performed with the public key of the receiver
- Decryption: only the receiver can decrypt the message, using her/his private key

Public Key Cryptography in Use
- RSA, Rabin's scheme: integer factorization, square roots modulo a composite number
- Discrete logarithm based algorithms: Diffie-Hellman key exchange, ElGamal
- Elliptic curve DH key exchange, ECDSA: discrete logarithm over elliptic curves
- IBE: pairings over elliptic curve points

RSA
- Most popular PKC
- Invented by Rivest, Shamir and Adleman in 1977 at MIT; its patent expired in 2000
- Based on the integer factorization problem
- Each user has a public and private key pair

RSA Encryption & Decryption
- Encryption is done with the public key: y = x^e mod n, where x, y < n
- Decryption is done with the private key: x = y^d mod n
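To make the two formulas concrete, here is a minimal sketch with toy parameters. The primes 61 and 53, the exponent 17 and the function names are illustrative assumptions, not values from the slides; real keys are far larger.

```python
# Toy RSA round trip (illustrative only; real moduli are 1024+ bits).
p, q = 61, 53                # assumed small primes for the example
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # private exponent (needs Python 3.8+)

def rsa_encrypt(x, e, n):
    """y = x^e mod n, using the receiver's public key (e, n)."""
    return pow(x, e, n)

def rsa_decrypt(y, d, n):
    """x = y^d mod n, using the receiver's private key d."""
    return pow(y, d, n)

x = 65                       # message, must satisfy x < n
y = rsa_encrypt(x, e, n)
recovered = rsa_decrypt(y, d, n)
```

Both operations are modular exponentiations, which is why the rest of the talk concentrates on fast modular multiplication.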

DL Based Cryptosystems
- Fundamental operation: g^x mod p, where x, g < p and g is primitive
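The exponentiation g^x mod p is normally computed as a sequence of modular squarings and multiplications. The sketch below uses the textbook left-to-right square-and-multiply method; the function name and the sample values are assumptions chosen for illustration, and the slides do not say which exponentiation method the hardware uses.

```python
def mod_exp(g, x, p):
    """Left-to-right binary (square-and-multiply) exponentiation: g^x mod p."""
    result = 1
    for bit in bin(x)[2:]:               # scan exponent bits, most significant first
        result = (result * result) % p   # one modular squaring per exponent bit
        if bit == "1":
            result = (result * g) % p    # one extra modular multiplication per 1-bit
    return result

# Example with small assumed values.
value = mod_exp(5, 117, 1019)
```

A k-bit exponent costs k squarings and up to k multiplications, so the speed of the modular multiplier dominates the total running time.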

Elliptic Curve Cryptography 1/2
- Emerging public key cryptography standard for constrained devices
- A 160-bit ECC key is equivalent in cryptographic strength to 1024-bit RSA; 313-bit ECC is equivalent to 4096-bit RSA
- As algebraic/geometric entities, elliptic curves have been studied extensively for the past 150 years
- Rich and deep theory, suitable for cryptography
- First proposed for cryptographic use in 1985, independently by Neal Koblitz and Victor Miller

Elliptic Curve Cryptography 2/2
- Dominant fundamental operation: multiplication in GF(q), where q = p^k and p is prime
- Alternatives:
  - GF(p): k = 1
  - GF(2^k): p = 2
  - GF(p^k)
  - GF(3^k): p = 3

Identity Based Encryption (IBE)
- The public key can be any string: e-mail address, name, etc.
- No need for certificates
- Anonymity can be achieved: users can choose any public key without revealing their identity, and can easily change it

IBE – Bilinear Mapping
- e(xP, yQ) = e(P, Q)^(xy) = e(yP, xQ) = g, where g is in (an extension of) the underlying field
- Bilinear mappings over elliptic curves: Weil pairing, Tate pairing
- Resource consuming
- The most efficient bilinear mappings are defined on curves over GF(3^k)

An Introduction to Unified Arithmetic
- Three types of finite fields are heavily used:
  - Prime fields, GF(p)
  - Binary extension fields, GF(2^k)
  - Ternary extension fields, GF(3^k) (recently, due to IBE schemes)
- These finite fields have dissimilar properties
- They are usually given different implementations on specialized hardware

Unified Arithmetic
A unified hardware design methodology requires:
- A single (unified) datapath
- A single (unified) control
- Insignificant overhead in area
- Insignificant overhead in time complexity (e.g. critical path delay)
- A good {time × area} metric

Unified Arithmetic (GF(p) + GF(2^k))
A unified hardware design methodology for both fields is possible because:
- the elements of either field are represented using almost the same data structures in digital systems
- the algorithms for basic arithmetic operations in both fields have structural similarities (i.e. the steps of the algorithms are almost identical)
Hence, unified arithmetic is possible.

Finite Field Operations in ECC
- Addition in GF(p) and GF(2^k)
  - Relatively inexpensive in area and time complexity
- Multiplicative inversion in GF(p) and GF(2^k)
  - Prohibitively expensive in terms of time
  - Possible to avoid some inversions
- Multiplication in GF(p) and GF(2^k)
  - Expensive in terms of time and area
  - Usually the most important operation
  - Our focus

Montgomery Multiplication
- Very efficient way of doing multiplication in GF(p) and GF(2^k) (now also in GF(3^k))
- Faster (replaces division by shifts)
- Suitable for unified design
- Suitable for scalable design
- Highly parallel
- Suitable for pipelining

Montgomery Multiplication
Definition: given a, b ∈ GF(p), MonMul(a, b) = a·b·R^(-1) mod p, where R = 2^k mod p and k = ⌈log2 p⌉.
Algorithm:
  c := 0
  for i = 0 to k-1
    c := c + a_i · b
    c := (c + c_0 · p) / 2
  if c ≥ p then c := c - p   (final subtraction)
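The bit-serial loop above translates almost line by line into software. The following is a minimal sketch, assuming an odd modulus p < 2^k and operands a, b < p; the function name `mon_mul` and the test values are illustrative and not taken from the slides.

```python
def mon_mul(a, b, p, k):
    """Radix-2 Montgomery multiplication: returns a*b*2^(-k) mod p.

    Assumes p is odd, p < 2**k and 0 <= a, b < p.
    """
    c = 0
    for i in range(k):
        a_i = (a >> i) & 1       # scan the multiplier a one bit at a time
        c += a_i * b             # conditional addition of b
        c += (c & 1) * p         # add p if c is odd so that c becomes even
        c >>= 1                  # exact division by 2 (a shift, no real division)
    if c >= p:                   # c < 2p holds here, so one subtraction suffices
        c -= p
    return c

# Example with small assumed values.
p, k = 97, 7                     # 97 < 2^7
a, b = 45, 76
result = mon_mul(a, b, p, k)
```

To obtain an ordinary product, operands are first mapped to the Montgomery domain (a·2^k mod p); the extra factor then cancels on every multiplication.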

Algorithm for GF(2^k)
Input: a(x), b(x) ∈ GF(2^k), p(x) and k
Output: c(x) = a(x)·b(x)·x^(-k) ∈ GF(2^k)
  c(x) := 0
  for i = 0 to k-1
    c(x) := c(x) ⊕ a_i · b(x)
    c(x) := (c(x) ⊕ c_0 · p(x)) / x
- No final subtraction
- Note that c/2 and c(x)/x are implemented in an identical way in SW and HW
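The binary-field loop can be sketched the same way, with polynomials stored as integers whose bit i is the coefficient of x^i. The function names and the choice of p(x) = x^8 + x^4 + x^3 + x + 1 (the AES polynomial, encoded as 0x11B) are assumptions made for this example only.

```python
def mon_mul_gf2(a, b, p, k):
    """Montgomery multiplication in GF(2^k): returns a(x)*b(x)*x^(-k) mod p(x).

    p must have degree k and a nonzero constant term; a and b have degree < k.
    """
    c = 0
    for i in range(k):
        if (a >> i) & 1:
            c ^= b               # addition in GF(2^k) is XOR, no carries
        if c & 1:
            c ^= p               # clear the constant term
        c >>= 1                  # division by x is the same shift as c/2
    return c                     # degree stays below k, so no final subtraction

def gf2_mulmod(a, b, p):
    """Plain shift-and-XOR multiplication mod p(x), used here only as a reference."""
    k = p.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> k) & 1:
            a ^= p
    return result

P, K = 0x11B, 8
c = mon_mul_gf2(0x57, 0x83, P, K)
```

Apart from XOR replacing addition and the missing final subtraction, the loop body matches the GF(p) version step for step, which is what makes a shared datapath feasible.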

Addition and Representation
- Atomic operation: multiplication is performed as repeated addition
- Unified addition is most efficient when carry-save representation is used for elements of GF(p)
- Carry-save representation: an integer is represented as the sum of two other integers, x := xs + xc (sum and carry parts, resp.)
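As a rough software model of carry-save addition, the sketch below adds an ordinary integer to a value held as a (sum, carry) pair without propagating carries across bit positions. The function name and sample operands are hypothetical; in hardware each bit position is a separate full adder working in parallel.

```python
def carry_save_add(xs, xc, y):
    """Add y to the carry-save value (xs, xc); return the new (sum, carry) pair.

    Every bit position is computed independently, so the delay does not
    depend on the operand length.
    """
    s = xs ^ xc ^ y                                   # per-bit sum
    c = ((xs & xc) | (xs & y) | (xc & y)) << 1        # per-bit carry, moved up one position
    return s, c

# Accumulate several operands, then resolve the carries once at the end.
xs, xc = 0, 0
operands = [1234, 5678, 91011, 1213]
for y in operands:
    xs, xc = carry_save_add(xs, xc, y)
total = xs + xc                  # single carry-propagating addition
```

Only the final conversion needs a full carry-propagating adder; every addition inside the multiplication loop stays short and fast.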

Scalability
- The original Montgomery multiplication algorithm performs full-precision integer additions
  - Not scalable
- Instead, long integers are divided into words
- Additions of words are handled separately on word adders
- The choice of word length depends on the precision, area and speed requirements
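The idea of splitting the operands into w-bit words can be sketched in software as follows. This is a sequential model under stated assumptions (odd p < 2^k, operands below p, hypothetical function name `mon_mul_words`); it shows the word-by-word additions and shifts, not the pipelined schedule of the hardware.

```python
def mon_mul_words(a, b, p, k, w):
    """Radix-2 Montgomery multiplication using only w-bit word additions.

    Returns a*b*2^(-k) mod p, assuming p is odd, p < 2**k and 0 <= a, b < p.
    """
    e = -(-(k + 1) // w)                 # words needed for intermediate values < 2p
    mask = (1 << w) - 1
    B = [(b >> (w * j)) & mask for j in range(e)]
    P = [(p >> (w * j)) & mask for j in range(e)]
    S = [0] * e                          # running result, least significant word first
    for i in range(k):
        a_i = (a >> i) & 1
        q = (S[0] + a_i * B[0]) & 1      # parity decides whether p is added
        carry = 0
        for j in range(e):               # one word addition per step
            t = S[j] + a_i * B[j] + q * P[j] + carry
            S[j] = t & mask
            carry = t >> w
        for j in range(e):               # shift right by one bit across word boundaries
            upper = S[j + 1] if j + 1 < e else carry
            S[j] = (S[j] >> 1) | ((upper & 1) << (w - 1))
    c = sum(S[j] << (w * j) for j in range(e))
    return c - p if c >= p else c

p_ex, k_ex = (1 << 61) - 1, 61
a_ex, b_ex = 0x123456789ABCDEF, 0x0FEDCBA987654321
r16 = mon_mul_words(a_ex, b_ex, p_ex, k_ex, 16)
```

The same function works for any word size w, which is the sense in which the design scales: the adder width is fixed, and longer operands only mean more words per pass.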

Word-Based Multiplication
[Figure: dataflow between processing units PU_i and PU_(i+1), with multiplier bits a_i and a_(i+1), operand words b(j), p(j), c(j) and b(j+1), p(j+1), c(j+1), and the bit-level alignment of c(j) and c(j+1) from bit 0 to bit w-1.]

Dependency Graph
[Figure]

Processing Unit (PU) with w = 2
[Figure: processing unit built from dual-field adders controlled by FSEL, producing C1(j) and C0(j).]

Dual-Field Adder (DFA) 1/2
- Almost identical to a full adder (FA)
- Difference: it has an additional (control) input, FSEL, which suppresses the carry output of the adder when it is set to logic 0
- Namely, when FSEL = 0 the adder operates in GF(2^k); otherwise it behaves as a regular FA
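A behavioural model of the dual-field adder is short enough to write out in full. The function name is hypothetical and the gate-level structure of the actual cell may differ; only the input/output behaviour described above is modelled.

```python
def dual_field_adder(a, b, c, fsel):
    """One-bit dual-field adder: returns (sum, carry_out).

    fsel = 1: regular full adder (GF(p) mode).
    fsel = 0: carry output suppressed, so the sum is a plain XOR (GF(2^k) mode).
    """
    s = a ^ b ^ c
    cout = fsel & ((a & b) | (a & c) | (b & c))
    return s, cout

# Print the truth table for both modes.
for fsel in (1, 0):
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                s, cout = dual_field_adder(a, b, c, fsel)
                print(fsel, a, b, c, "->", s, cout)
```

The only addition to a full adder is the AND with FSEL on the carry output, which is consistent with the small area overhead reported for the unified design.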

DFA 2/2
[Figure: dual-field adder with inputs A, B, C and FSEL, and outputs S and Cout.]

Pipeline Organization with Two PUs
[Figure: RAM-a, RAM-b and RAM-p feed a pipeline of PU-1 and PU-2 through shift registers SR-a and SR-C.]
s: the number of PUs

Total Computation Time (in clock cycles)
[Formula not preserved in the transcript]
w: word size, k: precision, e := ⌈k/w⌉, s: the number of PUs

Example Execution Times
Example: k = 1024, w = 32
- s = 17 → T = 2105
- s = 15 → T = 2305
- s = 10 → T = 3415
- s = 1 → T = 33792
Example: k = 2048, w = 32
- s = 33 → T = 4221
- s = 30 → T = 4543
- s = 10 → T = 13343
- s = 1 → T = 133120
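The cycle-count formula itself is missing from the transcript. The sketch below assumes the pipeline timing formula usually quoted for the scalable Montgomery multiplier of Tenca and Koç; both the formula and that attribution are assumptions, and the function name is hypothetical. It reproduces seven of the eight example values above; for k = 1024, s = 10 it yields 3417 rather than the listed 3415.

```python
def total_cycles(k, w, s):
    """Assumed clock-cycle count for a k-bit multiplication with s PUs and w-bit words."""
    e = -(-k // w)               # number of words, ceil(k / w)
    passes = -(-k // s)          # each pass through the pipeline handles s multiplier bits
    if e + 1 > 2 * s:
        # More words than the pipeline holds: each pass takes e + 1 cycles.
        return passes * (e + 1) + 2 * (s - 1)
    # Pipeline at least as long as the word stream: PUs wait, each pass takes 2s cycles.
    return passes * 2 * s + e - 1

for k, s_values in ((1024, (17, 15, 10, 1)), (2048, (33, 30, 10, 1))):
    for s in s_values:
        print(f"k={k} w=32 s={s} T={total_cycles(k, 32, s)}")
```

Under this model, adding PUs beyond roughly (e + 1)/2 brings little further gain, which matches the small difference between s = 15 and s = 17 in the first example.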

Comparison to the Single-Field (GF(p)) Design

                        GF(p) only   Unified   Overhead
Cell area               47.2w        48.5w     2.75%
Cell propagation time   11 ns        11 ns     0%

w: word size; 1.2 μm CMOS technology

Design Alternatives: Higher Radix
- The original design is radix 2: the multiplier bits are scanned one bit per clock cycle
- It is possible to scan two or more bits of the multiplier a per cycle
  - Radix-4: two bits
  - Radix-8: three bits
- More complex design: lower clock frequency, larger area
- Lower clock cycle count → faster execution of multiplication
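A software sketch of digit-serial (radix-2^r) Montgomery multiplication shows how scanning r bits per step shortens the loop. It uses the standard textbook formulation with a precomputed constant -p^(-1) mod 2^r; the function name and parameters are assumptions, and the hardware design in the slides may select its quotient digit differently.

```python
def mon_mul_radix(a, b, p, k, r):
    """Radix-2^r Montgomery multiplication: returns a*b*2^(-r*n) mod p, n = ceil(k/r).

    Assumes p is odd, p < 2**k and 0 <= a, b < p. Needs Python 3.8+ for pow(p, -1, m).
    """
    base = 1 << r
    mask = base - 1
    neg_p_inv = (-pow(p, -1, base)) % base   # -p^(-1) mod 2^r, computed once
    n = -(-k // r)                           # loop iterations: k/r instead of k
    c = 0
    for i in range(n):
        a_i = (a >> (r * i)) & mask          # scan r multiplier bits at once
        c += a_i * b
        q = ((c & mask) * neg_p_inv) & mask  # digit q that makes c + q*p divisible by 2^r
        c = (c + q * p) >> r                 # exact division by 2^r
    if c >= p:                               # c < 2p here, one subtraction suffices
        c -= p
    return c

p_hr, k_hr = (1 << 61) - 1, 61
a_hr, b_hr = 123456789012345, 987654321098765
radix8 = mon_mul_radix(a_hr, b_hr, p_hr, k_hr, 3)   # 21 iterations instead of 61
```

Each iteration now needs a small multiplication by the digits a_i and q, which is the extra area and delay that the trade-off above refers to.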

Comparison
- Higher radix vs. single radix
- Metric: area × time
- For a small total area (i.e. < 10,000 equivalent NAND gates), the performance of radix-2 and radix-8 is comparable
- The radix-8 multiplier outperforms the radix-2 multiplier by more than 3 times when the total area is around 25,000 NAND gates

Dual-Radix Multiplier
- Radix-2 for GF(p) and radix-4 for GF(2^k)
[Figure: datapath with MUX-1, MUX-2, selection logic and a 3x2 dual-field adder.]

Dual-Radix Multiplier
Three multipliers:
- A1: GF(p)-only multiplier
- A2: single-radix unified multiplier (with precomputation)
- A3: dual-radix multiplier
Performance (area × time):
- A3 performs slightly worse than A1 and A2 (by between 7% and 19%) in GF(p) mode
- A3 outperforms A2 by 38% to 46% in GF(2^k) mode

Unified Arithmetic?
- Unified multiplier: carry-save adders are used in the multiplier
- With carry-save representation it is not easy to perform other arithmetic operations, such as subtraction and comparison (both essential in inversion)

New Redundant Representation
- Recall the carry-save representation: X = xs + xc
- New redundant representation: redundant signed digit (RSD) representation, X = xp - xn
- Subtraction is equivalent to addition: X - Y = (xp - xn) - (yp - yn) = (xp - xn) + (yn - yp)
- Comparison is relatively easy
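The subtraction identity can be illustrated with a small model in which a value is held as a pair (xp, xn). The function names are hypothetical, and the componentwise integer addition used here only stands in for the digit-wise, carry-free RSD adder of a real datapath.

```python
def rsd_value(x):
    """Convert an RSD pair (xp, xn) back to an ordinary integer, X = xp - xn."""
    xp, xn = x
    return xp - xn

def rsd_neg(x):
    """Negation swaps the positive and negative parts; no arithmetic is needed."""
    xp, xn = x
    return (xn, xp)

def rsd_add(x, y):
    """Add two RSD values part by part (stand-in for a carry-free RSD adder)."""
    return (x[0] + y[0], x[1] + y[1])

def rsd_sub(x, y):
    """X - Y = (xp - xn) + (yn - yp): subtraction is addition of the negated operand."""
    return rsd_add(x, rsd_neg(y))

X = (25, 4)                      # represents 21
Y = (3, 10)                      # represents -7
difference = rsd_value(rsd_sub(X, Y))
```

Since negation is just a swap of the two parts, the same adder serves for both addition and subtraction, which carry-save representation does not offer directly.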

RSD
- All previous multipliers require a reverse transformation to non-redundant form after each multiplication
- There are thousands of multiplications in ECC
- With RSD, all the computation can be done in RSD form without any reverse transformation
- A single transformation is necessary only if the result is needed in non-redundant form

Support for GF(3^n) Arithmetic
- RSD lends itself to a unified arithmetic architecture that efficiently supports GF(3^n) arithmetic

Analysis
- A1: GF(p)-only architecture
- A2: GF(2^k)-only architecture
- A3: GF(3^n)-only architecture
- A4: Unified architecture (GF(p) + GF(2^k))
- A5: Unified architecture (GF(p) + GF(2^k) + GF(3^n))
- A1 + A2: Hypothetical architecture that has separate datapaths for GF(p) and GF(2^k)

Analysis
Metric: area × time
- A4 over A1 + A2: 7.94%
- A5 over A1 + A2 + A3: 33.54%
- A5 over A4 + A3: 28.36%

Implementation Results
2.38 GHz, 0.13 μm CMOS

# of PUs   160-bit ECC (μs)   1024-bit RSA (ms)   Tate pairing, GF(3^97)
4          315                21.0                508
8          210                10.5                334
16         189                5.25                -
32         -                  2.12                -

4 PUs → ~11,000 NAND gates; 8 PUs → ~15,000 NAND gates

Research Directions
- Embed the unified architectures into common general-purpose processors
- Unified inversion using RSD
- Unified architectures for other PKC

Ending…
- Questions
- Contact: Erkay Savaş, erkays@sabanciuniv.edu, http://people.sabanciuniv.edu/~erkays