Elliptic Curve Cryptography over GF(2m) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative Study Kris Gaj, Sashisu Bajracharya, Nghi Nguyen, Deapesh Misra Tarek El-Ghazawi George Mason University The George Washington University Gaj 1
What is a reconfigurable computer? Reconfigurable processor system Microprocessor system P . . . P . . . FPGA FPGA P memory P memory FPGA memory FPGA memory . . . . . . Interface Interface I/O I/O Gaj 2
Why cryptography is a good application for reconfigurable computers? computationally intensive arithmetic operations unconventionally long operand sizes (160-2048 bits) multiple algorithms, parameters, key sizes, and architectures = need for reconfiguration
SRC Hardware & Software Gaj 4
SRC-6E from SRC Computers, Inc. Gaj 5
SRC Hardware Architecture Gaj 6
SRC Programming SRC HDL (VHDL) HLL (C) P system FPGA system Library Developer Application Programmer Gaj 7
SRC Compilation Process Gaj 8
Elliptic Curve Cryptosystems Gaj 9
Elliptic Curve Cryptosystems public key (asymmetric) cryptosystems first true alternative for RSA several times shorter keys fast and compact implementations, in particular in hardware a family of cryptosystems, instead of a single cryptosystem Gaj 10
Three Classes of Elliptic Curves Elliptic curves built over Secure m m=155 .. 512 K = GF(p) K = GF(2m) Our m m=233 Arithmetic operations present in many libraries Polynomial basis representation Normal basis representation Fast in hardware Compact in hardware Gaj 11
Basic operations of ECC Basic operations in Galois Field GF(2m) addition and subtraction (xor): x+y, x-y (XOR) multiplication, squaring: x y, x2 inversion: x-1 Basic operations on points of an Elliptic Curve addition of points: P + Q doubling a point: 2 P projective to affine coordinate: P2A Complex operations on points of an Elliptic Curve scalar multiplication: k P = P + P + …+P Gaj 12 k times
ECC hierarchy of functions High level kP projective_to_affine (P2A) Medium level P+Q 2P Low level 2 INV Low level 1 XOR MUL SQR independent of the GF representation specific to the given GF representation Gaj 13
Investigated Partitioning Schemes Gaj 14
SRC Program Partitioning C function for P P system HLL C function for MAP FPGA system VHDL macro HDL Gaj 15
H00 Partitioning (μP Software Only) C function for P kP H C function for MAP VHDL macro specific to the given GF representation Gaj 16
00H Partitioning (VHDL only) C function for P C function for MAP VHDL macro H kP specific to the given GF representation Gaj 17
0HL1 Partitioning C function for MAP VHDL macros for µ P H L1 kP MUL4 C function for MAP VHDL macros ROT SQR XOR for µ P H L1 V_ POW kP INV P2A P+Q 2P MUL2 MUL independent of the GF representation specific to the given GF representation Gaj 18
FPGA Contents (0HL1) kP MUL4 P+Q 2P P2A MUL2 INV MUL MUL MUL MUL MUL POW INV P2A Gaj 19
0HL2 Partitioning C function for µ P kP kP C function H for MAP VHDL for µ P kP kP C function H P+Q 2P for MAP P+Q P+Q 2P 2P P2A P2A VHDL MUL2 L2 MUL4 SQR ROT XOR INV INV macros independent of the GF representation specific to the given GF representation Gaj 20
0HM Partitioning C function for µ P C function kP H for MAP VHDL P+Q C function for µ P C function kP H for MAP VHDL P+Q P+Q 2P 2P P2A P2A M macros independent of the GF representation specific to the given GF representation Gaj 21
Results Gaj 22
Timing Measurements End-to-End time (SW) MAP function MAP function MAP .c file .mc file MAP function MAP function MAP Alloc. FPGA Configure DMA Data In FPGA Computation DMA DataOut MAP Free End-to-End time (HW) End-to-End time (SW) MAP Allocation time Configuration time MAP Release Time Gaj 23
Results for Optimal Normal Basis (Latency) Gaj 24
Results for Polynomial Basis (Latency) Gaj 25
Results for Optimal Normal Basis (Area) Gaj 26
Results for the Polynomial Basis (Area) Gaj 27
Algorithm Partitioning Scheme Number of lines of code Algorithm Partitioning Scheme VHDL PB ONB Macro Wrapper MAP C Main C 0HL1 N/A 1007 260 371 153 0HL2 714 1291 230 349 0HM 1744 160 185 00H 1960 36 78 Gaj 28
Conclusions Assuming focus on: Timing Resources Ease of programming Gaj 29
Large speedup vs. software Ease of implementation Best implementation approaches Optimal Normal Basis OHL1 scheme Polynomial Basis OHL2 scheme Large speedup vs. software Ease of implementation Flexibility Gaj 30
Conclusions Elliptic Curve Cryptosystem implementation challenging for reconfigurable computers because of optimization for latency rather than throughput limited amount of parallelism Absolute latency and resource utilization similar for Optimal Normal Basis and Polynomial Basis Number of lines of VHDL code smaller for the polynomial basis representation Gaj 31
Conclusions – cont. Speed-up over Intel P3 microprocessor implementation From 893 to 1305 times for Optimal Normal Basis From 33 times for Polynomial Basis 27 x greater for Optimal Normal Basis compared to Polynomial Basis Representation because of polynomial basis operations more efficient in software Gaj 32