Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative Study Kris Gaj, Sashisu Bajracharya, Nghi Nguyen, Deapesh Misra Tarek El-Ghazawi The George Washington University George Mason University
Gaj2P230/MAPLD 2004 Interface P memory P memory... PP PP I/O Interface FPGA memory FPGA memory... FPGA... I/O Microprocessor system Reconfigurable processor system What is a reconfigurable computer?
P230/MAPLD 2004 Why cryptography is a good application for reconfigurable computers? computationally intensive arithmetic operations unconventionally long operand sizes ( bits) multiple algorithms, parameters, key sizes, and architectures = need for reconfiguration
Gaj4P230/MAPLD 2004 SRC Hardware & Software
Gaj5P230/MAPLD 2004 SRC-6E from SRC Computers, Inc.
Gaj6P230/MAPLD 2004 SRC Hardware Architecture
Gaj7P230/MAPLD 2004 SRC Programming HLL (C) HDL (VHDL) SRC P system FPGA system Application Programmer Library Developer
Gaj8P230/MAPLD 2004 SRC Compilation Process
Gaj9P230/MAPLD 2004 Elliptic Curve Cryptosystems
Gaj10P230/MAPLD 2004 Elliptic Curve Cryptosystems public key (asymmetric) cryptosystems first true alternative for RSA several times shorter keys fast and compact implementations, in particular in hardware a family of cryptosystems, instead of a single cryptosystem
Gaj11P230/MAPLD 2004 Three Classes of Elliptic Curves Elliptic curves built over K = GF(p) K = GF(2 m ) Polynomial basis representation Normal basis representation Fast in hardware Arithmetic operations present in many libraries Compact in hardware m= Secure m Our m m=233
Gaj12P230/MAPLD 2004 Basic operations of ECC Basic operations in Galois Field GF(2 m ) Basic operations on points of an Elliptic Curve addition and subtraction (xor): x+y, x-y (XOR) addition of points: P + Q doubling a point: 2 P projective to affine coordinate: P2A multiplication, squaring: x y, x 2 inversion: x -1 Complex operations on points of an Elliptic Curve scalar multiplication: k P = P + P + …+P k times
Gaj13P230/MAPLD 2004 ECC hierarchy of functions kP P+Q2P projective_to_affine (P2A) MUL INV High level Medium level Low level 2 SQR XOR Low level 1 independent of the GF representation specific to the given GF representation
Gaj14P230/MAPLD 2004 Investigated Partitioning Schemes
Gaj15P230/MAPLD 2004 C function for P C function for MAP VHDL macro SRC Program Partitioning P system FPGA system HLL HDL
Gaj16P230/MAPLD 2004 kP C function for P C function for MAP VHDL macro H 0 0 H00 Partitioning (μP Software Only) specific to the given GF representation
Gaj17P230/MAPLD 2004 kP 0 0 H 00H Partitioning (VHDL only) C function for P C function for MAP VHDL macro specific to the given GF representation
Gaj18P230/MAPLD 2004 MUL4 C function for MAP VHDL macros ROTSQRXOR C function forµP 0 H L1 V_ ROT POW kP INV P2A P+Q2P kP INV P2A P+Q 2P MUL2 MUL 0HL1 Partitioning independent of the GF representation specific to the given GF representation
Gaj19P230/MAPLD 2004 MUL P+Q2P INV P2A kP MUL MUL4 MUL2 MUL POW FPGA Contents (0HL1)
Gaj20P230/MAPLD 2004 MUL4 C function for MAP VHDL macros ROTSQRXOR C function forµP 0 H L2 INV kP P2A P+Q2P kP P2AP+Q 2P MUL2 0HL2 Partitioning independent of the GF representation specific to the given GF representation
Gaj21P230/MAPLD HM Partitioning C function for MAP VHDL macros C function forµP 0 H M P+Q 2P P2A kP independent of the GF representation specific to the given GF representation
Gaj22P230/MAPLD 2004 Results
Gaj23P230/MAPLD 2004 Timing Measurements MAP Alloc. MAP Free DMA DataOut DMA Data In FPGA Computation.c file.mc file End-to-End time (SW) MAP function MAP function FPGA Configure Configuration time MAP Allocation time MAP Release Time End-to-End time (HW)
Gaj24P230/MAPLD 2004 Results for Optimal Normal Basis (Latency)
Gaj25P230/MAPLD 2004 Results for Polynomial Basis (Latency)
Gaj26P230/MAPLD 2004 Results for Optimal Normal Basis (Area)
Gaj27P230/MAPLD 2004 Results for the Polynomial Basis (Area)
Gaj28P230/MAPLD 2004 Algorithm Partitioning Scheme VHDL PB VHDL ONB Macro Wrapper MAP CMain C 0HL1N/A HL HMN/A HN/A Number of lines of code
Gaj29P230/MAPLD 2004 Conclusions Assuming focus on: Timing Resources Ease of programming
Gaj30P230/MAPLD 2004 Best implementation approaches Large speedup vs. software Ease of implementation Flexibility Optimal Normal Basis OHL1 scheme Polynomial Basis OHL2 scheme
Gaj31P230/MAPLD 2004 Conclusions Elliptic Curve Cryptosystem implementation challenging for reconfigurable computers because of optimization for latency rather than throughput limited amount of parallelism Absolute latency and resource utilization similar for Optimal Normal Basis and Polynomial Basis Number of lines of VHDL code smaller for the polynomial basis representation
Gaj32P230/MAPLD 2004 Conclusions – cont. because of polynomial basis operations more efficient in software 27 x greater for Optimal Normal Basis compared to Polynomial Basis Representation From 893 to 1305 times for Optimal Normal Basis From 33 times for Polynomial Basis Speed-up over Intel P3 microprocessor implementation