A 1.5 GHz AWP Elliptic Curve Crypto Chip
O. Hauck, S. A. Huss, ICSLAB, TU Darmstadt
A. Katoch, Philips Research
2 Outline
- Current AWP projects
- GATS-Chip
- Elliptic Curve Chip
  - AWPs compared to sync wave pipes
  - SRCMOS circuits
  - Crypto background
  - Architecture and Implementation
- Conclusion
3 Status of AWP Projects
- 2D-DCT: 0.6 µm, being redesigned with self-resetting logic
- SRT: schematics only so far
- 64-bit Giga-Hertz Adder Test Site: 0.6 µm, almost complete, tape-out in May
- Crypto chip: 0.35 µm, tape-out targeted for July
4 Giga-Hertz Adder Test Site
- AMS 0.6 µm CMOS, 3 metal layers
- 64-bit Brent-Kung adder
- ~10k devices, ~1.3 mm²
- Latency ~2.5 ns
- Cycle time 1.0 ns
- On-chip test circuitry
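The latency and cycle time above imply how many additions the wave-pipelined adder holds at once. A small sketch, assuming one operand pair enters per cycle; the helper name is hypothetical:

```python
import math


def waves_in_flight(latency_ns, cycle_ns):
    """Return (average, maximum) number of data waves inside the logic.

    Assumes one new operand pair enters every cycle_ns and each needs
    latency_ns to reach the output (Little's law gives the average).
    """
    average = latency_ns / cycle_ns
    # Waves launched in the half-open window (t - latency, t] are still inside.
    maximum = math.ceil(average)
    return average, maximum


LATENCY_NS, CYCLE_NS = 2.5, 1.0          # values from the test-site slide
avg, peak = waves_in_flight(LATENCY_NS, CYCLE_NS)
throughput_ghz = 1.0 / CYCLE_NS          # one result per cycle
```

With these values, on average 2.5 and at most 3 additions are inside the adder at the same time, with no intermediate registers separating them.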
5 General Framework for Pipelines
[Figure: logic blocks separated by latches/registers; signals Data and Clk]
6 Some Notations...
7 General Relations
8 Synchronous Wave Pipeline
[Figure: wave logic between latches/registers; signals Data and Clk]
- Promise: higher throughput with reduced latency, clock load, area and power
- Drawback: logic and delay elements are difficult to tune
- Only discrete, distinct ranges of valid clock frequencies
- Narrow valid frequency ranges (figure axis: low to high frequency)
- Not suitable for system design
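The discrete, distinct valid frequency ranges follow from the two-sided timing constraints of a synchronous wave pipeline. A minimal sketch under simplifying assumptions (no clock skew, clock-to-output delay folded into the path delays, output sampled N cycles after launch): for N waves the period T must satisfy (D_max + t_setup) / N ≤ T ≤ (D_min - t_hold) / (N - 1). The delay values are illustrative, not taken from these chips:

```python
import math


def valid_windows(d_max, d_min, t_setup, t_hold, max_waves):
    """Valid clock-period windows of a synchronous wave pipeline.

    Simplified model: a wave launched at t = 0 is sampled at t = N*T.
    It must be stable by then (d_max + t_setup <= N*T), and the next
    wave, launched at t = T, must not arrive before the hold time ends
    (T + d_min >= N*T + t_hold). Returns a list of (N, T_min, T_max).
    """
    windows = []
    slack = d_min - t_hold
    for n in range(1, max_waves + 1):
        t_lo = (d_max + t_setup) / n
        if n == 1:
            # Only the plain hold condition d_min >= t_hold remains.
            t_hi = math.inf if slack >= 0 else -math.inf
        else:
            t_hi = slack / (n - 1)
        if t_lo <= t_hi:
            windows.append((n, t_lo, t_hi))
    return windows


def is_valid(period, windows):
    """True if the clock period falls into any valid window."""
    return any(t_lo <= period <= t_hi for _, t_lo, t_hi in windows)


# Hypothetical delays in ns.
WINDOWS = valid_windows(d_max=2.5, d_min=2.2, t_setup=0.1, t_hold=0.1, max_waves=8)
```

With these numbers the valid periods are [2.6, ∞), [1.3, 2.1], about [0.87, 1.05], [0.65, 0.70] and [0.52, 0.525] ns: a 2.3 ns period falls into a gap although the faster 1.5 ns works, and the windows shrink as more waves are packed into the logic.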
9 Synchronous Pipeline
- Throughput is determined by the longest logic path plus the clock/register overhead
- Fine-grain pipelining allows high throughput at the cost of increased clock/register overhead
[Figure: logic blocks separated by latches/registers; signals Data and Clk]
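The trade-off in the second bullet can be put in numbers. A sketch assuming the logic splits into equal stages and every stage pays the same fixed clock/register overhead; the 10 ns and 0.5 ns figures are hypothetical:

```python
def sync_pipeline_cycle(total_logic_ns, stages, overhead_ns):
    """Clock period of a conventional pipeline with the logic split evenly.

    Each stage holds total_logic_ns / stages of logic and pays the same
    clock/register overhead (setup, clock-to-Q, skew) on top.
    """
    return total_logic_ns / stages + overhead_ns


def logic_fraction(total_logic_ns, stages, overhead_ns):
    """Share of each clock period spent in logic rather than in overhead."""
    per_stage = total_logic_ns / stages
    return per_stage / (per_stage + overhead_ns)


# Hypothetical design: 10 ns of logic, 0.5 ns overhead per register stage.
TABLE = [(k, sync_pipeline_cycle(10.0, k, 0.5), logic_fraction(10.0, k, 0.5))
         for k in (1, 4, 20)]
```

Going from 1 to 20 stages shrinks the period from 10.5 ns to 1.0 ns, but at 20 stages half of every cycle is overhead, and the period can never drop below the 0.5 ns overhead however finely the logic is cut.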
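10 Asynchronous Wave Pipeline (AWP)
[Figure: wave logic, wave latch, Data; request path with req_in, matched delay, req_out]
- Several data waves and their request signals propagate coherently through the logic
- One-sided cycle time constraint
- The matched delay must track the logic delay across process, temperature and voltage (PTV) corners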
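A simplified timing model makes the one-sided constraint concrete. Assumptions (ours, not the slides'): the request passes through a matched delay d_match and opens the wave latch on arrival, there are no other overheads, and the per-corner delays are invented for illustration:

```python
def awp_min_cycle(d_max, d_min, d_match, t_setup, t_hold):
    """Lower bound on the cycle time of one AWP stage (simplified model).

    The matched delay must cover the slowest data path plus setup, and
    the next wave (launched one cycle later, arriving after at least
    d_min) must not reach the latch before its hold time has passed.
    There is no upper bound: any longer cycle is also valid.
    """
    if d_match < d_max + t_setup:
        raise ValueError("matched delay does not track the slowest logic path")
    return d_match - d_min + t_hold


# Hypothetical (d_max, d_min, d_match) in ns per corner.
CORNERS = {
    "fast": (1.8, 1.6, 1.95),
    "typical": (2.5, 2.2, 2.7),
    "slow": (3.4, 3.0, 3.65),
}
T_SETUP = T_HOLD = 0.1

MIN_CYCLE = {name: awp_min_cycle(*d, T_SETUP, T_HOLD) for name, d in CORNERS.items()}
WORST = max(MIN_CYCLE.values())  # any cycle time >= WORST works at every corner
```

In this model the only requirement is a minimum cycle time, unlike the disjoint windows of a synchronous wave pipeline; the price is that the matched delay must remain at least as slow as the logic at every corner.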
11 Example: 64-bit Brent-Kung Parallel Adder
[Figure: adder columns labelled pg, PG, G and xor]
- Buffers give every logic path the same depth
- All gates in the same column must have the same delay
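A behavioural sketch of the prefix structure, useful for checking the bookkeeping rather than modelling the circuit: bit generate/propagate, a Brent-Kung up-sweep and down-sweep, then the XOR sum column. The level counter shows the 2·log2(n) - 1 prefix levels; in a wave-pipelined layout the bit positions idle at a level would presumably get the depth-equalizing buffers. Names are hypothetical:

```python
def brent_kung_add(a, b, n=64):
    """Add two n-bit integers through a Brent-Kung parallel-prefix network.

    Returns (sum mod 2**n, carry_out, prefix_levels). Behavioural model:
    each pass of a sweep loop stands for one column of prefix cells.
    """
    assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # bit propagate
    G, P = g[:], p[:]                                  # group generate/propagate

    def combine(i, j):
        # Prefix operator: merge the lower-order group ending at j into group i.
        G[i] |= P[i] & G[j]
        P[i] &= P[j]

    levels = 0
    d = 1
    while d < n:                  # up-sweep: log2(n) levels
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        levels += 1
        d *= 2
    d = n // 4
    while d >= 1:                 # down-sweep: log2(n) - 1 levels
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        levels += 1
        d //= 2

    # Carry into bit i is the group generate of bits 0..i-1 (carry-in = 0).
    carries = [0] + G[:-1]
    s = 0
    for i in range(n):
        s |= (p[i] ^ carries[i]) << i                  # final XOR column
    return s, G[n - 1], levels
```

For n = 64 this gives 6 up-sweep and 5 down-sweep levels, 11 prefix levels between the pg and xor columns.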
12 Circuits
- The logic style must minimize delay variation
- Earlier work focused on bipolar logic (ECL, CML), but CMOS is mainstream
- Static CMOS is not well suited for wave pipelining; fixing this costs more power and lowers speed
- Pass-transistor logic produces slow, shallow edges, which introduce delay variation
- Dynamic logic is attractive because only the rising output transition is data-dependent; the output is pulled down by the precharge
- What is needed is a dynamic logic family without precharge overhead: SRCMOS
13 SRCMOS
- Distinguishing property of our SRCMOS circuits: the precharge feedback is fully local, and the NMOS trees are delay-balanced
[Figure: SRCMOS gate with N inputs and one output]
14 Operation of a 2-AND
15 Cisco Data Encryption Service Adapter [Cisco Systems]
16 DES Key Exchange Using a Public-Key Cryptosystem Based on Elliptic Curves
17 Why is this secure?
- Security is based on the discrete logarithm problem (DLP): in a finite Abelian group we can easily compute Q = kP given k and P
- However, it is hard to compute k from P and Q
- The DLP is extraordinarily hard in the point group of an elliptic curve: the solutions of a nonsingular cubic equation over a field, together with the point at infinity, form an Abelian group
18 Elliptic Curve Mathematics and Algorithm
- Two types: supersingular and non-supersingular
- Non-supersingular curves offer the highest security
- EC equation:
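The curve equation itself did not survive extraction. For curves over GF(2^m), the field the normal-basis slides presumably work in, the standard forms of the two types are as follows; the chip's parameters are not stated:

```latex
% Non-supersingular curve over GF(2^m)
E:\; y^2 + xy = x^3 + a x^2 + b, \qquad a, b \in \mathrm{GF}(2^m),\; b \neq 0
% Supersingular curve over GF(2^m)
E:\; y^2 + c\,y = x^3 + a x + b, \qquad a, b, c \in \mathrm{GF}(2^m),\; c \neq 0
```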
19 Adding Two Points Over Elliptic Curves
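For the non-supersingular form y^2 + xy = x^3 + ax^2 + b over GF(2^m), the textbook affine group law reads as follows. This is a sketch of the standard formulas, not necessarily the form shown on the slide; affine coordinates need one field inversion per operation, which projective variants trade for extra multiplications:

```latex
% Negation: -P = (x_1, x_1 + y_1); \mathcal{O} is the point at infinity.
% Addition, P = (x_1, y_1) \neq \pm Q = (x_2, y_2):
\lambda = \frac{y_1 + y_2}{x_1 + x_2}, \quad
x_3 = \lambda^2 + \lambda + x_1 + x_2 + a, \quad
y_3 = \lambda\,(x_1 + x_3) + x_3 + y_1
% Doubling, P = Q with x_1 \neq 0 (if x_1 = 0 then 2P = \mathcal{O}):
\lambda = x_1 + \frac{y_1}{x_1}, \quad
x_3 = \lambda^2 + \lambda + a, \quad
y_3 = x_1^2 + (\lambda + 1)\, x_3
```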
20 Optimal Normal Basis
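One property that makes normal bases attractive in hardware follows in two lines: in characteristic 2 squaring is linear, so in a normal basis {β, β^2, β^4, ...} it only rotates the coordinate vector (the standard argument, stated here as background rather than the slide's content):

```latex
A = \sum_{i=0}^{m-1} a_i\, \beta^{2^i}
\quad\Longrightarrow\quad
A^2 = \sum_{i=0}^{m-1} a_i\, \beta^{2^{i+1}}
    = \sum_{i=0}^{m-1} a_{i-1}\, \beta^{2^i},
\qquad \text{indices mod } m,\ \beta^{2^m} = \beta
```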
21 Multiplication over ONBs
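The multiplication details of this slide are not recoverable. As an illustration, here is one textbook way to multiply in a type II optimal normal basis, writing the basis as β_i = γ^i + γ^(-i) with γ a primitive p-th root of unity, p = 2m + 1. Whether the chip uses a type II ONB, and for which m, is not stated; m = 5 is a toy size and all names are hypothetical:

```python
from itertools import product


def onb2_mul(a, b):
    """Multiply two GF(2^m) elements given in a type II optimal normal basis.

    Elements are 0/1 lists of length m; position i-1 holds the coefficient
    of beta_i = gamma**i + gamma**(-i), with p = 2m + 1 assumed prime and a
    type II ONB assumed to exist for m. Uses beta_i * beta_j =
    beta_{i+j} + beta_{i-j}, where beta_k = beta_{p-k} and beta_0 = 0
    (gamma**0 + gamma**0 = 0 in characteristic 2).
    """
    m = len(a)
    p = 2 * m + 1
    c = [0] * m
    for i in range(1, m + 1):
        if not a[i - 1]:
            continue
        for j in range(1, m + 1):
            if not b[j - 1]:
                continue
            for k in ((i + j) % p, (i - j) % p):
                k = min(k, p - k)      # fold beta_k onto beta_{p-k}
                if k:                  # beta_0 vanishes
                    c[k - 1] ^= 1
    return c


def onb2_square(a):
    """Squaring only permutes coordinates: beta_i**2 = beta_{2i mod p}."""
    m = len(a)
    p = 2 * m + 1
    c = [0] * m
    for i in range(1, m + 1):
        k = (2 * i) % p
        c[min(k, p - k) - 1] = a[i - 1]
    return c


def onb2_pow(a, e):
    """Square-and-multiply; the all-ones vector is the field element 1."""
    result = [1] * len(a)
    base = a
    while e:
        if e & 1:
            result = onb2_mul(result, base)
        base = onb2_square(base)
        e >>= 1
    return result


M = 5  # p = 11 is prime and 2 is a primitive root mod 11, so a type II ONB exists
ELEMENTS = [list(bits) for bits in product((0, 1), repeat=M)]
```

Because {β_1, ..., β_m} is only a reordering of the normal basis {β, β^2, β^4, ...}, onb2_square is a fixed wiring permutation (a cyclic shift in the conventional ordering), while a general product needs the double loop.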
22 The Final Formula
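The general normal-basis product is usually written in Massey-Omura form: one fixed bilinear function yields coefficient c_0, and every other coefficient reuses it on rotated operands, which suits an array of identical cells. This is the textbook formula; whether it matches the slide's final formula cannot be confirmed from the extracted text:

```latex
\beta^{2^i}\beta^{2^j} = \sum_{k=0}^{m-1} \lambda^{(k)}_{ij}\,\beta^{2^k},
\qquad
\lambda^{(k)}_{ij} = \lambda^{(0)}_{i-k,\,j-k}
\\[4pt]
c_k = \sum_{i=0}^{m-1}\sum_{j=0}^{m-1} \lambda_{ij}\, a_{i+k}\, b_{j+k},
\qquad \lambda_{ij} := \lambda^{(0)}_{ij},\ \text{indices mod } m
```

For an optimal normal basis the matrix (λ_ij) has exactly 2m - 1 nonzero entries, the minimum possible, which keeps each cell small.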
23 Architecture of Multiplier
[Figure: multiplier datapath; labels a, b, x, _Xor, delay, wave latch, pseudo-NMOS, SRCMOS, request]
24 Dual-rail Circuits
- Dual-rail cross-coupled SRCMOS circuit
- NMOS trees are designed so that there is only one conducting path to ground
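A small combinational check of what "only one conducting path to ground" means. The pull-down trees below are illustrative, not the chip's; the underlying assumption is that two parallel conducting paths discharge a node faster than one, so a varying path count makes the delay data-dependent:

```python
from itertools import product

# Each tree maps an output rail to a list of series NMOS stacks (tuples of
# input rails); stacks in the same list are in parallel.
XOR_TREES = {
    "out_t": [("a_t", "b_f"), ("a_f", "b_t")],
    "out_f": [("a_t", "b_t"), ("a_f", "b_f")],
}
AND_TREES_NAIVE = {
    "out_t": [("a_t", "b_t")],
    "out_f": [("a_f",), ("b_f",)],          # a = b = 0 opens two parallel paths
}
AND_TREES_SINGLE = {
    "out_t": [("a_t", "b_t")],
    "out_f": [("a_f",), ("a_t", "b_f")],    # exactly one path for every input
}


def input_rails(a, b):
    """Dual-rail encoding: the true rail is high for 1, the false rail for 0."""
    return {"a_t": a, "a_f": 1 - a, "b_t": b, "b_f": 1 - b}


def conducting_paths(trees, rails):
    """All (output, stack) pairs whose series transistors are all switched on."""
    return [
        (out, stack)
        for out, stacks in trees.items()
        for stack in stacks
        if all(rails[r] for r in stack)
    ]


INPUTS = list(product((0, 1), repeat=2))
```

Rewriting the AND's false rail as a_f + a_t·b_f removes the double path for a = b = 0. The stacks still differ in height (one versus two series transistors), so real trees also need sizing; the sketch only counts paths.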
25 Delay Variations at Various Stages
26 Hierarchy of Control
- Double-and-Add (key generation rate R): shift k left bit by bit into x; always an EC double, plus an EC add if x = 1; Hamming weight of k = 40
- EC arithmetic (ADD, MUL, LOAD/STORE): R * 2347 MUL/s, where 2347 = 261*7 + 40*13
- Finite field arithmetic: R * 2347 * 261 bit/s
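The factor 2347 can be checked directly, reading the slide's expression as 261 EC doublings at 7 field multiplications each plus 40 EC additions (the Hamming weight of k) at 13 each. A sketch with hypothetical helper names that only reproduces this arithmetic:

```python
def muls_per_key(n_doubles, n_adds, muls_per_double=7, muls_per_add=13):
    """Field multiplications for one double-and-add point multiplication."""
    return n_doubles * muls_per_double + n_adds * muls_per_add


MULS_PER_KEY = muls_per_key(n_doubles=261, n_adds=40)  # 261*7 + 40*13


def required_mul_rate(key_rate):
    """Field multiplications per second needed for key_rate keys per second."""
    return key_rate * MULS_PER_KEY


def achievable_key_rate(mul_rate):
    """Keys per second allowed by a multiplier sustaining mul_rate MUL/s."""
    return mul_rate / MULS_PER_KEY
```

This ignores ADD and LOAD/STORE, which sit at the same level of the hierarchy, so the estimate only holds if multiplications dominate the run time.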
27 Control Unit Architecture
- Request signals trigger the state transitions
- Autonomous state transitions are triggered by signal X
[Figure: AWP logic with state registers (REG); inputs IN1, IN2, output OUT, requests req1 to reqn, Req_out, reset, and signal X for static operation]
28 High Level Control: Double-and-Add
[State diagram; states and labels: Start/LoadX, ResetZ; LoadY; Shift K (if K=0); ShiftK, Double (if K=1); DoubleDone (K=0, or K=1 followed by Add); AddDone; KP_Done if Stop=1; transitions on X=1 and X=0]
- Level-based control
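The controller above sequences a left-to-right double-and-add loop: always double, add when the current key bit is 1. A toy software model of that loop, which also illustrates the DLP asymmetry from slide 17 (computing kP takes about log2 k doublings, while the brute-force inverse tries every multiple). Assumptions: a tiny field GF(2^4) in polynomial basis (not the chip's ONB or field size) and hypothetical curve parameters A, B:

```python
# Toy field GF(2^4), reduction polynomial x^4 + x + 1 (hypothetical choice).
M = 4
POLY = 0b10011


def gf_mul(x, y):
    """Shift-and-add multiplication in GF(2^M), polynomial basis."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & (1 << M):
            x ^= POLY
    return r


def gf_inv(x):
    """Inverse as x^(2^M - 2); x must be nonzero."""
    r = 1
    for _ in range(2**M - 2):
        r = gf_mul(r, x)
    return r


# Toy non-supersingular curve y^2 + xy = x^3 + A x^2 + B with B != 0.
A, B = 0b1000, 0b1001
O = None  # point at infinity


def on_curve(P):
    if P is O:
        return True
    x, y = P
    x2 = gf_mul(x, x)
    return (gf_mul(y, y) ^ gf_mul(x, y)) == (gf_mul(x2, x) ^ gf_mul(A, x2) ^ B)


def ec_add(P, Q):
    """Affine point addition and doubling (textbook formulas)."""
    if P is O:
        return Q
    if Q is O:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y2 == (x1 ^ y1):        # Q == -P (includes 2-torsion points)
        return O
    if P == Q:                              # doubling, x1 != 0 here
        lam = x1 ^ gf_mul(y1, gf_inv(x1))
        x3 = gf_mul(lam, lam) ^ lam ^ A
        y3 = gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3)
    else:                                   # addition, x1 != x2 here
        lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
        x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
        y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)


def scalar_mul(k, P):
    """Left-to-right double-and-add: always double, add if the bit is 1."""
    R = O
    for bit in bin(k)[2:]:                  # most significant bit first
        R = ec_add(R, R)
        if bit == "1":
            R = ec_add(R, P)
    return R


def brute_force_log(P, Q):
    """Smallest k >= 0 with k*P == Q, trying every multiple in turn."""
    R = O
    for k in range(2 * 2**M + 2):           # exceeds the Hasse bound on #E
        if R == Q:
            return k
        R = ec_add(R, P)
    return None


POINTS = [(x, y) for x in range(2**M) for y in range(2**M) if on_curve((x, y))]
```

For a cryptographic field size the forward direction still needs only a few hundred doublings and additions, while the brute-force loop becomes infeasible; that asymmetry is what the key exchange relies on.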
29 Middle Level Control: EC Point Doubling
- Pulse-based control
[State diagram: numbered states (0, 1, ..., 5) with transitions on X=0 and X=1; operation sequence Start, OPAX, OPBZ, MULT, MD, OPAA, Shift, OPBA, MULT, MD]
30 Various States in a Pulsed Control
31 Conclusion