A 1.5 GHz AWP Elliptic Curve Crypto Chip
O. Hauck, S. A. Huss (ICSLAB, TU Darmstadt); A. Katoch (Philips Research)

Presentation transcript:

A 1.5 GHz AWP Elliptic Curve Crypto Chip O. Hauck, S. A. Huss ICSLAB TU Darmstadt A. Katoch Philips Research

2 Outline
- Current AWP projects
- GATS-Chip
- Elliptic Curve Chip
  - AWPs compared to sync wave pipes
  - SRCMOS circuits
  - Crypto background
  - Architecture and Implementation
- Conclusion

3 Status of AWP Projects
- 2D-DCT: 0.6µm, being re-designed with self-resetting logic
- SRT: currently on schematics only
- 64b Giga-Hertz Adder Test Site: 0.6µm, almost complete, tape-out in May
- Crypto chip: 0.35µm, tape-out targeted for July

4 Giga-Hertz Adder Test Site
- AMS 0.6µm 3M CMOS
- 64b Brent-Kung adder
- ~10k devices, ~1.3 mm²
- latency ~2.5 ns
- cycle 1.0 ns
- on-chip test circuitry
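The latency and cycle figures on this slide imply that more than one data wave is inside the adder at any time. A minimal sketch of that arithmetic follows; the function names are hypothetical and only the 2.5 ns and 1.0 ns values come from the slide.

```python
def throughput_ghz(cycle_ns: float) -> float:
    """Results per nanosecond, i.e. throughput in GHz, for a given cycle time."""
    return 1.0 / cycle_ns


def waves_in_flight(latency_ns: float, cycle_ns: float) -> float:
    """Average number of data waves travelling through the logic at once."""
    return latency_ns / cycle_ns


if __name__ == "__main__":
    latency_ns = 2.5  # from the slide: latency ~2.5 ns
    cycle_ns = 1.0    # from the slide: cycle 1.0 ns
    print("throughput [GHz]:", throughput_ghz(cycle_ns))
    print("waves in flight :", waves_in_flight(latency_ns, cycle_ns))
```

A ratio above 1 is what separates a wave pipeline from a single unpipelined block: a new operand pair enters before the previous result has left.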

5 General Framework for Pipelines
[Figure: pipeline block diagram; labels: Data, Logic, Latch/Reg, Clk]

6 Some Notations...

7 General Relations

8 Synchronous Wave Pipeline
[Figure: pipeline block diagram; labels: Data, Wave Logic, Latch/Reg, Clk]
- Promise: higher throughput at reduced latency, clock load, area and power
- Drawback: difficult tuning of logic and delay elements
- Discrete, distinct valid frequency ranges
- Low high narrow frequency range
- Not suitable for system design

9 Synchronous Pipeline
[Figure: pipeline block diagram; labels: Data, Logic, Latch/Reg, Clk]
- Throughput determined by longest logic path + clock/register overhead
- Fine-grain pipelining allows high throughput at the cost of increased clock/register overhead

10 Asynchronous Wave Pipeline (AWP)
[Figure: pipeline block diagram; labels: Data, Wave Logic, Wave Latch, req_in, req_out, matched delay]
- More than one data and request propagating coherently
- One-sided cycle time constraint
- Delay must track logic over PTV corners
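The one-sided constraint can be illustrated with a first-order model that is common in the wave-pipelining literature but is not taken from these slides. In that model, successive waves stay separated as long as the cycle time is at least the spread between the slowest and fastest logic path plus the latch overhead, and there is no upper bound on the cycle time. The function names and all numbers below are hypothetical.

```python
def min_cycle_ps(d_max_ps: int, d_min_ps: int, latch_overhead_ps: int) -> int:
    """Lower bound on the cycle time in a first-order wave-pipeline model.

    Assumption: the bound is the path-delay spread plus the latch overhead.
    """
    return (d_max_ps - d_min_ps) + latch_overhead_ps


def cycle_is_valid(cycle_ps: int, d_max_ps: int, d_min_ps: int,
                   latch_overhead_ps: int) -> bool:
    """One-sided check: any cycle time at or above the bound is acceptable."""
    return cycle_ps >= min_cycle_ps(d_max_ps, d_min_ps, latch_overhead_ps)


if __name__ == "__main__":
    # Hypothetical corner: slowest path 2500 ps, fastest 2000 ps, 300 ps overhead.
    print(min_cycle_ps(2500, 2000, 300))          # lower bound in ps
    print(cycle_is_valid(1000, 2500, 2000, 300))  # slower than the bound
    print(cycle_is_valid(700, 2500, 2000, 300))   # faster than the bound
```

In this model the bound depends on the delay spread rather than on the absolute path delay, which is why the logic style has to keep delay variation small.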

11 Example: 64-b Brent-Kung Parallel Adder
[Figure: adder prefix graph; cell labels: pg, PG, G, xor]
- Buffers provide for same depth on every logic path
- All gates in the same column must have the same delay
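For readers unfamiliar with the Brent-Kung structure, the following is a purely functional sketch of its prefix network: bitwise generate/propagate signals, an up-sweep and a down-sweep that combine (G, P) pairs, and a final XOR row. It does not model the buffers or the per-column delay matching mentioned above, and the function name and bit-list representation are illustrative choices, not the authors' design.

```python
def brent_kung_add(a: int, b: int, width: int = 64) -> int:
    """Add two unsigned integers modulo 2**width via a Brent-Kung prefix tree.

    Assumption: width is a power of two.
    """
    # Bitwise generate (g) and propagate (p) signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]
    levels = width.bit_length() - 1

    # Up-sweep: build group (G, P) over spans of 2, 4, 8, ... bits.
    for d in range(levels):
        half = 1 << d
        for i in range(2 * half - 1, width, 2 * half):
            G[i] |= P[i] & G[i - half]
            P[i] &= P[i - half]

    # Down-sweep: fill in the remaining prefixes.
    for d in range(levels - 2, -1, -1):
        half = 1 << d
        for i in range(3 * half - 1, width, 2 * half):
            G[i] |= P[i] & G[i - half]
            P[i] &= P[i - half]

    # Sum row: s_i = p_i XOR carry into bit i (carry-in of bit 0 is 0).
    result, carry = 0, 0
    for i in range(width):
        result |= (p[i] ^ carry) << i
        carry = G[i]
    return result


if __name__ == "__main__":
    x, y = 0x0123456789ABCDEF, 0x0FEDCBA987654321
    print(hex(brent_kung_add(x, y)), hex((x + y) % 2**64))
```

Each loop level corresponds to one column of cells in the prefix graph; in the chip, every such column would additionally need buffers on the pass-through wires so that all paths have equal depth.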

12 Circuits
- Logic style used has to minimize delay variation
- Earlier work focused on bipolar logic (ECL, CML), but CMOS is mainstream
- Static CMOS is not well suited for wave pipelining; fixing the problem results in more power and slower speed
- Pass transistor logic gives sloppy edges, thereby introducing delay variation
- Dynamic logic is attractive as only the output high transition is data-dependent; output pulldown is done by precharge
- What is needed is a dynamic logic family without precharge overhead: SRCMOS

13 SRCMOS
[Figure: SRCMOS gate schematic; labels: N, inputs, output]
- Distinguishing property of our SRCMOS circuits: precharge feedback is fully local, and NMOS trees are delay balanced

14 Operation of a 2-AND

15 CISCO Data Encryption Service Adapter [ Cisco Systems ]

16 DES Key Exchange using Public-Key Cryptosystem based on Elliptic Curves

17 Why is this secure?
- Security based upon DLP: in a finite Abelian group we can easily compute $Q = kP$ given $k$ and $P$
- However, $k$ is hard to compute out of $Q$ and $P$
- DLP extraordinarily hard for point group of elliptic curve:
- Set of solutions of cubic equation over any field is an Abelian group

18 Elliptic Curve Mathematics and Algorithm
- Two types: supersingular and non-supersingular
- Non-supersingular have the highest security
- EC equation:

19 Adding Two Points Over Elliptic Curves

20 Optimal Normal Basis

21 Multiplication over ONBs

22 The Final Formula

23 Architecture of Multiplier
[Figure: multiplier block diagram; labels: delay, abx, _Xor, Wave latch, Pseudo NMOS, SRCMOS, request]

24 Dual-rail Circuits
- Dual-rail cross-coupled SRCMOS circuit
- NMOS trees are designed such that there is only one conducting path to ground

25 Delay Variations at Various Stages

26 Hierarchy of Control
- Double-and-Add: key generation rate R
  - always: k x left shift, EC double
  - if x=1: EC add (Hamming weight = 40)
- *(261*7+40*13)
- EC arithmetic: R * 2347 MUL/s
- * 261
- Finite field arithmetic: R * bit/s
  - ADD, MUL, LOAD/STORE
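The scaling factor on this slide can be checked with a few lines of arithmetic. The reading used here is an assumption: 261 EC doublings at 7 field multiplications each, plus 40 EC additions (one per set key bit) at 13 each, and 261 bit steps per field multiplication. All function and parameter names are hypothetical.

```python
def field_muls_per_key(key_bits: int = 261, hamming_weight: int = 40,
                       muls_per_double: int = 7, muls_per_add: int = 13) -> int:
    """Field multiplications for one double-and-add run (assumed cost model)."""
    return key_bits * muls_per_double + hamming_weight * muls_per_add


def bit_steps_per_key(bits_per_mul: int = 261) -> int:
    """Bit-level steps per key, assuming bits_per_mul steps per multiplication."""
    return field_muls_per_key() * bits_per_mul


def mul_rate(key_rate: float) -> float:
    """Field multiplications per second needed for key_rate keys per second."""
    return key_rate * field_muls_per_key()


if __name__ == "__main__":
    print(field_muls_per_key())  # should agree with the 2347 on the slide
    print(bit_steps_per_key())
```

The first value reproduces the 2347 MUL/s per unit of key rate R quoted for the EC arithmetic level.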

27 Control Unit Architecture
[Figure: control unit block diagram; labels: X, AWP Logic, For static operation, req1, reqn, Req_out, reset, OUT, IN1, IN2, REG (four registers)]
- Request signals trigger the state transitions.
- Autonomous state transitions are triggered by signal X

28 High Level Control: Double-and-Add
[Figure: state diagram; transition labels: Start/LoadX, ResetZ; LoadY; If K=0: Shift K; If K=1: ShiftK, Double; K=0, DoubleDone; K=1, DoubleDone/Add; AddDone; If Stop=1/KP_Done; transitions guarded by X=0 and X=1]
- Level-based control
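The control flow on this slide follows the classic left-to-right double-and-add scheme. The sketch below shows that scheme for a generic additive group and counts the operations. It uses plain integers as a stand-in group so it stays runnable; it is not the chip's elliptic-curve arithmetic, and all names are hypothetical.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def double_and_add(k: int, point: T,
                   add: Callable[[T, T], T],
                   double: Callable[[T], T],
                   identity: T, bits: int) -> Tuple[T, int, int]:
    """Compute k*point by scanning the key bits from most to least significant.

    Returns (result, number_of_doublings, number_of_additions).
    """
    acc = identity
    doubles = adds = 0
    for i in range(bits - 1, -1, -1):
        acc = double(acc)          # performed for every key bit
        doubles += 1
        if (k >> i) & 1:           # performed only when the key bit is 1
            acc = add(acc, point)
            adds += 1
    return acc, doubles, adds


if __name__ == "__main__":
    # Stand-in group: integers under addition, so k*P is ordinary multiplication.
    k = (1 << 260) | ((1 << 39) - 1)  # a 261-bit key with Hamming weight 40
    result, doubles, adds = double_and_add(
        k, 5, add=lambda x, y: x + y, double=lambda x: 2 * x,
        identity=0, bits=261)
    print(result == 5 * k, doubles, adds)
```

With a 261-bit key of Hamming weight 40, the loop performs 261 doublings and 40 additions, the same counts that appear in the rate formula on slide 26.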

29 Middle Level Control: EC Point Doubling
[Figure: state diagram; visible state numbers: 0, 1, 5; transitions guarded by X=0 and X=1; action labels: Start, OPAX, OPBZ, MULT, MD, OPAA, Shift, OPBA, MULT, MD]
- Pulse-based control

30 Various States in a Pulsed Control

31 Conclusion