A 64b Adder Using Self-Calibrating Differential Output Prediction Logic K. H. Chong and Larry McMurchie Dept. of Electrical Engineering University of Washington.

Presentation transcript:

A 64b Adder Using Self-Calibrating Differential Output Prediction Logic
K. H. Chong and Larry McMurchie, Dept. of Electrical Engineering, University of Washington
Carl Sechen, Electrical Engineering Dept., University of Texas at Dallas
ISSCC 2006
Advanced VLSI, Fall 2006 class presentation by A. Jahanshahi, Dept. of Electrical and Computer Engineering, University of Tehran
Supervisor: M. Fakhraei

2 Outline
History of Output Prediction Logic
Introduction to Output Prediction Logic (OPL)
–Fastest digital logic technique
Self-calibrating Differential OPL (DOPL)
–Twice as fast and uses half the energy compared to domino logic
High-speed, power-efficient 64b adder architecture
–Valency-3 Kogge-Stone sparse tree
–DOPL-specific 3b carry-select units
–Only 5 logic levels
Measurement results

3 History
First introduced in 2000 [1]: speedup of 2X to 3X over optimized static CMOS, and up to 5X when applied to wide-input NOR gates [1].
Differential OPL (DOPL) is 5 times faster than optimized static CMOS and nearly 2 times faster than OPL and domino [2].
Several successful chips have been reported to date [3-7].

4 Output Prediction Logic
[Figure: two gate chains (Gate 1, Gate 2, Gate 3, ...), the second clocked by clk1 to clk4]
Both static and OPL gates are inherently inverting
In the worst case for static CMOS, the output of every gate in a critical path must fully transition from 1 to 0, or 0 to 1
OPL reduces the worst case by predicting that all outputs are 1
On any critical path, only every other gate will have to transition (pull down)
If consecutive gates pull down, then this is not a critical path, since the first pull-down event does not cause the second
Critical path delay will be reduced by at least 50%
Speedup > 2X by skewing the gates for pull-down transitions
How to achieve high inputs and high outputs on inverting gates?
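The every-other-gate argument can be checked with a toy model. The sketch below assumes a chain of ideal inverters (the simplest inverting gate) whose outputs are all predicted high before evaluation. The function names are illustrative and do not come from the presentation.

```python
def settle_inverter_chain(first_input: int, n_gates: int) -> list[int]:
    """Final output of each gate in a chain of ideal inverters (assumed model)."""
    outputs = []
    value = first_input
    for _ in range(n_gates):
        value = 1 - value          # each gate inverts its predecessor
        outputs.append(value)
    return outputs


def count_pull_downs(outputs: list[int]) -> int:
    """Gates that must leave the predicted value 1, i.e. pull down to 0."""
    return sum(1 for v in outputs if v == 0)


if __name__ == "__main__":
    for first_input in (0, 1):
        final = settle_inverter_chain(first_input, n_gates=8)
        print(first_input, final, count_pull_downs(final))
```

In this model only half of the eight gates pull down for either input value, whereas in the static CMOS worst case described on the slide every gate on the path transitions.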

5 Three Types of Single-Rail OPL Gates
[Figure: schematics of OPL-static, OPL-pseudo, and OPL-dynamic gates, each with inputs a, b, c, clock clk, and output out]
In all 3 cases, when gates are not enabled (clk = 0), output will be high even if inputs are high
Enable the gates (clk = 1) when inputs have arrived
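The enable behaviour described on this slide can be captured in a purely logical model. The sketch below assumes a NOR function for the gate (the slide does not name the function; NOR3 chains appear later in the deck) and ignores all timing. `opl_nor` is a hypothetical name.

```python
def opl_nor(clk: int, *inputs: int) -> int:
    """Logic-level model of an OPL gate, assuming a NOR function.

    clk = 0: the gate is disabled and the output stays at the predicted 1,
             even if inputs are high.
    clk = 1: the gate is enabled and behaves as an ordinary NOR.
    """
    if clk == 0:
        return 1
    return 0 if any(inputs) else 1


if __name__ == "__main__":
    print(opl_nor(0, 1, 1, 1))  # disabled, inputs high
    print(opl_nor(1, 1, 0, 0))  # enabled, one input high
    print(opl_nor(1, 0, 0, 0))  # enabled, all inputs low
```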

6 OPL Clock Separations
[Figure: gate chain clocked by clk1 to clk4; timing diagram of clk i-1, clk i, and clk i+1 showing the clock separation t i]
One fast pull-down event every TWO adjacent clock separations!
–for ANY critical path
Separations are small, less than an inverter delay, and can be produced, for example, by reduced-swing logic [3]
Predicted 1's are maintained by delaying the clocks
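As a rough illustration of the clocking scheme, the sketch below turns a list of clock separations into clock arrival times and into the windows spanned by two adjacent separations. Reading each two-separation window as the time allotted to one pull-down event is an interpretation of the slide, and the function names and sample values are hypothetical.

```python
from itertools import accumulate


def clock_arrival_times(separations: list[float], t_first: float = 0.0) -> list[float]:
    """Arrival time of each clock, given the separations between successive clocks."""
    return list(accumulate(separations, initial=t_first))


def two_separation_windows(separations: list[float]) -> list[float]:
    """Sum of each pair of adjacent separations (assumed budget for one pull-down)."""
    return [a + b for a, b in zip(separations, separations[1:])]


if __name__ == "__main__":
    seps = [0.1, 0.2, 0.3]          # illustrative values in ns, not from the slides
    print(clock_arrival_times(seps))
    print(two_separation_windows(seps))
```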

7 Delay vs. Clock Separation
[Figure: CLK, IN, and OUT waveforms between V_DD and GND for three cases: clock blocking, delay optimal, clock too early]
[Plot: delay (ns) vs. clock separation (ns) [2]. Red: OPL-static NOR3 chain. Blue point: optimally sized static CMOS NOR3 chain]
Robust: +/- 30% around the nominal separation of 0.14 ns gives > 2X speedup
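The robustness claim translates into a concrete range of clock separations. The sketch below does only that arithmetic, taking the 0.14 nominal value to be in ns (the unit of the plot axis). `separation_window` is a hypothetical helper.

```python
def separation_window(nominal_ns: float, tolerance: float) -> tuple[float, float]:
    """Lower and upper clock separation for a +/- fractional tolerance."""
    return nominal_ns * (1 - tolerance), nominal_ns * (1 + tolerance)


if __name__ == "__main__":
    low, high = separation_window(0.14, 0.30)
    print(f"{low:.3f} ns to {high:.3f} ns")  # prints "0.098 ns to 0.182 ns"
```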

8 OPL-Differential Gates
Drawback of true differential gates is that one side or the other will have a tall stack of devices
In differential domino, in the worst case, every stack on a particular signal path will have to discharge
In OPL-differential, at most every other stack on a critical signal path will have to discharge: 2X speedup

9 Diff. Domino vs. OPL-Diff.
Delays (ns) for chains of 10 gates (FO of 4) in 0.18um TSMC
–static CMOS chains are optimally sized
–domino and OPL-differential use same size transistors

10 Improved OPL-Differential
S. Kio, L. McMurchie, and C. Sechen, “Application of Output Prediction Logic to Differential CMOS,” Proc. of IEEE Computer Society Annual Workshop on VLSI, Orlando, FL, April 19-20, 2001
[Figure: two OPL-differential gate schematics with outputs OUT and OUTB, clock clk, and devices P1 to P4]
A timing relation between t_fall(OUT) and t_fall(OUTB) is needed for contention-free evaluation (the relational symbol is garbled in the source)

11 Self-Calibrating Differential OPL
[Figure: clk_ref feeding a T-gate and buffer tree that clock the 1st to 5th levels of DOPL gates]
Dual output gates: use a completion detector to produce a downstream clock
–Ideally should feed to the next level
–But, DOPL gates are too fast!
If a DOPL gate evaluates slower (faster) than expected, downstream clock will be delayed (sped up) to compensate

12 Clock Skew Reduction
[Figure: clk_ref, T-gate, and buffer tree driving the 1st to 5th DOPL levels]
DOPL circuits are levelized
Completion detector outputs for each level are tied together
Cannot use static CMOS NAND2's due to contention

13 pMOS Dynamic NAND2 Completion Detector
[Figure: completion detectors for DOPL1 (outputs out1, out2) and DOPL2 (outputs out3, out4), with evaluate devices, Reset, and the generated clk]
Minimizes crowbar current
Fast, monotonic rising clock edge
Power consumption is comparable to that of an inverter
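At the logic level, a NAND2 across the two rails of a dual-output gate acts as a completion detector: both rails start at the predicted 1, and the detector output rises once either rail pulls down. The sketch below models only that Boolean behaviour for a single gate. It does not model the dynamic pMOS circuit, the Reset signal, or the tying together of detector outputs, and `completion_clock` is a hypothetical name.

```python
def completion_clock(out: int, out_b: int) -> int:
    """NAND2 of the two rails: 0 while both are still high, 1 once one has fallen."""
    return 0 if (out and out_b) else 1


if __name__ == "__main__":
    # Rails before evaluation, then after one rail has pulled down.
    for rails in [(1, 1), (0, 1), (1, 0)]:
        print(rails, completion_clock(*rails))
```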

14 Low Skew Inverter Generates Reset Signals
[Figure: clk_ref, T-gate, and buffer tree driving the 1st to 5th DOPL levels, with Reset(1), Reset(2), and Reset(3); circuit labeled clk, Reset(n), in1, in2, in3, in4]
Crowbar current is minimized:
–Reset goes low slightly before DOPL evaluates
–Reset goes high slightly after DOPL pre-charges

15 Self-Calibrating DOPL Floorplan

16 64b Adder Architecture
[Figure: valency-3 Kogge-Stone sparse carry tree producing carries c1, c4, c7, c10, c13, c16, ...]
Valency-3 Kogge-Stone sparse carry tree
log_3 N levels for every 3rd carry, but the challenge is in efficiently producing the “missing” pairs of carries
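The level count of a prefix tree follows directly from its valency. The sketch below computes the smallest number of levels whose combined span covers N bits, which is the ceiling of log base valency of N. It is a counting aid only and says nothing about the adder's wiring; `prefix_tree_levels` is a hypothetical name.

```python
def prefix_tree_levels(n_bits: int, valency: int) -> int:
    """Smallest number of levels L such that valency**L >= n_bits."""
    levels, span = 0, 1
    while span < n_bits:
        span *= valency    # each level multiplies the covered span by the valency
        levels += 1
    return levels


if __name__ == "__main__":
    print(prefix_tree_levels(64, 3))  # valency-3 tree
    print(prefix_tree_levels(64, 2))  # valency-2 tree, for comparison
```

For N = 64 this gives 4 levels at valency 3, against 6 levels at valency 2.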

17 3b Carry-Select Units
Valency-3 Kogge-Stone sparse tree quickly generates every 3rd carry
–log_3 N levels
Use a carry-select CLA adder to output sums when this “quick” carry arrives
[Figure: carry-select units precompute the sums S1-S0, S4-S2, and S7-S5 for carry-in 0 and for carry-in 1; MUXes driven by Cin, C2, and C5 select the final sums; carries shown are Cin, C2, C5, C8]
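The carry-select idea can be shown with integer arithmetic. In the sketch below each unit precomputes its sum for carry-in 0 and carry-in 1, and a MUX picks one when the real carry is known. Two simplifications are assumed: units are aligned on uniform 3-bit boundaries, and the carry into each unit is taken from the previous unit rather than from a sparse tree as in the presentation. All names are hypothetical.

```python
def carry_select_unit(a: int, b: int, width: int, carry_in: int) -> tuple[int, int]:
    """Return (sum bits, carry out) of one unit using speculative sums and a MUX."""
    mask = (1 << width) - 1
    sum_if_0 = a + b        # speculative result for carry-in = 0
    sum_if_1 = a + b + 1    # speculative result for carry-in = 1
    chosen = sum_if_1 if carry_in else sum_if_0   # MUX driven by the late carry
    return chosen & mask, chosen >> width


def carry_select_add(a: int, b: int, n_bits: int = 64, width: int = 3) -> tuple[int, int]:
    """Add two n_bits-wide integers unit by unit; returns (sum, carry out)."""
    result, carry = 0, 0
    for low in range(0, n_bits, width):
        w = min(width, n_bits - low)          # the last unit may be narrower
        mask = (1 << w) - 1
        unit_sum, carry = carry_select_unit((a >> low) & mask, (b >> low) & mask, w, carry)
        result |= unit_sum << low
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(5, 9))             # prints (14, 0)
    print(carry_select_add(2**64 - 1, 1))     # prints (0, 1)
```

Because both candidate sums are ready before the carry arrives, the only work left on the critical path in each unit is the MUX.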

18 64b Adder Layout and Photomicrograph
IBM 130nm 1.2V process (8RF): Area = 264um X 180um
Auto placed and routed

19 Energy Consumption
Energy per operation: 29.5 pJ for the IBM 130nm 1.2V process
Since energy is proportional to CV^2, the energy consumption for a 90nm 1.1V process can be conservatively estimated by scaling (the summary slide gives 17.2 pJ)
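The scaling estimate can be worked through numerically. The sketch below assumes that capacitance scales linearly with feature size, which the slide does not state; under that assumption, CV^2 scaling reproduces the 17.2 pJ figure quoted in the summary. `scale_energy_pj` is a hypothetical helper.

```python
def scale_energy_pj(energy_pj: float, node_old_nm: float, node_new_nm: float,
                    vdd_old: float, vdd_new: float) -> float:
    """Scale switching energy as C * V^2, assuming C is proportional to feature size."""
    capacitance_ratio = node_new_nm / node_old_nm      # assumption: linear in feature size
    voltage_ratio_sq = (vdd_new / vdd_old) ** 2
    return energy_pj * capacitance_ratio * voltage_ratio_sq


if __name__ == "__main__":
    estimate = scale_energy_pj(29.5, 130, 90, 1.2, 1.1)
    print(f"{estimate:.1f} pJ")  # prints "17.2 pJ"
```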

20 Measured Results
1. R. Zlatanovici and B. Nikolic, “Power-performance optimal 64-bit carry-lookahead adders,” Proc. ESSCIRC, Sep 2003, pp. 321 –
2. S. Sun, Y. Han, X. Guo, K.H. Chong, L. McMurchie, and C. Sechen, “409ps 4.7 FO4 64b Adder Based on Output Prediction Logic in 0.18um CMOS,” Proc. IEEE Comp. Soc. Annual Symp. on VLSI (ISVLSI), May 2005, Pages: 52 – 58.
3. S. Perri, P. Corsonello, and G. Staino, “A Low Power Sub-Nanosecond Standard-Cells Based Adder,” Proc. IEEE ICECS 2003, pp. 296 – 299.

21 V_DD vs. Delay Curve for 64b DOPL Adder

22 Summary
Developed self-calibrating differential output prediction logic (DOPL)
–Twice as fast as domino logic, and half the energy
Developed hybrid 64b adder architecture, consisting of a valency-3 Kogge-Stone sparse tree and DOPL-specific 3b carry-select units
64b adder implemented using 130nm 1.2V IBM process (8RF)
Nominal measured delay 238ps (3.9 FO4)
–Best measured delay 215ps (3.5 FO4)
Fastest 64b adder reported by nearly 2X
DOPL is a great candidate for scaling.
Energy: 29.5 pJ (conservatively scales to 17.2 pJ for a 90nm process)
–Competitive with fast static CMOS adders
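The two delay figures can be cross-checked by backing out the FO4 delay each one implies. This is arithmetic on the numbers quoted in the summary only; `implied_fo4_ps` is a hypothetical helper.

```python
def implied_fo4_ps(delay_ps: float, delay_in_fo4: float) -> float:
    """FO4 inverter delay implied by a delay quoted both in ps and in FO4 units."""
    return delay_ps / delay_in_fo4


if __name__ == "__main__":
    print(f"{implied_fo4_ps(238, 3.9):.0f} ps")  # nominal measurement, prints "61 ps"
    print(f"{implied_fo4_ps(215, 3.5):.0f} ps")  # best measurement, prints "61 ps"
```

Both measurements imply an FO4 delay of about 61 ps, so the two quoted pairs are mutually consistent.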

23 References
[1] L. McMurchie, S. Kio, G. Yee, T. Thorp, and C. Sechen, “Output Prediction Logic: A High Performance CMOS Design Technique,” Proc. Int. Conf. on Computer Design (ICCD), Austin, TX, September 17-20, 2000.
[2] S. Kio, et al., “Application of Output Prediction Logic to Differential CMOS,” Proc. IEEE Workshop on VLSI, April 2001.
[3] S. Sun, Y. Han, X. Guo, K.H. Chong, L. McMurchie, and C. Sechen, “409ps 4.7 FO4 64b Adder Based on Output Prediction Logic in 0.18um CMOS,” Proc. IEEE Comp. Soc. Annual Symp. on VLSI (ISVLSI), May 2005, pp. 52 – 58.
[4] X. Guo and C. Sechen, “A High Throughput Divider Implementation,” Proc. IEEE CICC, Paper 15.2, Sept.
[5] R. Zlatanovici and B. Nikolic, “Power-Performance Optimal 64-bit Carry-Lookahead Adders,” Proc. ESSCIRC, Sept. 2003, pp. 321 –
[6] S. Sun, L. McMurchie, and C. Sechen, “A High-Performance 64-bit Adder Implemented in Output Prediction Logic,” Proc. Conference on Advanced Research in VLSI (ARVLSI), March 2001.
[7] “High Speed Redundant Adder and Divider in Output Prediction Logic,” Proc. IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design, 2005.
