Towards An Efficient Low Frequency Energy Recovery Dynamic Logic. Sujay Phadke, Advanced Computer Architecture Lab, Department of Electrical Engineering and Computer Science.

Presentation transcript:

Towards An Efficient Low Frequency Energy Recovery Dynamic Logic. Sujay Phadke, Advanced Computer Architecture Lab, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor. Advisor: Prof. Marios Papaefthymiou. September 28th, 2005.

2 Outline. Power dissipation in conventional CMOS; standard approaches to reduce power dissipation; introduction to energy recovery circuits; background: Boost Logic operation, reported simulation results, and pros and cons from an energy standpoint; description of the 3 new circuits designed; comparison of the different circuits by energy dissipation and power supply variation; conclusion and future work.

3 Power dissipation in conventional CMOS designs. Streaming applications: a small amount of logic and a large number of buffers; long wires mean a large capacitance C, and driving this C wastes energy. Throughput-limited datapaths: there is a strict requirement on throughput, but longer latencies can be tolerated (DSP applications). [ATMEL76C120, 78 MHz]

4 Conventional approaches to reducing power: voltage scaling and pipelining. Voltage scaling can result in significant energy gains: lower dissipation and lower leakage. Limitations: limited by threshold voltages (Vth scaling is limited by manufacturing processes); overhead of flip-flops; increasing delay; limited scalability. [Figure: delay (ns) versus voltage (V) for unpipelined, 2-stage pipeline, and 6-stage pipeline designs.]
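The trade-off between supply voltage, energy, and delay can be sketched numerically. The sketch below assumes the alpha-power-law delay model (delay proportional to V / (V - Vth)^alpha) and switching energy proportional to V^2. The model choice and the default values vth=0.3 and alpha=1.3 are illustrative assumptions, not numbers taken from the presentation.

```python
def relative_delay(vdd, vth=0.3, alpha=1.3):
    """Gate delay up to a constant factor, per the alpha-power law (assumed model)."""
    if vdd <= vth:
        raise ValueError("this model only covers vdd above vth")
    return vdd / (vdd - vth) ** alpha


def relative_energy(vdd):
    """Switching energy up to a constant factor (proportional to C * V^2)."""
    return vdd ** 2


if __name__ == "__main__":
    nominal = 1.2  # hypothetical nominal supply in volts
    for vdd in (1.2, 1.0, 0.8, 0.6, 0.4):
        e = relative_energy(vdd) / relative_energy(nominal)
        d = relative_delay(vdd) / relative_delay(nominal)
        print(f"Vdd={vdd:.1f} V  energy x{e:.2f}  delay x{d:.2f}")
```

Under these assumptions, energy falls quadratically while delay grows sharply as the supply approaches Vth, which is the point of diminishing returns that pipelining can only partly offset.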

5 Reduced voltage drivers and voltage converters. Limited by Vth; delay in level conversion. Requirements for efficient operation: energy-efficient level conversion; no throughput impact due to level conversion delay. Point of diminishing returns! [Figure: voltage converter; labels: vdd, vddL, low swing, high swing, out.]

6 Energy dissipation in CMOS. Charging a node from a DC source draws E = C·V·V = CV^2 from the supply; discharging draws E = C·V·0 = 0. Each transition dissipates (1/2)CV^2, and no energy is recovered back into the supply. Point of diminishing returns in scaling Vdd: reducing V decreases E_diss, but eventually pushes the devices into the sub-threshold region, where delay increases exponentially as V is decreased.
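The energy bookkeeping for one full charge/discharge cycle of a conventional CMOS node can be written out as a small worked example. This is a minimal sketch assuming an ideal capacitor charged rail to rail from a DC supply; the function name and the example values (1 pF, 1.2 V) are hypothetical.

```python
def conventional_switching_energy(c_load, vdd):
    """Energy accounting (joules) for one charge/discharge cycle of an ideal node."""
    drawn = c_load * vdd ** 2           # Q * V = (C * V) * V taken from the supply
    stored = 0.5 * c_load * vdd ** 2    # energy left on the capacitor after charging
    lost_charging = drawn - stored      # dissipated in the pull-up path
    lost_discharging = stored           # dissipated in the pull-down path
    return {
        "drawn": drawn,
        "stored": stored,
        "lost_charging": lost_charging,
        "lost_discharging": lost_discharging,
        "recovered": 0.0,               # nothing returns to a DC supply
    }


if __name__ == "__main__":
    for name, joules in conventional_switching_energy(1e-12, 1.2).items():
        print(f"{name:17s} {joules:.3e} J")
```

Everything drawn from the supply ends up as heat over the cycle, which is the baseline that energy recovery circuits try to improve on.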

7 Energy Recovery Circuits. Switching energetics differ from vanilla CMOS: the DC supply is replaced by an AC supply; the energy required to swing the voltage on a node is much less than the energy stored; inductors are used to supply and recover charge, with current resonating through them from the power clock to the load capacitance. Energy recovery gates can be used as timing elements; latency overhead does not translate to a throughput penalty.
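To give a feel for the resonant power clock, the sketch below sizes the inductor for a target clock frequency using the ideal LC resonance relation f = 1 / (2π√(LC)). It assumes a lossless tank with a single lumped load capacitance; the function names and the 100 MHz / 50 pF example values are hypothetical and not taken from the presentation.

```python
import math


def resonant_frequency(inductance, capacitance):
    """Resonant frequency (Hz) of an ideal, lossless LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))


def inductance_for(frequency, capacitance):
    """Inductance (H) that resonates with `capacitance` at `frequency`."""
    return 1.0 / ((2.0 * math.pi * frequency) ** 2 * capacitance)


if __name__ == "__main__":
    c_load = 50e-12   # hypothetical lumped clock load, farads
    f_clk = 100e6     # hypothetical target power-clock frequency, hertz
    l_needed = inductance_for(f_clk, c_load)
    print(f"L = {l_needed * 1e9:.1f} nH for {f_clk / 1e6:.0f} MHz into {c_load * 1e12:.0f} pF")
```

Because the required inductance scales as 1/f^2 for a fixed load, lower clock frequencies call for much larger inductors in this simple model.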

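8 Energy recovery charging/discharging. [Figure: voltage (V) versus time (t) waveforms for the source and the charged node; labels include "N" and "easy to generate".] E_diss falls as the transition time T increases.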
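The benefit of gradual charging can be illustrated with a textbook model that is not part of the slides: an ideal linear voltage ramp of duration T driving a capacitor C through a series resistance R. For that model the total dissipation has the closed form E = (RC/T)·CV^2·[1 - (RC/T)(1 - e^(-T/RC))], which tends to (1/2)CV^2 for an abrupt step and to roughly (RC/T)·CV^2 when T is much longer than RC. The component values below are hypothetical.

```python
import math


def step_dissipation(c, v):
    """Energy (J) lost when C is charged to v by an abrupt step from a DC supply."""
    return 0.5 * c * v ** 2


def ramp_dissipation(r, c, v, t_ramp):
    """Energy (J) lost in R when C is charged by a linear ramp lasting t_ramp seconds.

    Assumes an ideal ramp source and a fixed series resistance.
    """
    tau = r * c
    x = tau / t_ramp
    # -expm1(-t/tau) equals 1 - exp(-t/tau) with better precision for small t/tau
    return x * c * v ** 2 * (1.0 - x * (-math.expm1(-t_ramp / tau)))


if __name__ == "__main__":
    r, c, v = 1e3, 1e-12, 1.2   # hypothetical: 1 kOhm, 1 pF, 1.2 V (RC = 1 ns)
    print(f"step: {step_dissipation(c, v):.3e} J")
    for t_ramp in (1e-9, 10e-9, 100e-9):
        print(f"ramp T={t_ramp * 1e9:5.0f} ns: {ramp_dissipation(r, c, v, t_ramp):.3e} J")
```

In this model, stretching the ramp from 10 RC to 100 RC cuts the dissipation by roughly a factor of ten, which is why slower power clocks are attractive in principle.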

9 Energy Recovery: A Brief History. Reversible computing was proposed as a method of achieving asymptotically zero-energy computation. Early circuit designs (inverter chains): Maksimovic, Oklobdzija (1-clock / 2-phase, 1.2 µm process, 40 MHz); Dickinson and Denker (4-phase, 0.9 µm process, 250 MHz); Athas et al. (graphics processor, 0.5 µm process, 15 MHz); Kim et al. (True Single Phase Logic, 8-bit, 140 MHz, 0.5 µm process). Fundamental requirement of gradual power clock transitions; use of diodes to recover energy (delay- and energy-inefficient); tracking the power clock at its fastest transition only; PFET evaluation trees.

10 Background. Type 1: Boost Logic [Sathe: ISLPED '05]. Hybrid energy recovery family with high gate overdrive and voltage scaling; no diodes, data-independent capacitance; acts as a timing element, with no throughput penalty; less sensitive to power supply variation than vanilla CMOS; differential outputs for data-independent capacitance seen by the power clock; 65% energy saving compared to a conventional voltage-scaled pipelined CMOS design. Drawback: high energy dissipation at low frequencies (50 MHz-200 MHz). 0.13 µm process; post-layout simulation: up to 1.6 GHz; chip: 750 MHz-1.3 GHz.

11 Structure and operation of Boost Logic. [Figure: Boost Logic gate; labels: boost stage, evaluation, complementary evaluation, reduced-potential evaluation, energy recovery sense amplification.]

12 Energy Dissipation in Type 1 (Boost). Crowbar current increases at lower frequencies; energy dissipation keeps on increasing. There is always a fight between the weak pull-up and the pull-down! How do we decrease this? [Figure: simulation with a 32-bit RC adder, 0.13 µm; labels: Vdd, 0, 1.]

13 Circuit Configurations Investigated. Type 2: static CMOS in the evaluation stacks. Type 3: a static CMOS stack and an inverter to create differential outputs with less area overhead. Type 4: a new domino CMOS logic in the evaluation stage and a modified energy recovery sense amplifier.

14 Type 2 circuit: CMOS stacks in evaluation tree. Complementary CMOS stacks; differential outputs driven to full rails (Vdd' and Vss'); reduces crowbar significantly. [Figure: simulation with a 32-bit RC adder with clock generator, 0.13 µm.]

15 Type 2: Energy Dissipation. Significant area overhead (6N+10) compared to Type 1 (2N+10); limited fan-in; slow operation of PMOS.

16 Type 3: CMOS stack with complementary inverter. Uses an inverter to create the output differential; less energy dissipation at low frequencies; 3N+10 area overhead. [Figure: simulation with a 32-bit RC adder with clock generator, 0.13 µm.]
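The area figures quoted for Types 1 to 3 (2N+10, 6N+10, 3N+10) can be compared directly for a given N. The sketch below assumes these expressions are device counts per gate and that N is the size of a single evaluation tree; the slides do not define either, so both readings are assumptions, as are the function and key names.

```python
# Per-gate area expressions quoted in the slides, as (coefficient of N, constant).
AREA_MODEL = {
    "type1_boost": (2, 10),
    "type2_cmos_stacks": (6, 10),
    "type3_cmos_plus_inverter": (3, 10),
}


def device_count(topology, n):
    """Evaluate the quoted area expression for a tree of size n (assumed meaning)."""
    slope, constant = AREA_MODEL[topology]
    return slope * n + constant


if __name__ == "__main__":
    for n in (2, 4, 8):
        counts = {name: device_count(name, n) for name in AREA_MODEL}
        print(n, counts)
```

For larger N the constant term matters less, so Type 2 approaches three times the area of Type 1 while Type 3 approaches one and a half times.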

17 Type 3: Limitations due to sub-threshold operation of inverter. Due to limited drive, the inverter operates in the sub-threshold region; the output differential ∆V shrinks with increasing frequency and fanout; reliable operation (with respect to ∆V) only up to about 50 MHz. How can we increase the inverter drive? [Figure: waveforms at 10 MHz and at 100 MHz; simulation with a 32-bit RC adder with clock generator, 0.13 µm.]

18 Type 3: with low-threshold devices in the inverter stack. Improvement obtained for lower frequencies. Sensitive to coupling noise and process variation. Operation not robust for f > 100 MHz.

19 A New Structure. Need to create a good differential voltage with minimum area overhead and energy dissipation. Need to modify the "Boost" sense amplifier stage to make the output voltage differential independent of fan-out loading. Need to have good tolerance for power supply variations.

20 Type 4: Domino CMOS with transmission gates. [Figure: Type 4 gate; labels: evaluation; sense amplification; precharge n1, n2; transmission gate; proxy output lines (low C); enables low-swing pulldown; mask high-C lines; equalization.]

21 Operation: Evaluation/hold phase. The dual N-tree evaluates and pulls down one proxy output line; transmission gates transfer charge to low lines; there is no crowbar because the headers are switched off; transistor M7 in the sense amplifier stage keeps the outputs equalized at approx. Vdd/2 (weak 0, weak 1).

22 Operation: Precharge/amplify phase. Outputs are pulled to the rails in a recovery fashion by the cross-coupled inverters; transmission gates isolate the evaluate circuit from the sense amp; transfer charge to [...]; transistor M7 in the sense amplifier stage is cut off; n1 and n2 precharge high to Vdd'.

23 Type 4: Simulation Results. [Figure: waveforms across evaluate/hold, precharge/amplify, and evaluate/hold phases; simulation with a 32-bit RC adder with clock generator, 0.13 µm.]

24 Type 4: Energy Dissipation. 32-bit adder simulations with clock generator show substantial energy savings with respect to Type 1 (Boost). The voltage differential is independent of fan-out loading. Works between 10 MHz and 200 MHz. [Figure: simulation with a 32-bit RC adder with clock generator, 0.13 µm.]

25 Energy Comparison of Different Topologies. Energy savings in Type 4 come from: low-capacitance proxy output lines; small charge-up of internal nodes; isolation of the evaluation stage from the sense amplifier; elimination of crowbar. 25%-65% reduction in energy over the operating range of frequencies, with small area overhead. [Figure: energy curves for Type 1, Type 2, Type 3, and Type 4; simulation with a 32-bit RC adder with clock generator, 0.13 µm.]

26 Robustness to Variations in Power Supply. Delay variation is less than 5% for a 10% variation in power supply. The Type 4 circuit is relatively insensitive to power supply variation compared to CMOS.

27 Conclusions and Future Work. Conclusions: design of 3 structures to improve energy recovery efficiency at low frequencies without the use of diodes or multiple clock domains; a new domino-style topology resulting in substantial energy savings with minimal area overhead; relatively insensitive to power supply variations. Future work: improve resonance of the Type 4 circuit; redesign the clock generator to investigate potential power savings; evaluate the performance of the circuit post-layout, with comparisons; continue investigating other kinds of logic structures.