Physical Limits of Computing Dr. Mike Frank CIS 6930, Sec. #3753X Spring 2002 Lecture #24 Adiabatic CMOS cont. Wed., Mar. 13.



Administrivia & Overview
- Don't forget to keep up with homework!
  - We are 8 out of 14 weeks into the course.
- You should have earned ~57 points by now.
- Course outline:
  - Part I & II, Background, Fundamental Limits: done
  - Part III, Future of Semiconductor Technology: done
  - Part IV, Potential Future Computing Technologies: done
  - Part V, Classical Reversible Computing
    - Fundamentals of Adiabatic Processes & logic: last Wed. & Fri. (Spring Break)
    - Adiabatic electronics & CMOS logic families: Mon. & TODAY
    - Limits of adiabatics: Leakage and clock/power supplies: TODAY
    - RevComp theory I: Emulating Irreversible Machines: Fri. 3/15
    - RevComp theory II: Bounds on Space-Time Overheads: Mon. 3/18
    - (plus ~7 more lectures…)
  - Part VI, Quantum Computing
  - Part VII, Cosmological Limits, Wrap-Up

Adiabatic computing in CMOS
- Monday: Adiabatic switching, split-level retractile & pipelined logic.
- Today: 2-Level Adiabatic Logic, general adiabatic logic.

Some Timing Terminology
For sequential adiabatic circuits:
- Tick: Time for a single ramp transition
  - adiabatic speed fraction f times the RC gate delay.
- Phase: Latency for a data value to propagate forward by 1 pipeline stage.
- Cycle: Minimum period for all timing information to return to its initial state.
- Diadic: Two retractile levels per gate
  - permits inverting or non-inverting logic.
- Monadic: Only 1 level per gate.
- Dual rail: Two wires per logic value
  - permits universal logic with monadic gates.

Some Figures of Demerit
- Some quantities we may wish to minimize:
  - Ticks/phase: proportional to logic propagation latency
  - Ticks/cycle: reciprocal to rate of data throughput
  - Transistor-ticks/cycle: reciprocal to HW cost-efficiency
  - Number of required clock/power input signals: supplying these may be a significant component of system cost
  - Number of distinct voltage levels required: may affect reliability/power tradeoff

Some Interesting Questions
- About pipelined, sequential, fully-adiabatic CMOS logic:
  - Q: Does it require an intermediate voltage level?
    - A: No, you can get by with only 2 different levels.
  - Q: What is the minimum number of externally provided timing signals you can get away with?
    - A: 4 (12 if split levels are used)
  - Q: Can the order-N different timing signals needed for long retractile cascades be internally generated within an adiabatic circuit?
    - A: Yes, but not statically, unless N^2 hardware is used
      - where N is the number of stages per full sequential cycle
- We now demonstrate these answers.

Some Timing Examples
See next slide for some detailed timing diagrams.
- N-level retractile cascades:
  - 2N ticks/phase × 1 phase/cycle = 2N ticks/cycle
- 3-phase fully-static diadic SCRL:
  - 8 ticks/phase × 3 phases/cycle = 24 ticks/cycle
- 2-phase fully-static monadic SCRL:
  - 5 ticks/phase × 2 phases/cycle = 10 ticks/cycle
- 2-phase fully-static diadic SCRL:
  - 6 ticks/phase × 2 phases/cycle = 12 ticks/cycle
- 6 tick/cycle dynamic SCRL detailed previously:
  - 1 tick/phase × 6 phases/cycle = 6 ticks/cycle
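The arithmetic in these timing examples can be checked with a short Python sketch. The function and dictionary names are illustrative only and do not come from the lecture; the tick and phase counts are the ones quoted on the slide.

```python
# Ticks per cycle = (ticks per phase) x (phases per cycle).
# All names here are hypothetical; the numbers are taken from the slide.

def ticks_per_cycle(ticks_per_phase, phases_per_cycle):
    """Return the cycle length in ticks for a pipelined adiabatic logic style."""
    return ticks_per_phase * phases_per_cycle


def retractile_ticks_per_cycle(n_levels):
    """An N-level retractile cascade uses 2N ticks in its single phase."""
    return ticks_per_cycle(2 * n_levels, 1)


# (ticks/phase, phases/cycle) for each SCRL variant listed on the slide.
SCRL_VARIANTS = {
    "3-phase fully-static diadic SCRL": (8, 3),
    "2-phase fully-static monadic SCRL": (5, 2),
    "2-phase fully-static diadic SCRL": (6, 2),
    "6 tick/cycle dynamic SCRL": (1, 6),
}

SCRL_TICKS = {name: ticks_per_cycle(*tp) for name, tp in SCRL_VARIANTS.items()}
```

Note that the retractile cascade is the only entry whose cycle length grows with logic depth; the SCRL variants have fixed cycle lengths.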

Some SCRL timing diagrams

2LAL: 2-level Adiabatic Logic
- Dual-rail T-gate symbol: (see figure)
- Basic buffer element:
  - cross-coupled T-gates
- Only 4 different timing signals, 4 ticks per cycle:
  - Timing signal φ_i rises during tick i, falls during tick (i+2) mod 4
- 1 tick/phase × 4 phases/cycle = 4 ticks/cycle!
  - Optimizes latency & throughput per gate.
[Figure: dual-rail T-gate symbol with control P, the basic buffer element (in, out), and the four timing-signal waveforms plotted against tick number.]
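The four-signal clocking rule stated above (signal i rises during tick i and falls during tick (i+2) mod 4) can be tabulated with a minimal Python sketch. It assumes, beyond what the slide states explicitly, that each signal simply holds its level between ramps, which gives a trapezoidal waveform; the function names are hypothetical.

```python
# Assumption: between its rising and falling ramps a timing signal holds its
# level, so each signal cycles through rise -> high -> fall -> low.
ACTIVITY = ("rise", "high", "fall", "low")


def clock_activity(i, tick):
    """What timing signal i (0..3) is doing during the given tick."""
    # Python's % always returns a non-negative result for a positive modulus.
    return ACTIVITY[(tick - i) % 4]


def clock_table(n_ticks=4):
    """Map each signal index to its activity over the first n_ticks ticks."""
    return {i: [clock_activity(i, t) for t in range(n_ticks)] for i in range(4)}
```

Under this model exactly one signal is rising and exactly one is falling during any tick, which is consistent with the 1 tick/phase figure quoted on the slide.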

2LAL Cycle of Operation
[Figure: tick-by-tick operation of the 2LAL buffer, with panels labeled in→1, in=0, out→1, out=0, in→0, and out→0 against tick number.]

Input-Barrier, Clocked-Bias Latching
(1) Input conditionally lowers barrier (logic w. series/parallel barriers).
(2) Clock applies bias force; conditional bit flip.
(3) Input removed, raising barrier & locking in state-change.
(4) Clock bias can retract.
2LAL is an example of this.
[Figure: latching sequence, with the stages labeled "Input pulse" and "Pulse ends".]

Shift Register Structure
- 1-tick delay per logic stage: (see figure)
- Logic pulse timing & propagation: (see figure)
[Figure: chain of 2LAL buffer stages from in to out, each driven by successive timing signals, with the pulse timing diagram.]

More complex logic functions
- Non-inverting Boolean functions: (see figure)
- For inverting functions, must use quad-rail logic encoding:
  - To invert, just swap the rails!
    - Zero-transistor "inverters."
[Figure: 2LAL gate networks for non-inverting functions of A and B, and the quad-rail encoding of A on rails A0 and A1 for A = 0 and A = 1.]
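The rail-swapping idea can be illustrated with a small Python sketch. It models only which of the two lines, A0 or A1, carries the pulse for a given logic value; treating each of those lines as itself a dual-rail pair (hence "quad-rail") is an assumption here and is not modeled. The class and function names are hypothetical.

```python
from typing import NamedTuple


class RailPair(NamedTuple):
    """One logic value: a pulse on a0 means A = 0, a pulse on a1 means A = 1."""
    a0: int
    a1: int


def encode(bit):
    """Place the pulse on the line that matches the logic value."""
    return RailPair(a0=1 - bit, a1=bit)


def decode(pair):
    """Recover the logic value; exactly one line must carry the pulse."""
    if pair.a0 == pair.a1:
        raise ValueError("invalid encoding: exactly one rail must be pulsed")
    return pair.a1


def invert(pair):
    """Inversion is pure rewiring: swap the two lines, no transistors needed."""
    return RailPair(a0=pair.a1, a1=pair.a0)
```

Because `invert` only relabels wires, it corresponds to the zero-transistor "inverter" mentioned on the slide.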

Hardware Efficiency issues
- Hardware efficiency: How many logic operations per unit hardware per unit time?
- Hardware spacetime complexity: How much hardware for how much time per logic op?
- We're interested in minimizing: (# of transistors) × (# of ticks) / (gate cycle)
- SCRL inverter, w. return path:
  - (8 transistors) × (6 ticks) = 48 transistor-ticks
- Quad-rail 2LAL buffer stage:
  - (16 transistors) × (4 ticks) = 64 transistor-ticks

More SCRL vs. 2LAL
- SCRL reversible NAND, w. all inverters:
  - (23 transistors) × (6 ticks) = 138 T-ticks
- Quad-rail 2LAL AND:
  - (48 transistors) × (4 ticks) = 192 T-ticks
- Result of comparison: Although 2LAL minimizes # of rails, and # ticks/cycle, it does not minimize overall spacetime complexity.
  - The question of whether 6-tick SCRL really minimizes per-op spacetime complexity among pipelined fully-adiabatic CMOS logics is still open.
    - An opportunity for you to make a contribution!
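The spacetime comparison between SCRL and 2LAL can be reproduced with a few lines of Python. The function and dictionary names are illustrative; the transistor and tick counts are the ones quoted on this slide and the previous one.

```python
# Spacetime cost per gate cycle = (# of transistors) x (# of ticks per cycle).
# Names are hypothetical; the counts are the ones quoted in the slides.

def transistor_ticks(transistors, ticks_per_cycle):
    """Hardware spacetime cost of one gate over one full cycle."""
    return transistors * ticks_per_cycle


COSTS = {
    "SCRL inverter, w. return path": transistor_ticks(8, 6),
    "Quad-rail 2LAL buffer stage": transistor_ticks(16, 4),
    "SCRL reversible NAND, w. all inverters": transistor_ticks(23, 6),
    "Quad-rail 2LAL AND": transistor_ticks(48, 4),
}

# Ratio of 2LAL cost to SCRL cost for the comparable gates.
BUFFER_RATIO = COSTS["Quad-rail 2LAL buffer stage"] / COSTS["SCRL inverter, w. return path"]
AND_RATIO = COSTS["Quad-rail 2LAL AND"] / COSTS["SCRL reversible NAND, w. all inverters"]
```

Both ratios exceed 1, which matches the slide's conclusion that the shorter 2LAL cycle does not make up for its higher transistor count.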

Minimizing Power-Clock Signals
- How many external clock signals required?
  - N-level-deep retractile cascade logic: 2N waveforms × 1 phase = 2N signals
  - 6 tick/cycle, 6-phase dynamic SCRL: 6 waveforms × 6 phases = 36 signals
  - 24 tick/cycle, 3-phase static SCRL: 12 waveforms × 3 phases = 36 signals
  - 4 tick/cycle, 2LAL: 1 waveform × 4 phases = 4 signals!
- It turns out that 12 signals are sufficient to implement any combination of 2-level or 3-level logics (including retractile) on-chip!
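As a rough check on these counts, the following Python sketch computes the number of external power-clock signals as waveforms times phases and finds the retractile depth at which a directly clocked cascade would need more than the 12 signals quoted above. The helper names are hypothetical, and the break-even comparison is an illustration rather than a claim made in the lecture.

```python
# External power-clock signals = (distinct waveforms) x (phases).
# Names are hypothetical; the counts are the ones quoted on the slide.

def clock_signals(waveforms, phases):
    """Number of externally supplied power-clock signals."""
    return waveforms * phases


def retractile_signals(n_levels):
    """An N-level-deep retractile cascade needs 2N waveforms in 1 phase."""
    return clock_signals(2 * n_levels, 1)


SIGNALS = {
    "6-phase dynamic SCRL": clock_signals(6, 6),
    "3-phase static SCRL": clock_signals(12, 3),
    "2LAL": clock_signals(1, 4),
}

SUFFICIENT_SIGNALS = 12  # figure quoted on the slide for the on-chip scheme

# Smallest retractile depth whose direct clocking exceeds 12 external signals.
BREAK_EVEN_DEPTH = next(
    n for n in range(1, 100) if retractile_signals(n) > SUFFICIENT_SIGNALS
)
```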

How to Do It
- Circular 2LAL shifter; pulse-gated clocks
[Figure: circular 2LAL shift register from in to out with pulse signals P0 to P3, and the timing signals they gate, plotted against tick number.]

12-rail system: pros & cons
- Pros:
  - Completely solves adiabatic timing design problem
  - Enables mixtures of retractile, SCRL, and other logic styles on 1 chip
  - Enables simple fully-adiabatic SRAM & DRAM
- Cons:
  - Timing signals are dynamic
  - Known fully-static alternatives use order N^2 gates and signals for N-tick-long cycles
  - N can be large in a chip that includes deep retractile networks
  - Energy waste in driving the source/drain junction capacitances of all the T-gates even when timing pulse isn't present (SOI reduces these parasitics)

Fully-Adiabatic DRAM cell
- 6T, 6 lines/row, 1 line/column (in/out together)
- Read cycle:
  - Initially: φ lines neutral, out neutral, R off
  - R for desired row turns on
  - φ for desired row splits, driving out column
  - R turns off, out is read
  - φ merges, out is reset
- Write cycle:
  - First, do read cycle.
  - in is set to out
  - W turns on
  - in changed to new value...

Fully-Adiabatic SRAM
- 10-T, 10 lines/row, 1 line/column
- Operation similar to DRAM, except:
  - Read-out: T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off
  - Write: T2 off; N2 retracts; N1 retracts, copy of M presented on input; T1 on; in changes; T1 off, N1 asserts; N2 asserts; T2 on
[Figure: SRAM cell schematic with elements labeled M, N1, N2, T1, T2, T3, in, and out.]