Physical Limits of Computing Dr. Mike Frank CIS 6930, Sec. #3753X Spring 2002 Lecture #24 Adiabatic CMOS cont. Wed., Mar. 13.


1 Physical Limits of Computing Dr. Mike Frank CIS 6930, Sec. #3753X Spring 2002 Lecture #24 Adiabatic CMOS cont. Wed., Mar. 13

2 Administrivia & Overview
Don’t forget to keep up with homework!
– We are 8 out of 14 weeks into the course.
  – You should have earned ~57 points by now.
Course outline:
– Part I&II, Background, Fundamental Limits - done
– Part III, Future of Semiconductor Technology - done
– Part IV, Potential Future Computing Technologies - done
– Part V, Classical Reversible Computing
  – Fundamentals of Adiabatic Processes & logic - last Wed. & Fri.
  (----------------------- Spring Break ------------------------)
  – Adiabatic electronics & CMOS logic families - Mon. & TODAY
  – Limits of adiabatics: Leakage and clock/power supplies - TODAY
  – RevComp theory I: Emulating Irreversible Machines - Fri. 3/15
  – RevComp theory II: Bounds on Space-Time Overheads - Mon. 3/18
  – (plus ~7 more lectures…)
– Part VI, Quantum Computing
– Part VII, Cosmological Limits, Wrap-Up

3 Adiabatic computing in CMOS
Monday: Adiabatic switching, split-level retractile & pipelined logic.
Today: 2-Level Adiabatic Logic, general adiabatic logic

4 Some Timing Terminology
For sequential adiabatic circuits:
Tick: Time for a single ramp transition
– adiabatic speed fraction f times the RC gate delay.
Phase: Latency for a data value to propagate forward by 1 pipeline stage.
Cycle: Minimum period for all timing information to return to its initial state.
Diadic: Two retractile levels per gate
– permits inverting or non-inverting logic.
Dual rail: Two wires per logic value
– permits universal logic with monadic gates
Monadic: only 1 level

5 Some Figures of Demerit
Some quantities we may wish to minimize:
– Ticks/phase: proportional to logic propagation latency
– Ticks/cycle: reciprocal to rate of data throughput
– Transistor-ticks/cycle: reciprocal to HW cost-efficiency
– Number of required clock/power input signals: supplying these may be a significant component of system cost
– Number of distinct voltage levels required: may affect reliability/power tradeoff

6 Some Interesting Questions
About pipelined, sequential, fully-adiabatic CMOS logic:
– Q: Does it require an intermediate voltage level?
  A: No, you can get by with only 2 different levels.
– Q: What is the minimum number of externally provided timing signals you can get away with?
  A: 4 (12 if split levels are used)
– Q: Can the order-N different timing signals needed for long retractile cascades be internally generated within an adiabatic circuit?
  A: Yes, but not statically, unless N² hardware is used
  – where N is the number of stages per full sequential cycle
We now demonstrate these answers.

7 Some Timing Examples
See next slide for some detailed timing diagrams.
N-level retractile cascades:
– 2N ticks/phase × 1 phase/cycle = 2N ticks/cycle
3-phase fully-static diadic SCRL
– 8 ticks/phase × 3 phases/cycle = 24 ticks/cycle
2-phase fully-static monadic SCRL
– 5 ticks/phase × 2 phases/cycle = 10 ticks/cycle
2-phase fully-static diadic SCRL
– 6 ticks/phase × 2 phases/cycle = 12 ticks/cycle
6 tick/cycle dynamic SCRL detailed previously:
– 1 tick/phase × 6 phases/cycle = 6 ticks/cycle

8 Some SCRL timing diagrams

9 2LAL: 2-level Adiabatic Logic
Dual-rail T-gate symbol:
Basic buffer element:
– cross-coupled T-gates
Only 4 different timing signals, 4 ticks per cycle:
– φ_i rises during tick i, falls during tick (i+2) mod 4
1 tick/phase × 4 phases/cycle = 4 ticks/cycle!
– Optimizes latency & throughput per gate.
[Figure: dual-rail T-gate symbol (terminals A and B, control P); buffer element with in and out; waveforms of the four timing signals over ticks 0 to 3.]
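The four-signal timing rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the "high" and "low" states are an assumption (the slide states only when each signal rises and falls, so the sketch assumes a signal holds its level in between).

```python
N_TICKS = 4  # 2LAL: 4 timing signals, 4 ticks per cycle

def phi_state(i, tick):
    """State of timing signal i during the given tick.

    From the slide: signal i rises during tick i and falls during
    tick (i+2) mod 4. Assumption: it stays high for the tick between
    rise and fall, and low for the one remaining tick.
    """
    offset = (tick - i) % N_TICKS
    return ("rising", "high", "falling", "low")[offset]

def schedule():
    """Rows are ticks 0..3; columns are timing signals 0..3."""
    return [[phi_state(i, t) for i in range(N_TICKS)] for t in range(N_TICKS)]

for t, row in enumerate(schedule()):
    print(t, row)
```

Under this model, every tick has exactly one signal rising and one falling, which is consistent with the slide's 1 tick/phase × 4 phases/cycle.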

10 2LAL Cycle of Operation
[Figure: cycle of operation over tick numbers 0 to 3; panel labels: in→1 (or in=0), out→1 (or out=0), in→0, out→0.]

11 Input-Barrier, Clocked-Bias Latching
(1) Input conditionally lowers barrier (logic w. series/parallel barriers)
(2) Clock applies bias force; conditional bit flip
(3) Input removed, raising barrier & locking in state-change
(4) Clock bias can retract.
2LAL is an example of this.
[Figure: latching states for bit values 0 and 1, annotated "Input pulse" and "Pulse ends".]

12 Shift Register Structure
1-tick delay per logic stage:
[Figure: shift register from in to out; stages labeled with timing signals 1 to 4.]
Logic pulse timing & propagation:
[Figure: pulse waveforms for in and the following stages over ticks 1, 2, 3, 4, ...]
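The 1-tick-per-stage delay can be illustrated with a very coarse Python model. It is an abstraction, not a circuit simulation: it records only the tick at which a pulse starts at each stage and ignores the fact that a real pulse spans several ticks. The function name and arguments are hypothetical.

```python
def propagate(n_stages, input_pulses, n_ticks):
    """Return, for each tick, a list marking which stage has a pulse starting.

    input_pulses[t] is 1 if a pulse enters the first stage at tick t.
    Every stage hands its pulse to the next stage one tick later.
    """
    stages = [0] * n_stages
    history = []
    for t in range(n_ticks):
        incoming = input_pulses[t] if t < len(input_pulses) else 0
        # Shift everything forward by one stage per tick.
        stages = [incoming] + stages[:-1]
        history.append(list(stages))
    return history

history = propagate(n_stages=4, input_pulses=[1], n_ticks=4)
for t, row in enumerate(history):
    print(t, row)
```

In this model a pulse entering at tick 0 reaches stage k (counting from 0) at tick k.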

13 More complex logic functions
Non-inverting Boolean functions:
[Figure: example circuits on inputs A and B.]
For inverting functions, must use quad-rail logic encoding:
– To invert, just swap the rails!
  – Zero-transistor “inverters.”
[Figure: rails A0 and A1, shown for A = 0 and A = 1.]
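The "swap the rails" idea can be sketched in Python. The encoding below is an assumption made for illustration: a logic value is modeled as two rails, A0 and A1 (the labels in the slide's figure), exactly one of which carries a pulse. The dual-rail wiring of each rail is not modeled, and the class and function names are hypothetical.

```python
from typing import NamedTuple

class RailPair(NamedTuple):
    """Assumed encoding: exactly one of the two rails carries a pulse."""
    a0: bool  # pulse present on the "A = 0" rail
    a1: bool  # pulse present on the "A = 1" rail

def encode(bit):
    """Put the pulse on rail A1 for a 1, on rail A0 for a 0."""
    return RailPair(a0=not bit, a1=bool(bit))

def decode(value):
    """Recover the logic value; a valid pair has exactly one active rail."""
    assert value.a0 != value.a1, "invalid encoding"
    return value.a1

def invert(value):
    """Inversion is pure rewiring: swap the rails, no transistors involved."""
    return RailPair(a0=value.a1, a1=value.a0)

print(decode(invert(encode(True))))
```

Because inversion is only a relabeling of wires, it adds no devices and no delay in this model, which is the point of the slide's "zero-transistor inverters" remark.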

14 Hardware Efficiency issues
Hardware efficiency: How many logic operations per unit hardware per unit time?
Hardware spacetime complexity: How much hardware for how much time per logic op?
We’re interested in minimizing: (# of transistors) × (# of ticks) / (gate cycle)
SCRL inverter, w. return path:
– (8 transistors) × (6 ticks) = 48 transistor-ticks
Quad-rail 2LAL buffer stage:
– (16 transistors) × (4 ticks) = 64 transistor-ticks

15 More SCRL vs. 2LAL
SCRL reversible NAND, w. all inverters:
– (23 transistors) × (6 ticks) = 138 T-ticks
Quad-rail 2LAL AND:
– (48 transistors) × (4 ticks) = 192 T-ticks
Result of comparison: Although 2LAL minimizes # of rails, and # ticks/cycle, it does not minimize overall spacetime complexity.
– The question of whether 6-tick SCRL really minimizes per-op spacetime complexity among pipelined fully-adiabatic CMOS logics is still open.
  – An opportunity for you to make a contribution!
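The SCRL vs. 2LAL comparison on slides 14 and 15 is a product of transistor count and ticks per gate cycle. A short Python sketch reproduces it from the slides' figures; the dictionary layout and function name are illustrative only.

```python
# (transistors, ticks per gate cycle), as quoted on slides 14 and 15
designs = {
    "SCRL inverter, w. return path": (8, 6),
    "Quad-rail 2LAL buffer stage": (16, 4),
    "SCRL reversible NAND, w. all inverters": (23, 6),
    "Quad-rail 2LAL AND": (48, 4),
}

def transistor_ticks(transistors, ticks):
    """Spacetime cost per gate cycle: hardware used times time occupied."""
    return transistors * ticks

cost = {name: transistor_ticks(*spec) for name, spec in designs.items()}

for name, value in cost.items():
    print(f"{name}: {value} transistor-ticks")

# Relative cost of each 2LAL gate against its SCRL counterpart
buffer_ratio = cost["Quad-rail 2LAL buffer stage"] / cost["SCRL inverter, w. return path"]
and_ratio = cost["Quad-rail 2LAL AND"] / cost["SCRL reversible NAND, w. all inverters"]
```

Both ratios come out above 1, matching the slide's conclusion that 2LAL's shorter cycle does not compensate for its higher transistor count.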

16 Minimizing Power-Clock Signals
How many external clock signals required?
– N-level-deep retractile cascade logic: 2N waveforms × 1 phase = 2N signals
– 6 tick/cycle, 6-phase dynamic SCRL: 6 waveforms × 6 phases = 36 signals
– 24 tick/cycle, 3-phase static SCRL: 12 waveforms × 3 phases = 36 signals
– 4 tick/cycle, 2LAL: 1 waveform × 4 phases = 4 signals!
It turns out that 12 signals are sufficient to implement any combination of 2-level or 3-level logics (including retractile) on-chip!
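The signal counts above follow from signals = waveforms × phases. As a rough Python sketch (function names are hypothetical), the same rule also shows where a directly clocked retractile cascade would need more than the 12 signals mentioned on the slide:

```python
def clock_signals(waveforms, phases):
    """External clock signals: distinct waveforms times number of phases."""
    return waveforms * phases

def retractile_signals(n_levels):
    """N-level-deep retractile cascade: 2N waveforms, 1 phase."""
    return clock_signals(2 * n_levels, 1)

counts = {
    "6-phase dynamic SCRL": clock_signals(6, 6),
    "3-phase static SCRL": clock_signals(12, 3),
    "2LAL": clock_signals(1, 4),
}

# Smallest cascade depth whose direct clocking needs more than 12 signals
first_over_12 = next(n for n in range(1, 100) if retractile_signals(n) > 12)

print(counts, first_over_12)
```

Since 2N exceeds 12 once N is greater than 6, deeper cascades are where generating the timing signals on-chip from the 12 external ones would matter.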

17 How to Do It
Circular 2LAL shifter; pulse-gated clocks
[Figure: circular shifter from in to out carrying pulses P0 to P3, with the four timing signals, plotted over ticks 0 to 3.]

18 12-rail system: pros & cons
Pros:
– Completely solves adiabatic timing design problem
– Enables mixtures of retractile, SCRL, and other logic styles on 1 chip
– Enables simple fully-adiabatic SRAM & DRAM
Cons:
– Timing signals are dynamic
– Known fully-static alternatives use order N² gates and signals for N-tick-long cycles
– N can be large in a chip that includes deep retractile networks
– Energy waste in driving the source/drain junction capacitances of all the T-gates even when timing pulse isn’t present (SOI reduces these parasitics)

19 Fully-Adiabatic DRAM cell
6T, 6 lines/row, 1 line/column (in/out together)
Read cycle:
– Initially: φ lines neutral, out neutral, R off
– R for desired row turns on
– φ for desired row splits, driving out column
– R turns off, out is read
– φ merges, out is reset
Write cycle:
– First, do read cycle.
– in is set to out
– W turns on
– in changed to new value...

20 Fully-Adiabatic SRAM
10-T, 10 lines/row, 1 line/column
Operation similar to DRAM, except:
Read-out:
– T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off
Write:
– T2 off; N2 retracts; N1 retracts, copy of M presented on input; T1 on; in changes; T1 off, N1 asserts; N2 asserts; T2 on
[Figure: cell schematic with elements labeled M, N1, N2, T1, T2, T3, in, out.]

