Presentation is loading. Please wait.

Presentation is loading. Please wait.

 C. H. Ziesler et al., 2003 Energy Recovery Design for Low-Power ASICs Conrad H. Ziesler 1 Joohee Kim 1 Suhwan Kim 2 Marios C. Papaefthymiou 1 1 Advanced.

Similar presentations


Presentation on theme: " C. H. Ziesler et al., 2003 Energy Recovery Design for Low-Power ASICs Conrad H. Ziesler 1 Joohee Kim 1 Suhwan Kim 2 Marios C. Papaefthymiou 1 1 Advanced."— Presentation transcript:

1  C. H. Ziesler et al., 2003 Energy Recovery Design for Low-Power ASICs Conrad H. Ziesler 1 Joohee Kim 1 Suhwan Kim 2 Marios C. Papaefthymiou 1 1 Advanced Computer Architecture Laboratory University of Michigan, Ann Arbor 2 T. J. Watson Research Center IBM Research, Yorktown Heights

2  C. H. Ziesler et al., 2003

3 Tutorial Outline Introduction to energy recovery Application of energy recovery to SOC design ● Fine-grain dynamic pipelines ● Finite state machines ● Memory arrays ● Multi-GHz clocking

4  C. H. Ziesler et al., 2003 Introduction to Energy Recovery ● Power dissipation in static CMOS design ● Energy recovery operation ● Implementation issues ● Quick glance at history

5  C. H. Ziesler et al., 2003 Static CMOS Power ● Leakage Power ● Crowbar Power ● Active Power

6  C. H. Ziesler et al., 2003 Static CMOS Active Power ● Active power to transfer charge to/from output capacitor – Energy stored in capacitor when charged to voltage V is ½ CV 2 – Energy dissipated in R1 to charge capacitor is ½ CV 2 – To discharge capacitor, all energy stored in it is dissipated in R2 ● Note: Voltage supply level fixed

7  C. H. Ziesler et al., 2003 What if we had a time-varying power-supply? ● Turn on switch when time- varying voltage source is at same level as the voltage on output capacitor. ● Voltage on output capacitor slews up or down at same rate as voltage source. ● Most of the energy stored in capacitor is returned to time- varying power-supply.

8  C. H. Ziesler et al., 2003 Model Simplification ● Use same switch for charging and discharging currents (if it can conduct in both directions). ● Time-varying voltage source Vpc, called power-clock, provides energy and synchronization. ● Textbooks sometimes draw ideal energy recovery model using current source instead of voltage source. Principal of operation is the same.

9  C. H. Ziesler et al., 2003 Example: Linear Ramp time Vpc IRIR VRVR PRPR E R = V R I R Vdd VCVC

10  C. H. Ziesler et al., 2003 Energy Dissipation ● Integrate charging or discharging current through resistor R ● Dissipation for either charging or discharging is approximately (RC/T) CV 2 – T is rise time of Vpc (linear ramp) If T >> RC delay and recovered energy is efficiently recycled, power savings can be substantial.

11  C. H. Ziesler et al., 2003 Implementation Issues ● How is power-clock generated? ● What capacitances should one try to recover energy from? ● How does the timing work if power and clock are intermingled? ?????

12  C. H. Ziesler et al., 2003 Power-Clock Generation Challenges ● Need to drive a large mostly capacitive load with a controlled rise/fall time waveform. ● Capacitive load may change as different gates switch on or off. ● Must do so with much less than CV 2 dissipation, otherwise gains are lost.

13  C. H. Ziesler et al., 2003 Where to recover energy from? ● Power-clock is time-varying – Synchronize energy recovery circuits with power-clock. ● Energy recovery requires small RC delay with respect to rise/fall time T – Aim at small RC/T ratio. ● Energy recovery saves on active switching power – Does not necessarily make sense to apply it on low-activity loads.

14  C. H. Ziesler et al., 2003 Good Candidates ● Large capacitive loads that frequently switch – Clock networks – Memory bit lines – LCD row/column drivers ● High activity loads that are synchronous to power-clock – Dynamic pipelined logic

15  C. H. Ziesler et al., 2003 Quick Glance at History ● Physics (1970s) ● Logical reversibility of computation ● Connection to thermodynamics (“adiabatic” computing) ● No absolute minimum to energy dissipation, if computing is arbitrarily slow. (Does not have to be slow, however.) ● Engineering (1990s) ● Logic circuitry ● Energy recycling circuitry ● VLSI prototyping

16  C. H. Ziesler et al., 2003 ● The case for reversible computation P. Solomon and D. Frank – IWLPD'94 ● Asymptotically zero-energy split-level charge recovery logic S.G. Younis and T. F. Knight, Jr. – IWLPD’94 ● Clock-powered CMOS: A hybrid adiabatic logic style for energy-efficient computing N. Tzartzanis and W. C. Athas – ARVLSI'99 And many, many others.... Sample Design Points from’90s

17  C. H. Ziesler et al., 2003 Split-Level Charge-Recovery Logic input   /P P output Initially,  and /  at Vdd/2, P at Gnd, and /P at Vdd. On valid input, the pass gate is turned on by gradually swinging P and /P. Rails  and /  “split”, gradually swinging to Vdd and Gnd. As soon as output is sampled, pass gate is turned off. Internal node is restored by gradually swinging  and /  back to Vdd/2. When is the gate output restored?  P internal node

18  C. H. Ziesler et al., 2003 Reversible Pipelines Gate outputs are restored using a reverse pipeline whose elements perform the inverse function of the forward pipeline Multiple phases required Split-level charge recovery logic block input EFGH    P1P1 P P2P2 E -1 F -1 G -1 H -1  /P 1   /P 2 P2P2 P1P1    output node set by E, restored by F -1...

19  C. H. Ziesler et al., 2003 The Remainder of this Tutorial ● Application of energy recovery to SOC Modules – Low switching activity state-machines or datapaths – Fine-grain pipelines with high switching activity – Multi-gigahertz class design – Memory and memory-like arrays – Interconnect and I/O Defining Characteristics ● Low overhead, fast operation, no reversibility ● Energy recovering chips that work and save power over conventional operation!

20  C. H. Ziesler et al., 2003 Targeting Energy Recovery to ASICs and SOC Modules Module characterization – Throughput requirements ● How much time is availible to compute? ● Is the application easily pipelined? – Expected switching activity ● Can it be easily reduced? – Memory, computation, or control? – Clocking requirements ● Fixed frequency? Range of frequencies?

21  C. H. Ziesler et al., 2003 Low Switching Activity Finite-State Machines or Datapaths

22  C. H. Ziesler et al., 2003 Low Switching Activity Modules ● Best place to focus efforts is on clock tree and flip-flops – Switching activity is low everywhere else – Every capacitance connected to the clock switches every cycle ● Consider applying resonant clocking system

23  C. H. Ziesler et al., 2003 Sample Dissipation Breakdown ● Focus on clock dissipation — 20—50% of total power ● Apply energy recovery — Single-phase clock node ● Target design automation — Automate energy recovery

24  C. H. Ziesler et al., 2003 Resonant Clock ASIC ● Compatible with ASIC flow ● Synthesized by Conrad Ziesler and Joohee Kim using in-house standard-cells library and commercial tools ● Energy recovering clock tree and SRAM word/bit lines ● Low-cost bulk CMOS process: TSMC 0.25  m, 108-pin PGA package, through MOSIS ● High frequency (300MHz) ● Low voltage (1.0-1.5V)

25  C. H. Ziesler et al., 2003 ASIC Statistics ● Discrete wavelet transform (DWT) ● 3,897 gates, 413 ffs ● 15,571 transistors ● 400um x 900um ● 13.6 pF, 21 nH ● 300 MHz, 1.5V ● 0.25um logic process Dual-mode DWT Clock generator

26  C. H. Ziesler et al., 2003 Chip Microphotograph

27  C. H. Ziesler et al., 2003 System Overview...

28  C. H. Ziesler et al., 2003 The Energy Recovering Flip-Flop ● Clock signal: Single-phase resonant sinusoid ● Probe activates state element only if next state differs from present state. ● Low voltage operation at high speeds ● Delay similar to conventional flip-flops ● Fully compatible with standard-cell ASIC flow 14 transistors 84  m 2 probestate element

29  C. H. Ziesler et al., 2003 Flip-Flop Power Characterization Order of magnitude difference between idle (D, Q constant) and active (D, Q changing) dissipation. frequency (MHz) Flip-Flip Energy per Cycle 1000 100 10 1 energy (fJ) 200 300 500

30  C. H. Ziesler et al., 2003 Resonant Clock Generator ● Resonate entire clock capacitance with small inductor ● Pump resonant system with NMOS switch at appropriate times ● NMOS switch only conducts incremental losses whenever “on” NMOS Switch Control Pre-driver Driver

31  C. H. Ziesler et al., 2003 Clock Generator Operation

32  C. H. Ziesler et al., 2003 Our Contributions ● Application of energy recovery to ASIC clock network – Fully synthesized ASIC – Resonant LC tank forms power-clock – On-chip power-clock generator w/ off-chip inductor ● Fabrication in 0.25  m standard CMOS process – Compare energy recovering clocking system with conventional clocking system ● Direct DC power measurements for complete system – Correct operation, 100—300MHz – 20—50% savings in total power consumption – 80—90% savings in clock power

33  C. H. Ziesler et al., 2003 Recovering vs. Conventional Hardware ● Synthesized dual-mode ASIC — Conventional — Energy recovery ● Dual-mode flip-flop cell — Conventional clock tree with conventional flip-flop — Resonant clock tree with energy- recovering flip-flop ● Direct comparison of dissipation at target throughput using identical hardware structures

34  C. H. Ziesler et al., 2003 Correct Function signature output (conventional) signature output (energy recovery) signature output (Verilog simulator)

35  C. H. Ziesler et al., 2003 Summary Energy recovery technologies for reducing clock dissipation ● Single-phase sinusoidal clock ● Efficient, LC resonant clock generator ● Low power sinusoidally clocked flip-flop Key attributes ● Compatible with ASIC design flow, low overhead ● High frequency (100—300MHz) ● Low voltage (1—1.5V) ● Real, working chips (in 0.25  m logic process)

36  C. H. Ziesler et al., 2003 Break

37  C. H. Ziesler et al., 2003 Fine-Grain Pipelines with High Switching Activity

38  C. H. Ziesler et al., 2003 High Switching Activity Pipelines Problem Parameters ● Can't reduce switching activity ● Need lots of throughput ● Can easily do fine-grained pipelining Solution ● Dynamic energy recovering logic – True single-phase source-coupled adiabatic logic (SCAL)

39  C. H. Ziesler et al., 2003 SCAL-D Logic Family ● Dynamic logic family that works with simple sinusoidal LC resonant clock ● Alternate PMOS and NMOS type gates like NMOS/PMOS ''zipper'' domino logic ● Test chip: 200MHz 8-bit multiplier implemented in 0.5  m bulk silicon process

40  C. H. Ziesler et al., 2003 SCAL-D Topology ● Sense amplifier ● Precharge diodes ● Current switches ● Evaluation tree ● Current tail ● Power-clock Vss bias at bt af bf of ot NMOS NAND gate Power clock ● Single-phase sinusoidal power-clock ● Minimum-size low-swing evaluation tee ● Built-in state element ● Dual-rail noise-tolerant design

41  C. H. Ziesler et al., 2003 SCAL-D Operation NMOS Precharge Phase ● rising edge of power-clock ● charge transferred to load

42  C. H. Ziesler et al., 2003 SCAL-D Operation NMOS Evaluate Phase ● peak of power-clock ● non-adiabatic evaluation current ● current purposefully limited

43  C. H. Ziesler et al., 2003 SCAL-D Operation NMOS Sense Phase ● falling edge of power clock ● sense amplifiers drive load

44  C. H. Ziesler et al., 2003 SCAL-D Operation NMOS Hold Phase ● negative peak of power clock ● next pipeline stage samples outputs

45  C. H. Ziesler et al., 2003 SCAL-D Implicit Pipelining ● Free pipelining : no flip-flops needed Example: Pipelined "and—full adder" cell from array multiplier SCAL-D: 85 transistors Static CMOS: 100 transistors

46  C. H. Ziesler et al., 2003 ● Minimalist approach ● Simple tools: magic and spice ● Low-cost standard CMOS process: HP 0.5um, 40-pin DIP package, through MOSIS. ● Operational chip demonstrates practicality of energy-recovering circuit design ● Non-trivial size (8-bit operands, on-chip clock, self-test) ● High throughput ● Low energy dissipation Multiplier Chip ● Suhwan Kim (while still at UM) and Conrad Ziesler received First Prize in VLSI Design Contest, DAC 2001.

47  C. H. Ziesler et al., 2003 Chip Microphotograph

48  C. H. Ziesler et al., 2003 Test Chip Overview ● Two multipliers with self-test per chip (minimum size die) ● Integrated power-clock generator ● Resonant LC oscillator

49  C. H. Ziesler et al., 2003 Input BILBO (self-test) Product array Multiplicand buffers Result summation Result buffers Self-test Control Output BILBO (self-test) Multiplier and Self-Test ● 9,048 devices in multiplier array, 2,806 devices in self-test circuitry ● Implemented entirely in energy-recovering dynamic logic family.

50  C. H. Ziesler et al., 2003 Energy Comparison 2.7V 1.9V 2.2V 1.9V 1.6V 2.9V 50 100200 Frequency (MHz) 100 200 300 400 500 Dissipation per Cycle (pJ) 0 140 Energy recovery 4 stage static CMOS 2.3V 3.0V 2.0V 8 stage static CMOS 2 stage static CMOS 4x

51  C. H. Ziesler et al., 2003 ● Zero-voltage switching ● LC- resonant clock generation ● External/bondwire inductor L ● Resistive/capacitive adiabatic load ● Compact: 170 x 115 um Single-Phase Power-Clock Generator Power Clock Pulse Gen Ring Osc Gate Drive Power Switches: S1, S2 25 tr. 19 tr. 10 tr. Ring Osc. Pulse Gen. Gate Drive _+_+ _+_+ load S2 S1 PC L Vdd Vss Vbp Vbn

52  C. H. Ziesler et al., 2003 Switch Timings ● Inductor current builds linearly when switches are on. ● Peak switch current less than peak inductor current. ● Switch S1 turned on at positive voltage peak. ● Switch S2 turned on at negative voltage peak. ● Fixed ''on-window'' controlled by pulse generator. Inductor current Output voltage

53  C. H. Ziesler et al., 2003 ● Single-phase sinusoidal waveform @140MHz ● ~60pF load, ~10nH external inductor ● One DC supply (Vdd, Vss), two DC biases (NMOS, PMOS) Power-Clock Waveform Vdd Vss PMOS Bias NMOS Bias Power-Clock

54  C. H. Ziesler et al., 2003 Multi-GHz Clocking

55  C. H. Ziesler et al., 2003 Multi-Gigahertz Designs ● Need speed more than anything ● Clock distribution and skew biggest problems ● Retiming desirable for performance ● Need multiple phases of clock ● Clock power dominates

56  C. H. Ziesler et al., 2003 Rotary Clock TM Principles ● Consider MultiGig’s Rotary-Clock network (related slides courtesy of John Wood, MultiGig Inc.) ● Multiple transmission line loops arranged in grid ● Each loop supports a square-wave oscillation in lock step with neighbors ● Small variations between loops average/cancel out ● Ultra-high frequency low-skew clocking

57  C. H. Ziesler et al., 2003 Multi-Ring Visualization Phase lock at junctions without PLL

58  C. H. Ziesler et al., 2003 Numerical Example – Large Chip Process: 0.18u CMOS 1.8v Size: 15 x 15 mm Global clock: 2.5 GHz FFs: 2 million, Total capacitance: 10,000pF Metal: Width 40u, Spacing 40u, Thickness 1.5u Copper x 2 Active Area: < 1% Grid X pitch: 1300u, Grid Y pitch: 1300u Power CV 2 F: 78 W Rotary Power: 6 W

59  C. H. Ziesler et al., 2003 Chip Microphotograph

60  C. H. Ziesler et al., 2003 Test Chip Measurements Power: 75% less than CV 2 F of clock capacitance

61  C. H. Ziesler et al., 2003 Test chip #2 Switched-capacitor tuning (100MHz +/- 35% measured) Test chip #3 – quad ring 3.5 GHz Tunable (varactor +/- 10%) Jitter – below measurable levels Later Test Chips

62  C. H. Ziesler et al., 2003 Benefits of Rotary Clock Architecture ● Scalable in size and frequency. ● Reduces dynamic clock power. ● Guaranteed near-zero skew. ● Precise skew scheduling possible. ● Negligible jitter. ● Inherently low noise ● Tolerant to process, temperature, and supply variation.

63  C. H. Ziesler et al., 2003 Memories and Array-Like Structures

64  C. H. Ziesler et al., 2003 Memory and Array Structures ● Many heavily loaded wires all switching ● Already optimized algorithms to reduce number of accesses ● Already using low-power sense-amplifiers ● Consider energy recovery on the bit wires

65  C. H. Ziesler et al., 2003 Breakdown of CPU Dissipation strongARM, JSSC 1996 Memory area: 60% of CPU

66  C. H. Ziesler et al., 2003 Memory Power Long bit/word lines with large capacitance high power consumption 256x256 array

67  C. H. Ziesler et al., 2003 Energy Recovery Driver Power source Q 0 - V DD C Sinusoidal power-clock Synchronization ? — Correctness — Efficiency E = ( RC / t ) CV DD 2 ER Driver

68  C. H. Ziesler et al., 2003 Outline Energy recovering driver — Synchronization — Feedback Energy recovering SRAM — Operation Simulation of full-custom ERSRAM Voltage scaling behavior

69  C. H. Ziesler et al., 2003 One Extreme: Fully Gradual Transition Maximum power efficiency Relatively slow operation ON PC driver output

70  C. H. Ziesler et al., 2003 The Other Extreme: Abrupt Transition Low power efficiency Fast operation ON PC driver output

71  C. H. Ziesler et al., 2003 Partially Gradual Transition Energy efficient Fast ON PC driver output

72  C. H. Ziesler et al., 2003 ER Driver Core Pull-up control Pull-down control PC ch dch Single-phase power-clock Transmission gate ( wide range of operation ) Synchronizing circuitry driver output

73  C. H. Ziesler et al., 2003 Pull-Up Control PC ch_out PC ch ch_out Transistors sized to achieve correct timing and pulse width

74  C. H. Ziesler et al., 2003 Pull-Down Control PC dch_out Transistors sized to achieve correct timing and pulse width PC dch dch_out

75  C. H. Ziesler et al., 2003 Tolerance to Control Timing Variations dch ch Minimum required Maximum possible Minimum required Maximum possible PC driver output

76  C. H. Ziesler et al., 2003 Dissipation During Consecutive Charging/Discharging PC output Consecutive discharging PC output Consecutive charging

77  C. H. Ziesler et al., 2003 Complete Structure with Feedback Pull-up control Pull-down control PC ch dch Feedback circuitry —Prevents redundant dissipation during consecutive charging/discharging driver output

78  C. H. Ziesler et al., 2003 ERSRAM Operation WL BLT BLF writeidleread WL: Explicit discharge after each access for low power BL: Precharge low (with modified sense amp) for single cycle read and write

79  C. H. Ziesler et al., 2003 ERSRAM Architecture 128 x 256 Cell array Bl driver Sense amp Wl driver 128 x 256 Cell array Bl driver Sense amp Wl driver 2 x 128 x 256 Only drivers and sense amplifiers are different from that of conventional SRAM

80  C. H. Ziesler et al., 2003 Simulation TSMC 0.35mm process. Full-custom 256x256 conventional and ER SRAM Hspice simulation

81  C. H. Ziesler et al., 2003 Power Breakdown

82  C. H. Ziesler et al., 2003 Wide Operation Range Tolerant to variations in operating conditions Memory failure due to mistiming in sense amp enable  : Conv. SRAM O : ERSRAM

83  C. H. Ziesler et al., 2003 Voltage Scaling Functions correctly down to 0.7V, 1MHz

84  C. H. Ziesler et al., 2003 Summary Static RAM with novel energy recovering driver Single-phase power-clock Single-cycle read and write High speed, low complexity Operation range : 0.7V, 1MHz – 3.5V, 500MHz with 0.35  m process Energy efficiency: 2.6x at 3V, 300MHz, alternating read-write access

85  C. H. Ziesler et al., 2003 Previous Work on SRAMs Multi-phase power-clock Multi-cycle operation Relatively high complexity and low/moderate speeds Inefficient during consecutive charging Not voltage scalable  Somasekhar, Ye and Roy, ISLPED 1995  Tzartzanis and Athas, ISLPED 1996  Moon and Jeong, JSSC 1998  Avery and Jabri, ISLPED 1998  Kwon, Lim and Chae, ISLPED 2000  Ng and Lau, JCSC 2000  Tzartzanis, Athas and Svensson, ESSCC 2000

86  C. H. Ziesler et al., 2003 Interconnect and I/O ● Highly capacitive parallel buses that take an entire cycle for data transfer ● Can't reduce switching activity or drive voltage ● Low slew-rates desirable ● Consider energy recovering driver – Techniques similar to memories

87  C. H. Ziesler et al., 2003 Tutorial Summary Introduction to energy recovery Application of energy recovery to SOC design – Finite state machines – Fine-grain dynamic pipelines – Multi-GHz clocking – Memory arrays and I/O Functional chips in bulk silicon demonstrating substantial energy savings and fast operation in practice

88  C. H. Ziesler et al., 2003 Reference Material Energy recovery group web site at U. Michigan http://www.eecs.umich.edu/acal/energyrecovery Extensive list of references in our tutorial paper in SOC’03 proceedings The Physics of Computation R. Feynman


Download ppt " C. H. Ziesler et al., 2003 Energy Recovery Design for Low-Power ASICs Conrad H. Ziesler 1 Joohee Kim 1 Suhwan Kim 2 Marios C. Papaefthymiou 1 1 Advanced."

Similar presentations


Ads by Google