Energy Recovery from High-frequency Clocks using DC-DC Converters
Mehdi Alimadadi, Samad Sheikhaei, Guy Lemieux, Shahriar Mirabbasi, William Dunford, University of British Columbia, Canada
Patrick Palmer, University of Cambridge, UK
Problem: Clock Power in High-performance CPUs
CPU | Year (process) | Clock | Power | % power for clock | Clock power
Intel McKinley | 2002 (180nm) | 1 GHz | 130W | 33% | 43W
Intel Montecito | 2005 (90nm) | 2.5 GHz | 85W | 30% | 25W
IBM Power6 | 2007 (65nm) | 5 GHz | > 100W | 22% | > 22W
Cause:
- Charge the big clock capacitor Cclk with energy
- Discharge the Cclk energy to GND (waste it!)
- Repeat every clock cycle
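The "cause" above is ordinary CV² switching. The minimal Python sketch below does the per-cycle energy bookkeeping. It assumes an ideal full-swing charge from VDD through a resistive pull-up, where the textbook result is that half the drawn energy is dissipated while charging and half is stored on Cclk, only to be dumped to GND on the next discharge. The Cclk = 20 pF value is the one quoted in the comparative results. VDD = 1 V is an assumed value for illustration only, so the absolute numbers scale with whatever supply is actually used.

```python
def clock_energy_budget(c_clk: float, vdd: float, f_clk: float) -> dict:
    """Per-cycle energy bookkeeping for a clock node charged to VDD and dumped to GND.

    c_clk in farads, vdd in volts, f_clk in hertz (ideal full-swing model).
    """
    e_drawn = c_clk * vdd ** 2            # energy taken from VDD per charging event
    e_stored = 0.5 * e_drawn              # energy on Cclk once it reaches VDD
    e_charge_loss = e_drawn - e_stored    # dissipated in the pull-up while charging
    return {
        "drawn_per_cycle_j": e_drawn,
        "stored_per_cycle_j": e_stored,
        "charge_loss_per_cycle_j": e_charge_loss,
        "clock_power_w": e_drawn * f_clk,      # P = Cclk * VDD^2 * f
        "dumped_to_gnd_w": e_stored * f_clk,   # what a conventional driver throws away
    }


if __name__ == "__main__":
    # Cclk = 20 pF (from the slides), VDD = 1 V (assumed), f = 3 GHz
    budget = clock_energy_budget(20e-12, 1.0, 3e9)
    print(f"clock power:   {budget['clock_power_w'] * 1e3:.1f} mW")
    print(f"dumped to GND: {budget['dumped_to_gnd_w'] * 1e3:.1f} mW")
```

Under these assumptions it prints 60.0 mW of clock power, of which 30.0 mW is the stored energy a conventional driver sends to GND, which is the upper bound on what a converter could recycle.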
Primary Contribution of This Work
- Discharge Cclk into a DC-DC converter instead of to GND
- Use the converter to power a useful load (Rload)
- Integrate the clock drivers with the DC-DC converters
- Net savings in power
[Figure: integrated clock driver and converter, with voltage feedback (for regulation) and the useful load]
Summary Results
Explore three main DC-DC power converter topologies:
- Buck converter: our previous work [ISSCC 2007]
- Boost converter: this paper [ISVLSI 2008]
- Buck-boost converter: this paper [ISVLSI 2008]
90nm layouts, 3 GHz operation, < 0.3 mm²
Converter | Clock-only power (input) | Extra power to operate converter (input) | Converter output power | % clock energy recovered
Buck [ISSCC 2007] | 40mW | 16mW | 26mW | 50%
Boost | 100mW | 25mW | 28mW | 20%
Buck-boost | n/a | 72mW | 48mW | 30%
Background
Background – Typical Clocking Architecture
[Figure: clock source feeding the Level 1 and Level 2 H-trees, Level 3 gaters and final drivers, the final H-tree, and the bottom mesh]
Background – Typical Clocking Architecture
Clock distribution: the majority of the energy is used by the final drivers.
- Level 1 and Level 2 H-trees: tunable delays (CVDs) to eliminate skew; low-swing, differential signaling for low power and noise immunity; ~5W of power
- Level 3 gaters: reduce clock activity by 50–85% (Power6); they can't eliminate all activity, since a clock is still needed to compute
- Final clock drivers: full-rail swing; tapered inverters drive hundreds of latches (high power); H-tree with ends shorted by a mesh (low skew, high power); ~15W to 40W of power
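A quick check of "majority of energy used by final drivers" from the quoted power figures, plus the usual linear dependence of dynamic power on switching activity. This is a minimal sketch. The gating example applies the 50–85% activity reduction to a hypothetical 40 W ungated load; the slide does not say whether the 15–40 W figures are measured before or after gating.

```python
def final_driver_share(tree_power_w: float, final_driver_power_w: float) -> float:
    """Fraction of clock power spent in the final drivers vs. the Level 1/2 trees."""
    return final_driver_power_w / (tree_power_w + final_driver_power_w)


def gated_power(ungated_power_w: float, activity_reduction: float) -> float:
    """Dynamic power scales linearly with switching activity (P = alpha * C * V^2 * f)."""
    if not 0.0 <= activity_reduction <= 1.0:
        raise ValueError("activity_reduction must be in [0, 1]")
    return ungated_power_w * (1.0 - activity_reduction)


if __name__ == "__main__":
    # ~5 W in the Level 1/2 trees vs. ~15-40 W in the final drivers (from the slide)
    for final_w in (15.0, 40.0):
        print(f"final drivers {final_w:.0f} W -> share {final_driver_share(5.0, final_w):.0%}")
    # Hypothetical 40 W ungated load, gated by 50% and 85%
    for cut in (0.50, 0.85):
        print(f"{cut:.0%} activity reduction -> {gated_power(40.0, cut):.1f} W")
```

With those numbers the final drivers account for roughly 75% to 89% of distribution power.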
Background – Reducing Clock Power
Clock distribution:
- Low-swing (differential) signals: final drivers still need full-rail swing
- Resonant clocking (saves 80%): final drivers need a square clock
Final clock drivers:
- Adiabatic switching: low performance, < 100 MHz
- Double-edge clocking: feasible, but needs complex flip-flops and larger loads; compatible with the energy recovery in this paper
Background – Switch Mode Power Supplies
Basic DC-DC converter topologies:
- Buck: step down, 0 < Vout < VDD
- Boost: step up, Vout > VDD
- Buck-boost: negative step up/down, Vout < 0
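For reference, the textbook ideal conversion ratios behind those three voltage ranges, assuming lossless switches and continuous inductor conduction. These are the standard forms, not the clock-driven variants; the boost and buck-boost designs later in the talk quote different 0th-order expressions.

```python
def _check_duty(d: float) -> None:
    if not 0.0 <= d < 1.0:
        raise ValueError("duty cycle D must satisfy 0 <= D < 1")


def buck_gain(d: float) -> float:
    """Ideal buck: Vout/VDD = D, so 0 <= Vout <= VDD (step down)."""
    _check_duty(d)
    return d


def boost_gain(d: float) -> float:
    """Ideal boost: Vout/VDD = 1/(1 - D), so Vout >= VDD (step up)."""
    _check_duty(d)
    return 1.0 / (1.0 - d)


def buck_boost_gain(d: float) -> float:
    """Ideal inverting buck-boost: Vout/VDD = -D/(1 - D), so Vout <= 0."""
    _check_duty(d)
    return -d / (1.0 - d)


if __name__ == "__main__":
    for d in (0.25, 0.5, 0.75):
        print(f"D={d:.2f}  buck={buck_gain(d):.2f}  boost={boost_gain(d):.2f}  "
              f"buck-boost={buck_boost_gain(d):.2f}")
```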
Background – Switch Mode Power Supplies
DC-DC buck converter:
- CMOS inverter as the power switches
- Zero-voltage switching (ZVS) implementation: turn on the NMOS when Vinv = 0; turn on the PMOS when Vinv = Vdd
Integrated Clock Driver / Power Converter
Background: ISSCC 2007 design
[Figure: integrated clock driver / power converter with the ZVS delay circuit]
Integration of Clock and SMPS
- CPU clock: 3 GHz clock with a large Cclk
- SMPS: large Mp and Mn power switches with their drive chain
Integration of Clock and SMPS
Combine the driver circuits.
Key Concept: Energy Recycling
Benefits:
- Shared driver chain
- Cclk is added to the SMPS
Red path: the NMOS drains Cclk and wastes its charge!
Blue path: delaying the NMOS turn-on recovers the clock charge!
This is ZVS (zero-voltage switching), borrowed from power electronics.
ZVS Detailed Operation
The ZVS delay circuit, labeled D, delays only the rising edge of Vn and is implemented inside the clock chain.
ZVS Detailed Operation (Mode 1)
Mode 1 (0 < t < D·Tsw):
- Mp is ON
- Current builds up in the inductor
- Cclk charges up
D = duty cycle, Tsw = switching period
ZVS Detailed Operation (Mode 2)
Mode 2 (D·Tsw < t < D·Tsw + Tzvs):
- Both power transistors are OFF
- The inductor current discharges Cclk
- The Cclk charge is recycled to the output load
D = duty cycle, Tsw = period, Tzvs = ZVS delay
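How long must Tzvs be? A rough sketch, assuming the inductor current stays roughly constant during Mode 2 (a constant-current approximation that ignores the LC resonance). Under that assumption Cclk falls from VDD to 0 at a rate I/C, so Tzvs ≈ Cclk·VDD/I. The currents below are hypothetical, and VDD = 1 V is assumed. A delay shorter than this turns Mn on before Vclk reaches 0, which is the red path that wastes charge.

```python
def zvs_delay_estimate(c_clk: float, vdd: float, i_inductor: float) -> float:
    """Time for a constant inductor current to pull Cclk from VDD down to 0 V.

    Constant-current approximation: dV/dt = -I/C, so t = C * VDD / I.
    Ignores the LC resonance and the change in inductor current during Mode 2.
    """
    if i_inductor <= 0.0:
        raise ValueError("inductor current must flow out of the clock node (I > 0)")
    return c_clk * vdd / i_inductor


if __name__ == "__main__":
    t_sw = 1.0 / 3e9  # 3 GHz switching period
    for i_l in (0.2, 0.5, 1.0):  # hypothetical inductor currents in amps
        t = zvs_delay_estimate(20e-12, 1.0, i_l)
        print(f"I_L = {i_l:.1f} A -> Tzvs ~ {t * 1e12:.0f} ps ({t / t_sw:.0%} of Tsw)")
```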
ZVS Detailed Operation (Mode 3)
Mode 3 (D·Tsw + Tzvs < t < Tsw):
- Mn turns ON once Vclk reaches 0 (ZVS for Mn)
- The inductor current decreases linearly
D = duty cycle, Tsw = period, Tzvs = ZVS delay
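The linear build-up in Mode 1 and linear decrease in Mode 3 give the usual volt-second balance for an open-loop buck. A minimal sketch, assuming ideal switches, a constant output voltage over one period, and a Mode 2 short enough to ignore. Under those assumptions the net change in inductor current per period is zero at Vout = D·VDD. The 310 pH value is the LC-filter inductor quoted in the comparative results; VDD = 1 V is assumed.

```python
def inductor_current_change(vdd: float, vout: float, d: float, t_sw: float, inductance: float) -> float:
    """Net change in inductor current over one period (Modes 1 and 3 only).

    Mode 1: clock node at VDD, di/dt = (VDD - Vout) / L for D*Tsw.
    Mode 3: clock node at 0 V, di/dt = -Vout / L for (1 - D)*Tsw.
    Mode 2 is treated as instantaneous (Tzvs << Tsw assumed).
    """
    rise = (vdd - vout) * d * t_sw / inductance
    fall = vout * (1.0 - d) * t_sw / inductance
    return rise - fall


def buck_ripple(vdd: float, vout: float, d: float, t_sw: float, inductance: float) -> float:
    """Peak-to-peak inductor current ripple, i.e. the Mode 1 ramp height."""
    return (vdd - vout) * d * t_sw / inductance


if __name__ == "__main__":
    vdd, d, t_sw, l_filter = 1.0, 0.5, 1.0 / 3e9, 310e-12
    vout = d * vdd  # steady state from volt-second balance
    print(f"net current change at Vout = D*VDD: {inductor_current_change(vdd, vout, d, t_sw, l_filter):.3g} A")
    print(f"ripple: {buck_ripple(vdd, vout, d, t_sw, l_filter):.3f} A peak-to-peak")
```

With these assumed values the ripple comes out to about 0.27 A peak-to-peak.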
Detailed Operation – ZVS Delay Circuit for Mn
Delay the rising edge of Vn.
Detailed Operation – ZVS Delay Circuit for Mn
The falling edges of Vp and Vn are synchronized.
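The delayed-rising-edge scheme can be modeled as two ideal gate waveforms to show that it produces the three modes without shoot-through. This is a sketch under assumptions the slides do not spell out. The gates see ideal square edges, Mp conducts when Vp is low, and Mn conducts when Vn is high. Both gates fall together at the period boundary, and Vn rises Tzvs after Vp. Time is normalized to Tsw = 1.

```python
def gate_levels(t: float, t_sw: float, d: float, t_zvs: float) -> tuple:
    """Ideal gate waveforms: Vn's rising edge is delayed by Tzvs, falling edges aligned."""
    if not 0.0 <= d * t_sw + t_zvs < t_sw:
        raise ValueError("need D*Tsw + Tzvs < Tsw")
    phase = t % t_sw
    vp_high = phase >= d * t_sw            # Vp rises at D*Tsw (turns Mp off)
    vn_high = phase >= d * t_sw + t_zvs    # Vn rises Tzvs later (turns Mn on)
    # Both gates fall together at the period boundary (synchronized falling edges).
    return vp_high, vn_high


def operating_mode(t: float, t_sw: float, d: float, t_zvs: float) -> int:
    """Return 1 (Mp on), 2 (both off, Cclk recycled), or 3 (Mn on)."""
    vp_high, vn_high = gate_levels(t, t_sw, d, t_zvs)
    mp_on = not vp_high   # PMOS conducts with its gate low
    mn_on = vn_high       # NMOS conducts with its gate high
    if mp_on and mn_on:
        raise RuntimeError("shoot-through: Mp and Mn both on")
    if mp_on:
        return 1
    if mn_on:
        return 3
    return 2


if __name__ == "__main__":
    # Normalized period, D = 0.5, Tzvs = 0.1 * Tsw (hypothetical values)
    for t in (0.25, 0.55, 0.8, 1.25):
        print(f"t = {t:.2f} Tsw -> Mode {operating_mode(t, 1.0, 0.5, 0.1)}")
```

Because Mp can only be on while Vp is low, and Vn cannot rise before Vp does, the two switches never conduct at the same time. Only the NMOS turn-on is deferred.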
Simulation – Voltages
[Figure: simulated voltage waveforms]
Simulation – Currents
[Figure: simulated current waveforms]
Effective Efficiency
How should power efficiency be measured once the clock drivers are integrated with the DC-DC converters? The converter gets "free energy" from the clock.
Effective efficiency: how efficient a regular (standalone) power converter would have to be to match the efficiency of the integrated clock/power converter.
[Figure: raw efficiency vs. effective efficiency]
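One way to formalize that definition, assuming "extra power" means the input power drawn beyond the clock-only baseline (as in the summary table). A standalone converter with efficiency η draws Pout/η. The integrated design draws only Pextra on top of the clock power the chip pays anyway. Equating the two gives η_eff = Pout/Pextra. The summary-table numbers give 26/16 = 162.5% for the buck and 28/25 = 112% for the boost. Values above 100% are possible because part of the converter input is clock energy that would otherwise be wasted. Raw efficiency is not defined on the slide, so it is not modeled here.

```python
def effective_efficiency(p_out: float, p_extra: float) -> float:
    """Efficiency a standalone converter would need to match the integrated design.

    A standalone converter with efficiency eta draws p_out / eta from the supply.
    The integrated clock/converter draws only p_extra beyond the clock-only power.
    Equating the two gives eta = p_out / p_extra.
    """
    if p_extra <= 0.0:
        raise ValueError("p_extra must be positive")
    return p_out / p_extra


if __name__ == "__main__":
    # Buck and boost rows of the summary table (mW)
    print(f"buck:  {effective_efficiency(26.0, 16.0):.1%}")
    print(f"boost: {effective_efficiency(28.0, 25.0):.1%}")
```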
Buck Converter – Simulation Results
Open-loop converter (no regulation).
Efficiency is highest at the lowest duty cycle: only a fixed amount of energy is available from Cclk, so it makes up a larger share of the converter's input when the output power is low.
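A toy model of why effective efficiency rises at low duty cycle. It assumes the converter needs Pout/η_conv at its input and that a fixed Precovered of that comes from Cclk charge the clock has already paid for. With a fixed Rload, a lower D gives a lower Vout and a lower Pout. Both η_conv = 0.6 and Precovered = 10 mW are hypothetical values, not figures from the slides.

```python
def effective_efficiency_model(p_out: float, eta_conv: float, p_recovered: float) -> float:
    """Toy model: effective efficiency when a fixed p_recovered comes free from Cclk.

    Extra supply power = p_out / eta_conv - p_recovered; effective = p_out / extra.
    """
    p_extra = p_out / eta_conv - p_recovered
    if p_extra <= 0.0:
        raise ValueError("model only valid while the load needs more than the recycled clock energy")
    return p_out / p_extra


if __name__ == "__main__":
    eta, p_rec = 0.6, 10.0  # hypothetical conversion efficiency and recovered power (mW)
    for p_out in (30.0, 20.0, 12.0):  # lower duty cycle -> lower output power into a fixed Rload
        print(f"Pout = {p_out:4.1f} mW -> effective efficiency "
              f"{effective_efficiency_model(p_out, eta, p_rec):.0%}")
```

As Pout shrinks, the fixed free energy covers a growing fraction of the converter's input, so the effective efficiency climbs (here from 75% to 120%).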
ISSCC 2007 90nm Test Chip
Die: 1 mm²; buck converter: 0.27 mm²
Buck Converter – Chip Measurement vs. Simulation Results
[Figure panels: chip measurement; simulation (3 GHz)]
ISVLSI 2008 New Design 1: Boost Converter
Boost Converter
Basic operation: Vclk provides power and timing.
0th-order result: $V_{out} = \frac{D}{1-D} V_{dd}$
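One way to read the 0th-order result is as a factorization: the average of a 0/Vdd clock with duty cycle D, which is D·Vdd, multiplied by the ideal boost gain 1/(1−D). The slide does not give the derivation, so this factorization is an interpretation. Either way the formula implies the output exceeds Vdd only for D > 0.5.

```python
def clocked_boost_vout(d: float, vdd: float) -> float:
    """0th-order output of the clock-driven boost: Vout = D/(1 - D) * Vdd.

    Computed as (average clock voltage) x (ideal boost gain), i.e.
    (D * Vdd) * 1/(1 - D); this factorization is an interpretation.
    """
    if not 0.0 <= d < 1.0:
        raise ValueError("duty cycle D must satisfy 0 <= D < 1")
    v_clk_avg = d * vdd            # mean of a 0 / Vdd square wave with duty cycle D
    return v_clk_avg / (1.0 - d)   # ideal boost conversion ratio 1/(1 - D)


if __name__ == "__main__":
    for d in (0.3, 0.5, 0.7):
        print(f"D = {d:.1f}: Vout = {clocked_boost_vout(d, 1.0):.2f} x Vdd")
```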
Boost Converter
Boost Converter – Simulation Results
Open-loop converter (no regulation).
Efficiency is highest at the lowest duty cycle: only a fixed amount of energy is available from Cclk, so it makes up a larger share of the converter's input when the output power is low.
ISVLSI 2008 New Design 2: Buck-boost Converter
Buck-boost Converter
Basic operation: Vclk provides power and timing.
0th-order result: $V_{out} = -\frac{D^2}{1-D} V_{dd}$
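The same reading applies to the buck-boost result. Here the interpretation is the average clock voltage D·Vdd times the ideal inverting gain −D/(1−D), which is again an interpretation rather than a derivation from the slides. Two consequences follow directly from the formula. The output magnitude is D times that of the clocked boost, and it reaches Vdd only when D² = 1 − D, i.e. D = (√5 − 1)/2 ≈ 0.618.

```python
import math


def clocked_buck_boost_vout(d: float, vdd: float) -> float:
    """0th-order output of the clock-driven buck-boost: Vout = -D^2/(1 - D) * Vdd."""
    if not 0.0 <= d < 1.0:
        raise ValueError("duty cycle D must satisfy 0 <= D < 1")
    v_clk_avg = d * vdd                    # mean of a 0 / Vdd square wave with duty cycle D
    return -v_clk_avg * d / (1.0 - d)      # ideal inverting gain -D/(1 - D)


# |Vout| equals Vdd where D^2 = 1 - D
D_UNITY = (math.sqrt(5.0) - 1.0) / 2.0


if __name__ == "__main__":
    for d in (0.3, 0.5, D_UNITY, 0.8):
        print(f"D = {d:.3f}: Vout = {clocked_buck_boost_vout(d, 1.0):+.2f} x Vdd")
```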
Buck-boost Converter
Buck-boost Converter
Open-loop converter (no regulation).
Efficiency is highest at the lowest duty cycle: only a fixed amount of energy is available from Cclk, so it makes up a larger share of the converter's input when the output power is low.
Results and Comparisons
Comparative Results
IBM Power6: 100W @ 1V, 341 mm²; Cclk = 13 pF/mm²
Other work: fully on-chip DC-DC buck converter (S. Abedinpour, B. Bakkaloglu, and S. Kiaei, "A Multi-Stage Interleaved Synchronous Buck Converter with Integrated Output Filter in a 0.18µm SiGe Process," ISSCC 2006, pp. 356–357)
- 27 mm², 45 MHz
- 65% power efficiency
This work:
- 0.27, 0.26, and 0.20 mm² (including 0.1 mm² of inductor area), 3 GHz
- Cclk = 20 pF, equivalent to 1.6 mm² of Power6 area; the DC-DC converter adds 12.5% area overhead
- LC filter: 310 pH inductor, 350 pF capacitor; L and C are similar in area and dominate the layout, so stacking them can cut the area in half
- Buck: 75–185% effective power efficiency (50% recovered)
- Boost: 25–110% effective power efficiency (20% recovered)
- Buck-boost: 20–66% effective power efficiency (30% recovered)
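The area figures can be checked with a few lines of arithmetic. At 13 pF/mm², the full 341 mm² Power6 die carries about 4.4 nF of clock capacitance, and 20 pF corresponds to about 1.54 mm² (rounded to 1.6 mm² on the slide). Dividing the smallest layout (0.20 mm²) by the rounded 1.6 mm² reproduces the 12.5% overhead; that this is how the slide's figure was computed is an inference. The last line computes the standard LC natural frequency 1/(2π√(LC)) of the quoted filter, about 483 MHz, well below the 3 GHz switching frequency.

```python
import math

POWER6_CLK_DENSITY_PF_PER_MM2 = 13.0   # from the slide
POWER6_AREA_MM2 = 341.0                # from the slide


def equivalent_area_mm2(c_clk_pf: float, density_pf_per_mm2: float = POWER6_CLK_DENSITY_PF_PER_MM2) -> float:
    """Die area whose clock capacitance equals c_clk_pf."""
    return c_clk_pf / density_pf_per_mm2


def lc_corner_hz(inductance_h: float, capacitance_f: float) -> float:
    """Natural frequency of the output LC filter, 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


if __name__ == "__main__":
    total_cclk_nf = POWER6_CLK_DENSITY_PF_PER_MM2 * POWER6_AREA_MM2 / 1000.0
    print(f"Power6 total clock capacitance ~ {total_cclk_nf:.1f} nF")
    print(f"20 pF of Cclk ~ {equivalent_area_mm2(20.0):.2f} mm^2 of Power6 area")
    print(f"0.20 mm^2 converter / 1.6 mm^2 = {0.20 / 1.6:.1%} overhead")
    print(f"LC corner ~ {lc_corner_hz(310e-12, 350e-12) / 1e6:.0f} MHz vs. 3000 MHz switching")
```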
Conclusion
Key concepts:
- High switching frequency saves area
- Combined drivers save area and switching loss
- Recycled charge: the converter load discharges Cclk
- ZVS delay circuit: lower power loss
Limitations:
- Regulation needs a variable-duty-cycle clock
- May introduce additional clock jitter
- Mostly suitable for edge-triggered blocks (no latches)
Future work:
- Lots of improvements to make!
Thank you! Questions?