On-Line Adjustable Buffering for Runtime Power Reduction
Andrew B. Kahng Ψ, Sherief Reda †, Puneet Sharma Ψ
Ψ University of California, San Diego
† Brown University

Outline
- Introduction
- Adjustable Buffering Methodology
- Experiments & Results
- Conclusions

Power: First-Class Objective
- Power is a bottleneck to Moore's law.
- A power-frequency tradeoff exists in CMOS circuits: much higher power is required to operate at high frequency.
- Techniques that exploit the power-frequency tradeoff are of interest:
  - they allow high-frequency operation;
  - they can give significant power reduction when maximum performance is not required.
- Mainstream approach: dynamic voltage and frequency scaling (DVFS).
[Figure: power-frequency tradeoff with V_DD scaling]

Dynamic V_DD & Frequency Scaling
- Scale down V_DD and frequency when high performance is not needed.
- Limitations of DVFS:
  - V_DD cannot be scaled down indefinitely.
  - The range of V_DD scaling is small and diminishing:
    - extremely high power at high V_DD → reduce max. V_DD;
    - high V_th to reduce leakage, plus noise margins, variability, and soft errors → increase min. V_DD.
  - Only discrete voltages are allowed.
- Our objective: enable additional modes to exploit the frequency-power tradeoff:
  - usable when V_DD cannot be scaled further;
  - usable without DVFS.
[Figure: ideal frequency-power curve from V_DD scaling vs. actual frequency-power points from DVFS]
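The tradeoff behind DVFS can be written out with two standard first-order CMOS relations. These do not appear on the slides and are given only as a rough sketch: α (activity factor), C (switched capacitance), f (clock frequency) and γ (velocity-saturation exponent of the alpha-power delay model) are symbols assumed here for illustration.

```latex
% Dynamic power and gate delay, first-order models (assumed, not from the slides)
P_{\mathrm{dyn}} = \alpha \, C \, V_{DD}^{2} \, f ,
\qquad
t_d \;\propto\; \frac{C \, V_{DD}}{\left(V_{DD} - V_{th}\right)^{\gamma}},
\quad 1 \le \gamma \le 2 .

% Example: switching energy per transition when V_DD drops from 1.1 V to 0.9 V
\frac{E_{\mathrm{sw}}(0.9\,\mathrm{V})}{E_{\mathrm{sw}}(1.1\,\mathrm{V})}
  = \left(\frac{0.9}{1.1}\right)^{2} \approx 0.67 .
```

Under these models, lowering V_DD cuts switching energy quadratically, while delay grows sharply as V_DD approaches V_th. This is one way to see why a high V_th raises the minimum usable V_DD.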

Proposal: Adjustable Buffering
- Our approach, like DVFS, provides runtime-selectable low-power modes → supplement or replace DVFS.
- Key idea: a lot of logic is added for performance, not functionality → turn this logic off when high performance is not needed.
- Poor interconnect scaling → large number of repeaters:
  - 20-30% of cells are repeaters;
  - fat repeaters are used to improve delay but consume a lot of power.
- We modify repeaters to dynamically adjust their driving capacity.
[Figure: a 32X repeater transformed into a 16X / 32X repeater with a Select input]

Outline
- Introduction
- Adjustable Buffering Methodology
- Experiments & Results
- Conclusions

Adjustable Repeater Design
- We add a PMOS-NMOS pair to turn half the devices off dynamically:
  - "LPM" = ON → only half the devices operational (low-power mode);
  - "LPM" = OFF → all devices operational (high-performance mode).
- What power components are likely to reduce in low-power mode?
  - Short-circuit power: during switching, PMOS and NMOS are ON momentarily → short circuit between V_DD and V_SS. High when the transition time (slew) is large.
  - Subthreshold leakage: when one of the PMOS-NMOS pair between V_DD and V_SS is ON.
[Figure: traditional inverter (INVX8) vs. adjustable inverter with control gate]

Adjustable Repeater Requirements
- Low area overhead:
  - the added PMOS-NMOS pair (LPM devices) takes area;
  - the LPM signal must be routed or locally generated;
  - the layout of the new cell must be simple and have low area overhead.
- High performance in high-performance mode:
  - the on-resistance of the LPM devices may reduce performance.
- Good power reduction in low-power mode.

Area Overhead
- Problem: high performance is needed when the LPM signal is OFF → use large control gates → large area overhead.
- Solution: share control gates among multiple repeaters.
- Delay overhead: increase in delay of the adjustable repeater over the traditional repeater.
[Figure: control gate]

Control Gate Sharing
- Fewer control gates, but virtual V_DD (V'_DD) and V_SS (V'_SS) need routing.
- How many control gates are needed?
  - Compute the simultaneous switching rate (SSR) by finding the max. number of repeaters that have overlapping timing windows. Time = O(R log R) (R = number of repeaters).
  - Find the total width of all repeater devices controlled by the control gates (= WR).
  - For good performance, width of control gates = 4 × SSR × WR.
- Typical SSR ≈ 10% → small area overhead.
[Figure: LPM devices shared by two inverters through V'_DD and V'_SS]
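The sizing steps above can be sketched with a sort-based sweep over timing-window endpoints, which gives the stated O(R log R) bound. This is an illustration only, not the authors' implementation. The function names, the representation of a timing window as a `(start, end)` pair, and the definition of SSR as the peak overlap divided by R are assumptions made here.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # assumed form: (earliest, latest) switching time


def max_overlap(windows: List[Window]) -> int:
    """Largest number of timing windows that overlap at any instant."""
    events = []
    for start, end in windows:
        events.append((start, 1))   # window opens
        events.append((end, -1))    # window closes
    # Sorting dominates the cost: O(R log R). At equal times, closes (-1)
    # sort before opens (+1), so windows that only touch do not overlap.
    events.sort()
    active = best = 0
    for _, delta in events:
        active += delta
        best = max(best, active)
    return best


def simultaneous_switching_rate(windows: List[Window]) -> float:
    """SSR as a fraction of all repeaters (assumed definition)."""
    return max_overlap(windows) / len(windows) if windows else 0.0


def control_gate_width(ssr: float, total_repeater_width: float,
                       factor: float = 4.0) -> float:
    """Slide rule of thumb: control-gate width = 4 x SSR x WR."""
    return factor * ssr * total_repeater_width


if __name__ == "__main__":
    demo = [(0.0, 2.0), (1.0, 3.0), (2.5, 4.0), (5.0, 6.0)]
    ssr = simultaneous_switching_rate(demo)
    print(ssr, control_gate_width(ssr, total_repeater_width=100.0))
```

With an SSR near the typical 10% quoted on the slide, the rule gives a total control-gate width of about 0.4 × WR.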

Ensuring High Performance
- Problem: adjustable repeaters are ~5% slower when the LPM signal is OFF → up to ~5% reduction in circuit performance.
- Solution: do not use adjustable repeaters on timing-critical paths.
- Additional constraint: slew constraints must not be violated whether the LPM signal is OFF or ON.
- We characterize adjustable repeaters (i.e., find delay, slew, power, input capacitance) and then substitute traditional repeaters with adjustable repeaters subject to delay and slew constraints.
→ No loss in circuit performance and no slew violations.
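One way to picture the constrained substitution is a greedy pass over the repeaters. The sketch below is a simplified stand-in, not the flow used in the presentation. It assumes each repeater lies on a single path with a known slack budget, and the `Repeater` fields, the `choose_adjustable` function and the ordering by power saving are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Repeater:
    name: str
    path: str             # hypothetical: the single path this repeater lies on
    delay_penalty: float  # extra delay with LPM OFF vs. a traditional repeater
    slew_hpm: float       # characterized output slew with LPM OFF
    slew_lpm: float       # characterized output slew with LPM ON
    power_saving: float   # estimated saving in low-power mode


def choose_adjustable(repeaters: List[Repeater],
                      path_slack: Dict[str, float],
                      max_slew: float) -> List[str]:
    """Greedily pick repeaters to replace without using up path slack."""
    remaining = dict(path_slack)  # do not mutate the caller's budget
    chosen = []
    # Try the most power-hungry candidates first (assumed heuristic).
    for r in sorted(repeaters, key=lambda r: r.power_saving, reverse=True):
        slew_ok = r.slew_hpm <= max_slew and r.slew_lpm <= max_slew
        if slew_ok and remaining[r.path] >= r.delay_penalty:
            remaining[r.path] -= r.delay_penalty
            chosen.append(r.name)
    return chosen


if __name__ == "__main__":
    reps = [
        Repeater("a", "p1", 0.02, 0.1, 0.2, 3.0),
        Repeater("b", "p1", 0.02, 0.1, 0.2, 2.0),
        Repeater("c", "p2", 0.01, 0.1, 0.5, 5.0),  # violates slew in LPM
        Repeater("d", "p2", 0.01, 0.1, 0.2, 1.0),  # critical path: no slack
    ]
    print(choose_adjustable(reps, {"p1": 0.03, "p2": 0.0}, max_slew=0.3))
```

A real flow would rerun static timing analysis after each substitution, because repeaters share slack across many paths. The per-path budget here only stands in for that check.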

Power Reduction in Low-Power Mode
- Short-circuit energy and leakage reduce.
[Figure: traditional inverter vs. adjustable inverter with half the devices OFF]

Reduction in short-circuit energy and leakage for INVX8:

        Short-Circuit Energy    Leakage
LVT     43%                     28%
SVT     35%                     26%
HVT     22% (only one value survives in the transcript)

Outline
- Introduction
- Adjustable Buffering Methodology
- Experiments & Results
- Conclusions

Experimental Validation
- Circuits: s38417 (8,890 cells), AES (15,272 cells), OpenRisc (46,732 cells).
- Tools: Synopsys HSPICE (SPICE), Design Compiler (synthesis, timing and power analysis); Cadence SoC Encounter (P&R), SignalStorm (library characterization); Artisan TSMC 90nm library models.
- Other settings: power and timing analysis at the slow corner, V_DD of 1.1 V and 0.9 V, activity factor of 0.01.

Results: Power Reduction
- Both dynamic and leakage power reduce.
- 6-12% reduction in total power in low-power mode.
- We perform a comparative analysis of:
  - the circuit with DVFS;
  - the circuit with DVFS + LPM.
[Figure: power at V_DD = 1.1 and V_DD = 0.9, each with LPM = 0 and LPM = 1]

Results: Area Overhead
- Logic area overhead due to control gates:
  - depends on SSR;
  - smaller if control gates can be placed in whitespace.
- Routing overhead:
  - LPM and its complement are routed to the control gates;
    - routing overhead depends on the locations of the control gates;
    - number of control gates small → overhead small;
  - V'_DD and V'_SS are routed to all repeaters.
- For overhead estimation, nets are assumed to be Steiner trees.
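The slides do not say how the Steiner-tree lengths were obtained. As a rough illustration of estimating the wirelength of a net such as V'_DD, the sketch below computes a rectilinear minimum spanning tree with Prim's algorithm. This is a deliberately swapped-in, simpler technique: the spanning-tree length is an upper bound on the rectilinear Steiner tree length and is known to exceed it by at most a factor of 1.5. The function name and the pin coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rmst_length(pins: List[Point]) -> float:
    """Rectilinear MST length over pin locations (Prim's algorithm, O(n^2))."""
    n = len(pins)
    if n < 2:
        return 0.0

    def manhattan(a: Point, b: Point) -> float:
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    visited = [False] * n
    visited[0] = True
    # best[i] = cheapest Manhattan connection from pin i to the current tree
    best = [manhattan(pins[0], p) for p in pins]
    total = 0.0
    for _ in range(n - 1):
        nxt = min((i for i in range(n) if not visited[i]),
                  key=lambda i: best[i])
        total += best[nxt]
        visited[nxt] = True
        for i in range(n):
            if not visited[i]:
                best[i] = min(best[i], manhattan(pins[nxt], pins[i]))
    return total


if __name__ == "__main__":
    # Hypothetical control gate at the origin driving two repeaters.
    print(rmst_length([(0.0, 0.0), (0.0, 3.0), (4.0, 0.0)]))
```

Summing this length over the LPM, V'_DD and V'_SS nets gives a pessimistic routing-overhead figure. A dedicated Steiner-tree heuristic would tighten it.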

Outline
- Introduction
- Adjustable Buffering Methodology
- Experiments & Results
- Conclusions

Conclusions
- Presented a novel technique that dynamically trades off power and performance by turning off devices not needed at less than maximum performance.
- Both leakage and dynamic power reduce; total power reduction is 6-12% on our testcases.
- By sharing control gates, area overhead is reduced to <5.57%.
- No adverse effect on the performance of the circuit when the LPM signal is OFF.
- Future work:
  - actual layout of adjustable repeaters with routing of the V'_DD, V'_SS, and LPM nets to accurately estimate power, performance, and area impacts;
  - customization of more cells, especially clock repeaters, to further improve the power-performance tradeoff.

Thank You Questions?
