UC San Diego Computer Engineering VLSI CAD Laboratory

Conclusions & Ongoing Work
Presented a new technique for dynamic power-performance trade-off that turns off devices not needed at less than peak performance.
Both leakage and dynamic power are reduced; total power reduction is 6-12% on our testcases.
By sharing LPM devices, area overhead is reduced to <5.57%.
No adverse effect on circuit performance when the LPM signal is off.
Ongoing work:
  Actual layout of the custom repeater with routing of the V'DD, V'SS, and LPM nets, to accurately estimate power, performance, and area overhead.
  Customizing more cells, especially clock repeaters, to further improve the power-performance trade-off.

Custom Repeater Design
We add a PMOS-NMOS pair (the LPM devices) to turn half of the repeater's devices off dynamically.
What power components are likely to reduce?
  Short-circuit power: during switching, the PMOS and NMOS are momentarily ON together → short circuit between VDD and VSS. High when transition time (slew) is large.
  Subthreshold leakage: flows when one device of a PMOS-NMOS pair between VDD and VSS is ON.
Requirements:
  Low area overhead. The added PMOS-NMOS pair (LPM devices) takes area; the LPM (low-power mode) signal must be routed or locally generated; the layout of the new cell must be simple.
  High performance when the LPM signal is OFF. The on-resistance of the LPM devices may reduce performance.
  Good power-performance trade-off.

Restricting Area Overhead
Problem: high performance when the LPM signal is OFF → use large LPM devices → large area overhead.
Solution: share LPM devices among multiple repeaters. Fewer LPM devices are needed, but the virtual VDD (V'DD) and virtual VSS (V'SS) nets need routing.
Note: all LPM devices drive V'DD and V'SS.
How many LPM devices are needed?
  Compute the simultaneous switching rate (SSR) by finding the maximum number of repeaters that have overlapping timing windows. Time = O(R log R), where R = number of repeaters.
  Find the total width of all repeater devices (= W_R).
  For good performance, width of LPM devices = 2 × SSR × W_R.
  Typical SSR ≈ 10% → small area overhead.

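The LPM sizing steps (SSR, then W_R, then 2 × SSR × W_R) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each repeater's timing window is a closed interval [start, end], and that SSR is the maximum number of overlapping windows divided by R. The function names are hypothetical. Sorting the window endpoints gives the O(R log R) bound quoted in the poster.

```python
from typing import List, Tuple


def max_overlapping_windows(windows: List[Tuple[float, float]]) -> int:
    """Largest number of timing windows that overlap at any instant."""
    events = []
    for start, end in windows:
        events.append((start, 0))  # 0 = window opens
        events.append((end, 1))    # 1 = window closes
    # Sort by time; at equal times opens come before closes, so windows
    # that merely touch are counted as overlapping (assumption).
    events.sort()
    active = best = 0
    for _, kind in events:
        if kind == 0:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best


def lpm_device_width(windows: List[Tuple[float, float]],
                     repeater_widths: List[float],
                     factor: float = 2.0) -> Tuple[float, float]:
    """Return (SSR, total LPM width), with width = factor * SSR * W_R."""
    if not windows:
        raise ValueError("need at least one repeater timing window")
    # Assumption: SSR is expressed as a fraction of all R repeaters.
    ssr = max_overlapping_windows(windows) / len(windows)
    w_r = sum(repeater_widths)  # W_R: total width of all repeater devices
    return ssr, factor * ssr * w_r


# Hypothetical data: four repeaters, unit device width each.
example_windows = [(0.0, 2.0), (1.0, 3.0), (2.5, 4.0), (5.0, 6.0)]
example_ssr, example_width = lpm_device_width(example_windows, [1.0] * 4)
```

In this toy input at most two windows overlap at once, so the shared LPM devices are sized for half of the total repeater width times the factor of 2. With the typical SSR of about 10% quoted in the poster, the same formula gives an LPM width of roughly 0.2 × W_R.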
On-Line Adjustable Buffering for Runtime Power Reduction (http://vlsicad.ucsd.edu)
Puneet Sharma† (sharma@ucsd.edu)
Advisor: Prof. Andrew B. Kahng‡†
Jointly with Mr. Sherief Reda‡
† Electrical & Computer Engineering  ‡ Computer Science & Engineering

Introduction
CMOS power:
  Operational: dynamic and leakage
  Standby: leakage
Approaches to reduce operational power:
  Supply voltage (VDD) scaling
  Dynamic VDD and frequency scaling (DVFS)
DVFS is used to provide a dynamic power-performance tradeoff → switch to low-power mode if high performance is not needed.
VDD is already small to reduce dynamic power → dynamic voltage scaling reduces noise margins → DVFS is difficult to use due to reduced VDD.
Our approach, like DVFS, provides dynamic low-power, low-performance modes → it can supplement or replace DVFS.
Key idea: many devices are added for performance, not functionality → turn those devices off when high performance is not needed.
Poor interconnect scaling → large number of repeaters.
We modify repeaters to dynamically adjust their driving capacity.

Experimental Validation
Experimental Setup
  Circuits: s38417 (8,890 cells), AES (15,272), OpenRisc (46,732)
  Tools: Synopsys HSPICE (SPICE), Design Compiler (synthesis, timing and power analysis), Cadence SoC Encounter (P&R), SignalStorm (library characterization), TSMC 90nm library models
  Other settings: power and timing analysis at slow corner, VDD of 1.1V and 0.9V, activity factor of 0.01
Power Reduction Results
  Cell-level results, when the LPM signal is turned ON:
    20-20% reduction in leakage
    15-30% reduction in short-circuit power (for same slew)
    45-65% increase in delay
  Circuit-level results:
    Both dynamic and leakage power are reduced
    6-12% reduction in total power in low-performance modes
Area Overhead Estimation
  Area overhead due to LPM devices is 0.91% to 5.57%. It may be smaller, as LPM devices can be placed in whitespace.
  Routing overhead: the V'DD and V'SS nets were routed as minimum Steiner trees and found to be shorter than the scan chain; the LPM signal has short wirelength because the number of LPM devices is small.

Ensuring High Performance
Problem: custom repeaters are ~5% slower when the LPM signal is OFF → up to ~5% reduction in circuit performance.
Solution: use custom repeaters only on non-timing-critical paths.
Additional constraint: slew constraints must not be violated whether the LPM signal is OFF or ON.
We characterize the custom repeaters (i.e., find delay, slew, power, and input capacitance) and then perform remapping with a synthesis tool subject to delay and slew constraints → no loss in circuit performance and no slew violations.

Power-Performance Tradeoff
Power-performance for circuit AES shown.
Utilize slack to reduce power when high performance is not needed.
Power is lowered or unchanged with LPM.
Alternatively, unchanged or higher performance for a given power budget.
Higher performance per watt.

Figures: Traditional Inverter; Custom Inverter; LPM devices shared by two inverters; Power-performance w/ DVFS & DVFS combined w/ LPM.
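The rule of using custom repeaters only on non-timing-critical paths, subject to slew constraints in both LPM states, can be illustrated with a simple per-cell filter. This is a simplified stand-in, not the synthesis-tool remapping the poster describes: it checks each repeater in isolation and ignores the fact that swapping several repeaters on one path consumes the same slack. The `Repeater` fields and `select_for_remapping` are hypothetical names, and all numbers are made-up illustration values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Repeater:
    name: str
    slack_ps: float          # timing slack of the worst path through the cell
    delay_penalty_ps: float  # extra delay of the custom cell, LPM signal OFF
    slew_off_ps: float       # output slew of the custom cell, LPM signal OFF
    slew_on_ps: float        # output slew of the custom cell, LPM signal ON


def select_for_remapping(repeaters: List[Repeater],
                         max_slew_ps: float) -> List[str]:
    """Names of repeaters that can be swapped for the custom cell.

    A repeater qualifies if its slack absorbs the custom cell's delay
    penalty and its slew stays within the limit in both LPM states.
    """
    chosen = []
    for r in repeaters:
        keeps_timing = r.slack_ps >= r.delay_penalty_ps
        keeps_slew = max(r.slew_off_ps, r.slew_on_ps) <= max_slew_ps
        if keeps_timing and keeps_slew:
            chosen.append(r.name)
    return chosen


# Hypothetical characterization data for three repeaters.
candidates = [
    Repeater("r1", slack_ps=50.0, delay_penalty_ps=10.0,
             slew_off_ps=80.0, slew_on_ps=120.0),
    Repeater("r2", slack_ps=5.0, delay_penalty_ps=10.0,
             slew_off_ps=80.0, slew_on_ps=120.0),   # timing-critical
    Repeater("r3", slack_ps=50.0, delay_penalty_ps=10.0,
             slew_off_ps=80.0, slew_on_ps=200.0),   # slew violation, LPM ON
]
remapped = select_for_remapping(candidates, max_slew_ps=150.0)
```

A real flow would re-run timing analysis after each swap, which is what delegating the remapping to the synthesis tool achieves.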
