CSE477 L26 System Power (Irwin & Vijay, PSU, 2003)
CSE477 VLSI Digital Circuits, Fall 2003
Lecture 26: Low Power Techniques in Microarchitectures and Memories

Mary Jane Irwin
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]

Review: CMOS Energy & Power Equations

$$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0/1\to1/0} + \frac{V_{DD} I_{leak}}{f_{clock}}$$

$$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0/1\to1/0} + V_{DD} I_{leak}$$

where $f_{0\to1} = P_{0\to1} \cdot f_{clock}$ and $f_{0/1\to1/0} = P_{0/1\to1/0} \cdot f_{clock}$, so that $E$ is the energy per clock cycle and $P = E \cdot f_{clock}$.
- Dynamic power (~90% today and decreasing relatively)
- Short-circuit power (~8% today and decreasing absolutely)
- Leakage power (~2% today and increasing)
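As a rough numeric check of the power equation, the sketch below evaluates its three terms separately for a single node. The function name `cmos_power` and every parameter value in the usage line are hypothetical, chosen only to make the arithmetic concrete; `p_any` stands for P_{0/1→1/0}.

```python
def cmos_power(c_load, vdd, p_01, p_any, t_sc, i_peak, i_leak, f_clock):
    """Return (dynamic, short_circuit, leakage) power in watts for one node.

    p_01  : probability of a 0->1 transition per clock cycle (P_{0->1})
    p_any : probability of any transition per clock cycle (P_{0/1->1/0})
    """
    f_01 = p_01 * f_clock          # f_{0->1} = P_{0->1} * f_clock
    f_any = p_any * f_clock        # f_{0/1->1/0} = P_{0/1->1/0} * f_clock
    dynamic = c_load * vdd ** 2 * f_01
    short_circuit = t_sc * vdd * i_peak * f_any
    leakage = vdd * i_leak
    return dynamic, short_circuit, leakage

# Hypothetical node: 100 fF load, 1.2 V, 20 ps short-circuit time,
# 100 uA peak current, 10 nA leakage, 1 GHz clock
dyn, sc, leak = cmos_power(100e-15, 1.2, 0.1, 0.2, 20e-12, 100e-6, 10e-9, 1e9)
energy_per_cycle = (dyn + sc + leak) / 1e9   # E = P / f_clock
print(f"dynamic={dyn:.3g} W  short-circuit={sc:.3g} W  leakage={leak:.3g} W")
print(f"energy per cycle={energy_per_cycle:.3g} J")
```

With these made-up numbers the dynamic term dominates, in line with the percentages on the slide.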

Power and Energy Design Space

| Energy | Design time (constant throughput/latency) | Non-active modules (constant throughput/latency) | Run time (variable throughput/latency) |
|---|---|---|---|
| Active (dynamic) | Logic design, reduced V_DD, sizing, multi-V_DD | Clock gating | DFS, DVS (dynamic frequency and voltage scaling) |
| Leakage (standby) | Multi-V_T, stack effect, pin ordering | Sleep transistors, multi-V_DD, variable V_T, input control | Variable V_T |

Reducing Power and Energy of Interconnects
- Share long data buses with time multiplexing (S1 uses even cycles, S2 odd)
  [figure: sources S1, S2 and destinations D1, D2]
- Buses are a significant source of power dissipation due to high switching activity and large capacitive loading
  - 15% of total power in Alpha
  - 30% of total power in Intel
- But what if the data samples are correlated (e.g., sign bits)?

Bus Multiplexing and Correlated Data Streams
[figure: bit switching probabilities versus bit position, MSB to LSB]
- For a shared (multiplexed) bus, the advantages of data correlation are lost (the bus carries samples from two uncorrelated data streams)
  - Bus sharing should not be used for positively correlated data streams
  - Bus sharing may prove advantageous for a negatively correlated data stream (where successive samples switch sign bits): interleaving a second stream makes the sign-bit switching more random, and hence less frequent than the near-constant toggling of the original stream
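The loss of correlation on a shared bus can be seen by counting bit toggles. The sketch below is a minimal illustration, assuming 8-bit two's-complement samples and two made-up streams that are each slowly varying (positively correlated), one positive and one negative; `toggle_profile` is a hypothetical helper, not part of the lecture.

```python
def toggle_profile(samples, width=8):
    """Per-bit switching probability of consecutive words on one bus.

    Samples are signed integers sent in two's complement; index 0 of the
    result is the LSB and index width-1 is the MSB (sign bit).
    """
    mask = (1 << width) - 1
    words = [s & mask for s in samples]        # two's-complement encoding
    counts = [0] * width
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur                      # 1 wherever a bus line toggles
        for bit in range(width):
            counts[bit] += (diff >> bit) & 1
    transfers = len(words) - 1
    return [c / transfers for c in counts]

# Hypothetical streams: S1 stays positive, S2 stays negative
s1 = [10 + (k % 4) for k in range(64)]
s2 = [-10 - (k % 4) for k in range(64)]

dedicated_s1 = toggle_profile(s1)
# Time-multiplexed bus: S1 on even cycles, S2 on odd cycles
shared = toggle_profile([x for pair in zip(s1, s2) for x in pair])

print("MSB toggle probability, dedicated S1:", dedicated_s1[-1])
print("MSB toggle probability, shared bus:  ", shared[-1])
print("toggles per transfer, dedicated vs shared:",
      round(sum(dedicated_s1), 2), round(sum(shared), 2))
```

On its own bus each stream only toggles its low-order bits, while the shared bus flips the sign-extension bits on every transfer.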

Reducing Power and Energy of Memories
- Active power in a memory of m columns and n rows is P = V_DD I_DD, where

$$I_{DD} = I_{array} + I_{decode} + I_{periphery} = \left[m\,i_{act} + m(n-1)\,i_{hld}\right] + \left[(n+m)\,C_{DE} V_{int} f\right] + \left[C_{PT} V_{int} f + I_{DCP}\right]$$

  with i_act the effective current of the active cells, i_hld the data-retention current of each inactive cell, C_DE the output node capacitance of each decoder, V_int the internal supply voltage, C_PT the total capacitance of the peripheral CMOS logic, and I_DCP the static current of the periphery
  - As expected, it is proportional to the size of the memory and is typically dominated by the array
- Partition the memory array into multiple smaller banks (see L23.11) so that only the addressed bank is activated
  - improves speed and lowers power
    - word line and bit line capacitances are reduced
    - the number of bit cells activated is reduced
  - At some point the delay and power overhead of the bank decoding circuitry dominates (2 to 8 banks is typical)
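The current equation can be evaluated directly, and a crude banking model shows both the savings and the point where bank overhead takes over. This is a sketch under stated assumptions: `MemParams`, `active_current`, and `banked_current` are hypothetical names, the parameter values are illustrative only, and banking is modeled as splitting the m columns into equal banks with an assumed fixed current overhead per bank.

```python
from dataclasses import dataclass

@dataclass
class MemParams:
    i_act: float   # effective current of each active cell (A)
    i_hld: float   # data-retention current of each inactive cell (A)
    c_de: float    # output node capacitance of each decoder (F)
    v_int: float   # internal supply voltage (V)
    f: float       # operating frequency (Hz)
    c_pt: float    # total capacitance of the peripheral CMOS logic (F)
    i_dcp: float   # static (DC) current of the periphery (A)

def active_current(m, n, p):
    """I_DD for an array of m columns and n rows (slide formula)."""
    i_array = m * p.i_act + m * (n - 1) * p.i_hld
    i_decode = (n + m) * p.c_de * p.v_int * p.f
    i_periphery = p.c_pt * p.v_int * p.f + p.i_dcp
    return i_array + i_decode + i_periphery

def banked_current(m, n, banks, p, i_bank_select=0.0):
    """Hypothetical model: only one of `banks` equal column banks is active;
    i_bank_select is an assumed per-bank bank-decoding overhead (A)."""
    return active_current(m // banks, n, p) + banks * i_bank_select

p = MemParams(i_act=50e-6, i_hld=1e-9, c_de=50e-15, v_int=1.0,
              f=100e6, c_pt=10e-12, i_dcp=1e-3)   # illustrative values only
print("monolithic 1024x1024:", active_current(1024, 1024, p))
for banks in (2, 4, 8, 64):
    print(banks, "banks:", banked_current(1024, 1024, banks, p, i_bank_select=5e-3))
```

With a nonzero per-bank overhead, the total current falls for a few banks and then rises again, which is the trade-off the slide describes.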

Divided Word Line
- Divide the RAM cells in each row into blocks, where the cells in each block are accessed by a local word line (LWL)
- Only the memory cells in the activated block have their bit line pairs driven
  - improves speed (by decreasing word line capacitance)
  - lowers power dissipation (by decreasing the number of BL pairs activated)
[figure: word lines WL_i, WL_i+1 and a block select line (BSL) feed local decoders (LD) that drive local word lines LWL_i, LWL_i+1 for the RAM cells of one row block on bit lines BL_j, BL_j+1, ..., BL_j+m]
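A first-order estimate makes the bit-line saving of a divided word line explicit. The sketch assumes each driven bit-line pair is precharged to V_DD and one line discharges by a small swing, so the supply delivers about C_BL·V_DD·ΔV per pair to restore it; `read_bitline_energy` and all numbers are hypothetical.

```python
def read_bitline_energy(columns, blocks, c_bitline, vdd, v_swing):
    """First-order bit-line energy of one read with a divided word line.

    Assumption: every driven bit-line pair is precharged to vdd and one line
    discharges by v_swing, so restoring it draws c_bitline * vdd * v_swing
    from the supply. Only the pairs in the selected block are driven.
    """
    if columns % blocks:
        raise ValueError("columns must divide evenly into blocks")
    pairs_driven = columns // blocks
    return pairs_driven * c_bitline * vdd * v_swing

# Hypothetical 256-column row: 100 fF per bit line, 1.2 V supply, 200 mV swing
full_row = read_bitline_energy(256, 1, 100e-15, 1.2, 0.2)
four_blocks = read_bitline_energy(256, 4, 100e-15, 1.2, 0.2)
print(f"whole row: {full_row:.3g} J, one of 4 blocks: {four_blocks:.3g} J")
```

Because only the selected block's pairs are driven, this bit-line term scales down by the number of blocks; word line, decoder, and sense amplifier energy are not modeled.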

Bit Line Segmentation
- Divide the RAM cells in each column into blocks, where each block has its own local bit line (LBL); only the memory cells in the activated block present a load on the bit line
  - lowers power dissipation (by decreasing bit line capacitance)
    - e.g., from more than 1 pF for a 16 Kb DRAM to ~200 fF for a 64 Mb DRAM
- Row decoder logic also identifies the segment (SWL)
- Has minimal effect on performance
[figure: local bit lines LBL_i,j ... LBL_i+n,j connect to the bit line BL_j through switches SWL_i,j ... SWL_i+n,j that isolate each segment; word line WL_i]
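The capacitance reduction from segmentation can be estimated with a simple lumped model. This is a hypothetical sketch: it assumes each cell adds a fixed junction capacitance, the wire adds a fixed capacitance per row pitch, and the global bit line still spans all rows and carries one switch per segment. The function name and values are illustrative, not taken from the lecture.

```python
def bitline_capacitance(rows, c_cell, c_wire, segments=1, c_switch=0.0):
    """Hypothetical first-order bit-line load seen during one access.

    c_cell   : junction capacitance added by each cell's access transistor (F)
    c_wire   : wire capacitance per row pitch (F)
    c_switch : capacitance of each segment switch on the global line (F)
    Unsegmented: all `rows` cells load the bit line. Segmented: only the
    selected segment's cells load its local bit line, which connects to a
    global line spanning all rows with one switch per segment.
    """
    if rows % segments:
        raise ValueError("rows must divide evenly into segments")
    rows_per_segment = rows // segments
    local = rows_per_segment * (c_cell + c_wire)
    if segments == 1:
        return local
    global_line = rows * c_wire + segments * c_switch
    return local + global_line

flat = bitline_capacitance(512, 1e-15, 0.1e-15)
split = bitline_capacitance(512, 1e-15, 0.1e-15, segments=8, c_switch=2e-15)
print(f"unsegmented: {flat * 1e15:.1f} fF, 8 segments: {split * 1e15:.1f} fF")
```

The switch term grows with the number of segments, so in this model there is also a point beyond which finer segmentation stops paying off.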

Glitch Reduction by Pipelining
- Glitches depend on the logic depth of the circuit: gates deeper in the logic network are more prone to glitching
  - the arrival times of the gate inputs are more spread out due to delay imbalances
  - usually affected more by primary input switching
- Reduce logic depth by adding pipeline registers
  - additional energy is used by the clock and the pipeline registers
[figure: five-stage pipeline (Fetch, Decode, Execute, Memory, WriteBack) with PC, I$, D$, MAR, MDR, and clocked pipeline-stage isolation registers]
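How glitching grows with logic depth can be seen in a unit-delay simulation of a linear XOR chain whose inputs all switch at once. This is a toy model chosen for illustration (every gate has the same unit delay); `xor_chain_toggles` is a hypothetical helper.

```python
def xor_chain_toggles(depth):
    """Unit-delay simulation of a linear XOR chain y_k = y_{k-1} XOR x_k.

    All primary inputs x_0..x_depth switch from 0 to 1 at t = 0 (y_0 is the
    input x_0 itself). Returns how many times each gate output y_1..y_depth
    toggles before the chain settles.
    """
    x = [1] * (depth + 1)            # input values after the switch at t = 0
    y = [0] * (depth + 1)            # gate outputs, steady state before t = 0
    toggles = [0] * (depth + 1)
    for _ in range(depth):           # depth unit delays are enough to settle
        prev = y[:]                  # each gate sees its inputs from one step ago
        for k in range(1, depth + 1):
            left = x[0] if k == 1 else prev[k - 1]
            y[k] = left ^ x[k]
            toggles[k] += int(y[k] != prev[k])
    return toggles[1:]

for d in (4, 8):
    t = xor_chain_toggles(d)
    print(d, t, "total transitions:", sum(t))
```

In this model output y_k toggles k − 1 times although it needs at most one transition, so total switching grows roughly with the square of the depth. Cutting the chain with pipeline registers shortens the paths over which these delay imbalances accumulate.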

Clock Gating
- Gate off the clock to idle functional units
  - e.g., floating point units
  - need logic to generate the disable signal
    - increases the complexity of the control logic
    - consumes power
    - timing is critical to avoid clock glitches at the OR gate output
  - additional gate delay on the clock signal
    - the gating OR gate can replace a buffer in the clock distribution tree
- The most popular method for reducing the power of clock signals and functional units
[figure: the clock, OR-ed with a disable signal, drives the registers (Reg) around a functional unit]
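The OR-gate scheme on the slide, and why the disable timing matters, can be sketched with sampled waveforms. This is a minimal illustration assuming four samples per clock cycle; the function names are hypothetical, and `disable_timing_ok` encodes the rule that, with an OR gate, `disable` must not change while the clock is low.

```python
def gated_clock(clk, disable):
    """OR-based clock gating: the gated clock is held high while disable = 1."""
    return [c | d for c, d in zip(clk, disable)]

def rising_edges(wave):
    """Number of 0 -> 1 transitions; each one clocks the registers."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)

def disable_timing_ok(clk, disable):
    """While the clock is low, the OR output equals `disable`, so a change of
    `disable` then appears directly on the gated clock as a spurious edge.
    This checker flags any change of `disable` between two low clock samples."""
    for i in range(1, len(clk)):
        if disable[i] != disable[i - 1] and clk[i - 1] == 0 and clk[i] == 0:
            return False
    return True

clk = [0, 0, 1, 1] * 4                    # 4 clock cycles, 4 samples per cycle
safe = [0] * 3 + [1] * 8 + [0] * 5        # changes only while clk is high
unsafe = [0] * 5 + [1] * 11               # changes in the middle of a low phase

print("edges without gating:", rising_edges(clk))
print("edges with gating:   ", rising_edges(gated_clock(clk, safe)))
print("safe timing:", disable_timing_ok(clk, safe),
      " unsafe timing:", disable_timing_ok(clk, unsafe))
```

Each suppressed rising edge is a cycle in which the idle unit's registers and local clock wiring do not switch.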

Clock Gating in a Pipelined Datapath
- For idle units (e.g., floating point units in the Execute stage, the WriteBack stage for instructions with no write back operation)
[figure: five-stage pipeline (Fetch, Decode, Execute, Memory, WriteBack) with PC, I$, D$, MAR, MDR; the clock is gated for "No FP" in Execute and "No WB" in WriteBack]

Review: Dynamic Power as a Function of V_DD
- Decreasing V_DD decreases dynamic energy consumption (quadratically)
- But it increases gate delay (decreases performance)
[figure: normalized propagation delay t_p versus V_DD (V)]
- So if multiple levels of V_DD are provided for use at run time, the clock frequency must also be adjusted
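The energy/delay trade-off can be tabulated with a simple delay model. The sketch uses the alpha-power law t_p ∝ V_DD/(V_DD − V_T)^α, which is a swapped-in model not given on the slide; V_T = 0.4 V and α = 1.3 are assumed values, and the function names are hypothetical.

```python
def normalized_energy(vdd, vdd_nom):
    """Dynamic energy per operation scales as V_DD^2."""
    return (vdd / vdd_nom) ** 2

def normalized_delay(vdd, vdd_nom, vt=0.4, alpha=1.3):
    """Gate delay relative to vdd_nom under the alpha-power law
    t_p ~ V_DD / (V_DD - V_T)^alpha (vt and alpha are assumed values)."""
    if min(vdd, vdd_nom) <= vt:
        raise ValueError("supply must exceed the threshold voltage")

    def t_p(v):
        return v / (v - vt) ** alpha

    return t_p(vdd) / t_p(vdd_nom)

for v in (1.2, 1.0, 0.8, 0.6):
    d = normalized_delay(v, 1.2)
    print(f"VDD={v:.1f} V  energy x{normalized_energy(v, 1.2):.2f}  "
          f"delay x{d:.2f}  max clock x{1 / d:.2f}")
```

The last column is the factor by which the clock must be slowed at each supply level, which is why the slide couples multiple V_DD levels with frequency adjustment.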

Dynamic Frequency and Voltage Scaling
- Always run at the lowest supply voltage that meets the timing constraints
  - DFS (dynamic frequency scaling) alone saves only power, since the energy per operation is unchanged (e.g., Intel’s SpeedStep)
  - DVS (dynamic voltage scaling) + DFS saves both energy and power (e.g., Transmeta’s LongRun)
- A DVS+DFS system requires the following
  - A programmable clock generator (PLL)
    - PLL from 200 MHz to 700 MHz in increments of 33 MHz
  - A supply regulation loop that sets the minimum V_DD necessary for operation at the desired frequency
    - 32 levels of V_DD from 1.1 V to 1.6 V
  - An operating system that sets the required frequency and supply voltage to meet the task completion deadlines
    - heavier load: ramp up V_DD; when it is stable, speed up the clock
    - lighter load: slow down the clock; when the PLL locks onto the new rate, ramp down V_DD
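The OS policy above can be sketched as picking the slowest PLL setting that meets a deadline and then ordering the voltage and clock changes safely. Assumptions, all hypothetical: the frequency grid uses 500/15 (about 33.3) MHz steps so that it ends at 700 MHz, the minimum V_DD is a linear function of frequency rounded up to one of the 32 regulator levels (real parts use a characterized table), and the action names are invented for illustration.

```python
# Hypothetical operating grid modeled on the slide's numbers
PLL_MHZ = [200 + i * 500 / 15 for i in range(16)]        # 200 ... 700 MHz
VDD_LEVELS = [1.1 + i * 0.5 / 31 for i in range(32)]     # 32 levels, 1.1 ... 1.6 V

def min_vdd(f_mhz):
    """Assumed linear V-f requirement, rounded up to the next regulator level."""
    target = 1.1 + (f_mhz - 200) / 500 * 0.5
    for v in VDD_LEVELS:
        if v >= target - 1e-9:
            return v
    raise ValueError("frequency above the supported range")

def choose_point(cycles, deadline_s):
    """Lowest PLL frequency (and its minimum V_DD) that meets the deadline."""
    needed_mhz = cycles / deadline_s / 1e6
    for f in PLL_MHZ:
        if f >= needed_mhz - 1e-9:
            return f, min_vdd(f)
    raise ValueError("deadline cannot be met")

def transition(current, new):
    """Raise V_DD before speeding up the clock; slow the clock before lowering V_DD."""
    (f0, _), (f1, v1) = current, new
    if f1 > f0:
        return [("ramp_vdd", v1), ("wait_vdd_stable",), ("set_clock", f1)]
    if f1 < f0:
        return [("set_clock", f1), ("wait_pll_lock",), ("ramp_vdd", v1)]
    return []

light = choose_point(3e8, 1.0)      # 3e8 cycles due in 1 s
heavy = choose_point(6e8, 1.0)      # 6e8 cycles due in 1 s
print(light, heavy)
print(transition(light, heavy))
print(transition(heavy, light))
```

The ordering in `transition` ensures the circuit never runs at a clock rate its current supply voltage cannot support.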

Dynamic Thermal Management (DTM)
- Trigger mechanism: on-chip temperature sensors
  - Based on the differential voltage change across two diodes of different sizes
  - Usually requires more than one sensor
  - Hysteresis and delay are problems
- When to begin responding?
  - Trigger level set too high means higher packaging costs
  - Trigger level set too low means frequent triggering and loss in performance
- Choose the trigger level to exploit the difference between average and worst-case power
- An example of DVS + DFS in action

DTM Initiation and Response Mechanisms
- Operating system or microarchitectural initiation mechanism?
  - Hardware support can reduce the performance penalty by 20-30%
- Response mechanism: DVS + DFS
  - Incurs some delay, since an OS context switch is needed to set the new DVS + DFS level
  - Increasing the trigger level reduces the frequency of context switches to set DVS + DFS
- The use of a thermal window (100K+ cycles) can help to “smooth” short thermal spikes

DTM Activation and Deactivation Cycle
- Initiation delay: OS interrupt/handler
- Response delay: invocation time (adjust clock, V_DD)
- Policy delay: number of cycles engaged
- Shutoff delay: disabling time (re-adjust clock, V_DD)
[figure: timeline of trigger reached, initiation delay, turn response on, response delay, check temperature, policy delay, check temperature, turn response off, shutoff delay; and temperature versus time against the DTM trigger level, showing the cooling capacity without and with DTM and the resulting savings]
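A trigger policy with a thermal window and hysteresis can be sketched in a few lines. The function name, the trigger/release levels, and the temperature samples are hypothetical; the separate release level is one assumed way to add hysteresis so the response does not switch on and off every sample.

```python
from collections import deque

def dtm_response(temps, trigger, release, window):
    """Sketch of a DTM policy with a thermal window and hysteresis.

    The response (e.g., a lower DVS + DFS setting) engages when the average of
    the last `window` temperature samples reaches `trigger`, and disengages
    only when that average falls below `release` (< trigger).
    """
    if release >= trigger:
        raise ValueError("release level must be below the trigger level")
    recent = deque(maxlen=window)
    engaged = False
    states = []
    for t in temps:
        recent.append(t)
        avg = sum(recent) / len(recent)
        if not engaged and avg >= trigger:
            engaged = True
        elif engaged and avg < release:
            engaged = False
        states.append(engaged)
    return states

spike = [70, 70, 70, 95, 70, 70, 70]              # hypothetical samples, deg C
sustained = [70] * 3 + [90] * 6 + [70] * 6
print(dtm_response(spike, trigger=85, release=80, window=1))
print(dtm_response(spike, trigger=85, release=80, window=4))
print(dtm_response(sustained, trigger=85, release=80, window=4))
```

With a one-sample window the short spike triggers a response; averaging over four samples smooths it out, while a sustained rise still engages the response.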

Speculated Power of a 15 mm μP
[figure not recoverable from the transcript]

Review: Variable V_T at Run Time
- Reducing V_T increases the sub-threshold leakage current (exponentially)

$$V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right)$$

  where V_T0 is the threshold voltage at V_SB = 0, V_SB is the source-bulk (substrate) voltage, γ is the body-effect coefficient, and φ_F is the Fermi potential
[figure: V_T (V) versus V_SB (V)]
- But reducing V_T decreases gate delay (increases performance)
  - For an n-channel device, the substrate is normally tied to ground (V_SB = 0)
  - A negative bias on the substrate (V_BS < 0, i.e., V_SB > 0) causes V_T to increase
  - Adjusting the substrate bias at run time is called adaptive body biasing (ABB) or dynamic threshold scaling (DTS)
    - Requires a triple-well fabrication process
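The body-effect equation can be evaluated directly to see how much V_T moves with substrate bias. The parameter values below (V_T0 = 0.43 V, γ = 0.4 V^0.5, φ_F = −0.3 V) are assumed, illustrative numbers of the kind used for a quarter-micron NMOS device; substitute real process parameters.

```python
import math

def threshold_voltage(v_sb, vt0=0.43, gamma=0.4, phi_f=-0.3):
    """Body-effect equation from the slide for an NMOS device.

    vt0 (V), gamma (V^0.5) and phi_f (V) are assumed illustrative values.
    """
    return vt0 + gamma * (math.sqrt(abs(-2 * phi_f + v_sb)) - math.sqrt(abs(-2 * phi_f)))

for v_sb in (0.0, 0.5, 1.0, 2.0):
    print(f"V_SB = {v_sb:.1f} V -> V_T = {threshold_voltage(v_sb):.3f} V")
```

The square-root dependence means each additional volt of reverse bias buys less V_T increase.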

DTS
- DTS can accomplish a variety of goals
  - Lower the leakage in standby mode by increasing V_T to its maximum value
  - Compensate for threshold variations across the chip during normal operation
  - Throttle the throughput (by increasing V_T) to lower both the active and leakage power based on performance requirements
- Substrate biasing can be implemented on a complete chip, on a block-by-block basis, or on a cell-by-cell basis
  - Per-cell granularity of substrate biasing has an area cost
- Unfortunately, the effectiveness of DTS is decreasing with technology scaling due to inherently lower body-effect factors
[figure: substrate bias voltages V_SB,p and V_SB,n]
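Combining the body-effect equation with the exponential dependence of sub-threshold current on V_T gives a rough estimate of standby leakage savings. Assumptions: sub-threshold current scales as 10^(−V_T/S) with a swing S of 100 mV/decade, the 1 V reverse bias is hypothetical, and the device parameters are the same illustrative values as before.

```python
import math

def threshold_voltage(v_sb, vt0=0.43, gamma=0.4, phi_f=-0.3):
    """NMOS body-effect equation (assumed illustrative parameter values)."""
    return vt0 + gamma * (math.sqrt(abs(-2 * phi_f + v_sb)) - math.sqrt(abs(-2 * phi_f)))

def leakage_ratio(delta_vt, swing=0.1):
    """I_leak(after) / I_leak(before) when V_T rises by delta_vt volts,
    assuming I_sub ~ 10^(-V_T / S) with S = `swing` volts per decade."""
    return 10 ** (-delta_vt / swing)

for gamma in (0.4, 0.2):          # a smaller body-effect factor, as with scaling
    delta = threshold_voltage(1.0, gamma=gamma) - threshold_voltage(0.0, gamma=gamma)
    print(f"gamma={gamma}: V_T rises {delta * 1000:.0f} mV, "
          f"leakage falls to {leakage_ratio(delta):.1%}")
```

Halving γ halves the V_T shift for the same bias, and the leakage benefit shrinks accordingly, which illustrates the last bullet above.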

Reducing Power in Standby (Sleep) Mode
- For idle components, all power dissipation is due to leakage
- Can reduce leakage by DTS
- Or can reduce leakage by gating the supply rails when the circuit is in sleep mode
  - in normal mode, sleep = 1 and the sleep transistors must present as small a resistance as possible (via sizing)
  - in sleep mode, sleep = 0, and the transistor stack effect reduces leakage by orders of magnitude
- Or can eliminate leakage by switching off the power supply (but the memory state is lost)
[figure: logic between virtual V_DD and virtual GND, connected to V_DD through a transistor driven by !sleep and to GND through a transistor driven by sleep]

Reducing Standby Power in Memories
- Leakage in memory arrays is becoming a major issue
  - the leakage increase from 0.18 µm to 0.13 µm is a factor of almost 7
- Techniques to control memory array leakage
  - turn off unused banks by switching off the power supply
  - apply DTS to non-active cells (maintains state)
    - the memory cannot be accessed at speed when running on the higher V_T
  - exploit transistor stacking (maintains state)
  - lower the supply voltage (maintains state)
    - the memory cannot be accessed when running on the lower supply
[figure: leakage current I_leakage (A) versus V_DD, 0.13 µm]

Leakage Controlled SRAM Cell Alternatives
[figure: asymmetric SRAM cell; gated-GND SRAM cell (gate control transistor to a virtual GND); drowsy SRAM cell (supply switched by drowsy/!drowsy between V_DD = 1 V and V_DD,low = 0.3 V); cell leakage and bit line leakage paths marked]
- Cell state preserved
- Hardware versus software control of the “mode”

Leakage Controlled SRAM Savings and “Costs”
[table not recoverable from the transcript; conditions: 70 nm technology, 1 ns cycle]

Leakage Controlled Cache Microarchitecture
[figure: each cache line's drowsy bit (set/reset latch with outputs Q and !Q) selects the line's power line: 0.3 V (drowsy) or 1 V (active). Global Set puts lines into drowsy mode (Set: drowsy); Reset returns a line to active (Reset: active). A word line gate between the row decoder/word line drivers and the SRAM word line prevents accessing drowsy lines.]

Hardware Controlled Drowsy Cache
- Cache energy reduction
  - standby energy by 71% to 76%
  - total energy by 54% to 58%
- Run time increase
  - 0.41%
- Put cache lines into a low-power mode periodically, independent of the access history
  - A periodic global set counter (~4000 cycles gives a good energy-delay trade-off) asserts the drowsy signal
    - no need for per-line counters or predictor state
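The global-counter policy can be sketched as a small cycle-level loop. Assumptions, all hypothetical: lines start active, waking a drowsy line costs a fixed one-cycle penalty that is counted rather than simulated as a stall, and the access trace is made up. The point is that no per-line history is kept; every `window` cycles all lines are simply set drowsy.

```python
from collections import defaultdict

def simulate_drowsy(trace, num_lines, total_cycles, window=4000, wake_penalty=1):
    """Sketch of a hardware-controlled drowsy cache with a global counter.

    trace: iterable of (cycle, line) accesses. Every `window` cycles all lines
    are put into drowsy mode regardless of access history; an access to a
    drowsy line wakes it and is charged `wake_penalty` extra cycles.
    Returns (extra_cycles, fraction of line-cycles spent drowsy).
    """
    accesses = defaultdict(list)
    for cycle, line in trace:
        accesses[cycle].append(line)
    drowsy = [False] * num_lines                 # lines start active
    extra_cycles = 0
    drowsy_line_cycles = 0
    for cycle in range(total_cycles):
        if cycle > 0 and cycle % window == 0:
            drowsy = [True] * num_lines          # global set: all lines drowsy
        for line in accesses.get(cycle, ()):
            if drowsy[line]:
                extra_cycles += wake_penalty     # wake the line before the access
                drowsy[line] = False
        drowsy_line_cycles += sum(drowsy)
    return extra_cycles, drowsy_line_cycles / (num_lines * total_cycles)

# Hypothetical trace: only line 0 of a 4-line cache is touched, every 10 cycles
trace = [(c, 0) for c in range(0, 8000, 10)]
print(simulate_drowsy(trace, num_lines=4, total_cycles=8000))
```

Unused lines stay drowsy after each global set, while a frequently used line pays only one wake-up per window, which is why the reported run-time increase is small.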

Next Lecture and Reminders
- Next lecture
  - System level interconnect
    - Reading assignment: Rabaey, et al., Chapter 9
- Reminders
  - Project final reports due online by 5:00 pm on Friday, December 5th
  - Final grading negotiations/corrections (except for the final exam) must be concluded by December 10th
  - Final exam scheduled
    - Tuesday, December 16th from 10:10 to noon in 118 and 113 Thomas