Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477 CSE477 VLSI Digital Circuits Fall 2002 Lecture 26: Low Power Techniques in Microarchitectures.

Slides:



Advertisements
Similar presentations
Keeping Hot Chips Cool Ruchir Puri, Leon Stok, Subhrajit Bhattacharya IBM T.J. Watson Research Center Yorktown Heights, NY Circuits R-US.
Advertisements

Power Reduction Techniques For Microprocessor Systems
CSE477 L19 Timing Issues; Datapaths.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 19: Timing Issues; Introduction to Datapath.
Introduction to CMOS VLSI Design Lecture 18: Design for Low Power David Harris Harvey Mudd College Spring 2004.
S. Reda EN160 SP’08 Design and Implementation of VLSI Systems (EN1600) Lecture 14: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
CSE477 L26 System Power.1Irwin&Vijay, PSU, 2002 Low Power Design in Microarchitectures and Memories [Adapted from Mary Jane Irwin (
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 13: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
Lecture 5 – Power Prof. Luke Theogarajan
Lecture 7: Power.
The CMOS Inverter Slides adapted from:
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
6.893: Advanced VLSI Computer Architecture, September 28, 2000, Lecture 4, Slide 1. © Krste Asanovic Krste Asanovic
EE466: VLSI Design Power Dissipation. Outline Motivation to estimate power dissipation Sources of power dissipation Dynamic power dissipation Static power.
CSE477 L26 System Power.1Irwin&Vijay, PSU, 2002 TKT-1527 Digital System Design Issues Low Power Techniques in Microarchitectures and Memories Mary Jane.
ENGG 6090 Topic Review1 How to reduce the power dissipation? Switching Activity Switched Capacitance Voltage Scaling.
1 VLSI Design SMD154 LOW-POWER DESIGN Magnus Eriksson & Simon Olsson.
Power Reduction for FPGA using Multiple Vdd/Vth
1 Overview 1.Motivation (Kevin) 1.5 hrs 2.Thermal issues (Kevin) 3.Power modeling (David) Thermal management (David) hrs 5.Optimal DTM (Lev).5 hrs.
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
Washington State University
CSE477 L24 RAM Cores.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 24: RAM Cores Mary Jane Irwin ( )
CSE477 L23 Memories.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 23: Semiconductor Memories Mary Jane Irwin (
Lev Finkelstein ISCA/Thermal Workshop 6/ Overview 1.Motivation (Kevin) 2.Thermal issues (Kevin) 3.Power modeling (David) 4.Thermal management (David)
Sp09 CMPEN 411 L14 S.1 CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 14: Designing for Low Power [Adapted from Rabaey’s Digital Integrated Circuits,
Patricia Gonzalez Divya Akella VLSI Class Project.
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 6.1 EE4800 CMOS Digital IC Design & Analysis Lecture 6 Power Zhuo Feng.
CSE477 L21 Multiplier Design.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 21: Multiplier Design Mary Jane Irwin (
CS203 – Advanced Computer Architecture
LOW POWER DESIGN METHODS
CSE477 L26 System Power.1Irwin&Vijay, PSU, 2003 CSE477 VLSI Digital Circuits Fall 2003 Lecture 26: Low Power Techniques in Microarchitectures and Memories.
COE 360 Principles of VLSI Design Delay. 2 Definitions.
Overview Motivation (Kevin) Thermal issues (Kevin)
Power-Optimal Pipelining in Deep Submicron Technology
Smruti R. Sarangi IIT Delhi
CS203 – Advanced Computer Architecture
Temperature and Power Management
CSE477 VLSI Digital Circuits Fall 2003 Lecture 21: Multiplier Design
The Inverter EE4271 VLSI Design Professor Shiyan Hu Office: EERC 518
VLSI design Short channel Effects in Deep Submicron CMOS
LOW POWER DESIGN METHODS V.ANANDI ASST.PROF,E&C MSRIT,BANGALORE.
SECTIONS 1-7 By Astha Chawla
Mary Jane Irwin ( ) CSE477 VLSI Digital Circuits Fall 2002 Lecture 25: Peripheral Memory Circuits Mary Jane.
Hot Chips, Slow Wires, Leaky Transistors
Basics of Energy & Power Dissipation
Mary Jane Irwin ( ) CSE477 VLSI Digital Circuits Fall 2002 Lecture 19: Timing Issues; Introduction to Datapath.
Architecture & Organization 1
October 10, 2017 Kjell Jeppson, Lena Peterson Chapter 5 Power in W & H
Downsizing Semiconductor Device (MOSFET)
EE141 Chapter 3 VLSI Design The Devices March 28, 2003.
COUPING WITH THE INTERCONNECT
Reading: Hambley Ch. 7; Rabaey et al. Sec. 5.2
Microarchitectural Techniques for Power Gating of Execution Units
Mary Jane Irwin ( ) CSE477 VLSI Digital Circuits Fall 2002 Lecture 27: System Level Interconnect Mary Jane.
Mary Jane Irwin ( ) CSE477 VLSI Digital Circuits Fall 2002 Lecture 22: Shifters, Decoders, Muxes Mary Jane.
Challenges in Nanoelectronics: Process Variability
Architecture & Organization 1
Mary Jane Irwin ( ) CSE477 VLSI Digital Circuits Fall 2003 Lecture 22: Shifters, Decoders, Muxes Mary Jane.
Notes on Diodes 1. Diode saturation current:  
332:479 Concepts in VLSI Design Lecture 24 Power Estimation
Downsizing Semiconductor Device (MOSFET)
ELEC 6970: Low Power Design Class Project By: Sachin Dhingra
Chapter 10 Timing Issues Rev /11/2003 Rev /28/2003
332:578 Deep Submicron VLSI Design Lecture 14 Design for Clock Skew
The Inverter EE4271 VLSI Design Dr. Shiyan Hu Office: EERC 731
Lecture 7: Power.
Reading (Rabaey et al.): Sections 3.5, 5.6
Leakage Power Reduction Techniques
Lecture 7: Power.
Reading: Hambley Ch. 7; Rabaey et al. Secs. 5.2, 5.5, 6.2.1
Technology scaling Currently, technology scaling has a threefold objective: Reduce the gate delay by 30% (43% increase in frequency) Double the transistor.
Presentation transcript:

Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477 CSE477 VLSI Digital Circuits Fall 2002 Lecture 26: Low Power Techniques in Microarchitectures and Memories Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477 [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]

Review: Energy & Power Equations E = CL VDD2 P01 + tsc VDD Ipeak P01 + VDD Ileakage P = CL VDD2 f01 + tscVDD Ipeak f01 + VDD Ileakage f01 = P01 * fclock Dynamic power (~90% today and decreasing relatively) Short-circuit power (~8% today and decreasing absolutely) Leakage power (~2% today and increasing) f0->1 represents the energy consuming transition

Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency Energy Design Time Non-active Modules Run Time Active Logic Design Reduced Vdd Sizing Multi-Vdd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage + Multi-VT Sleep Transistors Variable VT + Variable VT Physical capacitance: circuit style selection, transistor sizing, placement and routing, architectural optimizations. input and output rise/fall times – determines short-circuit power device thresholds and temperature – impacts leakage power switching activity

Bus Multiplexing Buses are a significant source of power dissipation due to high switching activities and large capacitive loading 15% of total power in Alpha 21064 30% of total power in Intel 80386 Share long data buses with time multiplexing (S1 uses even cycles, S2 odd) S2 S1 D1 D2 S1 S2 D2 D1 But what if data samples are correlated (e.g., sign bits)?

Correlated Data Streams For a shared (multiplexed) bus advantages of data correlation are lost (bus carries samples from two uncorrelated data streams) Bus sharing should not be used for positively correlated data streams Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching Bit switching probabilities MSB LSB Bit position

Glitch Reduction by Pipelining Glitches depend on the logic depth of the circuit - gates deeper in the logic network are more prone to glitching arrival times of the gate inputs are more spread due to delay imbalances usually affected more by primary input switching Reduce logic depth by adding pipeline registers additional energy used by the clock and pipeline registers Fetch Decode Execute Memory WriteBack Five stage pipeline (originally for performance, but also helps with energy) PC I$ Instruction MAR D$ MDR pipeline stage isolation register clk

Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency Energy Design Time Non-active Modules Run Time Active Logic Design Reduced Vdd Sizing Multi-Vdd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage + Multi-VT Sleep Transistors Variable VT + Variable VT Physical capacitance: circuit style selection, transistor sizing, placement and routing, architectural optimizations. input and output rise/fall times – determines short-circuit power device thresholds and temperature – impacts leakage power switching activity

Clock Gating Most popular method for power reduction of clock signals and functional units Gate off clock to idle functional units e.g., floating point units need logic to generate disable signal increases complexity of control logic consumes power timing critical to avoid clock glitches at OR gate output additional gate delay on clock signal gating OR gate can replace a buffer in the clock distribution tree R e g clock disable Functional unit Motorola’s Mcore numbers for number of pipeline latches disabled by clock gating on pipeline stall

Clock Gating in a Pipelined Datapath For idle units (e.g., floating point units in Exec stage, WB stage for instructions with no write back operation) Fetch Decode Execute Memory WriteBack PC I$ Instruction MAR D$ MDR clk No FP No WB

Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency Energy Design Time Non-active Modules Run Time Active Logic Design Reduced Vdd Sizing Multi-Vdd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage + Multi-VT Sleep Transistors Variable VT + Variable VT Physical capacitance: circuit style selection, transistor sizing, placement and routing, architectural optimizations. input and output rise/fall times – determines short-circuit power device thresholds and temperature – impacts leakage power switching activity

Review: Dynamic Power as a Function of VDD Decreasing the VDD decreases dynamic energy consumption (quadratically) But, increases gate delay (decreases performance) tp(normalized) VDD (V) Good design practice keeps drain diffusion areas as small as possible Self-loading is when the intrinsic capacitance (diffusion capacitance) starts to dominate the extrinsic load formed by wiring and fanout. Propagation delay of a CMOS inverter as a function of supply voltage (normalized wrt delay at 2.5V supply). While the delay is relatively insensitive to supply variations for higher values of VDD, a sharp increase can be observed starting around 2VT. This operation regions should be avoided for high performance! Increasing VDD also has reliability concerns - oxide breakdown, hot-electron effects - that enforce firm upper bounds on the supply voltage in deep submicron processes. Lowering VDD slows down the gate! Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other logic to reduce dynamic energy consumption.

Dynamic Frequency and Voltage Scaling Intel’s SpeedStep Hardware that steps down the clock frequency (dynamic frequency scaling – DFS) when the user unplugs from AC power PLL from 650MHz  500MHz CPU stalls during SpeedStep adjustment Transmeta LongRun Hardware that applies both DFS and DVS (dynamic supply voltage scaling) 32 levels of VDD from 1.1V to 1.6V PLL from 200MHz  700MHz in increments of 33MHz Triggered when CPU load change is detected by software heavier load  ramp up VDD, when stable speed up clock lighter load  slow down clock, when PLL locks onto new rate, ramp down VDD CPU stalls only during PLL relock (< 20 microsec) Always keeps clock frequency within limits required by VDD to avoid clock skew problems

Dynamic Thermal Management (DTM) Trigger Mechanism: When do we enable DTM techniques? Initiation Mechanism: How do we enable technique? Response Mechanism: What technique do we enable?

DTM Trigger Mechanisms Mechanism: How to deduce temperature? Direct approach: on-chip temperature sensors Based on differential voltage change across 2 diodes of different sizes May require >1 sensor Hysteresis and delay are problems Policy: When to begin responding? Trigger level set too high means higher packaging costs Trigger level set too low means frequent triggering and loss in performance Choose trigger level to exploit difference between average and worst case power Average case power dissipation can often be much lower than worst case due to Aggressive clock gating Applications variations Underutilized resources Not enough ILP Currently about a 30% difference Likely to further diverge…

DTM Initiation and Response Mechanisms Operating system or microarchitectural control? Hardware support can reduce performance penalty by 20-30% Initiation of policy incurs some delay When using DVS and/or DFS, much of the performance penalty can be attributed to enabling/disabling overhead Increasing policy delay reduces overhead; smarter initiation techniques would help as well Thermal window (100Kcycles+) Larger thermal windows “smooth” short thermal spikes

DTM Activation and Deactivation Cycle Turn Response On Initiation Delay Initiation Delay – OS interrupt/handler Shutoff Delay Turn Response Off Shutoff Delay – Disabling time (e.g., re-adjust clock) Policy Delay – Number of cycles engaged Policy Delay Check Temp Trigger Reached Check Temp Response Delay – Invocation time (e.g., adjust clock) Response Delay

DTM Savings Benefits Designed for cooling capacity without DTM Temperature Designed for cooling capacity without DTM Designed for cooling capacity with DTM System Cost Savings DTM trigger level DTM Disabled DTM/Response Engaged Time

Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency Energy Design Time Non-active Modules Run Time Active Logic Design Reduced Vdd Sizing Multi-Vdd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage + Multi-VT Sleep Transistors Variable VT + Variable VT Physical capacitance: circuit style selection, transistor sizing, placement and routing, architectural optimizations. input and output rise/fall times – determines short-circuit power device thresholds and temperature – impacts leakage power switching activity

Speculated Power of a 15mm mP

Review: Leakage as a Function of Design Time VT Reducing the VT increases the sub- threshold leakage current (exponentially) But, reducing VT decreases gate delay (increases performance) Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control.

Review: Variable VT (ABB) at Run Time VT = VT0 + (|-2F + VSB| - |-2F|) where VT0 is the threshold voltage at VSB = 0 VSB is the source-bulk (substrate) voltage  is the body-effect coefficient For an n-channel device, the substrate is normally tied to ground A negative bias causes VT to increase from 0.45V to 0.85V Adjusting the substrate bias at run time is called adaptive body-biasing (ABB) VT (V) Value of VGS where strong inversion occurs is VT VSB is the source to bulk (body) or substrate bias (note relationship to body effect) For tox = 5 nm then Cox = 7fF/micron^2 (typical tox less than 10nm (==100 angstroms) in today’s technology) Typical values for NA = 10**15 atoms/cm**3 and ND = 10**16 atoms/cm**3 Observe that the threshold voltage has a positive value for a normal NMOS device and a negative for a normal PMOS device VSB (V)

Next Lecture and Reminders System level interconnect Reading assignment – Rabaey, et al, xx Reminders Project final reports due December 5th Final grading negotiations/correction (except for the final exam) must be concluded by December 10th Final exam scheduled Monday, December 16th from 10:10 to noon in 118 and 121 Thomas