Temperature-Aware Design Presented by Mehul Shah 4/29/04
The Problem
- Power and thermal densities are increasing
  - 50 W/cm², 100 W/cm² at 50 nm technology
  - Power density doubles every 3 years
  - Operating Vdd is scaling much more slowly (ITRS)
- Cost of cooling is rising exponentially
  - $1 to $3 per watt of power dissipation
  - Packages are designed for worst-case power
- Hot spots: heat dissipation is non-uniform across the chip
- Low-power design techniques are not sufficient
- "Big hammer": global clock gating limits performance
Impact of Temperature on Design
- Increased delay, lower reliability
- Slower transistors
  - Carrier mobility is lower at higher temperature
  - An inverter is 35% slower at 110 °C than at 60 °C
- Higher leakage power
  - Grows by orders of magnitude at higher temperature
  - Leakage is becoming more significant than switching power
- Higher metal resistivity
  - Copper is 39% more resistive at 120 °C than at 20 °C
- Lower mean time to failure (MTF)
  - $\mathrm{MTF} = \mathrm{MTF}_0 \exp\left(E_a / (k_B T)\right)$
  - MTF decreases exponentially with temperature
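
Worked example (a sketch, not from the slides): the MTF expression above can be evaluated as a ratio between two operating temperatures, so the unknown prefactor MTF_0 cancels. The activation energy of 0.7 eV is an assumed, illustrative value; the slides do not give one. The function name `mtf_ratio` is hypothetical.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def mtf_ratio(t_hot_c, t_cool_c, e_a_ev=0.7):
    """Return MTF(t_hot) / MTF(t_cool) for MTF = MTF_0 * exp(E_a / (k_B * T)).

    e_a_ev is an ASSUMED activation energy (eV); the slides do not state one.
    Temperatures are given in degrees Celsius and converted to kelvin.
    """
    t_hot = t_hot_c + 273.15
    t_cool = t_cool_c + 273.15
    # MTF_0 cancels in the ratio, leaving only the exponent difference.
    return math.exp(e_a_ev / K_B_EV * (1.0 / t_hot - 1.0 / t_cool))


ratio = mtf_ratio(110.0, 60.0)
print(f"MTF at 110 C is {ratio:.3f}x the MTF at 60 C")
```

Under this assumed activation energy the ratio is well below 1, which illustrates the exponential sensitivity; a different E_a changes the number but not the trend.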
Moral of the Story
- Problem: temperature adversely affects power, performance, and reliability
- Solution: “temperature-aware” design
Temperature-Aware Design
- Thermal modeling: estimate operating temperature
  - Simple: lets architects reason easily about thermal effects
  - Detailed: models runtime temperature at functional-unit granularity
  - Computationally efficient
  - Flexible: easily extended to novel architectures
- Dynamic thermal management: use runtime behavior and thermal status to adjust and distribute workload among functional units
Talk Outline
- Thermal Modeling
  - Model Description
  - Validation & Case Studies
- Dynamic Thermal Management
  - Results
- Conclusions
References
- Kevin Skadron et al., “Temperature-Aware Microarchitecture”
- Wei Huang et al., “Compact Thermal Modeling for Temperature-Aware Design”
Thermal Modeling
- The thermal model interacts with the power, performance, and reliability models
- Design convergence requires several iterations
Heat Flow vs. Electrical Phenomenon
- Both can be described by the same differential equations
  - Heat flow corresponds to electrical current
  - Temperature corresponds to voltage
  - Heat absorption capacity corresponds to capacitance
- Describe the design as a thermal RC circuit
  - Node = functional block
  - Solve the RC equations to obtain the node temperatures
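
Sketch (not from the slides): the analogy can be illustrated with a single thermal node that has one resistance R to ambient and one capacitance C, governed by C dT/dt = P - (T - T_amb) / R. The code integrates this with forward Euler. All numeric values and the name `simulate_node` are hypothetical; a real model has one node per functional block plus package nodes.

```python
def simulate_node(power_w, r_k_per_w, c_j_per_k, t_amb_c, dt_s, steps):
    """Forward-Euler integration of C * dT/dt = P - (T - T_amb) / R.

    Returns the list of node temperatures (deg C) after each time step.
    The node starts at ambient temperature.
    """
    temp = t_amb_c
    trace = []
    for _ in range(steps):
        # Net heat flow into the node divided by its thermal capacitance.
        dtemp_dt = (power_w - (temp - t_amb_c) / r_k_per_w) / c_j_per_k
        temp += dtemp_dt * dt_s
        trace.append(temp)
    return trace


# Illustrative (assumed) values: 10 W block, R = 2 K/W, C = 0.5 J/K, 45 C ambient.
trace = simulate_node(power_w=10.0, r_k_per_w=2.0, c_j_per_k=0.5,
                      t_amb_c=45.0, dt_s=0.01, steps=2000)
steady_state = 45.0 + 10.0 * 2.0  # T_amb + P * R, the "Ohm's law" limit
print(f"final = {trace[-1]:.3f} C, analytic steady state = {steady_state:.3f} C")
```

The time constant here is R * C = 1 s, so 20 s of simulated time is enough for the node to settle at T_amb + P * R, the thermal counterpart of V = I * R.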
HotSpot Package
Equivalent Model
Equivalent Model (Continued)
- Die area is divided into microarchitectural blocks
- Spreader and sink are each divided into five blocks
  - Rsp, Rhs: the areas under the die
  - Trapezoids: the areas not covered by the die
- R_convective represents the thermal resistance from package to air
- RC model
  - Vertical R's: heat flow between layers
  - Lateral R's: heat diffusion within a layer
  - R1 = Block 1 to spreader, R2 = Block 1 to the rest of the chip
- $R = t / (k \cdot A)$
  - t: thickness
  - k: thermal conductivity of the material
  - A: cross-sectional area
- $C = c \cdot t \cdot A$
  - c: thermal capacitance per unit volume
  - Requires an empirical scaling factor because the model is lumped
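
Worked example (a sketch, not from the slides): the R and C expressions above, evaluated for one hypothetical block. The silicon properties (k = 100 W/(m K), c = 1.75e6 J/(m^3 K)), the block dimensions, and the function names are assumed for illustration only. The `scale` argument stands in for the empirical scaling factor; its value is left at 1.0 because the slides do not give one.

```python
def vertical_resistance(thickness_m, conductivity_w_per_mk, area_m2):
    """Thermal resistance R = t / (k * A), in K/W."""
    return thickness_m / (conductivity_w_per_mk * area_m2)


def thermal_capacitance(vol_heat_capacity_j_per_m3k, thickness_m, area_m2, scale=1.0):
    """Thermal capacitance C = scale * c * t * A, in J/K.

    scale is a placeholder for the empirical lumped-model factor.
    """
    return scale * vol_heat_capacity_j_per_m3k * thickness_m * area_m2


# ASSUMED illustrative values, not taken from the slides.
K_SI = 100.0          # thermal conductivity of silicon, W/(m K)
C_SI = 1.75e6         # volumetric heat capacity of silicon, J/(m^3 K)
THICKNESS = 0.5e-3    # die thickness, m
AREA = 4e-3 * 4e-3    # 4 mm x 4 mm block, m^2

r_block = vertical_resistance(THICKNESS, K_SI, AREA)
c_block = thermal_capacitance(C_SI, THICKNESS, AREA)
print(f"R = {r_block:.4f} K/W, C = {c_block:.4f} J/K, RC = {r_block * c_block:.6f} s")
```

Halving the block area doubles R and halves C, which is why small, power-dense blocks are the ones that tend to become hot spots.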
HotSpot Validation
Fallacy of Using a Power Metric
Compact Thermal Model
Equivalent Model
Equivalent Model (Cont.)
- Compact model vs. HotSpot
  - Arbitrary-granularity grid
  - Thermal interface material is modeled
  - The spreader and interface regions under the die are divided at the same granularity as the chip
- Primary heat flow path
  - $R_{vertical} = t / (k \cdot A)$
  - $C = \alpha \cdot c_p \cdot \rho \cdot t \cdot A$
  - α: accounts for the lumped capacitor model
  - c_p: specific heat
  - ρ: material density
  - t: thickness, A: cross-sectional area
Equivalent Model (Secondary Path)
- Interconnect thermal model
- Self-heating power and wire length prediction
  - $P_{self} = I^2 R$
  - $R = \rho_m \cdot L / A_m$
  - ρ_m: metal resistivity, L: wire length, A_m: wire cross-sectional area
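
Worked example (a sketch, not from the slides): self-heating power of a single wire from P_self = I^2 R with R = ρ_m L / A_m. The bulk copper resistivity, the linear temperature coefficient of 0.0039 per K, the wire geometry, and the function names are assumptions for illustration; on-chip wires typically have higher effective resistivity. The coefficient is chosen because it reproduces the 39% increase between 20 °C and 120 °C quoted on the temperature-impact slide.

```python
RHO_CU_20C = 1.68e-8  # ASSUMED bulk copper resistivity at 20 C, ohm * m
ALPHA_CU = 0.0039     # ASSUMED linear temperature coefficient, 1/K


def wire_resistance(length_m, area_m2, temp_c=20.0):
    """R = rho_m * L / A_m with a linear temperature correction on rho_m."""
    rho = RHO_CU_20C * (1.0 + ALPHA_CU * (temp_c - 20.0))
    return rho * length_m / area_m2


def self_heating_power(current_a, resistance_ohm):
    """P_self = I^2 * R, in watts."""
    return current_a ** 2 * resistance_ohm


# Hypothetical wire: 1 mm long, 0.5 um x 0.5 um cross-section, 1 mA current.
LENGTH = 1e-3
AREA = 0.5e-6 * 0.5e-6
r_cool = wire_resistance(LENGTH, AREA, temp_c=20.0)
r_hot = wire_resistance(LENGTH, AREA, temp_c=120.0)
p_cool = self_heating_power(1e-3, r_cool)
p_hot = self_heating_power(1e-3, r_hot)
print(f"R(20 C) = {r_cool:.1f} ohm, R(120 C) = {r_hot:.1f} ohm")
print(f"P_self(20 C) = {p_cool * 1e6:.1f} uW, P_self(120 C) = {p_hot * 1e6:.1f} uW")
```

At fixed current, self-heating grows with temperature because R does, which is one reason the interconnect path is modeled together with the rest of the thermal network.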
Equivalent Model (Secondary Path, Cont.)
- Equivalent thermal resistance
Model Validation & Evaluation (Primary)
- Steady state
- Transient
Model Validation (Secondary)
Case Study
Thermal Management
- Dynamic thermal management (DTM)
  - Emergency threshold: temperature above which the chip is in thermal violation
  - Trigger threshold: temperature above which DTM is applied
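
Sketch (not from the slides): the two thresholds can be read as a three-way classification of a sensed temperature. The threshold values of 80 °C and 85 °C, the state names, and the function `dtm_state` are hypothetical. The code also assumes the trigger threshold lies below the emergency threshold, so that DTM engages before a violation occurs.

```python
def dtm_state(temp_c, trigger_c, emergency_c):
    """Classify a sensed temperature against the two DTM thresholds."""
    if trigger_c >= emergency_c:
        # Assumption: DTM must engage before the chip reaches a violation.
        raise ValueError("trigger threshold must lie below the emergency threshold")
    if temp_c > emergency_c:
        return "emergency"   # chip is in thermal violation
    if temp_c > trigger_c:
        return "dtm_active"  # apply a DTM technique
    return "normal"


# Hypothetical thresholds for illustration only.
TRIGGER_C = 80.0
EMERGENCY_C = 85.0
states = [dtm_state(t, TRIGGER_C, EMERGENCY_C) for t in (70.0, 82.0, 86.0)]
print(states)  # ['normal', 'dtm_active', 'emergency']
```

The gap between the two thresholds is the margin the DTM technique has to bring the temperature back down.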
DTM Techniques
- Temperature-tracking frequency scaling
- Feedback-controlled fetch toggling
- Migrating computation
- Dynamic voltage scaling (DVS)
- Global clock gating
DTM Results
Conclusions
- Accurate thermal models are essential for early design estimation
  - Models are similar to electrical RC networks
  - Arbitrary granularity gives localized temperature information
  - Model all parts of the package
- Architectural techniques can reduce demands on the IC package by
  - Dynamically adjusting workload to avoid emergencies
  - Reducing hot spots