Lev Finkelstein ISCA/Thermal Workshop 6/2004

Overview
- Motivation (Kevin)
- Thermal issues (Kevin)
- Power modeling (David)
- Thermal management (David)
- Optimal DTM (Lev)
- Clustering (Antonio)
- Power distribution (David)
- What current chips do (Lev)
- HotSpot (Kevin)

Optimal DTM strategies

Agenda
- Motivation
- DTM as an optimization problem
- Thermal models
- Theoretical analysis
- Numerical approach
- Example
- Possible applications

Optimal behavior
- There are various DTM techniques
- Can we say that a DTM method is good enough?
- Can we say that a DTM method may be tuned to perform well?
- Are there optimal strategies at all?

DVS scenario for 2 cores
[Figure labels: Air, Heat Sink, Heat Pipe, Silicon; CPU1, CPU2; unit 2D power map; DVS1 and DVS2, each with freq and volt]

Optimal behavior (cont’d)
- We need a methodology for analysis of optimal behavior
  - Offline analysis of existing strategies
  - Guide for the design of new strategies

DTM as an optimization problem
- What is the optimization criterion?
- What is the set of possible strategies?
- What are the constraints?

DTM optimization criterion
- Goal: maximize the “performance”
- To a first approximation, this is similar to maximizing frequency

DVS strategies
- A strategy f(t) dictates how to change frequency and voltage over time
- An optimal strategy extracts at least as many clock cycles as any other legal strategy
[Figure: frequency vs. time]

Constraints
- Thermal constraints: do not exceed Tmax
- Frequency constraints: do not exceed the maximal frequency
- Additional constraints: frequency should be consistent with the voltage
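
The criterion and the constraints can be collected into a single problem statement. The following is only a sketch in notation that the slides do not define: $t_{\mathrm{end}}$ is an assumed time horizon, $T_i(t)$ is the temperature at node $i$ of whichever thermal model is used (it depends on $f$ and $V$ through that model), and $V_{\min}(f)$ is an assumed minimum voltage that supports frequency $f$.

```latex
\begin{aligned}
\max_{f(\cdot),\,V(\cdot)} \quad
  & \int_0^{t_{\mathrm{end}}} f(t)\,dt
  && \text{(clock cycles extracted)} \\
\text{subject to} \quad
  & T_i(t) \le T_{\max}
  && \text{for every node } i \text{ and all } t \in [0, t_{\mathrm{end}}], \\
  & 0 \le f(t) \le f_{\max}, \\
  & V(t) \ge V_{\min}\bigl(f(t)\bigr)
  && \text{(frequency consistent with voltage)}.
\end{aligned}
```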

RC networks
[Figure labels: Heat Pipe, Silicon, Heat Sink]

1D heat flow
[Figure labels: Power in, Core, Heat Pipe, Heat Sink]
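
As an illustration of the simplest RC thermal model, the sketch below simulates a single thermal node: one resistance to ambient and one capacitance. It is not taken from the slides; the function names and all parameter values (`r_th`, `c_th`, `t_amb`) are hypothetical, and the power is assumed constant within each time step so that the step can use the exact exponential solution.

```python
import math

def rc_step(temp, power, dt, r_th, c_th, t_amb):
    """Advance a single-RC thermal node by dt seconds.

    Assumes the power is constant during the step, so the exact
    solution of C*dT/dt = P - (T - T_amb)/R can be used.
    """
    t_steady = t_amb + r_th * power          # temperature reached if power is held forever
    decay = math.exp(-dt / (r_th * c_th))    # fraction of the gap to t_steady that remains
    return t_steady + (temp - t_steady) * decay

def simulate(power_of_time, duration, dt, r_th, c_th, t_amb):
    """Return the temperature trace, starting from ambient."""
    temps = [t_amb]
    steps = int(round(duration / dt))
    for k in range(steps):
        power = power_of_time(k * dt)
        temps.append(rc_step(temps[-1], power, dt, r_th, c_th, t_amb))
    return temps

# Hypothetical values: 50 W constant power, R = 0.5 K/W, C = 2 J/K, ambient 40 C.
temps = simulate(lambda t: 50.0, duration=100.0, dt=0.1,
                 r_th=0.5, c_th=2.0, t_amb=40.0)
```

With these illustrative numbers the time constant is R*C = 1 s, and the trace settles at the steady-state temperature T_amb + R*P.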

3D heat flow

Theoretical analysis
- Build a mathematical formulation of the problem
- Solve the optimization problem by one of the existing theoretical methods
- Feasible only for simple thermal models

Theoretical analysis (cont’d)
- Optimal strategy for a single-RC model [1]

[1] From Cohen et al., “On Estimating Optimal Performance of CPU Dynamic Thermal Management,” Computer Architecture Letters, Volume 2, Oct. 2003

Theoretical analysis (cont’d)
Optimal strategy consists of three stages:
1. Start from the maximal frequency
2. Decrease exponentially until the temperature reaches Tmax
3. Run with the “natural” frequency that keeps the temperature at Tmax
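
The “natural” frequency of the last stage can be illustrated for a single-RC model. The sketch below is an assumption-laden example, not the derivation from the cited paper: it assumes power equal to `k_power * f**3` (a cubic law, as in the example later in the talk), a single thermal resistance `r_th` to ambient, and hypothetical parameter values. It covers only the final stage; the shape of the exponential decrease in the second stage is not reproduced here.

```python
def natural_frequency(t_max, t_amb, r_th, k_power, f_max):
    """Frequency whose steady-state temperature equals t_max.

    Assumes a single-RC model (steady state: T = t_amb + r_th * P)
    and a cubic power law P = k_power * f**3. Both are illustrative
    assumptions.
    """
    p_sustain = (t_max - t_amb) / r_th           # power that holds the node at t_max
    f_nat = (p_sustain / k_power) ** (1.0 / 3.0)  # invert the cubic power law
    return min(f_nat, f_max)                      # never exceed the maximal frequency

# Hypothetical values: Tmax = 100 C, ambient 40 C, R = 0.5 K/W,
# P = 15 * f^3 watts with f in GHz, and a 3 GHz frequency cap.
f_nat = natural_frequency(t_max=100.0, t_amb=40.0, r_th=0.5,
                          k_power=15.0, f_max=3.0)

# Steady-state temperature when running at f_nat indefinitely.
t_steady = 40.0 + 0.5 * 15.0 * f_nat ** 3
```

If the computed value is above the frequency cap, the thermal constraint never becomes active in steady state and the function simply returns the cap.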

Numerical approach
- The mathematical problem is solved by numerical methods
- May handle rather complex thermal models
- Used as an offline procedure

Numerical approach (cont’d)
- Developed a methodology based on mathematical programming
- Handles large RC networks
- May handle time-dependent power profiles, leakage power, etc.
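
One way such a mathematical program could be written down is sketched below. The slides do not give the actual formulation, so everything here is an assumption: time is split into $N$ steps of length $\Delta t$, temperatures $\mathbf{T}_k$ are measured relative to ambient (so $T_{\max}$ is relative as well), $\mathbf{C}$ and $\mathbf{G}$ are the capacitance and conductance matrices of the RC network, $\mathbf{p}_k(f_k)$ is the per-node power vector at step $k$, and forward Euler is chosen purely for simplicity.

```latex
\begin{aligned}
\max_{f_0,\dots,f_{N-1}} \quad
  & \sum_{k=0}^{N-1} f_k\,\Delta t \\
\text{subject to} \quad
  & \mathbf{T}_{k+1} = \mathbf{T}_k
      + \Delta t\,\mathbf{C}^{-1}\bigl(\mathbf{p}_k(f_k) - \mathbf{G}\,\mathbf{T}_k\bigr),
  && k = 0,\dots,N-1, \\
  & \mathbf{T}_0 \ \text{given}, \\
  & \mathbf{T}_k \le T_{\max}\,\mathbf{1} \ \text{(elementwise)},
  && k = 1,\dots,N, \\
  & 0 \le f_k \le f_{\max},
  && k = 0,\dots,N-1.
\end{aligned}
```

In this form, a time-dependent power profile enters through the index $k$ on $\mathbf{p}_k$, and leakage could be modeled by letting $\mathbf{p}_k$ depend on $\mathbf{T}_k$ as well.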

Example
- A 3D thermal model
- Power is assumed to be proportional to the cube of the frequency
- A constant power profile
- A non-uniform power distribution

Optimal strategy behavior
- Power starts at a high value and decreases exponentially until the maximal temperature is reached
- The maximal junction temperature is then maintained, while the power approaches the steady state
[Figure label: Area of potential performance gain]

Optimal strategy behavior (cont’d)
- A non-typical thermal behavior: the temperature decreases while the power increases
- The reason: a non-uniform power map

Possible applications
- Combining with a power predictor
- Optimization of activity migration
- A nice fact: for a single-RC thermal model, a PID controller that is tuned for performance behaves similarly to the optimal strategy
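
To make the PID remark concrete, the sketch below closes a discrete PID loop around a single-RC thermal model. It is a minimal illustration under stated assumptions, not the controller from the talk: the cubic power law `k_power * f**3`, the plant parameters, and the gains `kp`, `ki`, `kd` are all hypothetical and untuned, and the sketch makes no claim about overshoot above Tmax.

```python
import math

def run_pid(t_max=100.0, t_amb=40.0, r_th=0.5, c_th=2.0, k_power=15.0,
            f_max=3.0, kp=0.5, ki=0.2, kd=0.0, dt=0.01, steps=2000):
    """Simulate a PID frequency controller on a single-RC thermal node.

    The controller output is the frequency, clamped to [0, f_max].
    All parameter values are illustrative assumptions.
    """
    temp = t_amb
    integral = 0.0
    prev_err = t_max - temp
    decay = math.exp(-dt / (r_th * c_th))
    freqs, temps = [], []
    for _ in range(steps):
        err = t_max - temp                       # positive while below the limit
        deriv = (err - prev_err) / dt
        raw = kp * err + ki * integral + kd * deriv
        freq = min(max(raw, 0.0), f_max)         # respect the frequency constraint
        if raw == freq:
            integral += err * dt                 # anti-windup: integrate only when unsaturated
        prev_err = err
        power = k_power * freq ** 3              # assumed cubic power law
        t_steady = t_amb + r_th * power
        temp = t_steady + (temp - t_steady) * decay  # exact single-RC step
        freqs.append(freq)
        temps.append(temp)
    return freqs, temps

freqs, temps = run_pid()
```

With a cold start the error is large, so the controller saturates at the maximal frequency, which matches the first stage of the optimal strategy; how closely the rest of the trace follows it depends on the tuning.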