Temperature Aware Microprocessor Floorplanning Considering Application Dependent Power Load. Chunta Chu, Xinyi Zhang, Lei He, and Tom Tong Jing.

Presentation transcript:

Temperature Aware Microprocessor Floorplanning Considering Application Dependent Power Load
Chunta Chu*, Xinyi Zhang, Lei He, and Tom Tong Jing
Electrical Engineering Department, University of California, Los Angeles, CA 90095
This work was partially supported by an NSF CAREER award and a UC MICRO grant sponsored by Altera and Intel.
*Chunta Chu is now with Apache Design Solutions.

Outline
- Motivation
- Problem formulation and models
- Experimental results
- Conclusion

Motivation
- Ever-increasing integration levels and clock rates lead to higher temperatures and steeper temperature gradients, which cause:
  - extra clock skew and performance degradation
  - excessive leakage
  - increased cooling cost
- Higher clock rates require interconnect pipelining.
- A microprocessor floorplan should smooth the temperature gradient and also take interconnect pipelining into account.

Existing Work
- Quick but not accurate [Han: TACS '05]
  - Models temperature with a deterministic heat diffusion model
  - Does not consider interconnect pipelining
- More accurate but far less efficient [Sankaranarayanan: JILP '05] and [Nookala: ISLPED '06]
  - Calculate the temperature for each candidate floorplan
  - No explicit interconnect pipelining

Primary Contribution
- An efficient yet effective floorplanning approach
  - Explicit modeling of interconnect pipelining with the TPWL model [Long: DAC '04]
  - A stochastic heat diffusion model that avoids temperature calculation
  - Reduces the highest temperature by up to 3 °C and runs up to 27x faster than the most accurate existing solution [Sankaranarayanan: JILP '05]


Problem Formulation
- Find a floorplan for the given soft modules of a microprocessor.
- Minimize: [equation missing], where CPI is the average number of cycles per instruction.

CPI Model [He-Long, DAC '04]
- Pre-calculate the CPI for a number of floorplans, chosen along a predicted trajectory in the solution space.
- For a new floorplan, look up the table and interpolate the CPI based on its distance to the floorplans with known CPI.
- Less than 3% error compared with cycle-accurate uArch simulation.
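The table-lookup step above can be illustrated with a small sketch. The slide does not give the interpolation rule, so inverse-distance weighting is an assumption made here for illustration, as are the function name `interpolate_cpi`, the feature vectors (taken to be interconnect latencies in cycles), and all numbers in the table.

```python
import math

def interpolate_cpi(query, table, power=2.0):
    """Estimate CPI for `query` by inverse-distance weighting over known samples.

    `table` is a list of (feature_vector, cpi) pairs. The weighting rule is an
    assumption for illustration, not the rule used in the cited TPWL model.
    """
    weighted = []
    for features, cpi in table:
        dist = math.dist(query, features)
        if dist == 0.0:
            return cpi  # the query coincides with a pre-calculated floorplan
        weighted.append((1.0 / dist ** power, cpi))
    total = sum(w for w, _ in weighted)
    return sum(w * cpi for w, cpi in weighted) / total

# Hypothetical pre-calculated samples: (latency vector in cycles, simulated CPI)
table = [
    ((1.0, 1.0), 1.20),
    ((3.0, 1.0), 1.35),
    ((1.0, 3.0), 1.30),
]

print(interpolate_cpi((1.0, 1.0), table))  # exact table hit
print(interpolate_cpi((2.0, 1.0), table))  # blended from the three samples
```

Nearby samples dominate the estimate, which is the property the slide relies on: a new floorplan close to a simulated one inherits roughly its CPI without another cycle-accurate run.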

Deterministic Heat Diffusion Model [Han: TACS '05]
- Heat diffusion between two modules M_i and M_j: [equation missing], where the two densities in the equation are the average power densities of M_i and M_j over time.
- Total heat diffusion for module M_i: [equation missing]
- The larger the heat diffusion, the smaller the temperature gradient and Tmax.
[Figure: heat diffusion H between adjacent modules, panels (a) and (b)]
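Since the equations on this slide did not survive, the following is only a sketch under an assumed form: heat diffusion between two adjacent modules is taken as the difference of their average power densities multiplied by their shared boundary length, and the total for a module is the sum over its neighbours. The names `total_heat_diffusion`, `density`, and `neighbours`, and every number, are hypothetical.

```python
def total_heat_diffusion(module, density, neighbours):
    """Sum (d_i - d_j) * L_ij over the neighbours j of `module`.

    Assumed form for illustration; the slide's own equation is not available.
    """
    d_i = density[module]
    return sum((d_i - density[j]) * length
               for j, length in neighbours[module].items())

# Made-up average power densities (W/mm^2)
density = {"M1": 2.0, "M2": 0.5, "M3": 1.0}
# Made-up shared boundary lengths (mm) for each module's neighbours
neighbours = {
    "M1": {"M2": 3.0, "M3": 1.0},
    "M2": {"M1": 3.0},
    "M3": {"M1": 1.0},
}

print(total_heat_diffusion("M1", density, neighbours))  # hot module: positive
print(total_heat_diffusion("M2", density, neighbours))  # cool module: negative
```

Under this assumed form, placing a hot module next to cool ones along a long shared edge raises its total diffusion, which matches the slide's statement that larger diffusion goes with a lower Tmax.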

Recast of Problem Formulation
- Find a floorplan for the given soft modules of a microprocessor.
- Minimize: [equation missing]

Primary Limitation of Deterministic Heat Diffusion
- Average power density ignores power load correlation:
  (a) Transient temperature is higher when power is positively correlated.
  (b) Transient temperature is lower when power is negatively correlated.

Power Correlation of Alpha-chip in SimpleScalar
[Figure: (a) positively correlated; (b) uncorrelated]

Calculation of Power Correlation
- Treat the power of each module as a stochastic process.
- Obtain samples of that process for each module from the transient power simulated over the SPEC2000 benchmarks.
- Compute the power correlation between modules as the covariance between these stochastic processes.
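The three steps above can be sketched as follows. The traces are made-up samples, not simulation output, and the module names attached to them are only labels. Normalizing the covariance by the standard deviations (the Pearson coefficient) is a choice made here so that values fall in [-1, 1]; the slide itself only mentions covariance.

```python
from statistics import mean

def covariance(x, y):
    """Population covariance of two equally long sample sequences."""
    mx, my = mean(x), mean(y)
    return mean((a - mx) * (b - my) for a, b in zip(x, y))

def correlation(x, y):
    """Covariance normalized by both standard deviations (Pearson coefficient)."""
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5

# Hypothetical transient power samples (W) over four time steps
trace_a = [1.0, 2.0, 3.0, 4.0]
trace_b = [2.0, 4.0, 6.0, 8.0]   # rises and falls together with trace_a
trace_c = [4.0, 3.0, 2.0, 1.0]   # moves opposite to trace_a

print(correlation(trace_a, trace_b))  # close to +1
print(correlation(trace_a, trace_c))  # close to -1
```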

Correlation between Modules
Module numbering:
   1 Decode    2 Branch    3 RAT      4 RUU
   5 LSQ       6 IALU1     7 IALU2    8 IALU3
   9 IntReg   10 IL1      11 DL1     12 IALU4
  13 FPAdd    14 FPMul    15 FPReg   16 L2_1
  17 L2_2     18 L2_3

Correlation between Modules (example)
- The correlation between modules 3 (RAT) and 10 (IL1) is 0.9.

Other Limitations: It Ignores Dead Space
- Ignoring dead space may lead to a higher temperature.
- A floorplan has dead spaces, and some modules can diffuse more heat into them. For example, M1's temperature is lower in (a) than in (b).

Other Limitations: It Ignores Module Geometry
- Power density: M1 >> M4 > M2 = M3
- M1 has a higher temperature in (a) than in (b), since M2's area is smaller than M3's area.
- Besides the shared length between modules, the depth of the adjacent module also has to be considered.

Other Limitations: It Ignores the Border Effect
- A module can diffuse a different amount of heat to the border, depending on the package design.

Stochastic Heat Diffusion Model
- Given: m modules, n dead spaces, and a power vector $P_i = [p_{i1}, \ldots, p_{iT}]$ over T time steps for each module $M_i$.
- Mean power density for module $M_i$: $E(P_{Di})$, where
  - $A_i$ is the area of module $M_i$,
  - $P_{Di} = P_i / A_i$ is the transient power density vector,
  - $E(X)$ is the expectation of vector $X$.

Stochastic Heat Diffusion Model (Cont.)
- If the adjacent module $M_j$ or dead space $N_j$ is totally inside the window, $P_{Dj}$ is modified to: [equation missing]

Stochastic Heat Diffusion Model (Cont.)
- Heat diffusion to the adjacent modules: [equation missing], where $L_{ij}$ is the shared length between $M_i$ and $M_j$.
- Heat diffusion to the adjacent dead spaces: [equation missing], where $C_{ij}$ is the shared length between $M_i$ and $N_j$.
- Heat diffusion to the border: [equation missing], where $B_i$ is the shared length between $M_i$ and the border.
- Con_lateral and Con_adjacent are unit thermal conductances.

Stochastic Heat Diffusion Model (Cont.)
- Given m modules and n dead spaces:
  - Power density covariance between $M_i$ and $M_j$: [equation missing], where $E(P_{Di} P_{Dj})$ is the expectation of $P_{Di} P_{Dj}$ over T time steps.
  - Standard deviation of the total heat diffusion for module $M_i$: [equation missing]
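The slide's equations are missing, so the sketch below only shows the general idea under stated assumptions: if the total heat diffusion of a module is a linear combination of power density traces (here assumed to be the sum of L_ij * (PD_i - PD_j) over its neighbours), then its standard deviation follows from the pairwise covariances alone, using the standard identity Var(sum_a c_a X_a) = sum_a sum_b c_a c_b Cov(X_a, X_b). The names `diffusion_std`, `traces`, and `lengths`, and all numbers, are hypothetical.

```python
from statistics import mean, pstdev

def cov(x, y):
    """Population covariance, i.e. E(XY) - E(X)E(Y) over the samples."""
    mx, my = mean(x), mean(y)
    return mean((a - mx) * (b - my) for a, b in zip(x, y))

def diffusion_std(i, traces, lengths):
    """Std dev of sum_j L_ij * (PD_i - PD_j), computed from covariances only.

    The linear form of the diffusion is an assumption for illustration.
    """
    # Rewrite the sum as one linear combination: (sum_j L_ij) * PD_i - sum_j L_ij * PD_j
    coeff = {i: sum(lengths.values())}
    for j, length in lengths.items():
        coeff[j] = -length
    var = sum(coeff[a] * coeff[b] * cov(traces[a], traces[b])
              for a in coeff for b in coeff)
    return max(var, 0.0) ** 0.5  # guard against tiny negative rounding error

# Made-up power density traces (W/mm^2) over four time steps
traces = {
    "M1": [1.0, 2.0, 3.0, 2.0],
    "M2": [0.5, 1.0, 1.5, 1.0],
    "M3": [2.0, 1.0, 0.5, 1.5],
}
lengths = {"M2": 2.0, "M3": 1.0}  # made-up shared lengths (mm) with M1

# Cross-check: build the diffusion time series directly and take its std dev
series = [sum(l * (traces["M1"][t] - traces[j][t]) for j, l in lengths.items())
          for t in range(4)]
direct = pstdev(series)

print(diffusion_std("M1", traces, lengths), direct)
```

The covariance route is useful in a floorplanner because the covariances depend only on the workload and can be computed once, while the shared lengths change with every candidate floorplan.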

Stochastic Heat Diffusion Model (Cont.)
- Total stochastic heat diffusion for $M_i$: [equation missing]
- Given Z potential hottest modules, the total stochastic heat flow is: [equation missing]
  - $W_i$: weight proportional to [expression missing]


Implementation and Experiment
Processor configuration (uP, 90 nm); -- marks a value missing from the transcript:
  Issue width           4
  Die area (mm²)        100
  Die thickness (mm)    0.5
  Heat spreader (mm²)   900
  Heat sink (mm²)       --
- The floorplanner uses sequence-pair-based simulated annealing [PARQUET].
- Experiments consider:
  - SPEC2000 benchmarks
  - one superscalar processor in 90 nm technology
  - soft modules with aspect ratios between 0.33 and 3, with L2 partitioned into three modules
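For readers unfamiliar with the sequence-pair representation mentioned above, here is a minimal sketch of how a sequence pair is turned into block coordinates. It is a generic O(n²) evaluation for fixed-size blocks written for illustration; it is not PARQUET's code or API, it does not handle soft modules, and the function name `pack` and the block sizes are hypothetical.

```python
def pack(seq_pos, seq_neg, size):
    """Place blocks from a sequence pair; returns (x, y, width, height).

    Rule: if a precedes b in both sequences, a is left of b;
    if a follows b in seq_pos but precedes b in seq_neg, a is below b.
    """
    pos_index = {b: k for k, b in enumerate(seq_pos)}
    x, y, placed = {}, {}, []
    for b in seq_neg:  # every constraint on b comes from blocks earlier in seq_neg
        left = [a for a in placed if pos_index[a] < pos_index[b]]
        below = [a for a in placed if pos_index[a] > pos_index[b]]
        x[b] = max((x[a] + size[a][0] for a in left), default=0.0)
        y[b] = max((y[a] + size[a][1] for a in below), default=0.0)
        placed.append(b)
    width = max(x[b] + size[b][0] for b in seq_neg)
    height = max(y[b] + size[b][1] for b in seq_neg)
    return x, y, width, height

# Hypothetical blocks: name -> (width, height)
size = {"A": (2.0, 1.0), "B": (1.0, 1.0)}

side_by_side = pack(["A", "B"], ["A", "B"], size)  # A left of B
stacked = pack(["B", "A"], ["A", "B"], size)       # A below B

print(side_by_side[2], side_by_side[3])
print(stacked[2], stacked[3])
```

A simulated annealer would perturb the two sequences (for example by swapping two blocks in one or both), re-run this packing, and score the result with the cost function; swapping the order of A and B in the first sequence is enough to switch between the two layouts above.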

Comparison with HotSpot tool [JILP '05]
- [JILP '05] directly calculates temperature but ignores interconnect pipelining.
- Our model:
  - reduces temperature by up to 3 °C with a 1.34% increase in area
  - runs up to 27x faster
Results (uP in 90 nm); WS is the whitespace percentage, and -- marks a value missing from the transcript:
  Method        Tmax (°C)   Area (mm²) (WS)   Runtime (s)
  [JILP '05]    --          -- (4.7%)         2300
  Ours          --          -- (5.6%)         85
  Impact        -3.2%       +1.34%            1/27x

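Impact of Thermal Modeling
- Compared with the thermal-oblivious floorplanner, our stochastic thermal model reduces temperature by up to 8.9 °C.
- Compared with the deterministic model, our model obtains up to 3.2 °C reduction of the on-chip peak temperature and 1.13x better CPI performance.
Results (uP in 90 nm); area entries are mm² with the whitespace percentage (WS %) in parentheses, percentages after them are changes relative to AC, and -- marks a value missing from the transcript:
  Obj.     CPI Best   CPI Avg   Tmax Best   Tmax Avg   Area Best              Area Avg
  AC       --         --        --          --         -- (3.05)              122.4 (6.89)
  ACH_d    --%        --%       --%         --%        122.0 (6.67) +2.9%     125.3 (9.08) +2.3%
  ACH_s    --%        --%       --%         --%        121.1 (6.10) +2.2%     123.2 (7.36) +0.6%
Obj.: A = area, C = CPI, H_d = deterministic heat diffusion [Han: TACS '05], H_s = stochastic heat diffusion (ours)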

Conclusions
- We have developed a stochastic heat diffusion model that effectively captures the correlation between transient power loads over a workload.
- We have also developed an efficient yet effective thermal-aware uP floorplanner.
- In the future, we will extend this work to 3D integration and multi-core processors.