1 Hardware Reliability Margining for the Dark Silicon Era Liangzhen Lai and Puneet Gupta Department of Electrical Engineering University of California,

Presentation transcript:

Hardware Reliability Margining for the Dark Silicon Era
Liangzhen Lai and Puneet Gupta
Department of Electrical Engineering, University of California, Los Angeles
This work is supported in part by NSF Variability Expedition grant CCF

Outline
Overview; Accumulation Model and Management Policies; Problem Formulation; Experimental Results; Conclusion

Hardware Reliability Margin
Parametric margin: voltage/frequency or sign-off corners, e.g., for BTI and HCI.
Physical margin: metal width and layout spacing, e.g., current-dependent minimum metal width for EM.
Margins are typically worst-case driven and mostly derived at hardware design time, under uncertainty in workload, circuit operating points, etc.

Reliability vs. Operating Points
Most reliability-related phenomena depend heavily on the circuit operating points: voltage, frequency, temperature, etc.

Dynamic Range of Operations
Efficiency is needed for the dark silicon era: multi/many-core designs with less powerful cores run at low voltage/current/power and need less margin.
"Turbo X" modes, such as Turbo Boost (Intel) and Turbo Core (AMD), run at high voltage/current/power under certain conditions and need more margin.
Figure: reliability margin across workloads (labels as transcribed: Moderate, Parallel, Intensive Single-thread), from low stress states to high stress states, with the margin marked as known pessimistic at one end and known optimistic at the other.

Dark Silicon Contexts
Pessimism depends on the difference between peak power/temperature and sustainable power/temperature. Silicon "darkness" is quantified by the dark ratio.
Power constraint: a limit on the maximum instantaneous power.
Thermal constraint: a limit on the maximum on-chip temperature.

Margining Methodology
Formulate margining as a workload optimization: find the workload that maximizes the reliability degradation while still meeting the power/thermal constraints.


Dynamic Reliability Model
Most reliability models are static, derived for constant voltage/current/temperature. Optimization needs a highly dynamic model that can compare different degradation scenarios.
Figure: voltage (v) versus time (t) for two power-state sequences, P1 P3 vs. P1 P3 P2.

Accumulation Model
Some accumulation models can be derived from the reliability model itself, e.g., EM can be modeled by the effective current density J_eff. Others can be derived by simulation, e.g., the worst-case BTI degradation can be found by simulating different power-state orderings and picking the worst case. Fitting and interpolation can also be used.
Figure: the accumulation model takes the time spent in each power state as input and returns the worst-case degradation at the end of lifetime.

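Spatial Problem vs. Temporal Problem
With the accumulation model, reliability degradation can be modeled as a temporal distribution problem. The workload and the power/thermal constraints are spatial problems.
Figure: a grid of cores in power states P1, P2 and P3 (spatial view) next to a voltage (v) versus time (t) trace over P1, P2 and P3 (temporal view).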

System Management Policy
We assume a fair round-robin policy that iterates scheduling priorities among all processor cores. The iteration period can range from hours to days. We assume this policy because it is:
Simple: open-loop, and reasonable to assume at hardware design time.
Effective: there are sufficient iterations to balance the workload during a typical hardware lifetime of multiple years.
Pessimistic: more sophisticated policies are likely to perform better, i.e., the margin is pessimistic.

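Bridging Spatial and Temporal Problems
The management policy iterates the workload among all cores, so the spatial distribution is equivalent to the temporal distribution.
Figure: spatial constraints on a grid of cores in power states P1, P2 and P3, mapped to a temporal distribution shown as a voltage (v) versus time (t) trace over P1, P2 and P3.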
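The equivalence claimed on this slide can be checked with a small simulation. The following is a minimal sketch, assuming an idealized round-robin that shifts every power state to the next core once per interval; the function name and the six-core snapshot are hypothetical and only reuse the state labels from the slide figure.

```python
from collections import Counter


def per_core_time_fractions(spatial_assignment, num_rotations):
    """Rotate a spatial power-state assignment round-robin across cores.

    spatial_assignment[i] is the power state of core i in the first interval.
    In each later interval every state moves on by one core (assumed policy).
    Returns, for each core, the fraction of intervals spent in each state.
    """
    n = len(spatial_assignment)
    counts = [Counter() for _ in range(n)]
    for step in range(num_rotations):
        for core in range(n):
            state = spatial_assignment[(core + step) % n]
            counts[core][state] += 1
    return [
        {state: c / num_rotations for state, c in core_counts.items()}
        for core_counts in counts
    ]


# Hypothetical 6-core snapshot using the state labels from the slide figure.
snapshot = ["P1", "P3", "P1", "P2", "P1", "P2"]

# Fraction of cores in each state at one instant (spatial distribution).
spatial = {s: c / len(snapshot) for s, c in Counter(snapshot).items()}

# Fraction of time each core spends in each state over one full rotation.
fractions = per_core_time_fractions(snapshot, num_rotations=len(snapshot))
```

After one full rotation, every entry of `fractions` matches `spatial`, which is why the time-per-state input of the accumulation model can be taken from the number of cores in each state.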


Optimization Under Power Constraints
x is the number of cores in each power state, and is also the input to the accumulation model f(x). P is the power corresponding to each power state. P_max is the power constraint. The problem is formulated as an Integer Linear Programming (ILP) problem.
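As an illustration of the power-constrained search, the sketch below enumerates all integer core counts instead of calling an ILP solver, which is only practical for small cases. Everything numeric here is hypothetical: the three power states, their per-core power, the linear stand-in for f(x), and the assumption that the counts in x sum to the total number of cores (with an "off" state absorbing dark cores).

```python
from itertools import product


def worst_case_distribution(num_cores, state_power, p_max, degradation):
    """Return the feasible core-count vector x that maximizes degradation(x).

    x[i] is the number of cores in power state i. A vector is feasible if
    sum(x) == num_cores (assumption) and sum(x[i] * state_power[i]) <= p_max.
    """
    best_x, best_val = None, float("-inf")
    for x in product(range(num_cores + 1), repeat=len(state_power)):
        if sum(x) != num_cores:
            continue  # every core must be in exactly one state
        if sum(n * p for n, p in zip(x, state_power)) > p_max:
            continue  # violates the instantaneous power limit
        val = degradation(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Hypothetical power states: off, low-voltage, turbo (watts per core).
state_power = [0.0, 1.0, 3.0]
# Hypothetical linear accumulation model: stress contributed per core.
stress_weight = [0.0, 1.0, 5.0]


def linear_f(x):
    return sum(n * w for n, w in zip(x, stress_weight))


best_x, best_val = worst_case_distribution(
    num_cores=4, state_power=state_power, p_max=6.0, degradation=linear_f
)
```

With these numbers the worst case puts two cores in the turbo state and leaves two dark, `best_x == (2, 0, 2)`. For larger core counts a real ILP solver would replace the enumeration.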

Thermal Problem
The thermal limit can be reached in two scenarios: heat up then cool down (left), or constant temperature (right). The constant stress results in worse degradation because of the higher average temperature and the longer time spent in the high power state.

Optimization Under Thermal Constraints
S is the time spent in each power state for each core. A is the temperature sensitivity matrix, i.e., the temperature increase per unit power. T_max is the maximum temperature constraint. T_bak is the background power for each core. The problem is formulated as a Linear Programming (LP) problem.
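The thermal constraint can be evaluated for a candidate S as in the sketch below. It is not the LP itself, only the feasibility check an LP solver would enforce. Two assumptions are made that the slide does not state: the temperature of each core is computed from its time-averaged power (consistent with the constant-temperature worst case), and T_bak is applied as a per-core temperature offset. All numbers and function names are hypothetical.

```python
def core_temperatures(time_fractions, state_power, sensitivity, background):
    """Estimate per-core temperature as T = A * p_avg + T_bak.

    time_fractions[i][k]: fraction of time core i spends in power state k (S).
    state_power[k]:       power of state k.
    sensitivity[i][j]:    temperature rise of core i per unit power in core j (A).
    background[i]:        background offset for core i (T_bak, assumed in degrees).
    """
    avg_power = [
        sum(f * p for f, p in zip(row, state_power)) for row in time_fractions
    ]
    return [
        background[i] + sum(a * p for a, p in zip(sensitivity[i], avg_power))
        for i in range(len(avg_power))
    ]


def meets_thermal_limit(temperatures, t_max):
    """True if every core stays at or below the maximum temperature T_max."""
    return all(t <= t_max for t in temperatures)


# Hypothetical 2-core example with two power states (0.5 W and 2.0 W).
S = [[0.5, 0.5], [1.0, 0.0]]
state_power = [0.5, 2.0]
A = [[10.0, 2.0], [2.0, 10.0]]  # degrees per watt
T_bak = [45.0, 45.0]

temps = core_temperatures(S, state_power, A, T_bak)
ok_at_60 = meets_thermal_limit(temps, t_max=60.0)
ok_at_55 = meets_thermal_limit(temps, t_max=55.0)
```

Here core 0 reaches 58.5 degrees and core 1 reaches 52.5 degrees, so the candidate S is feasible for a 60-degree limit but not for a 55-degree limit.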


Experimental Setup
Power model: based on a commercial processor benchmark, using libraries characterized at different supply voltages from 0.6 V to 0.9 V.
Thermal model: using the HotSpot simulator, considering the cases of 2x2, 4x4, 8x8 and 16x16 cores.
BTI: both NBTI and PBTI.
EM: metal sized to have the same current density (MTTF).

Local Power Network EM Results
Figure: results under the power constraint and under the thermal constraint, annotated with a 40% reduction.

Signal Wire EM Results
Figure: results under the power constraint and under the thermal constraint, annotated with a 60% reduction.

BTI Results
Figure: results under the power constraint and under the thermal constraint, annotated with a 20% reduction.

Conclusion
We propose a hardware reliability margining methodology for chips in the dark silicon era. We formulate the margining problem under power and thermal constraints. Experimental results show that at a 60% dark ratio, our method can achieve a 40%-60% reduction in metal width margin and a 20% reduction in BTI delay margin.

Backup slides

EM Accumulation Model
Effective current density: for the local power mesh, J_eff can be calculated from the average power consumed; for signal wires, J_eff is proportional to V * f.
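The two proportionalities above can be turned into a relative J_eff estimate from the time spent in each power state. This is a minimal sketch, assuming the contributions of the power states combine as a time-weighted average; the function names, the two operating points and the time split are hypothetical, and the results are only meaningful as ratios, not as absolute current densities.

```python
def relative_jeff_signal(time_fractions, voltages, frequencies):
    """Relative J_eff for signal wires: time-weighted average of V * f."""
    return sum(t * v * f for t, v, f in zip(time_fractions, voltages, frequencies))


def relative_jeff_power_mesh(time_fractions, state_power):
    """Relative J_eff for the local power mesh: time-weighted average power."""
    return sum(t * p for t, p in zip(time_fractions, state_power))


# Hypothetical operating points: a low-voltage state and a turbo state.
voltages = [0.6, 0.9]      # volts
frequencies = [1.0, 2.0]   # GHz
state_power = [1.0, 4.0]   # watts per core
time_fractions = [0.5, 0.5]

signal_jeff = relative_jeff_signal(time_fractions, voltages, frequencies)
mesh_jeff = relative_jeff_power_mesh(time_fractions, state_power)

# Same quantities if the core sat in the turbo state for its whole lifetime.
signal_worst = relative_jeff_signal([0.0, 1.0], voltages, frequencies)
mesh_worst = relative_jeff_power_mesh([0.0, 1.0], state_power)
```

Comparing `signal_jeff` with `signal_worst` (and `mesh_jeff` with `mesh_worst`) shows how much lower the effective stress is than an always-turbo assumption for this particular time split.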

BTI Accumulation Model
Two steps:
Identify the worst-case ordering by simulation. The worst BTI degradation happens when power states are applied in increasing order of stress voltage.
Fit the accumulation model. First pick a set of power-state distribution samples x and simulate the degradation g(x). Assuming the fitting function is [equation], the fit is formulated as: [equation]