DTM and Reliability High temperature greatly degrades reliability

Presentation transcript:

Reliability-aware Thermal Management for Hard Real-time Applications on Multi-core Processors
Vinay Hanumaiah (Electrical Engineering, Arizona State University) and Sarma Vrudhula (Computer Science Engineering, Arizona State University)
I would like to thank the session chair for his generous introduction, and good afternoon to everyone. Today I will be presenting techniques for maximizing the reliability of multi-core processors running real-time applications through the use of thermal management.

DTM and Reliability
High temperature greatly degrades reliability: high peak temperature; large number of thermal cycles; a 10°C to 15°C increase reduces reliability by half. Multi-cores have large temporal and spatial thermal variations: higher gradients lead to higher reliability degradation, which requires invoking DTM more often. DTM allows complex objectives and granular control.
First, let us understand the need for dynamic thermal management (DTM) to enhance the reliability of processors. It has been shown that temperature affects reliability to a great extent. This degradation can be due to high peak temperatures and to a large number of thermal cycles. It has been estimated that a 10°C to 15°C increase in temperature can halve reliability. Multi-core processors in particular have far more temporal and spatial thermal variation than single-core processors. This leads to higher temperature gradients, which degrade reliability much more. To avoid this degradation, DTM has to be invoked more often than for single-core processors. Finally, DTM is versatile in incorporating complex objectives, e.g. it is possible to have a DTM technique that addresses the combined minimization of peak temperature and thermal cycles. DTM also has many more control knobs, such as speed, voltage and migration, which can be used suitably to improve reliability.

Related Work
Effects of temperature on reliability: Coskun:Sigmetrics’07, Lu:IEEEMICRO’05. Minimizing peak temperature with deadline constraints: Chantem:DATE’08 (many-core, task allocation), Jayaseelan:ICCAD’08 (single-core, task sequence). Maximizing throughput: Wang:ECRTS’06 (thermal, timing, single-core), Murali:CODES’07 (thermal, no deadlines, many-core).
We briefly mention the related literature in the area of DTM for reliability. The first two works carry out comprehensive research on the effects of temperature on reliability and on the need for DTM to address reliability. There have been many other works in the area of DTM related to ours, but they fall short in the following respects. They do not determine transient core speeds, which is a much harder problem than the allocation and sequencing of tasks, as we will see later. They do not account for the dependence of leakage power on temperature. They do not consider both thermal and deadline constraints for a many-core processor.

What is our Contribution?
Determine the optimal speed profile for a many-core processor: minimize peak temperature; satisfy task deadlines while considering start times; include leakage dependence on temperature.
Here is our contribution through this work. We propose the derivation of an optimal transient speed profile for a multi-core processor that minimizes the peak operating temperature while satisfying both task deadlines and start times. We use accurate power and thermal models, including the dependence of leakage on temperature.

Power and Thermal Model
Figures: full HotSpot model (left); simplified thermal model (right). The simplified model ignores lateral resistances, ignores die capacitances and lumps the package. It loses less than 6% in accuracy and is required for analytical analysis.
The figure on the left shows the full HotSpot RC thermal model, which uses the electro-thermal analogy, modeling resistances and capacitances for heat spreading and heat storage respectively. It has the granularity of a functional block and several layers (die, TIM, heat spreader, sink and cooling) to account for differential heat conduction. The figure on the right shows the simplified thermal model used in our work. In it we ignore the lateral resistances between functional blocks, as they are four times higher than the vertical resistances and hence conduct less heat. We ignore the die capacitances, as our tasks are much longer than the die thermal time constant, which is on the order of a few ms, so the capacitances saturate. We also lump the package nodes. An important thing to note is that this simplified model is required in our work because it yields an analytical equation for estimating the number of instructions completed in a given interval of time. Also note that we are not compromising the accuracy of performance or temperature prediction, as we have demonstrated that the error from using this simplified model is less than 6%.
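A simplified model of this kind can be sketched as one lumped package node plus one vertical resistance per core. This is a minimal illustration, not the authors' model: every name and numeric value below (T_AMB, R_PKG, C_PKG, the per-core resistances and powers) is a placeholder assumption, leakage dependence on temperature is left out, and forward Euler integration is used only for simplicity.

```python
from typing import List, Tuple

# Hypothetical parameters, chosen only to make the sketch runnable.
T_AMB = 45.0   # ambient temperature, deg C
R_PKG = 0.1    # package-to-ambient thermal resistance, K/W
C_PKG = 100.0  # lumped package thermal capacitance, J/K


def step_package(t_pkg: float, core_powers: List[float], dt: float) -> float:
    """Advance the lumped package temperature by dt seconds (forward Euler).

    Assumed dynamics: C_PKG * dT_pkg/dt = P_total - (T_pkg - T_AMB) / R_PKG
    """
    p_total = sum(core_powers)
    slope = (p_total - (t_pkg - T_AMB) / R_PKG) / C_PKG
    return t_pkg + dt * slope


def core_temperatures(t_pkg: float, core_powers: List[float],
                      r_vertical: List[float]) -> List[float]:
    """Core temperatures with die capacitances and lateral resistances ignored.

    Each core sits on the package node through its own vertical resistance,
    so it responds instantly to its own power.
    """
    return [t_pkg + r * p for r, p in zip(r_vertical, core_powers)]


def simulate(core_powers: List[float], r_vertical: List[float],
             t_pkg0: float, dt: float, steps: int) -> Tuple[float, List[float]]:
    """Run the package node for `steps` intervals at constant core powers."""
    t_pkg = t_pkg0
    for _ in range(steps):
        t_pkg = step_package(t_pkg, core_powers, dt)
    return t_pkg, core_temperatures(t_pkg, core_powers, r_vertical)
```

With constant powers the package node settles at T_AMB + R_PKG * P_total, which gives a quick sanity check on the integration.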

Problem Formulation
Objective: find the core speed profile that minimizes peak temperature. Given: n tasks with instruction lengths and power profiles; n cores with an RC thermal model. Constraints: start times and deadlines. Assumptions: independent and non-identical threads; one thread per core; simplified thermal model.
Here is our problem formulation. Our objective is to determine the time-varying speed profile of all cores that minimizes the peak temperature. We are given n tasks, which run on n cores. We know their instruction lengths and power profiles, as well as the thermal characteristics of the cores. The derived speed profile has to satisfy the start times and deadlines of the tasks. Our assumptions are consistent with GPUs, viz. one thread per core and threads that are independent of each other. The other assumption is the use of the simplified thermal model described before.

Solution Outline
Step 1: find the parametric optimal speed profile [Hanumaiah:DATE’09] for a fixed maximum temperature and no deadlines. Step 2: find the parameters of Step 1 for every slot, to satisfy task deadlines for a given initial package temperature.
Now we present the outline of the solution to our problem formulation. In Step 1, we determine the parametric form of the optimal speed profile for a fixed maximum temperature, but with no deadline constraints. For Step 2, we divide the entire duration of the tasks into several slots based on the task start times and deadlines, as shown in this figure. Next, for every time slot, we determine the parameters of the Step 1 solution that satisfy the task deadlines for a given initial package temperature of the slot. Note that optimality in every time slot ensures optimality over the entire duration. By optimality, we mean achieving the minimum peak temperature while satisfying the boundary conditions.
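The slot division used in Step 2 can be sketched as follows. The talk only says that slots are formed "based on the task start times and deadlines", so treating every distinct start time or deadline as a slot boundary is an assumption, and the function name `build_slots` is hypothetical.

```python
from typing import List, Tuple


def build_slots(start_times: List[float],
                deadlines: List[float]) -> List[Tuple[float, float]]:
    """Split the schedule into consecutive slots.

    Slot boundaries are the sorted, de-duplicated task start times and
    deadlines; each slot spans two neighbouring boundaries.
    """
    boundaries = sorted(set(start_times) | set(deadlines))
    return list(zip(boundaries, boundaries[1:]))


if __name__ == "__main__":
    # Made-up times in seconds, for illustration only.
    for slot in build_slots([0.0, 5.0, 0.0], [20.0, 29.0, 33.0]):
        print(slot)
```

Each resulting slot then has a fixed set of active tasks, which is what lets the per-slot parameters be solved independently once the initial package temperature is known.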

Solution Outline (contd.)
Step 3: for every slot, find the initial package temperature that satisfies start times; also determine the global minimum peak temperature.
Step 2 can be solved only if we know the package temperature at the beginning of each slot. So in Step 3 we determine the initial package temperatures such that the deadlines are satisfied and the global peak temperature is minimized.

Step 1: Fixed max. temp., no deadlines
Here is the optimal speed profile for a core in Step 1, i.e. for a fixed maximum temperature and with no deadlines. The optimal speed is selected such that either the speed is at the maximum or the temperature is maintained at the maximum. The corresponding speed equation is given on the slide, where s_i0 is the initial speed, tau_p is the package thermal time constant and s_i,ss is the steady-state speed. Note that, other than the steady-state speed, all terms are known a priori at the beginning of a task's execution.

Step 2: Fixed max. temp., with deadlines
Need for Step 2.
However, Step 1 by itself is not optimal if the tasks have deadlines, as shown in this figure. In the top figure, we execute a pair of tasks according to Step 1, with task 2, in blue, having its deadline at t_d,2. We see that even though the speed policy of Step 1 achieves a shorter makespan, the deadline of task 2 is violated. The desired execution is depicted in the bottom figure. Observe that the execution speed of task 1 is lowered before the deadline t_d,2 to allow task 2 to meet its deadline, and that the speed of task 1 is increased afterwards. Notice that the makespan in the bottom figure is longer. Thus the speed profile of Step 1 has to be modified in order to meet task deadlines. This is the core of Step 2.

Step 2: Fixed max. temp., with deadlines
Find the optimal speed profile for the critical task. Determine Tpkg over the slot. Find the total power PT for the corresponding Tpkg.
In Step 2, we first identify the critical task for a slot, i.e. the task with its deadline in the current slot. We determine the steady-state speed of this critical task such that it completes its execution within the deadline. Notice that this uses the equation from Step 1. We then determine the corresponding package temperature at all times. This is a straightforward computation, as we know the power consumption and the temperature of any core. Once we know the package temperature, the total power consumption P_T can be determined from the differential equation for the package temperature.

Step 2: Power allocation scheme
Let tsched be the unit scheduling interval. Determine the approximate dTpkg(tsched)/dt. Find the corresponding PT(tsched). Set PT(tsched) = PT(tsched) - Pcritical(tsched). Sort tasks by nearest deadline. Allocate the maximum power Pmax,i(tsched) to the earliest task and set PT(tsched) = PT(tsched) - Pmax,i(tsched). Continue until PT(tsched) = 0.
After the speed of the critical task is determined, we need to determine the rest of the core speeds. For this, we define t_sched as the unit speed scheduling interval. For every scheduling interval within a slot, we determine an approximate dT_pkg/dt and the corresponding total power budget P_T for the interval. Subtracting the power consumption of the critical task, we get the remaining power budget that needs to be allocated to the remaining tasks. We now sort the tasks according to their deadlines. We allocate the maximum feasible power, as constrained by the maximum temperature, to the task with the earliest deadline, since this corresponds to the maximum feasible speed of that task. We continue this process for the other cores in the order of their deadlines until the power budget is completely utilized. Notice that this is a greedy approach, which ensures that the tasks with the earliest deadlines are satisfied first.
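The greedy allocation for a single scheduling interval can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the `Task` record, the function name and the sample numbers are hypothetical, the total budget P_T and each task's maximum feasible power are taken as given inputs, and granting the leftover budget to the next task in line is an assumption about how the final partial allocation is handled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    deadline: float  # seconds
    p_max: float     # max feasible power in this interval, watts


def allocate_power(p_total: float, p_critical: float,
                   tasks: List[Task]) -> Dict[str, float]:
    """Greedy earliest-deadline-first split of one interval's power budget.

    p_total    -- total power budget P_T for the interval
    p_critical -- power already committed to the critical task
    tasks      -- the remaining (non-critical) tasks
    """
    remaining = max(p_total - p_critical, 0.0)
    allocation: Dict[str, float] = {}
    for task in sorted(tasks, key=lambda t: t.deadline):
        # Grant as much as the task can use, capped by what is left.
        granted = min(task.p_max, remaining)
        allocation[task.name] = granted
        remaining -= granted
    return allocation


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    others = [Task("a", 33.0, 30.0), Task("b", 29.0, 25.0), Task("c", 40.0, 30.0)]
    print(allocate_power(100.0, 40.0, others))
```

Tasks that come after the budget is exhausted receive zero power in this interval, which mirrors the "continue until PT(tsched) = 0" stopping rule.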

Step 3: Satisfy Start Times
The number of instructions completed in each slot is monotonic in the initial package temperatures of the slots and in the maximum temperature. The problem can be solved optimally as a quasiconcave (monotonic) optimization.
In Step 3, we satisfy the last constraint, viz. the start times of tasks. We observe that the number of instructions completed in each slot is a monotonic function of the initial package temperature. For example, if T_p1 is increased, then the instructions completed in the future slots 2 and 3 are reduced monotonically with T_p1. On the other hand, increasing T_p1 correspondingly increases the instructions executed in the previous slot 1. The same holds, in reverse, when T_p1 is decreased. Similarly, increasing the maximum temperature allows more instructions to be executed, and decreasing the maximum temperature decreases the execution rate of instructions. Thus determining the initial package temperatures of the slots that minimize the global peak temperature can be solved through quasiconcave optimization.
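Monotonicity is what makes this kind of search tractable: a monotone relation can be inverted by bisection. The sketch below shows only that idea in one variable and is not the quasiconcave optimization used in the talk, which searches over several initial package temperatures at once. The function names and the linear `toy_model` are hypothetical stand-ins for the real instruction-count model.

```python
from typing import Callable


def min_feasible_temperature(
    instructions_at: Callable[[float], float],
    required: float,
    t_low: float,
    t_high: float,
    tol: float = 1e-3,
) -> float:
    """Smallest maximum temperature whose instruction count meets `required`.

    Relies only on instructions_at being non-decreasing in temperature.
    """
    if instructions_at(t_high) < required:
        raise ValueError("requirement is infeasible even at t_high")
    while t_high - t_low > tol:
        mid = 0.5 * (t_low + t_high)
        if instructions_at(mid) >= required:
            t_high = mid  # feasible: try a cooler limit
        else:
            t_low = mid   # infeasible: a hotter limit is needed
    return t_high


def toy_model(t_max: float) -> float:
    """Hypothetical stand-in: instructions grow linearly above 60 C."""
    return max(t_max - 60.0, 0.0) * 1.0e9


if __name__ == "__main__":
    print(min_feasible_temperature(toy_model, 42.0e9, 60.0, 120.0))
```

Because the loop keeps `t_high` feasible at every step, the returned value always satisfies the requirement and lies within `tol` of the true threshold.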

Experimental Setup
Multi-core version of the Alpha 21264. HotSpot thermal model, PTScalar power model. SPEC benchmarks. Dynamic power 230 W, leakage power 60 W. Scheduling interval 10 ms.
Now we move on to the experimental results. This is our experimental setup. We constructed a multi-core version of the Alpha 21264 by replicating a single Alpha core and scaling the copies to fit the die area of the single core. We used the HotSpot thermal model and the PTScalar simulator to obtain power profiles of the SPEC benchmarks. We constrained our power numbers as shown here. The scheduling interval was set to 10 ms, the die thermal time constant.

Trade-off: Peak Temperature vs Deadlines
Figure panels: relaxed deadlines; tight deadlines.
In our first experiment, we tested our algorithm on two task scenarios. In the first scenario, the tasks had relaxed deadlines. In the second, the task deadlines were made tighter. For example, the task gcc, in green, has its deadline at 33 s in the relaxed deadline scenario and at 29 s in the tight deadline scenario, and similarly for the other tasks. Apart from this, the rest of the parameters were the same, viz. the start times, the tasks' power and instruction profiles, and the cores' thermal characteristics. The plots on the top show the transient core speeds and the plots on the bottom show the corresponding temperatures. Notice how the algorithm adjusts the core speeds so that the tasks end exactly at their specified deadlines, and how the peak operating temperature changes correspondingly, from 102°C in the relaxed deadline scenario to 118°C in the tight deadline scenario. Thus our algorithm finds the optimal trade-off between peak temperature and deadlines.

Optimal Policy vs Min. Makespan Policy
Figure panels: optimal policy with relaxed deadlines; min. makespan policy.
In our next experiment, we compared our deadline-optimal algorithm with the min. makespan policy, which is the same as the policy obtained in Step 1. Since the min. makespan policy tries to minimize the overall makespan at a constant maximum temperature, it is oblivious to deadlines and thus either violates them or executes at a higher temperature. Notice that there is an 8°C increase in peak temperature for the relaxed deadline scenario.

Discretization of Optimal Policy
Figure panels: continuous version; discrete version with 8 speeds.
Finally, we demonstrate the practical use of our policy. The figure on the right shows the discretized version of our continuous speed policy for the tight deadline scenario. The continuous speeds are discretized by selecting the highest discrete speed that is less than the continuous speed. Due to this approximation, either a few deadlines may be violated or the peak temperature may increase slightly, as seen in the figures.
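The rounding-down rule can be sketched with a sorted list of speed levels. The eight evenly spaced normalized levels and the function name are hypothetical, since the talk does not list the actual speeds. The sketch also lets a continuous speed that exactly equals a level map to that level, and falls back to the lowest level when the continuous speed is below all of them; both choices are assumptions.

```python
from bisect import bisect_right
from typing import Sequence


def discretize_speed(speed: float, levels: Sequence[float]) -> float:
    """Highest discrete level that does not exceed `speed`.

    `levels` must be sorted in ascending order. If `speed` is below every
    level, the lowest level is returned.
    """
    idx = bisect_right(levels, speed)
    return levels[max(idx - 1, 0)]


# Eight hypothetical, evenly spaced normalized speed levels.
LEVELS = [i / 8 for i in range(1, 9)]

if __name__ == "__main__":
    for s in (0.05, 0.5, 0.8, 1.0):
        print(s, discretize_speed(s, LEVELS))
```

Rounding down never exceeds the continuous speed in a given interval, at the cost of completing fewer instructions, so the discrete schedule may miss deadlines that the continuous one meets exactly.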

Summary
Proposed a reliability-aware transient speed policy: minimizes peak temperature; satisfies task deadlines and start times; includes accurate power and thermal models. Optimal trade-off of peak temperature with deadlines. Incorporated in the Magma simulator, a fast, accurate thermal-aware architectural simulator, available as open source at http://vrudhula.lab.asu.edu/magma/
In summary, we proposed a reliability-aware optimal transient speed policy, which minimizes peak temperature, satisfies both the start times and the deadlines of tasks, and includes accurate power and thermal models. Results showed that our algorithm is capable of trading off optimally between peak temperature and task deadlines. As a final note, our techniques have been incorporated in a thermal-aware architectural simulator called MAGMA, which is available as open source at this site. We encourage everyone to make use of it. Thank you, everyone, for attending my talk!