An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics. Ching-Chi Lin, Institute of Information Science, Academia Sinica; Department of Computer Science and Information Engineering, National Taiwan University. You-Cheng Syu, Pangfeng Liu, Graduate Institute of Networking and Multimedia, National Taiwan University. Chao-Jui Chang, Jan-Jan Wu, Research Center for Information Technology Innovation, Academia Sinica. Po-Wen Cheng, Wei-Te Hsu, Information and Communications Research Laboratories, Industrial Technology Research Institute. Good morning, everyone. I am Ching-Chi Lin, a research assistant at the Institute of Information Science, Academia Sinica. Today I am going to present our work, An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics, a collaboration between National Taiwan University and Academia Sinica. In this paper, we take both energy consumption and task performance into consideration, and present solutions for minimizing the overall cost.

Introduction Modern processors support DVFS on a per-core basis. Dynamic Voltage and Frequency Scaling (DVFS). For the same core, increasing computing power means higher power consumption. Energy-efficient task scheduling is a fundamental issue in many application domains. Many modern processors support the DVFS mechanism on a per-core basis. DVFS is short for Dynamic Voltage and Frequency Scaling. We can adjust the operating frequency of a core to change its computing power and energy consumption. For the same core, if we increase its operating frequency, the energy consumption increases as well.

Challenge Find a good balance between performance and power consumption. The challenge of task scheduling on cores with the DVFS mechanism is to strike a balance between performance and energy consumption. It is easy to sacrifice one of them to benefit the other. For example, we could always use the lowest operating frequency to minimize power consumption, but performance would suffer under such a policy. Therefore, an energy-efficient scheduling strategy that considers both power and performance is important.

Two Scenarios Batch mode A set of computation-intensive tasks with the same arrival time. Online mode Two types of tasks with different priorities: interactive and non-interactive. Tasks can arrive at any time. We consider two scenarios in this work, batch and online. Batch mode consists of a set of computation-intensive tasks. These tasks are independent, and are all available for execution at time 0. As for online mode, there are two types of tasks, interactive and non-interactive. Interactive tasks have higher priority than non-interactive tasks. In online mode, tasks can arrive at any time.

Example: Judge System Online mode Users submit their code/answers, and wait for their scores. Interactive: user requests, such as score querying. Non-interactive: processing user submissions. Batch mode Re-judge and validate all submitted code/answers. I will use a judge system as an example. A judge system verifies whether the inputs from users produce the correct answers. Such systems are used by programming contests such as IOI, ACM, or USACO. During a contest, the system is in online mode. Programmers submit their code or answers, and wait for their scores. The interactive tasks in this scenario are the requests from users, such as problem or score querying; they tend to be small and require short response times. On the other hand, processing the submitted code constitutes the non-interactive tasks, which take longer to finish. After the contest, we may want to validate all the submissions from users. We treat every submission as an independent task, and process them as a batch. Thus the system is in batch mode.

Our Contribution Present task scheduling strategies that solve three important issues simultaneously: The assignment of tasks to cores. The execution order of tasks on a core. The processing frequency for the execution of each task. In this paper, we present task scheduling strategies that solve three important issues simultaneously: the assignment of tasks to cores, the execution order of tasks on a core, and the processing frequency for the execution of each task.

Our Contribution(Cont.) For batch mode, we propose the Workload Based Greedy algorithm. For online mode, we propose the Least Marginal Cost heuristic. Specifically, for batch mode we propose the Workload Based Greedy algorithm, and for online mode we propose the Least Marginal Cost heuristic. Next we will present the models used in this paper.

Models Task Model Assume the number of CPU cycles required to complete a task, Lk, is known. The arrival time of a task: batch mode: 0; online mode: known. For our task model, we assume that Lk, the number of CPU cycles required to complete a task, is known for every task. The arrival time of a task is 0 in batch mode, and is assumed to be known in online mode.

Models(Cont.) Processing frequency Only a set of discrete processing frequencies, pi, is available. The core frequency remains the same while executing a task. As for the processing frequencies, only a set of discrete processing frequencies p_i is available. The core frequency remains the same while executing a task.

Models(Cont.) Power and Performance For a task jk, E(pk) and T(pk) are the energy and time required to execute one cycle at frequency pk. We define the power and performance model as follows. For a task jk, the energy consumption ek equals Lk multiplied by E(pk). Recall that Lk is the number of CPU cycles required to complete the task; E(pk) is a function of pk that gives the energy required to execute one cycle at frequency pk. The same goes for the execution time tk: T(pk) is the time required to execute one cycle at frequency pk. We use these two functions to evaluate the power and performance of a task.
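The task model above can be written as a minimal sketch. The curve shapes for E and T below are illustrative assumptions, not the paper's measured values:

```python
# e_k = L_k * E(p_k): energy to run a task of L_k cycles at frequency p_k.
def task_energy(L_k, p_k, E):
    return L_k * E(p_k)

# t_k = L_k * T(p_k): time to run a task of L_k cycles at frequency p_k.
def task_time(L_k, p_k, T):
    return L_k * T(p_k)

# Hypothetical per-cycle functions: energy per cycle grows with
# frequency (convex), time per cycle shrinks with frequency.
E = lambda p: (p ** 2) * 1e-19   # assumed energy curve (J/cycle)
T = lambda p: 1.0 / p            # one cycle takes 1/p seconds at p Hz

e = task_energy(2e9, 1.6e9, E)   # energy for a 2-gigacycle task
t = task_time(2e9, 1.6e9, T)     # time for the same task
```

Raising pk shortens t while raising e, which is exactly the trade-off the scheduler navigates.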

Task Scheduling in Batch Mode Two categories: Tasks with deadlines. Tasks without deadlines. Two environments: Single core. Multi-core. Four combinations in total. We start with task scheduling in batch mode. There are two categories of tasks, with and without deadlines, and two kinds of hardware environments, single core and multi-core. Therefore, we have four combinations of task and environment in total, and we will discuss each of them.

Tasks with Deadline [Objective] Every task must meet its deadline, and the overall energy consumption is less than E*. An NP-Complete problem on both single and multi-core platforms. Reduction from the Partition problem. The scheduling objective for tasks with deadlines is that every task must meet its deadline and the overall energy consumption is less than a given E*. We can reduce the Partition problem to this one to prove that it is NP-Complete on both single and multi-core platforms. Since there has been previous research on this case, I will not spend too much time here.

Tasks without Deadline [Objective] Minimize the cost function C. Re: the cost of a joule of energy. Rt: the cost of a second. Next is scheduling tasks without deadlines. The objective is to minimize the cost function C, which is the summation of Ck over every task k. Ck consists of the cost from power and the cost from performance. Note that we define the performance of a task as its turnaround time, i.e., the time it has to wait for all the tasks ahead of it, plus its own execution time. Since we cannot directly add energy and time together, we first convert both into cost using Re and Rt, the cost of a joule of energy and the cost of a second, respectively. We need to decide the execution order of the tasks, and the pk for each task, in order to minimize C.
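The cost function just described can be sketched as follows; this is my reading of the slide, with Re and Rt as the stated conversion rates and E, T as assumed per-cycle functions:

```python
# Total cost C of an execution order on one core: for each task,
# energy cost R_e * L_k * E(p_k) plus turnaround cost R_t * (waiting
# time for all tasks ahead + own execution time).
def total_cost(tasks, R_e, R_t, E, T):
    """tasks: list of (L_k, p_k) pairs in execution order."""
    cost, elapsed = 0.0, 0.0
    for L_k, p_k in tasks:
        exec_time = L_k * T(p_k)   # t_k = L_k * T(p_k)
        elapsed += exec_time       # turnaround = waiting + execution
        cost += R_e * L_k * E(p_k) + R_t * elapsed
    return cost
```

Both the order of the list and each pk change the result, which is why the scheduler must decide both jointly.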

Tasks without Deadline: Single Core Rewrite the cost function C into a form in which C(k, pk) is independent of Lk. Minimize C(k, pk) for every task in order to minimize C. Define C(k) = min{C(k, pk)}. C(k) is a non-increasing function of k. We can rewrite C into a form in which C(k, pk) is independent of Lk. This means that C(k, pk) is only related to the position k in the execution order; the size of the task does not affect C(k, pk). We can minimize the cost C by minimizing C(k, pk) for every task. Thus, we define C(k) as the minimum C(k, pk) over pk for a position k. We prove in the paper that C(k) is a non-increasing function of k.

Minimizing the Cost Since C can be minimized by using C(k) for every task, and C(k) is non-increasing, the tasks are in non-decreasing order of Lk in an optimal solution. Choose pk for each sorted task with the minimum C(k, pk). Since we can minimize C by using C(k) for every task, and C(k) is a non-increasing function, the tasks should be sorted in non-decreasing order of Lk to obtain an optimal solution. We then choose pk for each sorted task with the minimum C(k, pk).
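The single-core procedure above can be sketched as a shortest-job-first variant. The per-position weight below (the number of tasks this one delays, including itself) is my assumed instantiation of C(k, pk), not the paper's exact formula:

```python
# Sort tasks in non-decreasing L_k, then pick the frequency minimizing
# the assumed per-cycle position cost C(k, p) = R_e*E(p) + R_t*w*T(p),
# where w = n - k is the number of tasks whose turnaround this one adds to.
def schedule_single_core(lengths, freqs, R_e, R_t, E, T):
    order = sorted(lengths)            # non-decreasing L_k
    n = len(order)
    plan = []
    for k, L in enumerate(order):      # k = 0 .. n-1
        weight = n - k                 # tasks delayed by this task
        p_best = min(freqs,
                     key=lambda p: R_e * E(p) + R_t * weight * T(p))
        plan.append((L, p_best))
    return plan
```

Note how early positions (large weight) favor higher frequencies, while the tail of the schedule can afford slower, cheaper frequencies.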

Tasks without Deadline: Multi-Core Two cases: Homogeneous multi-core Same T and E for all cores. Heterogeneous multi-core Different T and E. Same idea: minimize the total cost by minimizing C(k) for every task on all cores. So, that was the single-core case. As for multi-core, there are two cases, homogeneous and heterogeneous; the difference lies in the T and E of each core. However, the idea is the same: minimize the total cost by minimizing C(k) for every task on all cores. To this end, we propose Workload Based Greedy.

Workload Based Greedy Sort the tasks according to Lk in descending order. Start from the task with the largest Lk. Find position k on core j with the minimum Cj(k) among all cores, and assign the task to that position. Compute pk for the task. Repeat until all tasks are scheduled. Workload Based Greedy works as follows. First we sort the tasks in descending order of Lk. Starting from the task with the largest Lk, we find the position k on core j such that Cj(k) is the minimum among all cores. We then assign the task to position k on core j, and compute the frequency pk for the task. We repeat this process until all tasks are scheduled.

Workload Based Greedy Example [Figure: three cores (Core0, Core1, Core2) with execution-order slots, and the sorted task queue J1, J2, J3, ….] I will use an example to illustrate. Assume there are three cores, and the tasks are sorted in descending order. Recall that C(k) is a non-increasing function, so the minimum C(k) of each core must appear at the end of its execution sequence. WBG chooses the position with the minimum C(k) among the three cores, assigns J1 to it, and decides the processing frequency for J1. For the next task J2, WBG again chooses the minimum C(k) among the three candidate positions. We can keep a min-heap to find the minimum C(k) among all cores efficiently.
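The steps above can be sketched as follows. The per-position cost model is an assumed instantiation of Cj(k) (same weight idea as the single-core sketch); cost_fns maps each core to its (Ej, Tj) pair, which covers the heterogeneous case:

```python
import heapq

# Workload Based Greedy sketch: sort tasks by L_k descending, and place
# each at the front of the core whose next position has the smallest
# per-cycle cost, tracked with a min-heap keyed by (cost, core, position).
def wbg(lengths, freqs, cost_fns, R_e, R_t):
    def per_cycle(j, i):               # cost of the i-th position from the back
        E_j, T_j = cost_fns[j]
        return min(R_e * E_j(p) + R_t * i * T_j(p) for p in freqs)

    heap = [(per_cycle(j, 1), j, 1) for j in cost_fns]
    heapq.heapify(heap)
    assignment = {j: [] for j in cost_fns}
    for L in sorted(lengths, reverse=True):
        c, j, i = heapq.heappop(heap)      # cheapest next position overall
        assignment[j].insert(0, L)         # fill the core's sequence back-to-front
        heapq.heappush(heap, (per_cycle(j, i + 1), j, i + 1))
    return assignment
```

Each pop and push is O(log m) for m cores, matching the slide's remark that a min-heap makes finding the minimum C(k) efficient.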

Task Scheduling in Online Mode [Objective] Minimize the total cost for every time interval during the execution of tasks. Time interval: the time between two consecutive arrival events. Now we turn to task scheduling in online mode. The objective is to minimize the total cost for every time interval during the execution of the tasks, where a time interval is the time between two consecutive arrival events.

Some Assumptions Two categories of tasks: Interactive tasks. Non-interactive tasks. Interactive tasks have higher priority than non-interactive tasks. Tasks can arrive at any time. Multi-core environment. Recall the online judge system example: the interactive tasks are the requests from users, such as problem or score querying, while processing the submitted code constitutes the non-interactive tasks. Interactive tasks have higher priority than non-interactive tasks. Tasks can arrive at any time, and we consider a multi-core environment.

Least Marginal Cost For every new arrival task: For each core, compute the minimum cost and position of inserting the task. Insert the task at the corresponding position on the core with the minimum cost among all cores. Note that interactive tasks have higher priority than non-interactive tasks. Instead of redistributing all tasks with Workload Based Greedy on every task arrival event, we propose Least Marginal Cost. Least Marginal Cost finds the position in a core where inserting the task results in the least cost increase. For every newly arrived task, it computes the minimum cost and position of inserting the task on every core, then inserts the task at the corresponding position on the core with the minimum cost among all cores. Least Marginal Cost is also aware that interactive tasks have higher priority than non-interactive tasks; thus a non-interactive task will never be placed before an interactive task in the execution sequence.
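The insertion procedure can be sketched as follows. cost_of and interactive are assumed callbacks (a sequence-cost evaluator such as the cost model earlier, and a priority predicate); the priority rule is enforced by skipping positions that would put a non-interactive task ahead of an interactive one:

```python
# Least Marginal Cost sketch: try every legal insertion position on
# every core, keep the one whose cost increase (marginal cost) is least.
def least_marginal_cost(cores, new_task, cost_of, interactive):
    best = None                            # (delta, core id, position)
    for j, seq in cores.items():
        base = cost_of(seq, j)
        for pos in range(len(seq) + 1):
            # a non-interactive task may not jump ahead of interactive ones
            if not interactive(new_task) and any(
                    interactive(t) for t in seq[pos:]):
                continue
            trial = seq[:pos] + [new_task] + seq[pos:]
            delta = cost_of(trial, j) - base
            if best is None or delta < best[0]:
                best = (delta, j, pos)
    _, j, pos = best
    cores[j].insert(pos, new_task)
    return j, pos
```

This touches only the arriving task, so the per-arrival work is far cheaper than rerunning Workload Based Greedy over all queued tasks.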

Evaluation We conduct experiments to compare the overall cost of our scheduling strategies with the others. Environment: 24 physical servers, each with two quad-core X5460 CPUs with hyperthreading, 16 GB memory, and a 250 GB disk. For the evaluation, we conduct experiments to compare the overall cost of our scheduling strategies with the others. The experimental environment is as follows.

Evaluation: Batch Mode Input: 12 benchmarks from SPEC2006int, each with train and ref inputs. For batch mode, our input is 12 benchmarks from SPEC2006int, each with the train and ref inputs. The table shows the average execution time of the 24 inputs, and the parameters used in batch mode.

Experimental Results: Batch Mode Workload Based Greedy (WBG). Opportunistic Load Balancing (OLB). Power-Saving (PS). The total cost reduction is about 27% and 20% relative to OLB and PS, respectively. We compare our WBG with OLB and PS. The left figure shows the cost of time, the middle one the cost of energy, and the right one the overall cost. The total cost reduction of WBG is about 27% and 20% relative to OLB and PS, respectively.

Evaluation: Online Mode Input: a trace from an online judging system. 768 non-interactive tasks. 50,525 interactive tasks. Length of trace: half an hour. For online mode, the input is a half-hour trace from an online judge system. It consists of 768 non-interactive tasks and 50,525 interactive tasks.

Experimental Results: Online Mode Least Marginal Cost (LMC). Opportunistic Load Balancing (OLB). On-Demand (OD). The total cost reduction is about 17% and 24% relative to OLB and OD, respectively. We compare our LMC with OLB and OD. The left figure shows the cost of time, the middle one the cost of energy, and the right one the overall cost. The total cost reduction of LMC is about 17% and 24% relative to OLB and OD, respectively.

Conclusion We propose energy-efficient scheduling algorithms for multi-core systems with DVFS features, for both batch mode and online mode. The experimental results show significant cost reductions. We will integrate our work into our existing judging system. To summarize, we propose energy-efficient scheduling algorithms for both batch and online modes on multi-core systems with DVFS features. The experimental results show significant cost reductions. In future work, we will integrate this work with Judgegirl, a judge system currently used by the CSIE department at National Taiwan University.

Questions? That concludes my presentation. Thank you for listening.