
An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics Ching-Chi Lin Institute of Information Science,


1 An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics Ching-Chi Lin Institute of Information Science, Academia Sinica Department of Computer Science and Information Engineering, National Taiwan University You-Cheng Syu, Pangfeng Liu Graduate Institute of Networking and Multimedia, National Taiwan University Chao-Jui Chang, Jan-Jan Wu Research Center for Information Technology Innovation, Academia Sinica Po-Wen Cheng, Wei-Te Hsu Information and Communications Research Laboratories, Industrial Technology Research Institute Good morning, everyone. I am Ching-Chi Lin, a research assistant at the Institute of Information Science, Academia Sinica. Today I am going to present one of our works, An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics. This is a collaboration between National Taiwan University and Academia Sinica. In this paper, we take both energy consumption and task performance into consideration, and present solutions for minimizing the overall cost.

2 Introduction Modern processors support DVFS on a per-core basis.
Dynamic Voltage and Frequency Scaling (DVFS) For the same core, increasing computing power means higher power consumption. Energy-efficient task scheduling is a fundamental issue in many application domains. Many modern processors support a DVFS mechanism on a per-core basis. DVFS is short for Dynamic Voltage and Frequency Scaling. We can adjust the operating frequency of a core in order to change its computing power and energy consumption. For the same core, if we increase its operating frequency, the energy consumption increases as well.

3 Challenge Find a good balance between performance and power consumption. The challenge of task scheduling on cores with a DVFS mechanism is to strike a balance between performance and energy consumption. It is easy to sacrifice one of them in order to benefit the other. For example, we can always use the lowest operating frequency to minimize the power consumption. However, the performance will suffer under such circumstances. Therefore, an energy-efficient scheduling strategy that considers both power and performance is important.

4 Two Scenarios Batch mode Online mode
A set of computation-intensive tasks with the same arrival time. Online mode Two types of tasks with different priorities. Interactive and non-interactive Tasks can arrive at any time. We consider two scenarios in this work, batch and online. Batch mode consists of a set of computation-intensive tasks. These tasks are independent, and are all available for execution at time 0. As for online mode, there are two types of tasks, interactive and non-interactive. Interactive tasks have higher priority than non-interactive tasks. In online mode, tasks can arrive at any time.

5 Example: Judge System Online mode Batch mode
Users submit their code/answers, and wait for their scores. Interactive: user requests, such as score querying Non-interactive: processing user submissions. Batch mode Re-judge and validate all submitted code/answers. I will use a judge system as an example. A judge system verifies whether the inputs from users produce the correct answer. Such systems are used by programming contests such as IOI or ACM usaco. During the contest, the system is in online mode. Programmers can submit their code or answers, and wait for the scores. The interactive tasks in this scenario are the requests from users, such as problem or score querying. Interactive tasks tend to be small and require a short response time. On the other hand, processing the submitted code is a non-interactive task. It takes longer to finish. After the contest, we may want to validate all the submissions from users. We treat every submission as an independent task, and the submissions are processed as a batch. Thus the system is in batch mode.

6 Our Contribution Present task scheduling strategies that solve three important issues simultaneously. The assignment of tasks to cores The execution order of tasks on a core The processing frequency for the execution of each task. In this paper, we present task scheduling strategies that solve three important issues simultaneously: the assignment of tasks to cores, the execution order of tasks on a core, and the processing frequency for the execution of each task.

7 Our Contribution(Cont.)
For batch mode, we propose the Workload Based Greedy algorithm. For online mode, we propose the Least Marginal Cost heuristic. Specifically, for batch mode, we propose the Workload Based Greedy algorithm. For online mode, we propose the Least Marginal Cost heuristic. Next we will present the models we used in this paper.

8 Models Task Model Assume the number of CPU cycles required to complete a task, Lk, is known. The arrival time of a task batch mode: 0. online mode: known. For our task model, we assume that L_k, the number of CPU cycles required to complete a task, is known for every task. The arrival time of a task is 0 in batch mode, and assumed to be known in online mode.

9 Models(Cont.) Processing frequency
Only a set of discrete processing frequencies, pi, is available. The core frequency remains the same while executing a task. As for the processing frequencies, only a set of discrete processing frequencies p_i is available. The core frequency remains the same while executing a task.

10 Models(cont.) Power and Performance For a task jk
the energy consumption is ek = Lk · E(pk) and the execution time is tk = Lk · T(pk). E(pk) and T(pk) are the energy and time required to execute one cycle with frequency pk. We define the power and performance model as follows. For a task j_k, the energy consumption e_k equals L_k multiplied by a function E(p_k). Recall that L_k is the number of CPU cycles required to complete a task. E(p_k) is a function of p_k that gives the energy required to execute one cycle with frequency p_k. The same goes for the performance t_k: T(p_k) is the time required to execute one cycle with frequency p_k. We use these two functions to evaluate the power and performance of a task.
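As a rough illustration of this model, the sketch below evaluates e_k and t_k for one task. The frequency table and its energy values are hypothetical numbers chosen for the example, and taking T(p) = 1/p is an assumption; neither comes from the paper.

```python
# Hypothetical per-cycle energy table (joules per cycle), keyed by frequency in Hz.
# These values are illustrative assumptions, not measurements.
ENERGY_PER_CYCLE = {1.6e9: 1.0e-9, 2.4e9: 1.6e-9, 3.0e9: 2.3e-9}


def time_per_cycle(p):
    """T(p): seconds needed for one cycle, assumed here to be 1/p."""
    return 1.0 / p


def task_energy(cycles, p):
    """e_k = L_k * E(p_k)."""
    return cycles * ENERGY_PER_CYCLE[p]


def task_time(cycles, p):
    """t_k = L_k * T(p_k)."""
    return cycles * time_per_cycle(p)


if __name__ == "__main__":
    L_k = 3.0e9  # cycles needed by the task
    for p in sorted(ENERGY_PER_CYCLE):
        print(p, task_energy(L_k, p), task_time(L_k, p))
```

With this table, running the same task at a higher frequency shortens t_k and raises e_k, which is the trade-off the scheduler has to balance.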

11 Task Scheduling in Batch Mode
Two categories: Tasks with deadline Tasks without deadline Two environments: Single core Multi-core Four combinations in total. We will start with task scheduling in batch mode. There are two categories of tasks, with and without deadlines. There are also two kinds of hardware environments, single core and multi-core. Therefore, we have four combinations of task and environment in total. We will discuss these four combinations.

12 Tasks with Deadline [Objective] Every task must meet its deadline, and the overall energy consumption is less than E*. An NP-Complete problem on both single and multi-core platforms. Reduction from the Partition problem. The scheduling objective for tasks with deadlines is that every task must meet its deadline, and the overall energy consumption is less than a given E*. We can reduce the Partition problem to this problem to prove that it is NP-Complete on both single and multi-core platforms. Since there has been previous research on this, I will not spend too much time here.

13 Tasks without Deadline
[Objective] Minimize the cost function C Re : the cost of a joule of energy Rt : the cost of a second Next is scheduling tasks without deadlines. The objective is to minimize the cost function C. C is the summation of C_k over every task k. C_k consists of the cost from energy and the cost from performance. We can write the overall cost as follows. Notice that we define the performance of a task as its turnaround time, i.e. the time it has to wait for all the tasks ahead of it, plus its own execution time. Since we cannot directly add energy and performance together, we first convert both into cost using R_e and R_t. R_e and R_t are the cost of a joule of energy and the cost of a second, respectively. We need to decide the execution order of tasks, and the p_k for each task, in order to minimize C.
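To make the definition concrete, here is a minimal sketch that evaluates C for a given execution order on one core. The function name `total_cost` and the tuple layout are assumptions made for this example; the per-cycle energy and time stand in for E(p_k) and T(p_k) at whatever frequency was chosen for each task.

```python
def total_cost(schedule, r_e, r_t):
    """Sum of C_k over a single-core schedule.

    schedule: list of (cycles, energy_per_cycle, time_per_cycle) tuples,
              given in execution order (hypothetical layout for this sketch).
    r_e: cost of one joule; r_t: cost of one second.
    """
    clock = 0.0  # finish time of the task currently being accounted for
    cost = 0.0
    for cycles, e_cyc, t_cyc in schedule:
        clock += cycles * t_cyc  # turnaround = waiting time + own execution time
        cost += r_e * cycles * e_cyc + r_t * clock
    return cost


if __name__ == "__main__":
    short_first = [(2, 1.0, 1.0), (4, 1.0, 1.0)]
    long_first = list(reversed(short_first))
    print(total_cost(short_first, 1.0, 1.0))  # shorter task first
    print(total_cost(long_first, 1.0, 1.0))   # longer task first
```

In this toy case the energy cost is the same for both orders, so the difference comes only from the turnaround term, which favors running the shorter task first.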

14 Tasks without Deadline: Single Core
Rewrite the cost function C as the sum over positions k of C(k, pk) · Lk. Minimize C(k, pk) for every task in order to minimize C. Define C(k) = min{C(k, pk)} C(k) is a non-increasing function of k. We can rewrite C in this form, where C(k, p_k) is independent of L_k. This means that C(k, p_k) depends only on the position k in the execution order and on the frequency; the size of the task does not affect C(k, p_k). We can minimize the cost C by minimizing C(k, p_k) for every task. Thus, we define C(k), which is the minimum C(k, p_k) for a position k. We prove in the paper that C(k) is a non-increasing function of k.
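The formulas on this slide did not survive in the transcript. The derivation below is a reconstruction from the spoken definitions (turnaround time, R_e, R_t, E, T), assuming positions are numbered 1 to n in execution order; the paper's exact notation may differ.

```latex
\begin{align*}
C &= \sum_{k=1}^{n} \Bigl( R_e L_k E(p_k) + R_t \sum_{i=1}^{k} L_i T(p_i) \Bigr) \\
  &= \sum_{k=1}^{n} L_k \bigl( R_e E(p_k) + (n-k+1)\, R_t T(p_k) \bigr) \\
  &= \sum_{k=1}^{n} L_k \, C(k, p_k),
  \qquad C(k, p_k) = R_e E(p_k) + (n-k+1)\, R_t T(p_k).
\end{align*}
```

The second line follows because the execution time L_i T(p_i) of the task at position i is counted in the turnaround time of every task from position i to n, that is, n-i+1 times. Under this form, the weight n-k+1 shrinks as k grows, so for any fixed frequency C(k, p) does not increase with k, and neither does the minimum C(k).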

15 Minimizing the Cost Since C is minimized by using C(k) for every position, and C(k) is non-increasing:
The tasks are in non-decreasing order of Lk in an optimal solution. Choose pk for each sorted task with the minimum C(k, pk). Since we can minimize C by using C(k) for every task, and C(k) is a non-increasing function, the tasks should be sorted in non-decreasing order of L_k in order to get the optimal solution. We then choose, for each sorted task, the p_k with the minimum C(k, p_k).
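The single-core procedure can be sketched as follows. This is an illustration under assumptions: the per-position cost is taken to be R_e·E(p) + (n-k+1)·R_t·T(p), which is a reconstruction from the talk rather than a formula quoted from the paper, and the function names and dictionary layout are hypothetical.

```python
def position_cost(k, n, p, r_e, r_t, energy, time):
    """Assumed form of C(k, p): per-cycle cost of running at frequency p in position k of n."""
    return r_e * energy[p] + (n - k + 1) * r_t * time[p]


def schedule_single_core(cycles, freqs, energy, time, r_e, r_t):
    """Return [(task_index, frequency), ...] in execution order.

    cycles: L_k for each task; freqs: available discrete frequencies;
    energy/time: dicts mapping frequency -> per-cycle energy / per-cycle time.
    """
    n = len(cycles)
    order = sorted(range(n), key=lambda i: cycles[i])  # non-decreasing L_k
    plan = []
    for k, task in enumerate(order, start=1):
        # Pick the frequency that minimizes C(k, p) for this position.
        best = min(freqs, key=lambda p: position_cost(k, n, p, r_e, r_t, energy, time))
        plan.append((task, best))
    return plan


if __name__ == "__main__":
    energy = {1.0: 1.0, 2.0: 4.0}  # illustrative values only
    time = {1.0: 1.0, 2.0: 0.5}
    print(schedule_single_core([5, 1, 3], [1.0, 2.0], energy, time, 1.0, 4.0))
```

In this toy setup the last task runs at the lower frequency because no other task is waiting behind it, so only its own time is charged.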

16 Tasks without Deadline: Multi-Core
Two cases Homogeneous multi-core Same T and E for every core. Heterogeneous multi-core Different T and E. Same idea Minimize total cost by minimizing C(k) for every task on all cores. So, that was the single core environment. As for multi-core, there are two cases, homogeneous and heterogeneous. The difference is in the T and E of each core. However, the idea is the same: minimize the total cost by minimizing C(k) for every task on all cores. Therefore we propose Workload Based Greedy.

17 Workload Based Greedy Sort the tasks according to Lk in descending order. Start from the task with largest Lk Find k on core j with min Cj(k) among all cores, and assign the task to the corresponding position. Compute pk for the task. Repeat until all tasks are scheduled. Workload Based Greedy works as follows. First we sort the tasks in descending order of L_k. Starting from the task with the largest L_k, we find the position k on core j such that C_j(k) is the minimum among all cores. We then assign the task to position k on core j, and compute the frequency p_k for the task. We repeat this process until all tasks are scheduled.

18 Workload Based Greedy Example
[Figure: execution order on three cores (Core0, Core1, Core2) and the sorted tasks J1, J2, J3 in descending order.] I'll use an example to help you understand better. Assume there are three cores. The tasks are sorted in descending order. Recall that C(k) is a non-increasing function, thus the minimum C(k) of each core must appear at the end of its execution sequence. WBG chooses the position with the minimum C(k) among the three cores, assigns J_1 to that position, and decides the processing frequency for J_1. For the next task J_2, WBG again chooses the minimum C(k) among the three candidate positions. We can keep a min heap to find the minimum C(k) among all cores efficiently.
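The following is a minimal sketch of Workload Based Greedy with a min heap, not the authors' implementation. It assumes the per-cycle cost of the r-th slot counted from the end of a core's queue is min over p of R_e·E_j(p) + r·R_t·T_j(p); the names `slot_cost` and `workload_based_greedy` and the core-table layout are hypothetical.

```python
import heapq


def slot_cost(r, core, r_e, r_t):
    """Cheapest (cost, frequency) for the r-th slot from the end of a core's queue.

    core: list of (frequency, energy_per_cycle, time_per_cycle) tuples.
    """
    return min((r_e * e + r * r_t * t, p) for p, e, t in core)


def workload_based_greedy(cycles, cores, r_e, r_t):
    """Return one queue per core, each a list of (task_index, frequency) in execution order."""
    heap = []
    for j, core in enumerate(cores):
        cost, p = slot_cost(1, core, r_e, r_t)  # last slot of each core
        heap.append((cost, j, 1, p))
    heapq.heapify(heap)

    queues = [[] for _ in cores]
    # Largest task first; it gets the globally cheapest remaining slot.
    for task in sorted(range(len(cycles)), key=lambda i: cycles[i], reverse=True):
        _cost, j, r, p = heapq.heappop(heap)
        queues[j].append((task, p))
        next_cost, next_p = slot_cost(r + 1, cores[j], r_e, r_t)
        heapq.heappush(heap, (next_cost, j, r + 1, next_p))

    # Queues were filled from the back, so reverse them into execution order.
    return [list(reversed(q)) for q in queues]


if __name__ == "__main__":
    core = [(1.0, 1.0, 1.0), (2.0, 4.0, 0.5)]  # illustrative values only
    print(workload_based_greedy([5, 1, 3, 2], [core, core], 1.0, 4.0))
```

Each task costs one heap pop and one push, so with n tasks and m cores the assignment loop takes O(n log m) time after the initial sort. Heterogeneous cores are handled by giving each core its own table.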

19 Task Scheduling in Online Mode
[Objective] Minimize the total cost for every time interval during the execution of tasks. Time interval: the time between two consecutive arrival events. Now we will talk about task scheduling in online mode. The objective is to minimize the total cost for every time interval during the execution of tasks. A time interval is the time between two consecutive arrival events.

20 Some Assumptions Two categories of tasks:
Interactive tasks Non-interactive tasks Interactive tasks have higher priority than non-interactive tasks Tasks can arrive at any time. Multi-core environment. Recall that in the online judge system example, the interactive tasks are the requests from users, such as problem or score querying. Processing the submitted code is a non-interactive task. Interactive tasks have higher priority than non-interactive tasks. Tasks can arrive at any time, and the environment is multi-core.

21 Least Marginal Cost For every new arrival task
For each core, compute the minimum cost and position of inserting the task. Insert the task to the corresponding position of the core with minimum cost among all cores. Notice that interactive tasks have higher priority than non-interactive tasks. Instead of redistributing all tasks with Workload Based Greedy at every task arrival event, we propose Least Marginal Cost. Least Marginal Cost finds the position on a core where inserting the task results in the least increase in cost. For every newly arriving task, Least Marginal Cost computes, for every core, the minimum cost of inserting the task and the corresponding position. It then inserts the task at that position on the core with the minimum cost among all cores. Least Marginal Cost is also aware that interactive tasks have higher priority than non-interactive tasks. Thus a non-interactive task will never be placed before an interactive task in the execution sequence.
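A simplified sketch of the insertion step is shown below. It rests on several assumptions that are not from the paper: the queue cost reuses the batch-mode form (per-position minimum of R_e·E(p) + w·R_t·T(p), with w the number of tasks whose turnaround includes that slot), both task types share the same R_t, the currently running task is ignored, and each queue already keeps interactive tasks ahead of non-interactive ones. All names are hypothetical.

```python
def queue_cost(queue, core, r_e, r_t):
    """Cost of a core's queue; queue items are (cycles, is_interactive)."""
    n = len(queue)
    total = 0.0
    for k, (cycles, _interactive) in enumerate(queue, start=1):
        waiting = n - k + 1  # tasks whose turnaround includes this slot
        total += cycles * min(r_e * e + waiting * r_t * t for _p, e, t in core)
    return total


def least_marginal_cost(queues, cores, task, r_e, r_t):
    """Insert task where the cost increase is smallest; return (core, index, increase)."""
    _cycles, interactive = task
    best = None
    for j, queue in enumerate(queues):
        boundary = sum(1 for _c, inter in queue if inter)  # number of interactive tasks
        # Interactive tasks stay in front of all non-interactive tasks.
        if interactive:
            positions = range(0, boundary + 1)
        else:
            positions = range(boundary, len(queue) + 1)
        base = queue_cost(queue, cores[j], r_e, r_t)
        for i in positions:
            candidate = queue[:i] + [task] + queue[i:]
            delta = queue_cost(candidate, cores[j], r_e, r_t) - base
            if best is None or delta < best[0]:
                best = (delta, j, i)
    delta, j, i = best
    queues[j].insert(i, task)
    return j, i, delta


if __name__ == "__main__":
    core = [(1.0, 1.0, 1.0)]  # single illustrative frequency
    queues = [[(1, True), (10, False)], [(4, False)]]
    print(least_marginal_cost(queues, [core, core], (2, False), 1.0, 1.0))
    print(least_marginal_cost(queues, [core, core], (3, True), 1.0, 1.0))
```

Only the queue receiving the new task changes, so the other cores keep their existing schedules, which is what distinguishes this from rerunning Workload Based Greedy on every arrival.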

22 Evaluation Conduct experiments to compare the overall cost of our scheduling strategy with that of the others. Environment: 24 physical servers, each with two 4-core X5460 CPUs with hyperthreading, 16 GB memory, and 250 GB disk. As for the evaluation, we conduct experiments to compare the overall cost of our scheduling strategy with that of the others. The experimental environment is as follows.

23 Evaluation: Batch Mode
Input: 12 benchmarks from SPEC2006int train and ref inputs For batch mode, our input is 12 benchmarks from SPEC2006int, each with train and ref inputs. The table shows the average execution time of the 24 inputs, and the parameters used in batch mode.

24 Experimental Results: Batch Mode
Workload Based Greedy (WBG) Opportunistic Load Balancing (OLB) Power-Saving (PS) The total cost reduction is about 27% and 20% compared with OLB and PS, respectively. We compare our WBG with OLB and PS. The results are as follows. The left figure is the cost of time. The middle one is the cost of energy, while the right figure is the overall cost. The total cost reduction of WBG is about 27% and 20% compared with OLB and PS, respectively.

25 Evaluation: Online Mode
Input: trace from an online judging system. 768 non-interactive tasks. 50,525 interactive tasks. Length of trace: half an hour. For online mode, the input is a half-hour trace from an online judge system. It consists of 768 non-interactive tasks and 50,525 interactive tasks.

26 Experimental Results: Online Mode
Least Marginal Cost (LMC) Opportunistic Load Balancing (OLB) On-Demand (OD) The total cost reduction is about 17% and 24% compared with OLB and OD, respectively. We compare our LMC with OLB and OD. The left figure is the cost of time. The middle one is the cost of energy, while the right figure is the overall cost. The total cost reduction of LMC is about 17% and 24% compared with OLB and OD, respectively.

27 Conclusion We propose energy-efficient scheduling algorithms for multi-core systems with DVFS features. For batch mode and online mode. The experimental results show significant cost reductions. We will integrate our work into our existing judging system. To summarize, we propose energy-efficient scheduling algorithms for both batch and online mode on multi-core systems with DVFS features. The experimental results show significant cost reductions. In our future work, we will try to integrate this work with Judgegirl, a judge system currently used by the CSIE department in National Taiwan University.

28 Questions? This is my presentation. Thanks for listening.
