An Energy-efficient Scheduler for Throughput Guaranteed Jobs on Asymmetric Multi-core Platforms Ching-Chi Lin Institute of Information Science, Academia Sinica Department of Computer Science and Information Engineering, National Taiwan University Hsiang-Hsin Li, Pangfeng Liu Graduate Institute of Networking and Multimedia, National Taiwan University Jan-Jan Wu Research Center for Information Technology Innovation, Academia Sinica Good afternoon, everyone. I am Ching-Chi Lin from Taiwan, a PhD student at National Taiwan University. Today I am going to present the paper entitled "An Energy-efficient Scheduler for Throughput Guaranteed Jobs on Asymmetric Multi-core Platforms". This is a collaborative work between National Taiwan University and Academia Sinica. In this work, we design a scheduler that determines the frequency of each core and the job-to-core mapping for each time period during job execution.

Agenda Introduction Problem Definition Energy-Credit Scheduler Evaluation Conclusion This is the agenda for my presentation. First, I'll start with a brief introduction to the motivation and goal of this work. Then I'll define the job-to-core assignment problem we are trying to solve in order to generate a schedule, along with our proposed heuristic. Next I'll describe our energy-credit scheduler, followed by the evaluation. A short conclusion ends my presentation.

Motivation The design of new schedulers for asymmetric multi-core platforms has become an important issue. An asymmetric multi-core platform consists of cores with the same ISA but different characteristics. The scheduler needs to be aware of the asymmetry among cores in order to exploit the advantages of asymmetric multi-core. I'll start with the motivation. A recent trend in computing platforms is the move from homogeneous multi-core architectures toward heterogeneous and asymmetric multi-core. Unlike a homogeneous multi-core architecture, an asymmetric multi-core platform consists of cores with the same ISA but different characteristics. For example, the ARM big.LITTLE architecture combines high-performance big cores with energy-efficient little cores. Since the characteristics of the cores differ, the scheduler needs to be aware of the asymmetry among cores in order to exploit the advantages of such platforms. The design of new schedulers for asymmetric multi-core platforms has therefore become an important issue. There are existing designs, such as the In-Kernel Switcher (IKS) and the Global Task Scheduler (GTS) proposed by Linaro. However, most existing schedulers focus on distinguishing workloads suitable for performance cores from those suited to power-efficient cores; the execution of the workloads is not considered by these schedulers.

Goal Design an energy-efficient scheduler for asymmetric multi-core platforms. Focus on throughput guaranteed jobs. A single-threaded, preemptible task. Must complete a certain amount of workload during every time period in order to meet its expected throughput. Schedule a set of throughput guaranteed jobs so that both the throughput of each job and energy efficiency are guaranteed in every time period. In this paper, we design an energy-efficient scheduler, called the energy-credit based scheduler, for asymmetric multi-core platforms. Specifically, we focus on throughput guaranteed jobs, such as stream computing applications. A throughput guaranteed job must complete a certain amount of workload during every time period in order to meet its expected throughput; the expected throughput of a job may vary from one time period to the next. We define a throughput guaranteed job as a single-threaded, preemptible task. A job being preemptible means that we can stop the job at any time and resume it at any later time, even on another core. Our objective is to schedule a set of such jobs so that both the throughput of each job and energy efficiency are guaranteed in every time period. In order to do that, we have to solve the job-to-core assignment problem.

Contribution Prove the job-to-core assignment problem is NP-Complete. Propose a heuristic algorithm that generates a schedule with bounded power consumption. Develop an energy-credit based scheduler which schedules jobs based on the heuristic. In this paper, we make the following contributions. First, we prove that the job-to-core assignment problem is NP-Complete. Since it is NP-Complete, we propose a heuristic algorithm that generates a schedule with bounded power consumption. Based on the heuristic, we develop an energy-credit based scheduler for scheduling throughput guaranteed jobs on asymmetric multi-core platforms.

Job-to-core Assignment Problem For every time period, given a set of throughput guaranteed jobs, how do we generate a schedule that provides every job the required number of CPU cycles while minimizing the total power consumption? A schedule consists of the frequency of each core and the allocated time percentage for each job to run on each core. We define the job-to-core assignment problem as follows. For every time period, we are given a set of throughput guaranteed jobs and the resource requirements to meet their expected throughput. We want to generate a schedule that provides every job the required number of CPU cycles while minimizing the total power consumption. A schedule has two parts: the frequency of each core, and the allocated time percentage for each job to run on each core in a time period. The frequency indicates the number of CPU cycles a core can provide in a time period. With the allocated time percentages, we can estimate the number of CPU cycles received by each job.
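As a concrete illustration of what a schedule contains, the following minimal Python sketch (hypothetical names, not code from the paper) stores the per-core frequencies and the per-job, per-core time percentages, and computes the number of cycles a job receives in one period. Frequencies are expressed here as cycles per time period so that the arithmetic needs no separate period length.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Schedule:
    # frequency chosen for each core in this time period, in cycles per period
    core_freq: List[int]
    # alloc[j][p] = fraction of the period that job j is allotted on core p
    alloc: Dict[int, Dict[int, float]] = field(default_factory=dict)

    def cycles_for(self, job: int) -> float:
        """CPU cycles this schedule provides to `job` in one time period."""
        return sum(frac * self.core_freq[p]
                   for p, frac in self.alloc.get(job, {}).items())

A schedule is feasible when cycles_for(j) is at least job j's required workload for every job, and the fractions allotted on each core sum to at most 1.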

Generate A Feasible Solution Classify the jobs into two groups, one for each core cluster. Initially all jobs are in the little-core group. Move jobs to the big-core group until the resources required by the remaining jobs can be satisfied by the little cores. Apply our heuristic algorithm to generate a feasible schedule for each core cluster. To generate a feasible solution, we first classify the jobs into two groups, one for each core cluster. In this paper, we consider two core clusters: performance big cores and energy-efficient little cores. Initially all jobs are in the group for the little cores. The scheduler then repeatedly moves a job to the big-core group until the resources required by the remaining jobs can be satisfied by the little cores. After the classification, the scheduler applies our heuristic to generate a feasible schedule for each core cluster.
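A rough sketch of this classification step, under assumptions the slide does not spell out (jobs are moved heaviest-first, and little_capacity is the total number of cycles the little-core cluster can supply per period at its highest frequencies):

def split_jobs(workloads, little_capacity):
    """workloads: required cycles per job, sorted in descending order."""
    little = list(workloads)
    big = []
    # Move jobs to the big-core group until the little cores can satisfy
    # the resources required by the remaining jobs.
    while little and sum(little) > little_capacity:
        big.append(little.pop(0))   # heaviest remaining job moves first (assumption)
    return big, little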

Heuristic Sort jobs in descending order of workload. Two phases: Frequency selection Choose the frequency f for core c that satisfies three constraints: f must be large enough so that the core can accommodate the heaviest remaining job; f must be large enough so that all the remaining cores, if run at f, can accommodate all the remaining jobs; f must be large enough so that the CPU load of job j on core c is no more than 1-x, where x is the CPU load of job j on core c-1. Job distribution Distribute workloads to core c. The heuristic first sorts the jobs according to their workload requirements in descending order. It then alternates between two phases, the frequency selection phase and the job distribution phase; the idea is to decide the frequency of the cores one at a time. In the frequency selection phase, the heuristic chooses the minimum frequency from the core's available frequency set that satisfies the three constraints above. In the job distribution phase, the heuristic assigns jobs to that core until the core is full. The two phases repeat until all the jobs are assigned to cores. Since these constraints are quite verbose, I'll use an example to explain how the heuristic works.
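The following Python sketch is one possible reading of the two phases; it is not the authors' implementation, and the names and the exact tie-breaking are assumptions. Frequencies are expressed in cycles per time period, so a core at frequency f can absorb at most f cycles of workload per period.

def schedule_cluster(workloads, freq_set, n_cores):
    """workloads: required cycles per job, sorted in descending order.
    freq_set: available per-core frequencies (cycles per period).
    Returns (freqs, alloc), where alloc[c] is a list of (job, fraction) pairs."""
    rem = list(workloads)      # remaining cycles of each job
    freqs, alloc = [], []
    job = 0                    # index of the first job that still has work left
    split_frac = 0.0           # fraction of the period the split job used on the previous core

    for c in range(n_cores):
        if job >= len(rem):
            break
        todo = rem[job:]
        # Frequency selection: smallest available frequency meeting the three constraints.
        need = max(
            max(todo),                      # (1) heaviest remaining job fits on this core
            sum(todo) / (n_cores - c),      # (2) remaining cores at f can hold all remaining work
            rem[job] / (1.0 - split_frac) if split_frac > 0 else 0.0,  # (3) split job's total share <= 1
        )
        f = min(x for x in freq_set if x >= need)   # assumes the top frequency is always enough
        freqs.append(f)

        # Job distribution: fill core c until its period is used up.
        capacity, assigned = float(f), []
        while job < len(rem) and capacity > 0:
            take = min(rem[job], capacity)
            assigned.append((job, take / f))
            rem[job] -= take
            capacity -= take
            if rem[job] == 0:
                job += 1
                split_frac = 0.0
            else:
                split_frac = take / f       # this job continues on the next core
        alloc.append(assigned)
    return freqs, alloc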

Example Available frequency selections = {400, 600, 800, 1000, 1200}. Job workloads: w1 = 1000, w2 = 950, w3 = 750, w4 = 750, w5 = 600. Core frequencies f(1) to f(4) are to be determined. Given four cores with the same available frequency selections, and five jobs sorted according to their workload requirements. The workload indicates the number of CPU cycles required by each throughput guaranteed job. We are trying to decide the frequencies f(1) to f(4) and the job-to-core mapping in a time interval.

Example: Determine f(1) First, we determine the frequency of core 1. The current heaviest workload, w1, is 1000. We must choose as f(1) the minimum frequency from the available frequency selections that satisfies the three constraints. Therefore, f(1) = 1200.

Example: Distribute Workload After f(1) has been decided, we start the job distribution phase. We assign the entire job 1 (w1 = 1000) to core 1. Since there are still 200 CPU cycles left at f(1) = 1200, we assign part of job 2 to core 1.

Example: Determine f(2) Now that core 1 is full, we switch back to the frequency selection phase to determine the frequency f(2). Notice that since we have already assigned part of job 2 (200 cycles) to core 1, by the third constraint the load of job 2's remaining workload (w'2 = 750) on core 2 cannot exceed 1 - x, where x is the fraction of the period job 2 already occupies on core 1.

Example: Determine f(2) Therefore, we choose f(2) = 1000.

Example: Distribute Workload We then switch to the job distribution phase again. The remaining workload of job 2 (w'2 = 750) is assigned to core 2. I believe you all know what happens next: we assign part of job 3's workload to core 2, and switch to the frequency selection phase for core 3.

Results This is the result generated by the heuristic: f(1) = 1200, while the other core frequencies are 1000. Core 1 runs job 1 (1000 cycles) and part of job 2 (200 cycles); core 2 runs the rest of job 2 (750) and part of job 3 (250); core 3 runs the rest of job 3 (500) and part of job 4 (500); core 4 runs the rest of job 4 (250) and job 5 (600). We use a(j,p) to denote the job-to-core mapping: a(j,p) is the allocated time percentage for job j to run on core p in one time interval.
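For reference, feeding the example's numbers into the sketch given earlier (hypothetical code, not the paper's) reproduces the frequencies and splits shown on this slide:

freqs, alloc = schedule_cluster(
    workloads=[1000, 950, 750, 750, 600],
    freq_set=[400, 600, 800, 1000, 1200],
    n_cores=4)
# freqs    -> [1200, 1000, 1000, 1000]
# alloc[0] -> [(0, 0.833...), (1, 0.166...)]: all of job 1 plus 200 cycles of job 2 on core 1
# alloc[3] -> [(3, 0.25), (4, 0.6)]: the rest of job 4 (250 cycles) plus job 5 (600 cycles) on core 4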

Energy Guarantee Given cores with the frequencies generated by the heuristic, our job-to-core assignment consumes at most m( f(1) )( f(n) / f(1) ) more power than any other job assignment. m( f ) is the dynamic power consumption of a fully loaded core at frequency f. The result generated by the heuristic comes with the following energy guarantee: given cores with the frequencies generated by the heuristic, our job-to-core assignment consumes at most m( f(1) )( f(n) / f(1) ) more power than any other job assignment, where m( f ) is the dynamic power consumption of a fully loaded core running at frequency f.
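Stated as an inequality (our reading of the slide's wording, not notation taken verbatim from the paper): with P_H the power of the heuristic's assignment, P_A that of any other assignment meeting the same throughput requirements, and f(1), f(n) the frequencies the heuristic selects for the first and the last core,

\[
P_H \le P_A + m\bigl(f_{(1)}\bigr) \cdot \frac{f_{(n)}}{f_{(1)}}
\]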

Energy-credit Based Scheduler Schedule jobs according to their energy credits. Assign energy credits to jobs according to the time percentages a(j,p) generated by the heuristic in every time period. A core only executes jobs with credits corresponding to that core: core x can only run jobs with a(j,x) ≠ 0. Credits are consumed during execution. Based on these results, we develop our energy-credit based scheduler. This scheduler schedules jobs according to their energy credits. Energy credits are assigned to jobs according to the a(j,p) generated by the heuristic in every time period. A core only executes jobs with credits corresponding to that core; that is, core x only runs jobs with a(j,x) not equal to 0. The credits are consumed during job execution.
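A minimal sketch of this bookkeeping (assumed names and structure; the slides do not show the scheduler's implementation): at the start of each time period every job receives, per core, a credit equal to a(j,p) times the period length; a core only picks jobs that still hold credit for it, and running a job drains the corresponding credit.

class EnergyCreditScheduler:
    def __init__(self, period_s):
        self.period_s = period_s
        self.credits = {}          # credits[(job, core)] = remaining run time on that core

    def refill(self, alloc):
        """alloc[(job, core)] = a(j,p) from the heuristic for the new time period."""
        self.credits = {k: frac * self.period_s for k, frac in alloc.items()}

    def pick(self, core, runnable):
        """Return a runnable job that still has credit on `core`, or None."""
        for job in runnable:
            if self.credits.get((job, core), 0.0) > 0.0:
                return job
        return None

    def charge(self, job, core, ran_for_s):
        """Consume credit for `ran_for_s` seconds of execution on `core`."""
        key = (job, core)
        if key in self.credits:
            self.credits[key] = max(0.0, self.credits[key] - ran_for_s)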

Evaluation Environment An asymmetric multi-core platform consisting of performance big cores and energy-efficient little cores. Supports per-core DVFS. Simulation parameters are from the Samsung Exynos 7420 (ARM Cortex-A57 + ARM Cortex-A53). For the evaluation, we conduct simulations of our proposed scheduler. Our target platform is an asymmetric multi-core platform consisting of performance big cores and energy-efficient little cores, with per-core DVFS support. We build a simulator with parameters from the Samsung Exynos 7420. The table shows the available frequency set of each type of core and its power consumption; the power consumption here is that of a fully loaded core at the corresponding frequency. Our simulator estimates the power consumption of each core in each time period by first looking up this power consumption and then multiplying it by the load of that core.
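As described, the simulator's power model reduces to a table lookup scaled by load. A toy sketch follows; the wattage numbers are placeholders for illustration, not the Exynos 7420 values from the slide's table.

# Fully-loaded power in watts, keyed by (core type, frequency in MHz).
# Placeholder values only.
FULL_LOAD_POWER = {
    ("big", 1200): 1.00,
    ("big", 1000): 0.70,
    ("little", 1000): 0.30,
    ("little", 800): 0.20,
}

def core_energy_joules(core_type, freq_mhz, load, period_s):
    """Energy of one core in one time period: table power times load times period length."""
    return FULL_LOAD_POWER[(core_type, freq_mhz)] * load * period_s

# Example: a big core at 1000 MHz, 50% loaded, over a 1-second period.
print(core_energy_joules("big", 1000, 0.5, 1.0))   # 0.35 J with these placeholder numbers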

Evaluation Compare our energy-credit based scheduler with the Global Task Scheduler. GTS tracks the load average of each job and migrates jobs to cores accordingly. Enable the CPUfreq driver to perform dynamic frequency adjustments, using the "performance" and "conservative" governors. We compare our energy-credit based scheduler with an existing asymmetry-aware scheduler, the Global Task Scheduler from Linaro. The Global Task Scheduler determines which cores a job should run on according to its runtime behavior: it tracks the load average of each job and migrates jobs to cores accordingly. The execution of jobs relies on the underlying scheduler, the Completely Fair Scheduler (CFS). We enable the CPUfreq driver to perform dynamic frequency adjustments for the Global Task Scheduler. Two governors, "performance" and "conservative", are used in the simulations.

Results – Energy Consumption We compare the energy consumption of our proposed scheduler with that of the Global Task Scheduler under the two governors while executing a series of workloads. The workload consists of ten throughput guaranteed jobs, each 600 seconds long. The table shows the energy consumption of the three schedulers. We can see that our energy-credit based scheduler consumes about 60% of the energy of GTS.

Results – Breakdown of Frequency To figure out why our scheduler consumes less energy processing the same set of throughput guaranteed jobs, we collect the frequency of each core in each time period during the simulation. We compare our proposed scheduler with the Global Task Scheduler under the conservative governor. Figure 2 shows the breakdown of frequencies. We can observe that our scheduler assigns lower frequencies to the big cores most of the time, while GTS with the conservative governor uses higher frequencies. The reason is that our scheduler is aware of the asymmetry among cores and chooses medium frequencies for the big cores. GTS with the conservative governor, on the other hand, ends up using higher frequencies on the big cores and thus consumes more energy.

Conclusion We design an energy-efficient scheduler for throughput guaranteed jobs on asymmetric multi-core platforms. The simulation results indicate that our proposed scheduler consumes less than 60% of the energy of the Global Task Scheduler. To summarize, we design an energy-efficient scheduler for throughput guaranteed jobs on asymmetric multi-core platforms. The scheduler determines the frequency of each core and the job-to-core assignment according to our proposed heuristic. The simulation results indicate that our proposed scheduler consumes less than 60% of the energy of the Global Task Scheduler under different frequency governors.

Thank you! Thank you for your attention. I would be happy to take questions from you. If you are interested in the details about the proofs and designs, you are also welcome to talk with me after this session.