Ching-Chi Lin Institute of Information Science, Academia Sinica

An Energy-efficient Scheduler for Throughput Guaranteed Jobs on Asymmetric Multi-core Platforms
Ching-Chi Lin Institute of Information Science, Academia Sinica Department of Computer Science and Information Engineering, National Taiwan University Hsiang-Hsin Li, Pangfeng Liu Graduate Institute of Networking and Multimedia, National Taiwan University Jan-Jan Wu Research Center for Information Technology Innovation, Academia Sinica Good afternoon, everyone. I am Ching-Chi Lin from Taiwan. I am a PhD student at National Taiwan University. Today I am going to present the paper entitled “An Energy-efficient Scheduler for Throughput Guaranteed Jobs on Asymmetric Multi-core Platforms”. This is a collaboration between National Taiwan University and Academia Sinica. In this work, we design a scheduler that determines the frequency of each core and the job-to-core mapping for each time period during the job execution.

Agenda Introduction Problem Definition Energy-Credit Scheduler
Evaluation Conclusion This is the agenda for my presentation. First, I’ll start with a brief introduction about the motivation and goal of this work. Then I’ll define the job-to-core problem we are trying to solve in order to generate a schedule, along with our proposed heuristic. Next I’ll describe our energy-credit scheduler, followed by the evaluations. A short conclusion will be made at the end of my presentation.

Motivation The design of new schedulers for asymmetric multi-core platforms has become an important issue. An asymmetric multi-core platform consists of cores with the same ISA but different characteristics. The scheduler needs to be aware of the asymmetry among cores in order to exploit the advantages of an asymmetric multi-core platform. I’ll start with the motivation. Computing platforms have recently been moving from homogeneous multi-core architectures toward heterogeneous and asymmetric multi-core architectures. Unlike a homogeneous multi-core architecture, an asymmetric multi-core platform consists of cores with the same ISA but different characteristics. For example, the ARM big.LITTLE architecture consists of high-performance big cores and energy-efficient little cores. Since the characteristics of the cores differ, the scheduler needs to be aware of the asymmetry among cores in order to exploit the advantages of such platforms. The design of new schedulers for asymmetric multi-core platforms has become an important issue. There has been prior research and design work, such as the In-Kernel Switcher (IKS) and the Global Task Scheduler (GTS) proposed by Linaro. However, most of the existing schedulers focus on how to distinguish workloads suitable for performance cores from those suitable for power-efficient cores. The execution of the workloads is not considered by these schedulers.

Goal Design an energy-efficient scheduler for asymmetric multi-core platforms. Focus on throughput guaranteed jobs. A single-threaded, preemptible task. Must complete a certain amount of workload during every time period in order to meet its expected throughput. Schedule a set of throughput guaranteed jobs so that both the throughput of each job and energy efficiency are guaranteed in every time period. In this paper, we design an energy-efficient scheduler, called the energy-credit based scheduler, for asymmetric multi-core platforms. Specifically, we focus on throughput guaranteed jobs, such as stream computing applications. A throughput guaranteed job must complete a certain amount of workload during every time period in order to meet its expected throughput. The expected throughput of a job may vary from one time period to the next. In this paper, we define a throughput guaranteed job as a single-threaded, preemptible task. Preemptible means that we can stop the job at any time and resume it at any later time, even on another core. Our objective is to schedule a set of such jobs so that both the throughput of each job and energy efficiency are guaranteed in every time period. In order to do that, we have to solve the job-to-core assignment problem.

Contribution Prove the job-to-core assignment problem is NP-Complete.
Propose a heuristic algorithm that generates a schedule with bounded power consumption. Develop an energy-credit based scheduler which schedules jobs based on the heuristic. In this paper, we make the following contributions. First, we prove that the job-to-core assignment problem is NP-Complete. Since it is NP-Complete, we propose a heuristic algorithm that generates a schedule with bounded power consumption. Based on the heuristic, we develop an energy-credit based scheduler for scheduling throughput guaranteed jobs on asymmetric multi-core platforms.

Job-to-core Assignment Problem
For every time period, given a set of throughput guaranteed jobs, how to generate a schedule that provides every job the required number of CPU cycles while minimizing the total power consumption? A schedule consists of the frequency of each core and the allocated time percentage for each job to run on each core. We define the job-to-core assignment problem as follows. For every time period, we are given a set of throughput guaranteed jobs and the resource requirements to meet their expected throughput. We want to generate a schedule that provides every job the required number of CPU cycles while minimizing the total power consumption. A schedule has two parts: the frequency of each core, and the allocated time percentage for each job to run on each core in a time period. The frequency indicates the number of CPU cycles a core can provide in a time period. With the allocated time percentage, we can estimate the number of CPU cycles received by each job.
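As a rough illustration of this definition, a schedule can be held as a list of core frequencies plus a table of time percentages. The sketch below is only one possible representation, not the paper's implementation: the names `cycles_received` and `is_feasible` and the sample numbers are hypothetical, and a core's frequency is read as the number of cycles it supplies per time period, as described above.

```python
def cycles_received(freqs, share, job):
    """Cycles a job gets in one period: sum over cores of share * frequency."""
    return sum(frac * freqs[core]
               for (j, core), frac in share.items() if j == job)


def is_feasible(freqs, share, required):
    """Check a schedule against the problem statement.

    share maps (job, core) to the fraction of the period the job runs there.
    Assumption of this sketch: a single-threaded job cannot use more than
    100% of a period in total, and neither can a core.
    """
    for core in range(len(freqs)):
        if sum(frac for (_, c), frac in share.items() if c == core) > 1:
            return False
    for job, need in required.items():
        if sum(frac for (j, _), frac in share.items() if j == job) > 1:
            return False
        if cycles_received(freqs, share, job) < need:
            return False
    return True


# Hypothetical two-core, two-job schedule.
freqs = [1000, 800]
share = {(0, 0): 0.5, (1, 0): 0.5, (1, 1): 0.25}
required = {0: 500, 1: 700}
print(is_feasible(freqs, share, required))
```

The check only covers feasibility. Minimizing total power over all feasible schedules is the hard part of the problem and is not attempted here.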

Generate A Feasible Solution
Classify the jobs into two groups. One for each core cluster. Initially all jobs are in the little core group. Move jobs to the big core group until the resource required by the remaining jobs can be satisfied by the little cores. Apply our heuristic algorithm to generate a feasible schedule for each core cluster. To generate a feasible solution, first we classify the jobs into two groups, one for each core cluster. In this paper, we consider two core clusters, performance big cores and energy-efficient little cores. Initially all jobs are in the group for little cores. Then the scheduler repeatedly moves a job to the big core group, until the resource required by the remaining jobs can be satisfied by the little cores. After the classification, the scheduler applies our heuristic to generate a feasible schedule for each core cluster.
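The classification step can be sketched as follows. The talk does not say which job is moved first or how "can be satisfied by the little cores" is tested, so two assumptions are made here and both are hypothetical: the heaviest job is moved first, and the little cluster is considered sufficient when its heaviest job fits on one little core at maximum frequency and the total demand fits on all little cores at maximum frequency.

```python
def classify_jobs(workloads, little_cores, little_max_freq):
    """Split jobs into (big_group, little_group).

    Assumptions of this sketch, not taken from the paper:
    - the heaviest remaining job is moved to the big group first;
    - the little cluster suffices when the heaviest job fits on one core
      and the total demand fits on all cores, both at maximum frequency.
    """
    little = sorted(workloads, reverse=True)  # all jobs start in the little group
    big = []
    capacity = little_cores * little_max_freq
    while little and (little[0] > little_max_freq or sum(little) > capacity):
        big.append(little.pop(0))
    return big, little


# Hypothetical demands, with two little cores of at most 1000 cycles per period.
print(classify_jobs([1500, 1000, 800, 600], 2, 1000))
```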

Heuristic Sort jobs in descending order.
Two phases: Frequency selection Choose frequency f for core c that satisfies the three constraints. Must be large enough so that the core can accommodate the heaviest remaining job. Must be large enough so that all the remaining cores, if run at f, can accommodate all the remaining jobs. Must be large enough so that the CPU load of job j on core c is no more than 1-x, where x is the CPU load of job j on core c-1. Job distribution Distribute workloads to core c. The heuristic first sorts the jobs according to their workload requirements in descending order. Then it switches between two phases, the frequency selection phase and the job distribution phase. The idea is to decide the frequency of cores one at a time. In the frequency selection phase, the heuristic chooses the minimum frequency from its available frequency set that satisfies the following three constraints. Then the heuristic assigns jobs to that core until the core is full. The two phases repeat until all the jobs are assigned to cores. Since these constraints are quite verbose, I’ll use an example to explain how the heuristic works.
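A minimal sketch of the two phases for a single core cluster is given below. It is an interpretation of the three constraints as stated on the slide, not the authors' code: the function name `schedule_cluster` is hypothetical, workloads and frequencies are assumed to be integers, and a frequency is read as the number of cycles a core supplies per period. The variable `x` is the CPU load that a split job already occupies on the previous core.

```python
from fractions import Fraction


def schedule_cluster(workloads, num_cores, freq_set):
    """Return (core_freqs, alloc); alloc[c] lists (job_index, cycles) on core c."""
    # Jobs in descending workload order; each entry is [job_index, cycles_left].
    remaining = [[j, w] for j, w in
                 sorted(enumerate(workloads), key=lambda jw: -jw[1])]
    core_freqs, alloc = [], []
    x = Fraction(0)  # load of the split job on the previous core
    for c in range(num_cores):
        if not remaining:
            break
        total = sum(w for _, w in remaining)
        heaviest = max(w for _, w in remaining)
        head = remaining[0][1]  # a split job, if any, is always first
        cores_left = num_cores - c
        # Frequency selection: lowest frequency meeting all three constraints.
        f = next((cand for cand in sorted(freq_set)
                  if cand >= heaviest                  # fits the heaviest job
                  and cand * cores_left >= total       # remaining cores suffice
                  and Fraction(head, cand) <= 1 - x),  # split job load <= 1 - x
                 None)
        if f is None:
            raise ValueError("no frequency satisfies the constraints")
        # Job distribution: fill the core until its cycles run out.
        capacity, placed, x = f, [], Fraction(0)
        while remaining and capacity > 0:
            job, w = remaining[0]
            part = min(w, capacity)
            placed.append((job, part))
            capacity -= part
            if part == w:
                remaining.pop(0)
            else:
                remaining[0][1] = w - part  # the rest goes to the next core
                x = Fraction(part, f)
        core_freqs.append(f)
        alloc.append(placed)
    if remaining:
        raise ValueError("jobs left over after the last core")
    return core_freqs, alloc
```

Calling `schedule_cluster([1000, 950, 750, 750, 600], 4, [400, 600, 800, 1000, 1200])` uses the numbers of the example in this talk and yields the frequencies 1200, 1000, 1000, 1000. Exact fractions are used for the third constraint so that boundary cases are not affected by floating-point rounding.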

Example Available Frequency selections = {400, 600, 800, 1000, 1200}
Workloads: w1 = 1000, w2 = 950, w3 = 750, w4 = 750, w5 = 600. Core frequencies f(1), f(2), f(3), f(4) are still undetermined. Given four cores with the same available frequency selections, and five jobs sorted according to their workload requirements. The workload indicates the number of CPU cycles required by each throughput guaranteed job. We are trying to decide the frequencies, f(1) to f(4), and the job-to-core mapping in a time interval.

Example : Determine f(1)
Workloads: w1 = 1000, w2 = 950, w3 = 750, w4 = 750, w5 = 600. f(1) = 1200; f(2), f(3), f(4) are still undetermined. First, we determine the frequency of core 1. The current heaviest workload, w1, is 1000. We must choose the minimum frequency from the available frequency selections as f(1) that satisfies the three constraints. A frequency of 1000 would accommodate w1, but four cores running at 1000 provide only 4000 cycles, less than the total workload of 4050. Therefore, f(1) equals 1200.

Example : Distribute Workload
Core 1 (f(1) = 1200): w1 = 1000. Unassigned: w2 = 950, w3 = 750, w4 = 750, w5 = 600. f(2), f(3), f(4) are still undetermined. After f(1) has been decided, we start the job distribution phase. We assign the entire job 1 to core 1. Since there are still 200 CPU cycles left, we assign part of job 2 to core 1.

Core 1 (f(1) = 1200): w1 = 1000, w2 = 200. Unassigned: w’2 = 750, w3 = 750, w4 = 750, w5 = 600. Now that core 1 is “full”, we switch back to the frequency selection phase to determine the frequency f(2). Notice that since we already assigned part of job 2 to core 1, by the third constraint, the CPU load caused by the remaining workload of job 2 on core 2 cannot be larger than 1-x, where x is the CPU load of job 2 on core 1.

Core 1 (f(1) = 1200): w1 = 1000, w2 = 200. f(2) = 1000. Unassigned: w’2 = 750, w3 = 750, w4 = 750, w5 = 600. The remaining workload is 2850 cycles for three cores, so each core must provide at least 950 cycles, which rules out 800. Therefore, we choose f(2) equal to 1000.

Example : Distribute workload
Core 1 (f(1) = 1200): w1 = 1000, w2 = 200. Core 2 (f(2) = 1000): w’2 = 750. Unassigned: w3 = 750, w4 = 750, w5 = 600. Again, we then switch to the job distribution phase. The remaining workload of job 2 is then assigned to core 2. I believe you all know what will happen next. Yes, we assign part of the workload from job 3 to core 2, and switch to the frequency selection phase for core 3.

Results Available Frequency selections = {400, 600, 800, 1000, 1200}
Core 1 (f(1) = 1200): w1 = 1000, w2 = 200. Core 2 (f(2) = 1000): w’2 = 750, w3 = 250. Core 3 (f(3) = 1000): w’3 = 500, w4 = 500. Core 4 (f(4) = 1000): w’4 = 250, w5 = 600. This is the result generated by the heuristic. f(1) equals 1200, while the other core frequencies are 1000. We use aj,p to denote the job-to-core mapping. aj,p is the allocated time percentage for job j to run on core p in one time interval.
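The time percentages aj,p for this result can be worked out by dividing the cycles a job receives on a core by that core's frequency. This assumes, as in the problem definition, that a core's frequency equals the number of cycles it provides per time interval; the helper name `time_shares` is hypothetical.

```python
from fractions import Fraction

# Result of the example: frequency per core and cycles per (job, core) pair.
freqs = {1: 1200, 2: 1000, 3: 1000, 4: 1000}
cycles = {(1, 1): 1000, (2, 1): 200,
          (2, 2): 750, (3, 2): 250,
          (3, 3): 500, (4, 3): 500,
          (4, 4): 250, (5, 4): 600}


def time_shares(cycles, freqs):
    """a[j, p] = cycles of job j on core p divided by the frequency of core p."""
    return {(j, p): Fraction(c, freqs[p]) for (j, p), c in cycles.items()}


a = time_shares(cycles, freqs)
for (j, p), frac in sorted(a.items()):
    print(f"a[{j},{p}] = {frac} ({float(frac):.3f})")
```

Job 2, for instance, runs for 1/6 of the interval on core 1 and 3/4 on core 2, which sums to 11/12 and stays below one full interval, as the third constraint requires.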

Energy Guarantee Given cores with the frequencies generated by the heuristic, our job-to-core assignment consumes at most m( f(1) )( f(n) / f(1) ) more power than any other job assignment. m( f ) is the dynamic power consumption of a fully-loaded core with frequency f. The result generated by the heuristic has the following energy guarantee. Given cores with the frequencies generated by the heuristic, our job-to-core assignment consumes at most m( f(1) )( f(n) / f(1) ) more power than any other job assignment. m( f ) is the dynamic power consumption of a fully-loaded core with frequency f.

Energy-credit Based Scheduler
Schedule jobs according to their energy credits. Assign energy credits to jobs according to the time percentage aj,p generated by the heuristic in every time period. A core only executes jobs with credits corresponding to this core. Core x can only run jobs with aj,x ≠ 0. Consume credits during the execution. Based on these results, we develop our energy-credit based scheduler. This scheduler schedules jobs according to their energy credits. Energy credits are assigned to jobs according to aj,p generated by the heuristic in every time period. A core only executes jobs with credits corresponding to this core. That is, core x only runs jobs with aj,x not equal to 0. The credits are consumed during the job execution.
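The credit mechanism can be simulated in a few lines. The sketch below is a simplified model under stated assumptions, not the scheduler from the paper: time is divided into discrete ticks, credits are the share aj,p converted to whole ticks, each core runs its jobs in the order the heuristic placed them, and a job already running on another core in the same tick is skipped. The names `to_credits` and `run_period` are hypothetical.

```python
def to_credits(alloc, freqs, period_ticks):
    """Convert cycles per (job, core) into credits measured in ticks.

    Integer division is used, so shares that do not map to whole ticks
    are rounded down in this sketch.
    """
    return [[(job, cyc * period_ticks // f) for job, cyc in core]
            for core, f in zip(alloc, freqs)]


def run_period(credits, period_ticks):
    """Simulate one period; returns (timeline, leftover credits)."""
    left = [[[job, c] for job, c in core] for core in credits]
    timeline = []
    for _ in range(period_ticks):
        row = []  # job run by each core in this tick, None if idle
        for core in left:
            running = None
            for entry in core:
                # A core only runs jobs that still hold credits for it,
                # and a job never runs on two cores in the same tick.
                if entry[1] > 0 and entry[0] not in row:
                    entry[1] -= 1  # consume one credit
                    running = entry[0]
                    break
            row.append(running)
        timeline.append(row)
    return timeline, left


# Allocation from the example (job indices start at 0), 60 ticks per period.
alloc = [[(0, 1000), (1, 200)], [(1, 750), (2, 250)],
         [(2, 500), (3, 500)], [(3, 250), (4, 600)]]
credits = to_credits(alloc, [1200, 1000, 1000, 1000], 60)
timeline, left = run_period(credits, 60)
print(timeline[0], timeline[59])
```

Running the split job first on the later core and last on the earlier core keeps its two execution windows apart, which is what the third constraint of the heuristic makes possible.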

Evaluation Environment
An asymmetric multi-core platform consisting of performance big cores and energy-efficient little cores. Supports per-core DVFS. Simulation parameters are from Samsung Exynos 7420. ARM Cortex-A57 + ARM Cortex-A53 As for the evaluation, we conduct simulations to evaluate our proposed scheduler. Our target platform is an asymmetric multi-core platform consisting of performance big cores and energy-efficient little cores. We build a simulator with parameters from the Samsung Exynos 7420. The table shows the available frequency set of each type of core and the corresponding power consumption. The power consumption listed is that of a fully-loaded core at the corresponding frequency. Our simulator estimates the power consumption of each core in each time period by looking up the power consumption for the core's frequency and multiplying it by the load of that core.
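The estimation rule described here amounts to a table lookup followed by a multiplication. In the sketch below the function name `estimate_power` and every number in the power table are hypothetical placeholders; they are not the Exynos 7420 figures used in the simulation.

```python
def estimate_power(power_table, core_states):
    """Sum over cores of (fully-loaded power at the core's frequency) * load.

    power_table maps frequency -> power of a fully-loaded core.
    core_states is a list of (frequency, load) pairs, with load in [0, 1].
    """
    return sum(power_table[freq] * load for freq, load in core_states)


# Placeholder table: frequency -> power of a fully-loaded core (made-up values).
power_table = {1000: 1.5, 1200: 2.0}

# Four cores for one time period: one fully loaded at 1200, three at 1000.
core_states = [(1200, 1.0), (1000, 1.0), (1000, 1.0), (1000, 0.85)]
print(estimate_power(power_table, core_states))
```

Multiplying the per-period power by the period length and summing over all periods would then give an energy figure for a whole run.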

Evaluation Compare our energy-credit based scheduler with Global Task Scheduler. GTS tracks the load average of each job, and migrates jobs to the cores accordingly. Enable CPUfreq driver to perform dynamic frequency adjustments. “performance” and “conservative”. We compare our energy-credit based scheduler with an existing asymmetry-aware scheduler, the Global Task Scheduler from Linaro. The Global Task Scheduler determines which cores a job should run on according to its runtime behavior. The scheduler tracks the load average of each job, and migrates jobs to the cores accordingly. The execution of jobs relies on the underlying scheduler, the Completely Fair Scheduler (CFS). We enable the CPUfreq driver to perform dynamic frequency adjustments for the Global Task Scheduler. Two governors, “performance” and “conservative”, are used in the simulations.

Results – Energy Consumption
We compare the energy consumption of our proposed scheduler with that of the Global Task Scheduler under the two governors while executing a series of workloads. The workload consists of ten throughput guaranteed jobs, each 600 seconds long. The table shows the energy consumption of the three schedulers. We can see that our energy-credit based scheduler consumes about 60% of the energy of GTS.

Results – Break Down of Frequency
To figure out why our scheduler consumes less energy when processing the same set of throughput guaranteed jobs, we collect the frequency of each core in each time period during the simulation. We compare our proposed scheduler with the Global Task Scheduler with the conservative governor. Figure 2 shows the breakdown of frequencies. We can observe that our scheduler assigns lower frequencies to big cores most of the time, while GTS with the conservative governor uses higher frequencies. The reason is that our scheduler is aware of the asymmetry among cores, and chooses medium frequencies for big cores. GTS with the conservative governor, on the other hand, ends up using higher frequencies on big cores and thus consumes more energy.

Conclusion We design an energy-efficient scheduler for throughput guaranteed jobs on asymmetric multi-core platforms. The simulation results indicate that our proposed scheduler consumes less than 60% of the energy consumed by the Global Task Scheduler. To summarize, we design an energy-efficient scheduler for throughput guaranteed jobs on asymmetric multi-core platforms. The scheduler determines the frequency of each core and the job-to-core assignment according to the heuristic we proposed. The simulation results indicate that our proposed scheduler consumes less than 60% of the energy consumed by the Global Task Scheduler with different frequency governors.

Thank you! Thank you for your attention.
I would be happy to take questions from you. If you are interested in the details about the proofs and designs, you are also welcome to talk with me after this session.
