An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics Ching-Chi Lin, You-Cheng Syu, Chao-Jui Chang,


1 An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics Ching-Chi Lin, You-Cheng Syu, Chao-Jui Chang, Jan-Jan Wu, Pangfeng Liu, Po-Wen Cheng and Wei-Te Hsu

2 Introduction
Green computing is imperative:
– Growing number of computers
– Rising energy costs
– Rising carbon emissions

3 Motivation
Main technologies for improving energy efficiency
◦ Hardware level: low-power devices
◦ System level: power-management mechanisms at different levels
◦ Application level: consolidation with virtualization
Power-management mechanisms
◦ Circuit level: clock gating
◦ System level: DPM
◦ Processor level: DVFS/DFS/DVS, C-states
Shut down unused components or circuits

4 Task Execution Modes
Batch Mode
– Batches of jobs
Online Mode
– Different time constraints
– Interactive and non-interactive tasks
– e.g. an online judging system

5 Contributions
– A task scheduling strategy that solves three important issues simultaneously:
  ◦ assignment of tasks to CPU cores
  ◦ execution order of tasks
  ◦ CPU processing rate for the execution of each task
– A task model, a CPU processing rate model, an energy consumption model, and a cost function
– Workload Based Greedy (WBG), for executing tasks in the batch mode
– Least Marginal Cost (LMC), a heuristic algorithm for executing tasks in the online mode; LMC assigns interactive and non-interactive tasks to cores

6 MODELS
Task Model
– $j_k = (L_k, A_k, D_k)$
– $L_k$ is the number of CPU cycles required to complete $j_k$, $A_k$ is the arrival time of $j_k$, and $D_k$ is the deadline of $j_k$. If $j_k$ has a specific deadline, $D_k > A_k \ge 0$.
Processing Rate
– Let $P = \{p_1, p_2, p_3, \dots\}$ be a non-empty set of discrete processing rates that a core can use, as determined by the hardware, with $0 < p_1 < p_2 < p_3 < \dots < p_{|P|}$.
– $p_{j_k} \in P$ denotes the processing rate of task $j_k$.
Energy Consumption
– For a task $j_k$, let $e_k$ be its energy consumption, $t_k$ its execution time, and $p_{j_k}$ its processing rate.
– $E(p)$ and $T(p)$ are the energy and the time required to execute one cycle at processing rate $p$ on a CPU core.
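
The three models map onto a few lines of Python. This is a minimal sketch rather than the authors' code: the names `Task`, `E`, `T`, `task_energy` and `task_time` are hypothetical, and the numeric tables are made-up values that only respect $p_1 < p_2 < p_3$.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    cycles: int                        # L_k: CPU cycles needed
    arrival: float = 0.0               # A_k: arrival time
    deadline: Optional[float] = None   # D_k: None means "no deadline"

    def __post_init__(self):
        # The slide requires D_k > A_k >= 0 when a deadline exists.
        if self.deadline is not None and not (self.deadline > self.arrival >= 0):
            raise ValueError("a deadline requires D_k > A_k >= 0")


# Hypothetical per-cycle tables for three discrete rates (illustrative only).
E = {1.0: 1.0, 2.0: 4.0, 4.0: 16.0}   # E(p): energy per cycle
T = {p: 1.0 / p for p in E}           # T(p): time per cycle


def task_energy(task: Task, p: float) -> float:
    """e_k = L_k * E(p): energy to run the whole task at rate p."""
    return task.cycles * E[p]


def task_time(task: Task, p: float) -> float:
    """t_k = L_k * T(p): execution time of the whole task at rate p."""
    return task.cycles * T[p]
```

Running a 100-cycle task at the hypothetical rate 2.0 then costs `task_energy(...) = 400.0` energy units and `task_time(...) = 50.0` time units under these tables.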

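7 TASK SCHEDULING IN THE BATCH MODE
Tasks with Deadlines / Deadline-SingleCore
– Partition problem: let $A = \{a_1, \dots, a_n\}$ be a set of positive integers; decide whether $A$ can be split into two subsets with equal sums.
– Theorem: Deadline-SingleCore is NP-complete.
Proof (reduction from Partition):
– Create $n$ tasks $j_1, \dots, j_n$; task $j_i$ needs $L_i = a_i$ cycles.
– $S = a_1 + \dots + a_n$ is the total number of cycles needed to finish the $n$ tasks.
– Two processing rates: $T(p_l) = 2$, $T(p_h) = 1$, $E(p_l) = 1$, $E(p_h) = 4$, so halving the time per cycle quadruples the energy per cycle.
– The time constraint (deadline) is $1.5S$ and the energy constraint is $2.5S$.
– The time constraint forces at least $S/2$ cycles to run at $p_h$, and the energy constraint allows at most $S/2$, so a feasible schedule exists exactly when some subset of tasks sums to $S/2$.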
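
The reduction can be checked by brute force on small inputs. The sketch below uses the constants from the slide; the function name `schedule_exists` is hypothetical. If $x$ cycles run at $p_h$, the total time is $2S - x$ and the total energy is $S + 3x$, so both constraints hold only when $x = S/2$.

```python
from itertools import combinations


def schedule_exists(a):
    """Brute-force the reduced Deadline-SingleCore instance built from `a`.

    Each task i has L_i = a[i] cycles and runs either at p_h or at p_l.
    Returns True if some choice meets time <= 1.5*S and energy <= 2.5*S.
    """
    S = sum(a)
    n = len(a)
    for r in range(n + 1):
        for fast in combinations(range(n), r):
            x = sum(a[i] for i in fast)       # cycles executed at p_h
            time = 1 * x + 2 * (S - x)        # T(p_h) = 1, T(p_l) = 2
            energy = 4 * x + 1 * (S - x)      # E(p_h) = 4, E(p_l) = 1
            # Compare in integers: time <= 1.5*S and energy <= 2.5*S.
            if 2 * time <= 3 * S and 2 * energy <= 5 * S:
                return True
    return False
```

For example, `[3, 1, 1, 2, 2, 1]` can be partitioned (3 + 2 = 5 = S/2), so a schedule exists, while `[1, 1, 4]` cannot, so none does.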

8 Tasks without Deadlines on a Single Core Platform
The cost function must consider both the energy consumption and the execution time.
– Energy cost: $C_{k,e} = R_e L_k E(p_{j_k})$
– Temporal cost
$C_k$ is the cost of task $j_k$, and $C$ is the total cost of all tasks.

9 Tasks without Deadlines on a Single Core Platform
The amount of delay that a task causes for other tasks
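
The "delay caused to other tasks" view can be illustrated numerically. The transcript does not preserve the temporal cost formula, so the sketch below assumes it is $R_t$ times the task's turnaround time with all tasks arriving at time 0; the function names and the tables `E` and `T` are hypothetical. Under that assumption, charging each task for its own turnaround gives the same total as charging each task for the delay it imposes on itself and on every task queued behind it.

```python
def total_cost_by_turnaround(schedule, Re, Rt, E, T):
    """schedule: list of (cycles, rate) pairs in execution order.

    Assumed cost per task: Re * L_k * E(p) + Rt * (completion time of the task).
    """
    clock = 0.0
    cost = 0.0
    for cycles, p in schedule:
        clock += cycles * T[p]                 # completion time of this task
        cost += Re * cycles * E[p] + Rt * clock
    return cost


def total_cost_by_delay(schedule, Re, Rt, E, T):
    """Same total, regrouped: a task's run time delays itself and all later tasks."""
    n = len(schedule)
    cost = 0.0
    for i, (cycles, p) in enumerate(schedule):
        delayed = n - i                        # this task plus those behind it
        cost += cycles * (Re * E[p] + delayed * Rt * T[p])
    return cost
```

With `E = {1: 1.0, 2: 4.0}`, `T = {1: 1.0, 2: 0.5}`, `Re = 1`, `Rt = 2` and the schedule `[(4, 2), (10, 1)]`, both functions return 54.0.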

10 Dominating Position Set/Range
$D_p$ is the “dominating position set” of processing rate $p$
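
The slide gives only the name of $D_p$, so the following is a sketch under a stated assumption: a position $k$, counted from the end of the queue, belongs to $D_p$ when rate $p$ minimizes the per-cycle cost $R_e E(p) + k R_t T(p)$. The function name, the tie-breaking rule (lower rate wins) and the tables are all hypothetical.

```python
def dominating_sets(n, rates, Re, Rt, E, T):
    """Map each rate p to the backward positions 1..n where p is cheapest.

    Assumed per-cycle cost of position k at rate p: Re*E[p] + k*Rt*T[p].
    Ties go to the lower rate (an arbitrary choice for this sketch).
    """
    sets = {p: [] for p in rates}
    for k in range(1, n + 1):
        best = min(rates, key=lambda p: (Re * E[p] + k * Rt * T[p], p))
        sets[best].append(k)
    return sets


# Illustrative tables only: three rates with E growing and T shrinking.
E = {1: 1.0, 2: 4.0, 4: 16.0}
T = {1: 1.0, 2: 0.5, 4: 0.25}
ranges = dominating_sets(30, [1, 2, 4], Re=1, Rt=2, E=E, T=T)
```

With these numbers each set is a contiguous range: rate 1 dominates positions 1 to 3, rate 2 positions 4 to 24, and rate 4 positions 25 to 30. Tasks near the end of the queue delay few others, so a slow, cheap rate wins there.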

11 Scheduling Tasks without Deadlines on Multi-core Platforms
Scheduling tasks
– Homogeneous multi-core systems
  ◦ Same energy consumption and time consumption functions on every core
  ◦ Tasks are assigned round-robin
– Heterogeneous multi-core systems
  ◦ Different energy consumption and time consumption functions per core
  ◦ Tasks are assigned greedily
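
One way to read the two cases is as a single greedy loop, sketched below under assumptions that are not in the slides: tasks are taken heaviest first, each goes to the core whose next free position (counted from the end of its queue) has the lowest per-cycle cost $R_e E(p) + k R_t T(p)$, and ties go to the lower core index. The names `best_position_cost` and `greedy_assign` are hypothetical.

```python
import heapq


def best_position_cost(k, Re, Rt, E, T):
    """Least per-cycle cost of the k-th position from the end of a queue."""
    return min(Re * E[p] + k * Rt * T[p] for p in E)


def greedy_assign(workloads, cores, Re, Rt):
    """cores: list of (E, T) tables, one pair per core.

    Returns one list per core in backward order: index 0 runs last.
    """
    heap = [(best_position_cost(1, Re, Rt, E, T), c, 1)
            for c, (E, T) in enumerate(cores)]
    heapq.heapify(heap)
    queues = [[] for _ in cores]
    for load in sorted(workloads, reverse=True):   # heaviest task first
        _, c, k = heapq.heappop(heap)              # cheapest free position
        queues[c].append(load)
        E, T = cores[c]
        heapq.heappush(heap, (best_position_cost(k + 1, Re, Rt, E, T), c, k + 1))
    return queues
```

When every core shares the same tables, the loop degenerates into round-robin dealing: with two identical cores and workloads `[5, 9, 2, 7]` it yields `[[9, 5], [7, 2]]`. With different tables, cheaper cores simply receive more tasks.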

12 TASK SCHEDULING IN THE ONLINE MODE
– Example: online judging system
– Interactive tasks and non-interactive tasks
– The system can be homogeneous multi-core or heterogeneous multi-core
– Interactive tasks have higher priority than non-interactive tasks
– Marginal cost
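
The marginal cost idea can be sketched as "how much does the total cost grow if the new task joins this core". The code below is an illustration under several assumptions that the slides do not confirm: cores are homogeneous, each queue runs its shortest task first, every position uses its cheapest rate, and the per-cycle cost of the $k$-th position from the end is $R_e E(p) + k R_t T(p)$. It does not model the priority of interactive tasks. The names `queue_cost` and `least_marginal_core` are hypothetical.

```python
def queue_cost(workloads, Re, Rt, E, T):
    """Cost of one core's queue under the assumptions stated above."""
    backward = sorted(workloads, reverse=True)    # index 0 runs last
    return sum(load * min(Re * E[p] + k * Rt * T[p] for p in E)
               for k, load in enumerate(backward, start=1))


def least_marginal_core(queues, new_load, Re, Rt, E, T):
    """Index of the core whose cost grows least when new_load is added."""
    def marginal(queue):
        before = queue_cost(queue, Re, Rt, E, T)
        after = queue_cost(queue + [new_load], Re, Rt, E, T)
        return after - before
    return min(range(len(queues)), key=lambda c: marginal(queues[c]))
```

With `E = {1: 1.0, 2: 4.0}`, `T = {1: 1.0, 2: 0.5}`, `Re = 1`, `Rt = 2`, queues `[[8, 3], [4]]` and a new task of 5 cycles, the marginal costs are 31 and 23, so the task goes to core 1.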

13 Dynamic Task Insertion and Deletion

14

15 COCA: Computation Offload to Clouds using AOP Hsing-Yu Chen, Yue-Hsun Lin, and Chen-Mou Cheng

16 Introduction
Computation Offload
– Not mobile cloud
AOP Approach
– COCA works at the source level, as opposed to the binary level
– In a binary-level approach, the offload can be made transparent to application programmers
– But that benefit matters less in cloud computing

17 Background
Aspect-Oriented Programming
– Increases modularity by allowing the separation of cross-cutting concerns
– Entails breaking down program logic into distinct parts

18 Background
AspectJ
– Allows programmers to define “aspects”
  An aspect provides pointcuts and advice for specific functions
– Corresponding advice, the main AOP feature used in COCA: before, after, around
AspectJ for Android
– No official support for Android yet
– Major change: alter the compilation phase of Android from the Java compiler to AspectJ
Dynamic Loading for Java Classes
– Compiled Java bytecode (.class) can be loaded and run on a JVM dynamically at run time

19 Design of COCA

20 Profile Stage
1. Mark all pure functions
2. Evaluate the processing time and required memory footprint of each function
– The result of profiling is summarized in a report
– Allows evaluation in an emulated environment
– Allows the selection process to be automated by integrating COCA with existing program partitioning schemes
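
The measurement in step 2 can be illustrated with a small Python analogy. COCA itself profiles Java functions, so this is not its profiler: `profile` is a hypothetical helper that uses the standard `time.perf_counter` and `tracemalloc` modules to record wall-clock time and peak allocated memory for a single call.

```python
import time
import tracemalloc


def profile(fn, *args):
    """Run fn(*args) once and report elapsed time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()   # (current, peak) in bytes
    tracemalloc.stop()
    return {
        "function": fn.__name__,
        "seconds": elapsed,
        "peak_bytes": peak,
        "result": result,
    }


report = profile(sorted, [3, 1, 2])
```

A report of this kind, collected per function, is what a developer or an automated partitioning scheme would read to decide which functions are worth offloading.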

21 Build Stage
1. Divide the original Java source code into ‘to offload’ and ‘not to offload’
– The programmer can select the target functions to offload
  The dependent classes are then selected
2. Translate the code into AspectJ code
– The filtered Java classes are compiled to JVM bytecode
– Results
  Jar file for the cloud server
  Apk installation file for Android

22 Register Stage
Assumption
– The user already has an account on an existing cloud service (Amazon EC2)
Process
– Run the COCA server daemon in the cloud
– Upload the compiled bytecode in jar files to the cloud
  Authenticate, then load the classes from the jar file via dynamic loading

23 Running
1. Launch the corresponding program
2. COCA requests computation offload
3. The server retrieves the related classes from the database and loads the target classes
4. Perform the computation by calling the appropriate functions
5. Send the result back to the smartphone
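
The run-time flow above can be mimicked in Python with a decorator that plays the role of an around advice. This is an analogy only: COCA weaves AspectJ advice into Java code and talks to a real server, whereas here `SERVER_FUNCTIONS`, `fake_cloud_call`, `offload` and `search_score` are hypothetical names, the "cloud" is an in-process dictionary, and the fallback to local execution on failure is an assumption that the slides do not mention.

```python
import functools

SERVER_FUNCTIONS = {}   # stand-in for classes the server loaded from the jar
REMOTE_CALLS = []       # records which calls were served "in the cloud"


def fake_cloud_call(name, args):
    """Hypothetical transport: look the function up and run it server-side."""
    if name not in SERVER_FUNCTIONS:
        raise ConnectionError(f"{name} is not registered on the server")
    REMOTE_CALLS.append(name)
    return SERVER_FUNCTIONS[name](*args)


def offload(fn):
    """Around-style wrapper: intercept the call and send it to the cloud."""
    @functools.wraps(fn)
    def around(*args):
        try:
            return fake_cloud_call(fn.__name__, args)   # steps 2 to 5
        except ConnectionError:
            return fn(*args)                            # assumed local fallback
    return around


def search_score(position, depth):
    """A pure function: the result depends only on its arguments."""
    return sum(position) * depth


SERVER_FUNCTIONS["search_score"] = search_score   # analogue of the register stage
search_score = offload(search_score)              # analogue of weaving at build time
```

After this setup, the application still calls `search_score(position, depth)` as before, but the wrapper routes the call through `fake_cloud_call`; the calling code does not change, which is the point of using advice.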

24 Experimental Evaluation
Overhead of AspectJ on Android
– Target device: HTC Tattoo smartphone, Qualcomm MSM7225 (528 MHz)
– First approach: comparing the latency of function calls with and without AspectJ
  Before/after advice: 195 ns per call
  Around advice: 290 ns per call
– Second approach: the Android sample application “Amazed”
– The overhead introduced by AspectJ is negligible

25 Experimental Evaluation Real-world Android Chess Game case – AI Capability Enhancement

26 Experimental Evaluation
Communication Cost
– 3G network: 120/509 kbps (up/down)
– Transmitted data: 30 KB
– COCA should work very well on current Wi-Fi networks
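
A back-of-the-envelope calculation puts these figures in perspective. The slide does not say in which direction the 30 KB travels or whether it is per request, so the sketch below simply computes both directions; it assumes 1 KB = 1000 bytes and ignores latency and protocol overhead. The function name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(kilobytes, kbps):
    """Seconds to move `kilobytes` (1 KB = 1000 bytes) over a `kbps` link."""
    return kilobytes * 8 / kbps


upload = transfer_seconds(30, 120)     # 240 kbit / 120 kbps = 2.0 s
download = transfer_seconds(30, 509)   # 240 kbit / 509 kbps, about 0.47 s
```

A transfer of about two seconds on the 3G uplink is noticeable next to a short computation, which is consistent with the slide's expectation that Wi-Fi suits COCA better.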

27 Experimental Evaluation
Energy Saving
– Measured with a Monsoon power monitor
– Experiment on the Honzovy šachy AI computation
  56% energy reduction

28 Discussions
Arguments for working at the source level
– Additional overhead
  No additional overhead for the developer, provided they already code in AOP
  Users: need to install a patched VM
– Modularized source code
  The developer can cleanly separate the mobile-side design from the cloud-side design
  Maintenance is much easier

29 Discussions
Pure vs. Non-pure Functions
– Non-pure functions
  Tend to access global variables, including primitive variables
  Static object calls
– Offloading them requires synchronizing the function with the remote object
  Serializing: severe cost
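
The distinction is easy to see in a small Python example (COCA itself targets Java; the names `evaluate`, `evaluate_and_count` and `stats` are hypothetical). The pure function can run anywhere because its result depends only on its arguments; the non-pure one also updates shared state, which a remote copy would have to synchronize back to the phone.

```python
def evaluate(board, depth):
    """Pure: reads only its arguments and touches no outside state."""
    return sum(board) * depth


stats = {"calls": 0}   # global state living on the phone


def evaluate_and_count(board, depth):
    """Non-pure: returns the same value but also mutates the global `stats`."""
    stats["calls"] += 1
    return sum(board) * depth
```

If `evaluate_and_count` ran in the cloud without synchronization, the server's copy of `stats` would be incremented while the phone's copy stayed stale, which is why such functions need the costly serialization step noted above.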

30 Discussions
Potential Application
– 3D image rendering → 3D games on mobile
Related solutions
– NVIDIA RealityServer
– OTOY’s streaming platform
– Amazon EC2 - EnFuzion

31 Related works

