An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics Ching-Chi Lin, You-Cheng Syu, Chao-Jui Chang,

Presentation transcript:

An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics Ching-Chi Lin, You-Cheng Syu, Chao-Jui Chang, Jan-Jan Wu, Pangfeng Liu, Po-Wen Cheng and Wei-Te Hsu

Introduction
– Green computing is imperative
  ◦ Increasing number of computers
  ◦ Increasing energy cost
  ◦ Increasing carbon emissions

Motivation
– Main technologies for improving energy efficiency
  ◦ Hardware level: low-power devices
  ◦ System level: power-management mechanisms at different levels
  ◦ Application level: consolidation with virtualization
– Power-management mechanisms
  ◦ Circuit level: clock gating
  ◦ System level: DPM (dynamic power management)
  ◦ Processor level: DVFS/DFS/DVS, C-states
– Goal: shut down unused components or circuits

Task Execution Modes
– Batch mode
  ◦ Batches of jobs
– Online mode
  ◦ Different time constraints
  ◦ Interactive and non-interactive tasks
  ◦ e.g. an online judging system

Contributions
– A task scheduling strategy that solves three important issues simultaneously:
  ◦ assignment of tasks to CPU cores
  ◦ execution order of tasks
  ◦ CPU processing rate for the execution of each task
– A task model, CPU processing rate model, energy consumption model, and cost function
– Workload Based Greedy (WBG): an algorithm for executing tasks in the batch mode
– Least Marginal Cost (LMC): a heuristic algorithm for executing tasks in the online mode; LMC assigns interactive and non-interactive tasks to cores

MODELS
– Task model
  ◦ j_k = (L_k, A_k, D_k), where L_k is the number of CPU cycles required to complete j_k, A_k is the arrival time of j_k, and D_k is the deadline of j_k
  ◦ If j_k has a specific deadline, D_k > A_k ≥ 0
– Processing rate
  ◦ Let P = {p_1, p_2, p_3, ...} be a non-empty set of discrete processing rates that a core can use, as determined by the hardware, with 0 < p_1 < p_2 < p_3 < ... < p_|P|
  ◦ p_{j_k} ∈ P denotes the processing rate of task j_k
– Energy consumption
  ◦ For a task j_k, let e_k be its energy consumption, t_k its execution time, and p_{j_k} its processing rate
  ◦ E(p) and T(p) are the energy and the time required to execute one cycle at processing rate p on a CPU core
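The models above can be sketched in a few lines of Python. This is only an illustration: the `Task` class, the `RATES` table and its numeric values, and the helper names are assumptions, not part of the slides. Since E(p) and T(p) are per-cycle quantities, the sketch takes the energy and time of a task to be L_k · E(p) and L_k · T(p).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    cycles: int                       # L_k: CPU cycles needed to finish the task
    arrival: float = 0.0              # A_k: arrival time
    deadline: Optional[float] = None  # D_k: None when the task has no deadline


# Hypothetical rate table: processing rate -> (E(p), T(p)),
# i.e. energy and time needed to execute ONE cycle at that rate.
RATES = {
    1.0: (1.0, 2.0),  # slow rate: cheap but slow
    2.0: (4.0, 1.0),  # fast rate: expensive but quick
}


def energy(task: Task, rate: float) -> float:
    """e_k = L_k * E(p) for the chosen rate p."""
    return task.cycles * RATES[rate][0]


def exec_time(task: Task, rate: float) -> float:
    """t_k = L_k * T(p) for the chosen rate p."""
    return task.cycles * RATES[rate][1]
```

For example, a 10-cycle task run at the fast rate costs 40 units of energy and takes 10 units of time under this table.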

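TASK SCHEDULING IN THE BATCH MODE
– Tasks with deadlines: the Deadline-SingleCore problem
– Partition problem: let A = {a_1, …, a_n} be a set of positive integers; decide whether some subset of A sums to exactly half of the total
– Theorem: Deadline-SingleCore is NP-complete
– Proof sketch (reduction from Partition):
  ◦ Create n tasks j_1, …, j_n, where task j_i needs L_i = a_i cycles
  ◦ S = a_1 + … + a_n is the total number of cycles needed to finish all n tasks
  ◦ Two processing rates, p_l and p_h, with T(p_l) = 2, T(p_h) = 1, E(p_l) = 1, E(p_h) = 4 (these values satisfy E · T^2 = 4)
  ◦ All tasks share the deadline 1.5S, and the energy budget is 2.5S
  ◦ Meeting the deadline requires the tasks run at p_h to total at least S/2 cycles; staying within the energy budget allows at most S/2 such cycles
  ◦ A feasible schedule therefore exists exactly when some subset of A sums to S/2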
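The arithmetic behind the reduction can be checked by brute force. The sketch below is illustrative only (the function name and the exhaustive search are assumptions, and it is exponential in n): it enumerates every subset of tasks to run at the high rate and keeps those that meet both the 1.5S deadline and the 2.5S energy budget, using the per-cycle values from the slide.

```python
from itertools import combinations

# Per-cycle time and energy from the slide's construction.
T = {"low": 2, "high": 1}
E = {"low": 1, "high": 4}


def feasible_high_sets(a):
    """Return index tuples of tasks that can run at the high rate so that
    total time <= 1.5*S and total energy <= 2.5*S (all others run low)."""
    S = sum(a)
    feasible = []
    for r in range(len(a) + 1):
        for idx in combinations(range(len(a)), r):
            H = sum(a[i] for i in idx)            # cycles executed at high rate
            time = H * T["high"] + (S - H) * T["low"]
            energy = H * E["high"] + (S - H) * E["low"]
            # Compare in integers: time <= 1.5*S and energy <= 2.5*S.
            if 2 * time <= 3 * S and 2 * energy <= 5 * S:
                feasible.append(idx)
    return feasible
```

With a = [1, 1, 2] (S = 4) the only feasible choices are the subsets whose cycles sum to 2; with a = [1, 2, 4] (S = 7, no equal partition) the result is empty.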

Tasks without Deadlines on a Single Core Platform
– The cost function must consider both the energy consumption and the execution time
  ◦ Energy cost: C_{k,e} = R_e L_k E(p_{j_k})
  ◦ Temporal cost
– C_k is the cost of task j_k, and C is the total cost of all tasks

Tasks without Deadlines on a Single Core Platform
– The temporal cost can be viewed as the amount of delay that a task causes for other tasks
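The slides give the energy cost explicitly but not the temporal cost formula. The sketch below therefore makes a loudly stated assumption: the temporal cost of a task is R_t times its completion time, with all tasks available at time 0. Under that assumption the total cost can be computed either task by task or by charging each task for the delay it imposes on itself and on every task behind it; the function names and parameters are hypothetical.

```python
def total_cost(schedule, R_e, R_t, E, T):
    """schedule: list of (cycles, rate) in execution order on one core.
    ASSUMPTION: temporal cost of a task = R_t * its completion time."""
    clock = 0.0
    cost = 0.0
    for cycles, rate in schedule:
        clock += cycles * T[rate]                        # completion time
        cost += R_e * cycles * E[rate] + R_t * clock     # energy + temporal
    return cost


def total_cost_by_delay(schedule, R_e, R_t, E, T):
    """Same total, regrouped: the task at index k delays itself and the
    n - k - 1 tasks after it, i.e. n - k tasks in all."""
    n = len(schedule)
    return sum(
        cycles * (R_e * E[rate] + (n - k) * R_t * T[rate])
        for k, (cycles, rate) in enumerate(schedule)
    )
```

The second form makes the "delay caused for other tasks" reading concrete: a task placed early in the queue has its execution time counted once for every task that waits behind it.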

Dominating Position Set/Range
– D_p is the "dominating position set" of processing rate p
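The slide does not preserve the definition, so the following is a sketch under an assumed one: if a task sits at the m-th position counted from the end of the queue, each of its cycles costs R_e · E(p) + m · R_t · T(p) (energy plus the delay imposed on m tasks), and D_p collects the positions m at which rate p gives the lowest such cost. The function name and the tie-breaking rule (prefer the faster rate) are illustrative choices.

```python
def dominating_positions(n, R_e, R_t, E, T):
    """Map each rate p to the list of backward positions m (1 = last task)
    where p minimises the ASSUMED per-cycle cost R_e*E(p) + m*R_t*T(p)."""
    sets = {p: [] for p in E}
    for m in range(1, n + 1):
        best = min(
            E,
            # Ties are broken in favour of the rate with the smaller T(p).
            key=lambda p: (R_e * E[p] + m * R_t * T[p], T[p]),
        )
        sets[best].append(m)
    return sets
```

With E = {low: 1, high: 4}, T = {low: 2, high: 1} and R_e = R_t = 1, the last two positions are dominated by the low rate and all earlier positions by the high rate, so each set forms a contiguous range in this example.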

Scheduling Tasks without Deadlines on Multi-core Platforms
– Homogeneous multi-core systems
  ◦ All cores have the same energy consumption and time consumption functions
  ◦ Tasks are assigned to cores in a round-robin manner
– Heterogeneous multi-core systems
  ◦ Cores have different energy consumption and time consumption functions
  ◦ Tasks are assigned to cores in a greedy manner
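A minimal sketch of the round-robin assignment for the homogeneous case follows. Sorting the tasks by descending cycle count before dealing them out is an assumption made here for illustration, as is the function name; the slide only states that round-robin is used.

```python
def round_robin(task_cycles, n_cores):
    """Deal tasks out to cores in turn.
    ASSUMPTION: tasks are first sorted by descending cycle count."""
    cores = [[] for _ in range(n_cores)]
    for i, cycles in enumerate(sorted(task_cycles, reverse=True)):
        cores[i % n_cores].append(cycles)
    return cores
```

For five tasks with 5, 1, 4, 2 and 3 cycles on two cores, this yields the queues [5, 3, 1] and [4, 2].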

TASK SCHEDULING IN THE ONLINE MODE
– Example: an online judging system
– Interactive tasks and non-interactive tasks
– The system can be a homogeneous or a heterogeneous multi-core platform
– Interactive tasks have higher priority than non-interactive tasks
– Marginal cost
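The idea of choosing by marginal cost can be sketched as follows. This is a deliberately simplified illustration, not the LMC algorithm itself: it assumes homogeneous cores, appends the new task at the end of a core's queue, ignores the priority of interactive tasks, and assumes the temporal cost of a task is R_t times its completion time. All names are hypothetical.

```python
def least_marginal_cost(backlogs, cycles, R_e, R_t, E, T):
    """backlogs[i]: time until core i finishes its current queue.
    Returns (marginal_cost, core_index, rate) for the cheapest placement
    of a new task that needs `cycles` cycles, appended at the queue end."""
    best = None
    for core, backlog in enumerate(backlogs):
        for rate in E:
            # Cost added by this task alone: its energy plus its own
            # completion time (appending last delays no other task).
            mc = R_e * cycles * E[rate] + R_t * (backlog + cycles * T[rate])
            if best is None or mc < best[0]:
                best = (mc, core, rate)
    return best
```

With backlogs of 10 and 2 time units, a 3-cycle task, R_e = R_t = 1, and the per-cycle values E = {low: 1, high: 4}, T = {low: 2, high: 1}, the cheapest placement is the second core at the low rate.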

Dynamic Task Insertion and Deletion

COCA: Computation Offload to Clouds using AOP Hsing-Yu Chen, Yue-Hsun Lin, and Chen-Mou Cheng

Introduction
– Computation offload
  ◦ Not mobile cloud
– AOP approach
  ◦ COCA works at the source level, as opposed to a binary-level approach
  ◦ In a binary-level approach, the offload can be made transparent to application programmers
  ◦ But the benefits of this become less important in cloud computing

Background
– Aspect-Oriented Programming
  ◦ Increases modularity by allowing the separation of cross-cutting concerns
  ◦ Entails breaking down program logic into distinct parts

Background
– AspectJ
  ◦ Allows programmers to define "aspects"
  ◦ An aspect provides pointcuts and advice for specific functions
  ◦ Advice types (the main AOP feature used in COCA): before, after, around
– AspectJ for Android
  ◦ No official support for Android yet
  ◦ Major change: alter the compilation phase, replacing the Android Java compiler with AspectJ
– Dynamic loading of Java classes
  ◦ Compiled Java bytecode (.class) can be loaded and run on a JVM dynamically at run time
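To make "around" advice concrete without AspectJ syntax, here is a Python decorator used as an analogy; it is not AspectJ and not COCA code. The names `around`, `offload_advice`, `best_move`, and the `calls` log are hypothetical. The point it illustrates is that around advice receives the original call and decides whether and how to proceed, which is the hook an offloading framework needs.

```python
import functools

calls = []  # records what the advice did, for illustration only


def around(advice):
    """Wrap a function so that `advice` runs in its place and may choose
    to invoke the original (a rough analogue of AOP around advice)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return advice(func, *args, **kwargs)
        return wrapper
    return decorate


def offload_advice(proceed, *args, **kwargs):
    calls.append("before")              # work done before the call
    # A real offloading framework could ship the call to a server here;
    # this sketch simply proceeds locally.
    result = proceed(*args, **kwargs)
    calls.append("after")               # work done after the call
    return result


@around(offload_advice)
def best_move(depth):
    """Stand-in for an expensive, pure computation."""
    return depth * 2
```

Calling `best_move(3)` returns 6 and leaves `["before", "after"]` in `calls`; the function body itself is unchanged, which is the separation of concerns that AOP aims for.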

Design of COCA

Profile Stage
1. Mark all pure functions
2. Evaluate the processing time and required memory footprint of each function
  ◦ The result of profiling is summarized in a report
  ◦ Allows evaluation in an emulated environment
  ◦ Allows the selection process to be automated by integrating COCA with existing program partitioning schemes

Build Stage
1. Divide the original Java source code into "to offload" and "not to offload" parts
  ◦ The programmer selects the target functions to offload
  ◦ The dependent classes are then selected
2. Translate the code into AspectJ code
  ◦ Filtered Java classes are compiled to JVM bytecode
  ◦ Results: a JAR file for the cloud server and an APK installation file for Android

Register Stage
– Assumption
  ◦ The user already has an account on an existing cloud service (Amazon EC2)
– Process
  ◦ Run the COCA server daemon in the cloud
  ◦ Upload the compiled bytecode in JAR files to the cloud
  ◦ The server authenticates and loads the classes from the JAR file via dynamic loading

Running
1. Launch the corresponding program
2. COCA requests computation offload
3. The server retrieves the related classes from the database and loads the target classes
4. The server performs the computation by calling the appropriate functions
5. The result is sent back to the smartphone

Experimental Evaluation
– Overhead of AspectJ on Android
  ◦ Target device: HTC Tattoo smartphone, Qualcomm MSM7225 (528 MHz)
  ◦ First approach: compare the latency of function calls with and without AspectJ
    before/after advice: 195 ns per call; around advice: 290 ns per call
  ◦ Second approach: the Android sample application "Amazed"
  ◦ The overhead introduced by AspectJ is negligible

Experimental Evaluation
– Real-world case: an Android chess game
  ◦ AI capability enhancement

Experimental Evaluation
– Communication cost
  ◦ 3G network: 120/509 kbps (up/down)
  ◦ Transmitted data: 30 KB
  ◦ COCA should work very well on current Wi-Fi networks
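A back-of-the-envelope estimate of what these figures imply is sketched below. The assumptions are that 1 KB = 1000 bytes and 1 kbps = 1000 bit/s, and that latency and protocol overhead are ignored; the slide does not say in which direction the 30 KB travel, so both directions are computed. The function name is hypothetical.

```python
def transfer_seconds(kilobytes, kbps):
    """Ideal transfer time, ignoring latency and protocol overhead.
    ASSUMPTION: 1 KB = 1000 bytes and 1 kbps = 1000 bit/s."""
    return kilobytes * 8 / kbps


upload = transfer_seconds(30, 120)    # 30 KB over the 120 kbps uplink
download = transfer_seconds(30, 509)  # 30 KB over the 509 kbps downlink
```

Under these assumptions the upload takes 2.0 seconds and the download a little under half a second, which suggests why a faster Wi-Fi link makes offloading more attractive.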

Experimental Evaluation
– Energy saving
  ◦ Measured with a Monsoon power monitor
  ◦ Experiment on the Honzovy šachy AI computation: 56% energy reduction

Discussions
– Arguments for working at the source level
  ◦ Additional overhead
    No additional overhead for the developer, if he codes in AOP……
    Users need to install a patched VM
  ◦ Modularized source code
    The developer can simply isolate the design of the mobile side from that of the cloud side
    Maintenance is much easier

Discussions
– Pure vs. non-pure functions
  ◦ Non-pure functions tend to access global variables, including primitive variables, and to make static object calls
  ◦ Offloading them requires synchronizing the function with the remote object
  ◦ Serializing incurs a severe cost

Discussions
– Potential application
  ◦ 3D image rendering → 3D games on mobile devices
– Related solutions
  ◦ NVIDIA RealityServer
  ◦ OTOY's streaming platform
  ◦ Amazon EC2 with EnFuzion

Related works