1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University.

Presentation transcript:

2 Outline  Introduction and Motivation  System Model  Algorithm  Performance Analysis  Summary

3 Introduction Distributed scientific applications in many cases require access to massive data sets. In High Energy Physics (HEP) applications, for example, a handful of experiments have started producing petabytes of data per year and will continue to do so for decades. Data grids have served as a technology bridge between the need to access extremely large data sets and the goal of achieving high data transfer rates by providing geographically distributed computing resources and large-scale storage systems.

4 Introduction The Google Data Cluster: 31,654 machines; 63,184 CPUs; 126,368 GHz of processing power; two identical buildings containing about 100,000 square feet of data center floor space.

5 Introduction Reliability: Computing in high temperatures is more error-prone than in an appropriate environment. Operational cost: For a single 200-watt server, such as the IBM 1U*300, the energy bill would be $180/year.
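The yearly bill quoted above can be sanity-checked with simple arithmetic. The sketch below assumes the server draws its full 200 W continuously for a 365-day year; the slides do not state an electricity price, so the code derives the price implied by the quoted bill instead of assuming one. The function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: continuous operation, non-leap year


def annual_kwh(watts: float, hours: float = HOURS_PER_YEAR) -> float:
    """Energy drawn in one year, in kilowatt-hours."""
    return watts * hours / 1000.0


def implied_rate(annual_bill: float, watts: float) -> float:
    """Electricity price (currency units per kWh) implied by a yearly bill."""
    return annual_bill / annual_kwh(watts)


if __name__ == "__main__":
    energy = annual_kwh(200)
    rate = implied_rate(180.0, 200)
    print(f"{energy:.0f} kWh/year, implied price {rate:.4f} per kWh")
```

Under these assumptions the quoted bill corresponds to an electricity price of roughly ten cents per kilowatt-hour.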

6 Introduction A key factor in scheduling data-intensive tasks is the location of the input data sets required by the tasks. A straightforward strategy to enhance the performance of data-intensive applications on data grids is to replicate popular data sets to multiple resource sites. Replication offers higher data access speeds than maintaining the data sets at a single site.

7 Drawbacks of Making Too Many Replicas It is challenging to maintain consistency among replicas in Data Grids. It is nontrivial to efficiently generate replicas of massive data sets on the fly in Data Grids. A large number of data replicas can increase energy dissipation in storage resources.

8 Reduce Energy Consumption in Data Grids Minimize electricity cost. Improve system reliability. How to reduce energy consumption in Data Grids? Energy-efficient scheduling algorithms for applications running on data grids.

9 Goals of Scheduling Make tradeoffs between energy efficiency and high performance for data-intensive applications. Integrate data placement strategies with task scheduling. Consider real-time requirements. How to achieve the goals? A Distributed Energy-Efficient Scheduler called DEES, with three key components: energy-aware ranking, performance-aware scheduling, and energy-aware dispatching.

10 Design Goals of DEES Maximize the number of tasks completed before their corresponding deadlines. Replicate data and place replicas in an energy-efficient way. Dispatch real-time tasks to peer computing sites, considering three factors: computational capacities of peer computing sites, energy consumption introduced by tasks, and data location.

11 Features of DEES High scalability Require no full knowledge of workload conditions of all the computing sites in a data grid. One must consider that obtaining full knowledge of the state of the grid is a difficult task.

12 Key Ideas  High-priority tasks are scheduled first in order to meet their deadlines.  Explore slacks: low-priority tasks can have their deadlines guaranteed.  The dynamic voltage scaling (DVS) technique is used to reduce energy consumption by exploiting available slacks and adjusting appropriate voltage levels accordingly.

13 Dynamic Voltage Scaling An effective technique for reducing energy consumption by adjusting the clock speed and supply voltage dynamically. Energy dissipation per CPU cycle is proportional to $v^2$, where $v$ is the supply voltage. Processor energy can be saved by reducing the CPU voltage while running the processor at a slower speed.
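The quadratic dependence on voltage can be illustrated with a small sketch. It assumes energy per cycle equals a constant times the voltage squared; the constant `capacitance` and the example voltages are hypothetical values chosen for illustration, not figures from the slides.

```python
def energy_per_cycle(voltage: float, capacitance: float = 1.0) -> float:
    """Energy per CPU cycle, assumed proportional to voltage squared."""
    return capacitance * voltage ** 2


def task_energy(cycles: int, voltage: float, capacitance: float = 1.0) -> float:
    """Energy to execute a task of the given cycle count at a fixed voltage."""
    return cycles * energy_per_cycle(voltage, capacitance)


def relative_saving(v_high: float, v_low: float) -> float:
    """Fraction of energy saved by running at v_low instead of v_high."""
    return 1.0 - (v_low / v_high) ** 2


if __name__ == "__main__":
    # Hypothetical voltage levels: lowering 1.2 V to 0.9 V
    print(f"saving: {relative_saving(1.2, 0.9):.1%}")
```

Because the saving depends only on the voltage ratio, a 25% reduction in voltage cuts the energy for the same number of cycles by a little under half, at the cost of a longer execution time.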

14 Design Ideas Two types of tasks: hard real-time tasks and soft real-time tasks. Prioritize hard real-time tasks but create slacks by delaying their executions till the latest moment. After a schedule is made, the processor voltage is adjusted to the lowest possible level on a task-by-task basis at each scheduling point.

15 System Model Geographically distributed sites are interconnected through a WAN. Each site consists of storage resources, computing resources, and a ticket server.

16 Energy Consumption Model Consider the energy consumption of executing tasks, making data replicas, and communicating. The total energy consumption of a data grid, $E_{total}$, can be expressed as $E_{total} = E_{comp} + E_{comm} + E_{rep}$, where $E_{comp}$ is the total energy consumption of computing resources, $E_{comm}$ is the total energy consumption of communication, and $E_{rep}$ is the total energy consumption of replicating data.

17 Four Cases of Energy Consumption Case 1: Local execution and local data Case 2: Local execution and remote data Case 3: Remote execution and same remote data Case 4: Remote execution and different remote data

18 If data is not locally available, then? Executing a task at the site where its data is located is energy efficient: there is no data transfer and no replication cost. Compared to the local execution and remote data scenario, executing the task at the remote site where the data is located is still more energy efficient if the task's input data set is larger than its execution code size.
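The comparison between moving the data and moving the code can be sketched as follows. This is a simplified illustration: it assumes transfer energy is proportional to the number of megabytes moved, uses a made-up constant `ENERGY_PER_MB`, and ignores replication cost (which would only add to the cost of fetching the data). None of these values come from the slides.

```python
ENERGY_PER_MB = 0.5  # hypothetical joules per megabyte moved over the WAN


def transfer_energy(size_mb: float) -> float:
    """Energy to move size_mb megabytes, assuming a linear cost model."""
    return size_mb * ENERGY_PER_MB


def cheaper_placement(data_mb: float, code_mb: float) -> str:
    """Return 'remote' if shipping the code to the data site costs less
    energy than fetching the data to the local site, else 'local'."""
    move_data = transfer_energy(data_mb)  # local execution, remote data
    move_code = transfer_energy(code_mb)  # remote execution at the data site
    return "remote" if move_code < move_data else "local"


if __name__ == "__main__":
    print(cheaper_placement(data_mb=1000, code_mb=5))
    print(cheaper_placement(data_mb=1, code_mb=5))
```

With a linear cost model the decision reduces to comparing the two sizes, which matches the condition stated on the slide.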

19 Algorithm Components  DEES is composed of  Ranking  Scheduling  Dispatching  Goals:  Maximize the number of tasks meeting deadlines  Minimize energy consumption  Improve scalability

20 Task Grouping  Task Grouping:  Tasks requiring the same data are grouped together.  The task group whose data resides in the local site, called local task group, is ranked first.  Other task groups are ranked in descending order, according to the number of tasks in the task group.  Considering Real-Time Requirements:  Within each group, tasks are ordered by increasing deadline.  Thus, tasks with shorter deadlines are scheduled sooner.
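The grouping and ordering rules can be sketched in a few lines. The `Task` record and the `rank_groups` function are hypothetical names introduced for illustration; the slides do not specify data structures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    data_id: str     # identifier of the data set the task needs
    deadline: float


def rank_groups(tasks, local_data_ids):
    """Group tasks by required data, then order the groups:
    local groups first, remaining groups by descending size.
    Inside each group, tasks are sorted by increasing deadline."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task.data_id].append(task)
    for members in groups.values():
        members.sort(key=lambda t: t.deadline)
    ordered = sorted(
        groups.items(),
        key=lambda kv: (kv[0] not in local_data_ids, -len(kv[1])),
    )
    return [members for _, members in ordered]


if __name__ == "__main__":
    tasks = [
        Task("a", "d1", 30), Task("b", "d2", 10), Task("c", "d2", 5),
        Task("d", "d3", 20), Task("e", "d2", 7),
    ]
    for group in rank_groups(tasks, local_data_ids={"d3"}):
        print([t.name for t in group])
```

The sort key places `False` (local) before `True` (remote), so the local group leads even when it is the smallest.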

21 DEES Scheduling DEES schedules tasks on a group basis. A local task group is scheduled first. In order to schedule task $t_i$ on site $s_u$, DEES selects the machine $m_k$ at $s_u$ that can complete $t_i$ within its deadline and provides the minimum completion time. After processing all tasks, remaining unscheduled tasks will be dispatched to remote sites.
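The machine-selection step can be sketched as below. The `Machine` record, its `speed` and `ready_time` fields, and the completion-time estimate (ready time plus work divided by speed) are assumptions made for illustration; the slides do not define how completion time is computed.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    speed: float             # hypothetical: work units processed per second
    ready_time: float = 0.0  # time at which the machine becomes free


def schedule_task(machines, work, deadline):
    """Pick the machine that finishes the task earliest among those that
    meet the deadline. Returns (machine name, finish time), or None if no
    machine can meet the deadline (the task is left for dispatching)."""
    best, best_finish = None, None
    for machine in machines:
        finish = machine.ready_time + work / machine.speed
        if finish <= deadline and (best_finish is None or finish < best_finish):
            best, best_finish = machine, finish
    if best is None:
        return None
    best.ready_time = best_finish  # the machine is busy until the task ends
    return best.name, best_finish


if __name__ == "__main__":
    site = [Machine("m1", 1.0), Machine("m2", 2.0)]
    print(schedule_task(site, work=10, deadline=8))
    print(schedule_task(site, work=10, deadline=8))
```

A `None` result marks the task as unscheduled at this site, which corresponds to the tasks that are later dispatched to remote sites.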

22 Dispatching Dispatching delivers the tasks within each task group to data sites. For task group $g_j$ whose data site is $s_o$, scheduling decisions are made by $s_o$'s scheduler based on its local resource status and the task information of $g_j$. If $s_o$ cannot schedule all tasks in $g_j$, then the unscheduled tasks are dispatched to $s_o$'s immediate neighbors using tickets in a breadth-first manner.
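The breadth-first spread of unscheduled tasks can be sketched as a graph traversal. This is a simplified illustration: the ticket mechanism is not described in enough detail to model, so it is omitted, and each site's ability to accept tasks is reduced to a hypothetical `capacity` count. The function and parameter names are not from the slides.

```python
from collections import deque


def dispatch(tasks, origin, neighbors, capacity):
    """Place tasks starting at the data site `origin`, then spread the
    remainder to neighbors in breadth-first order.

    neighbors: dict mapping a site to its list of immediate neighbors
    capacity:  dict mapping a site to how many tasks it can accept
               (a stand-in for the real local scheduling decision)
    Returns (placement dict task -> site, list of tasks left unplaced)."""
    placement = {}
    pending = list(tasks)
    visited = {origin}
    queue = deque([origin])
    while queue and pending:
        site = queue.popleft()
        take = min(capacity.get(site, 0), len(pending))
        for task in pending[:take]:
            placement[task] = site
        pending = pending[take:]
        for nb in neighbors.get(site, []):
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    return placement, pending


if __name__ == "__main__":
    graph = {"so": ["a", "b"], "a": ["c"], "b": [], "c": []}
    cap = {"so": 2, "a": 1, "b": 0, "c": 5}
    print(dispatch(["t1", "t2", "t3", "t4", "t5"], "so", graph, cap))
```

Only sites actually reached by the traversal are consulted, which reflects the point that no site needs full knowledge of the grid.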

23 Energy-Aware Ranking To make tradeoffs between energy efficiency and real-time performance, we propose a ranking system to rank $s_o$'s neighbors. The rank of a neighbor $s_v$ is computed from the following quantities: $n$, the number of tasks in $g_j$ that can be scheduled on $s_v$; $\varepsilon$, a coefficient concerning the task deadline; $\mu$, a coefficient concerning energy saving; the energy consumed to replicate $g_j$'s data from $s_o$ to $s_v$; the energy consumed to transfer $g_j$'s data from $s_o$ to $s_v$; and the energy consumed to execute these $n$ tasks at $s_v$.

24 Dispatching: Energy Efficiency vs. real-time ε and μ: To manage the two conflicting goals of saving energy and meeting deadlines. For mission-critical tasks: ε is set to 1 and μ is set to 0, which means the neighbor that can schedule more tasks is given preference. For energy efficiency: ε is set to 0 and μ is set to 1. Thus, the neighbor that consumes the least amount of energy will be considered first.
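The effect of the two coefficients can be illustrated with a stand-in scoring function. The formula below is an assumption invented for this sketch (a weighted sum of a normalized task count and a normalized energy saving); it is not the ranking equation from the original slides. It is chosen only because it reproduces the two extreme behaviors described above.

```python
def rank_neighbors(candidates, eps, mu):
    """Order neighbor sites from most to least preferred.

    candidates: non-empty list of (site, n_schedulable, energy) tuples,
                where energy is the sum of replication, transfer, and
                execution energy for that site
    eps, mu:    weights for meeting deadlines and for saving energy
    NOTE: the scoring formula is a hypothetical stand-in."""
    max_n = max(n for _, n, _ in candidates) or 1
    max_e = max(e for _, _, e in candidates) or 1.0

    def score(candidate):
        _, n, energy = candidate
        return eps * (n / max_n) + mu * (1.0 - energy / max_e)

    ranked = sorted(candidates, key=score, reverse=True)
    return [site for site, _, _ in ranked]


if __name__ == "__main__":
    sites = [("s1", 8, 90.0), ("s2", 5, 40.0), ("s3", 2, 60.0)]
    print(rank_neighbors(sites, eps=1, mu=0))  # favors schedulable tasks
    print(rank_neighbors(sites, eps=0, mu=1))  # favors low energy
```

Intermediate settings of the two weights blend the two orderings, which is the tradeoff the coefficients are meant to expose.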

25 Simulation Parameters

26 Performance Analysis DEES is compared with an effective scheduling algorithm, Close-to-Files. Features of the Close-to-Files algorithm: Good performance, since Close-to-Files takes data locality into account: it schedules a task to its data site to decrease the amount of data transfer. High scheduling overhead: it is an exhaustive algorithm that searches across all combinations of computing and data sites to find a result with the minimum computation and data transmission cost.

27 Performance Metrics The Guarantee Ratio, Normalized Average Energy Consumption, and Total Energy Consumption are used as the performance metrics in the evaluation.

28 Real-Time Performance Fig. 5. Guarantee Ratio by ranking coefficients

29 Energy Consumption Fig. 6. Normalized Average Energy Consumption by ranking coefficients

30 Performance Fig. 7. Guarantee Ratio by task loads

31 Energy Consumption Fig. 8. Normalized Average Energy Consumption by task loads

32 Summary An energy efficient algorithm to schedule real-time tasks with data access requirements on data grids. By reducing the amount of data replication and task transfers, the proposed algorithm effectively saves energy. Distributed since it does not need knowledge of the complete state of the grid. Detailed simulations demonstrate that DEES significantly reduces the energy consumption while increasing the Guarantee Ratio.

33 Questions Xiao Qin