An energy-efficient management mechanism for large-scale server clusters. By: Zhenghua Xue, Dong, Ma, Fan, Mei. Presented by Ensieea Rizwani. 1

O Most data centers, including the University at Buffalo's Center for Computational Research (CCR), keep their resources running 24 hours a day, 365 days a year, regardless of the actual workload or utilization. This increases power consumption and lowers resource utilization. Energy-efficient centers matter because the savings are substantial, both financially and technically. 2

Outline O Introduction O Overview of the Architecture O Power Conservation Mechanism O Adaptive Pool Mechanism O Simulation and Measurement O Conclusion O Future Work 3

O Power equipment, cooling equipment, and electricity together represent a significant portion of a data center's cost. O Any guesses for the percentage? O The cost is up to 63 percent of the total cost of ownership of the physical IT infrastructure. 4

How to make data centers more cost efficient? O At the hardware component level, a general approach is to reduce the power consumed by components not currently in use. Some examples: O Placing the CPU in a "halted" state when there are no runnable tasks O Turning off the hard drive motor or memory device after some period of inactivity O Resizing the cache by powering down unused cache lines 5
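The component-level techniques above share one pattern: an idle timer that powers a part down after a period of inactivity. A minimal sketch of that pattern, assuming illustrative names and thresholds (none of this is from the paper):

```python
# Hypothetical sketch of an idle-timeout power-down policy, the common
# pattern behind disk spin-down and similar component-level techniques.

class IdleTimeoutDevice:
    """Powers a component down once it has been idle past a threshold."""

    def __init__(self, idle_timeout_s):
        self.idle_timeout_s = idle_timeout_s
        self.powered_on = True
        self.idle_since = 0.0  # timestamp of last activity

    def access(self, now):
        # Any access wakes the device and resets the idle timer.
        self.powered_on = True
        self.idle_since = now

    def tick(self, now):
        # Called periodically; powers the device down after the timeout.
        if self.powered_on and now - self.idle_since >= self.idle_timeout_s:
            self.powered_on = False

disk = IdleTimeoutDevice(idle_timeout_s=30)
disk.access(now=0)
disk.tick(now=10)   # within the timeout: stays on
disk.tick(now=40)   # 40 s idle >= 30 s timeout: powered down
```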

Approach taken by this article: O This paper proposes an adaptive pool-based resource management (APRM) mechanism to provide computing capacity on demand. O APRM saves power by terminating some idle nodes, and guarantees QoS by reserving other idle nodes. O By obtaining load information from the management system, APRM can predict the amount of load. 6

Management System of HPC O The management system of an HPC cluster consists of two components: O Job management subsystem O Resource management subsystem 7

Overview of an extensible cluster management architecture 8

Job Management System O Job Controller O The executing entity that dispatches jobs and controls their lifetime by starting, suspending, or canceling them. O Job Supervisor O Responsible for monitoring job status and reporting that information to the Queue Manager. O Queue Manager O Queues jobs in the waiting queue O Updates the queue upon receiving information from the Job Supervisor O Makes job-scheduling decisions in accordance with the scheduling algorithm and available resources O Informs the Job Controller to execute jobs 9

Resource Manager O Executor O Dedicated to executing instructions O Resource Monitor O Concentrates on monitoring and collecting status information about resources O Statistics Analyzer O Auxiliary component supporting automatic and intelligent resource management O Policy Decisioner O Maintains a collection of policies that are triggered by predefined events O The energy-efficient resource management method is kept in the Policy Decisioner 10

Demand fluctuations: O Many studies have shown that demand for high-performance scientific computing varies with time. Job arrivals are expected to have cycles at three levels: O Daily (working hours are the peak hours) O Weekly (weekends have the lowest job arrivals) O Yearly 11

Server States 12

Power Model of Servers O Busy, Idle, Shutdown O Upon completion of all the jobs on a computing node, its power state transitions from busy to idle. O Once a new job arrives at an idle computing node, the power state transitions from idle to busy. O If a computing node stays idle for a long time, it is terminated and the power state transitions from idle to shutdown. O When the workload becomes heavy, additional computing capacity is needed: some computing nodes are woken up to take part, and their state transitions from shutdown to idle and then to busy. 13
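The three-state power model above can be written as a small transition table. A sketch, with illustrative event names (the paper does not name its transitions):

```python
# Minimal sketch of the busy/idle/shutdown power model described above.

BUSY, IDLE, SHUTDOWN = "busy", "idle", "shutdown"

TRANSITIONS = {
    (BUSY, "all_jobs_done"): IDLE,      # all jobs complete -> idle
    (IDLE, "job_arrives"): BUSY,        # new job -> busy
    (IDLE, "idle_timeout"): SHUTDOWN,   # idle too long -> terminated
    (SHUTDOWN, "wake_up"): IDLE,        # heavy load -> woken to idle
}

def step(state, event):
    """Return the next power state; events that do not apply to the
    current state (e.g. a wake-up sent to a busy node) are ignored."""
    return TRANSITIONS.get((state, event), state)

state = BUSY
state = step(state, "all_jobs_done")  # busy -> idle
state = step(state, "idle_timeout")   # idle -> shutdown
state = step(state, "wake_up")        # shutdown -> idle
state = step(state, "job_arrives")    # idle -> busy
```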

Adaptive Pool Mechanism O A resource pool is a collection of computing nodes offering shared access to computing capacity; the automation and virtualization capabilities of resource pools promise a lower cost of ownership. 14

Mechanism of APRM O corePoolSize: the number of nodes to keep in the pool; it is the sum of the numbers of working nodes and idle nodes. O maxPoolSize: the maximum number of nodes allowed in the pool; it equals the total number of nodes in the cluster. O maxIdleNodes: the maximum number of idle nodes to keep in the pool. O keepAliveTime: when the number of idle nodes in the pool is greater than maxIdleNodes, this is the maximum time that excess computing nodes wait for new jobs before terminating. 15
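The parameters above imply two simple invariants: corePoolSize is the powered-on node count, and idle nodes beyond maxIdleNodes are termination candidates. A sketch, assuming hypothetical class and field names chosen to mirror the slide:

```python
# Hypothetical sketch of the APRM pool parameters and their invariants.
from dataclasses import dataclass

@dataclass
class PoolConfig:
    max_pool_size: int     # total number of nodes in the cluster
    max_idle_nodes: int    # idle nodes kept as a QoS reserve
    keep_alive_time: float # seconds an excess idle node waits for work

class NodePool:
    def __init__(self, cfg, working, idle):
        self.cfg = cfg
        self.working = working  # nodes currently running jobs
        self.idle = idle        # powered-on nodes with no jobs

    @property
    def core_pool_size(self):
        # corePoolSize = working nodes + idle nodes (the powered-on set)
        return self.working + self.idle

    def excess_idle(self):
        # Idle nodes beyond the reserve are candidates for termination.
        return max(0, self.idle - self.cfg.max_idle_nodes)

pool = NodePool(PoolConfig(max_pool_size=64, max_idle_nodes=4,
                           keep_alive_time=300.0),
                working=20, idle=7)
# core_pool_size is 27; excess_idle() is 3
```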

Termination Conditions O The idle time of an idle node exceeds keepAliveTime. O This first condition prevents a computing node from frequently terminating and relaunching when computing demand fluctuates in short cycles. 16

Termination Conditions O The number of idle nodes in the pool is larger than maxIdleNodes. O This second condition aims to shut down needless computing nodes to save power. 17

Termination Conditions O If more than one idle node simultaneously meets the two conditions above, nodes with longer runtime are terminated first. O This third condition balances the utilization of nodes. After termination of some idle nodes, the number of idle nodes in the pool stays at maxIdleNodes. 18
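The three termination conditions on the preceding slides can be combined into one selection function. A sketch under stated assumptions: the function name, the tuple layout, and the example values are all illustrative, not from the paper.

```python
# Hypothetical sketch combining the three APRM termination conditions:
# only nodes idle past keepAliveTime qualify, only the excess beyond
# maxIdleNodes is removed, and longer-running nodes go first.

def nodes_to_terminate(idle_nodes, max_idle_nodes, keep_alive_time):
    """idle_nodes: list of (node_id, idle_time_s, runtime_s) tuples.
    Returns the ids of nodes to shut down."""
    # Condition 1: ignore nodes idle for less than keepAliveTime,
    # which avoids thrashing on short demand fluctuations.
    eligible = [n for n in idle_nodes if n[1] >= keep_alive_time]
    # Condition 2: terminate only the excess beyond maxIdleNodes,
    # so the reserved idle nodes stay available for new jobs.
    excess = len(idle_nodes) - max_idle_nodes
    if excess <= 0:
        return []
    # Condition 3: nodes with the longest runtime are terminated first,
    # balancing utilization across nodes.
    eligible.sort(key=lambda n: n[2], reverse=True)
    return [n[0] for n in eligible[:excess]]

idle = [("n1", 700, 9000), ("n2", 100, 8000), ("n3", 650, 12000)]
nodes_to_terminate(idle, max_idle_nodes=2, keep_alive_time=600)
# -> ["n3"]: one excess node; n2 has not been idle long enough,
#    and n3 outranks n1 on runtime.
```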

APRM APRM saves power by terminating some idle nodes, and guarantees QoS by reserving idle nodes whose number is kept at maxIdleNodes. The working parameter maxIdleNodes plays an important role in APRM. If it is set too high, computing capacity is over-provisioned. If it is set too low, the reserved idle nodes may be insufficient for newly arriving jobs, and spare nodes must be woken to take part in computing, incurring a start-up delay. 19

20 Power saving metric: the ratio of the total run time of all computing nodes with APRM to that without APRM, i.e. R = Σi Ti(APRM) / Σi Ti(without APRM). Lower values mean more power saved.

21 Average job response time: the time between a job's arrival and its completion, averaged over all jobs.

22 Average job slowdown: the ratio of a job's response time to the time it requires on a dedicated system, averaged over all jobs.

23 Average shutdown frequency: a metric measuring whether computing nodes frequently terminate and relaunch.
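The four metrics on the preceding slides are straightforward to compute from a job trace. A sketch, assuming illustrative function names and input layouts (the paper's exact formulas are not reproduced on these slides):

```python
# Hypothetical sketch of the four evaluation metrics described above.

def power_saving_ratio(runtimes_aprm, runtimes_baseline):
    # Total node run time with APRM over run time without it;
    # lower means more energy saved.
    return sum(runtimes_aprm) / sum(runtimes_baseline)

def avg_response_time(jobs):
    # jobs: list of (arrival, completion) timestamps.
    return sum(c - a for a, c in jobs) / len(jobs)

def avg_slowdown(jobs):
    # jobs: list of (arrival, completion, dedicated_time) tuples;
    # slowdown is response time over time on a dedicated system.
    return sum((c - a) / d for a, c, d in jobs) / len(jobs)

def avg_shutdown_frequency(shutdown_counts, duration_s):
    # Mean shutdowns per node per unit time; high values indicate
    # nodes thrashing between launch and termination.
    return sum(shutdown_counts) / (len(shutdown_counts) * duration_s)
```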

Simulation Model O Workload generator O Job scheduler O Resource manager 24
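The three simulator components named above can be sketched in a few lines. Everything here is illustrative: the paper's simulator details are not given on the slide, so this assumes exponential interarrival and service times and a simple FCFS scheduler.

```python
# Hypothetical sketch of the simulation model's three components:
# a workload generator, and a job scheduler + resource manager
# combined into one FCFS dispatcher.
import random

def generate_workload(n_jobs, mean_interarrival, mean_runtime, seed=42):
    # Workload generator: exponential interarrival and service times.
    rng = random.Random(seed)
    t, jobs = 0.0, []
    for i in range(n_jobs):
        t += rng.expovariate(1.0 / mean_interarrival)
        jobs.append({"id": i, "arrival": t,
                     "runtime": rng.expovariate(1.0 / mean_runtime)})
    return jobs

def schedule(jobs, n_nodes):
    # FCFS onto the node that frees up earliest; returns per-job
    # (arrival, completion) pairs for the metrics above.
    free_at = [0.0] * n_nodes
    completions = []
    for job in sorted(jobs, key=lambda j: j["arrival"]):
        node = min(range(n_nodes), key=free_at.__getitem__)
        start = max(job["arrival"], free_at[node])
        free_at[node] = start + job["runtime"]
        completions.append((job["arrival"], free_at[node]))
    return completions
```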

Summary O The difference in average job response time between APRM and the baseline is within minutes, and the difference in average job slowdown is small. This suggests APRM has little impact on QoS while delivering significant power savings. O Future Work: O Studying real workload traces O Developing better predictive methods 27

Thank You 28