New Challenges in Cloud Datacenter Monitoring and Management

Presentation transcript:

New Challenges in Cloud Datacenter Monitoring and Management Shicong Meng (smeng@cc.gatech.edu)

Agenda: Background; Challenges in Cloud Monitoring (System-level, User-level, Network-level); Conclusions and Future Work; Cloud Management Related Work. Student Workshop for Frontier of Cloud Computing

Background: Complexity and Mission Criticality of the Cloud. Scale and diversity of the infrastructure: servers, network devices, storage, etc.; hundreds, even thousands of machines; a massive number of user applications. Failures, security breaches, and performance degradation can have catastrophic consequences. Monitoring is indispensable for availability (failure detection), performance (provisioning), security (anomaly detection), and application-level monitoring.

Background: Delivering Monitoring-as-a-Service. Similar to other cloud services: database services (e.g. SimpleDB, Datastore), storage services (e.g. S3), application services (e.g. AppEngine). Benefits: end-to-end support and ease of use; a well-maintained, reliable service; sharing of implementation (template implementations).

Background: A high-level view of the cloud monitoring service.

Background: State Monitoring. Monitoring the state of a system, application, or service. State definition: a scalar value V that describes a certain state, e.g. CPU utilization or average response time. Violation: V > T, where T is a threshold.

Background: Distributed State Monitoring. The state value V is aggregated across multiple objects. The work is split between monitors and a coordinator. Example: web server monitoring (average CPU utilization).
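
The distributed case can be sketched in a few lines of Python. This is an illustration only, under the assumption that each monitor exposes its latest local value and that the coordinator aggregates by averaging, as in the web server example; the names `local_values` and `check_violation` are hypothetical and not taken from the slides.

```python
from statistics import mean


def local_values(monitors):
    """Collect the latest local state value from each monitor.

    Each monitor is modeled as a zero-argument callable (an assumption).
    """
    return [read() for read in monitors]


def check_violation(values, threshold):
    """Coordinator step: aggregate the local values into V and test V > T."""
    aggregate = mean(values)
    return aggregate, aggregate > threshold


# Three web servers reporting CPU utilization in percent.
monitors = [lambda: 70.0, lambda: 90.0, lambda: 95.0]
v, violated = check_violation(local_values(monitors), threshold=80.0)
print(v, violated)  # prints: 85.0 True
```

Note that no single server needs to exceed the threshold for the aggregate to violate it, which is why the local values have to be combined at a coordinator.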

Background: Architecture. Monitor servers and coordinator servers.

Challenges at System Level. Efficient scalability: supporting tens of thousands of monitoring tasks; cost effectiveness (minimizing resource usage). Monitoring QoS: multi-tenant environment; minimizing resource contention between monitoring tasks.

Efficient Scalability: Massive Scale. Many monitoring tasks are inherently large scale, e.g. SLA monitoring; a large number of users; infrastructure monitoring; application monitoring. Some monitoring tasks have high cost, e.g. distributed heavy-hitter detection based on NetFlow data. Cost effectiveness: monitoring is a facilitating service, so it should use as few machines as possible.

Efficient Scalability: Observation. Not every task needs intensive monitoring. A single task may not need intensive monitoring all the time.

Efficient Scalability: Violation-Likelihood-Driven Adaptation. Perform intensive monitoring only for tasks with high violation likelihood, and only when the violation likelihood of the task is high. Violation likelihood is estimated efficiently from the change δ between sampled values. The sampling frequency is reduced if the violation likelihood is less than an error allowance. [Figure: monitored value over time, with samples V1 and V2 and their difference δ.]
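
A minimal sketch of this adaptation idea follows. The slides do not say how the likelihood is estimated, so the estimator here is an assumption: it treats the per-sample changes δ as independent and applies Cantelli's (one-sided Chebyshev) inequality to bound the chance that the value is above the threshold k sampling periods ahead. The function names and the `max_interval` parameter are hypothetical.

```python
from statistics import mean, pvariance


def violation_likelihood_bound(value, threshold, deltas, steps):
    """Upper bound on P(value + sum of `steps` future deltas >= threshold).

    Assumes independent deltas with the mean and variance seen in `deltas`
    and applies Cantelli's inequality: P(X - mu >= a) <= var / (var + a^2).
    """
    mu = steps * mean(deltas)
    var = steps * pvariance(deltas)
    slack = (threshold - value) - mu
    if slack <= 0:
        return 1.0  # the expected drift alone already reaches the threshold
    return var / (var + slack * slack)


def next_interval(value, threshold, deltas, error_allowance, max_interval=8):
    """Largest sampling interval (in base periods) whose bound stays below
    the error allowance at every skipped step; falls back to 1."""
    interval = 1
    for k in range(2, max_interval + 1):
        if violation_likelihood_bound(value, threshold, deltas, k) >= error_allowance:
            break
        interval = k
    return interval


recent_deltas = [0.1, -0.1, 0.2, -0.2]
print(next_interval(20.0, 80.0, recent_deltas, error_allowance=0.01))  # far from T
print(next_interval(79.9, 80.0, recent_deltas, error_allowance=0.01))  # close to T
```

With a value far below the threshold the sketch stretches the interval to its maximum, and with a value close to the threshold it falls back to sampling every period.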

Efficient Scalability: Handling Changes of Distribution. Distributing the error allowance among multiple monitor nodes. [Figure: error allowance.]

Efficient Scalability: Results.

Quality-of-Service: Implication of Multi-Tenancy. Monitoring tasks are added and removed over time, which causes resource contention between monitoring tasks. To understand the impact of resource contention, let's first look at the implementation of the monitor server.

Quality-of-Service: Threading on Monitor Servers. Goals: performance and scalability. Naïve implementation: one thread per monitor node; a potentially large number of simultaneous monitoring tasks leads to high threading cost. Thread-pool-based implementation: global scheduling for all monitor nodes within one server; triggers for sampling and distributed condition evaluation; scalability through sorted triggers served by a thread pool.
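
The thread-pool design can be illustrated with a small sketch. It assumes that "sorted triggers" means keeping triggers ordered by their next fire time, here with a binary heap, and handing due jobs to a shared pool; the class `TriggerScheduler` and its methods are hypothetical names, not the authors' implementation.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class TriggerScheduler:
    """One scheduler per monitor server: triggers sorted by fire time,
    executed by a fixed-size thread pool instead of one thread per node."""

    def __init__(self, workers=4):
        self._heap = []  # entries: (fire_time, seq, period, job)
        self._seq = 0    # tie-breaker so jobs themselves are never compared
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def add(self, job, period, first_fire):
        heapq.heappush(self._heap, (first_fire, self._seq, period, job))
        self._seq += 1

    def run_due(self, now):
        """Submit every trigger whose fire time has passed, then re-arm it."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap))
        futures = []
        for fire_time, seq, period, job in due:
            futures.append(self._pool.submit(job))
            heapq.heappush(self._heap, (fire_time + period, seq, period, job))
        return futures

    def close(self):
        self._pool.shutdown(wait=True)


scheduler = TriggerScheduler(workers=2)
scheduler.add(lambda: "sample cpu", period=60, first_fire=0)
scheduler.add(lambda: "sample latency", period=60, first_fire=30)
print([f.result() for f in scheduler.run_due(now=10)])
scheduler.close()
```

Because only the head of the heap has to be inspected to find due triggers, the cost of a scheduling pass depends on how many triggers are due rather than on how many monitor nodes the server hosts.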

Quality-of-Service: Impact of Resource Contention. Sampling jobs may take longer to finish and miss their deadlines (mis-deadlines). Some monitoring tasks may miss sampling points (misfiring).

Quality-of-Service: Challenges in Resolving Resource Contention. Average resource utilization is not a sufficient measure and may lead to wrong decisions. Monitor nodes of the same task must be scheduled to execute at the same time, so the time shift between them should be minimized. [Figure: timelines with 60-second sampling intervals.]

Quality-of-Service: Approach. Intuition: capture the patterns of monitoring task resource usage and of server resource availability, then match the usage pattern with the availability pattern efficiently. Result: 50%-80% reduction in mis-deadlines and misfiring.

Challenges at User Level. Budget-aware monitoring: allow dynamic monitoring resolution based on the available budget. Distributed continuous violation detection: meet the needs of different detection models while achieving efficiency at the same time.

Budget-Aware Monitoring: Cloud and “Pay-as-You-Go”. Pay-as-you-go directly associates computing cost with monetary cost and allows flexible provisioning based on the available budget. Overhead in cloud monitoring: violation processing cost, e.g. provisioning new servers when performance degradation is detected, also consumes the cloud user's budget. What do existing monitoring techniques miss? There is no connection between monitoring utility and monitoring cost, e.g. the budget consumption of a monitoring task is simply unknown, so surprising bills are possible. An ideal type of monitoring.

Budget-Aware Monitoring: Why do we need a new interface? Example: web application auto-scaling, which dynamically adds or removes servers based on performance. Given a budget, how should we configure the monitoring task?

Budget-Aware Monitoring: Monitoring Resolution. Resolution is the granularity of monitoring. We propose to use sliding time windows to control monitoring resolution, e.g. by averaging all sample values within the window.
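
The sliding-window idea can be sketched as follows. This is an illustrative reading of the slide, assuming timestamped samples and a window measured in seconds; the class name `SlidingWindowAverage` is hypothetical.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of all samples that fall within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.samples = deque()  # (timestamp, value), oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Record a sample, evict expired ones, and return the window average."""
        self.samples.append((timestamp, value))
        self.total += value
        # The newest sample is always inside the window, so the deque
        # never becomes empty here (for a positive window size).
        while self.samples[0][0] <= timestamp - self.window:
            _, old = self.samples.popleft()
            self.total -= old
        return self.total / len(self.samples)


threshold = 50.0
window = SlidingWindowAverage(window_seconds=30)
for t, v in [(0, 10.0), (10, 20.0), (20, 90.0), (30, 10.0)]:
    avg = window.add(t, v)
    print(t, v, avg, "raw violation" if v > threshold else "", "windowed violation" if avg > threshold else "")
```

In this run the spike of 90 at t=20 exceeds the threshold on its own but not after averaging, which shows how a wider window coarsens the resolution and reports fewer violations.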

Budget-Aware Monitoring: How does budget-aware monitoring work? Determine the monitoring resolution based on the available budget. When the budget is abundant, use a fine monitoring resolution to detect both trivial and important violations. When the budget is limited, use a coarse monitoring resolution to detect fewer, but important, violations.

Budget-Aware Monitoring: Approach Sketch. Results summary: auto-scaling experiment with RUBiS on Emulab; 20%-40% reduction in response time.

Challenges at User Level (Brief): Distributed Continuous Violation Detection. Instantaneous detection model versus continuous detection model. A small difference in the model makes a big difference in distributed processing. [Figure: a short-term burst versus a persistent violation, each shown against a time window of length L.]
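
The difference between the two detection models can be shown on a single stream of values. The sketch below assumes that L in the figure is the minimum duration, counted here in consecutive samples, for which V > T must hold before a continuous violation is reported; both function names are hypothetical, and the distributed aspect is left out.

```python
def instantaneous_violations(values, threshold):
    """Indices at which V > T holds, judged one sample at a time."""
    return [i for i, v in enumerate(values) if v > threshold]


def continuous_violations(values, threshold, min_length):
    """Indices at which V > T has held for at least `min_length`
    consecutive samples, so short bursts are ignored."""
    hits, run = [], 0
    for i, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        if run >= min_length:
            hits.append(i)
    return hits


values = [10, 90, 10, 85, 90, 95, 10]
print(instantaneous_violations(values, threshold=80))             # [1, 3, 4, 5]
print(continuous_violations(values, threshold=80, min_length=3))  # [5]
```

The burst at index 1 triggers the instantaneous model but not the continuous one, which only fires once the violation has persisted for three samples.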

Challenges at Network Level (Brief): Resource-Aware Monitoring Fabric. Monitoring the functioning of both systems and applications running on large-scale distributed systems requires continuously collecting detailed attribute values from a large number of nodes and for a large number of attributes. Overhead increases quickly as the system, applications, and monitoring tasks scale up. Goal: organize nodes into a monitoring overlay such that per-node resource constraints are not violated and the number of values collected is maximized.

Conclusions and Future Work. Monitoring-as-a-service brings various benefits to applications deployed in the cloud. However, it is also difficult to deliver, as it involves changes at almost all levels. We developed techniques to solve some of the problems; others require further study. Future work: a monitoring API, provisioning of the monitoring service and billing, etc.

Cloud Management Related Work: Scalable Management Middleware for Virtualized Datacenters; Scalable and Cost-Effective IPTV Cloud.

Thank You Questions?