Autonomic SLA-driven Provisioning for Cloud Applications Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer Presented by Ismail Alan.

Presentation transcript:


About the Paper
This paper discusses an economic approach to managing cloud resources for individual applications based on established Service Level Agreements (SLAs). The approach attempts to mitigate the impact, on individual applications, of varying loads and random failures within the cloud.

Cloud Apps – Issue #1: Placement
A distributed, component-based application runs on an elastic infrastructure. In the example, components C1 and C2 share VM1 on Server 1, while C3 runs on VM2 and C4 on VM3, both on Server 2. The application has no control over placement and no information about the other VMs colocated on the same server, so the performance of C1, C2, and C3 is probably lower than that of C4.

Cloud Apps – Issue #2: Instability
Traffic is load-balanced to 4 identical components: C1 runs on VM1 through VM4, each initially responding in about 100 ms. VM performance can vary because of different factors:
– Physical server, hypervisor, storage, ...
– An overloaded component
– A component bug, crash, deadlock, ...
In the example, C1 on VM2 degrades to 140 ms, C1 on VM1 to 130 ms, and C1 on VM4 eventually fails entirely (infinite response time). After the failure of C1 on VM4, the load should be rebalanced across the remaining replicas, and the application should react early!

Cloud Apps – Overview
Build for failures:
– Do not trust the underlying infrastructure.
– Do not trust your components either!
Components should adapt to changing conditions:
– Quickly and automatically, e.g. by replacing a wonky VM with a new one.

Scarce: a framework to build scalable cloud applications

Architecture Overview
An agent runs on each server / VM and operates based on the economic approach:
– It starts, stops, and monitors the components.
– It takes decisions on behalf of the components.
Each agent communicates with the other agents (via gossiping and broadcast), exchanging:
– The routing table
– The status of the server (resource usage)

An Economic Approach
Time is split into epochs. At each epoch, servers charge a virtual rent for hosting a component according to:
– Current resource usage (I/O, CPU, ...) of the server
– Technical factors (HW, connectivity, ...)
– Non-technical factors (location)
Components:
– Pay the virtual rent at each epoch
– Gain virtual money by processing requests
– Take decisions based on their balance (= gain − rent): replicate, migrate, suicide, or stay
Virtual rents are updated by gossiping (no centralized board).
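The rent-and-balance bookkeeping described above can be sketched as follows. The pricing formula is an illustrative assumption (the `hw_factor`, `location_factor`, and congestion terms are hypothetical), not the paper's exact model:

```python
from dataclasses import dataclass

@dataclass
class Server:
    resource_usage: float   # fraction of server resources in use, 0.0-1.0
    hw_factor: float        # technical factor (hardware, connectivity)
    location_factor: float  # non-technical factor (e.g. location)

def virtual_rent(server: Server, usage_by_component: float) -> float:
    """Virtual rent charged to a component for one epoch.

    Illustrative pricing: rent grows with the server's overall load and
    with the share of resources the component itself consumes.
    """
    base = server.hw_factor * server.location_factor
    congestion = 1.0 / max(1e-6, 1.0 - server.resource_usage)
    return base * congestion * usage_by_component

def balance(gain: float, rent: float) -> float:
    """Component balance for one epoch: virtual money earned minus rent."""
    return gain - rent
```

A busy server thus charges more rent, pushing components toward cheaper (less loaded) servers.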

Economic Model (i)
[Equation slide: the balance of a component is defined from the utility of the component, the usage percentage of the server resources by component c, the rent paid by the component, and a migration threshold.]

Economic Model (ii)
Based on a negative balance, a component may migrate or stop:
– Calculate the availability.
– If the availability remains satisfactory, the component stops.
– Otherwise, it tries to find a less expensive server.
Based on a positive balance, a component may replicate:
– Verify that it can afford replication.
– If it can afford replication for consecutive epochs, replicate.
– Otherwise, continue to run.
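The per-epoch decision rules above can be sketched as a single function; the window of `k` consecutive epochs used to smooth the decision is an assumption of this sketch:

```python
def decide(balances, can_afford_replication, availability_ok, k=3):
    """Per-epoch decision of a component, following the rules above.

    balances: recent per-epoch balances, most recent last.
    k: number of consecutive epochs considered (an assumption).
    Returns one of: "replicate", "suicide", "migrate", "stay".
    """
    recent = balances[-k:]
    if recent and all(b < 0 for b in recent):
        # Negative balance: stop if availability stays satisfactory,
        # otherwise look for a less expensive server.
        return "suicide" if availability_ok else "migrate"
    if recent and all(b > 0 for b in recent) and can_afford_replication:
        # Sustained positive balance: the component can afford a replica.
        return "replicate"
    return "stay"
```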

Economic Model (iii)
Choosing a candidate server j during replication/migration of a component i is a net-benefit maximization with two optimization goals:
– High availability, through the geographical diversity of replicas
– Low latency, by grouping related components
Here g_j is a weight related to the proximity of the server location to the geographical distribution of the client requests to the component; S_i is the set of servers hosting a replica of component i; the diversity function returns the geographical distance among each server pair.
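A minimal sketch of this candidate selection, under stated assumptions: the score simply adds a replica-diversity term and the proximity weight g_j, with trade-off weights `alpha` and `beta` that are hypothetical, not the paper's exact net-benefit formula:

```python
import math

def diversity(servers):
    """Sum of pairwise geographical distances among a set of replica
    locations, each given as (x, y) coordinates."""
    return sum(math.dist(s, t)
               for i, s in enumerate(servers)
               for t in servers[i + 1:])

def score(candidate, replica_locations, g, alpha=1.0, beta=1.0):
    """Net-benefit-style score for placing a replica on `candidate`.

    g: weight for the proximity of the candidate to the geographical
    distribution of client requests (g_j on the slide).
    alpha, beta: trade-off weights, an assumption of this sketch.
    """
    return alpha * diversity(replica_locations + [candidate]) + beta * g

def best_candidate(candidates, replica_locations, weights):
    """Pick the candidate server maximizing the score."""
    return max(candidates,
               key=lambda c: score(c, replica_locations, weights[c]))
```

With equal proximity weights, the candidate farthest from the existing replicas wins, which is exactly the high-availability goal.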

SLA Performance Guarantees (i)
Each component has its own SLA constraints, derived directly from the entry components:
Resp. Time = Service Time + max (Resp. Time of Dependencies)
(Diagram: entry component C1 with an SLA of 500 ms, and components C2, C3, C4, C5.)
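The recursion on the slide can be written out directly. The dependency graph and the service times below are hypothetical illustrations (the slide only names C1 through C5 without giving numbers):

```python
def response_time(component, service_time, deps):
    """Resp. time of a component = its own service time plus the maximum
    response time among its dependencies, as on the slide."""
    children = deps.get(component, [])
    if not children:
        return service_time[component]
    return service_time[component] + max(
        response_time(c, service_time, deps) for c in children)

# Hypothetical topology: entry component C1 depends on C2 and C3,
# and C3 depends on C4 and C5. Service times are in milliseconds.
deps = {"C1": ["C2", "C3"], "C3": ["C4", "C5"]}
service_time = {"C1": 100, "C2": 150, "C3": 50, "C4": 200, "C5": 120}
# C3 subtree: 50 + max(200, 120) = 250; C1: 100 + max(150, 250) = 350 ms
```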

SLA Performance Guarantees (ii)
SLA propagation from parents to children:
– A parent j sends its performance constraints (e.g. a response-time upper bound) to its dependencies D(j).
– A child i then computes its own performance constraints from the group of constraints sent by the replicas of the parent.
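A sketch of how a child could derive its bound from its parents' constraints. The rule used here is an assumption (each parent replica grants the child whatever remains of its own bound after its service time, and the child honors the tightest one); the slide does not give the exact formula:

```python
def child_constraint(parent_bounds, parent_service_times):
    """Derive a child's response-time upper bound from the constraints
    sent by the replicas of its parent.

    Assumption of this sketch: a parent with bound b and service time s
    leaves b - s for its dependency; the child takes the minimum over
    all parent replicas.
    """
    remaining = [b - s for b, s in zip(parent_bounds, parent_service_times)]
    return min(remaining)

# Hypothetical numbers: two parent replicas, both bound to 500 ms, with
# service times 100 ms and 150 ms, leave the child a 350 ms budget.
```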

SLA Performance Guarantees (iii)
[Figure: SLA propagation from parents to children.]

Automatic Provisioning
Usage of the allocated resources is maximized:
– Autonomic migration / replication / suicide of components
– This alone is not enough to ensure the end-to-end response time
Each individual component has to satisfy its own SLA:
– SLA easily met → decrease resources (scale down)
– SLA not met → increase resources (scale up, scale out)
Cloud resources are managed by the framework via a cloud API.
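The scale-up / scale-down rule can be sketched as a simple threshold check; the 0.7 slack factor deciding when the SLA counts as "easily met" is an assumption of this sketch:

```python
def provisioning_action(p95_ms, sla_ms, slack=0.7):
    """Scale decision from the observed 95th-percentile latency vs. the SLA.

    slack: if the observed p95 stays below slack * SLA, the SLA is
    considered easily met (the 0.7 value is an assumption).
    """
    if p95_ms > sla_ms:
        return "scale up/out"   # SLA not met: increase resources
    if p95_ms < slack * sla_ms:
        return "scale down"     # SLA easily met: release resources
    return "hold"
```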

Adaptivity to Slow Servers
Each component keeps statistics about its children, e.g. the 95th-percentile response time. A routing coefficient is computed for each child at each epoch, so that more requests are sent to the better-performing children.
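One way to turn the per-child statistics into routing coefficients is sketched below. The inverse-latency weighting is an illustrative assumption (the slide does not give the exact formula):

```python
def routing_coefficients(p95_by_child):
    """Per-epoch routing coefficient for each child replica.

    Illustrative rule: weight each child by the inverse of its
    95th-percentile response time, then normalize, so that faster
    children receive proportionally more requests.
    """
    inv = {c: 1.0 / t for c, t in p95_by_child.items()}
    total = sum(inv.values())
    return {c: w / total for c, w in inv.items()}

# e.g. children at 100 ms, 100 ms, and 200 ms get coefficients
# 0.4, 0.4, and 0.2 respectively.
```

A failed child (infinite response time, as in the instability example) would get a weight approaching zero, effectively routing around it.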

Evaluation

Evaluation: Setup
– An application composed of 5 different components, mostly CPU-intensive
– 8 servers with 8 cores each (Intel Core i7 920, 2.67 GHz, 8 GB RAM, Linux trunk-amd64)
– The components interact with the cloud infrastructure through an API
– Comparison of the Scarce model with a static approach
(Application topology: entry component C1 with an SLA of 500 ms, and components C2, C3, C4, C5.)

Adaptation to Varying Load (i)
The load increases from 5 rps to 60 rps starting at minute 8, in steps of 5 rps per minute. Static setup: 2 servers with 4 cores.
Fig. 9: Throughput of the application during the varying-load experiments.
Fig. 6 (Scarce): Resources used by the application over time for varying request load.

Adaptation to Varying Load (ii)
Same workload: 5 rps to 60 rps starting at minute 8, in steps of 5 rps per minute; static setup with 2 servers with 4 cores.
Fig. 7: Mean response times of the application (SLA: 500 ms) as perceived by remote clients under the adaptive approach ("Scarce") and the static setup.
Fig. 8: 95th-percentile response times of the application (SLA: 500 ms) as perceived by remote clients under Scarce and the static setup.

Adaptation to a Slow Server
Max 2 cores per server, 25 rps. At minute 4, a server gets slower (200 ms delay).
Fig. 13: Resources used by the application over time in case of a "wonky" server.
Fig. 12: Mean and 95th-percentile response times of the application (SLA: 500 ms) as perceived by remote clients in case of a "wonky" server.

Scalability
Add 5 rps per minute until 150 rps; max 6 cores per server.
Fig. 14: Mean and 95th-percentile response times of the application (SLA: 500 ms) as perceived by remote clients in the scalability experiment.
Fig. 16: Resources used by the application over time during the scalability experiment.
Fig. 15 (Scarce): Throughput of the application during the scalability experiment.

Conclusion

Scarce is a framework for building cloud applications that provides:
– Elasticity: add/remove resources
– High availability: tolerates software, hardware, and network failures
– Scalability: handles growing load and peaks, and can scale down; busy components are replicated quickly
– Load balancing: the load is shared by all available servers
  – Replication of busy components
  – Migration of less busy components
  – The system reaches equilibrium when the load is stable
– SLA performance guarantees through automatic provisioning
– No synchronization; fully decentralized

Thank you!