OnCall: Defeating Spikes with Dynamic Application Clusters
Keith Coleman and James Norris
Stanford University
June 3, 2003

Problem & Motivation
Today's static clusters can't respond to flash crowds
Adding new machines to handle spikes is slow and difficult
Serving all spikes quickly requires huge excess capacity => expensive
Excess capacity can't be shared between applications

The OnCall Solution
Hypothesis: A dynamic application cluster can handle spikes more effectively than a static cluster (and at lower cost, to boot)
– Dynamic application cluster based on VMs for generality and legacy support
– Current version is application-generic
– Cluster runs many applications, sharing capacity among them

Platform: Overview
Physical machines running VMware (plus Daemons and Applications)
Scheduler node
L7 load balancers
Network-attached storage

Goals
Allow the same or better performance as a same-size static cluster
Allow higher maximum capacity per app (to handle spikes)
Maintain performability isolation
– A spiking app shouldn't hurt others

Key Innovation: Scheduling
VM scheduling
– Don't want thrashing with 4 GB VMs (hard to avoid)
– Solution: double hysteresis
Based on OS concepts in ESX Server
– But clusters are different from OSes => forced innovations
Keep 3 performance averages (sketched below)
– Slow and fast averages + a future (linear) prediction
– Pick the max of these 3
Market-based idle-tax approach
– Tax idle resources at 75% => not proportional (gives running apps the benefit of the doubt)
– "Shares" allow prioritization (and business)
– Plus, guaranteed computers
  At least g-min computers at all times
  Up to g-max computers guaranteed if needed
  Beyond that, up to the market
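A minimal Python sketch of the two ideas above, not OnCall's actual code: the names, constants, and per-app record (shares, demand, g-min, g-max) are assumptions for illustration. The estimator keeps slow and fast averages plus a linear prediction and returns their maximum; the allocator then hands out machines by shares, discounting idle claims by the 75% tax once the g-min/g-max guarantees are covered.

class LoadEstimator:
    """Double hysteresis: the slow EWMA resists releasing capacity on brief
    dips, while the fast EWMA and a linear prediction react to a rising spike."""

    def __init__(self, slow_alpha=0.05, fast_alpha=0.5, horizon=3):
        self.slow = 0.0
        self.fast = 0.0
        self.prev = 0.0
        self.slow_alpha, self.fast_alpha, self.horizon = slow_alpha, fast_alpha, horizon

    def update(self, sample):
        self.slow += self.slow_alpha * (sample - self.slow)
        self.fast += self.fast_alpha * (sample - self.fast)
        future = sample + self.horizon * (sample - self.prev)  # linear extrapolation
        self.prev = sample
        return max(self.slow, self.fast, future)               # pick the max of the 3


def allocate_machines(total, apps, idle_tax=0.75):
    """apps: list of dicts with name, shares, demand (machines the load estimate
    asks for), g_min, g_max. Assumes total covers the g-min guarantees."""
    alloc = {a["name"]: a["g_min"] for a in apps}      # g-min is always guaranteed
    free = total - sum(alloc.values())
    while free > 0:
        best, best_bid = None, 0.0
        for a in apps:
            name = a["name"]
            if alloc[name] < a["demand"]:
                bid = a["shares"]                      # machine would be used: full bid
            else:
                bid = a["shares"] * (1 - idle_tax)     # idle claim: taxed at 75%
            if alloc[name] < min(a["demand"], a["g_max"]):
                bid = float("inf")                     # guaranteed up to g-max if needed
            if bid > best_bid:
                best, best_bid = name, bid
        if best is None:
            break
        alloc[best] += 1
        free -= 1
    return alloc

Taking the maximum of the three estimates is what makes the hysteresis asymmetric: capacity is granted quickly on the fast or predicted curve but only released once the slow average falls.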

Measuring App Load
For scheduling to work, we need a good measurement of app load
3 options:
– OS statistics. Based on disk/RAM/CPU measurements from the OS
  Definitely inaccurate
  Pipeline problem
– End-to-end. Measure the end-to-end response rate against the typical rate
  Doesn't show usage on the cluster node (the bottleneck could be a backend DB)
  Very application-specific (and OnCall is app-generic)
– Meter. Time a standard benchmark on each host (sketched below)
  Maybe not entirely accurate
  But the best option: it measures actual resource usage without relying on OS reporting
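A toy version of the meter, assuming a fixed CPU-bound micro-benchmark and a recorded idle-host baseline; it is illustrative only and approximates contention rather than exact disk/RAM/CPU usage.

import time

def run_benchmark(n=200_000):
    # Fixed workload; wall-clock time grows as the host gets busier.
    acc = 0
    for i in range(n):
        acc = (acc * 31 + i) % 1_000_003
    return acc

def measured_load(baseline_seconds):
    start = time.perf_counter()
    run_benchmark()
    elapsed = time.perf_counter() - start
    # >1.0 means the host is slower than when idle, i.e. under load.
    return elapsed / baseline_seconds

Dividing by the idle baseline gives a unitless slowdown factor, so the Scheduler can compare hosts without trusting any statistics reported by the guest OS.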

Copying vs. Preboot
We originally planned to test both in the production system, but copying overhead in the prototypes was too high (~3-4 minutes)
Preboot or copy-on-demand is much better

Alternate Apps Test

Spiking App Test

The End
Questions?

Extra Slides: From the Computer Forum Poster Session

Platform: Scheduler
The Scheduler activates VMs/apps to handle traffic
The Scheduler runs as "just another application" in the cluster
– Improves fault tolerance
Gathers resource usage data from the node Daemons

Platform: Daemon
One Daemon per physical machine
Launches and retires VMs/apps
Monitors resource usage for reporting to the Scheduler
Daemons and the Scheduler run on a lease-based system (sketched below)
– If contact is lost (i.e. leases are not renewed), the Scheduler kills itself and the Daemons start a new Scheduler instance
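A sketch of how such a lease might work, with an assumed timeout and in-process calls standing in for the real Scheduler-to-Daemon RPCs; this is an illustration, not the actual implementation.

import time

LEASE_SECONDS = 10            # assumed value; the real lease length isn't given

class Daemon:
    def __init__(self):
        self.lease_expiry = 0.0

    def renew_lease(self):                 # invoked by the Scheduler (an RPC in reality)
        self.lease_expiry = time.time() + LEASE_SECONDS

    def check_scheduler(self):             # run periodically on every node
        if time.time() > self.lease_expiry:
            self.start_new_scheduler()     # e.g. boot the Scheduler VM locally

    def start_new_scheduler(self):
        print("lease expired: launching a replacement Scheduler")

class Scheduler:
    def __init__(self, daemons):
        self.daemons = daemons
        self.last_renewal = time.time()

    def heartbeat(self):
        try:
            for d in self.daemons:
                d.renew_lease()
            self.last_renewal = time.time()
        except ConnectionError:
            pass                           # renewal failed; last_renewal goes stale

    def enforce_lease(self):
        # If we cannot renew, assume we are partitioned and exit so that two
        # Schedulers never drive the cluster at once.
        if time.time() - self.last_renewal > LEASE_SECONDS:
            raise SystemExit("lease lost; a Daemon will start a new Scheduler")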

Platform: "Collective"
The base OnCall system builds on Stanford's "Collective" project
Collective provides primitives for treating VMs as first-class objects
Key features include (toy sketch below):
– Demand paging: block-grain transfer of VM images => instant startup
– Prefetching: sends blocks ahead of time, based on observed demand
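To make the two primitives concrete, a toy sketch (assumed behavior, not the Collective's actual API): VM image blocks are fetched on first access, and a small window of following blocks is read ahead. A real implementation would prefetch asynchronously and drive the window from observed access patterns rather than simple sequential read-ahead.

class DemandPagedImage:
    def __init__(self, fetch_block, prefetch_window=8):
        self.fetch_block = fetch_block        # callable: block number -> bytes
        self.cache = {}                       # locally materialized blocks
        self.prefetch_window = prefetch_window

    def read(self, block_no):
        if block_no not in self.cache:
            self.cache[block_no] = self.fetch_block(block_no)   # demand fetch
            # Sequential read-ahead; done synchronously here only for brevity.
            for b in range(block_no + 1, block_no + 1 + self.prefetch_window):
                if b not in self.cache:
                    self.cache[b] = self.fetch_block(b)
        return self.cache[block_no]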

Spike Response Strategies
Copy VM/application on demand
– Full VM/application images are copied to new physical boxes, replacing other apps running on those boxes, in direct response to traffic changes
– Upside: consumes no extra resources. Downside: potentially slow
Pre-boot VMs
– VMs are pre-booted and simply signaled when traffic load demands their use
– Upside: fast. Downside: consumes resources
Prepare based on historic data (see the sketch below)
– When a spike begins, the resource manager starts copying and/or pre-booting based on historic spike data
– Upside: may improve response speed. Downside: potential complexity increase
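A hypothetical sketch of the third strategy, with the history format and threshold invented for illustration: pre-boot only when the historic profile makes a spike likely, otherwise fall back to on-demand copying.

def choose_strategy(app, hour, history, threshold=0.5):
    """history[app][hour] = fraction of past days in which a spike began
    during this hour of the day (assumed data layout)."""
    p_spike = history.get(app, {}).get(hour, 0.0)
    if p_spike >= threshold:
        return "preboot"            # pay the standing cost only when a spike is likely
    return "copy_on_demand"         # otherwise accept the ~minutes-long copy delay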

Platform: Network
NAT and virtual routers are used to hide the physical network topology from VMs
Also provides:
– Isolation between applications (they are on separate virtual networks)
– Support for legacy applications

Metrics: Spike Handling
To test the hypothesis, we will measure:
– Maximum single-application throughput (as compared with a static cluster)
– Spike response delay
– Inter-application performability isolation

Metrics: Specific Measures
The general metrics above will be obtained through low-level measurements:
– End-to-end throughput (as measured by application-level requests)
– Request latency
– Dropped requests

Metrics: Test Scenarios
Base setup: cluster shared between two or more applications
Simulation: load spike on one application
Simulation: time-varying application demand
Simulation: induced node failures during a spike