Scalability By Alex Huang

Current Status
- 10k resources managed per management server node
- Scales out horizontally (must disable the stats collector)
- Real production deployments of tens of thousands of resources
- Internal testing with software simulators: up to 30k physical resources with 30k VMs managed by 4 management server nodes
- We believe we can at least double that scale per management server node

Balancing Incoming Requests
- Each management server has two worker thread pools for incoming requests: effectively two servers in one.
  – Executor threads provided by Tomcat
  – Job threads waiting on a job queue
- Incoming requests that require mostly DB operations are short in duration and are executed directly by the executor threads, since incoming requests are already load balanced by the load balancer.
- Incoming requests that need resources often run for a long time; they are checked against ACLs by the executor threads, then queued and picked up by job threads.
- The number of job threads is scaled to the number of DB connections available to the management server.
- Requests may take a long time depending on the constraints of the resources, but they don't fail. (A sketch of the two-pool model follows.)
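
A minimal sketch of the two-pool model described above. The names here (RequestDispatcher, checkAcl, the pool sizes) are illustrative assumptions, not CloudStack's actual classes; the executor pool stands in for Tomcat's request threads, and the job pool is deliberately capped at the DB connection count.

    import java.util.concurrent.*;

    // Illustrative sketch only -- not CloudStack's real classes.
    public class RequestDispatcher {
        // Stands in for Tomcat's executor threads: handles short, DB-bound requests.
        private final ExecutorService executorThreads = Executors.newFixedThreadPool(50);

        // Job threads are capped at the number of DB connections available to
        // this management server, so queued jobs never starve for connections.
        private static final int DB_CONNECTIONS = 30;
        private final ExecutorService jobThreads =
                new ThreadPoolExecutor(DB_CONNECTIONS, DB_CONNECTIONS,
                        0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        public void onRequest(Runnable request, boolean needsResources) {
            if (!needsResources) {
                // Short, DB-only request: run it directly on an executor thread.
                executorThreads.submit(request);
            } else {
                // Long-running resource request: ACL check happens on the executor
                // thread, then the work is queued for a job thread to pick up.
                executorThreads.submit(() -> {
                    checkAcl(request);
                    jobThreads.submit(request); // queued; may wait, but won't fail
                });
            }
        }

        private void checkAcl(Runnable request) { /* ACL check placeholder */ }
    }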

The Much Harder Problem
- CloudStack performs a number of tasks on behalf of its users, and those tasks increase with the number of virtual and physical resources available:
  – VM sync
  – SG (security group) sync
  – Hardware capacity monitoring
  – Virtual resource usage statistics collection
  – More to come
- In the hundreds, this is no big deal. As the numbers increase, the problem magnifies.
- How do we scale this work horizontally across management servers?

Comparison of Two Approaches
Stats Collector (collects capacity statistics)
- Fires every five minutes to collect stats on host CPU and memory capacity
- Smart server, dumb client model: the resource only collects the info; the management server processes it
- Runs the same way on every management server
VM Sync
- Fires every minute
- Peer-to-peer model: the resource does a full sync on connection and delta syncs thereafter; the management server trusts the resource for correct information
- Only runs against resources connected to that management server node
(A sketch contrasting the two models follows.)
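
The difference is easiest to see side by side. The sketch below is hypothetical (HostApi, onDelta, and the other names are illustrative, not CloudStack's APIs): the stats collector actively polls every host on a schedule, while VM sync registers a callback once and lets the resource push deltas.

    import java.util.List;
    import java.util.concurrent.*;

    // Illustrative only: contrasts poll-based stats collection with push-based VM sync.
    public class SyncModels {
        interface HostApi { String fetchCapacityStats(); }

        // Model 1: Stats Collector -- the management server polls every host
        // every five minutes, occupying a thread per outstanding request.
        static void startStatsCollector(List<HostApi> hosts) {
            ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
            scheduler.scheduleAtFixedRate(() -> {
                for (HostApi host : hosts) {
                    String stats = host.fetchCapacityStats(); // one request per host
                    process(stats);
                }
            }, 0, 5, TimeUnit.MINUTES);
        }

        interface Connection {
            String requestFullSync();
            void onDelta(java.util.function.Consumer<String> handler);
        }

        // Model 2: VM Sync -- the resource pushes a full sync on connection,
        // then only out-of-band deltas; no polling threads on the server side.
        static void onResourceConnected(Connection conn) {
            process(conn.requestFullSync());       // one request at connect time
            conn.onDelta(delta -> process(delta)); // thereafter, resource pushes changes
        }

        static void process(String data) { /* update DB / in-memory state */ }
    }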

Numbers
Assume 10k hosts and 500k VMs (50 VMs per host).
Stats Collector
- Fires off 10k requests every 5 minutes, i.e. about 33 requests a second (10,000 / 300s).
- Bad but not too bad: occupies 33 threads every second.
- But just wait: with 2 management servers, 66 requests a second; with 3, 99 requests.
- It gets worse as the number of management servers increases, because the collector does not auto-balance across management servers.
- And worse still: the 10k hosts are now spread across the 3 management servers, so while 99 requests are generated, the number of threads involved is roughly three-fold, because each request must be routed to the management server that owns the host. (The arithmetic is worked through in the snippet after this slide.)
- This keeps each management server about 20% busy even with no load from incoming requests.
VM Sync
- Fires off 1 request at resource connection to sync about 50 VMs.
- After that, the resource pushes: it knows what it has already reported and only pushes changes that happen out of band.
- So essentially no threads occupied, for a much larger data set.
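
For concreteness, the slide's request-rate arithmetic in a few lines of Java. These are the slide's assumed numbers, not measurements.

    // Request-rate arithmetic from the slide: 10k hosts polled every 5 minutes.
    public class StatsLoad {
        public static void main(String[] args) {
            int hosts = 10_000;
            int periodSeconds = 5 * 60;
            int perServer = hosts / periodSeconds; // 33 requests/second per server
            for (int servers = 1; servers <= 3; servers++) {
                // Every management server polls all 10k hosts, so the request
                // rate multiplies with the server count instead of dividing.
                System.out.printf("%d MS -> %d req/s%n", servers, servers * perServer);
            }
        }
    }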

What’s the Down Side?
- Resources must reconcile the VM states caused by management server commands with the VM states they collect from the physical hardware, so they require more CPU.
- Resources must use more memory to keep what amounts to a journal of changes since the last sync point.
- But data centers are full of these two resources. (A sketch of such a journal follows.)
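
A minimal sketch of the kind of change journal a resource might keep, assuming a simple map from VM name to its last-reported state; VmSyncJournal and VmState are illustrative names, not CloudStack's.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative resource-side sync journal: remembers what was last reported
    // to the management server and emits only the differences.
    public class VmSyncJournal {
        enum VmState { RUNNING, STOPPED, MIGRATING }

        private final Map<String, VmState> lastReported = new HashMap<>();

        // Compare the states observed on the hypervisor against the last sync
        // point and return only the out-of-band changes to push.
        public Map<String, VmState> computeDelta(Map<String, VmState> observed) {
            Map<String, VmState> delta = new HashMap<>();
            for (Map.Entry<String, VmState> e : observed.entrySet()) {
                if (lastReported.get(e.getKey()) != e.getValue()) {
                    delta.put(e.getKey(), e.getValue());
                }
            }
            lastReported.putAll(delta); // advance the sync point
            return delta;
        }
    }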

Resource Load Balancing
- As a management server is added to the cluster, resources are rebalanced seamlessly:
  – MS2 signals MS1 to hand over a resource
  – MS1 waits for in-flight commands on the resource to finish
  – MS1 holds further commands in a queue
  – MS1 signals MS2 to take over
  – MS2 connects
  – MS2 signals MS1 that the transfer is complete
  – MS1 discards its resource and flows the held commands to MS2
- Listeners are provided so business logic can listen on connection status and adjust its work based on who's connected.
- By working only on resources connected to the management server the process is on, work is auto-balanced between management servers. This also reduces message routing between the management servers. (A sketch of the handover steps follows.)
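
The handover reads as a small state machine. The outline below is an assumption-laden sketch (the class and method names are made up for illustration, and the messaging layer is elided); it simply mirrors the seven steps listed above, not CloudStack's actual rebalancing code.

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Hypothetical outline of the resource handover between two management servers.
    public class ResourceHandover {
        interface Command { void execute(); }
        interface ManagementServer {
            void connectTo(String resourceId);
            void accept(Command cmd);
        }

        private final Queue<Command> heldCommands = new ArrayDeque<>();

        // Runs on MS1 after MS2 signals it to hand over resourceId (step 1).
        public void handOver(String resourceId, ManagementServer ms2) {
            waitForInFlightCommands(resourceId); // step 2: drain current work
            // step 3: from here on, new commands land in heldCommands instead
            signalTakeOver(resourceId, ms2);     // step 4
            ms2.connectTo(resourceId);           // step 5: MS2 connects
            // step 6: MS2 signals transfer complete (messaging layer omitted)
            discardResource(resourceId);         // step 7a: MS1 lets go
            while (!heldCommands.isEmpty()) {    // step 7b: flow held commands to MS2
                ms2.accept(heldCommands.poll());
            }
        }

        private void waitForInFlightCommands(String resourceId) { /* ... */ }
        private void signalTakeOver(String resourceId, ManagementServer ms2) { /* ... */ }
        private void discardResource(String resourceId) { /* ... */ }
    }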

Designing for Scalability
- Take advantage of the most abundant resources in a data center (CPU, RAM).
- Auto-scale to the least abundant resource (the DB).
- Do not hold DB connections or transactions across resource calls:
  – Use a lock table implementation (Merovingian2 or the GenericDao.acquireLockInTable() call) instead of database row locks in this situation.
  – Database row locks are still fine for quick, short lockouts.
- Balance resource-intensive tasks as the number of management server nodes increases and decreases:
  – Use job queues to balance long-running processes across management servers.
  – Make use of resource rebalancing in CloudStack to auto-balance your workload. (A lock-table usage sketch follows.)
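
A sketch of the lock-table pattern named above. The exact GenericDao signature here is an assumption based on the slide's description, and the release counterpart is invented for symmetry; the point is the shape: take a named lock from a lock table, make the long resource call without pinning a DB connection or transaction, and release in a finally block.

    // Sketch of the lock-table pattern; GenericDao's real signature may differ.
    public class HostMaintenance {
        interface GenericDao {
            boolean acquireLockInTable(String lockName); // assumed signature
            void releaseLockInTable(String lockName);    // assumed counterpart
        }

        private final GenericDao hostDao;

        public HostMaintenance(GenericDao hostDao) {
            this.hostDao = hostDao;
        }

        public void prepareForMaintenance(long hostId) {
            // Take a lock from the lock table rather than a DB row lock, so no
            // DB connection/transaction is held across the long resource call.
            if (!hostDao.acquireLockInTable("host." + hostId)) {
                throw new IllegalStateException("host " + hostId + " is busy");
            }
            try {
                callResource(hostId); // long-running call to the physical resource
            } finally {
                hostDao.releaseLockInTable("host." + hostId);
            }
        }

        private void callResource(long hostId) { /* agent command, etc. */ }
    }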

Reliability By Alex Huang

The Five W’s of Unreliability
- What is unreliable? Everything.
- Who is unreliable? Developers & administrators.
- When does unreliability happen? 3:04 a.m., no matter which time zone… Any time.
- Where does unreliability happen? In carefully planned, everything-has-been-considered data centers.
- How does unreliability happen? Rather nonchalantly.

Dealing with Unreliability
- Don’t assume!
- Don’t bang your head against the wall!
- Know when you don’t know any better. Ask for help!

Designs against Unreliability
- Each management server keeps a heartbeat with the DB: one write a minute.
- A management server self-fences if it cannot write its heartbeat.
- Other management servers wait to make sure the down management server is no longer writing its heartbeat, then signal interested software to recover.
- Checkpoints at every call to a resource, and code to recover from those checkpoints.
- Database records are not actually deleted, to help with manual recovery when needed.
- Write code that is idempotent.
- Respect modularity when writing your code. (A heartbeat sketch follows.)
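
A minimal sketch of the DB heartbeat and self-fencing described above, under assumed names (HeartbeatDao, fence()); CloudStack's real implementation differs, but the shape is: write a heartbeat row once a minute, and stop yourself if the write fails.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Illustrative heartbeat/self-fencing loop; names are assumptions.
    public class ManagementServerHeartbeat {
        interface HeartbeatDao {
            // Refreshes this server's heartbeat row; false or an exception
            // means the DB write did not succeed.
            boolean writeHeartbeat(long managementServerId);
        }

        private final HeartbeatDao dao;
        private final long msId;
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public ManagementServerHeartbeat(HeartbeatDao dao, long msId) {
            this.dao = dao;
            this.msId = msId;
        }

        public void start() {
            // One heartbeat write per minute.
            scheduler.scheduleAtFixedRate(() -> {
                boolean ok;
                try {
                    ok = dao.writeHeartbeat(msId);
                } catch (RuntimeException e) {
                    ok = false;
                }
                if (!ok) {
                    fence(); // cannot prove liveness to peers: self-fence
                }
            }, 0, 1, TimeUnit.MINUTES);
        }

        private void fence() {
            // Stop doing work immediately so a peer can safely take over.
            scheduler.shutdownNow();
            Runtime.getRuntime().halt(1);
        }
    }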