CloudStack Scalability Testing, Development, Results, and Futures Anthony Xu Apache CloudStack contributor.

Slides:



Advertisements
Similar presentations
Software Defined Networking in Apache CloudStack
Advertisements

Cloud computing is used to describe a variety of computing concepts that involve a large number of computers connected through a real-time communication.
2  Industry trends and challenges  Windows Server 2012: Beyond virtualization  Complete virtualization platform  Improved scalability and performance.
HetnetIP Ethernet BackHaul Configuration Automation Demo.
System Center 2012 R2 Overview
Apache CloudStack Evolution Proposal Alex Huang Software Architect, Citrix Systems.
“It’s going to take a month to get a proof of concept going.” “I know VMM, but don’t know how it works with SPF and the Portal” “I know Azure, but.
4/16/2017 © 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks.
BETA!BETA! Building a secure private cloud on Microsoft technologies Private cloud security concerns Security & compliance in a Microsoft private cloud.
DESIGN CONSIDERATIONS OF A GEOGRAPHICALLY DISTRIBUTED IAAS CLOUD ARCHITECTURE CS 595 LECTURE 10 3/20/2015.
Introduction to DoC Private Cloud
Installing and Setting up mongoDB replica set PREPARED BY SUDHEER KONDLA SOLUTIONS ARCHITECT.
Name Title Microsoft Windows Azure: Migrating Web Applications.
Getting Started with Oracle Compute Cloud
VM Role (PaaS)Virtual Machine (IaaS) StorageNon-Persistent StoragePersistent Storage Easily add additional storage DeploymentBuild VHD offsite and upload.
Memi Lavi Senior Consultant MCS, Microsoft Israel Self Service Private Cloud With Windows Azure Pack.
Additional SugarCRM details for complete, functional, and portable deployment.
Data Center Network Redesign using SDN
Scalability By Alex Huang. Current Status 10k resources managed per management server node Scales out horizontally (must disable stats collector) Real.
Opensource for Cloud Deployments – Risk – Reward – Reality
Yury Kissin Infrastructure Consultant Storage improvements Dynamic Memory Hyper-V Replica VM Mobility New and Improved Networking Capabilities.
Cloud Computing for the Enterprise November 18th, This work is licensed under a Creative Commons.
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 7 2/23/2015.
DevCloud and CloudMonkey in Apache CloudStack
+ CS 325: CS Hardware and Software Organization and Architecture Cloud Architectures.
Cloud Computing & Amazon Web Services – EC2 Arpita Patel Software Engineer.
Windows Azure Conference 2014 Deploy your Java workloads on Windows Azure.
608D CloudStack 3.0 Omer Palo Readiness Specialist, WW Tech Support Readiness May 8, 2012.
DCIM211. By 2015, 35% of enterprise IT expenditures for most organizations will be managed outside the IT department’s budget IT budget.
Stu Fox Datacom Systems Ltd. ON-PREMISES SERVICE PROVIDERMICROSOFT CONSISTENT PLATFORM Modern platform for the world’s apps 1.
Uwe Lüthy Solution Specialist, Core Infrastructure Microsoft Corporation Integrated System Management.
Server Performance, Scaling, Reliability and Configuration Norman White.
Microsoft Virtual Academy. STANDARDIZATION SELF SERVICEAUTOMATION Give Customers of IT services the ability to identify, access and request services.
ON-PREMISES SERVICE PROVIDERMICROSOFT CONSISTENT PLATFORM Modern platform for the world’s apps 1.
Visual Studio Windows Azure Portal Rest APIs / PS Cmdlets US-North Central Region FC TOR PDU Servers TOR PDU Servers TOR PDU Servers TOR PDU.
Windows Azure Virtual Machines Anton Boyko. A Continuous Offering From Private to Public Cloud.
Manage your cloud with Citrix CloudPortal Services Manager 10 Jared Engskow Senior Technical Readiness Specialist May 8, 2010.
Stairway to the cloud or can we take the highway? Taivo Liik.
Cloud Computing is a Nebulous Subject Or how I learned to love VDF on Amazon.
Windows ® Azure ™ Platform. Network Architecture Packet Filtering Built-In Firewalls Connect Service SSL WCF Security Agenda.
Nagender Vedula & Bradley Bartz ON-PREMISES SERVICE PROVIDERMICROSOFT CONSISTENT PLATFORM Modern platform for the world’s apps 1.
Scalability == Capacity * Density.
Enabling the Cloud OS Today  New high-density Web Sites with elastic cloud scaling and complete dev-ops experiences  New rich IaaS experience for self-service.
Deploying Highly Available SQL Server in Windows Azure A Presentation and Demonstration by Microsoft Cluster MVP David Bermingham.
Cofax Scalability Document Version Scaling Cofax in General The scalability of Cofax is directly related to the system software, hardware and network.
Brian Lauge Pedersen Senior DataCenter Technology Specialist Microsoft Danmark.
Architecting Enterprise Workloads on AWS Mike Pfeiffer.
Azure.
Apache CloudStack An Introduction Kevin Kluge
Lead SQL BankofAmerica Blog: SQLHarry.com
Network Load Balancing
Windows Azure Pack : Express Installation
Chapter 21: Cloud Computing and Related Security Issues
Red Hat User Group June 2014 Marco Berube, Cloud Solutions Architect
Chapter 22: Cloud Computing Technology and Security
OpenStack Ani Bicaku 18/04/ © (SG)² Konsortium.
Azure.
Migration Strategies – Business Desktop Deployment (BDD) Overview
Managing Clouds with VMM
20409A 7: Installing and Configuring System Center 2012 R2 Virtual Machine Manager Module 7 Installing and Configuring System Center 2012 R2 Virtual.
Outline Virtualization Cloud Computing Microsoft Azure Platform
Cloud computing mechanisms
Cloud Computing Architecture
A Network Operating System Edited By Maysoon AlDuwais
Cloud Computing: Concepts
Azure Container Service
Microsoft Virtual Academy
Setting up PostgreSQL for Production in AWS
PayPal Cloud Journey & Architecture
06 | SQL Server and the Cloud
Presentation transcript:

CloudStack Scalability Testing, Development, Results, and Futures Anthony Xu Apache CloudStack contributor

Secure, multi-tenant cloud orchestration platform –Turnkey platform for delivering IaaS clouds –Hypervisor agnostic –Highly scalable, secure and open –Complete Self-service portal –Open source, open standards –Deploys on premise Apache CloudStack: a project in incubation

Router L3 Core Switch Top of Rack Switch … … … … … Availability Zone 1 Servers Management Server Cluster Object Storage Pod 1 Pod 2 Pod 3 Pod N Primary MySQL Load Balancer Admin Internet Backup MySQL Manage hosts, create VMs, virtual disks, virtual networks, meter usage, ….

Thinking about cloud orchestration at scale Host management Capacity management What host to use to deploy a new VM Failure handling Security group propagation Set a goal

We can’t afford this as our QA lab

User API Admin API Load Balancer Mgmt. Server MySQL Zone Simulator MySQL Simulator enables scale testing Mgmt. Server

User API Admin API Load Balancer Mgmt. Server MySQL Zone Simulator MySQL Environment Mgmt. Server 2 cores, 4 with Hyper Threading. 2.2 GHz Xeon. 16 GB RAM. 12 GB JVM Heap. Single spinning disk, later single SSD. 32 GB RAM. MySQL 5.5.

Allocator performance is awful with 1000 hosts Two minutes to decide which host to use for a new VM! Computing capacity for every pod repeatedly Fixed that, but still 12 seconds to decide Use host tags, down to 2 seconds Major changes required to improve further In 2.2.0, store capacity info in DB, skip pod altogether Harness the power of SQL select and all is well

Polling doesn’t scale TRUE?FALSE? Sometimes, it is good enough

Host management Check host state via TCP connection Check every minute 30,000 checks per minute, 500 per second But they take 10 seconds, so 5000 in parallel Not using async I/O so 5000 threads required… Single JVM can support threads so this is concerning but may not be the limiting factor

Host management What is the maximum feasible JVM heap size? Some people use heaps with hundreds of GB Commercial tools can help, but cost We decided to stay below 20 GB (GC concerns) How much CPU is required for background processing?

CPU utilization while deploying 30,000 VMs on 30,000 hosts CPU Utilization. 400% is maximum Time 20, Idle

Deploy time from 25,000 to 30,000 VMs Seconds to deploy VM number: 25,000 plus X

Problem: agent load balancing Management servers start/stop/fail/crash How do newly started Management Servers get agents / work? When a Management Server exits, how do others pick up its load? When new hosts are added how is the load distributed? Mgmt Server 1 Mgmt Server 2 Agent 3Agent 4Agent 5Agent 6Agent 1Agent 2

Common use case timings at scale 30,000 hosts and 4 Management Servers 4 Management Servers running, 1 fails: 10 minutes to redistribute 7500 agents 3 Management Servers running, add a fourth: 40 minutes to redistribute load evenly 0 Management Servers running, start all 4 simultaneously: 16 minutes to connect to all 30,000 hosts IMPORTANT

… DB Security Group Web Security Group Understanding security groups …… Web VM DB VM Web VM DB VM Web VM Ingress Rule: Allow VMs in Web Security Group access to VMs in DB Security Group on Port 3306

L3 isolation with distributed firewalls Tenant 1 VM Tenant 2 VM Tenant 1 VM Public Internet Public IP address Load Balancer L3 Core Pod 1 L2 Switch Pod 3 L2 Switch … … Pod 2 L2 Switch

L3 isolation with distributed firewalls Tenant 1 VM Tenant 2 VM Tenant 1 VM Tenant 1 VM Tenant 1 VM Public Internet Public IP address Load Balancer L3 Core Pod 1 L2 Switch Pod 3 L2 Switch … … Pod 2 L2 Switch

L3 isolation with distributed firewalls Tenant 1 VM Tenant 2 VM Tenant 1 VM Tenant 2 VM Tenant 2 VM Tenant 1 VM Tenant 1 VM Public Internet Public IP address Load Balancer L3 Core Pod 1 L2 Switch Pod 3 L2 Switch … … Pod 2 L2 Switch

1 Firewall per Virtual Machine

VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … VMVM VMVM VMVM … One million firewalls?

Well-known software scaling techniques Message queues Consistency tradeoffs Idempotent configuration & retries CloudStack uses Special purpose queues Optimized for large security groups Eventual consistency for rule updates Orchestrating hundreds of thousands of firewalls

Problem: firewall rules explosion in dom0 -A FORWARD -m tcp –p tcp –dport 3060 –src – j ACCEPT -A FORWARD -m tcp –p tcp –dport 3060 –src – j ACCEPT -A FORWARD -m tcp –p tcp –dport 3060 –src – j ACCEPT -A FORWARD -m tcp –p tcp –dport 3060 –src – j ACCEPT … Performance suffers for large security groups Allow Security Group {Web} on TCP port 3060

ipset –N web_sg iptreemap ipset –A web_sg ipset –A web_sg ipset –A web_sg ipset –A web_sg A FORWARD –p tcp –m tcp –dport 3060 –m set –match-set web_sg src -j ACCEPT … Fix with ipsets: Problem: firewall rules explosion in dom0 See also

Security group propagation time Seconds to fully synced Number of VMs in security group

Problem: database connection management Scale testing resulted in several “too many open connections” errors from MySQL Common problem: holding open connections while doing long-running operations Took some code clean up and refactoring No longer an issue 10,000 connections are OK CloudStack is far below that

DB connections per MS while deploying 30,000 VMs Number of DB connections Time 20,000 5,000

Other considerations (beyond control plane) Network design and devices Object store scalability Per-host and cluster scalability Storage Understand your workload

Future work Improve simulator accuracy Publish results of advanced network (VLAN) testing Verify assumption of VM density not impacting scale

More information and joining the project Project web site: Mailing lists: Scalability study:

Q&A