Automatic Scaling of Internet Applications for Cloud Computing Services
Zhen Xiao, Qi Chen, and Haipeng Luo
May 2013
To appear in IEEE Transactions on Computers
Outline
1. Introduction
2. Related Work
3. System Architecture
4. Our Design
5. Simulations
6. Experiments
7. Conclusion
Benefits of Cloud Computing
Auto-scaling: cloud computing allows business customers to scale their resource usage up and down based on need.
[Figure: an L7 switch in front of a pool of servers, each running a hypervisor]
Myth about Cloud Computing
Myth #1: cloud computing provides infinite resources on demand.
Reality: it is just statistical multiplexing.
[Figure: an L7 switch in front of a pool of hypervisor-hosted servers]
When and where should we start a VM for an application?
Goals of Scheduling
Achieve good demand satisfaction
The percentage of application demand that is satisfied should be maximized, even when a large number of applications experience their peak demand around the same time.
Support green computing
The number of servers used should be minimized as long as they can still satisfy the needs of all VMs. Idle servers can be turned off to save energy.
Outline
1. Introduction
2. Related Work
3. System Architecture
4. Our Design
5. Simulations
6. Experiments
7. Conclusion
Related Work
Auto-scaling in Amazon EC2 (Scalr)
Works for one application; provides load balancing.
Google AppEngine
Supports Java & Python; the secure sandbox environment has strict limitations; cannot support existing applications.
Microsoft Windows Azure
Applications should be stateless; users maintain the number of instances.
Outline
1. Introduction
2. Related Work
3. System Architecture
4. Our Design
5. Simulations
6. Experiments
7. Conclusion
System Architecture
[Figure: incoming requests flow through an L7 switch to a Dispatcher with a request counter; an Application Scheduler plugin exchanges application info, load distribution, instance lists, algorithm adjustments, and placement decisions with the Usher CTRL, which monitors Usher LNMs running in Dom 0 on each Xen hypervisor alongside Dom U guests]
Fast Start
Complicated applications can take a long time (several minutes) to finish all their initializations.
Suspend and resume
Resumption time is independent of the start-up time. It depends on how fast the server can read the suspended VM file from disk, which is quite short (several seconds) with modern disk technology.
Start-up time is reduced by 70% for a VM with 1 GB of memory.
[Figure: a suspended VM memory file on disk is resumed into a new VM instance]
Green Computing
Put idle servers into standby mode so that they can be woken up quickly on demand.
TPC-W workloads:
A fully utilized server consumes about 205 Watts.
An idle server consumes about 130 Watts.
A server in standby mode consumes about 20 Watts.
Putting an idle server into standby mode saves about 85% of its energy.
Wake-on-LAN (WOL) technology:
Standby-to-active transition time is 1-2 seconds.
Suspend (to RAM)-to-active transition time is 3-5 seconds.
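As a concrete illustration of the WOL mechanism mentioned above: a magic packet is simply 6 bytes of 0xFF followed by the target MAC address repeated 16 times, sent as a UDP broadcast. The sketch below is illustrative only; the helper names and the port choice are not part of the paper's system.

```python
import socket

def magic_packet(mac: str) -> bytes:
    """Build a WOL magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must have 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet as a UDP datagram (port 9 is customary)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
```

The NIC of the standby server listens for this packet even while the host is powered down, which is what makes the 1-2 second standby-to-active transition possible.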
Outline
1. Introduction
2. Related Work
3. System Architecture
4. Our Design
5. Simulations
6. Experiments
7. Conclusion
Problem definition
Our Design
Bin packing: items with different sizes are packed into a minimum number of bins.
Class Constrained Bin-Packing Problem (CCBP):
The size of each item is one unit.
Each bin has capacity v.
Items are divided into classes.
Each bin can accommodate items from at most c distinct classes.
Modeling our problem as CCBP:
Each server is a bin.
Each class represents an application.
Items from a specific class represent that application's resource demands.
The capacity of a bin is the amount of resource at a server.
The class constraint reflects the maximum number of applications a server can run simultaneously.
Our Design
[Figure: example packing with c = 2 and v = 5; items of App 1, App 2, and App 3 are packed onto Server 1 and Server 2, with at most two applications per server]
Our Design
Enhanced color-set algorithm:
All sets contain exactly c colors except possibly the last one.
Items from different color sets are packed independently using a greedy algorithm.
Resource needs of applications can vary with time; applications can join and leave.
A key observation of our work: not all item movements are equally expensive.
Creating a new application instance is expensive.
Adjusting the load distribution is cheaper.
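A minimal sketch of the color-set idea, not the paper's exact algorithm: partition applications (colors) into sets of at most c colors, then greedily fill bins of capacity v within each set. Because every bin draws only from its own color set, the class constraint holds by construction. The data layout (a dict of integer demands per application) is an assumption for illustration.

```python
def pack(demands: dict, v: int, c: int) -> list:
    """Greedy color-set packing sketch: each bin (server) holds at most
    v unit items drawn from a single color set of at most c applications."""
    apps = list(demands)
    color_sets = [apps[i:i + c] for i in range(0, len(apps), c)]
    bins = []
    for cs in color_sets:
        current, free = {}, v
        for app in cs:
            remaining = demands[app]
            while remaining > 0:
                if free == 0:            # bin full: open a new server
                    bins.append(current)
                    current, free = {}, v
                take = min(remaining, free)
                current[app] = current.get(app, 0) + take
                remaining -= take
                free -= take
        if current:                      # at most one unfilled bin per set
            bins.append(current)
    return bins
```

For demands {App1: 3, App2: 4, App3: 3} with v = 5 and c = 2, this yields three bins, only the last bin of each color set being unfilled, which matches the structure the algorithm maintains.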
Demand Varies with Time
Load increase: arrivals of new items.
[Figure: with c = 4 and v = 5, newly arrived items of App 1, App 2, and App 3 are added to bin 1 and bin 2, spilling into the unfilled bin]
Demand Varies with Time
Load decrease: departure of already packed items.
[Figure: with c = 4 and v = 5, departed items of App 1, App 2, and App 3 are removed from bin 1 and bin 2, leaving an unfilled bin]
Mathematical Analysis
The ratio R is determined mostly by c * t (the total load of all applications in a color set).
Practical Considerations
Server equivalence classes:
Divide the servers into "equivalence classes" based on their hardware settings.
Run our algorithm within each equivalence class.
Periodic execution.
Optimizations:
Each color set has at most one unfilled bin.
Use unfilled bins to satisfy the applications whose demands are not completely satisfied.
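The equivalence-class step is a simple keyed grouping; the hardware fields used as the key below (cpu, cores, ram_gb) are hypothetical placeholders, not fields named in the paper.

```python
from collections import defaultdict

def equivalence_classes(servers: list) -> dict:
    """Group servers by hardware settings; the packing algorithm is then
    run independently within each group."""
    groups = defaultdict(list)
    for s in servers:
        key = (s["cpu"], s["cores"], s["ram_gb"])
        groups[key].append(s["name"])
    return dict(groups)
```

Running the algorithm per group keeps bin capacity v uniform within each run, so a single capacity value describes every server in the class.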
Outline
1. Introduction
2. Related Work
3. System Architecture
4. Our Design
5. Simulations
6. Experiments
7. Conclusion
Simulations
Application Demand Ratio
1000 servers and 1000 applications.
Scalability
Increase both the number of servers and the number of applications from 1000 to 10,000.
Application Number
Fix the number of servers at 1000; vary the number of applications from 200 to 2000.
Outline
1. Introduction
2. Related Work
3. System Architecture
4. Our Design
5. Simulations
6. Experiments
7. Conclusion
Experiments
30 Dell PowerEdge servers with Intel E5620 CPUs (8 cores) and 24 GB of RAM, running Xen 4.0 and Linux 2.6.18.
Web applications: Apache servers serving CPU-intensive PHP scripts.
Clients: httperf is used to invoke the PHP scripts.
Load shifting
Auto Scaling
[Figures: green computing and flash crowd scenarios]
Auto Scaling
Compared with Scalr (Amazon EC2) under a flash crowd.
Auto Scaling
Our algorithm restores normal QoS in less than 5 minutes, while Scalr still suffers much degraded performance even after 25 minutes.
Outline
1. Introduction
2. Related Work
3. System Architecture
4. Our Design
5. Simulations
6. Experiments
7. Conclusion
Conclusion
We presented the design and implementation of a system that automatically scales the number of application instances up and down based on demand.
An enhanced color-set algorithm decides the application placement and the load distribution.
The system achieves a high satisfaction ratio even when the load is very high.
It saves energy by reducing the number of running instances when the load is low.
Thank You!
Application Joins and Leaves
Application leaves:
Load decrease: demand → 0.
Shut down the instances and remove the color from its set.
Application joins:
Sort the unfilled color sets by the number of colors in decreasing order.
Use a greedy algorithm to add the new colors into those sets.
While there is more than one unfilled set:
Sort the unfilled color sets by the number of colors in decreasing order.
Use the last set in the list to fill the first set.
If new colors remain, partition them into additional color sets.
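The fill-first-from-last merge step above can be sketched as follows. This is a simplified illustration using plain Python sets of application names; the function name and data layout are assumptions, and the function mutates its input sets.

```python
def consolidate(color_sets: list, c: int) -> list:
    """Merge unfilled color sets (fewer than c colors) by moving colors
    from the smallest set into the largest, until at most one unfilled
    set remains."""
    full = [s for s in color_sets if len(s) == c]
    unfilled = sorted((s for s in color_sets if 0 < len(s) < c),
                      key=len, reverse=True)
    while len(unfilled) > 1:
        first, last = unfilled[0], unfilled[-1]
        while len(first) < c and last:   # fill the first set from the last
            first.add(last.pop())
        if len(first) == c:              # first set became full
            full.append(unfilled.pop(0))
        if not last:                     # last set was emptied out
            unfilled.pop()
    return full + unfilled
```

Keeping at most one unfilled color set matters because, as noted earlier, each color set contributes at most one unfilled bin, so fewer unfilled sets mean fewer partially used servers.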
Scheduling Model
[Figure: items of App 1, App 2, and App 3 distributed across Server 1, Server 2, and Server 3]