Harold C. Lim, Shinath Baba and Jeffery S. Chase from Duke University AUTOMATED CONTROL FOR ELASTIC STORAGE Presented by: Yonggang Liu Department of Electrical.

Slides:

Advertisements

Similar presentations

Capacity Planning in a Virtual Environment

Advertisements

SLA-Oriented Resource Provisioning for Cloud Computing

EHarmony in Cloud Subtitle Brian Ko. eHarmony Online subscription-based matchmaking service Available in United States, Canada, Australia and United Kingdom.

Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng.

Towards Autonomic Adaptive Scaling of General Purpose Virtual Worlds Deploying a large-scale OpenSim grid using OpenStack cloud infrastructure and Chef.

Cloud Computing Resource provisioning Keke Chen. Outline  For Web applications statistical Learning and automatic control for datacenters  For data.

Enabling High-level SLOs on Shared Storage Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Randy Katz, Ion Stoica Cake 1.

Reciprocal Resource Fairness: Towards Cooperative Multiple-Resource Fair Sharing in IaaS Clouds School of Computer Engineering Nanyang Technological University,

CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, John Wilkes.

Efficient Autoscaling in the Cloud using Predictive Models for Workload Forecasting Roy, N., A. Dubey, and A. Gokhale 4th IEEE International Conference.

Towards Energy Efficient Hadoop Wednesday, June 10, 2009 Santa Clara Marriott Yanpei Chen, Laura Keys, Randy Katz RAD Lab, UC Berkeley.

Towards Energy Efficient MapReduce Yanpei Chen, Laura Keys, Randy H. Katz University of California, Berkeley LoCal Retreat June 2009.

© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Automated Workload Management in.

1 Exploring Data Reliability Tradeoffs in Replicated Storage Systems NetSysLab The University of British Columbia Abdullah Gharaibeh Matei Ripeanu.

EA and IT Infrastructure - 1© Minder Chen, Stages in IT Infrastructure Evolution Mainframe/Mini Computers Personal Computer Client/Sever Computing.

Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc

1 Integrating a Network IDS into an Open Source Cloud Computing Environment 1st International Workshop on Security and Performance in Emerging Distributed.

1 Exploring Data Reliability Tradeoffs in Replicated Storage Systems NetSysLab The University of British Columbia Abdullah Gharaibeh Advisor: Professor.

Abstract Cloud data center management is a key problem due to the numerous and heterogeneous strategies that can be applied, ranging from the VM placement.

A User Experience-based Cloud Service Redeployment Mechanism KANG Yu.

A Brief Overview by Aditya Dutt March 18 th ’ Aditya Inc.

Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.

Report ： Zhen Ming Wu 2008 IEEE 9th Grid Computing Conference.

Middleware Enabled Data Sharing on Cloud Storage Services Jianzong Wang Peter Varman Changsheng Xie 1 Rice University Rice University HUST Presentation.

Naixue GSU Slide 1 ICVCI’09 Oct. 22, 2009 A Multi-Cloud Computing Scheme for Sharing Computing Resources to Satisfy Local Cloud User Requirements.

Introduction To Windows Azure Cloud

Department of Computer Science Engineering SRM University

How to Resolve Bottlenecks and Optimize your Virtual Environment Chris Chesley, Sr. Systems Engineer

Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Environment.

Cloud Computing. What is Cloud Computing? Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable.

Database Replication Policies for Dynamic Content Applications Gokul Soundararajan, Cristiana Amza, Ashvin Goel University of Toronto EuroSys 2006: Leuven,

OPTIMAL SERVER PROVISIONING AND FREQUENCY ADJUSTMENT IN SERVER CLUSTERS Presented by: Xinying Zheng 09/13/ XINYING ZHENG, YU CAI MICHIGAN TECHNOLOGICAL.

CS525: Special Topics in DBs Large-Scale Data Management Hadoop/MapReduce Computing Paradigm Spring 2013 WPI, Mohamed Eltabakh 1.

Improving Network I/O Virtualization for Cloud Computing.

EXPOSE GOOGLE APP ENGINE AS TASKTRACKER NODES AND DATA NODES.

Autonomic SLA-driven Provisioning for Cloud Applications Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer Presented by Ismail Alan.

1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University.

Eneryg Efficiency for MapReduce Workloads: An Indepth Study Boliang Feng Renmin University of China Dec 19.

BOF: Megajobs Gracie: Grid Resource Virtualization and Customization Infrastructure How to execute hundreds of thousands tasks concurrently on distributed.

What is the cloud ? IT as a service Cloud allows access to services without user technical knowledge or control of supporting infrastructure Best described.

THE LITTLE ENGINE(S) THAT COULD: SCALING ONLINE SOCIAL NETWORKS B 圖資三謝宗昊.

Automated Control in Cloud Computing: Challenges and Opportunities Harold C. Lim, Shivnath Babu, Jeffrey S. Chase, and Sujay S. Parekh ACM’s First Workshop.

The New Zealand Institute for Plant & Food Research Limited Use of Cloud computing in impact assessment of climate change Kwang Soo Kim and Doug MacKenzie.

Server Virtualization

Data Replication and Power Consumption in Data Grids Susan V. Vrbsky, Ming Lei, Karl Smith and Jeff Byrd Department of Computer Science The University.

MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads School of Computer Engineering Nanyang Technological University 30 th Aug 2013.

Scale up Vs. Scale out in Cloud Storage and Graph Processing Systems

DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.

Managing Web Server Performance with AutoTune Agents by Y. Diao, J. L. Hellerstein, S. Parekh, J. P. Bigus Presented by Changha Lee.

EuroSys Doctoral Workshop 2011 Resource Provisioning of Web Applications in Heterogeneous Cloud Jiang Dejun Supervisor: Guillaume Pierre

Load Rebalancing for Distributed File Systems in Clouds.

Capacity Planning in a Virtual Environment Chris Chesley, Sr. Systems Engineer

Scientific days, June 16 th & 17 th, 2014 This work has been partially supported by the LabEx PERSYVAL-Lab (ANR-11-LABX ) funded by the French program.

Architecture of a platform for innovation and research Erik Deumens – University of Florida SC15 – Austin – Nov 17, 2015.

Online Parameter Optimization for Elastic Data Stream Processing Thomas Heinze, Lars Roediger, Yuanzhen Ji, Zbigniew Jerzak (SAP SE) Andreas Meister (University.

Workload Distribution Architecture

Diskpool and cloud storage benchmarks used in IT-DSS

Distributed Network Traffic Feature Extraction for a Real-time IDS

StratusLab Final Periodic Review

StratusLab Final Periodic Review

Clinton A Jones Eastern Kentucky University Department of Technology

Hybrid Cloud Architecture for Software-as-a-Service Provider to Achieve Higher Privacy and Decrease Securiity Concerns about Cloud Computing P. Reinhold.

Elastic Consistent Hashing for Distributed Storage Systems

Overview Introduction VPS Understanding VPS Architecture

20409A 7: Installing and Configuring System Center 2012 R2 Virtual Machine Manager Module 7 Installing and Configuring System Center 2012 R2 Virtual.

AWS Cloud Computing Masaki.

ICSOC 2018 Adel Nadjaran Toosi Faculty of Information Technology

Cloud Computing Architecture

Specialized Cloud Architectures

Cloud Computing Architecture

Presentation transcript:

Harold C. Lim, Shinath Baba and Jeffery S. Chase from Duke University AUTOMATED CONTROL FOR ELASTIC STORAGE Presented by: Yonggang Liu Department of Electrical and Computer Engineering, University of Florida 1

Outline Introduction System overview System architecture and modeling methodologies Evaluation Contribution and related work Discussions and future work 2

Outline Introduction System overview System architecture and modeling methodologies Evaluation Contribution and related work Discussions and future work 3

Introduction - Popularity of highly dynamic workloads Many web-based services (especially Web 2.0) often experience rapid load surges and drops. One Facebook application saw an increase from 25,000 to 250,000 users in 3 days, with up to 20,000 new users signing up per hour during peak times. Elastic services offered by cloud computing becomes one solution Grow/shrink service capacity dynamically as the load changes. 4

Introduction - Elasticity in cloud computing Elasticity is one of cloud computing’s greatest features – Systems acquire and release resources in response to users’ dynamic workloads; users only pay for what they need. SLAs Web Services Virtualization Picture provided by Dr. Andy Li from UF 5

Introduction - Topic of this paper This paper addresses the challenges associated with controlling the elastic storage in a data- intensive service, in cloud computing environment. Intuitively, it does: If performance can not meet the Service Level Objective (SLO) → grow storage capacity If performance meets SLO, and system utilization is low → shrink storage capacity 6

Introduction - Topic of this paper In this paper, Hadoop Distributed File System (HDFS) is employed as the storage system. When the controller increases the storage size: Create new storage instances Move storage data to the new instances (data rebalancing) When the controller reduces the storage size: Remove a certain number of storage instances Some storage data on existing nodes get replicated because the replica number is lower than the replica degree N. This is automatically done by DHFS. 7

Outline Introduction System overview System architecture and modeling methodologies Evaluation Contribution and related work Discussions and future work 8

System overview What is the big picture Controller Cloud Provider (Amazon EC2) Web Tier (Apache server) Application Tier (Facebook core) Storage Tier (Hadoop DFS) Elastic Service Clients Sensor Actuator Gather measurements Manage instances Sensors higher system load Create more storage instances, and rebalance data Suppose we are hosting the Facebook server on amazon EC2 instances, with the proposed control techniques. Sensors lower system load Remove some storage nodes 9

System overview Challenges in elastic storage control Controlling elastic storage involves many challenges: Data Rebalancing. The newly added storage nodes will not be effective until data rebalancing is done. Interference to Guest Service. Data rebalancing also consumes the system resources. Actuator Delay. The controller must consider the delay of the control operations, otherwise it may response too late or become unstable. 10

Outline Introduction System overview System architecture and modeling methodologies Evaluation Contribution and related work Discussions and future work 11

System architecture The controller is composed by: Horizontal Scale Controller (HSC) - responsible for growing and shrinking the number of storage nodes. Data Rebalance Controller (DRC) - controlling the data transfers to rebalance the storage tier after it grows or shrinks. State machine - coordinating the actions of the HSC and the DRC. 12

System architecture - Horizontal Scale Controller (HSC) Actuator: The HSC uses cloud APIs to change the number of active server instances. Sensor: The paper uses CPU utilization on the storage nodes as the sensor feedback metric It is easy to measure, and strongly correlated to overall response time of the Cloudstone benchmark when the bottleneck is on the storage tier. 13

Modeling methodology - System model without controller The system without a controller can be described as this graph: U(z): Input to the system, the number of storage instances. D(z): The effect of client workload variance on the value of storage instance number. V(z): The effective number of storage instances Y(z): The Output of the system, the CPU utilization on storage nodes. G(z): The transfer function of the storage system. G(z) U(z) Y(z) + + V(z) D(z) 14

Modeling methodology - Controller - Integral control G(z) R(z) K(z) + - E(z) U(z) Y(z) + + V(z) D(z) 15

Modeling methodology - Controller - discrete control functions 16

Modeling methodology - Proportional thresholding 17

System architecture - Data Rebalance Controller (DRC) The DRC rebalances the layout of data in the system after the number of storage nodes grows or shrinks. Rebalancing is a cause of actuator delay and interference. Tuning knob of HDFS rebalancer: Bandwidth b allocated to the rebalancer. Select b to control the tradeoff between lag and interference. Big b - fast rebalance, serious impacts on normal service. Small b - slow rebalance, not very disruptive to normal service. 18

Modeling Methodology - Modeling the impacts of b 19

Modeling Methodology - Balancing between lag and interference 20

System architecture - State machine Recall that: Horizontal Scale Controller (HSC) is used to increase/shrink the number of storage nodes Data Rebalance Controller (DRC) is used to rebalance the storage after the changes in storage node size They have mutual dependencies: After HSC adds a new storage node, the system cannot obtain full service until DRC completes rebalancing. When one component is taking actions, the noise will be introduced to the sensor measurements of the other one. To preserve stability during adjustments, a state machine is employed to coordinate HSC and DRC to manage their mutual dependencies. 21

System architecture - State machine The following diagram shows the internal state machine of the elasticity controller in the storage tier. Horizontal Scale State Rebalance state Init Storage tier configuration changed? No Storage tier configuration changed? Yes Rebalancing done? Yes Rebalancing done? No Elasticity Controller Storage Tier 22

Outline Introduction System overview System architecture and modeling methodologies Evaluation Contribution and related work Discussions and future work 23

Evaluation - Experimental Testbed The paper employs CloudStone to run with GlassFish as the front-end application server tier. CloudStone: a flexible Web 2.0 benchmark generator GlassFish: an open source application server project HDFS is used for the storage HDFS is modified to expose the rebalancer’s bandwidth throttle b as an actuator to the external controller. The paper implements a local ORCA cluster as the cloud infrastructure provider ORCA: A resource control framework that provides a resource leasing service; guests can lease resources from a substrate resource provider, such as a cloud provider 24

Evaluation - Experimental Testbed The experimental service cluster: A group of servers running on a local network. To fully explore the effects of the storage tier: Other tiers are statically over-provisioned. The storage tier nodes: Dynamically allocated virtual machine instances They all have fixed resource configurations:  30 MB disk space; 512 MB RAM; single disk arm; 2.8 GHz CPU. HDFS is preloaded with at least 36 GB data. 25

Evaluation - Controller Effectiveness a1. CPU utilization - static b1. Response time - static a2. CPU utilization - dynamic b2. Response time - dynamic Target response time: 3 seconds. Target CPU utilization: 20%. See from the figures: 1. Dynamic provisioning is able to adapt to the load burst. 2. Instance creation and data rebalancing has cost and delay on effect. 26

Evaluation - Controller Effectiveness a1. CPU utilization - static b1. Response time - static a2. CPU utilization - dynamic b2. Response time - dynamic Target response time: 3 seconds. Target CPU utilization: 20%. See from the figures: 1. Dynamic provisioning is alert enough to adapt to the small load increase. 2. The cost and delay of node creation/rebalancing are smaller than the prev. 27

Evaluation - Resource Efficiency a1. CPU utilization - static b1. Response time - static a2. CPU utilization - dynamic b2. Response time - dynamic Target response time: 3 seconds. Target CPU utilization: 20%. See from the figures: 1. Shrinking the storage size has much lower cost/ delay than increasing it. 2. During resizing process, There are almost no SLO violations. 28

Evaluation - Comparison of Rebalance Policies 29

Outline Introduction System overview System architecture and modeling methodologies Evaluation Contribution and related work Discussions and future work 30

Contribution and related work This paper is the first to address the problem of automated control for elastic storage in cloud computing. SCADS is a related work dealing with dynamically scaling a storage system. It uses machine learning to predict resource requirements. Padala et al. proposed a decoupled architecture (between guest and cloud provider) for cloud computing. They did not consider the actuator constraints. Aqueduct uses a feedback controller to throttle the rebalancing bandwidth usage to ensure the SLOs will not be violated. The rebalancing may be able to use very little bandwidth. 31

Outline Introduction System overview System architecture and modeling methodologies Evaluation Contribution and related work Discussions and future work 32

Discussions and future work H(z) W(z) G(z) R(z) K(z) + - E(z) U(z) Y(z) + + V(z) D(z) 33

Discussions and future work 34

Discussions and future work Make the resource configuration of newly created storage instances tunable. Resizing storage size by adding/removing storage instances with flexible resource configuration. Optimizing the system by exploring the capacity and efficiency of individual storage instances, rather than storage instance amount. This requires investigating the performance of storage nodes under different setups: disk size, CPU frequency, RAM size, etc. 35

THANK YOU! 36