A Cloud Data Center Optimization Approach using Dynamic Data Interchanges Prof. Stephan Robert University of Applied Sciences.

Presentation transcript:

A Cloud Data Center Optimization Approach using Dynamic Data Interchanges Prof. Stephan Robert University of Applied Sciences of Western Switzerland IEEE CloudNet San Francisco November 2013

Motivation and background
Distributed datacenters in the Cloud have become a popular way to increase data availability and reduce costs.
Cloud storage has received a lot of attention with a view to reducing costs:
– Minimizing infrastructure and running costs
– Allocation of data servers to customers
– Geo-optimization (looking at where customers are located to decide where to place datacenters)

Datacenter optimization
Research areas on optimizing datacenter operations:
– Energy and power management
– Cost-benefit analysis
– Cloud networks versus Grids
– Geo-distribution of cloud centers
– Multi-level caching

Motivation and background (cont.)
We consider the operational situation in which the datacenter locations have already been decided. Is there any other optimization we can perform?
Problems we examine:
– Data locality: users are not always near the data, which leads to higher costs
– The situation can change over time: we can decide to place our data near the users now, but there is no guarantee that the situation will not change in the future

Principal idea
We consider a model for actively moving data closer to the current users. When needed, we move data from one server to a temporary (cache) area in a different server. In the near future, when users request this particular data, we can serve them from the local cache.

Benefits
Benefits of copying (caching) data to a local server:
– We correct the mismatch between where the data is and where the users are.
– We copy only once (cost) but read many times (benefit).
– We train the algorithm on a history of requests to determine the relative frequency with which items are requested (this must be done efficiently, as the number of items can be very large).
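The "copy once, read many" trade-off can be illustrated with a simple break-even check. This is a minimal sketch, not part of the presented model: the function name, its parameters and the cost figures are hypothetical, and costs are assumed to be additive per read.

```python
def copy_pays_off(expected_reads, remote_cost, local_cost, copy_cost):
    """Return True when the expected saving from local reads exceeds the one-off copy cost.

    All arguments are illustrative assumptions: costs are in arbitrary units
    and each read is assumed to save (remote_cost - local_cost).
    """
    saving = expected_reads * (remote_cost - local_cost)
    return saving > copy_cost


# Two reads save 2 * (5 - 1) = 8, which does not cover a copy cost of 10.
few = copy_pays_off(expected_reads=2, remote_cost=5.0, local_cost=1.0, copy_cost=10.0)
# Ten reads save 10 * (5 - 1) = 40, which does.
many = copy_pays_off(expected_reads=10, remote_cost=5.0, local_cost=1.0, copy_cost=10.0)
```

The more often an object is expected to be read locally, the more easily the one-off copy cost is recovered, which is why the request history matters.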

Model
We consider a combinatorial optimization model to determine the best placement of the data. This model will tell us if we need to copy data from one datacenter to another, in anticipation of user requests.
The optimization aim is to minimize the total expected cost of serving the future user data requests.
The optimization constraints are the cache size capacities.
The model accounts for:
– The cost of copying data between datacenters
– The relative cost/benefit of delivering the data from a remote vs. a local server
– The likelihood that particular data will be requested in particular locations in the near future

Model
– Decision variable: indicates whether object i is obtained from datacenter d
– Constraint: each object must be available in at least one datacenter
– Constraint: the cache size Z of each datacenter must not be exceeded
– Expected cost of retrieving object i from datacenter d
– Cost of copying object i from the default datacenter to another datacenter d
– Probability that object i will be requested by user u
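One way to write down a model matching these descriptions is sketched below. It is an assumed formulation, not necessarily the authors' exact one. Only the indices i, d, u, the cache size Z and the probabilities p come from the slides. The symbols x_{id} (binary decision variable), c_{id} (expected retrieval cost), k_{id} (copy cost), s_i (object size), r_{ud} (cost of one request by user u served from datacenter d) and d_0(i) (default datacenter of object i) are introduced here for illustration, as is the way p_{iu} enters c_{id}.

```latex
\begin{align*}
\min_{x}\quad & \sum_{i}\sum_{d} \bigl(c_{id} + k_{id}\bigr)\, x_{id} \\
\text{s.t.}\quad & \sum_{d} x_{id} \ge 1 && \forall i \\
& \sum_{i \,:\, d \ne d_0(i)} s_i\, x_{id} \le Z && \forall d \\
& x_{id} \in \{0,1\} && \forall i, d \\
\text{with}\quad & c_{id} = \sum_{u} p_{iu}\, r_{ud}, \qquad k_{i\,d_0(i)} = 0
\end{align*}
```

Under this reading, each datacenter's cache constraint has the form of a knapsack constraint, and an object left in its default datacenter incurs no copy cost and uses no cache space.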

Operational aspects
First, we must obtain a historical log of requests, including who requested what, where the file was located, and the file size. We use this information to calculate the access probabilities in the model (in practice, using HBase/Hadoop in a distributed manner).
The costs in the model have to be decided based on the architecture etc. (e.g. the relative benefit of using a local server versus a remote one for a particular user).
Periodically (e.g. daily) we run the algorithm to determine any data duplication that is beneficial. (Of course, the network must be aware of the local copies and know to use them.)
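As a single-machine illustration of the probability calculation, the access probabilities can be estimated as relative request frequencies per user. This is a minimal sketch under stated assumptions: the record layout (user, object, datacenter, size) and the function name are hypothetical, and the distributed HBase/Hadoop processing mentioned in the talk is not reproduced here.

```python
from collections import Counter, defaultdict


def access_probabilities(log):
    """Estimate p[user][obj] as the relative request frequency of obj for that user.

    `log` is an iterable of (user, obj, datacenter, size) records; this layout
    is an assumption made for the example. Location and size are not needed
    for the probabilities, so they are ignored here.
    """
    counts = defaultdict(Counter)
    for user, obj, _datacenter, _size in log:
        counts[user][obj] += 1

    probs = {}
    for user, per_object in counts.items():
        total = sum(per_object.values())
        probs[user] = {obj: n / total for obj, n in per_object.items()}
    return probs


# Hypothetical log: user u1 requests object "a" twice and "b" once.
sample_log = [
    ("u1", "a", "dc1", 100),
    ("u1", "a", "dc1", 100),
    ("u1", "b", "dc2", 250),
    ("u2", "b", "dc2", 250),
]
p = access_probabilities(sample_log)
```

For large logs the same counting step is a natural fit for a map/reduce job, with (user, object) as the key.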

Computational experimentation
Computational experimentation was carried out in a simulation environment (no real-life implementation at this stage).
We measured the costs/benefits of obtaining the data directly against using our optimization model to rearrange the data periodically.
Performance was consistent for 3, 5 and 10 datacenters.

Computational experimentation
Setup of N datacenters located on a circle.
Users placed at random inside the circle.
Costs linked to the distance.
Data object requests were generated from a Zipf distribution (independently for each user).
The first half of the data was used to train the algorithm (historic access log), and the second half was used for the simulation.
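The setup described here can be sketched as follows. This is an illustrative reconstruction, not the authors' simulator. The assumptions are: datacenters are evenly spaced on a unit circle, cost equals Euclidean distance, all users share one Zipf popularity ranking but draw their requests independently, and the Zipf exponent is 1. The function name and parameters are hypothetical.

```python
import math
import random


def make_setup(n_datacenters, n_users, n_objects, requests_per_user, zipf_s=1.0, seed=0):
    """Build a toy instance: positions, distance-based costs, and a train/test request split."""
    rng = random.Random(seed)

    # Datacenters evenly spaced on the unit circle (even spacing is an assumption).
    dcs = [
        (math.cos(2 * math.pi * k / n_datacenters), math.sin(2 * math.pi * k / n_datacenters))
        for k in range(n_datacenters)
    ]

    # Users uniformly at random inside the circle; sqrt() gives uniform area density.
    users = []
    for _ in range(n_users):
        radius = math.sqrt(rng.random())
        angle = 2 * math.pi * rng.random()
        users.append((radius * math.cos(angle), radius * math.sin(angle)))

    # Cost of serving user u from datacenter d, assumed equal to their distance.
    cost = [[math.dist(u, d) for d in dcs] for u in users]

    # Zipf weights: the object of rank r is requested with weight 1 / r**zipf_s.
    weights = [1.0 / (rank ** zipf_s) for rank in range(1, n_objects + 1)]
    requests = []
    for u in range(n_users):
        drawn = rng.choices(range(n_objects), weights=weights, k=requests_per_user)
        requests.extend((u, obj) for obj in drawn)
    rng.shuffle(requests)

    # First half is the historic log used for training, second half drives the simulation.
    half = len(requests) // 2
    return dcs, users, cost, requests[:half], requests[half:]


dcs, users, cost, train, held_out = make_setup(3, 20, 100, 50)
```

The training half would feed the probability estimates, and the held-out half would be replayed to compare the default placement with the optimized one.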

Simulation results – parameter variation
Datacenters   Users   Cache size     Objects (problem size)
3             20      Small (1500)   Small (100)
5             100     Large (3000)   Med (500)
10            500                    Large (1000)
              1000

Simulation results
Datacenters   Users       Cache size     Objects (problem size)
3             20          Small (1500)   Med (500)
3             500         Large (3000)   Small (100)
3             1000        Small (1500)   Med (500)
5             100         Large (3000)   Large (1000)
5             500         Small (1500)   Med (500)
5             1000        Large (3000)   Small (100)
10            20          Large (3000)   Large (1000)
10            100         Small (1500)   Med (500)
[missing]     [missing]   Large (3000)   Large (1000)
Columns Cost (default), Cost (optimized) and % cost improvement: [values missing]
Promising results with ~20% cost reduction on average.
Full results appear in the proceedings paper.

Practicalities – is the idea feasible in a real system?
More complexities but also easy solutions:
– Time criticality: no need to run on the live system; object locations can be optimized overnight (periodic dynamic reconfiguration)
– Metadata storage: we need to store object access frequencies to calculate the probabilities p. We implemented a metadata store in HBase on a Hadoop cluster.
Conclusion: feasible and easy

Complexity issues
The optimization problem is complex (NP-hard) to solve.
– We can keep the input size small: we only need to consider the most popular objects.
– We are currently developing a fast heuristic algorithm based on knapsack methods.
Standard problems of data
– Other complexities: legal issues of moving data across countries (if personal data are involved)
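The heuristic mentioned here was still under development at the time of the talk and is not described in the slides. Purely as an illustration of what a knapsack-style approach could look like (this is the classic greedy value-per-size rule, not the authors' algorithm), the sketch below fills one datacenter's cache. The function name, the candidate tuples and the notion of "net saving" (expected saving from local reads minus the copy cost) are assumptions.

```python
def greedy_cache_fill(candidates, capacity):
    """Pick objects for one cache by descending net saving per unit of size.

    candidates: list of (obj_id, size, net_saving) tuples, where net_saving is
    assumed to be the expected saving from local reads minus the copy cost.
    Returns the chosen object ids and the cache space used.
    """
    # Objects that do not pay for their own copy are never worth caching.
    worthwhile = [c for c in candidates if c[1] > 0 and c[2] > 0]
    ranked = sorted(worthwhile, key=lambda c: c[2] / c[1], reverse=True)

    chosen, used = [], 0
    for obj_id, size, _saving in ranked:
        if used + size <= capacity:
            chosen.append(obj_id)
            used += size
    return chosen, used


# Hypothetical candidates for a cache of size 1500.
candidates = [("a", 400, 90.0), ("b", 1000, 100.0), ("c", 600, 120.0), ("d", 300, -5.0)]
chosen, used = greedy_cache_fill(candidates, capacity=1500)
```

A greedy rule like this is fast but not guaranteed to be optimal. Restricting the candidates to the most popular objects keeps the sort small.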

Thank you Questions?