Load Rebalancing for Distributed File Systems in Clouds Hung-Chang Hsiao, Member, IEEE Computer Society, Hsueh-Yi Chung, Haiying Shen, Member, IEEE, and Yu-Chang Chao speaker: 饒展榕

OUTLINE
- INTRODUCTION
- LOAD REBALANCING PROBLEM
- OUR PROPOSAL
- SIMULATIONS
- IMPLEMENTATION AND MEASUREMENT
- SUMMARY

INTRODUCTION In clouds, clients can dynamically allocate their resources on demand without sophisticated deployment and management of resources. Key enabling technologies for clouds include the MapReduce programming paradigm, distributed file systems, virtualization, and so forth.

These techniques emphasize scalability, so clouds can be large in scale, and their constituent entities can arbitrarily fail and join while the system maintains reliability. Distributed file systems are key building blocks for cloud computing applications based on the MapReduce programming paradigm.

In such file systems, nodes simultaneously serve computing and storage functions; a file is partitioned into a number of chunks allocated to distinct nodes so that MapReduce tasks can be performed in parallel over the nodes.

In such a distributed file system, the load of a node is typically proportional to the number of file chunks the node possesses. Because the files in a cloud can be arbitrarily created, deleted, and appended, and nodes can be upgraded, replaced, and added in the file system, the file chunks are not distributed as uniformly as possible among the nodes.

Load balance among storage nodes is a critical function in clouds. In a load-balanced cloud, the resources can be well utilized and provisioned, maximizing the performance of MapReduce-based applications.

State-of-the-art distributed file systems (e.g., Google GFS [2] and Hadoop HDFS [3]) in clouds rely on central nodes to manage the metadata information of the file systems and to balance the loads of storage nodes based on that metadata.

Our objective is to allocate the chunks of files as uniformly as possible among the nodes such that no node manages an excessive number of chunks.

Additionally, we aim to reduce the network traffic (or movement cost) caused by rebalancing the loads of nodes as much as possible, to maximize the network bandwidth available to normal applications. Exploiting the more capable nodes to improve system performance is thus also demanded.

The storage nodes are structured as a network based on distributed hash tables (DHTs). DHTs enable nodes to self-organize and self-repair while constantly offering lookup functionality under node dynamism, simplifying system provisioning and management.

LOAD REBALANCING PROBLEM First, let us denote the set of files as F. Each file f ∈ F is partitioned into a number of disjoint, fixed-size chunks, denoted by C_f. Second, the load of a chunkserver is proportional to the number of chunks hosted by the server [3].
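
To make the chunk model concrete, here is a minimal sketch of fixed-size partitioning. The 64 MB chunk size is an assumption (a common default in GFS/HDFS), not something the slides specify.

```python
# Minimal sketch: partition a file into the disjoint, fixed-size chunks C_f.
# The 64 MB chunk size is an assumption (a common GFS/HDFS default); the
# text only requires that chunks be fixed-size.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

def chunks_of(path: str):
    """Yield the fixed-size chunks of the file at `path` in order."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            yield chunk
```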

Third, node failure is the norm in such a distributed system, and the chunkservers may be upgraded, replaced, and added to the system. Finally, the files in F may be arbitrarily created, deleted, and appended. The net effect is that the file chunks are not uniformly distributed across the chunkservers.

Our objective in the current study is to design a load rebalancing algorithm to reallocate file chunks such that the chunks can be distributed to the system as uniformly as possible while reducing the movement cost as much as possible.

OUR PROPOSAL A file in the system is partitioned into a number of fixed-size chunks, and each chunk has a unique chunk handle (or chunk identifier) named with a globally known hash function such as SHA-1.
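
A minimal sketch of such naming follows. Hashing the file name together with the chunk's index is an illustrative choice; only the use of a globally known function such as SHA-1 comes from the text.

```python
import hashlib

# Minimal sketch: name each chunk with a globally known hash function (SHA-1).
# Hashing (file name, chunk index) is an illustrative choice; the text only
# requires a unique, globally computable chunk handle.
def chunk_handle(file_name: str, chunk_index: int) -> int:
    digest = hashlib.sha1(f"{file_name}:{chunk_index}".encode()).digest()
    return int.from_bytes(digest, "big")  # 160-bit chunk identifier
```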

We represent the IDs of the chunkservers in V by 1/n, 2/n, 3/n, ..., n/n; for short, we denote the n chunkservers as 1, 2, 3, ..., n. Unless otherwise clearly indicated, we denote the successor of chunkserver i as chunkserver i + 1 and the successor of chunkserver n as chunkserver 1.

In most DHTs, the average number of nodes visited for a lookup is O(log n) if each chunkserver i maintains log2 n neighbors, that is, nodes i + 2^k mod n for k = 0, 1, 2, ..., log2 n − 1. Among these log2 n neighbors, the neighbor with k = 0 is the successor of i.
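
A minimal sketch of this neighbor structure, using 0-based node IDs purely for arithmetic convenience:

```python
import math

# Minimal sketch of the neighbor (finger) set above: chunkserver i keeps
# log2(n) neighbors i + 2^k mod n, so a lookup visits O(log n) nodes.
# IDs run 0..n-1 here instead of 1..n.
def neighbors(i: int, n: int) -> list:
    return [(i + 2 ** k) % n for k in range(int(math.log2(n)))]

# The k = 0 neighbor is i's immediate successor on the ring.
assert neighbors(5, 16) == [6, 7, 9, 13]
```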

DHTs are used in our proposal for the following reasons:
- The chunkservers self-configure and self-heal in our proposal because of their arrivals, departures, and failures, simplifying system provisioning and management.
- The DHT network is transparent to the metadata management in our proposal.

3.2 Load Rebalancing Algorithm A large-scale distributed file system is in a load-balanced state if each chunkserver hosts no more than A chunks, where A denotes the ideal number of chunks per chunkserver. In our proposed algorithm, each chunkserver node i first estimates whether it is underloaded (light) or overloaded (heavy) without global knowledge.

A node is light if the number of chunks it hosts is smaller than the threshold (1 − ΔL)A, where 0 ≤ ΔL < 1. In contrast, a heavy node manages a number of chunks greater than (1 + ΔU)A, where 0 ≤ ΔU < 1. ΔL and ΔU are system parameters.
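
A minimal sketch of the light/heavy test. A is the ideal per-node load and delta_L, delta_U are the system parameters above; the function name and signature are illustrative.

```python
# Minimal sketch of the light/heavy classification described above.
def classify(load: int, A: float, delta_L: float, delta_U: float) -> str:
    if load < (1 - delta_L) * A:
        return "light"
    if load > (1 + delta_U) * A:
        return "heavy"
    return "balanced"

# With A = 100 and delta_L = delta_U = 0.2, hosting 70 chunks is light
# (70 < 80) and hosting 130 chunks is heavy (130 > 120).
assert classify(70, 100, 0.2, 0.2) == "light"
assert classify(130, 100, 0.2, 0.2) == "heavy"
```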

In the following discussion, if a node i departs and rejoins as a successor of another node j, then we represent node i as node j + 1, node j's original successor as node j + 2, the successor of node j's original successor as node j + 3, and so on.

For each node i ∈ V, if node i is light, then it seeks a heavy node and takes over at most A chunks from the heavy node.

We first present a load-balancing algorithm in which each node has global knowledge regarding the system, which leads to low movement cost and fast convergence. We then extend this algorithm to the situation in which such global knowledge is not available to each node, without degrading its performance.

Based on the global knowledge, if node i finds that it is the least-loaded node in the system, i leaves the system by migrating its locally hosted chunks to its successor i + 1 and then rejoins instantly as the successor of the heaviest node (say, node j).

To immediately relieve node j's load, node i requests min{L_j − A, A} chunks from j. That is, node i requests A chunks from the heaviest node j if j's load exceeds 2A; otherwise, i requests a load of L_j − A from j to relieve j's load.
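
This tiny sketch just restates min{L_j − A, A} and its two cases:

```python
# Minimal sketch of the transfer-size rule min{L_j - A, A} stated above.
def chunks_to_request(L_j: int, A: int) -> int:
    return min(L_j - A, A)

# If j's load exceeds 2A, i takes a full A chunks; otherwise it takes
# L_j - A, which lowers j's load to exactly the ideal A.
assert chunks_to_request(L_j=250, A=100) == 100  # 250 > 2 * 100
assert chunks_to_request(L_j=150, A=100) == 50   # 150 - 100
```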

Node j may still remain the heaviest node in the system after it has migrated its load to node i. In this case, the current least-loaded node, say node i0, departs and then rejoins the system as j's successor. That is, i0 becomes node j + 1, and j's original successor i thus becomes node j + 2. Such a process repeats iteratively until j is no longer the heaviest.
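
Putting the pieces together, here is a minimal sketch of this global-knowledge procedure under the stated assumptions. It models the ring as an ordered list of [node_id, load] pairs, tracks only load counts (no actual chunk transfers or DHT maintenance), and takes A to be the ideal average load; it is a sketch, not the authors' implementation.

```python
# Minimal sketch of the global-knowledge rebalancing loop described above.
# ring is an ordered list of [node_id, load] pairs; the successor of
# ring[k] is ring[(k + 1) % len(ring)]. Mutates ring in place.
def rebalance(ring: list, delta_U: float) -> None:
    A = sum(load for _, load in ring) / len(ring)  # ideal per-node load
    while True:
        j = max(range(len(ring)), key=lambda k: ring[k][1])
        if ring[j][1] <= (1 + delta_U) * A:
            break                      # no heavy node remains
        i = min(range(len(ring)), key=lambda k: ring[k][1])
        node = ring[i]
        # The least-loaded node migrates its chunks to its successor
        # and departs the system ...
        ring[(i + 1) % len(ring)][1] += node[1]
        ring.pop(i)
        if i < j:
            j -= 1                     # removing i shifted j left by one
        # ... then rejoins as j's successor, taking min(L_j - A, A) chunks.
        moved = min(ring[j][1] - A, A)
        node[1] = moved
        ring[j][1] -= moved
        ring.insert(j + 1, node)

# Example: rebalance([["a", 250], ["b", 50], ["c", 100]], delta_U=0.2)
# converges in one round, with no node above (1 + 0.2) * (400 / 3) chunks.
```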

Such a load-balancing algorithm, which pairs the least-loaded and most-loaded nodes in the system, has the following properties:
- Low movement cost.
- Fast convergence rate.

The mapping between the lightest and heaviest nodes at each step in the sequence can be further improved to reach the global load-balanced system state. We extend the algorithm by pairing the top-k1 underloaded nodes with the top-k2 overloaded nodes.

We use U to denote the set of top-k1 underloaded nodes in the sorted list of underloaded nodes, and use O to denote the set of top-k2 overloaded nodes in the sorted list of overloaded nodes.
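
A minimal sketch of constructing U and O, assuming the same thresholds as before; the function name and signature are illustrative.

```python
# Minimal sketch of building the sets U and O described above.
# loads maps node id -> chunk count; A, delta_L, delta_U are as before;
# k1 and k2 are system parameters.
def top_k_sets(loads: dict, A: float, delta_L: float, delta_U: float,
               k1: int, k2: int):
    light = [v for v in loads if loads[v] < (1 - delta_L) * A]
    heavy = [v for v in loads if loads[v] > (1 + delta_U) * A]
    U = sorted(light, key=loads.get)[:k1]                # lightest first
    O = sorted(heavy, key=loads.get, reverse=True)[:k2]  # heaviest first
    return U, O
```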