Mayflower: Improving Distributed Filesystem Performance Through SDN/Filesystem Co-Design
Authors: Sajjad Rizvi, Xi Li, Bernard Wong, Fiodar Kazhamiaka, Benjamin Cassell
Presented by: Yihan Li
Outline
- Motivation
- Introduction
- Design Overview
- Replica and Path Selection
- Evaluation
- Conclusion
Motivation
- The network is the performance bottleneck
  - Distributed filesystems are the primary bandwidth consumers
  - Oversubscribed network architectures limit available bandwidth
  - High-performance SSDs shift the bottleneck from storage to the network
- Current distributed filesystems and network control planes are designed independently
  - They use only static network information, e.g., replica selection based on network distance
  - They are not reciprocally involved in each other's decisions
  - Static information captures neither dynamic resource contention nor network congestion
What is Mayflower (I)
- Mayflower is co-designed from the ground up with a Software-Defined Networking (SDN) control plane
- It consists of three main components
  - Dataserver
  - Nameserver
  - Flowserver
- Beyond its own read requests, it can perform path selection for other applications through a public interface
What is Mayflower (II)
- Dataserver: performs reads from and appends to file chunks
- Nameserver: manages the file-to-chunk mapping
- Flowserver: runs alongside the SDN controller
  - Models the path bandwidth of the elephant flows
  - Performs both replica and network path selection
Advantage
- Filesystem and network decisions are made collaboratively by the filesystem and the network control plane
- Mayflower evaluates all possible paths between the client and all of the replica hosts
- It can directly minimize the average request completion time
  - Expected completion time of the pending request
  - Expected increase in completion time of other in-flight requests
- It can determine whether to read concurrently from multiple replica hosts
Design Overview (I)
Five assumptions
- The system stores only a modest number of files
- Most reads are large and sequential, and clients often fetch entire files
- File writes are primarily large sequential appends (random writes are very rare)
- The workloads are heavily read-dominant
- The network is the bottleneck
Design Overview (II)
- Mayflower selects both the replica and the network path for read operations
- It estimates the current network state and makes selections accordingly
- It can work together with existing network managers
- It periodically fetches flow statistics from the edge switches to correct estimation error
- It re-computes the path bandwidth estimates so that completion-time estimates stay accurate
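The periodic re-estimation step can be sketched as follows. This is an illustrative model, not Mayflower's actual code: it assumes the Flowserver sees cumulative per-flow byte counters (as edge switches typically expose) and derives per-flow bandwidth from two successive samples.

```python
# Sketch: refresh per-flow bandwidth estimates from two successive polls of
# edge-switch byte counters. Flow names and counter values are made up.

def estimate_rates(prev, curr, interval_s):
    """prev/curr map flow_id -> cumulative byte counter; returns bytes/sec."""
    rates = {}
    for flow_id, bytes_now in curr.items():
        bytes_before = prev.get(flow_id, 0)  # new flows start from zero
        rates[flow_id] = (bytes_now - bytes_before) / interval_s
    return rates

# Two samples taken 5 seconds apart; f3 appeared between the polls:
prev = {"f1": 1_000_000, "f2": 4_000_000}
curr = {"f1": 6_000_000, "f2": 4_500_000, "f3": 2_500_000}
rates = estimate_rates(prev, curr, 5.0)
# f1: 1,000,000 B/s; f2: 100,000 B/s; f3: 500,000 B/s
```

These refreshed rates feed the path-bandwidth model used by the selection algorithm described later.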
File Read Operation
Design Overview (III)
- Mayflower provides sequential consistency by default
- Mayflower provides linearizability with respect to read and append requests by sending read requests for a file's last chunk to the primary replica host
- The vast majority of chunks can be serviced by any replica host, since most chunks are essentially immutable
- The system delays deletes for T time (the maximum expiration period) to preserve consistency
Replica-Path Selection Algorithm
- Based on the estimated network state
  - Bandwidth estimates
  - Remaining flow size approximations
- Target performance metric: average job completion time
- Must account for the effect on existing flows
  - New flows affect the path selection for already scheduled flows
Problem Statement (I)
- Optimization goal: select the network path that minimizes the completion time of both the new flow and the existing flows
- The algorithm considers
  - The paths of existing flows
  - The capacity of each link
  - The data size of each request
  - The estimated bandwidth shares of existing flows
  - The remaining un-transferred data size of existing flows
Problem Statement (II)
Notation
- G: graph of paths from source to destination
- c_{i,j}: cost of the impact on existing flows on link (i, j)
- b_{i,j}: bottleneck bandwidth on link (i, j)
- d_{i,j}: data flow on link (i, j)
- I_{i,j}: binary indicator for whether link (i, j) is used
- S: super source
- t: sink node
- x: data size of the request
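The two-part cost described on these slides can be sketched numerically. This is a hedged simplification, not Mayflower's exact formulation: it collapses a path to its bottleneck link and assumes equal (max-min) bandwidth sharing, so a path's cost is the new flow's completion time plus the total slowdown it inflicts on flows already there.

```python
# Sketch: cost(path) = completion time of the new request on that path
#                    + increase in completion time of existing flows on it.
# Single-bottleneck-link model with equal sharing; an assumption for clarity.

def path_cost(new_size, link_capacity, existing_flows):
    """existing_flows: remaining sizes (bytes) of flows sharing the bottleneck."""
    n = len(existing_flows) + 1               # flow count after the new flow joins
    share_after = link_capacity / n           # equal share once it joins
    share_before = link_capacity / max(len(existing_flows), 1)
    new_flow_cost = new_size / share_after
    # Each existing flow slows from share_before to share_after:
    impact = sum(rem / share_after - rem / share_before for rem in existing_flows)
    return new_flow_cost + impact

# An idle path beats a loaded one even with the same capacity:
idle = path_cost(100, 10, [])        # 10.0: no one else is hurt
busy = path_cost(100, 10, [50])      # 25.0: 20 for the new flow + 5 of impact
```

The selection algorithm would evaluate this cost for every (replica, path) pair and pick the minimum.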
Replica-Path Selection Process (I)
Replica-Path Selection Process (II) The first portion: estimates the cost of the new flow The second portion: estimates the impact of the flow on existing flows Fp in path p Bandwidth share of the existing flows: max-min fair share calculations Unknown flow size: use an estimate size (average elephant flow size) Slack in updating bandwidth utilization: the bandwidth utilization for the new flow is set to its estimated bandwidth share Existing flows: updated with their new estimated values
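The "max-min fair share" bullet above refers to a standard bandwidth-allocation scheme; a common way to compute it is progressive filling, sketched below. Flow names, link names, and capacities are invented for illustration, and real paths would come from the network model rather than hand-written sets.

```python
# Sketch: max-min fair shares via progressive filling. Repeatedly find the
# most constrained link, fix every flow crossing it at that link's fill
# level, and subtract the allocated bandwidth from the remaining links.

def max_min_shares(flow_links, capacity):
    """flow_links: flow_id -> set of link ids; capacity: link id -> Gbps."""
    remaining = dict(capacity)
    unfixed = {f: set(links) for f, links in flow_links.items()}
    share = {}
    while unfixed:
        def fill_level(link):
            users = sum(1 for links in unfixed.values() if link in links)
            return remaining[link] / users if users else float("inf")
        active_links = {l for links in unfixed.values() for l in links}
        bottleneck = min(active_links, key=fill_level)
        level = fill_level(bottleneck)
        for f in [f for f, links in unfixed.items() if bottleneck in links]:
            share[f] = level
            for l in unfixed[f]:
                remaining[l] -= level
            del unfixed[f]
    return share

# b and c contend on the 6 Gbps link L2 (3 each); a then gets L1's leftover.
shares = max_min_shares(
    {"a": {"L1"}, "b": {"L1", "L2"}, "c": {"L2"}},
    {"L1": 10, "L2": 6},
)
# shares: a -> 7.0, b -> 3.0, c -> 3.0
```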
Replica-Path Selection Process (III)
Replica-Path Selection Process (IV)
- Reading from multiple replicas can reduce the completion time
- The total cost accounts for the size and bandwidth of each sub-flow
Evaluation (I)
Experimental setup
- 13 machines, each with 64 GB RAM, a 200 GB Intel S3700 SSD, and two Intel Xeon E5-2620 processors
- Machines connect to a Mellanox SX6012 switch via 10 Gbps links
- 64 virtual hosts in four pods (each pod with three physical machines)
Traffic Matrix
- Job arrivals follow a Poisson process
- File read popularity follows a Zipf distribution with skewness parameter equal to 1.1
- Client placement relative to the primary replica
  - R: the client is in the same rack as the primary replica
  - P: in another rack, but in the same pod
  - O = 1 − R − P: in a different pod
(Figure: topology with Pods 1–4 and Racks 1–4)
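The workload described above can be reproduced with a small generator. This is a sketch of the stated distributions only; all parameter names and values besides the 1.1 skew are assumptions.

```python
# Sketch: Poisson job arrivals (exponential inter-arrival times) and Zipf
# file popularity with skew 1.1, matching the distributions on this slide.

import random

def generate_jobs(n_jobs, arrival_rate, n_files, skew=1.1, seed=7):
    rng = random.Random(seed)
    # Zipf popularity: P(file of rank k) proportional to 1 / k^skew
    weights = [1.0 / (k ** skew) for k in range(1, n_files + 1)]
    t, jobs = 0.0, []
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)     # Poisson process
        file_id = rng.choices(range(n_files), weights=weights)[0]
        jobs.append((t, file_id))
    return jobs

jobs = generate_jobs(1000, arrival_rate=2.0, n_files=50)
```

With skew 1.1, the most popular file attracts a large fraction of the reads, which is what makes replica and path choice matter in the evaluation.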
Evaluation (II)
- Paths to all replicas are partially congested
Evaluation (III)
Evaluation (IV)
- Mayflower is effective at avoiding congestion points
Evaluation (V)
Evaluation (VI)
- Interdependence between the network and the applications: read requests and a background flow
Evaluation (VII)
Evaluation (VIII)
Conclusions
- Mayflower is a distributed filesystem that follows a network/filesystem co-design approach
- It improves read performance through a novel replica and network path selection algorithm
- Evaluation demonstrates the resulting read-performance gains