Google File System Simulator
Pratima Kolan and Vinod Ramachandran


Google File System
- The master manages metadata.
- Data transfers happen directly between the client and the chunk servers.
- Files are broken into 64 MB chunks.
- Each chunk is replicated across three machines for safety.
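Because files are split into fixed 64 MB chunks, a byte offset maps to a chunk by integer division. A minimal illustration in Python (the `chunk_index` helper is our own, not part of GFS or the simulator):

```python
CHUNK_SIZE = 64 * 2**20  # GFS chunk size: 64 MB

def chunk_index(byte_offset: int) -> int:
    """Map a byte offset within a file to the index of the chunk holding it."""
    return byte_offset // CHUNK_SIZE

# A read at offset 200 MB lands in chunk 3 (chunks 0-2 cover the first 192 MB).
assert chunk_index(200 * 2**20) == 3
```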

Event-Based Simulation
[Diagram: components place events in a priority queue; the simulator repeatedly takes the next highest-priority event from the queue and produces the output of the simulated event.]
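As a rough sketch of this loop (a minimal event-driven simulator of our own devising, not the authors' code): events sit in a priority queue ordered by timestamp, and the simulator repeatedly pops and executes the earliest one.

```python
import heapq
import itertools

class Simulator:
    """Minimal event-driven simulator: events are ordered by timestamp."""

    def __init__(self):
        self._queue = []                  # heap of (time, seq, callback)
        self._seq = itertools.count()     # tie-breaker for equal timestamps
        self.now = 0.0

    def schedule(self, time, callback):
        """Place an event in the priority queue."""
        heapq.heappush(self._queue, (time, next(self._seq), callback))

    def run(self):
        """Repeatedly pop the earliest event and execute it."""
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback(self)

# Example: a component handles an event by scheduling a follow-up event.
sim = Simulator()
sim.schedule(1.0, lambda s: s.schedule(s.now + 0.5,
                                       lambda s2: print("done at", s2.now)))
sim.run()  # prints: done at 1.5
```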

Simplified GFS Architecture
[Diagram: a client and a master server connect through a switch, modeled with infinite bandwidth, to five network disks; queues on the links represent network queues.]

Data Flow
1. The client queries the master server for the chunk ID it wants to read.
2. The master server returns the set of disk IDs that contain the chunk.
3. The client requests the chunk from one of those disks.
4. The disk transfers the data to the client.
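A minimal sketch of this four-step read path in Python. The `Master`, `Disk`, and `client_read` names are hypothetical stand-ins for the simulator's components, not its actual classes:

```python
class Disk:
    def __init__(self, disk_id, chunks):
        self.disk_id = disk_id
        self.chunks = chunks              # chunk_id -> bytes

    def read(self, chunk_id):
        return self.chunks[chunk_id]      # step 4: transfer the data

class Master:
    def __init__(self, placement):
        self.placement = placement        # chunk_id -> list of disk ids

    def locate(self, chunk_id):
        return self.placement[chunk_id]   # step 2: return the replica set

def client_read(master, disks, chunk_id):
    disk_ids = master.locate(chunk_id)    # step 1: query the master
    disk = disks[disk_ids[0]]             # step 3: pick a replica (first one here)
    return disk.read(chunk_id)

# Usage: one chunk replicated on two disks.
disks = {0: Disk(0, {"c1": b"data"}), 1: Disk(1, {"c1": b"data"})}
master = Master({"c1": [0, 1]})
assert client_read(master, disks, "c1") == b"data"
```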

Experiment Setup
- A client whose bandwidth can be varied from 0… Mbps.
- 5 disks, each with a per-disk bandwidth of 40 Mbps.
- 3 chunk replicas per chunk of data as the baseline.
- Each client request is for 1 chunk of data from a disk.
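These parameters could be collected in a small config object; a sketch under the assumptions above (`ExperimentConfig` is our name, not the simulator's):

```python
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    num_disks: int = 5
    disk_bandwidth_mbps: float = 40.0    # per-disk bandwidth
    replicas_per_chunk: int = 3          # baseline replication
    client_bandwidth_mbps: float = 0.0   # swept upward across runs

# One point in the bandwidth sweep:
config = ExperimentConfig(client_bandwidth_mbps=120.0)
```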

Simplified GFS Architecture (annotated)
[Diagram: the same topology, annotated with the client bandwidth varied from 0… Mbps, a per-disk bandwidth of 40 Mbps, and the infinite-bandwidth switch.]

Experiment 1
- Disk requests served without load balancing: we pick the first chunk server from the list of available chunk servers that contain the disk block.
- Disk requests served with load balancing: we apply a greedy algorithm and balance the load of incoming requests across the 5 disks (sketched below).
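One plausible reading of the greedy policy is "route each request to the least-loaded replica". A minimal sketch under that assumption (the function names and the `load` table are ours):

```python
def pick_replica_no_balancing(replica_disks, load):
    """Baseline: always take the first chunk server in the replica list."""
    return replica_disks[0]

def pick_replica_greedy(replica_disks, load):
    """Greedy load balancing: route the request to the least-loaded replica."""
    return min(replica_disks, key=lambda disk: load[disk])

# load[d] counts requests currently queued at disk d.
load = {0: 4, 1: 1, 2: 3, 3: 0, 4: 0}
chosen = pick_replica_greedy([0, 1, 2], load)  # chunk replicated on disks 0-2
assert chosen == 1
load[chosen] += 1  # account for the request we just routed
```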

Expectation
- In the non-load-balancing case we expect the effective request/data rate to peak at the bandwidth of 2 disks (80 Mbps).
- In the load-balancing case we expect the effective request/data rate to peak at the bandwidth of 5 disks (200 Mbps).

Load Balancing Graph
This graph plots the data rate at the client vs. the client bandwidth.

Experiment 2
- Disk requests served with no dynamic replication: there is a fixed number of replicas (3 in our case), and the server does not create more replicas based on read-request statistics.
- Disk requests served with dynamic replication: the server replicates certain chunks based on the frequency of requests for them.
  - We define a replication factor, a fraction < 1.
  - Number of replicas for a chunk = (replication factor) × number of requests for the chunk.
  - We cap the maximum number of replicas at the number of disks (see the sketch after this list).
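A minimal sketch of that rule in Python. Flooring the result at the 3-replica baseline is our assumption (the slides state the formula and the disk-count cap, not a floor):

```python
def target_replicas(request_count, replication_factor, num_disks, baseline=3):
    """Replicas for a chunk = replication_factor * request count,
    capped at the number of disks. The baseline floor is an assumption."""
    proposed = int(replication_factor * request_count)
    return min(num_disks, max(baseline, proposed))

# A hot chunk earns extra replicas; a cold one stays at the baseline.
assert target_replicas(request_count=100, replication_factor=0.05, num_disks=5) == 5
assert target_replicas(request_count=10, replication_factor=0.05, num_disks=5) == 3
```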

Expectation
- Our requests all target chunks placed on disk 0, disk 1, and disk 2.
- In the non-replication case we expect the effective data rate at the client to be limited by the bandwidth provided by 3 disks (120 Mbps).
- In the replication case we expect the effective data rate at the client to be limited by the bandwidth provided by 5 disks (200 Mbps).

Replication Graph
This graph plots the data rate at the client vs. the client bandwidth.

Experiment 3
- Disk requests served with no rebalancing: we do not rebalance read requests based on the frequency of chunk requests.
- Disk requests served with rebalancing: we rebalance read requests by picking the request with the highest frequency and transferring it to a disk with a lighter load (sketched below).
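A minimal sketch of one rebalancing step under that description. The `pending` and `load` structures are our assumptions about the simulator's bookkeeping:

```python
def rebalance_once(pending, load):
    """Move the hottest chunk's queued requests to the least-loaded disk.
    pending: chunk_id -> (request_count, disk_id); load: disk_id -> queued requests."""
    if not pending:
        return
    # Pick the chunk with the highest request frequency.
    hot_chunk = max(pending, key=lambda c: pending[c][0])
    count, src = pending[hot_chunk]
    # Pick the least-loaded disk as the destination.
    dst = min(load, key=load.get)
    if load[dst] < load[src]:
        load[src] -= count
        load[dst] += count
        pending[hot_chunk] = (count, dst)

# Usage: disk 0 is overloaded, so chunk-a's 6 requests move to disk 1.
load = {0: 10, 1: 2, 2: 3}
pending = {"chunk-a": (6, 0)}
rebalance_once(pending, load)
assert load == {0: 4, 1: 8, 2: 3}
```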

Graph 3

Request Distribution Graph

Conclusion and Future Work
- GFS is a simple file system for large, data-intensive applications.
- We studied the behavior of certain read workloads on this file system.
- In the future we would like to develop optimizations that further tune GFS.