1
Google File System Simulator Pratima Kolan Vinod Ramachandran
2
Google File System
Master manages metadata
Data transfer happens directly between the client and the chunk server
Files are broken into 64 MB chunks
Chunks are replicated across three machines for safety
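As a rough illustration of the 64 MB chunking described above, a client can map a byte offset in a file to a chunk index before asking the master for that chunk. This is only a sketch; the function and variable names are hypothetical, not part of the simulator.

    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as stated on the slide

    def chunk_for_offset(byte_offset):
        """Return (chunk_index, offset_within_chunk) for a byte offset in a file."""
        return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE

    # Example: byte 100,000,000 falls in chunk 1, about 31 MiB into that chunk.
    idx, off = chunk_for_offset(100_000_000)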
3
Event-Based Simulation
[Diagram: components (Component 1-3) place events (Event 1-3) in a priority queue; the simulator repeatedly takes the next highest-priority event from the queue and produces the output of the simulated event.]
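A minimal sketch of this event loop, using Python's heapq module as the priority queue. The event fields, class name, and handler convention are illustrative assumptions, not the simulator's actual interfaces.

    import heapq
    import itertools

    class Simulator:
        def __init__(self):
            self.queue = []                    # priority queue ordered by event time
            self.counter = itertools.count()   # tie-breaker for events at the same time
            self.now = 0.0

        def schedule(self, time, handler, *args):
            # A component places an event in the priority queue.
            heapq.heappush(self.queue, (time, next(self.counter), handler, args))

        def run(self):
            # Get the next highest-priority (earliest) event and simulate it.
            while self.queue:
                self.now, _, handler, args = heapq.heappop(self.queue)
                handler(self, *args)           # a handler may schedule further events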
4
Simplified GFS Architecture
[Diagram: the client and the master server connect through a switch with infinite bandwidth to Network Disks 1-5; queues on each link represent network queues.]
5
Data Flow
The client queries the master server for the chunk ID it wants to read.
The master server returns the set of disk IDs that contain the chunk.
The client requests the chunk from one of those disks.
The disk transfers the data to the client.
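The same read path, written out as a small self-contained sketch. The lookup table, class names, and method names here are assumptions made for illustration; only the flow (master lookup, replica choice, disk transfer) comes from the slide.

    import random

    class Master:
        """Holds only metadata: which disks hold replicas of each chunk."""
        def __init__(self, chunk_locations):
            self.chunk_locations = chunk_locations   # chunk ID -> list of disk IDs

        def lookup(self, chunk_id):
            return self.chunk_locations[chunk_id]    # the replica disk IDs

    class Disk:
        def __init__(self, blocks):
            self.blocks = blocks                     # chunk ID -> chunk data

        def read(self, chunk_id):
            return self.blocks[chunk_id]

    def read_chunk(master, disks, chunk_id):
        replica_disks = master.lookup(chunk_id)      # client queries the master
        chosen = random.choice(replica_disks)        # client picks one replica disk
        return disks[chosen].read(chunk_id)          # the disk transfers the data

    # Example: chunk 7 is replicated on disks 0, 2 and 4.
    master = Master({7: [0, 2, 4]})
    disks = {d: Disk({7: b"chunk-7-data"}) for d in range(5)}
    data = read_chunk(master, disks, 7)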
6
Experiment Setup
We have a client whose bandwidth can be varied from 0 to 1000 Mbps.
We have 5 disks, each with a per-disk bandwidth of 40 Mbps.
We have 3 chunk replicas per chunk of data as a baseline.
Each client request is for 1 chunk of data from a disk.
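These baseline parameters could be captured in a small configuration record along the following lines; the field names are illustrative, not taken from the simulator.

    from dataclasses import dataclass

    @dataclass
    class ExperimentConfig:
        client_bandwidth_mbps: float       # swept from 0 to 1000 Mbps
        disk_count: int = 5
        per_disk_bandwidth_mbps: float = 40.0
        replicas_per_chunk: int = 3        # baseline replication
        chunks_per_request: int = 1        # each client request reads one chunk

    # One point on the sweep: a client capped at 200 Mbps.
    cfg = ExperimentConfig(client_bandwidth_mbps=200.0)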
7
Simplified GFS Architecture (experiment configuration)
[Diagram: client bandwidth varied from 0 to 1000 Mbps; per-disk bandwidth of 40 Mbps; the switch has infinite bandwidth; the five network disks hold chunk ID ranges 0-1000, 0-1000, 0-2000, 1001-2000, and 1001-2000; queues on each link represent network queues.]
8
Experiment 1
Disk requests served without load balancing
– In this case we pick the first chunk server from the list of available chunk servers that hold the requested chunk.
Disk requests served with load balancing
– In this case we apply a greedy algorithm and balance the load of incoming requests across the 5 disks (a sketch follows below).
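The two disk-selection policies could look roughly like this, assuming each disk tracks how many requests are queued on it. The load dictionary and function names are assumptions for the sketch, not the simulator's code.

    def pick_first(replica_disks, load):
        # No load balancing: always take the first chunk server in the list.
        return replica_disks[0]

    def pick_least_loaded(replica_disks, load):
        # Greedy load balancing: send the request to the replica disk
        # currently holding the fewest outstanding requests.
        return min(replica_disks, key=lambda d: load[d])

    # Example: the chunk's replicas live on disks 0, 1 and 2; disk 0 is busiest.
    load = {0: 5, 1: 2, 2: 0, 3: 0, 4: 1}
    assert pick_first([0, 1, 2], load) == 0
    assert pick_least_loaded([0, 1, 2], load) == 2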
9
Expectation
In the non-load-balancing case we expect the effective request/data rate to peak at the bandwidth of 2 disks (80 Mbps).
In the load-balancing case we expect the effective request/data rate to peak at the bandwidth of 5 disks (200 Mbps).
10
Load Balancing Graph
This graph plots the data rate at the client vs. the client bandwidth.
11
Experiment 2
Disk requests served with no dynamic replication
– In this case we have a fixed number of replicas (3 in our case) and the server does not create more replicas based on read-request statistics.
Disk requests served with dynamic replication
– In this case the server replicates certain chunks based on how frequently they are requested.
– We define a replication factor, which is a fraction < 1.
– Number of replicas for a chunk = (replication factor) × number of requests for the chunk (see the sketch below).
– We cap the maximum number of replicas at the number of disks.
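The replica-count rule can be written directly. The rounding and the floor of 3 baseline replicas are assumptions made for this sketch; the slide only specifies the product and the cap at the number of disks.

    def target_replicas(request_count, replication_factor, disk_count, base_replicas=3):
        # replication_factor is a fraction < 1; the result is capped at the
        # number of disks. Rounding down and keeping at least the 3 baseline
        # replicas are assumptions, not specified on the slide.
        wanted = int(replication_factor * request_count)
        return max(base_replicas, min(wanted, disk_count))

    # Example: with factor 0.1, a chunk read 80 times would want 8 replicas,
    # but only 5 disks exist, so the count is capped at 5.
    assert target_replicas(80, 0.1, disk_count=5) == 5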
12
Expectation
Our requests are all aimed at the chunks placed on disk 0, disk 1, and disk 2.
In the non-replication case we expect the effective data rate at the client to be limited by the bandwidth of 3 disks (120 Mbps).
In the replication case we expect the effective data rate at the client to be limited by the bandwidth of 5 disks (200 Mbps).
13
Replication Graph
This graph plots the data rate at the client vs. the client bandwidth.
14
Experiment 3
Disk requests served with no rebalancing
– In this case we do not rebalance read requests based on the frequency of chunk requests.
Disk requests served with rebalancing
– In this case we rebalance read requests by picking the request with the highest frequency and transferring it to a disk with a lower load (a sketch follows below).
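One simple way the rebalancing step could work, written as a sketch: move the most frequently requested chunk queued on the busiest disk to a less loaded disk that also holds a replica. The data structures here are assumptions made for illustration.

    def rebalance_once(pending, load, replica_disks, frequency):
        # pending: disk ID -> set of chunk IDs queued on that disk
        # load: disk ID -> number of queued requests
        # replica_disks: chunk ID -> disks that hold a replica of that chunk
        # frequency: chunk ID -> how many times that chunk has been requested
        src = max(load, key=load.get)                           # most loaded disk
        if not pending[src]:
            return False
        chunk = max(pending[src], key=lambda c: frequency[c])   # hottest queued request
        candidates = [d for d in replica_disks[chunk] if load[d] < load[src]]
        if not candidates:
            return False
        dst = min(candidates, key=load.get)                     # less loaded replica disk
        pending[src].remove(chunk)
        pending[dst].add(chunk)
        load[src] -= 1
        load[dst] += 1
        return True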
15
Graph 3
16
Request Distribution Graph
17
Conclusion and Future Work
GFS is a simple file system for large, data-intensive applications.
We studied the behavior of certain read workloads on this file system.
In the future we would like to develop optimizations that fine-tune GFS.