Ch 11 Distributed File System


Ch 11 Distributed File System: Ch 11.1 Architecture
Srithi Reddy Muthyala, Oct 6, 2017

Three Architectures to Introduce
- Client-server architecture (centralized): NFS (Network File System)
- Cluster-based architecture (less centralized): GFS (Google File System)
- Symmetric architecture (fully distributed): DHT-based (Distributed Hash Table)

Intro to NFS
- Two ways to build a client-server architecture: the naive way, and RPC

Intro to NFS: basics
- Although originally developed by Sun Microsystems, NFS is the predominant distributed file system on Unix systems
- Layered structure:
  - VFS (Virtual File System): a common interface for remote and local files
  - RPC: used for data transport

NFS API Interfaces

Cluster-Based Distributed File Systems
- Downsides of a client-server architecture: performance bottleneck, single point of failure
- Solution: files (resources) can be stored across several servers
  - A big file spread across multiple servers: file striping, suited to big structured files
  - Many files placed on different servers: most files are not well structured
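File striping can be illustrated with a minimal sketch. It assumes fixed-size stripe units dealt round-robin to servers; the function names `stripe` and `reassemble` and the in-memory lists standing in for servers are hypothetical, not part of any real file system.

```python
def stripe(data, num_servers, stripe_size):
    """Split data into fixed-size units and deal them round-robin to servers."""
    servers = [[] for _ in range(num_servers)]
    for offset in range(0, len(data), stripe_size):
        unit_index = offset // stripe_size
        servers[unit_index % num_servers].append(data[offset:offset + stripe_size])
    return servers


def reassemble(servers):
    """Read the units back in the order they were dealt out."""
    units = []
    longest = max(len(s) for s in servers)
    for row in range(longest):
        for s in servers:
            if row < len(s):
                units.append(s[row])
    return b"".join(units)


data = bytes(range(10))
placed = stripe(data, num_servers=3, stripe_size=2)
restored = reassemble(placed)
```

Because consecutive units land on different servers, a large sequential read can be served by several servers in parallel.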

Cluster-Based Distributed File Systems
- How to support file access in a data center?
  - Files keep growing
  - A file might be multiple gigabytes in size
  - A server might malfunction
  - A file access request from any client should be answered under any condition

Cluster-Based Distributed File Systems
- GFS, how does it work?
  - A cluster has a master node, which keeps ONLY the metadata of files
  - A big file is split into chunks; each chunk is 64 MB
  - Chunks are spread across many chunk servers
- More details on GFS
  - Chunks are replicated for redundancy
  - The master does not keep chunk locations strictly up to date
  - A chunk server knows exactly what it stores
  - If a client's retrieval fails (low probability), it asks the master again, and the master refreshes its information from the chunk servers
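The lookup path described above can be sketched as follows. This is a toy in-memory model, not the GFS implementation: the class `Master`, its `report` and `lookup` methods, and the handle and server names are all hypothetical. It only shows that the master holds metadata, that a byte offset maps to a chunk index by dividing by the 64 MB chunk size, and that location information comes from what chunk servers report.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as stated on the slide


def chunk_index_for(offset):
    """Map a byte offset within a file to the index of the chunk holding it."""
    return offset // CHUNK_SIZE


class Master:
    """Keeps metadata only; file data lives on the chunk servers."""

    def __init__(self):
        self.file_chunks = {}      # file name -> ordered list of chunk handles
        self.chunk_locations = {}  # chunk handle -> set of server ids (may be stale)

    def report(self, server_id, handles):
        """A chunk server reports the chunks it currently stores."""
        for locations in self.chunk_locations.values():
            locations.discard(server_id)
        for handle in handles:
            self.chunk_locations.setdefault(handle, set()).add(server_id)

    def lookup(self, name, chunk_index):
        """Return the chunk handle and the servers believed to hold it."""
        handle = self.file_chunks[name][chunk_index]
        return handle, sorted(self.chunk_locations.get(handle, set()))


master = Master()
master.file_chunks["/logs/a"] = ["h0", "h1"]
master.report("cs1", ["h0", "h1"])
master.report("cs2", ["h1"])
index = chunk_index_for(70 * 1024 * 1024)
first_answer = master.lookup("/logs/a", index)

# cs1 later reports that it no longer holds h1; the master's view is refreshed.
master.report("cs1", ["h0"])
second_answer = master.lookup("/logs/a", index)
```

After the lookup, the client would contact one of the returned chunk servers directly, so file data never passes through the master.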

Cluster-Based Distributed File Systems
- GFS, how does it work? File update:
  - The client pushes the updated file chunk back to the corresponding chunk server
  - The chunk server carries out the backup/replication
  - The master node is kept out of this loop, which avoids the bottleneck problem
  - GFS offers good I/O performance and scales well
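The update path above can be sketched in a few lines. This is a deliberate simplification following the slide only: the class `ChunkServer`, its `write` method, and the server names are hypothetical, and details of the real system such as ordering of concurrent writes or failure handling are left out.

```python
class ChunkServer:
    """Toy chunk server that stores chunks and forwards writes to its replicas."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}    # chunk handle -> bytes
        self.replicas = []  # other ChunkServer objects that hold copies

    def write(self, handle, data):
        """Called by the client; stores the chunk, then replicates it."""
        self.chunks[handle] = data
        for replica in self.replicas:
            replica.chunks[handle] = data
        return 1 + len(self.replicas)  # number of copies written


primary = ChunkServer("cs1")
backup_a = ChunkServer("cs2")
backup_b = ChunkServer("cs3")
primary.replicas = [backup_a, backup_b]

copies = primary.write("h1", b"updated chunk contents")
```

No master object appears anywhere in the write path, which is the point the slide makes about keeping the master out of the loop.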

Symmetric Architecture
- Peer-to-peer
- No client, no server, no master, no chunk
- The first realization is Ivy (multi-user read/write)

What is a DHT?
- Hash table
  - A data structure that maps "keys" to "values"
  - An essential building block in software systems
- Distributed Hash Table (DHT)
  - Similar, but spread across many hosts
- Interface
  - insert(key, value)
  - lookup(key)
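The insert/lookup interface can be sketched with a toy, single-process DHT. It assumes a consistent-hashing ring where each key belongs to the first node whose identifier is at or after the key's hash; the class `ToyDHT` and the node names are hypothetical, and a real DHT would route messages between hosts instead of indexing a local list.

```python
import bisect
import hashlib


def key_id(key):
    """Hash a string onto the identifier ring (SHA-1, as an integer)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")


class ToyDHT:
    def __init__(self, node_names):
        self.ring = sorted((key_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        self.tables = {name: {} for name in node_names}  # per-node hash table

    def owner(self, key):
        """First node clockwise from the key's position, wrapping around."""
        pos = bisect.bisect_left(self.ids, key_id(key)) % len(self.ring)
        return self.ring[pos][1]

    def insert(self, key, value):
        self.tables[self.owner(key)][key] = value

    def lookup(self, key):
        return self.tables[self.owner(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.insert("report.txt", "block list for report.txt")
found = dht.lookup("report.txt")
missing = dht.lookup("no-such-key")
```

Each key is stored on exactly one node's table, and both operations find that node from the key alone.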

Symmetric Architecture: Ivy details
- Data storage
  - A file is composed of 8 KB data blocks
  - Content-hash data blocks
  - Public-key based blocks
- Replication
  - Every block B is stored on the K immediate successors, for better availability
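Content-hash blocks and successor replication can be sketched as follows. This is an illustration under stated assumptions, not Ivy's code: SHA-1 is assumed as the content hash, node identifiers are small integers for readability, and the helper names `split_blocks`, `content_key`, and `successors` are hypothetical. Public-key based blocks are not covered.

```python
import bisect
import hashlib

BLOCK_SIZE = 8 * 1024  # 8 KB blocks, as stated on the slide


def split_blocks(data):
    """Cut file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def content_key(block):
    """A content-hash block is stored under the hash of its own contents."""
    return hashlib.sha1(block).hexdigest()


def successors(ring_ids, key, k):
    """Return the k node ids that follow key clockwise on a sorted ring."""
    start = bisect.bisect_left(ring_ids, key)
    k = min(k, len(ring_ids))
    return [ring_ids[(start + i) % len(ring_ids)] for i in range(k)]


blocks = split_blocks(b"x" * (BLOCK_SIZE + 1))
keys = [content_key(b) for b in blocks]

ring = [10, 20, 30, 40]
replica_nodes = successors(ring, key=25, k=2)
wrapped_nodes = successors(ring, key=45, k=3)
```

Because the key is the hash of the block, any node or client can verify a fetched block by recomputing `content_key` and comparing it with the key it asked for.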

DHT: basic idea
- [Figure: key-value (K, V) pairs spread across many nodes]
- Operation: take a key as input; route messages to the node holding that key

Future Developments
- Client-server architecture (centralized): NFS
- Cluster-based architecture (less centralized): GFS
- Symmetric architecture (fully distributed): DHT-based

References
- Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." ACM SIGOPS Operating Systems Review. Vol. 37. No. 5. ACM, 2003.
- Sandberg, Russel, et al. "Design and implementation of the Sun network filesystem." Proceedings of the Summer USENIX Conference. 1985.
- Muthitacharoen, Athicha, et al. "Ivy: A read/write peer-to-peer file system." ACM SIGOPS Operating Systems Review 36.SI (2002): 31-44.
- Naor, Moni, and Udi Wieder. "A simple fault tolerant distributed hash table." Peer-to-Peer Systems II. Springer Berlin Heidelberg, 2003. 88-97.
- Cai, Min, Ann Chervenak, and Martin Frank. "A peer-to-peer replica location service based on a distributed hash table." Proceedings of the 2004 ACM/IEEE Conference on Supercomputing. IEEE Computer Society, 2004.
- Kleiman, Steve R. "Vnodes: An Architecture for Multiple File System Types in Sun UNIX." USENIX Summer. Vol. 86. 1986.