A Survey on Distributed File Systems

A Survey on Distributed File Systems By Priyank Gupta November 5, 2012

Introduction Definition: a distributed file system is any file system that can be accessed by multiple hosts sharing files over a computer network. It allows sharing of files between multiple clients and storage resources. With the evolution of large-scale, data-intensive applications, managing large amounts of data across multiple computers has become a central challenge.

Key Design Issues (1/2) Transparency: the interface should be designed so that the client sees no difference between files on the local machine and files on a remote server. Fault Tolerance: the system should continue working without any data loss even when a certain number of nodes develop faults. Scalability: future increases in load should be handled by adding extra resources, with minimal degradation in performance.

Key Design Issues (2/2) Security: the same data set can be accessed by multiple nodes, so access needs to be controlled and at times restricted. Performance: usually quantified by measuring the time taken by the system to satisfy service requests, which is typically a combination of disk access time and CPU processing time. The goal is usually to reach levels close to those of a centralized file system.
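A minimal back-of-the-envelope sketch of this kind of quantification, in Python. The component costs below are illustrative assumptions, not measurements from the survey; a distributed request would add at least one network round trip on top of the local costs.

# Illustrative estimate of per-request service time for a remote read.
# All component costs are assumed example values, not measured figures.
def request_service_time(bytes_requested,
                         disk_seek_s=0.008,         # assumed seek + rotational delay
                         disk_bandwidth_bps=100e6,  # assumed sustained disk transfer rate
                         cpu_overhead_s=0.0002,     # assumed per-request CPU processing
                         network_rtt_s=0.0005):     # assumed client/server round trip
    transfer_s = bytes_requested / disk_bandwidth_bps
    return disk_seek_s + transfer_s + cpu_overhead_s + network_rtt_s

# Example: a 1 MB read is dominated by seek and transfer time.
print(request_service_time(1 << 20))  # ~0.019 s with the assumed values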

Outlook (1/2) Newer studies are based on dynamic analysis, monitoring file system state in real time. Data on the order of multi-GB and TB files is now the norm, so the block size of the conventional file system has been revised upward. File access patterns tend to be either read-only or write-only, especially for frequently accessed files; this information can be used for optimization.

Outlook (2/2) Even small amounts of caching reduce read traffic drastically. Memory mapping is used extensively in modern workloads; keeping a file in memory for as long as a process has it memory-mapped keeps the miss rate at a minimum. Metadata operations can be more expensive than access to the actual file data itself.

The Google File System Designed to meet the rapidly growing demands of Google's data processing needs. Built from inexpensive commodity components that are expected to fail, so the system is constantly monitored in order to tolerate faults and recover from them. File sizes are huge (multi-GB or TB), and write operations are mainly appends. High sustained bandwidth is more important than low latency.

Architecture

Architecture Consists of a single master, multiple chunkservers, and the GFS client. The master maintains the file system metadata used to locate the chunks, on the various chunkservers, that hold the actual data. The master provides this information to the client, after which the client contacts the chunkservers directly to perform operations.
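A minimal sketch of that read path in Python. The helper names (lookup_chunk, read_chunk) and the single-chunk assumption are placeholders for illustration, not the actual GFS RPC interface; the 64 MB chunk size is from the GFS paper.

# Illustrative GFS-style read: ask the master for chunk metadata,
# then fetch the data directly from one of the chunkservers.
CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses 64 MB chunks

def gfs_read(master, filename, offset, length):
    # Assumes the read fits within a single chunk.
    chunk_index = offset // CHUNK_SIZE
    # Master returns the chunk handle plus the chunkservers holding replicas.
    chunk_handle, replica_servers = master.lookup_chunk(filename, chunk_index)
    # The client talks to a chunkserver directly; the master is not on the data path.
    chunkserver = replica_servers[0]  # e.g. pick the closest replica
    return chunkserver.read_chunk(chunk_handle, offset % CHUNK_SIZE, length)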

Read operation

Write Operation

Performance Measurements

Limitations Not POSIX compliant; as a result, its applications may not be easily portable to other distributed computing environments. Replica management places replica chunkservers on different racks, so writes incur extra latency as the operation is propagated across racks. The system may not work efficiently when there are a large number of small files.

Ceph Open-source distributed file system capable of handling petabytes of storage. Like GFS, it is built from commodity-grade components and assumes dynamic workloads. The system is built incrementally and has a scalable design. Intelligent Object Storage Devices (OSDs), each with their own CPU, cache, etc., make low-level block allocation decisions. Clients interact with the metadata servers for operations such as open and rename, and communicate directly with the OSDs for file I/O such as reads and writes.

Architecture (1/2) Decouples data and metadata operations by eliminating file allocation tables and replacing them with the CRUSH distribution function. Ceph employs an adaptive metadata cluster distribution architecture, which improves the distribution of the metadata workload significantly and makes the system highly scalable. Ceph makes efficient use of the intelligence of the OSDs, delegating to them data access as well as update serialization.

Architecture (2/2) Three main components. The client: presents a POSIX file system interface to a process. The cluster of OSDs: collectively stores all data and metadata. The metadata server cluster: manages file names and directories, and coordinates consistency and coherence.

Architecture

Client The client runs at the user end and can be accessed either by linking against it directly or as a mounted file system via FUSE. File I/O: the metadata cluster traverses the file system hierarchy and returns the inode number of the file requested by the client. Client synchronization: POSIX semantics are followed to ensure synchronization between file-related operations; typically the burden of serialization and synchronization is placed on the OSD storing each object.
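A minimal sketch of this division of labor between the MDS and the OSDs, in Python. The helper names (mds.open, primary_osd_for, read_object) and the object-naming scheme are hypothetical placeholders, not the real Ceph client API.

# Illustrative Ceph-style client path: metadata via the MDS cluster,
# file I/O directly against the OSDs.
def ceph_open(mds, path):
    # The MDS traverses the namespace and returns the file's inode number
    # plus layout information; no file data passes through the MDS.
    inode, layout = mds.open(path)
    return inode, layout

def ceph_read(osd_cluster, inode, object_index, length):
    # Object name derived from the inode number and object index (assumed scheme);
    # the client then talks to the responsible OSD directly.
    object_name = f"{inode:x}.{object_index:08x}"
    osd = osd_cluster.primary_osd_for(object_name)
    return osd.read_object(object_name, length)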

OSD Cluster (1/3) The Reliable Autonomic Distributed Object Store (RADOS) approach ensures linear scaling by delegating replication, cluster expansion, fault detection, etc. to the OSDs themselves. CRUSH (Controlled Replication Under Scalable Hashing) is a data distribution function that efficiently maps a placement group of objects onto an ordered list of OSDs. To locate an object, CRUSH requires only the placement group and the cluster map, enabling any client, OSD, or MDS to calculate an object's location without exchanging distribution-related metadata.

OSD Cluster (2/3)

OSD Cluster (3/3) Files are mapped to objects using the inode number. The objects are in turn mapped to placement groups using a hash function, and the placement groups are then mapped to OSDs using CRUSH. The OSDs are spread across different racks so that data retrieval can continue even if an OSD fails.
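A simplified sketch of this mapping chain in Python. The hash-and-modulo placement below is only a stand-in for the real CRUSH function, which walks a hierarchical cluster map (racks, hosts, disks) pseudo-randomly; the constants are assumptions chosen for illustration.

# Simplified illustration of the mapping chain:
#   (inode, object index) -> object id -> placement group -> list of OSDs.
import hashlib

NUM_PGS = 1024   # assumed number of placement groups
NUM_OSDS = 14    # e.g. the 14-node OSD cluster used in the evaluation
REPLICAS = 2

def object_id(inode, obj_index):
    return f"{inode:x}.{obj_index:08x}"

def placement_group(oid):
    h = int(hashlib.sha1(oid.encode()).hexdigest(), 16)
    return h % NUM_PGS

def crush_like_placement(pg):
    # Deterministically pick REPLICAS distinct OSDs for this placement group.
    osds, i = [], 0
    while len(osds) < REPLICAS:
        h = int(hashlib.sha1(f"{pg}:{i}".encode()).hexdigest(), 16)
        osd = h % NUM_OSDS
        if osd not in osds:
            osds.append(osd)
        i += 1
    return osds

# Example: where does object 3 of inode 0x1234 live?
oid = object_id(0x1234, 3)
pg = placement_group(oid)
print(oid, "-> PG", pg, "-> OSDs", crush_like_placement(pg))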

Metadata Server Cluster (1/2) The cluster is diskless and simply serves as an index into the OSD cluster, facilitating reads and writes. File and directory metadata is very small, essentially a collection of directory entries and inodes. Typically around half of a file system's workload consists of metadata operations, so a lighter, simpler metadata path has a large impact on efficiency. The cluster adaptively and intelligently distributes responsibility for managing the file system directory hierarchy across the MDS nodes.
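A toy sketch of that adaptive distribution in Python: periodically hand the hottest directory subtree to the least-loaded MDS. The class and load-tracking scheme here are illustrative assumptions; Ceph's actual dynamic subtree partitioning is considerably more sophisticated.

# Illustrative adaptive metadata distribution (not Ceph's real algorithm).
from collections import defaultdict

class MdsCluster:
    def __init__(self, num_mds):
        self.owner = {"/": 0}            # directory subtree -> MDS id
        self.load = defaultdict(float)   # subtree -> recent metadata ops
        self.mds_load = [0.0] * num_mds  # per-MDS aggregate load

    def record_op(self, subtree):
        self.load[subtree] += 1
        self.mds_load[self.owner.get(subtree, 0)] += 1

    def rebalance(self):
        if not self.load:
            return
        hottest = max(self.load, key=self.load.get)
        coolest = min(range(len(self.mds_load)), key=self.mds_load.__getitem__)
        if self.owner.get(hottest, 0) != coolest:
            self.owner[hottest] = coolest  # delegate the hot subtree
        self.load.clear()
        self.mds_load = [0.0] * len(self.mds_load)

# Usage: a burst of operations on one directory shifts it to an idle MDS.
cluster = MdsCluster(num_mds=4)
for _ in range(100):
    cluster.record_op("/home/alice")
cluster.rebalance()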

Metadata Server Cluster (2/2)

Performance (1/4) The test uses a 14-node OSD cluster, with load generated by 400 clients running on 20 other nodes. The per-OSD throughput approaches its theoretical limit even for larger numbers of replicas.

Performance (2/4) Two and three replicas show little difference in latency; network transmission time dominates overall latency at larger write sizes.

Performance (3/4) OSD throughput scales linearly with the size of the OSD cluster, up to the point where the network switch is saturated. A larger number of placement groups (PGs) results in higher per-node throughput.

Metadata Operation Scaling The test involves a 430-node cluster, varying the number of MDS nodes under a metadata-only workload. Results indicate a slowdown of about 50% for large MDS clusters.

Limitations The Ceph file system design currently has no security features; the design trusts all nodes. Although failure recovery has been addressed at the OSD level, the Ceph developers have not addressed failure recovery at the metadata cluster level.

Conclusion Distributed file systems are among the most reliable solutions for large shared data, built from commodity-grade components that are expected to fail at some point. Future systems will continue to use the concept of decoupling file system metadata from the actual file data. Compatibility with various environments will be important, and heterogeneous file systems will be an important design challenge of the future.