
Distributed File System. Outline Basic Concepts Current project Hadoop Distributed File System Future work Reference.


1 Distributed File System

2 Outline Basic Concepts Current project Hadoop Distributed File System Future work Reference

3 DFS A distributed implementation of the classical time-sharing model of a file system, in which multiple users share files and storage resources.

4 Key Characteristics of DFS: dispersion of clients and files; multiplicity of clients and files.

5 Primary Issues of DFS: naming and transparency; fault tolerance.

6 Naming: a mapping between logical and physical objects. The mapping is multilevel. Replicas and their locations are transparent to the user.

7 Naming Schemes: Three Main Approaches. (1) Host name + local name: guarantees a unique system-wide name. (2) Mount remote directories onto local directories: once mounted, files can be referenced in a location-transparent manner. (3) Total integration of the component file systems: a single global name structure spans all files. If a server is unavailable, some arbitrary set of directories on different machines also becomes unavailable.
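The mounting approach on this slide can be sketched as a table lookup. This is only an illustration: the mount table, the host names (`serverA`, `serverB`) and the `resolve` function are hypothetical and do not come from the slides or from any particular DFS.

```python
from typing import Dict, Tuple

# Hypothetical mount table: local mount point -> (server host, exported directory).
MOUNT_TABLE: Dict[str, Tuple[str, str]] = {
    "/mnt/projects": ("serverA", "/export/projects"),
    "/mnt/home": ("serverB", "/export/home"),
}


def resolve(path: str) -> Tuple[str, str]:
    """Map a client-visible path to (host, path on that host).

    Paths that fall under no mount point resolve to the local machine.
    """
    # Try the longest mount point first so that nested mounts would work.
    for mount_point in sorted(MOUNT_TABLE, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            host, remote_root = MOUNT_TABLE[mount_point]
            return host, remote_root + path[len(mount_point):]
    return "localhost", path
```

The client only ever names `/mnt/projects/...`; which server holds the file is decided by the table, which is what makes the reference location-transparent.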

8 Transparency (1) Login Transparency: A user can log in at any host with a uniform login procedure and perceive a uniform view of the file system. Access Transparency: A client process on a host has a uniform mechanism to access all files in the system, regardless of whether the files are on the local host or a remote host. Location Transparency: The names of the files do not reveal their physical location.

9 Transparency (2) Concurrency Transparency: An update to a file should not affect the correct execution of other processes that are concurrently sharing that file. Replication Transparency: Files may be replicated to provide redundancy for availability and also to permit concurrent access for efficiency.

10 Fault Tolerance Stateful vs. stateless service: whether the server maintains information about its clients. File replication.

11 Distinctions Between Stateful & Stateless Service Failure recovery: A stateful server loses all its volatile state in a crash. With a stateless server, the effects of server failure and recovery are almost unnoticeable.
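The failure-recovery difference can be illustrated with a toy model. The classes below are hypothetical and in-memory only; `FILES` stands in for stable storage that survives a crash, and a "crash" is modelled by creating a fresh server object.

```python
from typing import Dict, Tuple

# Stands in for on-disk data that survives a server crash (assumed for the sketch).
FILES: Dict[str, bytes] = {"/data/log.txt": b"abcdefghij"}


class StatelessServer:
    """Every request is self-contained: path, offset and count."""

    def read(self, path: str, offset: int, count: int) -> bytes:
        return FILES[path][offset:offset + count]


class StatefulServer:
    """Keeps a volatile open-file table: handle -> (path, current offset)."""

    def __init__(self) -> None:
        self._open: Dict[int, Tuple[str, int]] = {}
        self._next_handle = 1

    def open(self, path: str) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._open[handle] = (path, 0)
        return handle

    def read(self, handle: int, count: int) -> bytes:
        # Raises KeyError if the table was lost, e.g. after a restart.
        path, offset = self._open[handle]
        data = FILES[path][offset:offset + count]
        self._open[handle] = (path, offset + len(data))
        return data
```

After a restart, a client of the stateless server simply repeats `read(path, offset, count)`. A client of the stateful server holds a handle that the new server instance no longer knows, so it has to detect the failure and reopen the file.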

12 File Replication Several copies of a file's contents at different locations enable multiple servers to share the load of providing the service. The naming scheme maps a replicated file name to a particular replica. Updates.
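A minimal sketch of these two points, under stated assumptions: reads are spread over replicas round-robin (one possible way to map a name to a replica), and an update is applied to every replica so that all copies stay identical. The slide does not say how updates are handled, so the write rule here is an assumption, and the class is hypothetical.

```python
from typing import Dict, List


class ReplicatedFile:
    """One logical file name backed by copies on several servers."""

    def __init__(self, name: str, servers: List[str]) -> None:
        self.name = name
        self._servers = list(servers)
        self.replicas: Dict[str, bytes] = {server: b"" for server in servers}
        self._next = 0

    def pick_replica(self) -> str:
        # Assumed policy: round-robin, so servers share the read load.
        server = self._servers[self._next % len(self._servers)]
        self._next += 1
        return server

    def read(self) -> bytes:
        return self.replicas[self.pick_replica()]

    def write(self, data: bytes) -> None:
        # Assumed rule: an update must be reflected on every replica.
        for server in self._servers:
            self.replicas[server] = data
```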

13 Current Project: HDFS (Hadoop Distributed File System). A distributed, parallel, fault-tolerant file system, designed to reliably store very large files across machines in a large cluster. Efficient, reliable, and open source.

14 Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework.

15 HDFS Hadoop's Distributed File System is designed to reliably store very large files across machines in a large cluster. It is inspired by the Google File System. Hadoop DFS stores each file as a sequence of blocks; all blocks in a file except the last are the same size. Blocks belonging to a file are replicated for fault tolerance. The block size and replication factor are configurable per file. Files in HDFS are "write once" and have strictly one writer at any time. Hadoop Distributed File System goals: store large data sets; cope with hardware failure; emphasize streaming data access.
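The block layout described above comes down to simple arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the 64 MB block size and replication factor of 3 in the usage note are example values, not figures taken from the slides.

```python
from typing import List


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the size of each block: all equal except possibly the last."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds whatever is left over
    return sizes


def raw_storage(file_size: int, replication: int) -> int:
    """Bytes consumed across the cluster when every block is replicated."""
    return file_size * replication
```

For example, with `MB = 1024 * 1024`, `split_into_blocks(200 * MB, 64 * MB)` gives three 64 MB blocks and one 8 MB block, and `raw_storage(200 * MB, 3)` is 600 MB.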

16 Architecture Like Hadoop Map/Reduce, HDFS follows a master/slave architecture. An HDFS installation consists of a single Namenode, a master server that manages the filesystem namespace and regulates access to files by clients. In addition, there are a number of Datanodes, one per node in the cluster, which manage storage attached to the nodes that they run on. The Namenode makes filesystem namespace operations such as opening, closing, and renaming files and directories available via an RPC interface. It also determines the mapping of blocks to Datanodes. The Datanodes are responsible for serving read and write requests from filesystem clients. They also perform block creation, deletion, and replication upon instruction from the Namenode.
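The division of labour between Namenode and Datanodes can be sketched with a toy in-memory model. This is not the Hadoop API: the classes, the round-robin placement and the helper functions are all hypothetical, there is no RPC, and for simplicity the client writes each replica itself.

```python
from typing import Dict, List


class Datanode:
    """Stores block contents; knows nothing about file names."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}

    def write_block(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]


class Namenode:
    """Holds metadata only: file -> block ids, block id -> Datanodes."""

    def __init__(self, datanodes: List[Datanode], replication: int) -> None:
        if not 1 <= replication <= len(datanodes):
            raise ValueError("replication must be between 1 and the number of Datanodes")
        self.datanodes = datanodes
        self.replication = replication
        self.files: Dict[str, List[int]] = {}
        self.locations: Dict[int, List[Datanode]] = {}
        self._next_block = 0

    def allocate_block(self, path: str) -> int:
        block_id = self._next_block
        self._next_block += 1
        # Assumed placement policy: consecutive Datanodes, round-robin.
        n = len(self.datanodes)
        start = block_id % n
        targets = [self.datanodes[(start + i) % n] for i in range(self.replication)]
        self.files.setdefault(path, []).append(block_id)
        self.locations[block_id] = targets
        return block_id


def write_file(nn: Namenode, path: str, data: bytes, block_size: int) -> None:
    nn.files.setdefault(path, [])  # register the name even for an empty file
    for i in range(0, len(data), block_size):
        block_id = nn.allocate_block(path)
        for dn in nn.locations[block_id]:
            dn.write_block(block_id, data[i:i + block_size])


def read_file(nn: Namenode, path: str) -> bytes:
    # Ask the Namenode where the blocks are, then read from a Datanode.
    return b"".join(nn.locations[b][0].read_block(b) for b in nn.files[path])
```

The point of the sketch is that file data never passes through the `Namenode` object: it only answers "which blocks, on which Datanodes".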

17

18 Naming: central metadata server. Synchronization: write-once-read-many; locks on objects are given to clients using leases. Consistency and replication: server-side replication, asynchronous replication, checksums. Fault tolerance: failure is treated as the norm. Security: no dedicated security mechanism.
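The lease idea mentioned under synchronization can be sketched as a time-limited, single-holder lock. This is a generic illustration, not HDFS's actual lease protocol; the `LeaseManager` class is hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Tuple


class LeaseManager:
    """Grants at most one unexpired write lease per path."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        # path -> (holder, expiry time)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def acquire(self, path: str, client: str, now: float) -> bool:
        """Grant or renew a lease; refuse if another client holds a live one."""
        lease = self._leases.get(path)
        if lease is not None:
            holder, expiry = lease
            if holder != client and expiry > now:
                return False
        self._leases[path] = (client, now + self.duration)
        return True

    def release(self, path: str, client: str) -> None:
        lease = self._leases.get(path)
        if lease is not None and lease[0] == client:
            del self._leases[path]
```

Because a lease expires on its own, a writer that crashes without releasing it blocks other clients only until the expiry time, which fits the "failure as norm" assumption.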

19 Future Work: robustness of the data sharing model; architecture, naming, synchronization, availability, heterogeneity, and support for databases; security.

20 Reference [1] Thanh, T.D.; Mohan, S.; Choi, E.; SangBum Kim; Pilsung Kim. 2008. "A Taxonomy and Survey on Distributed File Systems". Networked Computing and Advanced Information Management. [2] Randy Chow. 1997. Distributed Operating Systems & Algorithms. [3] Eliezer Levy, Abraham Silberschatz. December 1990. "Distributed file systems: concepts and examples". Computing Surveys (CSUR), Volume 22, Issue 4. [4] http://hadoop.apache.org/common/docs/current/hdfs_design.html#Introduction [5] http://www.snia.org/events/wintersymp2009/cloud/dhruba_hadoop_snia.pdf

21 [6] http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_file_systems [7] http://en.wikipedia.org/wiki/Hadoop#Hadoop_Distributed_File_System [8] http://www.cs.gsu.edu/~cscyqz/courses/aos/slides08/ch6.1-Fall08.pptx

22 Q&A?

23 Thank you!

