DISTRIBUTED FILE SYSTEMS

Presentation transcript:

DISTRIBUTED FILE SYSTEMS DEFINITIONS:   A Distributed File System ( DFS ) is simply a classical model of a file system distributed across multiple machines. The purpose is to promote sharing of dispersed files. This is an area of active research interest today. The resources on a particular machine are local to itself. Resources on other machines are remote. A file system provides a service for clients. The server interface is the normal set of file operations: create, read, etc. on files.

Introduction Distributed file systems support the sharing of information in the form of files throughout the intranet. A distributed file system enables programs to store and access remote files exactly as they do local ones, allowing users to access files from any computer on the intranet. Recent advances in higher-bandwidth switched local networks and in disk organization have led to high-performance and highly scalable file systems.

DISTRIBUTED FILE SYSTEMS Definitions  Clients, servers, and storage are dispersed across machines. Configuration and implementation may vary - Servers may run on dedicated machines, OR Servers and clients can be on the same machines. The OS itself can be distributed with the file system a part of that distribution. A distribution layer can be interposed between a conventional OS and the file system. Clients should view a DFS the same way they would a centralized FS; the distribution is hidden at a lower level. Performance is concerned with throughput and response time.

DISTRIBUTED FILE SYSTEMS Naming and Transparency Naming is the mapping between logical and physical objects.   Example: A user filename maps to <cylinder, sector>. In a conventional file system, it's understood where the file actually resides; the system and disk are known. In a transparent DFS, the location of a file, somewhere in the network, is hidden. File replication means multiple copies of a file; mapping returns a SET of locations for the replicas. Location transparency - The name of a file does not reveal any hint of the file's physical storage location. File name still denotes a specific, although hidden, set of physical disk blocks.

DISTRIBUTED FILE SYSTEMS Naming and Transparency The ANDREW DFS AS AN EXAMPLE:   Is location independent. Supports file mobility. Separation of FS and OS allows for disk-less systems. These have lower cost and convenient system upgrades. The performance is not as good. NAMING SCHEMES: There are three main approaches to naming files: 1. Files are named with a combination of host and local name. This guarantees a unique name. NEITHER location transparent NOR location independent. Same naming works on local and remote files. The DFS is a loose collection of independent file systems.

DISTRIBUTED FILE SYSTEMS Naming and Transparency NAMING SCHEMES:   2. Remote directories are mounted to local directories. So a local system seems to have a coherent directory structure. The remote directories must be explicitly mounted. The files are location independent. SUN NFS is a good example of this technique. 3. A single global name structure spans all the files in the system. The DFS is built the same way as a local filesystem. Location independent.

DISTRIBUTED FILE SYSTEMS Naming and Transparency IMPLEMENTATION TECHNIQUES: A non-transparent mapping technique: name ----> < system, disk, cylinder, sector >. A transparent mapping technique: name ----> file_identifier ----> < system, disk, cylinder, sector >. So when changing the physical location of a file, only the file identifier need be modified. This identifier must be "unique".
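
The transparent mapping above can be sketched as two lookup tables. This is a minimal illustration, not any real DFS implementation; the class names, the integer identifiers, and the example path and server names are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical address of a file: <system, disk, cylinder, sector>."""
    system: str
    disk: int
    cylinder: int
    sector: int


class TransparentNamer:
    """Two-level mapping: name -> file_identifier -> location (hypothetical)."""

    def __init__(self):
        self._name_to_id = {}      # user-visible name -> file identifier
        self._id_to_location = {}  # file identifier -> physical location
        self._next_id = 0

    def create(self, name, location):
        file_id = self._next_id    # never reused, so identifiers stay unique
        self._next_id += 1
        self._name_to_id[name] = file_id
        self._id_to_location[file_id] = location
        return file_id

    def migrate(self, name, new_location):
        # Only the identifier's entry changes; the name is untouched.
        self._id_to_location[self._name_to_id[name]] = new_location

    def resolve(self, name):
        return self._id_to_location[self._name_to_id[name]]


namer = TransparentNamer()
namer.create("/home/sue/notes.txt", Location("serverA", 0, 12, 3))
namer.migrate("/home/sue/notes.txt", Location("serverB", 1, 40, 7))
print(namer.resolve("/home/sue/notes.txt").system)
```

The user keeps the same name before and after the move, which is what makes the scheme location independent.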

DISTRIBUTED FILE SYSTEMS Remote File Access CACHING Reduce network traffic by retaining recently accessed disk blocks in a cache, so that repeated accesses to the same information can be handled locally. If required data is not already cached, a copy of data is brought from the server to the user. Perform accesses on the cached copy. Files are identified with one master copy residing at the server machine. Copies of (parts of) the file are scattered in different caches. Cache Consistency Problem: Keeping the cached copies consistent with the master file. A remote service (RPC) has these characteristic steps: The client makes a request for file access. The request is passed to the server in message format. The server makes the file access. Return messages bring the result back to the client. This is equivalent to performing a disk access for each request.
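
A small sketch of the difference between pure remote service and client caching. The Server and CachingClient classes are hypothetical stand-ins; a real client would also have to deal with cache size limits and the consistency problem named above, both of which are left out here.

```python
class Server:
    """Holds the master copy of every file as a list of blocks."""

    def __init__(self, files):
        self.files = files   # name -> list of block contents
        self.requests = 0    # counts remote accesses

    def read_block(self, name, block_no):
        self.requests += 1
        return self.files[name][block_no]


class CachingClient:
    """Keeps recently read blocks locally and asks the server only on a miss."""

    def __init__(self, server):
        self.server = server
        self.cache = {}      # (name, block_no) -> block contents

    def read_block(self, name, block_no):
        key = (name, block_no)
        if key not in self.cache:            # miss: one remote request
            self.cache[key] = self.server.read_block(name, block_no)
        return self.cache[key]               # hit: handled locally


server = Server({"a.txt": [b"hello", b"world"]})
client = CachingClient(server)
first = client.read_block("a.txt", 0)    # goes to the server
second = client.read_block("a.txt", 0)   # served from the local cache
print(server.requests)
```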

DISTRIBUTED FILE SYSTEMS Remote File Access CACHE LOCATION: Caching is a mechanism for maintaining disk data on the local machine. This data can be kept in the local memory or on the local disk. Caching can be advantageous both for read ahead and read again. The cost of getting data from a cache is a few HUNDRED instructions; disk accesses cost THOUSANDS of instructions. The master copy of a file doesn't move, but caches contain replicas of portions of the file. Caching behaves just like "networked virtual memory". What should be cached? << blocks <---> files >>. Bigger sizes give a better hit rate; smaller give better transfer times. Caching on disk gives: Better reliability. Caching in memory gives: The possibility of diskless workstations and greater speed.

DISTRIBUTED FILE SYSTEMS Remote File Access COMPARISON OF CACHING AND REMOTE SERVICE:   Many remote accesses can be handled by a local cache. There's a great deal of locality of reference in file accesses. Servers can be accessed only occasionally rather than for each access. Caching causes data to be moved in a few big chunks rather than in many smaller pieces; this leads to considerable efficiency for the network. Disk accesses can be better optimized on the server if it's understood that requests are always for large contiguous chunks. Caching works best on machines with considerable local store - either local disks or large memories.

DISTRIBUTED FILE SYSTEMS Remote File Access STATEFUL VS. STATELESS SERVICE:   Stateful: A server keeps track of information about client requests. It maintains what files are opened by a client; connection identifiers; server caches. Memory must be reclaimed when client closes file or when client dies. Stateless: Each client request provides complete information needed by the server (i.e., filename, file offset ). The server can maintain information on behalf of the client, but it's not required.

DISTRIBUTED FILE SYSTEMS Remote File Access STATEFUL VS. STATELESS SERVICE: Performance is better for stateful. Don't need to parse the filename each time, or "open/close" the file on every request. Fault Tolerance: A stateful server loses everything when it crashes. The server must poll clients in order to rebuild its state. Client crashes force the server to clean up the state it holds for that client. A stateless server remembers nothing, so it can restart easily after a crash.
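
The contrast can be sketched with two toy servers. Both classes are hypothetical and keep files as in-memory byte strings; the point is only where the file position lives and what a crash destroys.

```python
class StatelessServer:
    """Every request carries the file name and offset; nothing is remembered."""

    def __init__(self, files):
        self.files = files            # name -> bytes

    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]


class StatefulServer:
    """Tracks open files and the current position per connection."""

    def __init__(self, files):
        self.files = files
        self.open_files = {}          # connection id -> [name, current offset]
        self._next_conn = 0

    def open(self, name):
        conn = self._next_conn
        self._next_conn += 1
        self.open_files[conn] = [name, 0]
        return conn

    def read(self, conn, count):
        name, offset = self.open_files[conn]
        data = self.files[name][offset:offset + count]
        self.open_files[conn][1] = offset + len(data)
        return data

    def close(self, conn):
        del self.open_files[conn]     # memory reclaimed on close

    def crash(self):
        self.open_files.clear()       # all per-client state is lost


files = {"log": b"abcdefgh"}
stateful = StatefulServer(files)
conn = stateful.open("log")
stateful.read(conn, 4)                    # server now remembers offset 4
stateful.crash()
survived = conn in stateful.open_files    # the connection is gone

stateless = StatelessServer(files)        # a "rebooted" stateless server
chunk = stateless.read("log", 4, 4)       # client supplies name and offset
print(survived, chunk)
```

After the crash the stateful client must reopen the file and restore its position, while the stateless request works unchanged because it was self-contained.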

DISTRIBUTED FILE SYSTEMS Remote File Access FILE REPLICATION:   Duplicating files on multiple machines improves availability and performance. Placed on failure-independent machines ( they won't fail together ). The main problem is consistency - when one copy changes, how do other copies reflect that change? Often there is a tradeoff: consistency versus availability and performance.   

General File Service Architecture The responsibilities of a DFS are typically distributed among three modules: Client module which emulates the conventional file system interface Server modules(2) which perform operations for clients on directories and on files. Most importantly this architecture enables stateless implementation of the server modules.

File service architecture Client computer Server computer Application program Client module Flat file service Directory service

File Service Architecture Flat File Service: Concerned with implementing operations on the contents of files. Unique File Identifiers (UFIDs) are used to refer to files in all requests for flat file service operations. The division of responsibilities between the file service and the directory service is based upon the use of UFIDs (long sequences of bits, chosen so that each file has a UFID that is unique within the distributed system).

File Service Architecture Directory Service: Provides a mapping between text names for files and their UFIDs. Clients obtain the UFID of a file by quoting its text name to the directory service. Client Module: Runs on each client computer. Integrates and extends the operations of the flat file service under a single application programming interface.
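
A minimal sketch of how the three modules fit together. The method names are assumptions, and the UFID is generated here as a random 128-bit hex string, which is only one possible way to get a long, practically unique bit sequence.

```python
import secrets


class FlatFileService:
    """Operates on file contents; files are known only by UFID."""

    def __init__(self):
        self._files = {}                       # UFID -> bytearray

    def create(self):
        ufid = secrets.token_hex(16)           # assumption: random 128-bit UFID
        self._files[ufid] = bytearray()
        return ufid

    def write(self, ufid, offset, data):
        buf = self._files[ufid]
        if len(buf) < offset:                  # pad a gap with zero bytes
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data

    def read(self, ufid, offset, count):
        return bytes(self._files[ufid][offset:offset + count])


class DirectoryService:
    """Maps text names to UFIDs."""

    def __init__(self):
        self._names = {}

    def add_name(self, name, ufid):
        self._names[name] = ufid

    def lookup(self, name):
        return self._names[name]


class ClientModule:
    """Combines both services behind one file-like interface."""

    def __init__(self, files, directory):
        self.files, self.directory = files, directory

    def write_file(self, name, data):
        try:
            ufid = self.directory.lookup(name)
        except KeyError:                       # new name: create and register
            ufid = self.files.create()
            self.directory.add_name(name, ufid)
        self.files.write(ufid, 0, data)

    def read_file(self, name, count):
        return self.files.read(self.directory.lookup(name), 0, count)


client = ClientModule(FlatFileService(), DirectoryService())
client.write_file("report.txt", b"draft")
print(client.read_file("report.txt", 5))
```

Because every flat file request names the UFID and offset explicitly, the server modules need no per-client memory between calls.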

What is NFS? First commercially successful network file system: Developed by Sun Microsystems for their diskless workstations. Designed for robustness and “adequate performance”. Sun published all protocol specifications. Many, many implementations.

DISTRIBUTED FILE SYSTEMS SUN Network File System OVERVIEW:   Runs on SUNOS - NFS is both an implementation and a specification of how to access remote files. It's both a definition and a specific instance. The goal: to share a file system in a transparent way. Uses client-server model ( for NFS, a node can be both simultaneously.) Can act between any two nodes ( no dedicated server. ) Mount makes a server file-system visible from a client.

Highlights NFS is stateless: All client requests must be self-contained. The virtual filesystem interface: VFS operations, VNODE operations. Performance issues: Impact of tuning on NFS performance.

Objectives (I) Machine and Operating System Independence Could be implemented on low-end machines of the mid-80’s Fast Crash Recovery Major reason behind stateless design Transparent Access Remote files should be accessed in exactly the same way as local files

Objectives (II) UNIX semantics should be maintained on client Best way to achieve transparent access “Reasonable” performance Robustness and preservation of UNIX semantics were much more important

Basic design Three important parts The protocol The server side The client side

The protocol (I) Uses the Sun RPC mechanism and Sun eXternal Data Representation (XDR) standard Defined as a set of remote procedures Protocol is stateless Each procedure call contains all the information necessary to complete the call

Advantages of statelessness Crash recovery is very easy: When a server crashes, client just resends request until it gets an answer from the rebooted server Client cannot tell difference between a server that has crashed and recovered and a slow server Client can always repeat any request
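
The recovery rule can be sketched as a plain retry loop. FlakyServer is a hypothetical stand-in that drops the first few requests, as if it were rebooting; a real client would also wait between attempts, which is omitted here.

```python
class FlakyServer:
    """Pretends to be down (or slow) for the first `failures` requests."""

    def __init__(self, data, failures):
        self.data = data              # name -> bytes
        self.failures = failures

    def read(self, name, offset, count):
        if self.failures > 0:
            self.failures -= 1
            raise TimeoutError("no reply")
        return self.data[name][offset:offset + count]


def read_with_retry(server, name, offset, count, max_attempts=10):
    """Resend the same self-contained request until an answer arrives."""
    for attempt in range(1, max_attempts + 1):
        try:
            return server.read(name, offset, count), attempt
        except TimeoutError:
            continue                  # crashed or merely slow: client cannot tell
    raise TimeoutError("server did not answer")


server = FlakyServer({"f": b"0123456789"}, failures=3)
data, attempts = read_with_retry(server, "f", 2, 4)
print(data, attempts)
```

Repeating the request is safe because it names the file and offset itself, so the answer is the same on any attempt.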

Consequences of statelessness Reads and writes must specify their start offset. Server does not keep track of current position in the file. Users still use conventional UNIX reads and writes. Open system call translates into several lookup calls to server.
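
A sketch of these consequences, under simplifying assumptions: the server's tree is a nested dictionary, and the dictionary nodes themselves stand in for file handles. The class names are hypothetical.

```python
class LookupServer:
    """Answers one-component lookups and offset-based reads; keeps no positions."""

    def __init__(self, tree):
        self.tree = tree              # nested dicts; leaves are bytes
        self.lookups = 0

    def lookup(self, dir_node, name):
        self.lookups += 1
        return dir_node[name]

    def read(self, file_node, offset, count):
        return file_node[offset:offset + count]


class OpenFile:
    """Client-side open file: the current offset lives here, not on the server."""

    def __init__(self, server, path):
        self.server = server
        node = server.tree            # root handle (would come from mounting)
        for component in path.strip("/").split("/"):
            node = server.lookup(node, component)   # one lookup per component
        self.node = node
        self.offset = 0

    def read(self, count):
        """Conventional UNIX-style read, translated to an explicit offset."""
        data = self.server.read(self.node, self.offset, count)
        self.offset += len(data)
        return data


server = LookupServer({"usr": {"doc": {"readme": b"hello world"}}})
f = OpenFile(server, "/usr/doc/readme")
part1 = f.read(5)
part2 = f.read(6)
print(server.lookups, part1, part2)
```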

Server side (II) File handle consists of: Filesystem id identifying disk partition. I-node number identifying file within partition. Generation number changed every time i-node is reused to store a new file. Server will store: Filesystem id in filesystem superblock. I-node generation number in i-node.
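
The role of the generation number can be sketched as follows. The field layout and the StaleHandleError name are assumptions for illustration and do not reproduce the actual NFS handle format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    fsid: int          # identifies the disk partition
    inode: int         # identifies the file within the partition
    generation: int    # changes every time the i-node is reused


class StaleHandleError(Exception):
    """Raised when a handle refers to a file that no longer exists."""


class InodeTable:
    def __init__(self, fsid):
        self.fsid = fsid
        self.generation = {}   # i-node number -> current generation
        self.contents = {}     # i-node number -> bytes

    def allocate(self, inode, data):
        """Store a new file in an i-node, bumping its generation number."""
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.contents[inode] = data
        return FileHandle(self.fsid, inode, self.generation[inode])

    def read(self, handle):
        current = self.generation.get(handle.inode)
        if handle.fsid != self.fsid or current != handle.generation:
            raise StaleHandleError(handle)
        return self.contents[handle.inode]


table = InodeTable(fsid=7)
old = table.allocate(42, b"old file")
new = table.allocate(42, b"new file")   # i-node 42 reused for a new file
try:
    table.read(old)
    stale_detected = False
except StaleHandleError:
    stale_detected = True
print(stale_detected, table.read(new))
```

Without the generation number, the old handle would silently read the new file that happens to occupy the same i-node.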

Client side (I) Provides transparent interface to NFS. Mapping between remote file names and remote file addresses is done at server boot time through remote mount: Extension of UNIX mounts. Specified in a mount table. Makes a remote subtree appear part of a local subtree.

Remote mount [Figure: client tree (/, usr, bin) with a server subtree attached by rmount.] After rmount, root of server subtree can be accessed as /usr.
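
A mount table can be sketched as a prefix lookup from local paths to (server, remote path) pairs. The MountTable class, the server name server1 and the exported directory /export/usr are all hypothetical.

```python
class MountTable:
    """Maps local mount points to remote subtrees (illustrative only)."""

    def __init__(self):
        self._mounts = {}   # local mount point -> (server, remote directory)

    def mount(self, local_dir, server, remote_dir):
        self._mounts[local_dir.rstrip("/")] = (server, remote_dir.rstrip("/"))

    def resolve(self, path):
        """Return (server, remote path), or (None, path) for a local file."""
        # The longest matching mount point wins, so nested mounts work.
        for mount_point in sorted(self._mounts, key=len, reverse=True):
            if path == mount_point or path.startswith(mount_point + "/"):
                server, remote_dir = self._mounts[mount_point]
                rest = path[len(mount_point):]
                return server, (remote_dir + rest) or "/"
        return None, path


table = MountTable()
table.mount("/usr", "server1", "/export/usr")
remote = table.resolve("/usr/bin/ls")
local = table.resolve("/etc/passwd")
print(remote, local)
```

The user names the file as /usr/bin/ls in both cases; only the table knows that the subtree lives on another machine.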

Distributed File System: Transparent access to files stored on a remote disk. [Figure: the client sends "Read File" over the network to the server, which returns "Data".] Naming choices (always an issue): Hostname:localname: Name files explicitly. No location or migration transparency. Mounting of remote file systems: System manager mounts remote file system by giving name and local mount point. Transparent to user: all reads and writes look like local reads and writes to user, e.g. /users/sue/foo → /sue/foo on server. A single, global name space: every file in the world has unique name. Location Transparency: servers can change and files can move without involving user. [Figure: mount points coeus:/sue, kubi:/prog, kubi:/jane.]

Client side (II) Provides transparent access to NFS New virtual filesystem interface supports VFS calls, which operate on whole file system VNODE calls, which operate on individual files Treats all files in the same fashion

Client side (III) User interface is unchanged. [Figure: layered stack. UNIX system calls sit on the common VNODE/VFS interface, which leads to the UNIX FS (on disk), NFS (through RPC/XDR over the LAN), or other file systems.]

The Mount Protocol The mount protocol provides four basic services that clients need before they can use NFS: It allows the client to obtain a list of the directory hierarchies (i.e. the file systems) that the client can access through NFS. It accepts full path names that allow the client to identify a particular directory hierarchy. It authenticates each client’s request and validates the client’s permission to access the requested hierarchy. It returns a file handle for the root directory of the hierarchy a client specifies. The client uses the root handle obtained from the mount protocol when making NFS calls.

DISTRIBUTED FILE SYSTEMS SUN Network File System NFS ARCHITECTURE: [Figure not reproduced: the paths taken by local and remote accesses through the NFS layers.]

Virtual File System (VFS) VFS: Virtual abstraction similar to local file system. Instead of “inodes” has “vnodes”. Compatible with a variety of local and remote file systems; provides an object-oriented way of implementing file systems. VFS allows the same system call interface (the API) to be used for different types of file systems. The API is to the VFS interface, rather than to any specific type of file system.

DISTRIBUTED FILE SYSTEMS SUN Network File System NFS ARCHITECTURE:   1. UNIX filesystem layer - does normal open / read / etc. commands. 2. Virtual file system ( VFS ) layer - Gives clean layer between user and filesystem. Acts as deflection point by using global vnodes. Understands the difference between local and remote names. Keeps in memory information about what should be deflected (mounted directories) and how to get to these remote directories.  3. System call interface layer - Presents sanitized validated requests in a uniform way to the VFS.
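
The deflection done by the VFS layer can be sketched with an abstract vnode type. The class names and the fake_rpc_read function are assumptions; real vnodes carry many more operations than read.

```python
from abc import ABC, abstractmethod


class Vnode(ABC):
    """Common interface the system call layer programs against."""

    @abstractmethod
    def read(self, offset, count):
        ...


class LocalVnode(Vnode):
    """Backed by the local UNIX file system (here: bytes in memory)."""

    def __init__(self, data):
        self.data = data

    def read(self, offset, count):
        return self.data[offset:offset + count]


class NfsVnode(Vnode):
    """Backed by a remote server; each read becomes an RPC."""

    def __init__(self, rpc_read, handle):
        self.rpc_read = rpc_read      # callable standing in for the RPC layer
        self.handle = handle

    def read(self, offset, count):
        return self.rpc_read(self.handle, offset, count)


def sys_read(vnode, offset, count):
    """The system call layer does not know which file system it is using."""
    return vnode.read(offset, count)


remote_store = {"h1": b"remote bytes"}


def fake_rpc_read(handle, offset, count):
    return remote_store[handle][offset:offset + count]


local_result = sys_read(LocalVnode(b"local bytes"), 0, 5)
remote_result = sys_read(NfsVnode(fake_rpc_read, "h1"), 0, 6)
print(local_result, remote_result)
```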

DISTRIBUTED FILE SYSTEMS SUN Network File System CACHES OF REMOTE DATA: The client keeps: File block cache - ( the contents of a file ). File attribute cache - ( file header info (inode in UNIX) ). The local kernel hangs on to the data after getting it the first time. On an open, the local kernel checks with the server that the cached data is still OK. Cached attributes are thrown away after a few seconds.

NFS solution (I) Stateless server does not know how many users are accessing a given file Clients do not know either Clients must Frequently send their modified blocks to the server Frequently ask the server to revalidate the blocks they have in their cache
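
Timeout-based revalidation can be sketched as a cache whose entries expire. The 3-second lifetime is an assumed value standing in for "a few seconds", and the injectable clock exists only so the example can advance time without sleeping.

```python
import time


class AttributeCache:
    """Caches file attributes and refetches them once they are too old."""

    def __init__(self, fetch_attrs, ttl_seconds=3.0, clock=time.monotonic):
        self.fetch_attrs = fetch_attrs   # callable that asks the server
        self.ttl = ttl_seconds           # assumption: entries live 3 seconds
        self.clock = clock
        self._entries = {}               # handle -> (attributes, time fetched)

    def get(self, handle):
        now = self.clock()
        entry = self._entries.get(handle)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]              # still fresh: no server traffic
        attrs = self.fetch_attrs(handle) # expired or missing: ask the server
        self._entries[handle] = (attrs, now)
        return attrs


now = [0.0]        # fake clock, advanced by hand
calls = []


def fetch(handle):
    calls.append(handle)
    return {"mtime": len(calls)}


cache = AttributeCache(fetch, ttl_seconds=3.0, clock=lambda: now[0])
cache.get("h")
cache.get("h")     # within the lifetime: served from the cache
now[0] = 5.0
latest = cache.get("h")   # entry expired: the server is asked again
print(len(calls), latest)
```

Between refreshes a client can act on out-of-date attributes, which is the source of the occasional inconsistency, and every refresh is a request the server has to answer.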

NFS Pros and Cons NFS Pros: Simple, highly portable. NFS Cons: Sometimes inconsistent! Doesn’t scale to large # clients: Must keep checking to see if caches out of date. Server becomes bottleneck due to polling traffic.

AFS: Andrew File System The Andrew File System (AFS) is a location-independent file system. AFS makes it easy for people to work together on the same files, no matter where the files are located. AFS users do not have to know which machine is storing a file. AFS is a distributed file system which makes it as easy to access files stored on a remote computer as files stored on the local disk.

Andrew File System Andrew File System (AFS, late 80’s) → DCE DFS (commercial product). Callbacks: Server records who has copy of file. On changes, server immediately tells all with old copy. No polling bandwidth (continuous checking) needed. Write through on close: Changes not propagated to server until close(). Session semantics: updates visible to other clients only after the file is closed. As a result, do not get partial writes: all or nothing! Although, for processes on local machine, updates visible immediately to other programs that have file open. In AFS, everyone who has file open sees old version. Don’t get newer versions until reopen file.

Andrew File System (cont’d) Data cached on local disk of client as well as memory. On open with a cache miss (file not on local disk): Get file from server, set up callback with server. On write followed by close: Send copy to server; tells all clients with copies to fetch new version from server on next open (using callbacks). What if server crashes? Lose all callback state! Reconstruct callback information from client: go ask everyone “who has which files cached?” AFS Pro: Relative to NFS, less server load: Disk as cache → more files can be cached locally. Callbacks → server not involved if file is read-only. For both AFS and NFS: central server is bottleneck! Availability: Server is single point of failure. Cost: server machine’s high cost relative to workstation.
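
The callback mechanism can be sketched with a toy server and clients. AfsServer, AfsClient and their method names are hypothetical, whole files are kept as byte strings, and crash recovery of the callback table is not modelled.

```python
class AfsServer:
    def __init__(self):
        self.files = {}       # name -> bytes
        self.callbacks = {}   # name -> set of clients holding a cached copy

    def fetch(self, client, name):
        # Record that this client has a copy, so it can be notified later.
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, writer, name, data):
        self.files[name] = data
        # Tell every other holder that its copy is now old.
        for client in self.callbacks.get(name, set()) - {writer}:
            client.break_callback(name)
        self.callbacks[name] = {writer}


class AfsClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}       # stands in for the local disk cache

    def open(self, name):
        if name not in self.cache:                 # cache miss: whole-file fetch
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]                    # hit: server not involved

    def write_and_close(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)        # write through on close

    def break_callback(self, name):
        self.cache.pop(name, None)                 # refetch on next open


server = AfsServer()
server.files["paper"] = b"v1"
alice, bob = AfsClient(server), AfsClient(server)
bob.open("paper")                        # bob caches v1, server registers callback
alice.write_and_close("paper", b"v2")    # server breaks bob's callback
bob_invalidated = "paper" not in bob.cache
print(bob_invalidated, bob.open("paper"))
```

Bob's repeated opens cost the server nothing until someone actually changes the file, which is the saving over NFS-style polling.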

Conclusion To allow many clients to access a server and to keep the servers isolated from client crashes, NFS uses stateless servers. NFS adopted the open-read-write-close paradigm used in UNIX, along with basic file types and file protection modes.