Other File Systems: LFS, NFS, and AFS

Goals for Today
- Discuss specific file systems, both local and remote
  - Log-structured file system (LFS)
  - Distributed file systems (DFS)
    - Network file system (NFS)
    - Andrew file system (AFS)

Log-Structured File Systems
- The trend: CPUs are faster, RAM & caches are bigger
  - So, a lot of reads do not require disk access
  - Most disk accesses are writes, so pre-fetching is not very useful
  - Worse, most writes are small: roughly 10 ms of overhead for a 50 µs write
- Example: to create a new file:
  - The i-node of the directory needs to be written
  - The directory block needs to be written
  - The i-node for the file has to be written
  - The file data itself needs to be written
  - Delaying these writes could hamper consistency
- Solution: LFS, which aims to utilize the full disk bandwidth

LFS Basic Idea
- Structure the disk as a log
- Periodically, all pending writes buffered in memory are collected into a single segment
  - The entire segment is written contiguously at the end of the log
  - A segment may contain i-nodes, directory entries, and data
  - The start of each segment has a summary
  - If segments are around 1 MB, the full disk bandwidth can be utilized
- Note: i-nodes are now scattered across the disk
  - Maintain an i-node map (entry i points to i-node i on disk)
  - Part of it is cached, reducing the delay in accessing an i-node
- This description works great for disks of infinite size
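As a rough illustration of the buffering and i-node-map bookkeeping described above, here is a minimal C sketch. It is not the Sprite LFS code: SEG_SIZE, seg_append, imap_record, and the commented-out disk_write are all illustrative names chosen for this example.

```c
#include <stdint.h>
#include <string.h>

#define SEG_SIZE   (1 << 20)                 /* ~1 MB segment, per the slide        */
#define MAX_INODES 4096

static uint64_t inode_map[MAX_INODES];       /* entry i -> disk addr of i-node i    */
static uint64_t log_end;                     /* next free address at the log tail   */

struct segment {
    uint8_t buf[SEG_SIZE];                   /* pending writes collected in memory  */
    size_t  used;
};

/* Append a block (data, directory entry, or i-node) to the open segment and
 * return its future disk address; the caller flushes when the segment fills. */
static uint64_t seg_append(struct segment *s, const void *blk, size_t len)
{
    if (s->used + len > SEG_SIZE)
        return UINT64_MAX;                   /* segment full: flush first           */
    uint64_t addr = log_end + s->used;
    memcpy(s->buf + s->used, blk, len);
    s->used += len;
    return addr;
}

/* When a buffered block is an i-node, remember where it will land. */
static void imap_record(int inum, uint64_t addr) { inode_map[inum] = addr; }

/* Flush the whole segment with one large sequential write at the log tail. */
static void seg_flush(struct segment *s)
{
    /* disk_write(log_end, s->buf, s->used);   -- one sequential I/O            */
    log_end += s->used;
    s->used = 0;
}
```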

LFS vs. UFS
[Figure: blocks written to create two 1-block files, dir1/file1 and dir2/file2, in the Unix File System and in LFS; the labeled blocks are i-nodes, directory blocks, data blocks, and the LFS i-node map, with LFS appending everything to the log.]

LFS Cleaning
- Finite disk space implies that the disk eventually fills up
- Fortunately, some segments contain stale information
  - Overwriting a file causes its i-node to point to the new blocks
  - The old blocks still occupy space
- Solution: an LFS cleaner thread compacts the log
  - Read a segment summary and check which contents (file blocks, i-nodes, etc.) are still current
  - Live contents are written into a new segment at the end of the log
  - The old segment is then marked free, and the cleaner moves forward
- The disk is a circular buffer: the writer adds content at the front, the cleaner reclaims content from the back
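A minimal sketch of one cleaner pass, continuing the structures from the sketch above. The seg_summary layout, the liveness test via the i-node map, and the commented-out read_block/mark_segment_free helpers are simplifying assumptions, not LFS's actual on-disk format.

```c
/* A block in an old segment is treated as live only if the i-node map still
 * points at it; stale copies are abandoned when the segment is marked free. */
struct seg_summary {
    int      nblocks;
    int      inums[256];        /* which i-node each block belongs to          */
    uint64_t addrs[256];        /* where each block was originally written     */
};

static int is_live(int inum, uint64_t addr)
{
    return inode_map[inum] == addr;
}

static void clean_segment(const struct seg_summary *sum, struct segment *out)
{
    for (int i = 0; i < sum->nblocks; i++) {
        if (!is_live(sum->inums[i], sum->addrs[i]))
            continue;                       /* stale: reclaimed with the segment */
        /* live: re-append at the log tail, then update the i-node map          */
        /* uint64_t a = seg_append(out, read_block(sum->addrs[i]), BLOCK_SIZE); */
        /* imap_record(sum->inums[i], a);                                       */
    }
    /* mark_segment_free(sum);   the old segment rejoins the free pool          */
    (void)out;
}
```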

Distributed File Systems
- Goal: view a distributed system as a file system
  - Storage is distributed
  - The Web tries to make the world a collection of hyperlinked documents
- Issues not common to ordinary file systems:
  - Naming transparency
  - Load balancing
  - Scalability
  - Location and network transparency
  - Fault tolerance
- We will look at some of these today

Transfer Model
- Upload/download model: the client downloads the file, works on it, and writes it back to the server
  - Simple, with good performance
- Remote access model: the file stays only on the server; the client sends commands to get work done there

Naming transparency
- Naming is a mapping from logical to physical objects
- Ideally the client interface should be transparent, i.e. not distinguish between remote and local files
  - /machine/path naming, or mounting a remote FS into the local hierarchy, is not transparent
- A transparent DFS hides the location of files in the system
- Two forms of transparency:
  - Location transparency: the path gives no hint of the file's location
    - /server1/dir1/dir2/x tells us x is on server1, but not where server1 is
  - Location independence: files can move without changing names
    - Separates the naming hierarchy from the storage-device hierarchy

File Sharing Semantics
- Sequential consistency: reads see previous writes
  - An ordering on all system calls is seen by all processors
  - Maintained in single-processor systems
  - Can be achieved in a DFS with one file server and no caching

Caching
- Keep repeatedly accessed blocks in a cache
  - Improves performance of further accesses
- How it works:
  - If a needed block is not in the cache, it is fetched and cached
  - Accesses are performed on the local copy
  - One master copy of the file lives on the server; other copies are distributed in the DFS
  - Cache consistency problem: how to keep a cached copy consistent with the master copy
- Where to cache?
  - Disk: more reliable, and data is present locally on recovery
  - Memory: works for diskless workstations, quicker data access; servers maintain their cache in memory

File Sharing Semantics
- Other approaches:
  - Write-through caches: immediately propagate changes to cached files to the server
    - Reliable but poor performance
  - Delayed write: writes are not propagated immediately, but later, e.g. on file close
    - Session semantics (AFS): write the file back on close
    - Alternative (NFS): scan the cache periodically and flush modified blocks
    - Better performance but poorer reliability
  - File locking: the upload/download model locks a downloaded file
    - Other processes wait for the file lock to be released
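To make the write-through vs. delayed-write distinction concrete, here is a minimal sketch of a client-side block cache. The send_to_server placeholder, the 4 KB block size, and the flush_on_close helper are illustrative assumptions, not any particular DFS's API.

```c
#include <stdbool.h>
#include <string.h>

#define BLOCK 4096

struct cblock { char data[BLOCK]; bool dirty; };

/* Placeholder for the RPC that ships a block back to the file server. */
static void send_to_server(int blkno, const struct cblock *b) { (void)blkno; (void)b; }

/* Write-through: every write goes to the server immediately (reliable, slow). */
static void write_through(int blkno, struct cblock *b, const char *src)
{
    memcpy(b->data, src, BLOCK);
    send_to_server(blkno, b);
}

/* Delayed write: mark the block dirty now, flush later (on close or a timer). */
static void write_delayed(struct cblock *b, const char *src)
{
    memcpy(b->data, src, BLOCK);
    b->dirty = true;
}

/* Session-style flush: push all dirty blocks back when the file is closed. */
static void flush_on_close(struct cblock cache[], int n)
{
    for (int i = 0; i < n; i++)
        if (cache[i].dirty) { send_to_server(i, &cache[i]); cache[i].dirty = false; }
}
```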

Network File System (NFS)
- Developed by Sun Microsystems in 1984
  - Used to join file systems on multiple computers into one logical whole
  - Commonly used today with UNIX systems
- Assumptions:
  - Allows an arbitrary collection of users to share a file system
  - Clients and servers might be on different LANs
  - Machines can be clients and servers at the same time
- Architecture:
  - A server exports one or more of its directories to remote clients
  - Clients access exported directories by mounting them
  - The contents are then accessed as if they were local

Example

NFS Mount Protocol
- The client sends a path name to the server with a request to mount
  - The client is not required to specify where to mount it
- If the path is legal and exported, the server returns a file handle
  - Contains the FS type, disk, i-node number of the directory, and security info
  - Subsequent accesses from the client use the file handle
- Mounting can happen either at boot or via automount
  - With automount, directories are not mounted during boot
  - The OS sends a message to servers on the first remote file access
  - Automount is helpful since a remote directory might not be used at all
- Mounting only affects the client's view!
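A minimal sketch of the opaque handle the mount protocol hands back, as described above. The struct names and field layout are illustrative for this lecture, not the actual NFS wire format.

```c
#include <stdint.h>

struct file_handle {
    uint32_t fs_type;      /* which local file system type on the server      */
    uint32_t disk_id;      /* which disk / exported volume                    */
    uint64_t inode_no;     /* i-node number of the exported directory         */
    uint64_t generation;   /* security / staleness check                      */
};

/* The client asks to mount an exported path; the server validates it against
 * its export list and returns a handle the client uses in all later RPCs.   */
struct mount_reply {
    int                status;   /* 0 on success                              */
    struct file_handle root;     /* handle for the exported directory         */
};
```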

NFS Protocol
- Supports directory and file access via remote procedure calls (RPCs)
- All UNIX system calls are supported other than open & close
  - Open and close are intentionally not supported
  - For a read, the client sends a lookup message to the server
  - The server looks up the file and returns a handle
  - Unlike open, lookup does not copy info into internal system tables
  - Subsequently, each read carries the file handle, offset, and number of bytes
- Each message is self-contained
  - Pro: the server is stateless, i.e. it keeps no state about open files
  - Cons: locking is difficult; no concurrency control
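Continuing the sketch, a self-contained read request might look like the following, reusing struct file_handle from the mount sketch above. The field names and the maximum transfer size are assumptions for illustration.

```c
/* Every read carries the handle, offset, and count, so the server needs no
 * per-open state between calls -- this is what "stateless" means here.      */
struct read_args {
    struct file_handle fh;       /* which file (obtained from a prior lookup) */
    uint64_t           offset;   /* where to start reading                    */
    uint32_t           count;    /* how many bytes                            */
};

struct read_reply {
    int      status;
    uint32_t bytes_returned;
    uint8_t  data[8192];         /* illustrative maximum transfer size        */
};
```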

NFS Implementation
- Three main layers:
  - System call layer: handles calls like open, read, and close
  - Virtual file system (VFS) layer:
    - Maintains a table with one entry (v-node) for each open file
    - A v-node indicates whether the file is local or remote
    - If remote, it has enough info to access the file
    - For local files, the FS and i-node are recorded
  - NFS service layer: this lowest layer implements the NFS protocol
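A minimal sketch of such a v-node table entry, assuming the file_handle type from the mount sketch. The vnode_kind tag, the rnode struct, and the union layout are illustrative, not the actual kernel data structures.

```c
enum vnode_kind { VNODE_LOCAL, VNODE_REMOTE };

struct rnode {                       /* remote node: enough to reach the file */
    struct file_handle fh;
};

struct vnode {
    enum vnode_kind kind;            /* local or remote?                      */
    union {
        struct { int fs_id; uint64_t inode_no; } local;  /* local FS + i-node */
        struct rnode remote;                             /* NFS file handle   */
    } u;
};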

NFS Layer Structure

How does NFS work?
- Mount:
  - The sysadmin calls the mount program with the remote and local directories
  - The mount program parses out the name of the NFS server
  - It contacts the server, asking for a file handle for the remote directory
  - If the directory exists and is exported for remote mounting, the server returns a handle
  - The client kernel constructs a v-node for the remote directory
  - It asks the NFS client code to construct an r-node for the file handle
- Open:
  - The kernel realizes the file is on a remotely mounted directory
  - It finds the r-node in the v-node for the directory
  - The NFS client code then opens the file, enters an r-node for the file in the VFS, and returns a file descriptor for the remote node

Cache coherency
- Clients cache file attributes and data
  - If two clients cache the same data, cache coherency is lost
- Solutions:
  - Each cache block has a timer (3 s for data, 30 s for directories)
    - The entry is discarded when the timer expires
  - On open of a cached file, its last-modify time on the server is checked
    - If the cached copy is old, it is discarded
  - Every 30 s, a cache timer expires and all dirty blocks are written back to the server
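A minimal sketch of the timer-and-mtime checks above. The 3 s and 30 s constants come from the slide; the struct and helper names are illustrative.

```c
#include <stdbool.h>
#include <time.h>

#define DATA_TTL 3      /* seconds a cached data block is trusted      */
#define DIR_TTL  30     /* seconds a cached directory entry is trusted */

struct cache_entry {
    time_t cached_at;       /* when this entry was fetched                     */
    time_t server_mtime;    /* server's last-modify time seen at fetch         */
    bool   is_dir;
};

/* Reuse the cached entry only while its timer has not expired. */
static bool entry_fresh(const struct cache_entry *e, time_t now)
{
    time_t ttl = e->is_dir ? DIR_TTL : DATA_TTL;
    return (now - e->cached_at) < ttl;
}

/* On open, recheck the server's modify time; a stale copy is discarded. */
static bool still_valid_on_open(const struct cache_entry *e, time_t mtime_on_server)
{
    return e->server_mtime == mtime_on_server;
}
```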

Andrew File System (AFS)
- Named after Andrew Carnegie and Andrew Mellon
- Transarc Corp. and then IBM took over development of AFS
- In 2000 IBM made OpenAFS available as open source
- Features:
  - Uniform name space
  - Location-independent file sharing
  - Client-side caching with cache consistency
  - Secure authentication via Kerberos
  - Server-side caching in the form of replicas
  - High availability through automatic switchover of replicas
  - Scalability to span 5000 workstations

AFS Overview
- Based on the upload/download model
  - Clients download and cache files
  - The server keeps track of clients that cache each file
  - Clients upload files at the end of a session
- Whole-file caching is the central idea behind AFS
  - Later amended to block operations
  - Simple and effective
- AFS servers are stateful
  - They keep track of clients that have cached files
  - They recall files that have been modified

AFS Details
- Has dedicated server machines
- Clients have a partitioned name space: a local name space and a shared name space
  - A cluster of dedicated servers (Vice) presents the shared name space
  - Clients run the Virtue protocol to communicate with Vice
- Clients and servers are grouped into clusters
  - Clusters are connected through the WAN
- Other issues: scalability, client mobility, security, protection, heterogeneity

AFS: Shared Name Space
- AFS's storage is arranged in volumes
  - Usually associated with the files of a particular client
- An AFS directory entry maps Vice files/directories to a 96-bit fid
  - Volume number
  - Vnode number: an index into the i-node array of a volume
  - Uniquifier: allows reuse of vnode numbers
- Fids are location transparent
  - File movements do not invalidate fids
  - Location information is kept in a volume-location database
  - Volumes are migrated to balance available disk space and utilization
  - Volume movement is atomic; the operation is aborted on a server crash
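A minimal sketch of the 96-bit fid described above, with a stubbed volume-location lookup. The struct names and the vldb_lookup routine are illustrative for this lecture, not OpenAFS's definitions.

```c
#include <stdint.h>

struct afs_fid {
    uint32_t volume;       /* which volume the object lives in                */
    uint32_t vnode;        /* index into that volume's i-node array           */
    uint32_t uniquifier;   /* lets vnode slots be reused without ambiguity    */
};

struct server_addr { uint32_t ip; };

/* Fids carry no server address: to reach the object, the volume is looked up
 * in the volume-location database, so moving a volume never invalidates fids. */
static struct server_addr vldb_lookup(uint32_t volume)
{
    (void)volume;
    return (struct server_addr){ .ip = 0 };   /* stub: real lookup queries the VLDB */
}
```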

AFS: Operations and Consistency
- AFS caches entire files from servers
  - The client interacts with servers only during open and close
- The OS on the client intercepts calls and passes them to Venus
  - Venus is a client process that caches files from servers
  - Venus contacts Vice only on open and close
    - It does not contact Vice if the file is already in the cache and has not been invalidated
  - Reads and writes bypass Venus
- This works due to callbacks:
  - The server updates its state to record caching
  - The server notifies the client before allowing another client to modify the file
  - Clients lose their callback when someone writes the file
- Venus also caches directories and symbolic links for path translation
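A minimal sketch of the server-side callback bookkeeping described above: the server remembers which clients cache a file and breaks their callbacks before letting another client modify it. The structures, the notify_break RPC placeholder, and the size limit are assumptions, not the AFS implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CLIENTS 64

struct callback_list {
    uint32_t client_ids[MAX_CLIENTS];
    bool     has_callback[MAX_CLIENTS];
    int      n;
};

/* Placeholder for the RPC telling a client its cached copy is now invalid. */
static void notify_break(uint32_t client_id) { (void)client_id; }

/* Record that a client has fetched and cached this file. */
static void grant_callback(struct callback_list *cb, uint32_t client)
{
    cb->client_ids[cb->n]   = client;
    cb->has_callback[cb->n] = true;
    cb->n++;
}

/* Before a write from `writer` is accepted, every other caching client is
 * notified that it has lost its callback on the file. */
static void break_callbacks(struct callback_list *cb, uint32_t writer)
{
    for (int i = 0; i < cb->n; i++) {
        if (cb->client_ids[i] != writer && cb->has_callback[i]) {
            notify_break(cb->client_ids[i]);
            cb->has_callback[i] = false;
        }
    }
}
```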

AFS Implementation
- The client cache is a local directory on the UNIX FS
  - Venus and server processes access files directly by UNIX i-node
- Venus has two caches, one for status and one for data
  - Uses LRU to keep them bounded in size

Summary
- LFS: a local file system; optimizes writes
- NFS: a simple distributed file system protocol; no open/close; stateless server; has problems with cache consistency and its locking protocol
- AFS: a more complicated distributed file system protocol; stateful server; session semantics (consistency on close)

Enjoy Spring Break!!!

Storage Area Networks (SANs)
- A new generation of architectures for managing storage in massive data centers
  - For example, Google is said to have 50,000-200,000 computers in various centers
  - Amazon is reaching a similar scale
- A SAN system is a collection of file systems with tools to help humans administer the system

Examples of SAN issues
- Where should a file be stored?
  - Many of these systems have an indirection mechanism so that a file can move from volume to volume
  - This allows files to migrate, e.g. from a slow server to a fast one, or from long-term storage onto an active disk system
- Eco-computing: systems that seek to minimize energy use in big data centers

Examples of SAN issues
- Disk-to-disk backup
  - Might want to do very fast automated backups
  - Ideally, this is supported while the disk is actively in use
  - Easiest if the two disks are next to each other
  - Challenge: back up an entire data center in New York at a site in Kentucky (US Dept. of Treasury e-Cavern)

File System Reliability
- Two considerations: backups and consistency
- Why back up?
  - Recover from disaster
  - Recover from stupidity
- Where to back up? Tertiary storage
  - Tape: holds 10s or 100s of GBs, costs pennies per GB
  - Sequential access, so high random-access time
- Backup takes time and space

Backup Issues
- Should the entire FS be backed up?
  - Binaries and special I/O files are usually not backed up
  - Do not back up files unmodified since the last backup
    - Incremental dumps: a complete dump per month, modified files daily
  - Compress data before writing to tape
- How to back up an active FS?
  - Not acceptable to take the system offline during backup hours
- Security of backup media

Backup Strategies
- Physical dump:
  - Start from block 0 of the disk, write all blocks in order, stop after the last one
  - Pros: simple to implement, fast
  - Cons: cannot skip directories, do incremental dumps, or restore individual files
  - There is no point dumping unused blocks, but avoiding them adds overhead
  - How should bad blocks be handled?
- Logical dump:
  - Start at a directory; dump all directories and files changed since a base date
  - The base date could be that of the last incremental dump, the last full dump, etc.
  - Also dump all directories (even unmodified ones) on the path to a modified file

Logical Dumps
- Why dump unmodified directories?
  - To restore files onto a fresh FS
  - To incrementally recover a single file, even one on a path that has not changed

A Dumping Algorithm
1. Mark all directories and all modified files
2. Unmark directories that contain no modified files (directly or below)
3. Dump the marked directories
4. Dump the modified files
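A minimal sketch of the four phases above, run over an in-memory directory tree. The node layout is illustrative, the first two phases are folded into one recursive pass, and the dump calls are just printfs standing in for writing to tape.

```c
#include <stdbool.h>
#include <stdio.h>

struct node {
    const char  *name;
    bool         is_dir;
    bool         modified;      /* for files: changed since the base date     */
    bool         marked;
    struct node *children;
    struct node *next;          /* sibling list                               */
};

/* Phases 1+2 combined: mark modified files, and keep a directory marked only
 * if something beneath it stays marked. Returns whether this subtree is marked. */
static bool mark_pass(struct node *n)
{
    if (!n->is_dir) {
        n->marked = n->modified;
        return n->marked;
    }
    bool any = false;
    for (struct node *c = n->children; c; c = c->next)
        any |= mark_pass(c);
    n->marked = any;            /* unmarked if no modified file beneath it     */
    return any;
}

/* Phase 3 (dirs = true) and phase 4 (dirs = false): emit marked nodes. */
static void dump_pass(struct node *n, bool dirs)
{
    if (n->marked && n->is_dir == dirs)
        printf("dump %s\n", n->name);
    for (struct node *c = n->children; c; c = c->next)
        dump_pass(c, dirs);
}
```

A driver would call mark_pass(root), then dump_pass(root, true) to dump directories followed by dump_pass(root, false) for the modified files, matching the order of the four steps above.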

Logical Dumping Issues
- Reconstructing the free block list on restore
- Maintaining consistency across symbolic links
- UNIX files with holes
- Special files, e.g. named pipes, should never be dumped