Network File Systems II Frangipani: A Scalable Distributed File System A Low-bandwidth Network File System

Why Network File Systems? Scalability –support more users and data –handle server failure gracefully Improved accessibility –allow more users access –extend conditions under which access is feasible

File System Requirements Coherence: consistent, predictable file state Efficiency: timely reads and writes Security: provide access control Recoverability: allow backup of file system

Frangipani and LBFS Frangipani file system: transparent scalability –easy administration at any scale –takes advantage of parallelism for good performance Low Bandwidth File System (LBFS): reduce bandwidth to increase performance –takes advantage of duplicate file information –uses caching and compression to limit data volume

Features of Frangipani Petal: shared virtual disk Frangipani: provides naming and structure for Petal Lock system: distributed across servers Leases: manage connections with lower state requirements Backups: generated from Petal snapshots using the recovery process

An Example Configuration

The Petal Virtual Disk Storage read/written in blocks Sparse address space: 2^64 bytes Physical storage allotted only on write Allows replication for high availability Read-only snapshot feature
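The allocate-on-write behavior can be sketched as a block map that stores only the blocks that have been written. This is an illustrative model, not Petal's implementation: the class and method names are hypothetical, the 64 KB block size is an assumption, and for simplicity a write must fit inside one block.

```python
# Sketch (not Petal's actual code) of a sparse virtual disk that
# allocates physical storage only on first write, as Petal does
# across its 2^64-byte address space.
class SparseVirtualDisk:
    BLOCK_SIZE = 64 * 1024  # illustrative physical block size

    def __init__(self):
        # virtual block number -> bytes; unwritten blocks use no storage
        self.blocks = {}

    def write(self, offset, data):
        # Simplification: the write must fit inside a single block.
        bno = offset // self.BLOCK_SIZE
        block = bytearray(self.blocks.get(bno, bytes(self.BLOCK_SIZE)))
        start = offset % self.BLOCK_SIZE
        block[start:start + len(data)] = data
        self.blocks[bno] = bytes(block)  # physical storage allotted here

    def read(self, offset, length):
        bno = offset // self.BLOCK_SIZE
        block = self.blocks.get(bno, bytes(self.BLOCK_SIZE))  # holes read as zeros
        start = offset % self.BLOCK_SIZE
        return block[start:start + length]
```

Reading an address that was never written simply returns zeros, so the huge sparse address space costs nothing until data actually lands in it.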

Frangipani Disk Layout Region 1: Disk configuration info (1 TB) Region 2: Log space (1 TB), divided into 256 individual server logs Region 3: Allocation bitmaps (3 TB), chunks owned by individual servers

More Frangipani Disk Layout Region 4: Inodes (1 TB), 512-byte inodes Region 5: Small data blocks (128 TB), 2^35 blocks at 4 KB each Region 6: Large data blocks, 1 TB per block

Frangipani Server Logs Bounded: 128 KB, split across physical disks Circular buffer scheme: 25% reclaimed when full Uses sequence numbers to mark wrap point 1000 to 1600 operations can be held in the log (entry size 80 to 128 bytes)
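The circular-buffer scheme above can be sketched as a bounded log that reclaims the oldest quarter of its entries when it fills, with monotonically increasing sequence numbers marking the wrap point. The sizes and names here are illustrative, not Frangipani's actual on-disk format.

```python
# Illustrative sketch of Frangipani's bounded circular log: when the
# log fills, the oldest 25% of entries are reclaimed (after their
# updates have reached disk), and sequence numbers let recovery find
# the logical start of the log after a wrap.
class CircularLog:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.entries = []       # (sequence number, record)
        self.next_seq = 0

    def append(self, record):
        if len(self.entries) == self.capacity:
            # Reclaim the oldest quarter of the log.
            self.entries = self.entries[self.capacity // 4:]
        self.entries.append((self.next_seq, record))
        self.next_seq += 1

    def wrap_point(self):
        # The entry with the smallest sequence number marks the
        # logical start of the (wrapped) log.
        return min(self.entries)[0] if self.entries else None
```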

Server Logging Write-ahead redo policy File metadata and physical file data updated on disk after log write A Unix update daemon handles disk writes every 30 seconds

Lock Service Many reader/single writer “sticky” locks Asynchronous communication Lamport’s Paxos algorithm replicates infrequently-changed data Heartbeat messages determine liveness

Locking: Avoiding Contention Single lockable data structure per disk sector eliminates false sharing Each file, directory, or symlink and its inode treated as a single lockable segment Multiple locks are acquired in a global order to avoid deadlock
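Deadlock avoidance by global ordering can be sketched as follows: an operation determines every lock it will need up front, then acquires them in one fixed order. The lock table and identifiers below are hypothetical stand-ins for Frangipani's per-segment locks.

```python
import threading

# Sketch of deadlock avoidance via a global lock order: all needed
# locks are known before any is taken, and they are always acquired
# in sorted order, so no cycle of waiting servers can form.
locks = {i: threading.Lock() for i in range(4)}  # hypothetical lock table

def with_locks(lock_ids, operation):
    ordered = sorted(set(lock_ids))  # the global order prevents cycles
    for i in ordered:
        locks[i].acquire()
    try:
        return operation()
    finally:
        for i in reversed(ordered):
            locks[i].release()
```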

Crash Recovery Detection of server crash based on lapsed leases, no network response –A recovery daemon takes ownership of the failed server's log and locks –Metadata sequence numbers prevent replaying already-applied updates No high-level semantic guarantee to users! Petal snapshot can be used for entire system recovery

Performance Benchmarks

Frangipani: Conclusions Frangipani meets the goals set for it: –coherent access –easy administration –scalable performance (limit is network itself) –good failure recovery Deployment at larger scale will be the true test of Frangipani

Introduction to LBFS Designed for efficient remote file access over low bandwidth networks –Exploits similarities between files and file versions –Client maintains a large cache of working files –Compression further reduces data volume –Uses NFS protocol for access control and access to existing file systems

Why Do We Need LBFS? Typical network file systems designed for 10 Mbit/sec or better bandwidth Problems using a FS over a WAN: –interactive programs that freeze –batch commands that run several times slower –less aggressive applications are starved –some applications may not run at all!

Why LBFS? (continued) Downloading and editing files locally can lead to version conflicts Upstream bandwidth is still limited with broadband LBFS eliminates these problems while still preserving consistency

LBFS File Chunk Scheme Server and client keep index of hashed chunks –Server index has chunk hashes for entire FS –Client index has chunk hashes for working files In order to exploit commonality, files need to be broken into chunks

Chunk Creation Algorithm Need to handle shifting offsets while keeping the chunk index manageable –Examine every overlapping 48-byte region of the file –With probability 2^-13, consider a region to be a breakpoint, or file chunk end marker

Rabin Fingerprints Rabin fingerprints help find breakpoints –Polynomial representation of data modulo an irreducible polynomial –When the low 13 bits of a region's fingerprint equal a certain number, that region is selected –Given random data, the expected chunk size is 2^13 = 8192 bytes = 8 KB, plus the 48-byte breakpoint region
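The breakpoint test can be sketched as follows. LBFS uses a true Rabin fingerprint (the data treated as a polynomial modulo a fixed irreducible polynomial); the simple polynomial rolling hash below is only a stand-in for it, and the 13-bit magic value is arbitrary, not LBFS's actual choice.

```python
# Sketch of the LBFS breakpoint test: a 48-byte region is a breakpoint
# when the low 13 bits of its fingerprint equal a magic value, so for
# random data each region qualifies with probability 2^-13. The hash
# below is a stand-in for a real Rabin fingerprint.
WINDOW = 48
MASK = (1 << 13) - 1
MAGIC = 0x078B  # arbitrary 13-bit value; LBFS's actual choice may differ

def region_hash(region):
    h = 0
    for b in region:
        h = (h * 257 + b) % (1 << 61)  # simple polynomial rolling hash
    return h

def breakpoints(data):
    """Offsets just past each 48-byte region selected as a breakpoint."""
    return [i + WINDOW for i in range(len(data) - WINDOW + 1)
            if region_hash(data[i:i + WINDOW]) & MASK == MAGIC]
```

Because the test looks only at the 48 bytes of each region, inserting text early in a file shifts later regions but leaves their contents, and therefore the breakpoints after the insertion, unchanged.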

File Revisions With Breakpoints (figure) a. Original file b. Text insertion c. Insert that includes a breakpoint d. Elimination of a breakpoint

Breakpoint Pathological Cases Data is usually not random! Worst case scenarios: –All 48-byte regions are breakpoints: the chunk index grows as large as the file itself –No 48-byte regions are breakpoints: large chunks take extra time and memory for RPC Solution: define bounds: –min chunk size = 2 KB –max chunk size = 64 KB
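Enforcing those bounds can be sketched as follows: breakpoints that would produce a chunk under 2 KB are ignored, and a chunk is cut unconditionally at 64 KB. The `is_breakpoint` predicate is a hypothetical stand-in for the Rabin-fingerprint test.

```python
# Sketch of chunking with LBFS's 2 KB / 64 KB bounds. `is_breakpoint`
# stands in for the fingerprint test on the 48-byte region ending at
# position `pos`.
MIN_CHUNK = 2 * 1024
MAX_CHUNK = 64 * 1024

def chunk_lengths(data, is_breakpoint):
    lengths = []
    start = 0
    for pos in range(len(data)):
        size = pos - start + 1
        # Cut at a breakpoint only once the chunk is big enough,
        # and cut unconditionally once it reaches the maximum.
        if size >= MAX_CHUNK or (size >= MIN_CHUNK and is_breakpoint(data, pos)):
            lengths.append(size)
            start = pos + 1
    if start < len(data):
        lengths.append(len(data) - start)  # trailing partial chunk
    return lengths
```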

The Chunk Database Each chunk indexed by the first 64 bits of its SHA-1 hash Each key maps to a (file, offset, count) tuple, which must be updated when the chunk changes LBFS always recomputes the hash value before use –hash collisions are detected –bad database data costs only performance, never correctness
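The verify-before-use discipline can be sketched as follows. The file store is simulated with an in-memory dict, and the class and method names are assumptions, but the key idea matches the slide: entries are keyed by the first 64 bits of SHA-1, and the hash is recomputed from the actual file data before an entry is trusted.

```python
import hashlib

# Sketch of the chunk database scheme: keys are the first 64 bits of
# each chunk's SHA-1 hash, values locate the chunk in a file, and the
# hash is recomputed from the real data on every lookup, so stale or
# colliding entries cost only performance, never correctness.
files = {}  # hypothetical file store: name -> bytes

def chunk_key(data):
    return hashlib.sha1(data).digest()[:8]  # first 64 bits of SHA-1

class ChunkDB:
    def __init__(self):
        self.index = {}  # key -> (file name, offset, count)

    def add(self, name, offset, count):
        data = files[name][offset:offset + count]
        self.index[chunk_key(data)] = (name, offset, count)

    def lookup(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None
        name, offset, count = entry
        data = files[name][offset:offset + count]
        if chunk_key(data) != key:  # stale or colliding entry: ignore it
            return None
        return data
```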

Benefits Provided by NFS 3 NFSv3 identifies files by opaque handles that persist across renames Handles access control for LBFS Allows LBFS to use the NFS protocol to access existing file systems Disadvantage: the i-number is not changed when a file is overwritten, so an extra copy is required

LBFS Protocol Enhancements Leases save permission checks and data validation for recently-accessed files Uses RPC, but with aggressive pipelining All traffic is compressed with gzip

Maintaining File Consistency Close-to-open consistency Client needs whole-file cache Multiple processes on a single client are allowed write access to same file simultaneously –LBFS writes back to file system on each close –Last close overwrites previous changes
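Close-to-open consistency with a whole-file cache can be sketched as follows: reads and writes go to the client's cache, and the file reaches the server only on close, so a later open sees everything written before the last close, and the last close wins. All names here are illustrative.

```python
# Sketch of close-to-open consistency with whole-file client caching.
class Server:
    def __init__(self):
        self.files = {}

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, name):
        # Fetch the whole file into the cache on open.
        self.cache[name] = self.server.files.get(name, b"")

    def write(self, name, data):
        self.cache[name] = data  # stays local until close

    def close(self, name):
        # Write back on close; the last close overwrites earlier ones.
        self.server.files[name] = self.cache[name]
```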

Profile of a Read Request

Profile of a Write Request

Security: One Concern Through systematic use of the CONDWRITE RPC call, it is possible to determine whether a particular hashed chunk exists in the file system: response-time variations give it away

LBFS Server Implementation LBFS can run on top of another FS –server pretends to be an NFS client Server creates a .lbfs.trash dir at the root of every exported system –temporary files are kept indefinitely; a random file is garbage-collected when the directory fills

LBFS Client Implementation Client uses the xfs device driver –passes messages through a device node in /dev –xfs tells LBFS when to transfer file contents to/from the server –LBFS fetches files into the client cache and notifies the xfs driver of bindings between cache contents and open files

LBFS Performance Testing LBFS consumed far less bandwidth and allowed better application performance under test conditions –Workloads tested were typical applications of MS Word, gcc, and ed –CIFS, NFS, and AFS were tested (based on workload) for comparison –Also tested a “Leases and Gzip” only version

LBFS: Conclusions In low-bandwidth networks, LBFS outperforms the traditional file systems tested –similar consistency guarantees –implemented as a transparent layer on top of an existing file system –public key cryptography provides security –client caching distributes load and reduces network dependency

Last Word: Frangipani & LBFS Both Frangipani and LBFS meet file system and distributed system requirements, but target different problems: –Frangipani achieved transparent scalability without performance loss –LBFS achieved feasible performance over WANs as a transparent add-on to a traditional FS using improved protocols and load sharing