A Low-bandwidth Network File System Athicha Muthitacharoen et al. Presented by Matt Miller September 12, 2002.

Slides:



Advertisements
Similar presentations
The Zebra Striped Network Filesystem. Approach Increase throughput, reliability by striping file data across multiple servers Data from each client is.
Advertisements

The Zebra Striped Network File System Presentation by Joseph Thompson.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts Amherst Operating Systems CMPSCI 377 Lecture.
OVERVIEW LBFS MOTIVATION INTRODUCTION CHALLENGES ADVANTAGES OF LBFS HOW LBFS WORKS? RELATED WORK DESIGN SECURITY ISSUES IMPLEMENTATION SERVER IMPLEMENTATION.
Bandwidth and latency optimizations Jinyang Li w/ speculator slides from Ed Nightingale.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
File System Implementation
Other File Systems: AFS, Napster. 2 Recap NFS: –Server exposes one or more directories Client accesses them by mounting the directories –Stateless server.
CS 333 Introduction to Operating Systems Class 18 - File System Performance Jonathan Walpole Computer Science Portland State University.
Rsync: Efficiently Synchronizing Files Using Hashing By David Shao For CS 265, Spring 2004.
THE DESIGN AND IMPLEMENTATION OF A LOG-STRUCTURED FILE SYSTEM M. Rosenblum and J. K. Ousterhout University of California, Berkeley.
Network File System (NFS) in AIX System COSC513 Operation Systems Instructor: Prof. Anvari Yuan Ma SID:
THE EVOLUTION OF NFS Dave Hitz and Andy Watson Network Appliance, Inc.
File Systems (2). Readings r Silbershatz et al: 11.8.
Frangipani: A Scalable Distributed File System C. A. Thekkath, T. Mann, and E. K. Lee Systems Research Center Digital Equipment Corporation.
DESIGN AND IMPLEMENTATION OF THE SUN NETWORK FILESYSTEM R. Sandberg, D. Goldberg S. Kleinman, D. Walsh, R. Lyon Sun Microsystems.
Distributed File Systems Sarah Diesburg Operating Systems CS 3430.
Network File Systems Victoria Krafft CS /4/05.
Lecture 23 The Andrew File System. NFS Architecture client File Server Local FS RPC.
Sun NFS Distributed File System Presentation by Jeff Graham and David Larsen.
A Low-Bandwidth Network File System A. Muthitacharoen, MIT B. Chen, MIT D. Mazieres, NYU.
Network File Systems II Frangipani: A Scalable Distributed File System A Low-bandwidth Network File System.
A LOW-BANDWIDTH NETWORK FILE SYSTEM A. Muthitacharoen, MIT B. Chen, MIT D. Mazieres, New York U.
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google∗
Distributed File Systems Concepts & Overview. Goals and Criteria Goal: present to a user a coherent, efficient, and manageable system for long-term data.
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
1 The Google File System Reporter: You-Wei Zhang.
New Protocols for Remote File Synchronization Based on Erasure Codes Utku Irmak Svilen Mihaylov Torsten Suel Polytechnic University.
Spring 2000Nitin BahadurAdvanced Computer Networks A Comparison of Mechanisms for Improving TCP Performance over Wireless Links By: Hari B., Venkata P.
Networked File System CS Introduction to Operating Systems.
Distributed Systems. Interprocess Communication (IPC) Processes are either independent or cooperating – Threads provide a gray area – Cooperating processes.
Distributed File Systems
THE DESIGN AND IMPLEMENTATION OF A LOG-STRUCTURED FILE SYSTEM M. Rosenblum and J. K. Ousterhout University of California, Berkeley.
Distributed File Systems Overview  A file system is an abstract data type – an abstraction of a storage device.  A distributed file system is available.
Practical Byzantine Fault Tolerance
Introduction to DFS. Distributed File Systems A file system whose clients, servers and storage devices are dispersed among the machines of a distributed.
Serverless Network File Systems Overview by Joseph Thompson.
SPECULATIVE EXECUTION IN A DISTRIBUTED FILE SYSTEM E. B. Nightingale P. M. Chen J. Flint University of Michigan.
CODA: A HIGHLY AVAILABLE FILE SYSTEM FOR A DISTRIBUTED WORKSTATION ENVIRONMENT M. Satyanarayanan, J. J. Kistler, P. Kumar, M. E. Okasaki, E. H. Siegel,
Eduardo Gutarra Velez. Outline Distributed Filesystems Motivation Google Filesystem Architecture The Metadata Consistency Model File Mutation.
Jinyong Yoon,  Andrew File System  The Prototype  Changes for Performance  Effect of Changes for Performance  Comparison with A Remote-Open.
Computer Science Lecture 19, page 1 CS677: Distributed OS Last Class: Fault tolerance Reliable communication –One-one communication –One-many communication.
Chapter 11: File System Implementation Silberschatz, Galvin and Gagne ©2005 Operating System Concepts Chapter 11: File System Implementation Chapter.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition File System Implementation.
A Low-bandwidth Network File System Presentation by Joseph Thompson.
CS333 Intro to Operating Systems Jonathan Walpole.
Solutions for the Fourth Problem Set COSC 6360 Fall 2014.
Distributed File Systems Questions answered in this lecture: Why are distributed file systems useful? What is difficult about distributed file systems?
THE EVOLUTION OF CODA M. Satyanarayanan Carnegie-Mellon University.
Alex Chee Daniel LaBare Mike Oster John Spann Bryan Unbangluang Collaborative Document Sharing In Conjunction With.
Distributed Systems: Distributed File Systems Ghada Ahmed, PhD. Assistant Prof., Computer Science Dept. Web:
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Computer Science Lecture 19, page 1 CS677: Distributed OS Last Class: Fault tolerance Reliable communication –One-one communication –One-many communication.
Ivy: A Read/Write Peer-to- Peer File System Authors: Muthitacharoen Athicha, Robert Morris, Thomer M. Gil, and Benjie Chen Presented by Saurabh Jha 1.
Mobility Victoria Krafft CS /25/05. General Idea People and their machines move around Machines want to share data Networks and machines fail Network.
DISTRIBUTED FILE SYSTEM- ENHANCEMENT AND FURTHER DEVELOPMENT BY:- PALLAWI(10BIT0033)
File System Implementation
Web Caching? Web Caching:.
Dave Hitz and Andy Watson Network Appliance, Inc
Synchronization in Distributed File System
NFS and AFS Adapted from slides by Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift.
Kalyan Boggavarapu Lehigh University
Distributed File Systems
Distributed File Systems
Dave Hitz and Andy Watson Network Appliance, Inc
Distributed File Systems
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S
Distributed File Systems
Distributed File Systems
Presentation transcript:

A Low-bandwidth Network File System Athicha Muthitacharoen et al. Presented by Matt Miller September 12, 2002

Problem Statement Network file systems cannot be used over slower WANs due to longer delay, less available bandwidth Ad hoc solutions: Copy file to local machine, edit, then upload. Risks update conflicts. Use remote login. Potentially large interactive delays.

Overview of Proposed Solution Transfer less data over the network by exploiting inter-file similarities Follow close-to-open consistency model Efficiently divide files into “chunks” and only transfer chunks the remote machine does not already have

Key Contribution Proposes file update mechanism which avoids transferring redundant data from different versions of a file over the bottleneck resource while providing some consistency. Use hashing to exploit inter-file similarities and save bandwidth Provide an efficient scheme to delineate files while not causing massive breakpoint changes for small modifications Decrease latency delay by pipelining read and write RPCs Provide robustness despite client failures

Related Work AFS: Servers provide callbacks to clients when other clients modify a file Leases: Similar to AFS, but server is only obliged to report changes for specified amount of time rsync: Exploits inter-file similarities when copying directory trees over a network

LBFS Hashing Scheme Use SHA-1 scheme, assumes data with equal hash values are the same (extremely low collision probability) Delineate a file into chunks with hash values based on data within chunk Needs to be resistant to small changes causing completely new chunk values for the file (e.g. inserting a byte at the beginning of the file)

LBFS Hashing Scheme (2) Scan entire file while calculating Rabin fingerprints values for overlapping 48-byte windows If the last 13 bits of the fingerprint match specified value, allow window to be a breakpoint between two chunks Expected chunk size = 2 13 = 8 KB Place upper and lower bound on chunk sizes to avoid pathological cases

Consistency Maintain database correlating hash values to (file, offset, count) tuples. Do not rely on database, always compute hash values to reconstruct file. Operate only on whole files Close-to-open: When a client has written and closed a file, another client opening the same file will see the new contents

Consistency (2) Client receives lease on file Server notifies clients of changes to the file while the lease is valid When a client opens a file for which the lease has expired, it checks the file attributes for modification times If the server version is more recent, the reading protocol takes place

Reading Protocol

Writing Protocol

Implementation Client uses xfs file system, file access is done via NFS, asynchronous RPC over TCP is used for communication All RPC traffic is gzip’d Unix semantics for i-nodes leads to inefficiency and possible inconsistency during server crash

Evaluation The window size for computing the Rabin fingerprint does not seem to have much impact on data sharing Expected chunk sizes were close to expected value Some small changes do remove much commonality between revisions (e.g. renumbering all the pages of a document)

Bandwidth Utilization Tests done on native file system (NFS or CIFS), AFS, just Leases+gzip, and LBFS Tests involve editing MS Word document, recompiling new version of emacs and changing the source tree between perl versions

Bandwidth Utilization Graph

Other Performance Results Reduces application execution time significantly Execution faster than AFS and Leases+gzip as bandwidth decreases Performs better than AFS independent of RTT and similarly to Leases+gzip All perform about the same in the presence of a lossy link

Conclusion LBFS avoids transmitting redundant data to a remote machine Provides method to delineate files in way which is resistant to the propagation of a small modification changing breakpoints Performs much better than the competition

Discussion Why does AFS use more downstream bandwidth than the native file system in Figure 6? Consistency model can be violated. Would server blocking all but one client from writing be better? Tests are in very biased conditions. How about testing a mix where only k of the n files were previously on the client? Scalability is an unaddressed issue. LBFS requires server to do non-trivial computation for each client file access.

Discussion (2) Would most users accept the consistency model, or is it too weak? Could schemes to merge writes be added? Could the scheme be changed to allow block caching rather than entire files? Any more elegant solutions to the static i- number problem?

Consistency Model Problem “If multiple clients are writing the same file, then the last one to close the file will win and overwrite changes from the others.”

Figure 1

Figure 7

Figures 8 and 9

Figure 10