A LOW-BANDWIDTH NETWORK FILE SYSTEM A. Muthitacharoen, MIT B. Chen, MIT D. Mazieres, New York U.



Highlights
A file system for slow or wide-area networks
Exploits similarities between files or versions of the same file
  – Avoids sending data that can be found in the server’s file system or the client’s cache
Also uses conventional compression and caching
Requires 90% less bandwidth than traditional network file systems

Working on slow networks
Can work with local copies
  – Must then worry about update conflicts
Can use remote login
  – Only for text-based applications
Should instead use a low-bandwidth file system
  – Better than remote login
  – Must then deal with issues like big autosaves blocking the editor for the duration of the transfer

LBFS (I)
Client keeps all recently accessed files in its cache
LBFS exploits cross-file similarities to reduce data transfers between client and server
  – File server divides the files it stores into variable-size chunks
  – Indexes these chunks by their hash values

LBFS (II)
When transferring a file between the client and the server
  – LBFS identifies the chunks the receiving side already has
  – Only transmits the other chunks
Provides close-to-open consistency
  – Same as Coda (and newer versions of NFS)
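
The transfer rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `transfer` and `receiver_store` are hypothetical names, the file is assumed to be already split into chunks, the exchange is simulated in one process rather than over RPC, and the cost of sending the hashes themselves is ignored.

```python
import hashlib


def sha1(chunk: bytes) -> bytes:
    """Full SHA-1 digest of a chunk."""
    return hashlib.sha1(chunk).digest()


def transfer(chunks: list[bytes], receiver_store: dict[bytes, bytes]) -> tuple[bytes, int]:
    """Rebuild a file on the receiving side.

    `receiver_store` (hypothetical) maps chunk hashes to chunks the receiver
    already holds. Returns (reassembled file, number of payload bytes sent).
    """
    hashes = [sha1(c) for c in chunks]                         # sender describes the file by hashes
    missing = {h for h in hashes if h not in receiver_store}   # receiver's answer
    sent = 0
    for h, c in zip(hashes, chunks):
        if h in missing:
            receiver_store[h] = c      # only missing chunks cross the network
            sent += len(c)
            missing.discard(h)         # a chunk repeated in the file is sent once
    return b"".join(receiver_store[h] for h in hashes), sent


store: dict[bytes, bytes] = {}
first = transfer([b"aaa", b"bbb", b"ccc"], store)    # nothing cached yet
second = transfer([b"aaa", b"XX", b"ccc"], store)    # only the changed chunk is sent
```

On the second call the receiver already holds two of the three chunks, so only the two bytes of the changed chunk count as payload.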

Related work (I)
AFS used callbacks to reduce network traffic
Leases are callbacks with an expiration date
Coda supports slow networks and disconnected operation through optimistic replication
Bayou and OceanStore investigate conflict resolution for optimistic updates
Lee et al. have extended Coda to support operation-based updates

Related work (II)
Spring and Wetherall use large client and server caches to eliminate redundant network traffic
  – Can send the address of data already in the receiver's cache rather than the data themselves
Rsync exploits similarities between directory trees containing similar subtrees

LBFS Design
Key ideas:
  – Close-to-open consistency
  – Have a large persistent file cache at client (IDE disks are now large enough for that)
  – Exploits similarities between files (and file versions): only transmits data chunks containing new data

Identifying Similar Data Chunks
LBFS relies on the collision resistance of the SHA-1 hash function
  – Assumes no hash collisions
Central challenges are
  – Keeping the index a reasonable size
  – Dealing with shifting offsets

The Case against Fixed-Size Blocks
[Figure: file F divided into fixed-size blocks, and file F after an insertion]
The two files do not have a single block in common
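
The effect is easy to reproduce. The following Python sketch uses made-up file contents and a made-up block size of 16 bytes (both assumptions, chosen only so that every block is distinct) and counts how many fixed-size blocks survive a one-byte insertion at the start of the file.

```python
import hashlib


def fixed_blocks(data: bytes, size: int) -> list[bytes]:
    """Split data into consecutive blocks of `size` bytes (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_hashes(data: bytes, size: int) -> set[str]:
    """Set of SHA-1 hashes of the fixed-size blocks of data."""
    return {hashlib.sha1(b).hexdigest() for b in fixed_blocks(data, size)}


# Hypothetical file: 64 distinct 8-byte records, so no two blocks are equal.
original = b"".join(b"%07d\n" % i for i in range(64))
modified = b"X" + original          # a single byte inserted at the start

shared = block_hashes(original, 16) & block_hashes(modified, 16)
print(len(shared))                  # number of blocks the two versions share
```

Every block after the insertion point is shifted by one byte, so none of the old block hashes match.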

The Case against “Diffs”
“Diffs” are used by several UNIX utilities
  – Computed by comparing the contents of a file with another file
  – Very efficient
Must know which file(s) to compare to
Difficult in a file system
  – Obscure naming of editor buffer files and other temp files

Dividing Files into Chunks
LBFS
  – Only looks for non-overlapping chunks in files
  – Sets chunk boundaries based on file contents
To divide a file into chunks, LBFS
  – Examines every (overlapping) 48-byte region of the file
  – Uses Rabin’s fingerprints to select boundary regions or breakpoints

Using Rabin’s Fingerprints
A fingerprint is the polynomial representation of the data in a 48-byte region modulo an irreducible polynomial
Boundary regions have the 13 least significant bits of their fingerprint equal to an arbitrary predefined value
  – Assuming random data, expected chunk size is 2^13 = 8K
Method is reasonably fast
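
A rough Python sketch of content-defined chunking follows. It is not LBFS's code: a simple polynomial rolling hash modulo a prime stands in for a true Rabin fingerprint, and `MAGIC`, `BASE`, and `PRIME` are arbitrary assumed values. What it keeps from the slide is the 48-byte sliding window and the test on the 13 least significant bits. On random data each window matches with probability 2^-13, which is where the expected chunk size of about 2^13 bytes comes from.

```python
import random

WINDOW = 48                  # bytes per region, as on the slide
MASK = (1 << 13) - 1         # keep the 13 least significant bits
MAGIC = 0x0123               # arbitrary predefined value (assumption)
BASE = 257                   # stand-in hash parameters (assumptions)
PRIME = (1 << 61) - 1
TOP = pow(BASE, WINDOW - 1, PRIME)   # weight of the oldest byte in the window


def breakpoints(data: bytes) -> list[int]:
    """Offsets just past every 48-byte region whose hash matches MAGIC."""
    cuts, h = [], 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % PRIME   # drop the oldest byte
        h = (h * BASE + byte) % PRIME                  # shift in the new byte
        if i + 1 >= WINDOW and (h & MASK) == MAGIC:
            cuts.append(i + 1)
    return cuts


def chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined breakpoints; the end of data closes the last chunk."""
    if not data:
        return []
    cuts = breakpoints(data)
    if not cuts or cuts[-1] != len(data):
        cuts.append(len(data))
    out, start = [], 0
    for c in cuts:
        out.append(data[start:c])
        start = c
    return out


# Usage: insert a few bytes in the middle of a pseudo-random file.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(100_000))
mid = len(old) // 2
new = old[:mid] + b"inserted text" + old[mid:]
old_chunks, new_chunks = chunks(old), chunks(new)
reused = set(old_chunks) & set(new_chunks)
print(len(old_chunks), len(new_chunks), len(reused))
```

Because a breakpoint depends only on the 48 bytes before it, an insertion disturbs only the chunk it lands in and, at most, breakpoints within 48 bytes after it. All other chunks keep their contents and hashes.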

How it works
[Figure: a file X partitioned into three chunks, and the same file X after one insertion inside the middle chunk, which becomes a new chunk]
Chunk boundaries are arbitrary and identified by the content of their boundary regions

Another way to look at it (I)
Old File: Four score and seven years ago our fathers brought forth, a new country, conceived in liberty, and dedicated to the proposition that "all men are created equal."

Another way to look at it (II)
New File: Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in liberty, and dedicated to the proposition that "all men are created equal"

Another way to look at it (III)
Identify Chunks: Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in liberty, and dedicated to the proposition that "all men are created equal"

Another way to look at it (IV)
Send the modified chunk back to the server, in compressed form:
upon this continent, a new nation, conceived in liberty,

Pathological cases
Having too many chunks requires too much aggregate bandwidth
Very large chunks would be too difficult to send in a single RPC
Chunk sizes must be between 2K and 64K
  – May have to artificially insert chunk boundaries when files are full of repeated sequences
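
One plausible way to apply the 2K and 64K limits is to post-process the candidate breakpoints, as in the Python sketch below. The function name `clamp_boundaries` and the exact policy are assumptions made for illustration: candidates closer than the minimum to the previous boundary are ignored, and a boundary is forced every time a chunk would exceed the maximum.

```python
MIN_CHUNK = 2 * 1024     # 2K lower limit from the slide
MAX_CHUNK = 64 * 1024    # 64K upper limit from the slide


def clamp_boundaries(candidates: list[int], length: int,
                     min_size: int = MIN_CHUNK,
                     max_size: int = MAX_CHUNK) -> list[int]:
    """Turn candidate breakpoints into final chunk end offsets.

    `candidates` are content-defined breakpoints (offsets in 0..length);
    the end of the file always closes the last chunk.
    """
    cuts, start = [], 0
    for c in sorted(candidates) + [length]:
        # Artificially insert boundaries while the next candidate is too far away.
        while c - start > max_size:
            start += max_size
            cuts.append(start)
        # Accept the candidate if the chunk is long enough, or if it is the file end.
        if c - start >= min_size or (c == length and c > start):
            cuts.append(c)
            start = c
    return cuts


with_candidates = clamp_boundaries([1000, 3000, 3500, 90_000], 100_000)
no_candidates = clamp_boundaries([], 200_000)   # e.g. a file of repeated sequences
```

In the first call the candidates at 1000 and 3500 are dropped as too close to the previous boundary, and one boundary is forced between 3000 and 90000. In the second, boundaries are forced every 64K.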

The chunk database (I)
The chunk database
  – Indexes chunks by the first 64 bits of their SHA-1 hash
  – Maps keys to (file, offset, count) triples
How to keep this database up to date?
  – Must update it whenever a file is updated
  – Can still have problems with local updates at the server site
  – Crashes can corrupt database contents

The chunk database (II)
Best solution is to tolerate inconsistencies:
  – LBFS recomputes the hash of any data chunk before using it
  – Recomputed value is also used to detect collisions (very improbable but still possible)
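
The two chunk-database slides can be summarized by a small Python sketch. The class `ChunkDB` and its in-memory `files` dictionary are hypothetical stand-ins for the real database and file system. The key is the first 64 bits of the SHA-1 hash, the value is a (file, offset, count) triple, and every lookup recomputes the full hash so that stale entries and collisions on the 64-bit key are both rejected.

```python
import hashlib
from typing import Optional


class ChunkDB:
    """Toy chunk index: 64-bit SHA-1 prefix -> (file, offset, count)."""

    def __init__(self, files: dict[str, bytes]):
        self.files = files        # hypothetical in-memory stand-in for the file system
        self.index: dict[bytes, tuple[str, int, int]] = {}

    def add(self, name: str, offset: int, count: int) -> None:
        """Record where a chunk currently lives."""
        data = self.files[name][offset:offset + count]
        self.index[hashlib.sha1(data).digest()[:8]] = (name, offset, count)

    def lookup(self, digest: bytes) -> Optional[bytes]:
        """Return the chunk with this full SHA-1 digest, or None.

        The data is re-read and re-hashed before use, so an entry that is
        out of date (or collides on the 64-bit key) is treated as a miss.
        """
        entry = self.index.get(digest[:8])
        if entry is None:
            return None
        name, offset, count = entry
        data = self.files.get(name, b"")[offset:offset + count]
        return data if hashlib.sha1(data).digest() == digest else None


files = {"a.txt": b"hello world"}
db = ChunkDB(files)
db.add("a.txt", 0, 5)
wanted = hashlib.sha1(b"hello").digest()
before = db.lookup(wanted)            # index entry is still accurate
files["a.txt"] = b"HELLO world"       # local update the database never saw
after = db.lookup(wanted)             # stale entry is detected, not trusted
```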

Protocol
NFS with some changes:
  – Uses leases (callbacks with a limited lifetime) to implement close-to-open consistency
  – Practices aggressive pipelining of RPC calls
  – Compresses all RPC traffic

Leases
Leases are callbacks with
  – A limited lifetime (a few seconds)
  – A guarantee that the server will not accept updates during the lease lifetime without first notifying the client
Advantages:
  – No problems with lost callbacks
  – Automatically expire when server crashes

An example (I)
[Timeline figure: Alice requests a lease from the server. During the lease, Alice controls the file. When the lease expires, she must renew it.]

An example (II)
[Timeline figure: Alice got a lease and controls the file for its duration. Bob also requests a lease.]

An example (III)
When the server receives Bob's request,
  – It will try to contact Alice and break the lease
  – Alice will then flush all the blocks she had updated and invalidate the contents of her cache
  – If Alice does not answer, the server must wait until Alice's lease expires
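
The Alice and Bob scenario can be modelled with a toy lease table in Python. This follows the slides' simplified picture of a single lease holder per file. `LeaseServer`, the injected `clock`, and the `holder_responds` flag (standing in for the callback to the current holder) are all hypothetical.

```python
from typing import Callable, Optional


class LeaseServer:
    """Toy single-file lease manager following the example on the slides."""

    def __init__(self, duration: float, clock: Callable[[], float]):
        self.duration = duration              # lease lifetime in seconds
        self.clock = clock                    # returns the current time in seconds
        self.holder: Optional[str] = None     # client currently holding the lease
        self.expires = 0.0

    def request(self, client: str, holder_responds: bool = True) -> bool:
        """Grant or renew a lease; return True if `client` now holds it.

        `holder_responds` models whether the current holder answers the
        server's attempt to break the lease (flushing and invalidating).
        """
        now = self.clock()
        if self.holder in (None, client) or now >= self.expires:
            granted = True                    # free, a renewal, or already expired
        else:
            granted = holder_responds         # otherwise wait for expiry
        if granted:
            self.holder, self.expires = client, now + self.duration
        return granted


now = [0.0]                                   # fake clock for the example
server = LeaseServer(duration=5.0, clock=lambda: now[0])
alice_ok = server.request("alice")
bob_early = server.request("bob", holder_responds=False)   # Alice is unreachable
now[0] = 6.0                                  # Alice's lease has expired
bob_late = server.request("bob", holder_responds=False)
```

The unreachable-holder case shows why leases need no special crash handling: the server only has to wait out the lease lifetime.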

File Consistency
LBFS
  – Caches entire files
  – Implements close-to-open consistency
Client
  – Gets a lease the first time a file is opened for read
  – Renews expired leases by requesting file attributes
  – Will then check if cached copy is still current

Reads and writes
Use additional calls not in NFS
  – GETHASH for reads
  – MKTMPFILE and three others for writes
Server ensures atomicity of updates by writing them first into a temporary file
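
The temporary-file idea is a general one and can be illustrated locally in Python. The sketch below is not the LBFS server's commit mechanism, which the slides do not describe. It assumes the common write-then-rename idiom, and `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` so readers see the old or the new contents, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one file system.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # push the data to disk before committing
        os.replace(tmp, path)         # commit step: rename over the target
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)            # leave no partial file behind
        raise


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "file")
    atomic_write(target, b"version 1")
    atomic_write(target, b"version 2")
    with open(target, "rb") as f:
        final = f.read()
    leftovers = os.listdir(d)
```

An interrupted transfer then leaves the target untouched, because only a complete temporary file is ever renamed into place.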

Security
More of an issue than in a well-controlled LAN
Uses SFS security infrastructure
  – Servers have public keys and authenticate themselves to clients
New problem:
  – All LBFS users can check whether the file system contains a specific chunk of data
  – Requires observing subtle timing differences

Implementation Some problems with the way NFS allocates i-node numbers

Evaluation (I)
Compared upstream and downstream bandwidth of LBFS with those of
  – CIFS (Common Internet File System)
  – NFS
  – AFS
  – LBFS with leases and gzip but w/o chunking
Downstream traffic benefits most from chunking

Evaluation (II)
[Figure: bandwidth bar chart] For each workload, the first four bars show upstream bandwidth and the second four show downstream bandwidth

Conclusions
LBFS bandwidth usage is one order of magnitude less than that of conventional file systems