Distributed FS, Continued Andy Wang COP 5611 Advanced Operating Systems

Outline Replicated file systems Ficus Coda Serverless file systems

Replicated File Systems NFS provides remote access AFS provides high-quality caching Why isn’t this enough? More precisely, when isn’t this enough?

When Do You Need Replication? For write performance For reliability For availability For mobile computing For load sharing Optimistic replication increases these advantages

Some Replicated File Systems Locus Ficus Coda Rumor All optimistic: few conservative file replication systems have been built

Ficus Optimistic file replication based on peer-to-peer model Built in Unix context Meant to serve a large network of workstations Built using stackable layers

Peer-To-Peer Replication All replicas are equal No replicas are masters or servers All replicas can provide any service All replicas can propagate updates to all other replicas Client/server is the other popular model

Basic Ficus Architecture Ficus replicates at volume granularity A given volume can be replicated many times Performance limitations on scale Updates propagated as they occur On a best-effort basis Consistency achieved by periodic reconciliation

Stackable Layers in Ficus Ficus is built out of several stackable layers Exact composition depends on what generation of system you look at

Ficus Stackable Layers Diagram (diagram: a select layer and the Ficus logical layer (FLFS) stacked above Ficus physical layers (FPFS), which sit atop storage and transport layers)

Ficus Diagram (diagram: volume replicas 1, 2, and 3 stored at Site A, Site B, and Site C)

An Update Occurs (diagram: one of the volume replicas at Sites A, B, and C receives an update)

Reconciliation in Ficus Reconciliation process runs periodically on each Ficus site For each local volume replica Reconciliation strategy implies eventual consistency guarantee Frequency of reconciliation affects how long “eventually” takes

Steps in Reconciliation 1. Get information about the state of a remote replica 2. Get information about the state of the local replica 3. Compare the two sets of information 4. Change local replica to reflect remote changes
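
A minimal Python sketch of these four steps, using a single version number per file for brevity (Ficus actually uses version vectors, covered below); the Replica class and its methods are hypothetical stand-ins, not Ficus interfaces:

    class Replica:
        def __init__(self, files):
            self.files = files            # name -> (version, data)

        def state(self):
            return {name: v for name, (v, _) in self.files.items()}

        def fetch(self, name):
            return self.files[name]

        def apply(self, name, versioned_data):
            self.files[name] = versioned_data

    def reconcile(local, remote):
        remote_state = remote.state()     # step 1: remote replica's state
        local_state = local.state()       # step 2: local replica's state
        for name, rv in remote_state.items():          # step 3: compare
            if local_state.get(name, -1) < rv:
                local.apply(name, remote.fetch(name))  # step 4: pull changes

    a = Replica({"x": (2, "new")})
    b = Replica({"x": (1, "old")})
    reconcile(b, a)                       # b pulls x's newer version from a
    print(b.files["x"])                   # (2, 'new')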

Ficus Reconciliation Diagram (diagram: Site C reconciles with Site A; replicas 1, 2, and 3 at Sites A, B, and C)

Ficus Reconciliation Diagram, Cont’d (diagram: Site B reconciles with Site C; replicas 1, 2, and 3 at Sites A, B, and C)

Gossiping and Reconciliation Reconciliation benefits from the use of gossip In example just shown, an update originating at A got to B through communications between B and C So B can get the update without talking to A directly

Benefits of Gossiping Potentially less communication Shares load of sending updates Easier recovery behavior Handles disconnections nicely Handles mobile computing nicely Peer model systems get more benefit than client/server model systems

Reconciliation Topology Reconciliation in Ficus is pair-wise In the general case, which pairs of replicas should reconcile? Reconciling all pairs is unnecessary Due to gossip Want to minimize number of recons But propagate data quickly

Ficus Ring Reconciliation Topology

Adaptive Ring Reconciliation Topology

Problems in File Reconciliation Recognizing updates Recognizing update conflicts Handling conflicts Recognizing name conflicts Update/remove conflicts Garbage collection Ficus has solutions for all these problems

Recognizing Updates in Ficus Ficus keeps per-file version vectors Updates detected by version vector comparisons The data for the later version can then be propagated Ficus propagates full files

Recognizing Update Conflicts in Ficus Concurrent update can lead to update conflicts Version vectors permit detection of update conflicts Works for n-way conflicts, too
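
A sketch of the standard version-vector comparison, with vectors as dicts mapping replica id to update count; the representation is illustrative, not Ficus’ actual one. Applied pairwise, this also surfaces n-way conflicts, since every pair of conflicting versions is concurrent:

    def compare(vv_a, vv_b):
        replicas = set(vv_a) | set(vv_b)
        a_newer = any(vv_a.get(r, 0) > vv_b.get(r, 0) for r in replicas)
        b_newer = any(vv_b.get(r, 0) > vv_a.get(r, 0) for r in replicas)
        if a_newer and b_newer:
            return "conflict"       # concurrent updates: neither dominates
        if a_newer:
            return "a-dominates"    # propagate a's data to b
        if b_newer:
            return "b-dominates"
        return "equal"

    print(compare({"A": 2, "B": 1}, {"A": 1, "B": 2}))   # conflict
    print(compare({"A": 2, "B": 1}, {"A": 1, "B": 1}))   # a-dominates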

Handling Update Conflicts in Ficus Ficus uses resolver programs to handle conflicts Resolvers work on one pair of replicas of one file System attempts to deduce file type and call proper resolver If all resolvers fail, notify user Ficus also blocks access to file
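
A sketch of resolver dispatch as the slide describes it; the resolver table, paths, and file-type deduction below are all hypothetical, not Ficus’ actual mechanism:

    import os, subprocess

    # Hypothetical table of per-type resolver programs (paths illustrative)
    RESOLVERS = {".mbox": "/usr/local/lib/ficus/resolve-mailbox",
                 ".cal":  "/usr/local/lib/ficus/resolve-calendar"}

    def resolve_conflict(path_a, path_b):
        suffix = os.path.splitext(path_a)[1]    # crude file-type deduction
        resolver = RESOLVERS.get(suffix)
        if resolver is None:
            return False                        # no resolver for this type
        done = subprocess.run([resolver, path_a, path_b])
        return done.returncode == 0             # nonzero exit: resolver failed

    # If resolve_conflict() fails for every candidate resolver, notify the
    # user and block access to the file.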

Handling Directory Conflicts in Ficus Directory updates have very limited semantics So directory conflicts are easier to deal with Ficus uses special in-kernel mechanisms to automatically fix most directory conflicts

Directory Conflict Diagram (diagram: Replica 1 contains Earth, Mars, Saturn; Replica 2 contains Earth, Mars, Sedna)

How Did This Directory Get Into This State? If we could figure out what operations were performed on each side that caused each replica to enter this state, we could produce a merged version But there are two possibilities

Possibility 1 1. Earth and Mars exist 2. Create Saturn at replica 1 3. Create Sedna at replica 2 Correct result is directory containing Earth, Mars, Saturn, and Sedna

The Create/Delete Ambiguity This is an example of a general problem with replicated data Cannot be solved with per-file version vectors Requires per-entry information Ficus keeps such information Must save removed files’ entries for a while

Possibility 2 1. Earth, Mars, and Saturn exist 2. Delete Saturn at replica 2 3. Create Sedna at replica 2 Correct result is directory containing Earth, Mars, and Sedna And there are other possibilities
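
A simplified sketch of why per-entry information resolves the ambiguity: a tombstone recorded at replica 2 distinguishes “Saturn was deleted” (possibility 2) from “Saturn was never seen” (possibility 1). Real Ficus keeps richer per-entry version data; the dicts below are illustrative only:

    # Per-entry directory state; replica 2 keeps a tombstone for Saturn
    replica1 = {"Earth": "live", "Mars": "live", "Saturn": "live"}
    replica2 = {"Earth": "live", "Mars": "live",
                "Saturn": "deleted",          # tombstone kept for a while
                "Sedna": "live"}

    def merge(dir_a, dir_b):
        merged = {}
        for name in set(dir_a) | set(dir_b):
            a, b = dir_a.get(name), dir_b.get(name)
            if "deleted" in (a, b):   # a tombstone proves a deliberate remove
                merged[name] = "deleted"
            else:
                merged[name] = "live" # present on either side: a create
        return merged

    # Possibility 2: Saturn deleted, Sedna created. In possibility 1,
    # replica 2 would have no Saturn entry at all, so Saturn stays live.
    print(merge(replica1, replica2))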

Recognizing Name Conflicts in Ficus Name conflicts occur when two different files are concurrently given same name Ficus recognizes them with its per-entry directory info Then what? Handle similarly to update conflicts Add disambiguating suffixes to names
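
A tiny illustrative sketch of suffix disambiguation; the exact suffix format here is invented, not Ficus’ actual scheme:

    # Rename concurrently created entries with disambiguating suffixes
    def disambiguate(name, replica_ids):
        return [f"{name}.conflict.{rid}" for rid in replica_ids]

    print(disambiguate("report.txt", ["replica1", "replica2"]))
    # ['report.txt.conflict.replica1', 'report.txt.conflict.replica2']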

Internal Representation of Problem Directory (diagram: Replica 1 holds entries Earth, Mars, Saturn; Replica 2 holds entries Earth, Mars, Saturn, and Sedna, with per-entry state recording Saturn’s removal)

Update/Remove Conflicts Consider case where file “Saturn” has two replicas 1. Replica 1 receives an update 2. Replica 2 is removed What should happen? A matter of system semantics, basically

Ficus’ No-Lost-Updates Semantics Ficus handles this problem by defining its semantics to be no-lost-updates In other words, the update must not disappear But the remove must happen Put “Saturn” in the orphanage Requires temporarily saving removed files
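
A sketch of no-lost-updates handling, under the assumption that the updated bits are parked in an “orphanage” directory (the slide’s term); the path and helper below are hypothetical:

    import os

    ORPHANAGE = "/ficus/orphanage"      # hypothetical location

    def resolve_update_remove(path, updated_data):
        # the remove happens: the name disappears from its directory...
        if os.path.exists(path):
            os.remove(path)
        # ...but the update is not lost: park the bits in the orphanage
        os.makedirs(ORPHANAGE, exist_ok=True)
        orphan = os.path.join(ORPHANAGE, os.path.basename(path))
        with open(orphan, "wb") as f:
            f.write(updated_data)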

Removals and Hard Links Unix and Ficus support hard links Effectively, multiple names for a file Cannot remove a file’s bits until the last hard link to the file is removed Tricky in a distributed system

Link Example (diagram: Replica 1 and Replica 2 each hold foodir containing files red and blue)

Link Example, Part II (diagram: blue is updated at Replica 1; both replicas’ foodir still contain red and blue)

Link Example, Part III (diagram: at Replica 2, foodir/blue is deleted and a hard link to blue is created in a new directory bardir)

What Should Happen Here? Clearly, the link named foodir/blue should disappear But what version of the data should the bardir link point to? No-lost-update semantics say it must be the update at replica 1

Garbage Collection in Ficus Ficus cannot throw away removed things at once Directory entries Updated files for no-lost-updates Non-updated files due to hard links When can Ficus reclaim the space these use?

When Can I Throw Away My Data? Not until all links to the file disappear Global information, not local Moreover, just because I know all links have disappeared doesn’t mean I can throw everything away Must wait till everyone knows Requires two trips around the ring

Why Can’t I Forget When I Know There Are No Links? I can throw the data away I don’t need it, and nobody else does either But I can’t forget that I knew this Because not everyone knows it For them to throw their data away, they must learn So I must remember for their benefit
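
A sketch of the two-trips rule, with hypothetical Site objects: trip 1 lets every site free the data but keep a tombstone, trip 2 lets every site discard the tombstone once everyone is known to know:

    class Site:
        def __init__(self, name):
            self.name = name
            self.blocks = {}            # file id -> data
            self.tombstones = set()     # files known to be fully unlinked

    def ring_gc(ring, file_id):
        # trip 1: every site learns all links are gone; each can free the
        # data, but must keep a tombstone because its successor may not know
        for site in ring:
            site.blocks.pop(file_id, None)
            site.tombstones.add(file_id)
        # trip 2: every site now learns that everyone else knows, so the
        # tombstone itself can finally be discarded
        for site in ring:
            site.tombstones.discard(file_id)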

Coda A different approach to optimistic replication Inherits a lot from Andrew Basically, a client/server solution Developed at CMU

Coda Replication Model Files stored permanently at server machines Client workstations download temporary replicas, not cached copies Can perform updates without getting a token from the server So concurrent updates possible

Detecting Concurrent Updates Workstation replicas only reconcile with their server At recon time, they compare their state of files with the server’s state Detecting any problems Since workstations don’t gossip, detection is easier than in Ficus

Handling Concurrent Updates Basic strategy is similar to Ficus’ Resolver programs are called to deal with conflicts Coda allows resolvers to deal with multiple related conflicts at once Also has some other refinements to conflict resolution

Server Replication in Coda Unlike Andrew, writable copies of a file can be stored at multiple servers Servers have peer-to-peer replication Servers have strong connectivity, crash infrequently Thus, Coda uses simpler peer-to-peer algorithms than Ficus must

Why Is Coda Better Than AFS? Writes don’t lock the file Writes happen quicker More local autonomy Less write traffic on the network Workstations can be disconnected Better load sharing among servers

Comparing Coda to Ficus Coda uses simpler algorithms Less likely to be bugs Less likely to be performance problems Coda doesn’t allow client gossiping Coda has built-in security Coda garbage collection simpler

Serverless Network File Systems New network technologies are much faster, with much higher bandwidth In some cases, going over the net is quicker than going to local disk How can we improve file systems by taking advantage of this change?

Fundamental Ideas of Serverless File Systems Peer workstations providing file service for each other High degree of location independence Make use of all machines’ caches Provide reliability in case of failures

xFS Serverless file system project at Berkeley Inherits ideas from several sources LFS Zebra (RAID-like ideas) Multiprocessor cache consistency Built for Network of Workstations (NOW) environment

What Does a File Server Do? Stores file data blocks on its disks Maintains file location information Maintains cache of data blocks Manages cache consistency for its clients

xFS Must Provide These Services In essence, every machine takes on some of the server’s responsibilities Any data or metadata might be located at any machine Key challenge is providing the same services a centralized server provides, but in a distributed system

Key xFS Concepts Metadata manager Stripe groups for data storage Cooperative caching Distributed cleaning processes

How Do I Locate a File in xFS? I’ve got a file name, but where is it? Assuming it’s not locally cached The file’s directory converts the name to a unique index number Consult the metadata manager to find out where the file with that index number is stored, via the manager map

The Manager Map Data structure that allows translation of index numbers to file managers Not necessarily file locations Kept by each metadata manager Globally replicated data structure Simply says what machine manages the file

Using the Manager Map Look up index number in local map Index numbers are clustered, so many fewer entries than files Send request to responsible manager
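
A sketch of the lookup, assuming index numbers are clustered into fixed-size groups; the group size and map contents are invented for illustration:

    GROUP_SIZE = 1024                    # index numbers clustered into groups
    MANAGER_MAP = ["hostA", "hostB", "hostC"]   # globally replicated table

    def manager_for(index_number):
        group = index_number // GROUP_SIZE       # one map entry per group,
        return MANAGER_MAP[group % len(MANAGER_MAP)]   # not one per file

    print(manager_for(2048))             # hostC: send the request there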

What Does the Manager Do? Manager keeps two types of information 1. imap information 2. caching information If some other site has the file in its cache, tell requester to go to that site Always use cache before disk Even if cache is remote

What if No One Caches the Block? Metadata manager for this file then must consult its imap Imap tells which disks store the data block Files are striped across disks on multiple machines Typically a single block is on one disk
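
A sketch of the manager’s combined read path from the last two slides (any cache first, then the imap); the Manager fields are hypothetical stand-ins:

    class Manager:
        def __init__(self):
            self.cachers = {}   # (file index, block) -> set of caching sites
            self.imap = {}      # (file index, block) -> disk address

    def locate_block(manager, file_index, block_no):
        key = (file_index, block_no)
        sites = manager.cachers.get(key)
        if sites:
            # some site caches the block: send the requester there, even
            # though that cache is remote -- always use cache before disk
            return ("cache", next(iter(sites)))
        # no cached copy anywhere: the imap gives the disk location
        return ("disk", manager.imap[key])

    m = Manager()
    m.imap[(42, 0)] = ("disk3", 17)       # hypothetical disk address
    print(locate_block(m, 42, 0))         # ('disk', ('disk3', 17))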

Writing Data xFS uses RAID-like methods to store data RAID sucks for small writes So xFS avoids small writes By using LFS-style operations Batch writes until you have a full stripe’s worth
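
A sketch of the batching idea: buffer dirty blocks and issue one full-stripe write, sidestepping the small-write penalty; the stripe size is illustrative:

    STRIPE_BLOCKS = 8                  # illustrative stripe size
    log_buffer = []

    def append_write(block, flush_stripe):
        log_buffer.append(block)
        if len(log_buffer) == STRIPE_BLOCKS:
            flush_stripe(list(log_buffer))   # one large stripe write
            log_buffer.clear()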

Stripe Groups Set of disks that cooperatively store data in RAID fashion xFS uses single parity disk Alternative to striping all data across all disks
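
A sketch of single-parity striping: the parity block is the XOR of the data blocks, so any one lost block can be rebuilt from the survivors and the parity:

    def parity(blocks):
        p = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                p[i] ^= byte
        return bytes(p)

    # any one lost block is the XOR of the parity and the surviving blocks
    data = [b"abcd", b"efgh", b"ijkl"]
    p = parity(data)
    print(parity([p, data[1], data[2]]) == data[0])   # True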

Cooperative Caching Each site’s cache can service requests from all other sites Working from assumption that network access is quicker than disk access Metadata managers used to keep track of where data is cached So remote cache access takes 3 network hops

Getting a Block from a Remote Cache (diagram: the client’s request (1) goes to the metadata server, which uses its manager map and cache consistency state to forward the request (2) to the caching site; that site’s Unix cache returns the block (3) to the client)

Providing Cache Consistency Per-block token consistency To write a block, client requests token from metadata server Metadata server retrieves the token from whoever has it And invalidates other caches Writing site keeps token
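
A sketch of the per-block token protocol as just described; the classes and method names are invented for illustration:

    class CacheSite:
        def __init__(self, name):
            self.name = name
        def relinquish(self, block):
            print(f"{self.name}: gives up write token for {block}")
        def invalidate(self, block):
            print(f"{self.name}: drops cached copy of {block}")

    class BlockManager:
        def __init__(self):
            self.token_holder = {}   # block -> site holding the write token
            self.cachers = {}        # block -> set of sites caching the block

        def request_write_token(self, block, site):
            holder = self.token_holder.get(block)
            if holder is not None and holder is not site:
                holder.relinquish(block)           # retrieve the token
            for c in self.cachers.get(block, set()) - {site}:
                c.invalidate(block)                # invalidate other caches
            self.token_holder[block] = site        # writer keeps the token

    mgr = BlockManager()
    s1, s2 = CacheSite("ws1"), CacheSite("ws2")
    mgr.cachers["blk7"] = {s1, s2}
    mgr.request_write_token("blk7", s1)   # ws2's cached copy is invalidated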

Which Sites Should Manage Which Files? Could randomly assign equal number of file index groups to each site Better if the site using a file also manages it In particular, if most frequent writer manages it Can reduce network traffic by ~ 50%

Cleaning Up File data (and metadata) is stored in log structures spread across machines A distributed cleaning method is required Each machine stores info on its usage of stripe groups Each cleans up its own mess

Basic Performance Results Early results from incomplete system Can provide up to 10 times the bandwidth of file data as a single NFS server Even better on creating small files Doesn’t compare xFS to multimachine servers