Example Replicated File Systems

Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015

Example Replicated File Systems NFS Coda Ficus

NFS Originally NFS did not have any replication capability Replication of read-only file systems was added later Primary copy read/write replication was added later still

NFS Read-Only Replication Almost by hand Sysadmin ensures multiple copies of file systems are identical Typically on different machines Avoid writing to any replica E.g., mount them read-only Use automounting facilities to handle failover and load balancing

Primary Copy NFS Replication Commonly referred to as DRBD Typically two replicas Primarily for reliability One replica is the primary It can be written Other replica mirrors the primary Provides service if primary unavailable

Some Primary Copy Issues Handling updates How and when do they propagate? Determining failure Of the secondary copy Of the primary copy Handling recovery

Update Issues In DRBD Two choices: Synchronous Writes don’t return until both copies are updated Asynchronous Writes return once the primary is updated Secondary updated later

Implications of Synchronous Writes Slower, since success can’t be indicated till both copies are written One is written across the network, which adds latency Fewer consistency issues If the write returned, both copies have it If not, neither does Really bad timing requires some cleanup

Implications of Asynchronous Writes Faster, since you only wait for the primary copy Almost always works just fine Almost always Problems when it doesn’t, though Different values of the same data at different copies It may not be clear how it happened Perhaps even worse
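
To make the two choices concrete, here is a minimal sketch of primary-copy write propagation; it is not DRBD's actual code or API, and the Replica and PrimaryCopyVolume classes are made up for illustration:

```python
# Hypothetical sketch of primary-copy write propagation; not DRBD's real code.
import queue
import threading

class Replica:
    """A toy replica that just stores key -> value pairs in memory."""
    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value

class PrimaryCopyVolume:
    def __init__(self, primary, secondary, asynchronous=False):
        self.primary = primary
        self.secondary = secondary
        self.asynchronous = asynchronous
        self.pending = queue.Queue()
        if asynchronous:
            # A background thread updates the secondary some time later.
            threading.Thread(target=self._drain, daemon=True).start()

    def write(self, key, value):
        self.primary.apply(key, value)
        if self.asynchronous:
            # Asynchronous: return as soon as the primary has the write.
            self.pending.put((key, value))
        else:
            # Synchronous: don't return until the secondary (which would be
            # across the network in real life) has the write too.
            self.secondary.apply(key, value)

    def _drain(self):
        while True:
            key, value = self.pending.get()
            self.secondary.apply(key, value)  # may lag; lost if the primary crashes first
```

In the asynchronous case, a crash after write() returns but before the background thread drains the queue is exactly how the two copies can end up holding different values.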

Detecting Failures DRBD usually uses a heartbeat process Primary and secondary expect to communicate every few seconds E.g., every two seconds If too many heartbeats in a row are missed, declare the partner “dead” Might just be unreachable, though
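
A heartbeat monitor along those lines is easy to sketch; the two-second interval and the miss threshold below are illustrative values, not DRBD's actual configuration:

```python
import time

HEARTBEAT_INTERVAL = 2.0   # seconds between expected heartbeats (illustrative)
MISSES_ALLOWED = 3         # consecutive misses before declaring the partner dead

class HeartbeatMonitor:
    def __init__(self):
        self.last_heard = time.monotonic()

    def on_heartbeat(self):
        """Called whenever a heartbeat message arrives from the partner."""
        self.last_heard = time.monotonic()

    def partner_dead(self):
        """True once too many heartbeats in a row have been missed.

        The partner might merely be unreachable rather than down;
        the monitor cannot tell the difference.
        """
        silence = time.monotonic() - self.last_heard
        return silence > MISSES_ALLOWED * HEARTBEAT_INTERVAL
```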

Responding To Failures Switch service from the primary to the secondary Which becomes the primary Including write service Ensures continued operation after failure Update logging ensures the new primary is up to date

Recovery From Failures Recovered node becomes the secondary Receives missed updates from the primary Complications if a network failure, rather than a crash, caused the apparent failure The split brain problem

The Split Brain Problem (diagram): a NETWORK PARTITION separates the primary and the secondary, the secondary declares itself primary, and both sides keep accepting updates (updates 1, 2, and 3) Now what?

The “Simple” Solution Prevent access to both Until sysadmin designates one of them as the new primary Throw away the other and reset to the designated primary Simple for the sysadmin, maybe not for the users

What Other Solution Is Possible? Try to figure out what the correct version of the data is In the NFS case, chances are good the writes are to different files In which case, you probably just need the most recent copy of each file But there are complex cases NFS replication doesn’t try to do this
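
As a hedged illustration of the “most recent copy of each file” idea (not something NFS replication actually implements), a recovery tool could walk the two divergent trees and keep whichever copy of each file was modified more recently:

```python
import os

def pick_newest(tree_a, tree_b, relative_paths):
    """Sketch: for each file, keep whichever replica's copy is newer.

    tree_a and tree_b are the root directories of the two divergent
    replicas; relative_paths lists the files to reconcile (assumed to
    exist in both trees).
    """
    chosen = {}
    for rel in relative_paths:
        a = os.path.join(tree_a, rel)
        b = os.path.join(tree_b, rel)
        chosen[rel] = a if os.path.getmtime(a) >= os.path.getmtime(b) else b
    return chosen
```

This only behaves sensibly when the two sides wrote disjoint sets of files; if both sides wrote the same file, newest-wins silently drops one of the updates, which is exactly the complex case mentioned above.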

Coda A follow-on to the Andrew File System (AFS) Using the basic philosophy of AFS But specifically to handle mobile computers

The AFS System (diagram): a server pool and client workstations; clients request files from the servers

AFS Characteristics Files permanently stored at exactly one server Clients keep cached copies Writes cached until file close Asynchronous writes, unless write conflicts Other copies then invalidated Stateful servers

Adding Mobile Computers (diagram): a server pool and client workstations, just like AFS, except . . . some of the clients are mobile

Why Does That Make a Difference? Mobile computers come and go Well, so do users at workstations But mobile computers take their files with them And expect to access them while they are gone What happens when they do?

The Mobile Problem for AFS The laptop downloads some files to its disk Then it disconnects from the network Then it uses the files And maybe writes them Now it reconnects

Why Is This Different Than Normal AFS? We might get write conflicts here Normal AFS might, too But normal AFS conflicts have a small window Truly concurrent writes only Cache invalidation when someone closes For the laptop, the close could occur weeks before reconnection

Handling Disconnected Operations Could use a solution like NFS Server has primary copy Client has secondary copy If client can’t access server, can’t write Or could use an optimistic solution Assume no one else is going to write your file, so go ahead yourself Detect problems and fix as needed

The Coda Approach Essentially optimistic When connected, operates much like AFS When disconnected, the client is allowed to update cached files Access control permitting But unlike AFS, it can’t propagate updates on file close After all, it’s disconnected Instead, remember the updates until reconnection

Ficus A more peer-oriented replicated file system A descendant of the Locus operating system Specifically designed for mobile computers

AFS, Coda, and Caching Like AFS clients, Coda client machines only cache files An AFS cache miss is just a performance penalty Get it from the server A Coda cache miss when disconnected is a disaster The user can’t access the file

Avoiding Disconnected Cache Misses Really requires thinking ahead Initially Coda required users to do it Maintain a list of files they wanted to be sure to always cache In case of disconnected operations Eventually went to a hoarding solution We’ll discuss hoarding later

Coda Reintegration When a disconnected Coda client reconnects Tries to propagate updates occurring during disconnection to a server If no one else updated that file, just like a normal AFS update If someone else updated the file during disconnection, what then?

Coda and Conflicts Such update problems on Coda reintegration are conflicts Two (or more) users made concurrent writes to a file The original solution was that the later update was (mostly) lost The update on the server wins The other update is put in a special conflict directory The owning user or sysadmin is notified to take action Or not take action . . .
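
A rough sketch of that server-wins rule follows; the server object, the conflict directory path, and the notify_owner helper are all hypothetical, not Coda's actual interfaces:

```python
import os
import shutil

CONFLICT_DIR = "/coda-conflicts"   # hypothetical location for losing copies

def notify_owner(path):
    """Stub: alert the owning user or sysadmin about the conflict."""
    print(f"conflict on {path}; client copy saved in {CONFLICT_DIR}")

def reintegrate_one(path, client_copy, client_base_version, server):
    """Sketch of replaying one disconnected update at the server.

    client_base_version is the server version the client had cached when it
    disconnected; server.version(path) is whatever the server holds now.
    """
    if server.version(path) == client_base_version:
        # No one else wrote the file while we were gone: a normal update.
        server.store(path, client_copy)
    else:
        # Concurrent update: the server's copy wins, and the client's copy
        # is parked in a conflict directory for a human to deal with (or not).
        os.makedirs(CONFLICT_DIR, exist_ok=True)
        shutil.copy(client_copy, os.path.join(CONFLICT_DIR, os.path.basename(path)))
        notify_owner(path)
```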

Later Coda Conflict Solutions Automated reconciliation of conflicts When possible User tools to help handle them when automation doesn’t work Can you think of particularly problematic issues here?

The Locus Computing Model System composed of many personal workstations And perhaps a few shared server machines Connected by a local area network All machines have dedicated storage But provide the illusion of a single file system Shared by all!

The Ficus Computing Model Just like the Locus model, except . . . Some of the workstations are portable computers Which might disconnect from the network Taking their storage with them

Ficus Shares Some Problems With Coda Portable computers can only access local disks while disconnected Updates involving disconnected computers are complicated And can even cause conflicts

Ficus Has Some Unique Problems, Too It’s really one file system spanning all the machines’ storage, portables included What happens to this when the portables’ storage goes away? And, unfortunately, other users may still need the files a portable holds

Handling the Problems Rely on replication Replicate the files that the portable needs while disconnected Replicate the files it’s taking away when it departs So everyone else can still see them

Updates in Ficus Ficus uses peer replication No primary copy All replicas are equally good So if access permissions allow update And you can get to a replica You can update it How does Ficus handle that?

The Easy Case All replicas are present and available Allow update to one of the replicas Make a reasonable effort to propagate the update to all others But not synchronously On a good day, this works and everything is identical

The Hard Case The best effort to propagate an update from the original replica fails Perhaps because you can’t reach one or more other replicas Perhaps because the portable computers holding them are elsewhere

Handling Updates With primary copies: if the primary and secondary are the same, no problem; if they’re different, the primary always wins, since the only possible reason is that the secondary is old With peer copies: if they’re the same, still no problem But what if they’re different?

What Are the Possibilities? 1. One is old and the other is updated How do we tell which is the new one? Or . . . 2. Both have been updated Now what?

More Complicated If >2 Replicas Here’s just one example (diagram): replica 1 is updated twice, and its updates reach replicas 2 and 3 at different times; somehow you figure out that replica 2 is newer than replica 3 What’s the right thing to do? And how do you figure that out?

Reconciliation The async operation that ensures eventual update propagation Always an option in Locus and Ficus Much more important with disconnected operation When a replica notices a previously unavailable replica, they check for missing updates and trade information about them

Gossiping in Ficus Primary copy replication and systems like Coda always propagate updates the same way Other replicas give their updates to a single site And get new updates from that site Peer systems like Ficus have another option Any peer with later updates can pass them to you Even if they aren’t the primary and didn’t create the updates In file systems, this is called gossiping

How Does Ficus Track Updates? Ficus uses version vectors An applied type of vector clock These clocks keep one vector element per replica With a vector clock stored at each replica Clocks “tick” only on updates

Version Vector Use Example (diagram with replicas 1, 2, and 3): replica 2 misses an update, so when replica 2 comes back, its version vector will be recognized as old compared to either replica 1 or replica 3

Version Vectors and Conflicts Ficus recognizes concurrent (and thus conflicting) writes Using version vectors If neither of two version vectors dominates the other, there’s a conflict Implying concurrent write Typically detected during reconciliation

For Example (diagram): replicas 1 and 2 each update the same file independently, each incrementing its own element of the version vector; neither vector dominates the other, so CONFLICT!
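
Concretely, a version vector is just a map from replica ID to an update counter, and conflict detection is a pairwise dominance test. A minimal sketch (not Ficus’s actual code):

```python
def increment(vv, replica_id):
    """Tick this replica's element of the version vector on a local update."""
    vv = dict(vv)
    vv[replica_id] = vv.get(replica_id, 0) + 1
    return vv

def dominates(a, b):
    """True if vector a has seen everything vector b has."""
    return all(a.get(r, 0) >= n for r, n in b.items())

def compare(a, b):
    if dominates(a, b) and dominates(b, a):
        return "identical"
    if dominates(a, b):
        return "a is newer"
    if dominates(b, a):
        return "b is newer"
    return "conflict"      # neither dominates: concurrent writes

# The example above: both replicas start with the same vector, then each
# updates the file independently.
r1 = increment({1: 1, 2: 1}, 1)    # {1: 2, 2: 1}
r2 = increment({1: 1, 2: 1}, 2)    # {1: 1, 2: 2}
assert compare(r1, r2) == "conflict"
```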

Now What? Conflicting files represent concurrent writes There is no “correct” order to apply them Use other techniques to resolve the conflicts Creating a semantically correct and/or acceptable version

Example Conflict Resolution Identical conflicts Same update made in two different places Easy to resolve Assuming updates in question are idempotent Conflicts involving append-only files Merge the appends Most Unix directory conflicts are automatically resolvable
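
For the append-only case, the merge is mechanical once the common base content is known, as in this hedged sketch:

```python
def merge_append_only(base, copy_a, copy_b):
    """Sketch: resolve a conflict on an append-only file.

    Both copies must start with the common base content; the resolver
    simply concatenates the two sets of appends (in an arbitrary but
    deterministic order) after that base.
    """
    assert copy_a.startswith(base) and copy_b.startswith(base)
    appended_a = copy_a[len(base):]
    appended_b = copy_b[len(base):]
    return base + appended_a + appended_b
```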

Ficus Replication Granularity NFS replicates volumes Coda replicates individual files Ficus replicates volumes Later, selective replication of files within volumes added

Hoarding A portable machine off the network must operate off its own disk Only! So it better replicate the files it needs If you know/predict portable disconnection, pre-replicate those files That’s called hoarding

Mechanics of Hoarding Mechanically easy if you replicate at file granularity E.g., Coda or Ficus with selective replication Simply replicate what you need Inefficient if you replicate at volume granularity

What Do You Hoard? Could be done manually Doesn’t work out well Could replicate every file the portable ever touches Might overfill its disk Could use LRU Experience shows that fails oddly

What Does Work Well? You might think clustering Identify files that are used together If one of them was recently used, hoard them all Basic approach in Seer Actually, LRU plus some sleazy tricks works equally well And is much cheaper
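
As a toy illustration of the LRU end of that spectrum (not Seer’s clustering or Coda’s hoard daemon), a hoard list could simply be the most recently used files that fit within the disk budget:

```python
def lru_hoard_list(access_log, file_sizes, disk_budget):
    """Sketch: pick files to hoard by recency of use, within a size budget.

    access_log is a list of file paths in the order they were accessed;
    file_sizes maps path -> bytes; disk_budget is bytes available for hoarding.
    """
    hoard, used = [], 0
    for path in reversed(access_log):            # most recent first
        if path in hoard:
            continue
        size = file_sizes.get(path, 0)
        if used + size <= disk_budget:
            hoard.append(path)
            used += size
    return hoard
```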