Example Replicated File Systems

Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015

Example Replicated File Systems
NFS, Coda, Ficus

NFS
Originally NFS did not have any replication capability. Replication of read-only file systems was added later, as was primary copy read/write replication.

NFS Read-Only Replication
Almost by hand. Sysadmin ensures multiple copies of file systems are identical, typically on different machines. Avoid writing to any replica, e.g., mount them read-only. Use automounting facilities to handle failover and load balancing.

Primary Copy NFS Replication
Commonly referred to as DRBD. Typically two replicas, primarily for reliability. One replica is the primary; it can be written. The other replica mirrors the primary and provides service if the primary is unavailable.

Some Primary Copy Issues
Handling updates: how and when do they propagate? Determining failure, of the secondary copy or of the primary copy. Handling recovery.

Update Issues in DRBD
Two choices. Synchronous: writes don’t return until both copies are updated. Asynchronous: writes return once the primary is updated; the secondary is updated later.

Implications of Synchronous Writes
Slower, since you can’t indicate success until both copies are written. One is written across the network, ensuring slowness. Fewer consistency issues: if the write returned, both copies have it; if not, neither does. Really bad timing requires some cleanup.

Implications of Asynchronous Writes
Faster, since you only wait for the primary copy. Almost always works just fine. Almost always. Problems when it doesn’t, though: different values of the same data at different copies. It may not be clear how it happened. Perhaps even worse.
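The difference between the two write modes can be sketched in a few lines. This is only an illustration under stated assumptions: `Replica`, `PrimaryCopyPair`, `pending`, and `drain` are hypothetical names, the "network" is a local assignment, and nothing here reflects the interface of any real replication product.

```python
from collections import deque


class Replica:
    """One copy of the data: a plain in-memory block map."""

    def __init__(self):
        self.blocks = {}


class PrimaryCopyPair:
    """Hypothetical primary/secondary pair, for illustration only."""

    def __init__(self, synchronous):
        self.primary = Replica()
        self.secondary = Replica()
        self.synchronous = synchronous
        self.pending = deque()  # updates the secondary has not seen yet

    def write(self, block, data):
        self.primary.blocks[block] = data
        if self.synchronous:
            # Do not return until the secondary also holds the data.
            self.secondary.blocks[block] = data
        else:
            # Return right away; the secondary catches up later.
            self.pending.append((block, data))

    def drain(self):
        """Apply queued updates to the secondary (the 'later' step)."""
        while self.pending:
            block, data = self.pending.popleft()
            self.secondary.blocks[block] = data


sync_pair = PrimaryCopyPair(synchronous=True)
sync_pair.write(7, b"new")

async_pair = PrimaryCopyPair(synchronous=False)
async_pair.write(7, b"new")
# Anything still queued here is what the secondary would be missing
# if the primary failed at this instant.
at_risk = list(async_pair.pending)
```

In the asynchronous pair, `at_risk` is exactly the window in which the two copies can hold different values of the same data.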

Detecting Failures
DRBD usually uses a heartbeat process. Primary and secondary expect to communicate every few seconds, e.g., every two seconds. If too many heartbeats in a row are missed, declare the partner “dead”. It might just be unreachable, though.
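A heartbeat check of this kind might look like the sketch below. The two-second interval comes from the example above; the threshold of three missed heartbeats and the names `HeartbeatMonitor`, `heartbeat`, and `partner_dead` are assumptions made for illustration. Time is passed in explicitly so the logic is easy to follow.

```python
class HeartbeatMonitor:
    """Tracks when the partner was last heard from (hypothetical helper)."""

    def __init__(self, interval=2.0, max_missed=3):
        self.interval = interval      # seconds between expected heartbeats
        self.max_missed = max_missed  # assumed threshold, not from the source
        self.last_seen = 0.0

    def heartbeat(self, now):
        """Record that a heartbeat arrived at time `now`."""
        self.last_seen = now

    def missed(self, now):
        """Number of whole heartbeat intervals that have passed in silence."""
        return int((now - self.last_seen) // self.interval)

    def partner_dead(self, now):
        # "Dead" only means "silent for too long": the partner may
        # simply be unreachable, which this check cannot distinguish.
        return self.missed(now) >= self.max_missed


monitor = HeartbeatMonitor()
monitor.heartbeat(now=10.0)
still_alive = not monitor.partner_dead(now=13.0)  # one interval missed
declared_dead = monitor.partner_dead(now=16.0)    # three intervals missed
```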

Responding To Failures
Switch service from the primary to the secondary, which becomes the primary, including write service. Ensures continued operation after failure. Update logging ensures the new primary is up to date.

Recovery From Failures
Recovered node becomes the secondary and receives missed updates from the primary. Complications if a network failure caused the failure: the split brain problem.

The Split Brain Problem
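[Figure: NETWORK PARTITION! The primary and the secondary are cut off from each other, and the secondary becomes a primary as well. Update 1, Update 2, and Update 3 are applied during the partition.] Now what?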

The “Simple” Solution
Prevent access to both until the sysadmin designates one of them as the new primary. Throw away the other and reset it to the designated primary. Simple for the sysadmin, maybe not for the users.

What Other Solution Is Possible?
Try to figure out what the correct version of the data is. In the NFS case, chances are good the writes are to different files, in which case you probably just need the most recent copy of each file. But there are complex cases. NFS replication doesn’t try to do this.
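As a rough sketch of that alternative, the function below merges the two sides of a partition file by file. It assumes a snapshot of the state at partition time (`base`) is available, that files are never deleted, and that "most recent" means "changed since the partition"; `merge_after_partition` is a hypothetical helper, not something NFS replication provides.

```python
def merge_after_partition(base, side_a, side_b):
    """Merge two diverged copies, each a dict of path -> contents.

    Returns (merged, conflicts). A file changed on only one side takes
    that side's version; a file changed differently on both sides is
    one of the complex cases and is reported instead of merged.
    """
    merged, conflicts = {}, []
    for path in sorted(set(side_a) | set(side_b)):
        a, b = side_a.get(path), side_b.get(path)
        old = base.get(path)
        if a == b:
            merged[path] = a          # both sides agree
        elif b == old:
            merged[path] = a          # only side A wrote this file
        elif a == old:
            merged[path] = b          # only side B wrote this file
        else:
            conflicts.append(path)    # both wrote it, differently
    return merged, conflicts


base = {"a.txt": "v0", "b.txt": "v0", "c.txt": "v0"}
side_a = {"a.txt": "v1-A", "b.txt": "v0", "c.txt": "v1-A"}
side_b = {"a.txt": "v0", "b.txt": "v1-B", "c.txt": "v1-B"}
merged, conflicts = merge_after_partition(base, side_a, side_b)
```

Here `a.txt` and `b.txt` merge cleanly because each was written on one side only, while `c.txt` lands in `conflicts`.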

Coda
A follow-on to the Andrew File System (AFS), using the basic philosophy of AFS but specifically designed to handle mobile computers.

The AFS System
[Figure: a server pool and client workstations. Clients request files from the servers.]

AFS Characteristics
Files permanently stored at exactly one server. Clients keep cached copies. Writes cached until file close. Asynchronous writes. Other copies then invalidated, unless there are write conflicts. Stateful servers.

Adding Mobile Computers
[Figure: a server pool and client workstations, just like AFS, except some of the clients are mobile.]

Why Does That Make a Difference?
Mobile computers come and go. Well, so do users at workstations. But mobile computers take their files with them, and expect to access them while they are gone. What happens when they do?

The Mobile Problem for AFS
The laptop downloads some files to its disk. Then it disconnects from the network. Then it uses the files, and maybe writes them. Now it reconnects.

Why Is This Different Than Normal AFS?
We might get write conflicts here. Normal AFS might, too, but normal AFS conflicts have a small window: truly concurrent writes only, with cache invalidation when someone closes. For a laptop, the close could occur weeks before reconnect.

Handling Disconnected Operations
Could use a solution like NFS: the server has the primary copy and the client has a secondary copy; if the client can’t access the server, it can’t write. Or could use an optimistic solution: assume no one else is going to write your file, so go ahead yourself, then detect problems and fix as needed.

The Coda Approach
Essentially optimistic. When connected, operates much like AFS. When disconnected, the client is allowed to update cached files, access control permitting. But unlike AFS, it can’t propagate updates on file close; after all, it’s disconnected. Instead, remember this failure until later.

Ficus
A more peer-oriented replicated file system. A descendant of the Locus operating system. Specifically designed for mobile computers.

AFS, Coda, and Caching
Like AFS, Coda client machines only cache files. An AFS cache miss is just a performance penalty: get it from the server. A Coda cache miss when disconnected is a disaster: the user can’t access his file.

Avoiding Disconnected Cache Misses
Really requires thinking ahead. Initially Coda required users to do it: maintain a list of files they wanted to be sure to always cache, in case of disconnected operation. Eventually went to a hoarding solution. We’ll discuss hoarding later.

Coda Reintegration
When a disconnected Coda client reconnects, it tries to propagate updates that occurred during disconnection to a server. If no one else updated that file, it’s just like a normal AFS update. If someone else updated the file during disconnection, what then?

Coda and Conflicts
Such update problems on Coda reintegration are conflicts: two (or more) users made concurrent writes to a file. The original solution was that the later update (mostly) lost. The update on the server wins; the other update is put in a special conflict directory. The owning user or sysadmin is notified to take action. Or not take action . . .
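The original "server wins" policy can be sketched as follows. This is not Coda's actual mechanism: the per-file version counter used to notice that someone else wrote the file, and the names `reintegrate`, `conflict_dir`, and `notify`, are all assumptions made for illustration.

```python
def reintegrate(server, client_updates, conflict_dir):
    """Replay a reconnecting client's updates against the server.

    server:         path -> (version, data)
    client_updates: path -> (version the client cached, new data)
    conflict_dir:   path -> losing data, filled in on conflict
    Returns the paths whose owner or sysadmin should be notified.
    """
    notify = []
    for path, (cached_version, data) in client_updates.items():
        server_version, _ = server.get(path, (0, None))
        if server_version == cached_version:
            # Nobody else wrote the file: behaves like a normal update.
            server[path] = (server_version + 1, data)
        else:
            # Someone else updated it during disconnection: the server
            # copy wins and the client's update is set aside.
            conflict_dir[path] = data
            notify.append(path)
    return notify


server = {"notes.txt": (3, "server edit"), "todo.txt": (1, "old")}
client_updates = {"notes.txt": (2, "laptop edit"), "todo.txt": (1, "new")}
conflict_dir = {}
notify = reintegrate(server, client_updates, conflict_dir)
```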

Later Coda Conflict Solutions
Automated reconciliation of conflicts, when possible. User tools to help handle them when automation doesn’t work. Can you think of particularly problematic issues here?

The Locus Computing Model
System composed of many personal workstations, and perhaps a few shared server machines, connected by a local area network. All machines have dedicated storage, but provide the illusion of . . . shared by all!

The Ficus Computing Model
Just like the Locus model, except . . . some of the workstations are portable computers, which might disconnect from the network, taking their storage with them.

Ficus Shares Some Problems With Coda
Portable computers can only access local disks while disconnected. Updates involving disconnected computers are complicated, and can even cause conflicts.

Ficus Has Some Unique Problems, Too
It’s really . . . What happens to this when the portables’ storage goes away? And, unfortunately. . .

Handling the Problems
Rely on replication. Replicate the files that the portable needs while disconnected. Replicate the files it’s taking away when it departs, so everyone else can still see them.

Updates in Ficus
Ficus uses peer replication: no primary copy. All replicas are equally good. So if access permissions allow update, and you can get to a replica, you can update it. How does Ficus handle that?

The Easy Case
All replicas are present and available. Allow update to one of the replicas. Make a reasonable effort to propagate the update to all others, but not synchronously. On a good day, this works and everything is identical.

The Hard Case
The best effort to propagate an update from the original replica fails, perhaps because you can’t reach one or more other replicas, perhaps because the portable computers holding them are elsewhere.

Handling Updates
With primary copies: if they’re the same, no problem. If they’re different, the primary always wins; the only possible reason is that the secondary is old.
With peer copies: if they’re the same, still no problem. But what if they’re different?

What Are the Possibilities?
1. One is old and the other is updated. How do we tell which is the new one? Or . . . 2. Both have been updated. Now what?

More Complicated If >2 Replicas
Here’s just one example. [Figure: Replica 1, Replica 2, and Replica 3, with the events “Update replica 1”, “Propagate to replica 2”, “Propagate to replica 3”, and “Update replica 1”.] What’s the right thing to do? And how do you figure that out? Somehow you figure out replica 2 is newer than replica 3.

Reconciliation
Always an option in Locus and Ficus. Much more important with disconnected operation. When a replica notices a previously unavailable replica, check for missing updates and trade information about them. The async operation that ensures eventual update propagation.

Gossiping in Ficus
Primary copy replication and systems like Coda always propagate updates the same way: other replicas give their updates to a single site and get new updates from that site. Peer systems like Ficus have another option: any peer with later updates can pass them to you, even if they aren’t the primary and didn’t create the updates. In file systems, this is called gossiping.
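A toy illustration of gossiping: the update made at replica 1 reaches replica 3 by way of replica 2, without replicas 1 and 3 ever talking to each other. The sketch assumes each file carries a single version number and that no two replicas write the same file concurrently, so conflicts are ignored here; `gossip` is a hypothetical helper, not Ficus code.

```python
def gossip(a, b):
    """Pairwise exchange between two replica stores.

    Each store maps file name -> (version, data). After the exchange
    both stores hold the higher-versioned copy of every file either
    one knew about, regardless of which replica created the update.
    """
    for name in set(a) | set(b):
        newest = max(a.get(name, (0, None)),
                     b.get(name, (0, None)),
                     key=lambda entry: entry[0])
        a[name] = newest
        b[name] = newest


replica1 = {"report.txt": (2, "new")}   # replica 1 made the update
replica2 = {"report.txt": (1, "old")}
replica3 = {"report.txt": (1, "old")}

gossip(replica1, replica2)   # replica 2 learns the update from its creator
gossip(replica2, replica3)   # replica 3 learns it second-hand
```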

How Does Ficus Track Updates?
Ficus uses version vectors, an applied type of vector clock. These clocks keep one vector element per replica, with a vector clock stored at each replica. Clocks “tick” only on updates.

Version Vector Use Example
[Figure: version vectors at Replica 1, Replica 2, and Replica 3; the entries shown are 1s.] When replica 2 comes back, its version will be recognized as old compared to either replica 1 or replica 3.

Version Vectors and Conflicts
Ficus recognizes concurrent (and thus conflicting) writes using version vectors. If neither of two version vectors dominates the other, there’s a conflict, implying concurrent writes. Typically detected during reconciliation.
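The dominance test can be written down directly. In this minimal sketch a version vector is a dict from replica name to update count, with a missing entry treated as zero; `tick` and `compare` are illustrative names, not Ficus functions.

```python
def tick(vv, replica):
    """Return a copy of `vv` with `replica`'s element advanced by one.

    Called only when `replica` applies a local update.
    """
    new = dict(vv)
    new[replica] = new.get(replica, 0) + 1
    return new


def compare(vv_a, vv_b):
    """Classify two version vectors of the same file."""
    replicas = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(r, 0) > vv_b.get(r, 0) for r in replicas)
    b_ahead = any(vv_b.get(r, 0) > vv_a.get(r, 0) for r in replicas)
    if a_ahead and b_ahead:
        return "conflict"      # neither dominates: concurrent writes
    if a_ahead:
        return "a dominates"   # b is simply old
    if b_ahead:
        return "b dominates"   # a is simply old
    return "equal"


start = {}                               # no replica has updated yet
at_r1 = tick(start, "r1")                # replica 1 writes the file
stale = compare(at_r1, start)            # an untouched copy is old

at_r2 = tick(start, "r2")                # replica 2 writes independently
concurrent = compare(at_r1, at_r2)       # neither write saw the other
```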

For Example
[Figure: version vectors at Replica 1 and Replica 2, with entries of 1, where neither dominates the other. CONFLICT!]

Now What?
Conflicting files represent concurrent writes. There is no “correct” order to apply them. Use other techniques to resolve the conflicts, creating a semantically correct and/or acceptable version.

Example Conflict Resolution
Identical conflicts: the same update made in two different places. Easy to resolve, assuming the updates in question are idempotent. Conflicts involving append-only files: merge the appends. Most Unix directory conflicts are automatically resolvable.
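Two of these cases are simple enough to sketch. The function below assumes both conflicting copies of an append-only file extend a known common ancestor (`base`), and it places one copy's appended records before the other's, which is an arbitrary choice; `merge_appends` is a hypothetical resolver, not part of Ficus.

```python
def merge_appends(base, copy_a, copy_b):
    """Resolve a conflict between two copies of an append-only file.

    All three arguments are lists of records. Both copies must start
    with the records in `base`; otherwise this is not an append-only
    history and the resolver refuses to guess.
    """
    n = len(base)
    if copy_a[:n] != base or copy_b[:n] != base:
        raise ValueError("not an append-only history")
    tail_a, tail_b = copy_a[n:], copy_b[n:]
    if tail_a == tail_b:
        # Identical conflict: the same update was made in both places,
        # so applying it once is enough.
        return base + tail_a
    # Otherwise keep every appended record from both sides.
    return base + tail_a + tail_b


log = ["boot"]
merged_log = merge_appends(log, ["boot", "login alice"], ["boot", "login bob"])
same_edit = merge_appends(log, ["boot", "shutdown"], ["boot", "shutdown"])
```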

Ficus Replication Granularity
NFS replicates volumes. Coda replicates individual files. Ficus replicates volumes; later, selective replication of files within volumes was added.

Hoarding
A portable machine off the network must operate off its own disk. Only! So it had better replicate the files it needs. If you know/predict portable disconnection, pre-replicate those files. That’s called hoarding.

Mechanics of Hoarding
Mechanically easy if you replicate at file granularity, e.g., Coda or Ficus with selective replication: simply replicate what you need. Inefficient if you replicate at volume granularity.

What Do You Hoard?
Could be done manually. Doesn’t work out well. Could replicate every file the portable ever touches. Might overfill its disk. Could use LRU. Experience shows that fails oddly.

What Does Work Well?
You might think clustering: identify files that are used together; if one of them was recently used, hoard them all. Basic approach in Seer. Actually, LRU plus some sleazy tricks works equally well, and is much cheaper.
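For reference, plain LRU hoarding amounts to walking the access history from most recent to least recent and keeping whatever fits. The sketch below shows only that baseline; it does not implement Seer's clustering or the extra tricks mentioned above, and `choose_hoard`, `sizes`, and `capacity` are hypothetical names.

```python
def choose_hoard(accesses, sizes, capacity):
    """Pick files to pre-replicate onto a portable's disk.

    accesses: file names in the order they were used, oldest first
    sizes:    file name -> size, in the same units as `capacity`
    Returns the chosen files, most recently used first. A file that
    does not fit is skipped and older, smaller files are still tried.
    """
    hoard, used, seen = [], 0, set()
    for name in reversed(accesses):
        if name in seen:
            continue          # only the most recent use of a file counts
        seen.add(name)
        if used + sizes[name] <= capacity:
            hoard.append(name)
            used += sizes[name]
    return hoard


accesses = ["a", "b", "a", "c", "d"]
sizes = {"a": 4, "b": 5, "c": 3, "d": 2}
hoard = choose_hoard(accesses, sizes, capacity=9)
```

With a capacity of 9, the three most recently used files fit and the least recently used one, `b`, is left behind.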
