EE324 INTRO TO DISTRIBUTED SYSTEMS L-20 More DFS
Andrew File System
Let's start with a familiar example: Andrew
10,000s of machines, 10,000s of people
Goal: have a consistent namespace for files across computers
Allow any authorized user to access their files from any computer
[Figure: many client machines sharing terabytes of server disk]
Callbacks
When a client opens an AFS file for the first time, the server promises to notify it whenever it receives a new version of the file from any other client
This promise is called a callback
Relieves the server from having to answer a call from the client every time the file is opened
Significant reduction of server workload (remember: NFS asks the server every 60 seconds)
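To make the callback idea concrete, here is a minimal sketch (not AFS's actual code) of the bookkeeping a server might keep: which clients hold a callback promise on each file, and how those promises get broken when a new version arrives. The CallbackTable class and notify_client hook are made-up names for illustration.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical sketch of server-side callback bookkeeping (not real AFS code).
// For each file, the server remembers which clients hold a callback promise.
class CallbackTable {
public:
    // Record a promise when a client fetches a file for the first time.
    void add_promise(const std::string &file, int client_id) {
        promises_[file].insert(client_id);
    }

    // When a new version of the file is stored, break every outstanding
    // promise by notifying the holders, then forget them.
    template <typename NotifyFn>
    void break_callbacks(const std::string &file, int writer, NotifyFn notify_client) {
        auto it = promises_.find(file);
        if (it == promises_.end()) return;
        for (int client : it->second) {
            if (client != writer)             // the writer already has the new version
                notify_client(client, file);  // e.g., an RPC telling the client to invalidate
        }
        promises_.erase(it);
    }

private:
    std::map<std::string, std::set<int>> promises_;
};
```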
AFS summary
Client-side caching is a fundamental technique for improving scalability and performance
But it raises important questions of cache consistency
Timeouts and callbacks are common methods for providing (some forms of) consistency
AFS picked session semantics (close-to-open consistency) as a good balance of usability (the model seems intuitive to users), performance, etc.
The AFS authors argued that apps with highly concurrent, shared access, such as databases, need a different model
Today's Lecture
Other types of DFS
Coda: disconnected operation
Programming assignment 4
Background
We are back in the 1990s: the network is slow and not stable
The terminal has become a "powerful" client: 33 MHz CPU, 16 MB RAM, 100 MB hard drive
Mobile users appeared: the first IBM ThinkPad shipped in 1992
We can do work at the client without the network
CODA
Successor of the very successful Andrew File System (AFS)
AFS: the first DFS aimed at a campus-sized user community
Key ideas include:
Session semantics (open-to-close consistency)
Callbacks
Hardware Model
Similarity: CODA and AFS assume that client workstations are personal computers controlled by their user/owner
Fully autonomous
Cannot be trusted
Difference: CODA allows owners of laptops to operate them in disconnected mode
The opposite of ubiquitous connectivity
Coda
Must handle two types of failures:
Server failures: data servers are replicated
Communication failures and voluntary disconnections: Coda uses optimistic replication and file hoarding
Design Rationale
Scalability
Callback cache coherence (inherited from AFS)
Whole-file caching
Portable workstations
User's assistance in cache management
Design Rationale – Replica Control
Pessimistic
Disable all partitioned writes
- Requires a client to acquire control of a cached object prior to disconnection
Optimistic
Assume no one else is touching the file
- More sophisticated: needs conflict detection
+ Fact: low write-sharing in Unix
+ High availability: access anything in range without a lock
Pessimistic Replica Control
Would require the client to acquire exclusive (RW) or shared (R) control of cached objects before accessing them in disconnected mode
An acceptable solution for voluntary disconnections
Does not work for involuntary disconnections: what if the laptop remains disconnected for a long time?
Leases
We could grant exclusive/shared control of the cached objects for a limited amount of time
Works very well in connected mode
Reduces server workload
The server can keep leases in volatile storage as long as their duration is shorter than the time it takes to reboot
Would only work for very short disconnection periods
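A minimal sketch of lease bookkeeping, assuming a fixed lease duration (the class and field names are illustrative). It also shows why volatile storage suffices: after a crash and reboot, every lease has already expired.

```cpp
#include <chrono>
#include <map>
#include <string>

// Illustrative lease table, not any real server's code. A lease grants a
// client control of an object until an expiry time; because every lease
// expires within a bounded duration, the table can live in volatile memory.
struct Lease {
    int client_id;
    bool exclusive;                                    // RW (exclusive) vs R (shared)
    std::chrono::steady_clock::time_point expires_at;
};

class LeaseTable {
public:
    explicit LeaseTable(std::chrono::seconds duration) : duration_(duration) {}

    // Grant (or refresh) a lease on `object` for `client_id`.
    Lease grant(const std::string &object, int client_id, bool exclusive) {
        Lease l{client_id, exclusive, std::chrono::steady_clock::now() + duration_};
        leases_[object] = l;
        return l;
    }

    // A lease is only valid until it expires; a disconnected client loses
    // its control as soon as the clock runs out.
    bool is_valid(const std::string &object, int client_id) const {
        auto it = leases_.find(object);
        return it != leases_.end()
            && it->second.client_id == client_id
            && std::chrono::steady_clock::now() < it->second.expires_at;
    }

private:
    std::chrono::seconds duration_;
    std::map<std::string, Lease> leases_;
};
```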
Optimistic Replica Control (I)
Optimistic replica control allows access in disconnected mode
Tolerates temporary inconsistencies
Promises to detect them later
Provides much higher data availability
Optimistic Replica Control (II)
Defines an accessible universe: the set of replicas that the user can access
The accessible universe varies over time
At any time, the user:
Will read from the latest replica(s) in his accessible universe
Will update all replicas in his accessible universe
Coda (Venus) States
1. Hoarding: normal operation mode
2. Emulating: disconnected operation mode
3. Reintegrating: propagates changes and detects inconsistencies
[State transition diagram: Hoarding ↔ Emulating ↔ Reintegrating]
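The three states form a small state machine. A sketch of the transitions (the event names are paraphrased from the slide, not taken from Coda source):

```cpp
// Illustrative sketch of Venus's three operating states and the events that
// move between them.
enum class VenusState { Hoarding, Emulating, Reintegrating };

enum class Event { Disconnect, Reconnect, ReintegrationDone };

VenusState next_state(VenusState s, Event e) {
    switch (s) {
    case VenusState::Hoarding:       // normal, connected operation
        return (e == Event::Disconnect) ? VenusState::Emulating : s;
    case VenusState::Emulating:      // disconnected: serve from cache, log updates
        return (e == Event::Reconnect) ? VenusState::Reintegrating : s;
    case VenusState::Reintegrating:  // replay the log, detect conflicts
        if (e == Event::ReintegrationDone) return VenusState::Hoarding;
        if (e == Event::Disconnect)        return VenusState::Emulating;
        return s;
    }
    return s;
}
```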
Hoarding
Hoard useful data in anticipation of disconnection
Balance the needs of connected and disconnected operation:
Cache size is restricted
Disconnections are unpredictable
Prioritized algorithm for cache management
Hoard walking: periodically re-evaluate which objects to keep
Prioritized algorithm
User-defined hoard priority p: how interesting is the object to the user?
Recent usage q
Object priority = f(p, q)
Evict the object with the lowest priority
+ Fully tunable: everything can be customized
- Not really tunable (?): no clear guidance on how to customize
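As a sketch, here is one possible priority function f(p, q) and an eviction pass over the cache. The weighted sum is an assumption (the slide does not specify f), and all names are illustrative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch of prioritized cache management for hoarding (not Venus's real code).
struct CachedObject {
    std::string name;
    double hoard_priority;   // p: user-defined interest (from the hoard profile)
    double recent_usage;     // q: recency/frequency of use, maintained by the client
};

double priority(const CachedObject &o, double alpha = 0.5) {
    // f(p, q): any monotone combination works; alpha trades user intent vs. usage.
    return alpha * o.hoard_priority + (1 - alpha) * o.recent_usage;
}

// During a "hoard walk", re-evaluate priorities and evict the lowest-priority
// object when the cache is over its size limit.
void evict_lowest(std::vector<CachedObject> &cache) {
    if (cache.empty()) return;
    auto victim = std::min_element(cache.begin(), cache.end(),
        [](const CachedObject &a, const CachedObject &b) {
            return priority(a) < priority(b);
        });
    cache.erase(victim);
}
```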
Emulation
In emulation mode:
Attempts to access files that are not in the client cache appear to applications as failures
All changes are written to a persistent log, the client modification log (CML)
Persistence
Venus keeps its cache and related data structures in non-volatile storage
Reintegration
When the workstation gets reconnected, Coda initiates a reintegration process
Performed one volume at a time
Venus ships the replay log to all volumes
Each volume performs a log replay algorithm
Only write/write conflicts matter
Success? Yes: free the logs, reset priorities. No: save the logs to a tar file and ask the user for help
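A simplified sketch of the replay step with write/write conflict detection. The record fields and the version check are illustrative stand-ins for Coda's real CML records and version state, not the actual protocol.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative log record: an update made while disconnected, tagged with the
// version of the object the client had cached when it made the update.
struct CmlRecord {
    std::string path;
    long base_version;
    std::string new_data;
};

struct ServerObject {
    long version = 0;
    std::string data;
};

// Replay one volume's log: an update applies cleanly only if nobody else wrote
// the object since the client cached it; otherwise it is a write/write conflict.
std::vector<CmlRecord> replay(std::map<std::string, ServerObject> &server,
                              const std::vector<CmlRecord> &log) {
    std::vector<CmlRecord> conflicts;   // saved aside (e.g., to a tar) for the user
    for (const auto &rec : log) {
        ServerObject &obj = server[rec.path];
        if (obj.version == rec.base_version) {
            obj.data = rec.new_data;    // no one else wrote it: apply the update
            obj.version++;
        } else {
            conflicts.push_back(rec);   // write/write conflict: ask for help
        }
    }
    return conflicts;
}
```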
Performance
Duration of reintegration: a few hours of disconnection take about 1 minute to reintegrate, but sometimes much longer
Cache size: 100 MB at the client is enough for a "typical" workday
Conflicts: essentially none at all! Why? Over 99% of modifications are by the same person; two users modifying the same object within a day: < 0.75%
Coda Summary
Puts scalability and availability before data consistency, unlike NFS
Assumes that inconsistent updates are very infrequent
Introduced disconnected operation mode and file hoarding
Today's Lecture
Other types of DFS
Coda: disconnected operation
Programming assignment 4
Note: slides and project borrowed from David Andersen (CMU)
Filesystems
Last time: we looked at how we could use RPC to split filesystem functionality between client and server
But we didn't really change the design: we just moved the entire filesystem to the server and then added some caching on the client in various ways
You can go farther...
But it requires ripping apart the filesystem functionality into modules and placing those modules on different computers on the network
So now we need to ask: what does a filesystem do, anyway?
Well, there's a disk...
Disks store bits, in fixed-length pieces called sectors or blocks
But a filesystem has... files. And often directories. And maybe permissions, creation and modification times, and other stuff about the files ("metadata")
Filesystem functionality
Directory management: maps entries in a hierarchy of names to files on disk
File management: adding, reading, changing, appending, and deleting individual files
Space management: where on disk to store these things?
Metadata management
Conventional filesystem
Useful concepts:
"Superblock": a well-known location on disk where top-level filesystem info is stored (pointers to more structures, etc.)
"Free list" or "free space bitmap": data structures that remember what's used on disk and what's not. Why? Fast allocation of space for new files
"inode" (short for index node): stores all metadata about a file, plus information pointing to where the file is stored on disk. Directory entries point to inodes
"extent": a way of remembering where on disk a file is stored. Instead of listing all blocks, list a starting block and a range. A more compact representation, but it requires large contiguous block allocations
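A rough C++ rendering of these structures; the field choices and sizes are illustrative, not any real filesystem's on-disk format.

```cpp
#include <cstdint>
#include <ctime>
#include <vector>

// An extent records a contiguous run of blocks instead of listing each block.
struct Extent {
    uint64_t start_block;
    uint32_t block_count;
};

// An inode ("index node") holds a file's metadata plus where its data lives.
struct Inode {
    uint32_t mode;               // permissions and file type
    uint32_t link_count;         // how many directory entries point here
    uint64_t size_bytes;
    time_t   ctime, mtime;       // change and modification times
    std::vector<Extent> extents; // where on disk the data is stored
};

// A directory entry maps a name to an inode number.
struct DirEntry {
    char     name[256];
    uint64_t inode_number;
};
```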
Filesystem "VFS" ops
VFS ("virtual filesystem"): a common abstraction layer inside kernels for building filesystems; the interface is common across FS implementations
Think of this as an abstract data type for filesystems: it has both syntax (function names, return values, etc.) and semantics ("don't block on this call", etc.)
One key thing to note: the VFS itself may do some caching and other management; in particular, it often maintains an inode cache
FUSE
The lab will use FUSE
FUSE is a way to implement filesystems in user space (as normal programs), but have them available through the kernel, like normal files
It has a kinda VFS-like interface (a toy example appears after the metadata ops slide below)
[Figure from FUSE documentation]
Directory operations
readdir(path) -- return directory entries for each file in the directory
mkdir(path) -- create a new directory
rmdir(path) -- remove the named directory
File operations
mknod(path, mode, dev) -- create a new "node" (generic: a file is one type of node; a device node is another)
unlink(path) -- remove a link to an inode, decrementing the inode's reference count; many filesystems permit "hard links": multiple directory entries pointing to the same file
rename(path, newpath)
open -- open a file, returning a file handle; read, write
truncate -- cut the file off at a particular length
flush -- close one handle to an open file
release -- completely close a file handle
Metadata ops
getattr(path) -- return a metadata struct
chmod / chown -- change permissions & ownership
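To show how the directory, file, and metadata operations above plug into FUSE, here is a toy read-only filesystem modeled on libfuse's classic "hello" example. It assumes the FUSE 2.x API (FUSE_USE_VERSION 26; FUSE 3 changes some signatures) and is not the lab's actual yfs_client code.

```cpp
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <sys/stat.h>
#include <cerrno>
#include <cstring>

static const char *kPath = "/hello";
static const char *kData = "hello from a toy FUSE filesystem\n";

// Metadata op: fill in a struct stat for "/" or "/hello".
static int fs_getattr(const char *path, struct stat *st) {
    std::memset(st, 0, sizeof(*st));
    if (std::strcmp(path, "/") == 0) { st->st_mode = S_IFDIR | 0755; st->st_nlink = 2; return 0; }
    if (std::strcmp(path, kPath) == 0) {
        st->st_mode = S_IFREG | 0444; st->st_nlink = 1; st->st_size = std::strlen(kData); return 0;
    }
    return -ENOENT;
}

// Directory op: list the root directory's entries.
static int fs_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                      off_t, struct fuse_file_info *) {
    if (std::strcmp(path, "/") != 0) return -ENOENT;
    filler(buf, ".", NULL, 0);
    filler(buf, "..", NULL, 0);
    filler(buf, kPath + 1, NULL, 0);   // entry name without the leading '/'
    return 0;
}

// File ops: open and read the single file.
static int fs_open(const char *path, struct fuse_file_info *) {
    return std::strcmp(path, kPath) == 0 ? 0 : -ENOENT;
}

static int fs_read(const char *path, char *buf, size_t size, off_t off,
                   struct fuse_file_info *) {
    if (std::strcmp(path, kPath) != 0) return -ENOENT;
    size_t len = std::strlen(kData);
    if ((size_t)off >= len) return 0;
    if (off + size > len) size = len - off;
    std::memcpy(buf, kData + off, size);
    return (int)size;
}

int main(int argc, char *argv[]) {
    struct fuse_operations ops = {};   // zero-init, then fill in what we implement
    ops.getattr = fs_getattr;
    ops.readdir = fs_readdir;
    ops.open    = fs_open;
    ops.read    = fs_read;
    return fuse_main(argc, argv, &ops, NULL);
}
```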
Back to goals of DFS
Users should have the same view of the system and be able to share files
Last time: a central fileserver handled all filesystem operations -- consistency was easy, but overhead was high and scalability poor
Moved to NFS and then AFS: added more and more caching at the client, which added cache consistency problems
Solved using timeouts or callbacks to expire cached contents
Scaling beyond...
What happens if you want to build AFS for all of KAIST?
More disks than one machine can handle; more users than one machine can handle
Simplest idea: partition users onto different servers
How do we handle a move across servers?
How do we divide the users? Statically?
What about load balancing for operations & for space? What if some files become drastically more popular?
"Cluster" filesystems
The lab is inspired by Frangipani, a scalable distributed filesystem
Think back to our list of things that filesystems have to do:
Concurrency management
Space allocation and data storage
Directory management and naming
Frangipani design
Layers, top to bottom: program → Frangipani file server → distributed lock service + Petal distributed virtual disk → physical disks
Petal aggregates many disks (across many machines) into one big "virtual disk"
A simplifying abstraction for both design & implementation
Exports extents: provides allocation, deallocation, etc.
Internally: maps (virtual disk, offset) to (server, physical disk, offset)
Frangipani stores all data (inodes, directories, data) in Petal; uses the lock server for consistency (e.g., when creating a file)
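As a thought experiment, a tiny sketch of the (virtual disk, offset) → (server, physical disk, offset) translation the slide describes. The extent size, key encoding, and class name are assumptions for illustration, not Petal's real design.

```cpp
#include <cstdint>
#include <map>

struct VirtualAddr  { uint32_t vdisk;  uint64_t offset; };
struct PhysicalAddr { uint32_t server; uint32_t pdisk; uint64_t offset; };

// Illustrative address-translation table: one entry per fixed-size extent of
// the virtual disk. In real Petal this map is replicated and kept consistent
// across servers, which is what makes "one big virtual disk" work.
class PetalMap {
public:
    // Record where one extent of the virtual disk lives (p is the extent base).
    void map_extent(VirtualAddr v, PhysicalAddr p) {
        table_[key(v)] = p;
    }

    // Translate a virtual address to a physical location.
    bool lookup(VirtualAddr v, PhysicalAddr &out) const {
        auto it = table_.find(key(v));
        if (it == table_.end()) return false;
        out = it->second;
        out.offset += v.offset % kExtentBytes;   // offset within the extent
        return true;
    }

private:
    static constexpr uint64_t kExtentBytes = 64 * 1024;
    static uint64_t key(VirtualAddr v) {
        return (uint64_t(v.vdisk) << 48) | (v.offset / kExtentBytes);
    }
    std::map<uint64_t, PhysicalAddr> table_;
};
```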
Consequential design
Compare with NFS/AFS
In NFS/AFS, clients just relay all FS calls to the server; there is one central server
Here, clients run enough code to know which server to direct things to; they are active participants in the filesystem
Programming Assignment: YFS
Yet-another File System :)
A simpler version of what we just talked about: only one extent server (you don't have to implement Petal) and a single lock server
Each server is written in C++
yfs_client interfaces with the OS through FUSE
The following labs will build YFS incrementally, starting with the lock server and building up to supporting file & directory ops distributed around the network
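For a feel of what the lock-server part involves, here is a rough in-process sketch of acquire/release semantics. The class and method names are illustrative; the real assignment wraps something like this behind the handout's RPC interface, so follow that specification, not this sketch.

```cpp
#include <condition_variable>
#include <map>
#include <mutex>

// Illustrative lock server: clients acquire and release locks named by an id.
class LockServer {
public:
    // Block until the named lock is free, then hand it to `client_id`.
    void acquire(unsigned long long lock_id, int client_id) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return owner_.count(lock_id) == 0; });
        owner_[lock_id] = client_id;
    }

    // Release a lock the client holds and wake up any waiters.
    void release(unsigned long long lock_id, int client_id) {
        std::lock_guard<std::mutex> lk(m_);
        auto it = owner_.find(lock_id);
        if (it != owner_.end() && it->second == client_id) {
            owner_.erase(it);
            cv_.notify_all();
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::map<unsigned long long, int> owner_;  // lock id -> holding client
};
```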
Warning
This lab is difficult. It assumes a bit more C++
Please please please get started early; ask the course staff for help
It will not destroy you; it will make you stronger. But it may well take a lot of work and be pretty intensive