Slide 1: Distributed File Systems
Dr. Kalpakis, CMSC 621, Advanced Operating Systems, Fall 2003
URL: http://www.csee.umbc.edu/~kalpakis/Courses/621
Slide 2: DFS
- A distributed file system (DFS) is a module that implements a common file system shared by all nodes in a distributed system.
- A DFS should offer network transparency and high availability.
- Key DFS services:
  - file server (stores files and serves read/write requests)
  - name server (maps names to stored objects)
  - cache manager (caches files at clients or servers)
Slide 3: DFS Mechanisms
- Mounting
- Caching
- Hints
- Bulk data transfers
Slide 4: DFS Mechanisms
Mounting (see the sketch below)
- A name space is a collection of names of stored objects, which may or may not share a common name resolution mechanism.
- Mounting binds a name space to a name (the mount point) in another name space.
- Mount tables maintain the map of mount points to stored objects; they can be kept at clients or at servers.
Caching
- Amortizes the access cost of remote or disk data over many references.
- Can be done at clients and/or servers, with main memory or disk caches.
- Helps reduce delays (disk or network) in accessing stored objects.
- Helps reduce server loads and network traffic.
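A minimal sketch of client-side mount-table resolution, assuming a table mapping mount points to (server, remote root) pairs; the table contents and server names are hypothetical, not from any particular DFS:

```python
# Hypothetical client-side mount table: mount point -> (server, remote root).
MOUNT_TABLE = {
    "/home":     ("serverA", "/export/home"),
    "/projects": ("serverB", "/vol/projects"),
}

def resolve(path):
    """Map a local path to (server, server-local path) via the mount table.
    The longest matching mount point wins, so nested mounts resolve to the
    most specific server."""
    best = None
    for mount_point, (server, remote_root) in MOUNT_TABLE.items():
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, server, remote_root)
    if best is None:
        return ("localhost", path)        # not under any mount point
    mount_point, server, remote_root = best
    return (server, remote_root + path[len(mount_point):])

print(resolve("/home/alice/notes.txt"))   # ('serverA', '/export/home/alice/notes.txt')
```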
Slide 5: DFS Mechanisms
Hints
- Caching introduces the problem of cache consistency, and ensuring cache consistency is expensive.
- Cached information can instead be used as a hint (e.g., the mapping of a name to a stored object) that is validated on use and discarded when found to be wrong.
Bulk data transfers
- The overhead of executing network protocols is high, while network transit delays are small.
- Solution: amortize the protocol processing overhead and the disk seek times and latencies over many file blocks.
Slide 6: Name Resolution Issues
Naming schemes
- host:filename
  - simple and efficient
  - no location transparency
- mounting into a single global name space
  - uniqueness of names requires cooperating servers
- context-aware
  - partition the name space into contexts
  - name resolution is always performed with respect to a given context
Name servers
- a single name server, or
- different name servers for different parts of the name space
Slide 7: Caching Issues
Main memory caches
- faster access
- diskless clients can also use caching
- a single design serves both client and server caches
- compete with the virtual memory manager for physical memory
- cannot completely cache large stored objects, and block-level caching is complex to implement
- cannot be used by portable clients
Disk caches
- remove some of the drawbacks of main memory caches
Slide 8: Caching Issues
Writing policy (see the sketch below)
- write-through
  - every client write request is performed at the server immediately
- delayed writing
  - client writes are reflected to the stored objects at the server after some delay
  - many writes are absorbed in the cache; writes to short-lived objects are never done at the server
  - 20-30% of new data are deleted within 30 secs
  - lost data on a client crash is an issue
- delayed writing until file close
  - most files are open only for a short time
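A toy sketch contrasting write-through with delayed writing until close; the server interface is a hypothetical stand-in, not any particular DFS API:

```python
class ServerStub:
    """Stand-in for the file server (hypothetical interface)."""
    def write(self, block, data):
        print(f"server write: block {block}")

class WriteThroughCache:
    """Write-through: every write is performed at the server immediately."""
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, block, data):
        self.cache[block] = data
        self.server.write(block, data)    # one synchronous round trip per write

class WriteOnCloseCache:
    """Delayed writing until close: writes are absorbed locally, and blocks
    deleted before close (short-lived data) never reach the server."""
    def __init__(self, server):
        self.server, self.dirty = server, {}

    def write(self, block, data):
        self.dirty[block] = data          # no network traffic yet

    def delete(self, block):
        self.dirty.pop(block, None)       # short-lived data: zero server writes

    def close(self):
        for block, data in self.dirty.items():
            self.server.write(block, data)
        self.dirty.clear()

c = WriteOnCloseCache(ServerStub())
c.write(0, b"tmp"); c.delete(0)           # deleted before close: no server I/O
c.write(1, b"keep"); c.close()            # server write: block 1
```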
Slide 9: Caching Issues
Approaches to the cache consistency problem
- server-initiated
  - servers inform client cache managers whenever their cached data become stale
  - servers need to keep track of who cached which file blocks
- client-initiated
  - clients validate data with servers before using them
  - partially negates the benefits of caching
- disable caching when concurrent-write sharing is detected
  - concurrent-write sharing: multiple clients have a file open, with at least one of them having opened it for writing
- avoid concurrent-write sharing by using locking
Slide 10: More Caching Consistency Issues
The sequential-write sharing problem
- occurs when a client opens a (previously opened) file that has recently been modified and closed by another client
- causes problems:
  - a client may still have (outdated) file blocks in its cache
  - the other client may not yet have written its modified cached file blocks to the file server
Solutions (see the sketch below)
- associate file timestamps with all cached file blocks; at file open, request the current file timestamp from the file server
- have the file server ask the client holding the modified cached blocks to flush its data to the server when another client opens the file for writing
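A minimal sketch of the first solution, validating cached blocks against the file timestamp at open time; the Server class here is a toy stand-in:

```python
class Server:
    """Toy server: a timestamp and the blocks for each file."""
    def __init__(self):
        self.files = {"f": [1, ["blk0", "blk1"]]}   # name -> [timestamp, blocks]

    def get_timestamp(self, name):
        return self.files[name][0]

    def read_blocks(self, name):
        return list(self.files[name][1])

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}                   # name -> (file timestamp, blocks)

    def open(self, name):
        current_ts = self.server.get_timestamp(name)    # ask at every open
        cached = self.cache.get(name)
        if cached is not None and cached[0] != current_ts:
            del self.cache[name]          # file changed since we cached it:
            cached = None                 # discard the outdated blocks
        if cached is None:
            cached = (current_ts, self.server.read_blocks(name))
            self.cache[name] = cached
        return cached[1]

print(Client(Server()).open("f"))         # ['blk0', 'blk1']
```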
Slide 11: Availability Issues
Replication
- can help increase data availability
- is expensive, due to the extra storage for replicas and the overhead of keeping the replicas consistent
Main problems
- maintaining replica consistency
- detecting replica inconsistencies and recovering from them
- handling network partitions
- placing replicas where needed
- keeping the rate of deadlocks small and availability high
Slide 12: Availability Issues
Unit of replication
- complete file or file block
  - allows replication of only the data that are needed
  - replica management is harder (locating replicas, ensuring file protection, etc.)
- volume (group of files)
  - wasteful if many of the files are not needed
  - replica management is simpler
- pack: a subset of the files in a user's primary pack
Mutual consistency among replicas (see the sketch below)
- let the most current replica be the replica with the highest timestamp in a quorum
- use voting to read/write replicas, keeping at least one replica current
- only votes from most current replicas are valid
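A sketch of timestamp-based quorum voting, assuming read and write quorum sizes r and w chosen with r + w > n so that any two quorums intersect in a current replica; the replica representation is illustrative:

```python
# Each replica is a (timestamp, value) pair; within any quorum, the replica
# with the highest timestamp is the most current one, and only it votes.

def quorum_read(replicas, r):
    quorum = replicas[:r]                       # any r reachable replicas
    return max(quorum, key=lambda rep: rep[0])  # most current replica wins

def quorum_write(replicas, w, new_value):
    ts, _ = quorum_read(replicas, w)            # learn the highest timestamp
    for i in range(w):                          # install at w replicas, keeping
        replicas[i] = (ts + 1, new_value)       # at least one replica current
    return ts + 1

replicas = [(3, "old"), (3, "old"), (2, "older")]   # n = 3, so pick r = w = 2
quorum_write(replicas, w=2, new_value="new")
print(quorum_read(replicas, r=2))                   # (4, 'new')
```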
Slide 13: Scalability & Semantic Issues
Caching & cache consistency
- take advantage of file usage patterns: many widely used and shared files are accessed in read-only mode
- the data a client needs are often found in another client's cache
- organize the client caches and file servers for each file in a hierarchy
- implement file servers, name servers, and cache managers as multithreaded processes
Semantics
- common FS semantics: each read operation returns the data due to the most recent write operation
- providing these semantics in a DFS is difficult and expensive
Slide 14: NFS
Slide 15: NFS Interfaces
- file system
- virtual file system (VFS)
  - vnodes uniquely identify objects in the FS
  - vnodes contain mount table info (pointers to the parent FS and to the mounted FS)
- RPC and XDR (external data representation)
Slide 16: NFS Naming and Location
- Filenames are mapped to the objects they represent at first use.
- The mapping is done at the servers by sequentially resolving each element of a pathname, using the vnode information, until a file handle is obtained (see the sketch below).
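A sketch of component-at-a-time resolution, with a hypothetical lookup(dir_handle, name) call standing in for the per-element resolution performed at the server; the namespace contents are illustrative:

```python
# Toy namespace: directory handle -> {entry name: child handle}.
NAMESPACE = {
    "root":  {"usr": "h_usr"},
    "h_usr": {"lib": "h_lib"},
    "h_lib": {"libc.so": "h_libc"},
}

def lookup(dir_handle, name):
    """Stand-in for the per-component lookup at the server."""
    return NAMESPACE[dir_handle][name]

def resolve(path, root_handle="root"):
    """Resolve a pathname to a file handle, one element at a time."""
    handle = root_handle
    for component in path.strip("/").split("/"):
        handle = lookup(handle, component)   # one lookup per pathname element
    return handle

print(resolve("/usr/lib/libc.so"))           # h_libc
```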
Slide 17: NFS Caching
File caching (see the sketch below)
- read-ahead and 8KB file blocks are used
- files or file blocks are cached together with the timestamp of their last update
- cached blocks are assumed valid for a preset time period
- block validation is performed at file open and, after a timeout, at the server
- upon detecting an invalid block, all blocks of the file are discarded
- delayed writing policy, with modified blocks flushed to the server upon file close
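A sketch of the freshness-window idea: cached blocks are trusted for a preset period, revalidated against the server's last-update timestamp afterwards, and all of a file's blocks are dropped when found stale. The window length and server interface are illustrative assumptions, not NFS's actual API:

```python
import time

VALIDITY_WINDOW = 3.0          # assumed-valid period, in seconds (illustrative)

class ServerStub:
    """Toy server exposing a last-update timestamp and the file's blocks."""
    def __init__(self):
        self.files = {"f": (100, ["blk0", "blk1"])}
    def get_mtime(self, name):
        return self.files[name][0]
    def read_blocks(self, name):
        return list(self.files[name][1])

class BlockCache:
    def __init__(self, server):
        self.server = server
        self.files = {}            # name -> (mtime, last-validated, blocks)

    def read(self, name):
        entry = self.files.get(name)
        if entry is not None:
            mtime, checked, blocks = entry
            if time.time() - checked < VALIDITY_WINDOW:
                return blocks                       # assumed valid: no RPC
            if self.server.get_mtime(name) == mtime:
                self.files[name] = (mtime, time.time(), blocks)
                return blocks                       # revalidated
            del self.files[name]                    # stale: drop ALL blocks
        mtime = self.server.get_mtime(name)
        blocks = self.server.read_blocks(name)
        self.files[name] = (mtime, time.time(), blocks)
        return blocks

cache = BlockCache(ServerStub())
print(cache.read("f"))             # first read fetches from the server
```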
Slide 18: NFS Caching
Directory name lookup caching
- maps directory names to vnodes
- cached entries are updated upon lookup failure or when new information is received
File/directory attribute cache
- access to file/directory attributes accounts for 90% of file requests
- file attributes are discarded after 3 secs, directory attributes after 30 secs
- directory changes are performed at the server
NFS servers are stateless.
Slide 19: Sprite File System
- The name space is a single hierarchy of domains.
- Each server stores one or more domains.
- Domains have unique prefixes.
- Mount points link the domains into a single hierarchy.
- Clients maintain a prefix table.
Slide 20: Sprite FS - Prefix Tables
Locating files in Sprite (see the sketch below)
- each client finds the longest prefix match in its prefix table, then sends the remainder of the pathname to the matching server, together with the domain token from its prefix table
- the server replies with a file token, or with a new pathname if the "file" is a remote link
- each client request contains the filename and the domain token
- when a client fails to find a matching prefix, or fails during a file open, it broadcasts the pathname, and the server with the matching domain replies with the domain/file token
- entries in the prefix table are hints
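A sketch of the longest-prefix-match lookup; the table contents, server names, and tokens are hypothetical, and the broadcast fallback is only indicated in a comment:

```python
# Hypothetical prefix table: domain prefix -> (server, domain token).
PREFIX_TABLE = {
    "/":          ("srvA", "tokA"),
    "/users":     ("srvB", "tokB"),
    "/users/pcs": ("srvC", "tokC"),
}

def locate(path):
    """Find the server for `path` by longest prefix match. Entries are hints:
    if the chosen server rejects the request, the client would broadcast the
    pathname and rebuild the entry from the reply (not shown here)."""
    best = ""
    for prefix in PREFIX_TABLE:
        if (path == prefix or path.startswith(prefix.rstrip("/") + "/")) \
                and len(prefix) > len(best):
            best = prefix
    server, token = PREFIX_TABLE[best]
    remainder = path[len(best):] or "/"
    return server, token, remainder       # send (remainder, token) to server

print(locate("/users/pcs/paper.tex"))     # ('srvC', 'tokC', '/paper.tex')
```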
Slide 21: Sprite FS - Caching
- The client cache is in main memory; the file block size is 4KB.
- Cache entries are addressed by (file token, block#), which allows blocks to be added without contacting the server (see the sketch below).
- Blocks can be accessed without consulting the file's disk map to obtain the block's disk address.
- Clients do not cache directories, to avoid inconsistencies.
- Servers have main memory caches as well.
- A delayed writing policy is used.
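A minimal sketch of a block cache keyed by (file token, block number) rather than by disk address; the token values are illustrative:

```python
BLOCK_SIZE = 4096                            # Sprite uses 4KB file blocks

class ClientCache:
    """Addressing by (file token, block#) means a client can insert or find
    blocks without asking the server for the block's disk address."""
    def __init__(self):
        self.blocks = {}                     # (token, block#) -> data

    def read(self, token, offset):
        return self.blocks.get((token, offset // BLOCK_SIZE))  # None = miss

    def write(self, token, offset, data):
        assert len(data) <= BLOCK_SIZE
        self.blocks[(token, offset // BLOCK_SIZE)] = data      # no disk map needed

cache = ClientCache()
cache.write("tok42", 2 * BLOCK_SIZE, b"hello")
print(cache.read("tok42", 2 * BLOCK_SIZE))   # b'hello'
```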
Slide 22: Sprite FS - Cache Writing Policy
Observations (BSD)
- 20-30% of new data live less than 30 secs
- 75% of files are open for less than 0.5 secs
- 90% of files are open for less than 10 secs
A more recent study
- 65-80% of files are open for less than 30 secs
- 4-27% of new data are deleted within 30 secs
One can reduce traffic by
- not updating the servers immediately at file close
- not updating the servers when caches are updated
Slide 23: Sprite Cache Writing Policy
Delayed writing policy (see the sketch below)
- every 5 secs, flush a client's cached (modified) blocks to the server if they haven't been modified within the last 30 secs
- flush blocks from the server's cache to disk within 30-60 secs afterwards
Replacement policy: LRU
- 80% of the time, blocks are ejected to make room for other blocks; 20% of the time, to return memory to the VM system
- cache blocks go unreferenced for about 1 hour before being ejected
- cache misses: 40% on reads and 1% on writes
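A sketch of the client-side sweep: a periodic pass (the 5-second timer loop itself is omitted) that flushes dirty blocks whose last modification is at least 30 seconds old. The server stub is a hypothetical stand-in:

```python
import time

SWEEP_INTERVAL = 5.0     # how often the sweep runs, in seconds
AGE_THRESHOLD = 30.0     # flush blocks not modified within the last 30 secs

class ServerStub:
    def write(self, key, data):
        print("flushed to server:", key)     # the server then writes to disk
                                             # within another 30-60 secs
class DelayedWriter:
    def __init__(self, server):
        self.server = server
        self.dirty = {}                      # block key -> (data, last modified)

    def write(self, key, data):
        self.dirty[key] = (data, time.time())

    def sweep(self):
        """Run every SWEEP_INTERVAL secs by a timer (omitted here)."""
        now = time.time()
        for key, (data, mtime) in list(self.dirty.items()):
            if now - mtime >= AGE_THRESHOLD:     # the block has "settled"
                self.server.write(key, data)
                del self.dirty[key]

w = DelayedWriter(ServerStub())
w.write(("tok", 0), b"data")
w.sweep()                # nothing flushed yet: the block is younger than 30 secs
```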
Slide 24: Sprite Cache Consistency
Server-initiated
- avoid concurrent-write sharing by disabling caching for files open concurrently for reading and writing
  - ask the client writing the file to flush its blocks
  - inform all other clients that the file is not cacheable
  - the file becomes cacheable again when all clients have closed it
- solve sequential-write sharing using version numbers (see the sketch below)
  - each client keeps the version# of each file whose blocks it caches
  - the server increments the version# each time the file is opened for writing
  - the client is informed of the file's version# at file open
  - the server keeps track of the last writer, and asks the last writer to flush its cached blocks if the file is opened by another client
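A sketch of the version-number scheme; the consistency state lives at the server, and the fetch of fresh blocks after a discard is omitted. All names are illustrative:

```python
class Server:
    """Keeps a version# and the last writer for each file."""
    def __init__(self):
        self.version = {}                    # name -> version#
        self.last_writer = {}                # name -> client

    def open(self, name, client, for_write):
        if for_write:
            self.version[name] = self.version.get(name, 0) + 1
        last = self.last_writer.get(name)
        if last is not None and last is not client:
            last.flush(name)                 # make the last writer's data visible
        if for_write:
            self.last_writer[name] = client
        return self.version.get(name, 0)

class Client:
    def __init__(self, name):
        self.name = name
        self.cached = {}                     # file -> version# of cached blocks

    def open(self, server, fname, for_write=False):
        v = server.open(fname, self, for_write)
        if self.cached.get(fname) not in (None, v):
            print(self.name, "discards stale blocks of", fname)
        self.cached[fname] = v               # (re)fetch of blocks omitted

    def flush(self, fname):
        print(self.name, "flushes dirty blocks of", fname)

srv, a, b = Server(), Client("A"), Client("B")
a.open(srv, "f", for_write=True)   # A writes f: version 1
b.open(srv, "f", for_write=True)   # version 2; server asks A to flush
a.open(srv, "f")                   # A sees version 2 and discards its stale blocks
```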
Slide 25: Sprite VM and FS Cache Contention
- VM and FS compete for physical memory, and negotiate its use.
- Separate pools of blocks; the time of last access determines the winner, with VM given a slight preference (VM loses only if its block hasn't been referenced for 20 mins).
- Double caching is a problem:
  - FS marks blocks of newly compiled code with an infinite time of last reference
  - backing files = swapped-out pages (including process state and data segments)
  - clients bypass the FS cache when reading/writing backing files
Slide 26: CODA
Goals
- scalability
- availability
- disconnected operation
Volume = a collection of files and directories on a single server; the unit of replication.
FS objects have a unique FID (see the sketch below), which consists of
- a 32-bit volume number
- a 32-bit vnode number
- a 32-bit uniquifier
Replicas of an FS object have the same FID.
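A small sketch of packing the three 32-bit FID fields into a single integer; the packed layout is illustrative, not Coda's actual representation:

```python
MASK32 = 2**32 - 1

def make_fid(volume, vnode, uniquifier):
    """Pack the three 32-bit FID fields into one 96-bit integer."""
    assert all(0 <= x <= MASK32 for x in (volume, vnode, uniquifier))
    return (volume << 64) | (vnode << 32) | uniquifier

def split_fid(fid):
    return (fid >> 64) & MASK32, (fid >> 32) & MASK32, fid & MASK32

fid = make_fid(volume=7, vnode=42, uniquifier=1)
print(split_fid(fid))    # (7, 42, 1); every replica of the object shares this FID
```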
Slide 27: CODA Location
- Volume Location database, replicated at each server
- Volume Replication database, replicated at each server
- Volume Storage Group (VSG)
- Venus: the client cache manager, caching on local disk
- AVSG = the nodes of the VSG currently accessible to the client
- a preferred server is chosen within the AVSG
Slide 28: CODA Caching & Replication
- On file access, Venus caches files/dirs on demand from the server in the AVSG with the most up-to-date data.
- Users can indicate caching priorities for files/dirs.
- Users can bracket action sequences.
- Venus establishes callbacks at the preferred server for each FS object.
- Server callbacks: the server tells the client that a cached object is invalid; lost callbacks can happen.
Slide 29: CODA AVSG Maintenance
Venus tracks changes in the AVSG
- detects new nodes in the VSG that should, or should no longer, be in its AVSG by periodically probing every node in the VSG
- removes a node from the AVSG if an operation on it fails
- chooses a new preferred server if needed
Coda Version Vector (CVV)
- maintained both for volumes and for files/dirs
- a vector with one entry for each node in the VSG, indicating the number of updates to the volume or FS object
Slide 30: Coda Replica Management
State of an object or replica
- each modification is tagged with a storeid
- update history = the sequence of storeids
- the state is a truncated update history:
  - the latest storeid (LSID)
  - the CVV
Slide 31: Coda Replica Management
Comparing replicas A and B leads to one of four cases (see the sketch below):
- LSID-A = LSID-B and CVV-A = CVV-B => strong equality
- LSID-A = LSID-B and CVV-A != CVV-B => weak equality
- LSID-A != LSID-B and CVV-A >= CVV-B => A dominates B
- otherwise => inconsistency
When server S receives an update for a replica from client C, it checks the states at S and C; the check is successful if
- for files, it leads to strong equality or dominance
- for dirs, it leads to strong equality
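A sketch of the comparison, with a CVV modeled as a dict from VSG node to update count; note that the symmetric case (B dominates A) is spelled out here, whereas the slide folds it into "otherwise":

```python
def cvv_geq(a, b):
    """CVV-A >= CVV-B: A's count for every node is at least B's."""
    nodes = set(a) | set(b)
    return all(a.get(n, 0) >= b.get(n, 0) for n in nodes)

def compare(lsid_a, cvv_a, lsid_b, cvv_b):
    if lsid_a == lsid_b:
        return "strong equality" if cvv_a == cvv_b else "weak equality"
    if cvv_geq(cvv_a, cvv_b):
        return "A dominates B"
    if cvv_geq(cvv_b, cvv_a):
        return "B dominates A"
    return "inconsistent"

print(compare("s1", {"n1": 2, "n2": 1}, "s1", {"n1": 2, "n2": 1}))  # strong equality
print(compare("s2", {"n1": 3, "n2": 1}, "s1", {"n1": 2, "n2": 1}))  # A dominates B
print(compare("s2", {"n1": 3, "n2": 0}, "s3", {"n1": 2, "n2": 1}))  # inconsistent
```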
Slide 32: Coda Replica Management
When client C wants to update a replicated object:
Phase I
- C sends the update to every node in its AVSG
- each node checks the replica states (cached object vs. replicated object), informs the client of the result, and performs the update if the check is successful
- if unsuccessful, the client pauses and the server tries to resolve the problem automatically; if it cannot, the client aborts, else the client resumes
Phase II
- the client sends the updated object state to every site in the AVSG
Slide 33: Coda Replica Management
Force operation (between servers)
- happens when Venus informs the AVSG of weak equality within the AVSG
- the server with the dominant replica overwrites the data and state of the dominated server
- for directories, this is done with the help of locking, one directory at a time
Repair operation
- automatic; proceeds in two phases, as in an update
Migrate operation
- moves inconsistent data to a covolume for manual repair
Slide 34: Conflict Resolution
- File conflicts are resolved by the user using the repair tool, which bypasses Coda's update rules; inconsistent files are inaccessible to CODA.
- Directory conflict resolution uses the fact that a directory is a list of files.
- Conflicts that cannot be resolved automatically:
  - update/update (on attributes)
  - remove/update
  - create/create (adding identical files)
- All other conflicts can be resolved easily.
- Inconsistent objects, and objects without automatic conflict resolution, are placed in covolumes.