Introduction Details on actual remote file systems CIFS NFS AFS.


1 Examples of Remote File Systems CS 188 Distributed Systems January 29, 2015

2 Introduction Details on actual remote file systems CIFS NFS AFS

3 Common Internet File System
Originally a proprietary Microsoft Protocol For use in Windows environments Now a standard usable on most platforms Designed to enable “work group” computing Group of PCs sharing same data, printers Any PC can export its resources to the group They chose a peer solution Though they treat it as client/server Any machine can act as client or server Work group is the union of those resources

4 CIFS Architecture Standard remote file access architecture
Based on SMB protocol Stateful per-user client/server sessions Password or challenge/response authentication Server tracks open files, offsets, updates Makes server fail-over much more difficult Opportunistic locking Client can cache file if nobody else using/writing it Otherwise all reads/writes must be synchronous Servers regularly advertise what they export Enabling clients to “browse” the workgroup

5 Benefits of Opportunistic Locking
A big performance win Getting permission from server before each write is a huge expense In both time and server loading If no conflicting file use 99.99% of the time, opportunistic locks greatly reduce overhead When they can’t be used, CIFS does provide correct centralized serialization
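The grant-and-revoke idea behind opportunistic locks can be sketched in a few lines. This is a toy model, not the SMB wire protocol: the class `OplockServer`, its fields, and the rule "grant only to a sole opener" are assumptions made for illustration.

```python
class OplockServer:
    """Toy server that grants an exclusive oplock to a file's only opener."""

    def __init__(self):
        self.openers = {}   # path -> set of client ids with the file open
        self.oplock = {}    # path -> client id holding the oplock, if any
        self.breaks = []    # (client, path) break notices sent so far

    def open(self, client, path):
        users = self.openers.setdefault(path, set())
        holder = self.oplock.get(path)
        if holder is not None and holder != client:
            # A second client arrived: revoke the lock so the holder
            # flushes its cache and falls back to synchronous I/O.
            self.breaks.append((holder, path))
            del self.oplock[path]
        users.add(client)
        if len(users) == 1:
            self.oplock[path] = client
            return True     # caller may cache reads and writes locally
        return False        # caller must send every read/write to the server
```

In the common case of a single opener, the client never has to ask permission per write; the server only gets involved when a second opener shows up.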

6 CIFS Pros and Cons
Performance/Scalability: Opportunistic locks enable good performance when shared access is rare; otherwise, forced synchronous I/O is slow. Transparency: Very good, especially the global name space. Conflict Prevention: File/record locking and synchronous writes work well. Robustness: Stateful servers make seamless fail-over difficult.

7 The Network File System (NFS)
Transparent, heterogeneous file system sharing Local and remote files are indistinguishable Peer-to-peer and client-server sharing Disk-full clients can export file systems to others Able to support diskless (or dataless) clients Minimal client-side administration High efficiency and high availability Read performance competitive with local disks Scalable to huge numbers of clients Seamless fail-over for all readers and some writers

8 NFS Example
[Figure: directory trees of Node 1 (NFS client) and Node 2 (NFS server). After mount(bar,node2,A), open(/A/two) on the client corresponds to open(/bar/two) on the server.]

9 NFS Implementation Code at both client and server
Client code implements a virtual file system Translates opens, reads, etc. into RPC operations Server converts incoming RPC requests to operations on local files

10 NFS Handles When a file is opened at the client, the NFS server creates a file handle Opaque to that client Meaningful to the server Client names file by providing handle to server File handles can become stale Typically when file they point to disappears/changes inode numbers
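One way a handle can be opaque to the client yet let the server detect staleness is sketched below. The handle layout (file system id, inode number, generation counter) and every name here are assumptions for illustration; real servers choose their own encoding.

```python
from collections import namedtuple

# Hypothetical handle layout; the client treats it as an opaque token.
FileHandle = namedtuple("FileHandle", "fsid inode generation")

class StaleHandle(Exception):
    """The handle no longer refers to the file it was issued for."""

class HandleServer:
    def __init__(self, fsid=1):
        self.fsid = fsid
        self.inodes = {}        # inode number -> (generation, data)
        self.names = {}         # file name -> inode number
        self.next_gen = 1       # bumped whenever an inode number is (re)used

    def create(self, name, inode, data=b""):
        self.inodes[inode] = (self.next_gen, data)
        self.names[name] = inode
        self.next_gen += 1

    def remove(self, name):
        del self.inodes[self.names.pop(name)]

    def lookup(self, name):
        inode = self.names[name]
        gen, _ = self.inodes[inode]
        return FileHandle(self.fsid, inode, gen)

    def read(self, handle):
        entry = self.inodes.get(handle.inode)
        if entry is None or entry[0] != handle.generation:
            raise StaleHandle(handle)
        return entry[1]
```

If the file is removed and its inode number is reused for a different file, the generation no longer matches and the old handle is rejected rather than silently reading the wrong data.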

11 NFS Processes In addition to virtual file system/RPC code, NFS uses long-running processes At the application level, but usually only stubs Which call special NFS kernel code nfsd daemons - server daemons that accept RPC calls for NFS rpc.mountd daemons - server daemons that handle mount requests biod daemons - optional client daemons that can improve performance

12 The NFS Protocol Relies on idempotent operations and stateless server
Built on top of a remote procedure call protocol With eXternal Data Representation, server binding Versions of RPC over both TCP and UDP Optional encryption (may be provided at lower level) Scope – most normal file operations Lookup (open), read, write, read-directory, stat, etc. Some operations not quite the same as local Supports client or server-side authentication Supports client-side caching of file contents Locking and auto-mounting done with another protocol

13 NFS From the Client Side
User issues a normal file operation Like chmod() Passes through VFS to client-side NFS implementation Client-side implementation formats and sends RPC packet to server Actually, arranges that client process sends it, so client process blocks

14 NFS From the Server Side
Server side’s file system isn’t NFS EXT3, BTRFS, or some other local file system, typically working off disk This may be a very different file system than what’s on the client rpc.mountd and nfsd map incoming RPC calls into VFS calls on local file system Again, most of the code in the kernel Servers keep no state on previous operations So NFS server operations must be self-contained

15 Implications of Statelessness
RPC requests must completely describe operations NFS requests must be idempotent Stateless transport protocol (e.g., UDP) is OK, at least for small requests Servers need not worry about client crashes Server crashes won’t leave junk lying around
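Why idempotence matters can be seen by replaying a request, as an unreliable transport might. The sketch below is not NFS code; `write_at`, `append`, and `deliver` are hypothetical helpers that contrast a self-describing write (explicit offset) with an append whose result depends on server-side state.

```python
def write_at(buf, offset, data):
    """Idempotent: the request names the exact offset it writes to."""
    end = offset + len(data)
    if len(buf) < end:
        buf.extend(b"\0" * (end - len(buf)))
    buf[offset:end] = data

def append(buf, data):
    """Not idempotent: the result depends on how often it has run."""
    buf.extend(data)

def deliver(op, buf, *args, copies=2):
    """Simulate a retransmitted request arriving `copies` times."""
    for _ in range(copies):
        op(buf, *args)
    return bytes(buf)
```

Delivering `write_at` twice leaves the file exactly as one delivery would, so a client that times out can simply resend; delivering `append` twice duplicates the data.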

16 One Very Important Implication of NFS Statelessness
Servers don’t know what files clients think are open Unlike many other remote file systems Like CIFS Makes it harder to provide certain semantics to the remote users But easier for normal server operations And recovery from failures

17 Sleazy NFS Tricks NFS does lots of tricks to make it look like normal POSIX file semantics E.g., if client unlinks file he has open, send rename to server rather than remove Perform actual remove when file is closed Won’t work if file removed on server What happens if client crashes?
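The unlink-while-open trick can be sketched as follows. The server is modeled as a plain dictionary, and the class name and hidden-name scheme are assumptions for illustration only.

```python
class SillyRenameClient:
    """Client that turns unlink-of-an-open-file into rename, then remove on close."""

    def __init__(self, server_files):
        self.server = server_files   # dict standing in for the server: name -> data
        self.open_count = {}         # name -> number of local opens
        self.pending = {}            # original name -> hidden name awaiting removal
        self.counter = 0

    def open(self, name):
        self.open_count[name] = self.open_count.get(name, 0) + 1

    def unlink(self, name):
        if self.open_count.get(name, 0) > 0:
            self.counter += 1
            hidden = f".nfs{self.counter:04d}"              # hypothetical naming
            self.server[hidden] = self.server.pop(name)     # send rename
            self.pending[name] = hidden
        else:
            del self.server[name]                           # send remove

    def read(self, name):
        return self.server[self.pending.get(name, name)]

    def close(self, name):
        self.open_count[name] -= 1
        if self.open_count[name] == 0 and name in self.pending:
            del self.server[self.pending.pop(name)]         # deferred remove
```

The `pending` table lives only in client memory, so a client crash between `unlink` and `close` would leave the hidden file on the server, which is the question the slide raises.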

18 NFS Performance How does NFS avoid always going across the net?
Obviously, cache the data on the client Done through an internal buffer cache NFS knows what it has kept there Responds from cache, when it can Different caching strategies for data and metadata biod does read-ahead for sequential access Tending to pre-fill the cache

19 Why is Caching File Data Important?
Reads often done in small increments E.g., 128 bytes Each network round trip involves multiple RPC packets Which is expensive NFS client usually asks server for much more data Say, 8K bytes Which it stores internally If client wants the next 128 bytes, no need to go over the network
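The arithmetic is easy to check with a small simulation: 64 sequential reads of 128 bytes fit in one 8K block, so they cost one round trip instead of 64. The class below is a sketch under simplifying assumptions (the client knows the file length, blocks are never evicted), and all names are hypothetical.

```python
BLOCK = 8192   # bytes fetched per round trip (the slide's "8K")

class CachingReader:
    """Serves small reads from locally cached 8K blocks."""

    def __init__(self, remote_data):
        self.remote = remote_data    # bytes standing in for the server's file
        self.blocks = {}             # block number -> cached bytes
        self.round_trips = 0

    def _fetch(self, block_no):
        self.round_trips += 1        # one simulated network round trip
        start = block_no * BLOCK
        return self.remote[start:start + BLOCK]

    def read(self, offset, length):
        out = bytearray()
        end = min(offset + length, len(self.remote))
        while offset < end:
            block_no, within = divmod(offset, BLOCK)
            if block_no not in self.blocks:
                self.blocks[block_no] = self._fetch(block_no)
            chunk = self.blocks[block_no][within:within + (end - offset)]
            out += chunk
            offset += len(chunk)
        return bytes(out)
```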

20 NFS File Attribute Caching
Attribute caching very important for performance Many applications get and set file attributes frequently So they need to do it fast NFS internally caches attributes Changes to cached attributes not written back immediately Typically after seconds
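Delayed write-back of attributes might look like the sketch below. The three-second delay, the dictionary standing in for the server, and the names `AttrCache` and `flush_due` are assumptions; the clock is injectable so the behavior can be checked without sleeping.

```python
import time

class AttrCache:
    """Client-side attribute cache with delayed write-back."""

    def __init__(self, server_attrs, delay=3.0, clock=time.monotonic):
        self.server = server_attrs      # dict standing in for the server
        self.delay = delay              # seconds a change may stay local
        self.clock = clock
        self.cache = {}                 # path -> attribute dict
        self.dirty = {}                 # path -> time of first unflushed change

    def getattr(self, path):
        if path not in self.cache:
            self.cache[path] = dict(self.server[path])   # one round trip
        return self.cache[path]

    def setattr(self, path, **changes):
        self.getattr(path).update(changes)
        self.dirty.setdefault(path, self.clock())

    def flush_due(self):
        """Write back every change that has been pending for at least `delay`."""
        now = self.clock()
        for path, since in list(self.dirty.items()):
            if now - since >= self.delay:
                self.server[path] = dict(self.cache[path])
                del self.dirty[path]
```

Until the flush, other clients still see the old attributes on the server, which is the price paid for making get and set fast locally.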

21 NFS Authentication How can we trust NFS clients to authenticate themselves? NFS is not designed for direct use by user applications It permits one operating system instance to access files belonging to another OS instance If we trust the remote OS to see the files, might as well trust it to authenticate the user Obviously, don’t use NFS if you don’t trust the remote OS . . .

22 NFS and Updates An NFS server does not prevent conflicting updates
As with local file systems, this is application’s job Auxiliary server/protocol for file and record locking All leases are maintained on the lock server All lock/unlock operations handled by lock server Locking integrated into basic protocol in NFS version 4 Client/network failure handling Server can break locks if client dies or times out “Stale-handle” errors inform client of broken lock Client response to these errors is application specific Lock server failure handling is very complex What are the advantages of handling locking in a different protocol than file access?

23 NFS Pros and Cons Transparency/Heterogeneity
Local/remote transparency is excellent NFS works with all major OSes and FSes Performance Read performance may be better than local disk Write performance slower than local disk Robustness Transparent fail-over capability for readers Recoverable fail-over capability for writers

24 The Andrew File System AFS Developed at CMU
Designed originally to support student and faculty use Generally, large numbers of users of a single organization Uses a client/server model Makes use of whole-file caching

25 Basic AFS Approach Use dedicated file server machines to store files
Several, to share the load All files stored at servers permanently Except workstation config and temporary files Users’ personal files stored at servers Only make files available to client workstations on demand Assume reasonable level of reliability and connectivity

26 AFS Basics Designed for scalability, performance
Large numbers of clients (~5-10K) and few servers Needed performance of local file systems Very low per-client load imposed on servers No administration or back-up for client disks Master files reside on a file server Local file system is used as a local cache Local reads satisfied from cache when possible Files are only read from server if not in cache Simple synchronization of updates

27 AFS Architecture
[Figure: client and server protocol stacks. Labels: Andrew cache manager, Andrew Agent, Andrew Relay, socket I/O, UDP, TCP, IP, MAC driver, NIC driver, EXT3 FS, block I/O, disk driver. The client's local FS is cache only; the server holds the remote server file system.]

28 Server File Storage Each file is stored at one server
Files organized into hierarchical subtrees All servers maintain a map of which subtrees are at which servers Clients asking any server for a file can be directed to the right server
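Directing a client to the right server amounts to a longest-prefix match of the path against the subtree map. The sketch below assumes the map is a dictionary from subtree root to server name; the function name and the paths in the usage note are made up for illustration.

```python
def locate(subtree_map, path):
    """Return the server responsible for `path` (deepest matching subtree wins)."""
    best = None
    for prefix in subtree_map:
        # Match whole path components only, so "/a/b" does not claim "/a/bc".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(path)
    return subtree_map[best]
```

For example, with `{"/": "s0", "/afs/users": "s1", "/afs/users/alice": "s2"}`, a request for `/afs/users/alice/notes.txt` would be directed to `s2`, while `/afs/users/bob/x` goes to `s1`.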

29 Multiple File Copies in AFS
Server always keeps a copy of each file Multiple clients might also be caching a copy Clients check for local copies in cache at open time If no local copy exists, fetch it from server If local copy exists, see if it is still up-to-date Compare file size and modification time with server Optimizations reduce overhead of checking Subscribe/broadcast change notifications Time-to-live on cached file attributes and contents

30 AFS and Updates Updates made directly to the local cached copy (only)
Send updates to server when file is closed Wait for all changes to be completed File may be deleted before it is closed E.g., temporary files that servers need not know about When server receives update, uses callback mechanism to provide consistency

31 AFS Callbacks Servers keep track of who cached a file
If one cached copy is updated, cache invalidation messages sent to all others Clients receiving a callback message discard their cached copy If further activity on file, get a new copy from the server There could be problems . . .
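A minimal sketch of the callback bookkeeping follows: the server remembers who cached each file and invalidates the other copies when an update arrives at close. Class and method names are hypothetical, and the sketch deliberately ignores what happens to a client that has unsaved local changes when the invalidation arrives.

```python
class CallbackServer:
    def __init__(self):
        self.files = {}      # name -> contents
        self.cachers = {}    # name -> set of clients holding a cached copy

    def fetch(self, client, name):
        self.cachers.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, client, name, data):
        self.files[name] = data
        for other in self.cachers.get(name, set()) - {client}:
            other.invalidate(name)          # the callback
        self.cachers[name] = {client}

class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # name -> locally cached contents
        self.dirty = set()   # names modified since the last store

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data             # update the local copy only
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:
            self.dirty.discard(name)
            self.server.store(self, name, self.cache[name])

    def invalidate(self, name):
        self.cache.pop(name, None)          # discard; refetch on next open
        self.dirty.discard(name)            # any unsaved local change is lost
```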

32 AFS Pros and Cons
Performance and Scalability: All file access by user/applications is local. Update checking (with time-to-live) is relatively cheap. Both fetch and update propagation are very efficient. Minimal per-client server load (once cache filled). Robustness: No server fail-over, but have local copies of most files. Transparency: Mostly perfect; all file access operations are local. Pray that we don't have any update conflicts. AFS is still fairly widely used. Is this really a good tradeoff? Would it be if Andrew supported disconnected clients, like portable computers? What then?

33 A Diversion Into Generality
The problem AFS addresses via callbacks must be addressed by any remote file system What is the nature of the problem? What are the choices for addressing it? What choices do real systems make?

34 Illustrating the Problem
[Figure: a file server holds file foo. File client A and File client B each call open(foo) and then write(foo) on their own copies. We might be in trouble.]

35 What Happens Next?
[Figure: File client A calls close(foo). But B has a different version of foo. What does the system do now?]

36 AFS Callbacks
[Figure: the AFS server's caching records for file foo list Client A and Client B. Caching records allow the server to perform callbacks: invalidate(foo) is sent to AFS client B. What does B do now?]

37 The AFS Solution Allow conflicts to occur
Locking can be used to prevent them But AFS locking is advisory, not mandatory Conflicts handled at the client Originally action not specified Later versions of AFS included automated and manual conflict resolution tools

38 Conflicts and Other File Systems
This problem is not unique to AFS Any file system that allows caching faces this issue Possible options: Don’t allow multiple nodes to cache Invalidate all cached copies before writes Obtain locks to ensure no conflicts Detect and handle conflicts Allow multiple versions of a file

39 Implications of the Choices
Allow only one node to cache No conflicts possible, so good consistency Only one site can access file at a time, so concurrency is poor Requires records at server Complications in face of failures

40 Implications of the Choices
Invalidate all cached copies before write No conflicts possible, so good consistency Concurrency good when nobody writes Updates delayed while handling invalidation of other copies Requires records at server Complications in face of failures

41 Implications of the Choices
Obtain locks to ensure no conflicts No conflicts possible, so good consistency Good concurrency if no one obtains a write lock Updates not delayed (once lock obtained) Introduces possibility of deadlock Leases remove that Requires records at server Complications in face of failures
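How leases keep a dead or stuck holder from blocking others forever can be sketched with an expiry time on each lock. The class name, the 30-second default, and the injectable clock are assumptions for illustration.

```python
import time

class LeaseLockServer:
    """Lock server whose locks expire unless the holder renews them."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease = lease_seconds
        self.clock = clock
        self.held = {}                  # name -> (owner, expiry time)

    def acquire(self, owner, name):
        """Take or renew the lock; return False if someone else holds a live lease."""
        now = self.clock()
        holder = self.held.get(name)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False
        self.held[name] = (owner, now + self.lease)
        return True

    def release(self, owner, name):
        if name in self.held and self.held[name][0] == owner:
            del self.held[name]
```

The server still has to keep a record per lock, which is the "requires records at server" cost noted on the slide.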

42 Implications of the Choices
Detect and handle conflicts Conflicting updates are possible and may be hard to handle Possibly requiring human intervention Good concurrency for reads and writes Minimal recordkeeping at server Maybe none in some cases Failures increase chances of problems, but otherwise no extra complications
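One common way to detect conflicts with minimal server state is a per-file version number: a store is accepted only if it was based on the version the server currently holds. This is a sketch of that general technique, not the mechanism of any particular system named in these slides; all names are hypothetical.

```python
class Conflict(Exception):
    """Raised when a store is based on an out-of-date version."""

class VersionedServer:
    def __init__(self):
        self.files = {}                 # name -> (version, data)

    def create(self, name, data):
        self.files[name] = (1, data)

    def fetch(self, name):
        return self.files[name]         # (version, data)

    def store(self, name, base_version, data):
        current, _ = self.files[name]
        if base_version != current:
            raise Conflict(f"{name}: based on v{base_version}, server has v{current}")
        self.files[name] = (current + 1, data)
        return current + 1
```

The server only detects the conflict; deciding what to do with the rejected update is left to the client, possibly to a human.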

43 Implications of the Choices
Allow multiple versions of a file Excellent concurrency for both read and write No extra complications for failures Weird model of file behavior that can confuse users Requires substantial recordkeeping

44 Some Other Systems’ Choices
CIFS – use locking to prevent concurrency problems (option 3) NFS – allow concurrent updates, but minimize chances and detect them (option 4) Ficus – similar to AFS Lotus Notes – treat updates as “comments” on original (option 5)

45 One More (Crazy?) Option
Allow anyone to write their cached copy Detect conflicting writes Use some precedence mechanism to select one write to keep Roll back all the others What about actions based on those writes occurring, though?

46 A Generalization To Storage Issues
We were discussing distributed caching Where there is one node that has the one true copy of the data What if that’s not the case? What if there is no primary copy? How do our solutions change?

47 Making It More General This issue isn’t just about caching for file systems The cached copies are distributed state The underlying issue is consistency of distributed state So the problem arises in other distributed systems problems Including problems not related to distributed storage

48 For Example, Four nodes are performing a distributed computation
There are two different algorithms they could be using One works well for simple data One is more suitable for complex data What if 2 nodes choose to use algorithm A and 2 nodes choose to use algorithm B? A very similar problem

49 Something to Bear In Mind
The example shown had two cached copies Your solution must work for more than two Unless your system only supports two users Ideally, it should work for as many copies as is possible This point is true for any proposed solution to any distributed systems problem

50 Comparing Sample Systems
How do: CIFS NFS AFS handle similar problems with similar goals?

51 NFS Vs. CIFS
Functionality: NFS is more portable (platforms, OS, FS). CIFS provides much better write serialization. Performance and robustness: NFS provides greater read scalability. NFS has much better fail-over characteristics. Security: NFS supports more security models. CIFS gives the server better authorization control.

52 AFS vs. NFS
Functionality: AFS not very portable. Both designed for continuous connection client/server. NFS supports diskless clients without local file systems. Performance and robustness: AFS generates much less network traffic, server load. They yield similar client response times. AFS highly robust if servers don’t fail; client failures no problem. Ease of use: NFS provides for better transparency. NFS has enforced locking and limited fail-over. Security: AFS requires strong authentication to trusted servers. NFS requires more support in operating system.

53 AFS vs. CIFS
CIFS is more widely usable. CIFS has better consistency, but higher overheads. CIFS has more issues with client failures. (Potentially) similar security properties.

54 Some Other Possibilities
What if the machines sharing files are portable and not always connected? What if the machines communicate across the Internet? What if the load on some files is too heavy for a single machine? What if we have poor trust in some remote machines?

55 Conclusion Real remote file systems tend to be designed for normal cases They make engineering choices on that basis Performance usually trumps consistency and correctness Limiting state is a popular option

