Published by Jewel Jefferson. Modified over 9 years ago.
1
FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment. A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, R. P. Wattenhofer. Microsoft Research
2
Paper highlights Paper discusses a distributed file system lacking a central server –Files and directories reside on client machines –Files are encrypted and replicated –Directory metadata are maintained by Byzantine-replicated finite state machines
3
Serverless file systems Idea is not new –xFS (Anderson et al. SOSP 1995) Objective is to utilize free disk space and processing power of client machines Two major issues are –Availability of files –Security
4
Design assumptions (I) 1. Farsite is intended to run on the desktops of a large corporation or a university: –Maximum scale of ~10^5 machines –Interconnected by a high-bandwidth, low-latency network –Most machines up most of the time –Uncorrelated machine failures
5
Design assumptions (II) 2.No files are both –Read by many users and –Frequently updated by at least one user (very infrequent in Windows NT file system) 3.Small but significant fraction of users will maliciously attempt to destroy or corrupt file data and metadata
6
Design assumptions (III) 4. A large fraction of users may independently attempt unauthorized accesses 5. Each machine is under the control of its immediate user –Cannot be subverted by other people 6. No user-sensitive data persist after logout or system reboot –Not true for any commodity OS
7
Enabling technology trends (I) 1. General increase in unused disk capacity, measured on 4800 desktops at Microsoft Research:
Year    Unused disk space
1998    49%
1999    50%
2000    58%
8
Enabling technology trends (II) 2.Lowered cost of cryptographic operations: –Can now encrypt data at 72MB/s –Faster than disk sequential I/O bandwidth (32MB/s)
9
Namespace roots Farsite provides hierarchical directory namespaces –Each namespace has its own root –Each root has a unique root name –Each root is managed by a designated set of machines forming a Byzantine-fault-tolerant group No need for a protected set of machines
10
Trust and certification (I) Basic Requirements –Users must trust the machines that offer to present data or metadata –Machines must trust the validity of requests from remote users –System security must trust that machines that claim to be distinct are truly distinct To prevent Sybil attacks
11
Sybil attacks (Douceur 2002) Possible whenever redundancy is used to increase security A single rogue entity can –Pretend to be many and –End up controlling a large part of the system Cannot be prevented without a logically centralized authority certifying identities
12
Trust and certification (II) Farsite manages trust through public-key cryptographic certificates – Namespace certificates – User certificates – Machine certificates
13
Trust and certification (III) Bootstrapped by fiat : –Machines told to accept certificates that can be authenticated with some public keys –Associated private keys are called Certification Authorities (CA) Certificates created either by CAs themselves or by users authorized to create certificates
14
Trust and certification (IV) User private keys are –Encrypted with a symmetric key derived from user password –Stored in a globally-readable directory in Farsite Does not require users to modify their behavior User or machine keys can be revoked
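As a rough illustration of deriving a symmetric key from a user password, the sketch below uses PBKDF2 from Python's standard library. The slides do not say which derivation function Farsite uses, so PBKDF2, the salt handling, the iteration count, and the name `derive_key` are all assumptions made for this example.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte symmetric key from a password.

    Assumption: PBKDF2-HMAC-SHA256 is a stand-in; the slides do not name
    the derivation scheme Farsite actually uses.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is not secret; it would be stored next to the encrypted private key.
salt = os.urandom(16)
key = derive_key("correct horse", salt)

# The same password and salt always give the same key, so a user can unlock
# the stored private key from any machine without carrying extra state.
assert key == derive_key("correct horse", salt)
```

Because the key is recomputed from the password on demand, the encrypted private key can sit in a globally readable directory, which is the property the slide relies on.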
15
Handling malicious behaviors Most fault-tolerant file systems do not protect users’ files against malicious behaviors of hosts They assume that a host will either behave correctly or crash Malicious behaviors are often called Byzantine failures –One or more hosts act as if they were controlled by very clever traitors
16
System architecture (I) Each Farsite client will deal with two different sets of hosts –A set of machines constituting a directory group –A set of machines acting as file hosts In practice the three roles (client, directory group member, file host) are shared by all machines
17
System architecture (II) [Figure: a client, a set of file hosts, and the members of a directory group. The client sees one directory group.]
18
The directory group (I) Replicates directories on directory members Directory integrity enforced through a Byzantine-fault-tolerant protocol –Works as long as fewer than one-third of the hosts misbehave in any manner (“traitors”) –Requires a minimum of four hosts to tolerate one misbehaving host
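The "fewer than one-third" bound and the "four hosts for one traitor" figure are two views of the same rule: a group needs at least 3f + 1 members to tolerate f Byzantine members. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def min_group_size(faulty: int) -> int:
    """Smallest group that tolerates `faulty` Byzantine members (3f + 1)."""
    if faulty < 0:
        raise ValueError("faulty must be non-negative")
    return 3 * faulty + 1


def max_faulty(group_size: int) -> int:
    """Largest number of Byzantine members a group of this size tolerates."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return (group_size - 1) // 3


# One traitor needs four hosts; growing the group to five or six hosts
# does not raise the tolerance until a seventh host is added.
for size in (4, 5, 6, 7):
    print(size, "hosts tolerate", max_faulty(size), "traitor(s)")
```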
19
The directory group (II) Decisions for all operations that are not determined by the client request are made through a cryptographically secure distributed random number generator Issues leases on files to clients –Promise not to allow any incompatible access to the file during the duration of the lease without notifying the client
20
The directory group (III) Directory groups can split: –Randomly select a group of machines they know –Tell them to form a new directory group –Delegate a portion of their namespace to the new group User and directory group authenticate each other
21
The file hosts (I) Farsite stores encrypted replicas of each file to ensure file integrity and file availability Continuously monitors host availability and relocates replicas whenever necessary Does not allow all replicas of a given file to reside on hosts owned by the same user Files that were recently accessed by a client are cached locally (for “roughly one week”)
22
The file hosts (II) Farsite does not use voting: –Correct replicas are identified by the directory host Farsite does not update at once all replicas of a file: –Would be too slow –Uses instead a background update mechanism
23
Semantic differences Unlike NTFS, Farsite –Puts a limit on the number of clients that can have a file open for write –Allows a directory to be renamed even if there is an open handle on a file in the directory or any of its descendants –Uses background (“lazy”) propagation of directory updates
24
Reliability and availability (I) Through redundancy –Metadata stored in a directory group of R_D members remain accessible if no more than ⌊(R_D − 1)/3⌋ members fail –Data replicated on R_F file hosts remain accessible as long as at least one of these hosts remains alive
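To get a feel for what these two thresholds buy, the sketch below turns them into unavailability probabilities. It assumes every machine is down independently with the same probability `p_down`, which is a simplification in the spirit of the "uncorrelated machine failures" design assumption; the function names and the model are illustrative, not taken from the paper.

```python
from math import comb


def p_file_unavailable(p_down: float, r_f: int) -> float:
    """A file is unavailable only if all r_f replicas are down at once."""
    return p_down ** r_f


def p_directory_unavailable(p_down: float, r_d: int) -> float:
    """Metadata is unavailable if more than floor((r_d - 1) / 3) members fail."""
    tolerated = (r_d - 1) // 3
    # Binomial probability that at most `tolerated` members are down.
    p_ok = sum(
        comb(r_d, k) * p_down ** k * (1.0 - p_down) ** (r_d - k)
        for k in range(tolerated + 1)
    )
    return 1.0 - p_ok


# Compare a 4-member and a 7-member directory group at 5% machine downtime.
for r_d in (4, 7):
    print(r_d, p_directory_unavailable(0.05, r_d))
```

Under this model a file needs only one live replica, while a directory group needs more than two-thirds of its members alive, so directory availability is far more sensitive to machine downtime than file availability.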
25
Reliability and availability (II) Farsite migrates duties of machines that have been unavailable for a long period of time to new machines (regeneration) –More aggressive approach to directory migration than to file-host migration Farsite continuously monitors host availability and relocates replicas whenever necessary Clients cache files for a week after last access
26
Security (I) Write access control enforced through Access Control Lists managed by directory group –Requires Byzantine agreement Read access control achieved through strong cryptography –File is encrypted with symmetric file key –File key is encrypted with public keys of all authorized users
27
Security (II) Same technique is applied to directory names –Members of directory group cannot read them To ensure file integrity, Farsite stores a copy of a Merkle hash tree over the file data blocks in the directory group that manages the file’s metadata
28
What is a Merkle hash tree? (I) Consider a file made up of four blocks: A, B, C and D We successively compute: –a = leaf_hash(A), …, d = leaf_hash(D) –p = inner_hash(a, b), q = inner_hash(c, d) –r = inner_hash(p, q) Recomputing r (the root hash) and comparing it with its supposed value will detect any tampering
29
What is a Merkle hash tree? (II) [Figure: hash tree over blocks A, B, C, D. Leaves: a = leaf_hash(A), b = leaf_hash(B), c = leaf_hash(C), d = leaf_hash(D). Inner nodes: p = inner_hash(a, b), q = inner_hash(c, d). Root: r = inner_hash(p, q).]
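The four-block construction can be sketched in a few lines. The names `leaf_hash` and `inner_hash` come from the slides; the choice of SHA-256, the one-byte prefixes that keep leaf and inner hashes distinct, and the restriction to a power-of-two block count are assumptions made to keep the example short.

```python
import hashlib


def leaf_hash(block: bytes) -> bytes:
    # Assumed convention: a 0x00 prefix marks a hash over file data.
    return hashlib.sha256(b"\x00" + block).digest()


def inner_hash(left: bytes, right: bytes) -> bytes:
    # Assumed convention: a 0x01 prefix marks a hash over two child hashes.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(blocks: list[bytes]) -> bytes:
    """Root hash over a non-empty, power-of-two number of blocks."""
    n = len(blocks)
    if n == 0 or n & (n - 1):
        raise ValueError("expected a non-empty, power-of-two number of blocks")
    level = [leaf_hash(b) for b in blocks]
    while len(level) > 1:
        # Pair up neighbours: (a, b) -> p, (c, d) -> q, then (p, q) -> r.
        level = [inner_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


r = merkle_root([b"A", b"B", b"C", b"D"])

# Changing any single block changes the recomputed root.
assert merkle_root([b"A", b"B", b"X", b"D"]) != r
```

A practical benefit of the tree shape, beyond a single flat hash, is that one block can be checked against the root using only the sibling hashes along its path.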
30
Durability (I) File creations, deletions and renames are not immediately forwarded to directory group –High cost of Byzantine protocol First stored in a log on client –Much as in Coda disconnected mode Log is pushed back to directory group –At fixed intervals –Whenever a lease is recalled
31
Durability (II) When a client reboots, it needs to send its committed updates to the directory group and have them accepted as authentic –Client will generate an authenticator key which it will distribute among members of the directory group –Can use this key to sign each committed update
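One way to picture "signing" updates with a shared authenticator key is a message authentication code. The sketch below uses HMAC-SHA256 from the standard library; the choice of HMAC, the function names, and the example update string are assumptions, and the step of distributing the key among directory group members is not modeled.

```python
import hashlib
import hmac
import os


def sign_update(authenticator_key: bytes, update: bytes) -> bytes:
    """Tag a committed update so it can later be accepted as authentic."""
    return hmac.new(authenticator_key, update, hashlib.sha256).digest()


def verify_update(authenticator_key: bytes, update: bytes, tag: bytes) -> bool:
    """Check a tag in constant time; False means the update was altered or forged."""
    return hmac.compare_digest(sign_update(authenticator_key, update), tag)


# Client side: generate the key once, then tag each update written to the log.
key = os.urandom(32)
update = b"rename /docs/a.txt -> /docs/b.txt"
tag = sign_update(key, update)

# After a reboot the logged update and its tag are resent and checked.
assert verify_update(key, update, tag)
```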
32
Consistency (I) Directory group uses a lease mechanism: –Data read/write leases –Data read-only leases Concurrent write accesses are handled by redirecting them to a single client machine –Guarantees correctness –Not scalable
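The compatibility rule behind the two data lease types (many readers or one writer) can be sketched as a small lease table. This is a toy model under stated assumptions: the class and method names are hypothetical, leases never expire, and a conflicting request is simply refused, whereas a real directory group would recall the conflicting lease or redirect the writer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileLeases:
    readers: set = field(default_factory=set)  # clients holding read-only leases
    writer: Optional[str] = None               # client holding the read/write lease


class LeaseTable:
    def __init__(self) -> None:
        self._leases: dict = {}  # path -> FileLeases

    def grant_read(self, path: str, client: str) -> bool:
        lease = self._leases.setdefault(path, FileLeases())
        if lease.writer is not None and lease.writer != client:
            return False  # incompatible with another client's read/write lease
        lease.readers.add(client)
        return True

    def grant_write(self, path: str, client: str) -> bool:
        lease = self._leases.setdefault(path, FileLeases())
        other_readers = lease.readers - {client}
        if other_readers or lease.writer not in (None, client):
            return False  # incompatible with leases held by other clients
        lease.writer = client
        return True

    def release(self, path: str, client: str) -> None:
        lease = self._leases.get(path)
        if lease is None:
            return
        lease.readers.discard(client)
        if lease.writer == client:
            lease.writer = None
```

For example, two clients can both obtain read-only leases on the same file, but a third client's `grant_write` call is refused until both readers have released theirs.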
33
Consistency (II) Leases have variable granularity –Single file –Entire subtree No good way to handle read/write lease expiration on a disconnected client The fundamental paper on leases is C. G. Gray, D. R. Cheriton: Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency. SOSP 1989: pp. 202-210
34
Consistency (III) Special name leases for files and directories –A name lease on a directory allows holder to create files and subdirectories under that directory with any non-extant name More special-purpose leases were introduced to implement Windows file sharing semantics
35
Scalability Ensured through – Hint-based pathname translation: Hints are data items that are useful when they are correct and cause no harm when they are incorrect Think of a phone number – Delayed-directory change notification
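The defining property of a hint, useful when correct and harmless when wrong, can be shown with a small cache in front of an authoritative lookup. Everything here is a hypothetical sketch: `PathResolver`, the `authoritative_lookup` callback (standing in for walking the namespace from its root), and the `is_responsible` callback (standing in for asking a directory group whether it manages a path) are not Farsite interfaces.

```python
class PathResolver:
    """Hint cache in front of a slow but always-correct lookup."""

    def __init__(self, authoritative_lookup, is_responsible):
        self._lookup = authoritative_lookup     # path -> group (slow, correct)
        self._is_responsible = is_responsible   # (group, path) -> bool (cheap check)
        self._hints = {}                        # path -> group last seen for it

    def resolve(self, path):
        hint = self._hints.get(path)
        if hint is not None and self._is_responsible(hint, path):
            return hint  # correct hint: the slow lookup is skipped
        # Missing or stale hint: nothing breaks, the lookup is just slower.
        group = self._lookup(path)
        self._hints[path] = group
        return group


# Toy namespace: which directory group currently manages each path.
owners = {"/projects": "group-1"}
resolver = PathResolver(
    authoritative_lookup=lambda p: owners[p],
    is_responsible=lambda g, p: owners[p] == g,
)
resolver.resolve("/projects")       # first call populates the hint
owners["/projects"] = "group-2"     # namespace delegated to a new group
resolver.resolve("/projects")       # stale hint is detected and refreshed
```

The phone-number analogy fits: dialing a remembered number is fast when it still works, and a wrong number only costs a second look in the directory.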
36
Efficiency Space efficiency: –Almost 50% of disk space could be reclaimed by eliminating duplicate files –Farsite detects files with duplicate contents and co-locates them in same set of file hosts Performance: –Achieved through caching and delaying updates
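Detecting files with identical contents can be sketched by grouping files under a digest of their bytes. The slide does not describe how Farsite performs the detection, so comparing SHA-256 digests of plaintext contents, and the function name `group_duplicates`, are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict


def group_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Return groups of file names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Only digests shared by two or more files indicate reclaimable space.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


files = {
    "alice/report.doc": b"quarterly numbers",
    "bob/notes.txt": b"meeting notes",
    "carol/report-copy.doc": b"quarterly numbers",
}
duplicates = group_duplicates(files)
```

Each group returned is a candidate for co-location on the same set of file hosts, so that the shared content needs to be stored only once per host.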
37
Evaluation Designed to scale up to 10^5 machines –Roughly 300 new machines per day Andrew benchmark runs two times slower than on NTFS Still to do –Implement disk quotas –Have a mechanism to measure machine availability