Distributed File Systems
Chad Griffith
  - Characteristics
  - Present Work
  - Future Work
Key Characteristics
  - Dispersion of Users and Files
  - Multiplicity of Users and Files
Transparency (Dispersed Users)
  - Login Transparency
      - Uniform login
      - Uniform file system view
  - Access Transparency
      - Uniform file access, local or remote
Dispersed Files
  - Location transparency
  - Location independence
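To make the two properties concrete: location transparency means a file name carries no hint of where the file is stored, and location independence means the name does not have to change when the file moves. The following is a minimal sketch under those definitions. `NameService`, its methods, and the host and path values are all hypothetical names chosen for illustration, not part of any real DFS.

```python
class NameService:
    """Hypothetical name service: maps logical file names to physical locations."""

    def __init__(self):
        # logical name -> (host, local path); clients never see this table
        self._table = {}

    def register(self, name, host, local_path):
        """Record where a newly created file is stored."""
        self._table[name] = (host, local_path)

    def resolve(self, name):
        """Return the current physical location for a logical name."""
        return self._table[name]

    def migrate(self, name, new_host, new_local_path):
        """Move a file to another server; the logical name stays the same."""
        if name not in self._table:
            raise KeyError(name)
        self._table[name] = (new_host, new_local_path)


ns = NameService()
# The logical name contains no server information (location transparency).
ns.register("/home/alice/notes.txt", "server-a", "/disk1/0001")
before = ns.resolve("/home/alice/notes.txt")

# The file moves, but clients keep using the same name (location independence).
ns.migrate("/home/alice/notes.txt", "server-b", "/disk7/0420")
after = ns.resolve("/home/alice/notes.txt")
```

Only the mapping inside the name service changes on migration, so any client holding the logical name keeps working without modification.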
Multiplicity of Users
  - Concurrency Transparency
      - File sharing between multiple concurrent users
      - No adverse effects from this sharing
  - Transaction based
      - Requires the appearance of isolation
  - Concurrency control
      - Ensures correct concurrent execution of transactions
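One simple way to give each transaction the appearance of isolation is to serialize read-modify-write operations with a lock. The sketch below assumes that approach. `SharedFile`, `transact`, and `run_clients` are hypothetical names, the "file" is just an in-memory integer, and real systems may use other concurrency-control schemes such as timestamps or optimistic validation.

```python
import threading


class SharedFile:
    """Hypothetical shared file whose content is a single integer."""

    def __init__(self, content=0):
        self._content = content
        self._lock = threading.Lock()

    def transact(self, update):
        """Apply update(old) -> new as one isolated read-modify-write."""
        with self._lock:
            self._content = update(self._content)
            return self._content

    def read(self):
        with self._lock:
            return self._content


def run_clients(shared, clients=4, updates_each=1000):
    """Start several concurrent clients that each increment the file."""

    def client():
        for _ in range(updates_each):
            shared.transact(lambda value: value + 1)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.read()


# With isolation, no increment is lost: 4 clients * 1000 updates each.
final_count = run_clients(SharedFile())
```

Because every update runs while holding the lock, the result is the same as if the transactions had run one after another, which is the "no adverse effects" property named on the slide.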
Multiplicity of Files
  - Files may be replicated for:
      - Redundancy
      - Concurrent access for efficiency
  - Replication transparency
      - Perform atomic updates on replicated files
      - Users only "see" one copy of the file
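The two replication-transparency bullets can be sketched as an all-or-nothing write over a set of replicas hidden behind a single file object. This is a simplified illustration that assumes a prepare/commit style update. `Replica` and `ReplicatedFile` are hypothetical names, and the sketch ignores failures that occur during the commit phase, which a real system has to handle.

```python
class Replica:
    """Hypothetical storage node holding one copy of the file."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = b""
        self._staged = None

    def prepare(self, data):
        """Stage new content; refuse if this replica is unreachable."""
        if not self.available:
            return False
        self._staged = data
        return True

    def commit(self):
        self.data = self._staged
        self._staged = None

    def abort(self):
        self._staged = None


class ReplicatedFile:
    """Presents a set of replicas to the user as a single file."""

    def __init__(self, replicas):
        self._replicas = replicas

    def write(self, data):
        """Update every replica or none of them."""
        prepared = []
        for replica in self._replicas:
            if replica.prepare(data):
                prepared.append(replica)
            else:
                for p in prepared:
                    p.abort()  # roll back staged data; old content is kept
                return False
        for replica in self._replicas:
            replica.commit()
        return True

    def read(self):
        """Read from the first reachable replica; the caller sees one copy."""
        for replica in self._replicas:
            if replica.available:
                return replica.data
        raise IOError("no replica available")


replicas = [Replica("a"), Replica("b"), Replica("c")]
f = ReplicatedFile(replicas)
ok = f.write(b"v1")              # all replicas reachable: update applied
replicas[1].available = False
failed = f.write(b"v2")          # one replica down: update rejected everywhere
current = f.read()               # still the last fully applied version
```

The user only calls `write` and `read` on one object; the number of copies and which one served the read stay hidden.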
Other Characteristics
  - Apply to DFS and to distributed systems in general
      - Fault Tolerance
      - Scalability
      - Heterogeneity
Current Works: TidyFS (Microsoft)
  - For parallel computations on clusters
  - Emphasizes simplicity and small size
  - Has a metadata server, a node service, and the TidyFS Explorer
  - Favors tighter integration over generality
Current Works: GFS (Google File System)
  - Designed from observation of application workloads and environment
  - Emphasizes large files and datasets
  - Appends new data rather than modifying existing data
  - Co-designed with the applications that are to run on GFS
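The append-rather-than-modify access pattern can be illustrated with a small in-memory append-only file. This is only a sketch of the idea, not the GFS API: `AppendOnlyFile` and its methods are hypothetical names, and the only assumption borrowed from GFS is that the file system, not the caller, picks the offset at which an appended record lands.

```python
class AppendOnlyFile:
    """Hypothetical file that supports appending and reading, never overwriting."""

    def __init__(self):
        self._buf = bytearray()

    def record_append(self, record: bytes) -> int:
        """Append a record and return the offset chosen for it."""
        offset = len(self._buf)
        self._buf.extend(record)
        return offset

    def read(self, offset: int, length: int) -> bytes:
        """Read length bytes starting at offset."""
        return bytes(self._buf[offset:offset + length])

    def size(self) -> int:
        return len(self._buf)


log = AppendOnlyFile()
first = log.record_append(b"alpha")   # lands at offset 0
second = log.record_append(b"beta")   # lands right after the first record
record = log.read(second, 4)          # existing data is never rewritten
```

Since earlier bytes never change, readers can safely stream through data that has already been written while writers keep adding to the end.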
Current Works: HDFS (Hadoop)
  - Large files and datasets
  - Streaming file access
  - No appending to files yet
  - Portability (more generalized)
  - Master/slave architecture
Current Works: Tahoe-LAFS
  - Peer-to-peer application
  - Pools hard-disk space with friends
  - Automatic encryption
  - Open source (GPL license)
  - Still needs a central node
Future Works
  - OS-independent DFS
      - Can detect the file system and its type, and read from any system
      - Possibly can even learn about new file systems, either independently or from an online database
  - Communication-independent DFS
      - File systems and communication systems will be more robust, so that files can be accessed over different communication protocols
References
  - Randy Chow and Theodore Johnson, Distributed Operating Systems & Algorithms, 1997.
  - http://research.microsoft.com/jump/81486
  - http://labs.google.com/papers/gfs.html
  - hadoop.apache.org/