DISTRIBUTED FILE SYSTEM SUMMARY
RANJANI SANKARAN

Outline
– Characteristics of DFS
– DFS Design and Implementation
– Transaction and Concurrency Control
– Data and File Replication
– Current Work
– Future Work

DFS Characteristics
– Dispersion
  – Dispersed files: location transparent, location independent
  – Dispersed clients: login transparency, access transparency
– Multiplicity
  – Multiple files: replication transparency
  – Multiple clients: concurrency transparency
– Others (general)
  – Fault tolerance: crash of server or client, loss of messages
  – Scalability: incremental file system growth

DFS STRUCTURE [3]
– DFS root: top level; holds links to shared folders in a domain
– DFS link: a share under the root; the link redirects to a shared folder
– DFS replicas (targets): identical shares on two servers can be grouped together as targets under one link

MAPPING OF LOGICAL AND PHYSICAL FOLDERS[2]
DFS Design and Implementation
Problems: file sharing and file replication
Files and File Systems
– File name: the directory service maps a symbolic name to a unique file ID (ufid or file handle)
– File attributes: ownership, type, size, timestamp, access authorization information
– Data units: flat or hierarchical structure
– File access: sequential, direct, indexed-sequential

COMPONENTS IN A FILE SYSTEM
– Directory service: name resolution, addition and deletion of files
– Authorization service: capability and/or access control list
– File service
  – Transaction: concurrency and replication management
  – Basic: read/write files and get/set attributes
– System service: device, cache, and block management

Overview of FS Services
– DIRECTORY SERVICE: search, create, delete, and rename files; mapping and locating; list a directory; traverse the file system
– AUTHORIZATION SERVICE: authorized access for security; read, write, append, execute, delete, and list operations
– FILE SERVICE: transaction service; basic service (read, write, open, close, delete, truncate, seek)
– SYSTEM SERVICE: replication, caching, mapping of addresses, etc.
SERVICES and SERVERS
– A single server or multiple servers implement the services
– The client-server relationship is relative

File Mounting
– Attach a remote named file system to the client's file system hierarchy at the position pointed to by a path name
– Once mounted, files are accessed through the concatenated logical path name, without reference to remote hosts or local devices (location transparent)
– The linkage information (mount table) is kept until the file system is unmounted
– Different clients may perceive different FS views
– To achieve a global FS view, the system administrator (SA) enforces mounting rules
– To restrict or allow mounting, the server uses its export file

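The mount-table lookup described above can be sketched as follows. This is a minimal illustration, not any particular system's implementation: the `MountTable` class, its method names, and the server name `fs1` in the usage note are all hypothetical.

```python
import posixpath


class MountTable:
    """Hypothetical client-side mount table: mount point -> (server, exported path)."""

    def __init__(self):
        self._mounts = {}

    def mount(self, mount_point, server, remote_root):
        # Record the linkage; it stays in the table until unmount() is called.
        self._mounts[posixpath.normpath(mount_point)] = (server, remote_root)

    def unmount(self, mount_point):
        del self._mounts[posixpath.normpath(mount_point)]

    def resolve(self, path):
        """Return (server, remote_path), or (None, path) for a local file."""
        path = posixpath.normpath(path)
        # Try the longest mount point first so nested mounts resolve correctly.
        for mp in sorted(self._mounts, key=len, reverse=True):
            if path == mp or path.startswith(mp.rstrip("/") + "/"):
                server, remote_root = self._mounts[mp]
                rest = path[len(mp):].lstrip("/")
                remote = posixpath.join(remote_root, rest) if rest else remote_root
                return (server, remote)
        return (None, path)
```

After `mt.mount("/mnt/proj", "fs1", "/export/proj")`, the client uses the logical name `/mnt/proj/a/b.txt` and never names the host; `resolve` translates it to the server and remote path behind the scenes.
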
Types of Mounting
– Explicit mounting: clients make explicit mounting system calls whenever one is desired
– Boot mounting: a set of file servers is prescribed and all mountings are performed at the client's boot time
– Auto-mounting: mounting of the servers is done implicitly, on demand, when a file is first opened by a client

Server Registration
– The mounting protocol is not transparent: the initial mounting requires knowledge of the location of file servers
– Server registration
  – File servers register their services, and clients consult the registration server before mounting
  – Clients broadcast mounting requests, and file servers respond to the clients' requests

Stateful and Stateless File Servers
– Stateful file server: the server maintains state information about clients between requests
– Stateless file server: when a client sends a request, the server carries it out, sends the reply, and then removes all information about the request from its internal tables
  – Between requests, no client-specific information is kept on the server
  – Each request must be self-contained: full file name and offset…
– State information could be:
  – Opened files and their clients
  – File descriptors and file handles
  – Current file position pointers, mounting information
  – Cache or buffer

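The contrast between the two server styles can be illustrated with a small sketch. Both classes and their method names are hypothetical, and an in-memory dictionary stands in for the disk.

```python
class StatelessServer:
    """Keeps file data only; every request names the file and the offset."""

    def __init__(self, files):
        self.files = files  # file name -> bytes

    def read(self, name, offset, count):
        # Self-contained request: nothing about the client is remembered afterwards.
        return self.files[name][offset:offset + count]


class StatefulServer:
    """Keeps an open-file table with a position pointer per descriptor."""

    def __init__(self, files):
        self.files = files
        self.open_table = {}  # fd -> [file name, current position]
        self._next_fd = 0

    def open(self, name):
        fd = self._next_fd
        self._next_fd += 1
        self.open_table[fd] = [name, 0]
        return fd

    def read(self, fd, count):
        name, pos = self.open_table[fd]
        data = self.files[name][pos:pos + count]
        self.open_table[fd][1] = pos + len(data)  # server advances the pointer
        return data

    def close(self, fd):
        del self.open_table[fd]
```

If the stateful server crashes, `open_table` is lost and clients must reopen their files; the stateless server can simply resume answering requests, at the cost of longer request messages.
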
File Access and Semantics of Sharing
File Sharing
– Overlapping access (space multiplexing): multiple copies of the same file, through caching or replication
  – Coherency control: coherent view of shared files, managing access to replicas, atomic updates
– Interleaving access (time multiplexing): multiple granularities of data access operations (simple read/write, transaction, session)
  – Concurrency control: prevent erroneous or inconsistent results during concurrent access

Semantics of Sharing/Replication
– Unix semantics (currentness): writes are propagated immediately so that reads return the latest value
– Transaction semantics (consistency): writes are stored and propagated when consistency constraints are met
– Session semantics (efficiency): writes are done on a working copy; results are made permanent when the session closes
REPLICATION
– Write policies
– Cache coherence control
– Version control

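Session semantics can be sketched in a few lines. This is an illustration only: the `SessionFile` class is hypothetical, and a plain dictionary plays the role of the file server.

```python
class SessionFile:
    """Hypothetical client session: edits a working copy, writes back on close."""

    def __init__(self, server, name):
        self.server = server            # dict: file name -> bytes
        self.name = name
        self.copy = bytearray(server[name])  # private working copy

    def write(self, offset, data):
        # Only the working copy changes; other clients still see the old file.
        self.copy[offset:offset + len(data)] = data

    def read(self):
        return bytes(self.copy)

    def close(self):
        # Results become permanent (visible to others) when the session closes.
        self.server[self.name] = bytes(self.copy)
```

Under Unix semantics the `write` call itself would have to update the server's copy; here nothing is propagated until `close`, which is what makes session semantics cheap.
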
Transaction and Concurrency Control
– A concurrency control protocol is required to maintain ACID semantics for concurrent transactions
– Distributed transaction processing system:
  – Transaction manager: correct execution of local and remote transactions
  – Scheduler: schedules operations to avoid conflicts, using locks, timestamps, and validation managers
  – Object manager: coherency of replicas and caches; interface to the file system

Serializability
– A schedule is serializable if the result of its execution is equivalent to that of a serial schedule (with no cyclic hold-and-wait deadlocks, no conflicting locks held, etc.)
– In transactions, the transaction states must be consistent
– Conflicts on a shared object: write-write, write-read, read-write

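One common way to test a schedule against the three conflict types above is a precedence graph: draw an edge from one transaction to another whenever an operation of the first conflicts with a later operation of the second, and look for cycles. The sketch below checks conflict serializability only, which is a sufficient condition; the tuple encoding of operations and the function names are assumptions for illustration.

```python
def conflicts(op1, op2):
    """Operations are (transaction, 'r' or 'w', object) tuples.

    They conflict when they come from different transactions, touch the
    same object, and at least one is a write (write-write, write-read,
    read-write).
    """
    t1, kind1, obj1 = op1
    t2, kind2, obj2 = op2
    return t1 != t2 and obj1 == obj2 and "w" in (kind1, kind2)


def is_conflict_serializable(schedule):
    # Build the precedence graph: an edge a -> b means a must precede b.
    edges = {}
    for i, a in enumerate(schedule):
        edges.setdefault(a[0], set())
        for b in schedule[i + 1:]:
            if conflicts(a, b):
                edges[a[0]].add(b[0])

    # Depth-first search for a cycle; a cycle means no equivalent serial order.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in edges}

    def has_cycle(t):
        color[t] = GREY
        for nxt in edges[t]:
            if color[nxt] == GREY or (color[nxt] == WHITE and has_cycle(nxt)):
                return True
        color[t] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle(t) for t in edges)
```

For example, if t1 writes C and then D while t2 writes D and then C, the serial order is conflict serializable, but interleaving t2's write to D between t1's two writes creates a cycle between t1 and t2.
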
Interleaving Schedules
– Operations (1,3) and (2,4) try to perform similar operations on data objects C and D
– Only the (1,2) and (3,4) order is valid

Concurrency Control Protocols
Two-Phase Locking:
– Growing phase, shrinking phase
– Sacrifices concurrency and sharing for serializability
– Circular wait (deadlock):
  t0: Write A = 100; Write B = 20
  t1: Read A, Read B; 1. Write sum in C; 2. Write diff in D
  t2: Read A, Read B; 3. Write sum in D; 4. Write diff in C
– Solution: release locks as soon as possible
– Problem: rolling aborts, commit dependence
– Solution: strict two-phase locking

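The circular wait in the example can be reproduced with a small lock table. This is a sketch under simplifying assumptions: exclusive locks only, a non-blocking `acquire` that returns `False` instead of waiting, and a hypothetical `LockManager` class whose `release_all` models the strict variant (all locks are held until the transaction ends).

```python
class LockManager:
    """Hypothetical exclusive-lock table with a waits-for relation."""

    def __init__(self):
        self.owner = {}      # object -> transaction holding its lock
        self.waits_for = {}  # blocked transaction -> transaction it waits on

    def acquire(self, tx, obj):
        holder = self.owner.get(obj)
        if holder is None or holder == tx:
            self.owner[obj] = tx
            self.waits_for.pop(tx, None)
            return True
        self.waits_for[tx] = holder  # growing phase is blocked here
        return False

    def deadlocked(self, tx):
        # Follow the waits-for chain; returning to tx means a circular wait.
        seen = set()
        cur = tx
        while cur in self.waits_for:
            cur = self.waits_for[cur]
            if cur == tx:
                return True
            if cur in seen:
                return False
            seen.add(cur)
        return False

    def release_all(self, tx):
        # Strict two-phase locking: every lock is released together at the end.
        for obj in [o for o, t in self.owner.items() if t == tx]:
            del self.owner[obj]
        self.waits_for.pop(tx, None)
```

With t1 holding C and requesting D while t2 holds D and requests C, `deadlocked` reports the cycle; aborting one transaction with `release_all` lets the other proceed.
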
Time Stamp Ordering
– Logical timestamps or counters; unique timestamps for transactions (Txs)
– Larger-TS Txs wait for smaller-TS Txs; smaller-TS Txs die and restart when confronting larger-TS Txs
– t0 (50 ms) < t1 (100 ms) < t2 (200 ms):
  t0: Write A = 100; Write B = 20 -> completed
  t1: Read A, Read B; 1. Write sum in C; 2. Write diff in D
  t2: Read A, Read B; 3. Read sum in C; 4. Write diff in C

Time Stamp Ordering Concurrency Control
– RD and WR: logical timestamps of the last read and the last write
– Tmin: the minimum tentative time among pending writes

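The role of the RD and WR timestamps can be sketched with the basic timestamp-ordering rule in its abort-on-conflict form. This is a simplification: it does not model tentative (pending) writes, Tmin, or waiting, and the class and exception names are hypothetical.

```python
class Abort(Exception):
    """Raised when an operation arrives too late and the transaction must restart."""


class TimestampedObject:
    def __init__(self, value=None):
        self.value = value
        self.rd = 0  # timestamp of the last read
        self.wr = 0  # timestamp of the last write

    def read(self, ts):
        # A read is too late if a younger transaction has already written.
        if ts < self.wr:
            raise Abort("read at %d is older than last write %d" % (ts, self.wr))
        self.rd = max(self.rd, ts)
        return self.value

    def write(self, ts, value):
        # A write is too late if a younger transaction has read or written.
        if ts < self.rd or ts < self.wr:
            raise Abort("write at %d is older than rd=%d / wr=%d" % (ts, self.rd, self.wr))
        self.value = value
        self.wr = ts
```

Using the timestamps from the example: after t0 (50) writes A = 100 and t2 (200) reads it, a write to A by t1 (100) is rejected because RD is already 200.
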
Optimistic Concurrency Control
– Allows the entire transaction to complete, then validates it before making its effects permanent
– Phases: execution, validation, update
Optimistic Concurrency Control Mechanism
– Validation: two-phase commit protocol, sending a validation request to all transaction managers (TMs)
– Validated updates are committed in the update phase

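The three phases can be sketched on a single node with backward validation: a transaction passes if no transaction that committed after it started wrote something it read. This is an assumption-laden illustration; it omits the two-phase commit round to remote TMs, and the `OptimisticStore` and `Transaction` names are hypothetical.

```python
class Transaction:
    def __init__(self, start_tn):
        self.start_tn = start_tn  # commit counter value when the Tx began
        self.read_set = set()
        self.writes = {}          # private, tentative writes


class OptimisticStore:
    def __init__(self):
        self.data = {}
        self.committed = []  # (commit number, write set) of finished Txs
        self.tn = 0          # commit counter

    def begin(self):
        return Transaction(self.tn)

    # Execution phase: reads are recorded, writes stay private.
    def read(self, tx, key):
        tx.read_set.add(key)
        return tx.writes.get(key, self.data.get(key))

    def write(self, tx, key, value):
        tx.writes[key] = value

    def commit(self, tx):
        # Validation phase: fail if a later-committed Tx overwrote our reads.
        for number, write_set in self.committed:
            if number > tx.start_tn and write_set & tx.read_set:
                return False
        # Update phase: make the tentative writes permanent.
        self.data.update(tx.writes)
        self.tn += 1
        self.committed.append((self.tn, set(tx.writes)))
        return True
```

A transaction whose `commit` returns `False` has made no visible changes and can simply be restarted.
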
Data and File Replication
– Purpose: concurrent access and availability
– GOAL: one-copy serializability
  – The execution of transactions on replicated objects is equivalent to the execution of the same transactions on non-replicated objects
  – Read operations: read-one-primary, read-one, read-quorum
  – Write operations: write-one-primary, write-all, write-all-available, write-quorum, write-gossip
– Quorum voting
– Gossip update propagation
– Causal order gossip protocol

ARCHITECTURE
– The client chooses one or more file service agents (FSAs) to access a data object
– The FSA acts as a front end to the replica managers (RMs) to provide replication transparency
– The FSA contacts one or more RMs for the actual updating and reading of data objects

Quorum Voting/Gossip Update Propagation
Quorum voting: uses a read quorum and a write quorum
– Write-write conflict: 2 * write quorum > number of object copies
– Read-write conflict: write quorum + read quorum > number of object copies
Gossip update propagation:
– Read: if TS_fsa <= TS_rm, the RM has recent data and returns it; otherwise wait for gossip, or try another RM
– Update: if TS_fsa > TS_rm, perform the update, update TS_rm, and send gossip; otherwise, process based on the application (perform the update or reject it)
– Gossip: update the RM if the gossip carries new updates

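The two quorum inequalities above translate directly into a check. The sketch below also shows why they matter: any read quorum overlaps the latest write quorum, so the highest version number among the contacted copies is the current one. The function names and the `(version, value)` replica encoding are assumptions for illustration.

```python
def valid_quorums(n_copies, read_quorum, write_quorum):
    """Check the two quorum-voting conditions for n_copies replicas."""
    no_write_write = 2 * write_quorum > n_copies           # two writes always overlap
    no_read_write = read_quorum + write_quorum > n_copies  # a read sees the last write
    return no_write_write and no_read_write


def quorum_read(contacted):
    """Given (version, value) pairs from a read quorum, return the newest pair."""
    return max(contacted, key=lambda replica: replica[0])
```

With five copies, a read quorum of 3 and a write quorum of 3 satisfy both conditions, whereas a read quorum of 2 with a write quorum of 3 could miss the latest write entirely.
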
Gossip Update Protocol
– Used in a fixed RM configuration
– Uses vector timestamps
– Uses a buffer to keep order

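How vector timestamps and a buffer keep updates in order can be sketched as follows. This is a simplified model, not the full protocol: each update is assumed to carry the timestamp it depends on (`deps`) and its own timestamp (`ts`), and the `ReplicaManager` class and its fields are hypothetical.

```python
def leq(a, b):
    """Vector timestamp a <= b when every component of a is <= that of b."""
    return all(x <= y for x, y in zip(a, b))


def merge(a, b):
    """Component-wise maximum of two vector timestamps."""
    return [max(x, y) for x, y in zip(a, b)]


class ReplicaManager:
    def __init__(self, n_replicas):
        self.value_ts = [0] * n_replicas  # what this RM's value reflects
        self.buffer = []                  # updates waiting for their dependencies
        self.applied = []

    def can_answer(self, fsa_ts):
        # Read rule: answer only if the RM is at least as recent as the FSA.
        return leq(fsa_ts, self.value_ts)

    def receive(self, deps, ts, payload):
        self.buffer.append((deps, ts, payload))
        self._drain()

    def _drain(self):
        # Apply every buffered update whose dependencies are now satisfied.
        progress = True
        while progress:
            progress = False
            for update in list(self.buffer):
                deps, ts, payload = update
                if leq(deps, self.value_ts):
                    self.buffer.remove(update)
                    self.applied.append(payload)
                    self.value_ts = merge(self.value_ts, ts)
                    progress = True
```

An update that arrives before the update it depends on stays in the buffer and is applied as soon as the missing one is delivered, so every RM applies causally related updates in the same order.
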
Current Work
Current distributed file system and related projects:
– Ceph (petabyte-scale DFS that is POSIX compatible and fault tolerant)
– GlusterFS
– HDFS
– HekaFS
– OrangeFS
– KosmosFS
– MogileFS
– Swift (OpenStack Storage)
– FAST'11 proceedings

Future Work
– Usability and scalability issues relate to the cost of traversal in distributed file systems; the traditional model of file traversal might not be suitable for searching and indexing [3]
– File systems adding support for their own indexing (continuous/incremental updates of indexes)
– The NFS family might become increasingly irrelevant for more geographically distributed enterprises
– Innovations in multi-tenancy and security for distributed/cloud computing

References
1. R. Chow and T. Johnson, Distributed Operating Systems & Algorithms
2. DFS-Namespaces.html - DFS Namespaces reference
3. systems - Future of File Systems
4. ascale-storage.pdf - Issues with DFS at Exascale
5. 08/openpdfs/maltzahn.pdf - Ceph as a scalable alternative to Hadoop
6. sop.inria.fr/members/Patrick.Valduriez/pmwiki/Patrick/uploads//Conferences/dexa2011.pdf - Distributed Data Management in 2020?
7. analytics - Hadoop might become the future solution

THANK YOU