R*: An Overview of the Architecture. By R. Williams et al. Presented by D. Kontos. Instructor: Dr. Megalooikonomou
Overview: Distributed database systems. R*: an experimental DDBMS developed at IBM Almaden Research Center in Overview of the architecture: transaction management; object naming and catalogue management; authorization, communication etc. Conclusions on the issues arising in a DDBMS.
Distributed DBMS (DDBMS): Need for sharing resources and data. Preserve transparency of network communication and data organization. Maximum independence: “site autonomy”. R*: a DDBMS consisting of a confederation of voluntarily co-operating sites, each supporting the relational data model and communicating via IBM’s CICS.
Architecture aspects: Environment and data definitions. Object naming. Distributed catalogs. Transaction management and commit protocols. Query preparation. Query execution. SQL additions and changes.
Environment and Data Definitions: Several database sites communicating over a network (via CICS). Data is stored in relations that may be dispersed, replicated, or partitioned. The end user is not aware of the data distribution, which is organized by the DDBMS.
Object Naming: Site autonomy, so there is no global naming system. Network details are transparent to the user; programming is kept as simple as possible. End-user “print names” are mapped to internal System Wide Names (SWNs), which include the object’s BIRTH_SITE. E.g. BRUCE at SAN_JOSE accesses table T (BIRTH_SITE: SAN_JOSE).
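The mapping from a print name to an SWN can be sketched in Python. This is a minimal illustration, not R*'s actual code: it assumes the four-part SWN form USER@USER_SITE.OBJECT_NAME@BIRTH_SITE described in the R* literature, and it assumes that missing parts default to the current user and the current site. The names `SystemWideName` and `resolve_print_name` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemWideName:
    """Assumed four-part SWN: USER@USER_SITE.OBJECT_NAME@BIRTH_SITE."""
    user: str
    user_site: str
    object_name: str
    birth_site: str

    def __str__(self) -> str:
        return f"{self.user}@{self.user_site}.{self.object_name}@{self.birth_site}"


def _split_at(part: str, default_site: str) -> tuple[str, str]:
    """Split 'NAME@SITE'; fall back to default_site when no site is given."""
    name, sep, site = part.partition("@")
    return name, (site if sep else default_site)


def resolve_print_name(print_name: str, current_user: str, current_site: str) -> SystemWideName:
    """Complete a print name into an SWN using session defaults (assumed rule)."""
    creator, dot, obj = print_name.rpartition(".")
    if not dot:
        # No creator given: assume the current user is the creator.
        creator = current_user
    user, user_site = _split_at(creator, current_site)
    object_name, birth_site = _split_at(obj, current_site)
    return SystemWideName(user, user_site, object_name, birth_site)


# BRUCE at SAN_JOSE refers to table T by its short print name.
print(resolve_print_name("T", "BRUCE", "SAN_JOSE"))  # BRUCE@SAN_JOSE.T@SAN_JOSE
```

Because the birth site is part of the name itself, no global name server is needed, which fits the site-autonomy goal stated on the slide.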
Distributed Catalogs: Distributed catalog architecture. Each site keeps and maintains catalogs for the objects, replicas and fragments stored at that site. The “birth” site of each object keeps information about where it is currently stored. An object is located through its SWN; catalogs store access paths. Search path: local catalog, then birth-site catalog, then the current site that the birth site indicates.
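The search path can be illustrated with a small Python sketch. The catalog layout here (a set of stored SWNs per site plus a birth-site forwarding table) is a simplifying assumption, and `Site`, `locate`, `SITE_B` and `SITE_C` are hypothetical names chosen for the example.

```python
class Site:
    """One site with its own catalog (hypothetical, simplified layout)."""

    def __init__(self, name):
        self.name = name
        self.stored = set()   # SWNs of objects stored at this site
        self.born_here = {}   # SWN -> site that currently stores the object


def locate(swn, birth_site, local_site, sites):
    """Follow the search path: local catalog, birth site, indicated site."""
    # 1. The local catalog knows about objects stored at this site.
    if swn in local_site.stored:
        return local_site.name
    # 2. Otherwise ask the birth site, whose name is carried in the SWN.
    birth = sites[birth_site]
    if swn in birth.stored:
        return birth.name
    # 3. The birth site records where a migrated object now lives.
    current = birth.born_here.get(swn)
    if current is not None and swn in sites[current].stored:
        return current
    raise LookupError(f"{swn} not found")


sites = {name: Site(name) for name in ("SAN_JOSE", "SITE_B", "SITE_C")}
swn = "BRUCE@SAN_JOSE.T@SAN_JOSE"
sites["SITE_B"].stored.add(swn)               # T has migrated to SITE_B
sites["SAN_JOSE"].born_here[swn] = "SITE_B"   # birth site remembers that
print(locate(swn, "SAN_JOSE", sites["SITE_C"], sites))  # SITE_B
```

In this sketch a lookup contacts at most two remote sites, and no site needs a complete global catalog.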
Transaction management, commit protocols: Each transaction has a unique transaction sequence number and starts from the site where it was entered. Synchronous and asynchronous execution. Commit is UNIFORM (all abort OR all commit). Two-phase commit protocol: the coordinator makes the final decision; the other sites, once prepared to commit, await that decision. Lost commit messages are detected by time-out.
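A minimal, single-process Python sketch of the two-phase commit idea follows. It is not R*'s implementation: message passing is replaced by method calls, and a lost reply detected by time-out is modelled, as an assumption, by a vote of `None`. The names `Vote`, `Participant` and `two_phase_commit` are hypothetical.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """A site taking part in the transaction (simplified)."""

    def __init__(self, name, vote):
        self.name = name
        self.vote = vote        # Vote.YES, Vote.NO, or None (reply lost, timed out)
        self.state = "active"

    def prepare(self):
        # Phase 1: a site voting YES enters the prepared state and waits.
        if self.vote is Vote.YES:
            self.state = "prepared"
        return self.vote

    def finish(self, decision):
        # Phase 2: every site applies the coordinator's decision.
        self.state = decision


def two_phase_commit(participants):
    """Coordinator: commit only if every participant voted YES."""
    votes = [p.prepare() for p in participants]
    decision = "committed" if all(v is Vote.YES for v in votes) else "aborted"
    for p in participants:
        p.finish(decision)
    return decision


group = [Participant("SAN_JOSE", Vote.YES), Participant("SITE_B", Vote.YES)]
print(two_phase_commit(group))  # committed
```

A single NO vote or timed-out reply makes the coordinator abort everywhere, which is what keeps the outcome uniform.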
Query preparation: Name resolution. Authorization: each site checks authorization on its own local data and trusts the remote sites. Global compilation plan produced by the master, including access strategies. Plan distribution and local compilation of the parts. Final code generated at the master (two-phase compilation). Optimization of access paths includes minimizing query execution time.
Query execution: Code is loaded locally; execution proceeds in parallel, with messages for communication. Concurrency control: distributed deadlock detection by periodically checking, at each site, wait-for information gathered locally or from other sites; a deadlock cycle is broken by aborting a transaction. Logging and recovery: resources are held only if a transaction fails after entering the second phase of the commit protocol.
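The deadlock check can be sketched in Python as cycle detection over merged wait-for information. This is a simplified illustration rather than R*'s distributed algorithm: it assumes each site reports its wait-for edges as a dictionary, and the victim rule (abort the transaction with the highest number in the cycle) is an assumption made for the example. All function names are hypothetical.

```python
def merge_wait_for(site_reports):
    """Combine per-site wait-for edges (waiter -> set of blockers)."""
    merged = {}
    for report in site_reports:
        for waiter, blockers in report.items():
            merged.setdefault(waiter, set()).update(blockers)
    return merged


def find_cycle(wait_for):
    """Return the transactions on one wait-for cycle, or None."""
    done = set()  # transactions already shown not to lead to a cycle

    def visit(txn, path):
        if txn in path:
            return path[path.index(txn):]
        if txn in done:
            return None
        path.append(txn)
        for blocker in sorted(wait_for.get(txn, ())):
            cycle = visit(blocker, path)
            if cycle is not None:
                return cycle
        path.pop()
        done.add(txn)
        return None

    for txn in sorted(wait_for):
        cycle = visit(txn, [])
        if cycle is not None:
            return cycle
    return None


def choose_victim(site_reports):
    """Pick a transaction to abort (assumed rule: highest number in the cycle)."""
    cycle = find_cycle(merge_wait_for(site_reports))
    return None if cycle is None else max(cycle)


# Each site only sees one edge; the cycle 1 -> 2 -> 3 -> 1 appears after merging.
reports = [{1: {2}}, {2: {3}}, {3: {1}}]
print(choose_victim(reports))  # 3
```

No single site's report contains a cycle here, which is why wait-for information from other sites has to be combined before the check.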
SQL additions and changes: SQL is extended to include the distributed capabilities.
Conclusions: November 1981: R* experimental prototype system. Key ingredient: autonomy of the sites. Distributed data authorization, compilation, commit etc. Based on a master-apprentices approach and two-phase protocols. Transparent network topology, data definition and management. A promising step towards a REAL DISTRIBUTED DBMS.
THANK YOU!! Questions??