Chapter 11 Grid Concurrency Control
11.1 A Grid Database Environment
11.2 An Example
11.3 Grid Concurrency Control (GCC)
11.4 Correctness of GCC
11.5 Features of GCC Protocol
11.6 Summary
11.7 Bibliographical Notes
11.8 Exercises
Grid Concurrency Control
A concurrency control protocol helps maintain the consistency of data in a database.
Concurrency control protocols address the ‘C’ (consistency) and ‘I’ (isolation) of the ACID properties.
Serializability is the most widely accepted correctness criterion.
Different database architectures need different concurrency control protocols; for example, the protocol for a centralized DBMS differs from that of a distributed DBMS.
D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008
11.1 A Grid Database Environment
Data is geographically distributed in a Grid environment.
The figure shows the typical operation of databases in a Grid architecture: a distributed Grid database with three sites, DB1, DB2, and DB3, connected via Grid middleware.
Transactions can be submitted at any site and may need to access data at every site.
The originator (coordinator) is the site where a transaction is submitted.
Transactions T1 and T2 are submitted to DB1, and they need to access data from DB2 and DB3 as well.
Sub-transactions are named by suffixing the transaction and site identifiers: T1 has sub-transactions ST12 and ST13, and T2 has sub-transactions ST22 and ST23.
11.1 A Grid Database Environment (Cont’d)
Data access must be synchronized to maintain the correctness of data.
Global lock tables, global logs, and similar global structures cannot be implemented in a Grid environment.
Different database sites may implement different concurrency control protocols; for example, one site may use locking while another uses optimistic concurrency control.
This situation is unavoidable in a Grid architecture because the database sites are heterogeneous.
11.2 An Example
The following example shows that using traditional concurrency control protocols in a Grid environment can corrupt the data.
Consider four data objects stored in two databases, DB2 and DB3:
  DB2 = {O1, O2}
  DB3 = {O3, O4}
Two transactions are submitted to database DB1:
  T1 = r1(O1) r1(O2) w1(O3) w1(O1) C1
  T2 = r2(O1) r2(O3) w2(O4) w2(O1) C2
The transactions are passed to the Grid middleware, and the metadata service forms the required sub-transactions.
Sub-transactions of T1:
  ST12 = r12(O1) r12(O2) w12(O1) C12    (11.1)
  ST13 = w13(O3) C13                    (11.2)
Sub-transactions of T2:
  ST22 = r22(O1) w22(O1) C22            (11.3)
  ST23 = r23(O3) w23(O4) C23            (11.4)
11.2 An Example (Cont’d)
The sub-transactions are submitted to their respective sites: ST12 and ST22 go to DB2, and ST13 and ST23 go to DB3.
Because all database sites are autonomous, each site creates its schedule (history) independently.
Suppose DB2 creates the following history:
  H2 = r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22    (11.5)
and DB3 creates the following history:
  H3 = r23(O3) w23(O4) C23 w13(O3) C13                    (11.6)
Equation 11.5 gives the serialization order T1 before T2, whereas equation 11.6 gives T2 before T1.
Each history is correct in isolation, but when the two are combined the serializability graph contains the cycle T1 → T2 → T1.
A traditional distributed database handles this situation with a global management layer, which is not possible in Grid databases. The Grid Concurrency Control protocol, discussed next, addresses this problem.
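The cycle can be checked mechanically. The following is a minimal sketch, not part of the original slides: it parses the history strings of equations 11.5 and 11.6, derives an arc Ti → Tj whenever an operation of Ti precedes a conflicting operation of Tj, and looks for a cycle in the combined graph. The names `conflict_edges` and `has_cycle` are hypothetical, and the parser assumes single-digit transaction and site identifiers, as in the example.

```python
import re
from collections import defaultdict

# Token such as "r12(O1)": operation kind, transaction digit, site digit, object.
# Commit tokens (C12, C22, ...) do not match and are ignored.
OP = re.compile(r"([rw])(\d)(\d)\((\w+)\)")


def conflict_edges(history):
    """Return the set of arcs (Ti, Tj) implied by conflicting operations."""
    ops = [(kind, int(txn), obj) for kind, txn, _site, obj in OP.findall(history)]
    edges = set()
    for i, (k1, t1, o1) in enumerate(ops):
        for k2, t2, o2 in ops[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same object, and at least one of them is a write.
            if t1 != t2 and o1 == o2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


H2 = "r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22"  # equation 11.5
H3 = "r23(O3) w23(O4) C23 w13(O3) C13"                  # equation 11.6

arcs_db2 = conflict_edges(H2)   # {(1, 2)}: T1 before T2
arcs_db3 = conflict_edges(H3)   # {(2, 1)}: T2 before T1
print(has_cycle(arcs_db2), has_cycle(arcs_db3), has_cycle(arcs_db2 | arcs_db3))
# prints: False False True
```

Each site sees an acyclic graph, so neither local scheduler can detect the problem; the cycle only appears when the arcs from both sites are merged, which is the step that needs a global view.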
11.3 Grid Concurrency Control (GCC)
The example above motivates GCC: although each site generates a serializable schedule, the transactions may be ordered inconsistently in the global view.
Functions required by GCC:
  DB_Accessed(T): takes a global transaction and returns the set of databases to which its sub-transactions are submitted.
  Split_Trans(T): takes a global transaction and returns its set of sub-transactions.
  Active_Trans(DB): takes a database and returns the set of global transactions that have a sub-transaction running in that database.
  Cardinality(Any Set): takes any set, e.g. a set of databases or a set of sub-transactions, and returns the number of elements in it.
  Append_TS(Subtransaction): takes a sub-transaction and attaches a timestamp to it. The timestamp is unique to the global transaction, so sub-transactions of the same global transaction carry the same timestamp value.
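One possible reading of these five functions is sketched below. This is an illustration under stated assumptions, not the book's implementation: the `CATALOG` dictionary stands in for the metadata service, a global transaction is represented as a list of (operation, object) pairs, and `SubTransaction`, `new_timestamp`, and the lower-case function names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Assumption: a static object-to-site map plays the role of the metadata service.
CATALOG = {"O1": "DB2", "O2": "DB2", "O3": "DB3", "O4": "DB3"}


@dataclass
class SubTransaction:
    txn: str                         # global transaction id, e.g. "T1"
    site: str                        # database where it runs, e.g. "DB2"
    ops: tuple                       # operations routed to that site
    timestamp: Optional[int] = None  # filled in by append_ts


def db_accessed(ops):
    """DB_Accessed(T): set of databases touched by the global transaction."""
    return {CATALOG[obj] for _kind, obj in ops}


def split_trans(txn_id, ops):
    """Split_Trans(T): one sub-transaction per database, operations in order."""
    by_site = {}
    for kind, obj in ops:
        by_site.setdefault(CATALOG[obj], []).append((kind, obj))
    return [SubTransaction(txn_id, site, tuple(site_ops))
            for site, site_ops in sorted(by_site.items())]


def active_trans(db, running):
    """Active_Trans(DB): global transactions with a sub-transaction at db."""
    return {st.txn for st in running if st.site == db}


def cardinality(any_set):
    """Cardinality(Any Set): number of elements in the set."""
    return len(any_set)


_clock = count(1)


def new_timestamp():
    """Draw one timestamp per global transaction (hypothetical helper)."""
    return next(_clock)


def append_ts(sub, timestamp):
    """Append_TS(Subtransaction): siblings are stamped with the same value."""
    sub.timestamp = timestamp
    return sub


T1 = [("r", "O1"), ("r", "O2"), ("w", "O3"), ("w", "O1")]
subs = split_trans("T1", T1)          # corresponds to ST12 and ST13
ts = new_timestamp()
for sub in subs:
    append_ts(sub, ts)
print(cardinality(db_accessed(T1)), [s.site for s in subs])
```

Passing the timestamp in explicitly is a design choice of this sketch; it makes the "same timestamp for all sub-transactions of one global transaction" rule visible at the call site.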
11.3 Grid Concurrency Control (GCC) (Cont’d)
Grid Serializability Theorem
Traditional conflict serializability is not sufficient to ensure consistency in a Grid database environment; a Grid serializability theorem is needed to ensure the correctness of data.
Global transactions fall into two categories:
  global transactions with only one sub-transaction
  global transactions with more than one sub-transaction
Total order is defined as below:
[Definition of total order: shown as a figure in the original slides]
11.3 Grid Concurrency Control (GCC) (Cont’d)
In traditional serializability theory, a serial history is considered correct. On the same grounds, a Grid-serial history is considered correct in a Grid architecture.
A Grid-serial history is defined as below:
[Definition 11.2, Grid-serial history: shown as a figure in the original slides]
Condition (1) of Definition 11.2 is very strict and does not allow interleaving of operations.
Hence a more practical notion, the Grid-serializable history, is used.
11.3 Grid Concurrency Control (GCC) (Cont’d)
Grid-serializable history:
[Definition of Grid-serializable history: shown as a figure in the original slides]
Grid serializability is analysed with the Grid serializability graph.
If the graph is acyclic, the history is Grid serializable.
11.3 Grid Concurrency Control (GCC) (Cont’d)
The Grid serializability graph is defined as below:
[Definition of Grid serializability graph: shown as a figure in the original slides]
11.3 Grid Concurrency Control (GCC) (Cont’d)
Condition (1) covers local transactions in the Grid serializability graph.
Condition (2) covers only those global transactions that have more than one sub-transaction.
Condition (3) defines the arc between conflicting transactions.
The Grid serializability graph is stored at the local sites because there is no global management layer.
The following types of conflict are possible:
  conflict between global transactions (global-global conflict)
  conflict between a global transaction and a local transaction (global-local conflict)
  conflict between local transactions (local-local conflict)
An acyclic Grid serializability graph is used to resolve global-local conflicts.
Total order is used to resolve global-global conflicts.
11.3 Grid Concurrency Control (GCC) (Cont’d)
Based on the Grid serializability graph and total order, the Grid serializability theorem is as follows:
[Grid serializability theorem: shown as a figure in the original slides]
11.3 Grid Concurrency Control (GCC) (Cont’d)
Example of a Grid serializability graph
In addition to the global transactions of the earlier example, consider the following local transactions (LT12 is read as local transaction 1 at database site DB2):
  LT12 = lr12(O1) lw12(O2) lC12
  LT13 = lw13(O3) lC13
Now consider the following modified histories:
  H2 = lr12(O1) r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) lw12(O2) C22 lC12
  H3 = r23(O3) w23(O4) lw13(O3) C23 w13(O3) C13 lC13
11.3 Grid Concurrency Control (GCC) (Cont’d)
[Figure: Grid serializability graphs at sites DB2 and DB3]
The three possible types of conflict are discussed below:
  Global-global conflict: at site DB2, ST12 precedes ST22 (i.e. T1 precedes T2), and at site DB3, ST23 precedes ST13 (i.e. T2 precedes T1). The cycle is therefore spread across different sites, and it may be impossible to identify it without a global management layer. The total order used in Grid serializability prevents cycles from forming across distributed sites.
  Global-local conflict: can be identified and resolved by the local DBMS, e.g. between ST12 and LT12 in DB2.
  Local-local conflict: can be identified and resolved by the local DBMS, as in a traditional DBMS.
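To make the three conflict types concrete, the sketch below labels every conflict arc in the modified histories H2 and H3 of the preceding example. It is an illustration only: the token format (an optional leading `l` for local operations, single-digit identifiers) follows the notation of the example, and the function name `classify_conflicts` is hypothetical.

```python
import re

# Token such as "lr12(O1)" or "w13(O3)": optional local marker, operation kind,
# transaction digit, site digit, object. Commit tokens are ignored.
TOKEN = re.compile(r"(l?)([rw])(\d)(\d)\((\w+)\)")

KINDS = ("global-global", "global-local", "local-local")


def classify_conflicts(history):
    """Return a set of (first, second, kind) for each pair of conflicting transactions."""
    ops = []
    for local, kind, txn, site, obj in TOKEN.findall(history):
        name = f"LT{txn}{site}" if local else f"T{txn}"
        ops.append((name, kind, obj))
    found = set()
    for i, (n1, k1, o1) in enumerate(ops):
        for n2, k2, o2 in ops[i + 1:]:
            if n1 != n2 and o1 == o2 and "w" in (k1, k2):
                # Count how many of the two transactions are local (0, 1 or 2).
                local_count = n1.startswith("LT") + n2.startswith("LT")
                found.add((n1, n2, KINDS[local_count]))
    return found


H2 = "lr12(O1) r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) lw12(O2) C22 lC12"
H3 = "r23(O3) w23(O4) lw13(O3) C23 w13(O3) C13 lC13"

for site, history in (("DB2", H2), ("DB3", H3)):
    for first, second, kind in sorted(classify_conflicts(history)):
        print(site, first, "->", second, kind)
```

At DB2 the output contains both LT12 -> T1 and T1 -> LT12, a global-local cycle that lies entirely within one site and is therefore visible to the local scheduler. The global-global arcs (T1 -> T2 at DB2, T2 -> T1 at DB3) point in opposite directions at the two sites, which no single site can observe.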
11.3 Grid Concurrency Control (GCC) (Cont’d)
Grid Concurrency Control Protocol
The protocol has two phases: submission and termination.
The site where a transaction is submitted is called the originator.
The Split_Trans(T) function generates the sub-transactions of a global transaction.
The sub-transactions are then submitted to the participating sites.
A timestamp is attached to each sub-transaction before it is submitted.
Sub-transactions at local databases are executed in total order.
A local scheduler does not distinguish between a local transaction and a sub-transaction of a global transaction.
Global transactions with only one sub-transaction do not need to follow the total order, because they cannot conflict with another global transaction at more than one site.
11.3 Grid Concurrency Control (GCC) (Cont’d)
[Figure: Submission phase of GCC]
11.3 Grid Concurrency Control (GCC) (Cont’d)
Step 1: Check whether data from multiple sites needs to be accessed.
  If only data at the originator is required, the transaction is treated as a local transaction.
  If more than one database needs to be accessed, the transaction is submitted to the metadata service, and the Split_Trans(T) function creates the sub-transactions.
Step 2: The global transaction is added to Active_Trans, the set that stores all currently executing global transactions.
Step 3: The middleware appends a timestamp to every sub-transaction before submitting it to its database.
Step 4: If more than one active global transaction accesses more than one database at the same time, the sub-transactions are executed in total order (according to the timestamp).
Step 5: When all sub-transactions of a global transaction have finished executing, the global transaction is removed from the Active_Trans set (details are in the termination phase of GCC).
Note: Active_Trans is the set of currently active global transactions, whereas Active_Trans(DB) is a function that takes a database site as its argument and returns the active transactions executing in that database.
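The submission steps can be sketched roughly as follows. This is a simplified model under several assumptions, not the protocol as specified in the book: the `Originator` class, its attribute names, and the `CATALOG` map (standing in for the metadata service) are hypothetical, timestamps come from a simple counter, and any transaction that touches a single database is kept out of the total order.

```python
from itertools import count

# Assumption: a static object-to-site map plays the role of the metadata service.
CATALOG = {"O1": "DB2", "O2": "DB2", "O3": "DB3", "O4": "DB3"}


class Originator:
    def __init__(self):
        self._clock = count(1)
        self.active_trans = set()   # Step 2: currently executing global transactions
        self.site_queues = {}       # site -> list of (timestamp, txn_id, operations)
        self.unordered = []         # single-site work that skips the total order

    def submit(self, txn_id, ops):
        # Step 1: find out which databases the transaction touches.
        sites = sorted({CATALOG[obj] for _kind, obj in ops})
        if len(sites) == 1:
            self.unordered.append((sites[0], txn_id))
            return None
        # Step 2: register the global transaction as active.
        self.active_trans.add(txn_id)
        # Step 3: one timestamp shared by every sub-transaction of this transaction.
        timestamp = next(self._clock)
        for site in sites:
            site_ops = [(kind, obj) for kind, obj in ops if CATALOG[obj] == site]
            self.site_queues.setdefault(site, []).append((timestamp, txn_id, site_ops))
        return timestamp

    def execution_order(self, site):
        # Step 4: every site runs sub-transactions in ascending timestamp order.
        queue = sorted(self.site_queues.get(site, []), key=lambda entry: entry[0])
        return [txn_id for _ts, txn_id, _ops in queue]


T1 = [("r", "O1"), ("r", "O2"), ("w", "O3"), ("w", "O1")]
T2 = [("r", "O1"), ("r", "O3"), ("w", "O4"), ("w", "O1")]

node = Originator()
node.submit("T1", T1)
node.submit("T2", T2)
print(node.execution_order("DB2"), node.execution_order("DB3"))
# prints: ['T1', 'T2'] ['T1', 'T2']
```

Because both sites sort by the same timestamps, they agree on the relative order of T1 and T2 without exchanging any messages with each other. Step 5 (removal from `active_trans`) belongs to the termination phase and is not modelled here.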
11.3 Grid Concurrency Control (GCC) (Cont’d)
Termination phase of GCC
A global transaction remains active as long as at least one of its sub-transactions is executing.
The steps of termination are as follows:
  When a sub-transaction finishes execution, the originator is informed.
  The Active Transactions set, the Conflicting Active Transactions set, and the set of databases accessed by the global transaction are updated accordingly.
  Check whether the completed sub-transaction is the last sub-transaction of the global transaction:
    If it is not the last, the sub-transactions waiting in the queue cannot be scheduled.
    If it is the last, the other conflicting sub-transactions can be scheduled. Sub-transactions taken from the queue then follow the normal submission steps.
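A minimal sketch of this bookkeeping is given below. It is an interpretation, not the book's algorithm: the class `TerminationPhase` and its method names are hypothetical, and the waiting queue is modelled simply as a list of transactions blocked behind one specific active global transaction.

```python
class TerminationPhase:
    def __init__(self):
        self.active = {}    # global txn -> sites where a sub-transaction still runs
        self.waiting = {}   # active txn -> queued txns that conflict with it

    def begin(self, txn, sites):
        """Record a newly submitted global transaction and its sites."""
        self.active[txn] = set(sites)

    def wait_behind(self, queued_txn, active_txn):
        """Queue a transaction that conflicts with an active one."""
        self.waiting.setdefault(active_txn, []).append(queued_txn)

    def on_subtransaction_done(self, txn, site):
        """Called when the originator is informed that a sub-transaction finished.

        Returns the queued transactions that may now go through the normal
        submission steps (an empty list while txn is still active).
        """
        remaining = self.active[txn]
        remaining.discard(site)
        if remaining:
            # Not the last sub-transaction: the queue stays blocked.
            return []
        # Last sub-transaction: the global transaction is no longer active.
        del self.active[txn]
        return self.waiting.pop(txn, [])


phase = TerminationPhase()
phase.begin("T1", ["DB2", "DB3"])
phase.wait_behind("T2", "T1")
first = phase.on_subtransaction_done("T1", "DB2")    # ST12 finished
second = phase.on_subtransaction_done("T1", "DB3")   # ST13 finished, T1 complete
print(first, second)
# prints: [] ['T2']
```

The point of the sketch is the asymmetry between the two calls: finishing ST12 alone releases nothing, because T1 is still active at DB3, and only the completion of the last sub-transaction lets T2 proceed.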
11.3 Grid Concurrency Control (GCC) (Cont’d)
[Figure: Termination phase of GCC]
11.3 Grid Concurrency Control (GCC) (Cont’d)
Revisiting the example of Section 11.2
Suppose transaction T1 has timestamp 1 and T2 has timestamp 2.
History H2, produced by site DB2, is a serial history (equation 11.5) with T1 preceding T2.
GCC will not schedule the transactions as in H3 (equation 11.6), because of Step 4 of the submission phase. It always follows the total order based on the timestamp, so sub-transactions of T1 are always scheduled before sub-transactions of T2.
GCC therefore generates the following histories in place of (11.5) and (11.6):
  H2 = r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22    (same as 11.5)
  H3 = w13(O3) C13 r23(O3) w23(O4) C23                    (execution order corrected by the GCC protocol)
Thus both schedules order the transactions in total order, with T1 preceding T2.
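The corrected H3 can be reproduced with a few lines of code. The sketch below assumes, as a simplification, that each site runs its sub-transactions strictly one after another in ascending timestamp order; the function name `site_history` and the (timestamp, operations) tuples are hypothetical.

```python
def site_history(subtransactions):
    """Run sub-transactions serially in ascending timestamp order."""
    ordered = sorted(subtransactions, key=lambda entry: entry[0])
    return " ".join(ops for _timestamp, ops in ordered)


# Arrival order at each site is arbitrary; T1 carries timestamp 1, T2 timestamp 2.
arrivals_db2 = [(1, "r12(O1) r12(O2) w12(O1) C12"), (2, "r22(O1) w22(O1) C22")]
arrivals_db3 = [(2, "r23(O3) w23(O4) C23"), (1, "w13(O3) C13")]

H2 = site_history(arrivals_db2)
H3 = site_history(arrivals_db3)
print(H2)  # r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22
print(H3)  # w13(O3) C13 r23(O3) w23(O4) C23
```

Although ST23 arrives at DB3 before ST13, sorting by timestamp places ST13 first, so DB3 now agrees with DB2 that T1 precedes T2.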
11.3 Grid Concurrency Control (GCC) (Cont’d)
Comparison with traditional concurrency control protocols
[Figure: Operations of a general centralised locking protocol (e.g. centralised two-phase locking) in a homogeneous distributed DBMS]
[Figure: Operations of a general distributed locking protocol (e.g. decentralised two-phase locking) in a homogeneous distributed DBMS]
[Figure: Operations of a general multi-DBMS protocol]
[Figure: Operations of the GCC protocol]
11.4 Correctness of GCC Protocol
A Grid-serializable schedule is considered correct in a Grid environment.
A concurrency control protocol that conforms to Theorem 11.1 is Grid serializable and is therefore correct.
Proposition 11.1: All local transactions and global sub-transactions submitted to any local scheduler are scheduled in serializable order.
Proposition 11.2: Any two global transactions that have more than one sub-transaction actively executing simultaneously must follow total order.
Based on Propositions 11.1 and 11.2, the following theorem can be proved:
Theorem 11.2: Every schedule produced by the GCC protocol is Grid serializable.
11.5 Features of GCC Protocol
Concurrency control in a heterogeneous environment: GCC does not use a global lock table or similar global structure, and hence can work in an autonomous, heterogeneous environment.
Reduced load on the originator site: because GCC does not use a centralized scheduling scheme, originator sites carry less load.
Fewer messages in the inter-network: communication between the originator and the other participating sites is reduced.
However, because there is no global management layer, some valid interleavings may not be possible, so the resulting schedules may be stricter than necessary.
11.6 Summary
A global management layer cannot be used in a Grid environment.
The GCC protocol maintains the correctness of data in a Grid environment.
The GCC protocol can work in a heterogeneous environment.
Optimizing the scheduling process may be hard.
The focus of the chapter was maintaining the consistency of data in Grid databases.
Continue to Chapter 12…