Transactions and Concurrency Control Fall 2007 Himanshu Bajpai
Topics: Transaction Services, Serialization, Concurrency Control Protocols
The Concept of ‘Transaction’ and Transaction Services [1]
A transaction is the basic logical unit of execution in an information system: a sequence of operations that must be executed as a whole.
Example: transfer $500 from ACCOUNT A (Fred Bloggs, $1000) to ACCOUNT B (Fred Bloggs, $0):
1. Debit A
2. Credit B
Contd..
- The database system must ensure that either both (1) and (2) happen or that neither happens. Otherwise the database is left in an inconsistent state.
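The all-or-nothing requirement can be illustrated with a small sketch. It is not taken from the slides: the `transfer` function, the `fail_after_debit` flag and the in-memory `accounts` dictionary are hypothetical stand-ins for a real DBMS, which would use logging and recovery rather than a dictionary snapshot.

```python
def transfer(accounts, src, dst, amount, fail_after_debit=False):
    """Move `amount` from src to dst; on any failure restore the old state."""
    snapshot = dict(accounts)          # remember the state before the transaction
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount        # (1) debit
        if fail_after_debit:
            # hypothetical failure between the two steps
            raise RuntimeError("simulated crash between debit and credit")
        accounts[dst] += amount        # (2) credit
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # roll back: neither step is visible
        return False
    return True                        # commit: both steps are visible


accounts = {"A": 1000, "B": 0}
committed = transfer(accounts, "A", "B", 500)
state_after_commit = dict(accounts)
aborted = transfer(accounts, "A", "B", 100, fail_after_debit=True)
```

After the second call the balances are the same as after the first, because the debit was undone when the credit could not be applied.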
Requirements for Database Consistency [3]
- The simultaneous execution of many different application programs must be such that no transaction interferes with another transaction.
- The concurrent execution of transactions must be such that each transaction appears to execute in isolation.
Desirable Properties of Transactions (ACID)
- Atomicity: a transaction is an atomic unit of processing; it is either performed entirely or not at all
- Consistency Preservation: a transaction's correct execution must take the database from one correct state to another
- Isolation: the updates of a transaction must not be made visible to other transactions until it is committed (solves the temporary update problem)
Contd..
- Durability or Permanency: if a transaction changes the database and is committed, the changes must never be lost because of a subsequent failure
- Serializability: transactions are considered serializable if the effect of running them in an interleaved fashion is equivalent to running them serially in some order
Transaction as a Concurrency Unit [2]
- Transactions must be synchronised correctly to guarantee database consistency
Initial state: ACCOUNT A (Fred Bloggs) $1000; ACCOUNT B (Fred Bloggs) $0; ACCOUNT C (Fred Bloggs) $200
T1: transfer $500 from A to B (1. Debit A, 2. Credit B)
T2: transfer $300 from B to C (1. Debit B, 2. Credit C)
T1 and T2 run simultaneously.
Net result: ACCOUNT A $500; ACCOUNT B $200; ACCOUNT C $500
Serialization [4]
A mechanism controls concurrency among database transactions through the use of serial ordering relations. The ordering relations are computed dynamically in response to patterns of use. In one such scheme, a transaction that accesses a resource is serialized before a transaction that modifies the resource, even if the accessor starts after the modifier starts or commits after the modifier commits.
Contd… A method of concurrency control for a database transaction in a distributed database system stores an intended use of a database system resource by the database transaction in a serialization graph. A serialization ordering is asserted between the database transaction and other database transactions based on the intended use of the database system resource by the database transaction.
Contd… The serialization ordering is then communicated to a node in the distributed database system that needs to know the serialization ordering to perform concurrency control. Cycles in the serialization graph are detected based on the asserted serialization order. To break such cycles and ensure transaction serializability, a database transaction that is a member of a cycle in the serialization graph is identified.
Concurrency Control Protocols [4]
Concurrency in Transaction Execution
- There is a need to ensure that concurrent transactions do not interfere with each other's operations
- Most DBMSs are multi-user systems
- Transaction scheduling algorithms
Transaction Serializability
The effect on a database of any number of transactions executing in parallel must be the same as if they were executed one after another
The Need for Concurrency Control
- The concurrent execution of transactions may lead, if uncontrolled, to problems such as an inconsistent database
- Concurrency control techniques are used to ensure that multiple transactions submitted by various users do not interfere with one another in a way that produces incorrect results
Read and Write Operations of a Transaction
- read_item(X): reads a database item named X into a program variable also named X. Execution of the command includes the following steps:
  - find the address of the disk block that contains item X
  - copy that disk block into a buffer in main memory
  - copy item X from the buffer to the program variable named X
- write_item(X): writes the value of program variable X into the database item named X. Execution of the command includes the following steps:
  - find the address of the disk block that contains item X
  - copy that disk block into a buffer in main memory
  - copy item X from the program variable named X into its correct location in the buffer
  - store the updated block from the buffer back to disk (this step updates the database on disk)
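The steps of read_item and write_item can be mimicked with a toy in-memory model. This is only a sketch under stated assumptions: the `SimpleDatabase` class, its dictionary-based "disk" and its `_find_block` helper are hypothetical, and a real DBMS would normally skip the copy from disk when the block is already in the buffer.

```python
class SimpleDatabase:
    """Toy model: the 'disk' is a dict of blocks, each block a dict of items."""

    def __init__(self, blocks):
        self.disk = {no: dict(items) for no, items in blocks.items()}
        self.buffer = {}      # main-memory copies of disk blocks
        self.variables = {}   # program variables, named after the items

    def _find_block(self, name):
        # Step 1: find the address of the disk block that contains the item.
        for block_no, items in self.disk.items():
            if name in items:
                return block_no
        raise KeyError(name)

    def read_item(self, name):
        block_no = self._find_block(name)
        self.buffer[block_no] = dict(self.disk[block_no])    # copy block into buffer
        self.variables[name] = self.buffer[block_no][name]   # copy item into variable

    def write_item(self, name):
        block_no = self._find_block(name)
        self.buffer[block_no] = dict(self.disk[block_no])    # copy block into buffer
        self.buffer[block_no][name] = self.variables[name]   # update item in buffer
        self.disk[block_no] = dict(self.buffer[block_no])    # store block back to disk


db = SimpleDatabase({0: {"X": 100, "Y": 50}})
db.read_item("X")
db.variables["X"] -= 30          # X := X - 30, done only in the program variable
value_on_disk_before_write = db.disk[0]["X"]
db.write_item("X")
```

Until write_item runs, the change exists only in the program variable, which is exactly the window in which the interleaving problems arise.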
Problems due to the Concurrent Execution of Transactions [5]
- The Lost Update Problem
- The Temporary Update (uncommitted dependency) Problem
- The Incorrect Summary (inconsistent analysis) Problem
The Lost Update Problem
- Two transactions accessing the same database item have their operations interleaved in a way that makes the value of the database item incorrect.

T1                    T2
read_item(X);
X := X - N;
                      read_item(X);
                      X := X + M;
write_item(X);
read_item(Y);
                      write_item(X);    <- item X has an incorrect value because its update from T1 is lost
Y := Y + N;
write_item(Y);
(time runs downward)

If transactions T1 and T2 are submitted at approximately the same time and their operations are interleaved as shown, the value of database item X will be incorrect: T2 reads the value of X before T1 changes it in the database, and hence the updated database value resulting from T1 is lost.
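A minimal sketch of this interleaving, replayed step by step in plain Python. The starting values (X = 80, Y = 20, N = 5, M = 4) and the function name `run_lost_update` are illustrative assumptions, not part of the slides.

```python
def run_lost_update(x=80, y=20, n=5, m=4):
    """Replay the interleaved schedule; t1 and t2 hold each transaction's
    private program variables, db is the shared database."""
    db = {"X": x, "Y": y}
    t1, t2 = {}, {}

    t1["X"] = db["X"]      # T1: read_item(X)
    t1["X"] -= n           # T1: X := X - N
    t2["X"] = db["X"]      # T2: read_item(X)  (still sees the old X)
    t2["X"] += m           # T2: X := X + M
    db["X"] = t1["X"]      # T1: write_item(X)
    t1["Y"] = db["Y"]      # T1: read_item(Y)
    db["X"] = t2["X"]      # T2: write_item(X) overwrites T1's update
    t1["Y"] += n           # T1: Y := Y + N
    db["Y"] = t1["Y"]      # T1: write_item(Y)
    return db


result = run_lost_update()
expected_x_if_serial = 80 - 5 + 4   # either serial order gives the same X
```

With these values the interleaved run leaves X at 84, whereas either serial order would leave it at 79: the subtraction of N by T1 has disappeared.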
The Temporary Update Problem
- One transaction updates a database item and then fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value.

T1                    T2
read_item(X);
X := X - N;
write_item(X);
                      read_item(X);
                      X := X + M;
                      write_item(X);
read_item(Y);
(time runs downward)

Transaction T1 fails and must change the value of X back to its old value; meanwhile T2 has read the "temporary" incorrect value of X.
The Incorrect Summary Problem
- One transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records. The aggregate function may calculate some values before they are updated and others after.

T1                    T3
                      sum := 0;
                      read_item(A);
                      sum := sum + A;
read_item(X);
X := X - N;
write_item(X);
                      read_item(X);
                      sum := sum + X;
                      read_item(Y);
                      sum := sum + Y;
read_item(Y);
Y := Y + N;
write_item(Y);
(time runs downward)

T3 reads X after N is subtracted and reads Y before N is added, so a wrong summary is the result.
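The size of the error can be checked with a short replay of this schedule. The starting values (A = 10, X = 80, Y = 20, N = 5) and the function name `run_incorrect_summary` are assumptions chosen for illustration.

```python
def run_incorrect_summary(a=10, x=80, y=20, n=5):
    """Replay the schedule: T3 sums A, X and Y while T1 moves N from X to Y."""
    db = {"A": a, "X": x, "Y": y}

    total = 0                 # T3: sum := 0
    total += db["A"]          # T3: read_item(A); sum := sum + A
    t1_x = db["X"]            # T1: read_item(X)
    t1_x -= n                 # T1: X := X - N
    db["X"] = t1_x            # T1: write_item(X)
    total += db["X"]          # T3: reads X after N was subtracted
    total += db["Y"]          # T3: reads Y before N is added
    t1_y = db["Y"]            # T1: read_item(Y)
    t1_y += n                 # T1: Y := Y + N
    db["Y"] = t1_y            # T1: write_item(Y)

    true_total = sum(db.values())   # unchanged by the transfer
    return total, true_total


summary, true_total = run_incorrect_summary()
```

The transfer does not change the real total, yet T3 reports a sum that is short by exactly N.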
Schedules of Transactions [3]
- A schedule S of n transactions is a sequential ordering of the operations of the n transactions.
- A schedule maintains the order of operations within each individual transaction: for each transaction T participating in S, if operation i is performed in T before operation j, then operation i will be performed before operation j in S.
- Serializability theory attempts to determine the 'correctness' of schedules.
Serial, Nonserial and Serializable Schedules
- A schedule S is serial if, for every transaction T participating in S, all of T's operations are executed consecutively in the schedule; otherwise it is called nonserial.
- A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.
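The definition of a serial schedule translates directly into a check. In this sketch a schedule is assumed to be a list of (transaction, operation) pairs in execution order; that representation and the name `is_serial` are hypothetical.

```python
def is_serial(schedule):
    """Return True if every transaction's operations are consecutive."""
    finished = set()   # transactions whose run of operations has ended
    current = None     # transaction whose operations are currently running
    for txn, _operation in schedule:
        if txn != current:
            if txn in finished:
                return False          # txn resumes after another one ran
            if current is not None:
                finished.add(current)
            current = txn
    return True


serial_schedule = [
    ("T1", "read_item(X)"), ("T1", "write_item(X)"),
    ("T1", "read_item(Y)"), ("T1", "write_item(Y)"),
    ("T2", "read_item(X)"), ("T2", "write_item(X)"),
]
nonserial_schedule = [
    ("T1", "read_item(X)"), ("T2", "read_item(X)"),
    ("T1", "write_item(X)"), ("T1", "read_item(Y)"),
    ("T2", "write_item(X)"), ("T1", "write_item(Y)"),
]
```

Note that this only tests whether a schedule is serial. Deciding whether a nonserial schedule is serializable needs the equivalence test described later.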
Example of Serial Schedules
Schedule A

T1                    T2
read_item(X);
X := X - N;
write_item(X);
read_item(Y);
Y := Y + N;
write_item(Y);
                      read_item(X);
                      X := X + M;
                      write_item(X);
(time runs downward)
Example of Nonserial Schedules
Schedule A

T1                    T2
read_item(X);
X := X - N;
                      read_item(X);
                      X := X + M;
write_item(X);
read_item(Y);
                      write_item(X);
Y := Y + N;
write_item(Y);
(time runs downward)
The Constrained Write Assumption
- The new value of a data item is dependent only on its old value, and thus the concern is only for the read_item(X) and write_item(X) operations.
- Problems:
  (a) the value of the data item may depend on the values of other database items (in addition to its old value)
  (b) the value of the data item may be independent of any other database items
Example of the Constrained Write Assumption
read_item(X); ... (includes X := f(X)) ... write_item(X);
The Unconstrained Write Assumption
- This is only included for completeness: 'constrained write' is used in precedence graphs
- The new value of each database item in the set of all items written by a transaction (write set) is dependent on the values of some of the items found in the set of all items read by the transaction (read set)
Testing for Serializability of a Schedule (Under Constrained Write)
(1) for each transaction Ti participating in schedule S, create a node labelled Ti in the precedence graph;
(2) for each case in S where Tj executes a read_item(X) that reads the value of item X written by a write_item(X) command executed by Ti, create an edge (Ti -> Tj) in the precedence graph;
(3) for each case in S where Tj executes write_item(X) after Ti executes read_item(X), create an edge (Ti -> Tj) in the precedence graph;
(4) the schedule S is serializable if and only if the precedence graph has no cycles.
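The four steps can be sketched as follows. The schedule representation (a list of (transaction, "read" or "write", item) triples) and the function names are assumptions; the cycle check removes nodes without incoming edges until none are left, which is one of several standard ways to test for a cycle.

```python
def precedence_graph(schedule):
    """Build (nodes, edges) from a list of (txn, 'read'|'write', item) triples."""
    nodes = {txn for txn, _op, _item in schedule}           # step (1)
    edges = set()
    last_writer = {}   # item -> transaction that wrote it most recently
    readers = {}       # item -> transactions that have read it so far
    for txn, op, item in schedule:
        if op == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                edges.add((writer, txn))                    # step (2)
            readers.setdefault(item, set()).add(txn)
        elif op == "write":
            for reader in readers.get(item, set()):
                if reader != txn:
                    edges.add((reader, txn))                # step (3)
            last_writer[item] = txn
    return nodes, edges


def has_cycle(nodes, edges):
    """Repeatedly strip nodes with no incoming edge; leftovers imply a cycle."""
    remaining, live = set(nodes), set(edges)
    while remaining:
        sources = {n for n in remaining
                   if not any(dst == n for _src, dst in live)}
        if not sources:
            return True
        remaining -= sources
        live = {(s, d) for s, d in live if s not in sources}
    return False


def is_serializable(schedule):
    return not has_cycle(*precedence_graph(schedule))       # step (4)


serial = [("T1", "read", "X"), ("T1", "write", "X"), ("T1", "read", "Y"),
          ("T1", "write", "Y"), ("T2", "read", "X"), ("T2", "write", "X")]
interleaved = [("T1", "read", "X"), ("T2", "read", "X"), ("T1", "write", "X"),
               ("T1", "read", "Y"), ("T2", "write", "X"), ("T1", "write", "Y")]
```

Here `serial` and `interleaved` encode the serial and nonserial example schedules from the earlier slides, with the arithmetic steps omitted because only reads and writes matter for the graph.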
Example
Schedule A

T1                    T2
read_item(X);
X := X - N;
write_item(X);
read_item(Y);
Y := Y + N;
write_item(Y);
                      read_item(X);
                      X := X + M;
                      write_item(X);

Precedence graph for schedule A (serial): T1 -> T2 (edge labelled X)
Example
Schedule A

T1                    T2
read_item(X);
X := X - N;
                      read_item(X);
                      X := X + M;
write_item(X);
read_item(Y);
                      write_item(X);
Y := Y + N;
write_item(Y);

Precedence graph for schedule A (nonserial): T1 -> T2 and T2 -> T1 (both edges labelled X); the graph contains a cycle
Methods for Serializability [6]
- Locking protocols, if followed by every transaction, ensure serializability of all schedules in which the transactions participate. They lock data items to prevent multiple transactions from accessing the items concurrently.
- Timestamps are unique identifiers for each transaction and are generated by the system. Transactions can then be ordered according to their timestamps to ensure serializability.
- Multiversion concurrency control techniques keep the old values of a data item when that item is updated.
Locking Techniques for Concurrency Control
- Locking data items is one of the main techniques used for controlling the concurrent execution of transactions.
- A lock is a variable associated with a data item in the database. Generally there is a lock for each data item in the database.
- A lock describes the status of the data item with respect to the operations that can be applied to that item. It is used for synchronising the access by concurrent transactions to the database items.
Types of Locks
- Binary (exclusive) locks have two possible states: locked (lock_item(X) operation) and unlocked (unlock_item(X) operation).
- Multiple-mode (shared) locks allow concurrent access to the same item by several transactions. They have three possible states: read locked or shared locked (other transactions are allowed to read the item), write locked or exclusive locked (a single transaction exclusively holds the lock on the item), and unlocked.
Lock Type Compatibility Matrix
Y = yes (requests compatible)        X = binary (exclusive) lock
N = no (requests incompatible)       S = multiple-mode (shared) lock
                                     - = no lock

        X    S    -
X       N    N    Y
S       N    Y    Y
-       Y    Y    Y
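A lock manager could consult the matrix like this. The sketch is an assumption-laden illustration: the `COMPATIBLE` table and `can_grant` function are hypothetical names, and the "-" (no lock) row and column are represented by an empty list of held locks.

```python
# (held mode, requested mode) -> are the two compatible?
COMPATIBLE = {
    ("X", "X"): False,
    ("X", "S"): False,
    ("S", "X"): False,
    ("S", "S"): True,
}


def can_grant(requested, held_modes):
    """A request is granted only if it is compatible with every lock that
    other transactions currently hold on the item."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


shared_with_shared = can_grant("S", ["S", "S"])   # several readers may coexist
exclusive_with_shared = can_grant("X", ["S"])     # a writer must wait for readers
shared_with_exclusive = can_grant("S", ["X"])     # a reader must wait for the writer
anything_when_unlocked = can_grant("X", [])       # the "-" case: nothing is held
```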
Two-Phase Locking
All locking operations (read_lock, write_lock) precede the first unlock operation in the transaction.
Two phases:
- expanding phase: new locks on items can be acquired but none can be released
- shrinking phase: existing locks can be released but no new ones can be acquired

Not two-phase locking:
read_lock(Y); read_lock(X); read_item(Y); unlock(Y); read_item(X); X:=X+Y; write_lock(X); write_item(X); unlock(X);

Two-phase locking:
read_lock(X); read_item(X); write_lock(Y); unlock(X); read_item(Y); Y:=X+Y; write_item(Y); unlock(Y);
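Whether a transaction obeys the rule can be checked mechanically. The sketch below assumes a transaction is given as a list of operation strings written as on the slide; the function name `is_two_phase` is hypothetical.

```python
LOCK_OPERATIONS = {"read_lock", "write_lock"}


def is_two_phase(operations):
    """True if no lock is acquired after the first unlock."""
    shrinking = False                       # becomes True at the first unlock
    for op in operations:
        name = op.split("(", 1)[0].strip()  # "write_lock(X)" -> "write_lock"
        if name == "unlock":
            shrinking = True
        elif name in LOCK_OPERATIONS and shrinking:
            return False                    # lock requested in shrinking phase
    return True


not_two_phase = ["read_lock(Y)", "read_lock(X)", "read_item(Y)", "unlock(Y)",
                 "read_item(X)", "X:=X+Y", "write_lock(X)", "write_item(X)",
                 "unlock(X)"]
two_phase = ["read_lock(X)", "read_item(X)", "write_lock(Y)", "unlock(X)",
             "read_item(Y)", "Y:=X+Y", "write_item(Y)", "unlock(Y)"]
```

The first sequence fails the check because write_lock(X) is requested after unlock(Y); the second acquires both of its locks before releasing either.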
Locking Problems
- Deadlock: each of two transactions is waiting for the other to release an item.
- Approaches for solution:
  - deadlock prevention protocol: every transaction must lock all the items it needs in advance
  - deadlock detection (if the transaction load is light, or transactions are short and lock only a few items)
- Livelock: a transaction cannot proceed for an indefinite period of time while other transactions in the system continue normally.
  - Solution: fair waiting schemes (e.g. first-come-first-served)
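The slides do not say how deadlock detection is carried out. One common approach, assumed here, is to record which transaction each blocked transaction is waiting for and to look for a cycle in that "wait-for" relation. The `find_deadlock` function and the simplification that each transaction waits for at most one other are hypothetical.

```python
def find_deadlock(waits_for):
    """waits_for maps a blocked transaction to the one it is waiting for.
    Returns the transactions forming a cycle, or None if there is none."""
    for start in waits_for:
        seen = [start]
        current = waits_for.get(start)
        while current is not None:
            if current in seen:
                return seen[seen.index(current):]   # the cycle itself
            seen.append(current)
            current = waits_for.get(current)        # follow the waiting chain
    return None


deadlocked = find_deadlock({"T1": "T2", "T2": "T1"})   # each waits for the other
no_deadlock = find_deadlock({"T1": "T2", "T2": "T3"})  # T3 is not blocked
```

Once a cycle is found, a system would typically pick one of its members as a victim and abort it so that the others can proceed.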
References
[1] fall/lectures/07-cc.pdf
[2] final.ppt
[3]
[4] Connolly & Begg, Transaction Management and Concurrency Control, Chapter 19, Third edition
[5] sinfo/v6r0/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/cejb_cncr.html
[6] yControl.html, Scott W Ambler, 2006