Overview of Transaction Management
Introduction
The concept of a transaction is the foundation for concurrent execution and recovery from system failure in a DBMS.
What is a Transaction?
A transaction is any action that reads from and/or writes to a database. Examples:
- A simple SELECT statement that generates a list of table contents
- A series of related UPDATE statements that change the values of attributes in various tables
- A series of INSERT statements that add rows to one or more tables
- A combination of SELECT, UPDATE, and INSERT statements
ACID Properties
A DBMS must ensure four important properties of transactions to maintain data in the face of concurrent access and system failures.
Atomicity: Every transaction is guaranteed by the DBMS to be atomic. That is, a user can think of a transaction as carrying out either all of its actions or none of them.
- The DBMS logs all actions so that it can undo the actions of aborted transactions.
- It can also redo the actions of committed transactions that had not yet been propagated to disk when the system crashed.
ACID Properties (continued)
Durability: Once a transaction has completed successfully, its effects should persist even if the system crashes before all its changes are reflected on disk.
Serializability (or Consistency): Ensures that the concurrent execution of several transactions yields consistent results.
Isolation: Transactions are isolated, or protected, from the effects of other concurrently scheduled transactions. Data used during the execution of one transaction cannot be used by a second transaction until the first one is completed.
Transactions and Schedules
The read action of a transaction is denoted by R(A), where A is the object being read. Similarly, a write action is denoted by W(A).
A schedule represents an actual or potential execution sequence of the actions of a set of transactions.
For example, the execution order for the actions of two transactions T1 and T2:
T1 T2 R(A) W(A) R(B) W(B) R(C) W(C)
Transactions and Schedules (continued)
We move forward in time as we go down from one row to the next.
A schedule that contains either an abort or a commit for each transaction whose actions are listed in it is called a complete schedule.
If the actions of different transactions are not interleaved, that is, transactions are executed from start to finish, one by one, the schedule is called a serial schedule.
Interleaved execution is also called concurrent execution.
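The two definitions above can be checked mechanically. The following is a minimal sketch, assuming a schedule is encoded as a list of (transaction, operation, object) tuples; this encoding, the function names, and the sample schedules are illustrative assumptions, not part of the slides.

```python
# A schedule is a list of (transaction, operation, object) tuples.
# The object is None for COMMIT and ABORT. This encoding is an assumption.

def is_complete(schedule):
    """True if every transaction in the schedule ends with a commit or an abort."""
    transactions = {txn for txn, _, _ in schedule}
    finished = {txn for txn, op, _ in schedule if op in ("COMMIT", "ABORT")}
    return transactions == finished

def is_serial(schedule):
    """True if no transaction resumes after another transaction has started."""
    done = set()
    current = None
    for txn, _, _ in schedule:
        if txn != current:
            if txn in done:
                return False  # txn was interrupted earlier and resumes here
            if current is not None:
                done.add(current)
            current = txn
    return True

# Hypothetical schedules used only for illustration.
serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T1", "COMMIT", None),
          ("T2", "R", "B"), ("T2", "W", "B"), ("T2", "COMMIT", None)]
interleaved = [("T1", "R", "A"), ("T2", "R", "B"), ("T1", "W", "A"),
               ("T2", "W", "B"), ("T1", "COMMIT", None)]
```

Here `interleaved` is neither serial (T1 resumes after T2 has started) nor complete (T2 never commits or aborts).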
Concurrent Execution of Transactions
Why Concurrent Execution?
- While one transaction is waiting for a page to be read in from disk, the CPU can process another transaction, thereby reducing the idle time of the CPU and disks.
- It also increases system throughput (i.e., the average number of transactions completed in a given time).
- Interleaved execution of a short transaction with a long transaction usually allows the short transaction to complete quickly.
- In serial execution, a short transaction could get stuck behind a long transaction, leading to delays in response time.
Concurrent Execution of Transactions (continued)
A serializable schedule over a set S of committed transactions is a schedule whose effect on any consistent database instance is guaranteed to be identical to that of some complete serial schedule over S.
That is, the database instance that results from executing the given schedule is identical to the database instance that results from executing the transactions in some serial order.
A Serializable Schedule T1 T2 R(A) W(A) R(A) W(A) R(B) W(B) R(B) W(A) COMMIT COMMIT
Example
Consider two transactions (Xacts):
T1: BEGIN A=A+100, B=B-100 END
T2: BEGIN A=1.06*A, B=1.06*B END
The first Xact transfers $100 from B's account to A's. The second credits both accounts with 6% interest.
Assume that at first A and B each have $1000. What are the legal outcomes of running T1 and T2?
T1 ; T2 (A=1166, B=954)
T2 ; T1 (A=1160, B=960)
In either case, A+B = $2000 * 1.06 = $2120.
There is no guarantee that T1 will execute before T2 or vice versa if both are submitted together.
The sum of the balances of A and B should be the same regardless of whether T1 or T2 commits or aborts.
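Example (Contd.)
Consider a possible interleaved schedule in which T1 updates each account before T2 does:
T1: A=A+100; T2: A=1.06*A; T1: B=B-100; T2: B=1.06*B
This is OK: the result (A=1166, B=954) is the same as T1 ; T2.
But what about a schedule in which T2 runs entirely between T1's two updates:
T1: A=A+100; T2: A=1.06*A, B=1.06*B; T1: B=B-100
Result: A=1166, B=960; A+B = 2126, so the bank loses $6!
T1 followed by T2 and T2 followed by T1 are both OK, because in both cases A+B is 2120. This schedule matches neither.
The DBMS's view of the second schedule:
T1: R(A), W(A); T2: R(A), W(A), R(B), W(B); T1: R(B), W(B)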
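The arithmetic in this example can be replayed with a short script. This is a sketch under stated assumptions: each update is modelled as one indivisible step, and the step functions and the `run` helper are hypothetical names introduced only for illustration.

```python
from decimal import Decimal

RATE = Decimal("1.06")

# Each step mutates the database in place. Names are illustrative.
def t1_credit_a(db): db["A"] += 100
def t1_debit_b(db): db["B"] -= 100
def t2_interest_a(db): db["A"] *= RATE
def t2_interest_b(db): db["B"] *= RATE

def run(schedule):
    """Execute the steps in order, starting from A = B = 1000."""
    db = {"A": Decimal(1000), "B": Decimal(1000)}
    for step in schedule:
        step(db)
    return db["A"], db["B"]

serial_t1_t2 = run([t1_credit_a, t1_debit_b, t2_interest_a, t2_interest_b])
serial_t2_t1 = run([t2_interest_a, t2_interest_b, t1_credit_a, t1_debit_b])
good = run([t1_credit_a, t2_interest_a, t1_debit_b, t2_interest_b])
bad = run([t1_credit_a, t2_interest_a, t2_interest_b, t1_debit_b])

def matches_a_serial_order(outcome):
    """True if the outcome equals the result of T1;T2 or of T2;T1."""
    return outcome in (serial_t1_t2, serial_t2_t1)
```

Comparing final states against every serial order is only practical for tiny examples like this one; it is shown here because it follows the definition of a serializable schedule directly.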
Anomalies due to Interleaved Execution
- Reading Uncommitted Data (WR Conflicts)
- Unrepeatable Reads (RW Conflicts)
- Overwriting Uncommitted Data (WW Conflicts)
Overwriting Uncommitted Data (WW Conflicts, "blind write"):
T1: W(A), W(B), C
T2: W(A), W(B), C
Anomalies with Interleaved Execution
Reading Uncommitted Data (WR Conflicts, "dirty reads"):
T1: A=A+100, B=B-100
T2: A=1.06*A, B=1.06*B
Unrepeatable Reads (RW Conflicts):
T1: R(A), W(A), C
T2: R(A), W(A), C
Schedules Involving Aborted Transactions
Extended definition of a serializable schedule: a serializable schedule over a set S of transactions is a schedule whose effect on any consistent database instance is guaranteed to be identical to that of some complete serial schedule over the set of committed transactions in S.
T1: B=B-100, Abort
T2: A=1.06*A, B=1.06*B, Commit
T1: R(A), W(A), Abort
T2: R(A), W(A), R(B), W(B), Commit
An Unrecoverable Schedule: if T2 reads a value written by T1 and commits before T1 aborts, T2's committed effects cannot be undone.
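A recoverability check can be sketched as follows: a transaction may commit only after every transaction whose writes it has read has committed. The tuple encoding of schedules and the exact interleaving in `unrecoverable` are assumptions made for illustration; the slide gives the actions of each transaction but not their relative order.

```python
def is_recoverable(schedule):
    """Check that no transaction commits before the transactions it read from."""
    writers = {}     # object -> transactions that wrote it, oldest first
    reads_from = {}  # transaction -> transactions whose writes it has read
    committed = set()
    for txn, op, obj in schedule:
        if op == "R":
            history = writers.get(obj, [])
            if history and history[-1] != txn:
                reads_from.setdefault(txn, set()).add(history[-1])
        elif op == "W":
            writers.setdefault(obj, []).append(txn)
        elif op == "COMMIT":
            if not reads_from.get(txn, set()) <= committed:
                return False  # txn depends on a transaction that has not committed
            committed.add(txn)
        elif op == "ABORT":
            # The aborted transaction's writes are undone.
            for name in writers:
                writers[name] = [w for w in writers[name] if w != txn]
    return True

# Assumed interleaving: T2 reads A after T1 writes it and commits before T1 aborts.
unrecoverable = [("T1", "R", "A"), ("T1", "W", "A"),
                 ("T2", "R", "A"), ("T2", "W", "A"),
                 ("T2", "R", "B"), ("T2", "W", "B"), ("T2", "COMMIT", None),
                 ("T1", "ABORT", None)]

# Same reads and writes, but T1 commits first, so T2's commit is safe.
recoverable = [("T1", "R", "A"), ("T1", "W", "A"),
               ("T2", "R", "A"), ("T2", "W", "A"),
               ("T1", "COMMIT", None), ("T2", "COMMIT", None)]
```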
Concurrency Control with Locking Methods
A lock:
- Guarantees exclusive use of a data item to the current transaction
- Is required to prevent another transaction from reading inconsistent data
The lock manager tracks lock requests and grants locks on database objects when they become available.
Lock Granularity
Indicates the level of lock use. Locking can take place at the following levels: database, table, page, row, and field (attribute).
Lock Granularity (continued)
- Database-level lock: the entire database is locked.
- Table-level lock: the entire table is locked.
- Page-level lock: the entire disk page is locked.
Lock Granularity (continued)
- Row-level lock: allows concurrent transactions to access different rows of the same table, even if the rows are located on the same page.
- Field-level lock: allows concurrent transactions to access the same row, as long as they require the use of different fields (attributes) within that row.
Lock Types
Three lock types: binary lock, exclusive lock, and shared lock.
- Binary lock: has only two states, locked (1) or unlocked (0).
- Exclusive lock: access is specifically reserved for the transaction that locked the object. It must be used when the potential for conflict exists.
- Shared lock: concurrent transactions are granted read access on the basis of a common lock.
Lock-Based Concurrency Control
Two-Phase Locking (2PL) Protocol:
- Each Xact must obtain an S (shared) lock on an object before reading it, and an X (exclusive) lock on an object before writing it.
- If an Xact holds an X lock on an object, no other Xact can get a lock (S or X) on that object.
- The DBMS internally enforces the above locking protocol.
- Two phases: acquiring locks and releasing them, also called the growing phase and the shrinking phase.
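The shared/exclusive rules above can be sketched as a tiny lock table. This is an illustrative sketch, not how any particular DBMS implements its lock manager: the `LockManager` class and its method names are assumptions, and a denied request simply returns False instead of queueing the transaction.

```python
class LockManager:
    """Toy lock table: object -> (mode, set of holding transactions)."""

    def __init__(self):
        self.table = {}

    def request(self, txn, obj, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        if obj not in self.table:
            self.table[obj] = (mode, {txn})
            return True
        held_mode, holders = self.table[obj]
        if holders == {txn}:
            # The sole holder may re-request, or upgrade from S to X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.table[obj] = (new_mode, holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # shared locks are compatible with each other
            return True
        return False  # any combination involving X conflicts

    def release_all(self, txn):
        """Release every lock held by txn."""
        for obj in list(self.table):
            _, holders = self.table[obj]
            holders.discard(txn)
            if not holders:
                del self.table[obj]
```

For example, two transactions can both obtain an S lock on the same object, but a request for an X lock is denied until the other holder releases.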
Strict 2PL
2PL allows only serializable schedules but is subject to cascading aborts.
Example: rollback of T1 requires rollback of T2!
T1: R(A), W(A), Abort
T2: R(A), W(A), R(B), W(B)
To avoid cascading aborts, use Strict 2PL.
Strict Two-Phase Locking (Strict 2PL) Protocol: same as 2PL, except that a transaction releases no locks until it completes.
Two-Phase Locking to Ensure Serializability
- Defines how transactions acquire and relinquish locks
- Guarantees serializability, but does not prevent deadlocks
- Growing phase: a transaction acquires all the required locks without unlocking any data
- Shrinking phase: a transaction releases all locks and cannot obtain any new lock
Two-Phase Locking to Ensure Serializability (continued)
Governed by the following rules:
- Two transactions cannot have conflicting locks.
- No unlock operation can precede a lock operation in the same transaction.
- No data are affected until all locks are obtained, that is, until the transaction is in its locked point.
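The rule that no unlock may precede a lock in the same transaction is easy to verify for a single transaction's sequence of operations. A minimal sketch, assuming each operation is an (action, object) pair; the encoding and the sample sequences are illustrative assumptions.

```python
def follows_2pl(ops):
    """True if the transaction never acquires a lock after releasing one."""
    shrinking = False
    for action, _obj in ops:
        if action == "unlock":
            shrinking = True   # the shrinking phase has begun
        elif action == "lock" and shrinking:
            return False       # lock request during the shrinking phase
    return True

# Hypothetical operation sequences for one transaction.
two_phase = [("lock", "A"), ("lock", "B"), ("write", "A"),
             ("unlock", "A"), ("write", "B"), ("unlock", "B")]
not_two_phase = [("lock", "A"), ("write", "A"), ("unlock", "A"),
                 ("lock", "B"), ("write", "B"), ("unlock", "B")]
```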
Two-Phase Locking Protocol
Deadlocks
- A condition that occurs when two transactions wait for each other to unlock data
- Possible only if one of the transactions wants to obtain an exclusive lock on a data item
- No deadlock condition can exist among shared locks
Control through prevention, detection, or avoidance.
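One common way to implement detection is to look for a cycle in a waits-for graph, in which an edge from T1 to T2 means that T1 is waiting for a lock held by T2. The slides do not name this technique, so the sketch below is an assumed illustration; the function name and the dictionary encoding are hypothetical.

```python
def has_deadlock(waits_for):
    """Return True if the waits-for graph contains a cycle.

    waits_for maps each transaction to the set of transactions it waits for.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(txn):
        color[txn] = GREY
        for other in waits_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GREY:
                return True  # back edge: a cycle of waiting transactions
            if state == WHITE and visit(other):
                return True
        color[txn] = BLACK
        return False

    return any(color.get(txn, WHITE) == WHITE and visit(txn)
               for txn in list(waits_for))
```

For example, `has_deadlock({"T1": {"T2"}, "T2": {"T1"}})` reports a deadlock, because each transaction waits for the other.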
The Transaction Log
Stores:
- A record for the beginning of the transaction
- For each transaction component (SQL statement):
  - The type of operation being performed (update, delete, insert)
  - The names of objects affected by the transaction (the name of the table)
  - The "before" and "after" values for updated fields
  - Pointers to the previous and next transaction log entries for the same transaction
- The ending (COMMIT) of the transaction
Performance of Locking
- Lock-based schemes use two basic mechanisms: blocking and aborting. Both mechanisms involve a performance penalty.
- Blocked transactions hold locks that force other transactions to wait. Delays due to blocking affect throughput.
- Aborting a transaction wastes the work done thus far by that transaction.
- A deadlock represents an extreme instance of blocking.
- In practice, fewer than 1% of transactions are involved in a deadlock, so there are relatively few aborts.
[Figure: Lock Thrashing. Throughput plotted against the number of active transactions, with the thrashing region marked.]
Performance of Locking (contd.)
- The first few transactions are unlikely to conflict, and throughput rises in proportion to the number of active transactions.
- Delays due to blocking increase with the number of active transactions, and throughput increases more slowly than the number of active transactions.
- Then there comes a point where adding another active transaction actually reduces throughput. We say that the system thrashes at this point.
Performance of Locking (contd.)
If a system begins to thrash, the database administrator should reduce the number of transactions allowed to run concurrently.
Throughput can be increased in three ways:
- By locking the smallest-sized objects possible
- By reducing the time that transactions hold locks
- By reducing hot spots. A hot spot is a database object that is frequently accessed and modified, and that causes a lot of blocking delays.
Transaction Management with SQL
Transaction support in SQL is provided by two statements: COMMIT and ROLLBACK.
SQL:1999 provides two new features to support applications that involve long-running transactions:
- Savepoint: identifies a point in a transaction so that the operations carried out after this point can be selectively rolled back.
- Chained transactions: minimize the overhead of running several transactions one after the other. In a chained transaction, a transaction is committed or rolled back, and another transaction is initiated immediately.
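Savepoints can be tried out with Python's built-in sqlite3 module. This is a sketch that assumes SQLite as the engine; its dialect is not identical to SQL:1999, and the `account` table and the savepoint name are made up for illustration. Chained transactions are not shown.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so BEGIN and COMMIT are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")

cur.execute("BEGIN")
cur.execute("INSERT INTO account VALUES ('A', 1000)")
cur.execute("SAVEPOINT after_a")                   # mark a point in the transaction
cur.execute("INSERT INTO account VALUES ('B', 1000)")
cur.execute("ROLLBACK TO SAVEPOINT after_a")       # undo only the work after the mark
cur.execute("COMMIT")                              # the insert of 'A' is kept

rows = cur.execute("SELECT name, balance FROM account ORDER BY name").fetchall()
```

After the commit, `rows` contains only the row for account A, because the insert of B was rolled back to the savepoint.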
Introduction to Crash Recovery
Recovery Manager
- Upon recovery from a crash, it must bring the DB to a consistent transactional state.
- Ensures transaction atomicity and durability.
- Undoes the actions of transactions that do not commit.
- Redoes the actions of committed transactions that were lost during system failures or media failures.
- Maintains log information during the normal execution of transactions for use during crash recovery.
The Log
- The log consists of "records" that are written sequentially.
  - Stored on a separate disk from the DB.
  - Typically chained together by Xact id.
  - Each record is identified by a log sequence number (LSN), a monotonically increasing number.
- The log is often duplexed and archived on stable storage (guaranteed to survive system crashes and media failures). Duplexed means stored on two different disks, perhaps at different locations.
- All log-related activities are handled transparently by the DBMS.
Write-Ahead Logging (WAL) Property
- Log entries describing a change to the database are written to stable storage before the change is made.
- Implemented via a handshake between the log manager and the buffer manager.
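The handshake between the log manager and the buffer manager can be sketched as follows. This is a toy model under stated assumptions: stable storage is simulated by an in-memory event list, the LSN is simply the position of a record in the log, and the class and attribute names are hypothetical.

```python
class LogManager:
    def __init__(self, events):
        self.records = []     # all log records, in LSN order
        self.flushed_lsn = 0  # highest LSN known to be on stable storage
        self.events = events  # shared trace of simulated disk activity

    def append(self, record):
        """Add a record to the in-memory log tail and return its LSN."""
        self.records.append(record)
        return len(self.records)

    def flush(self, up_to_lsn):
        """Force the log to stable storage up to the given LSN."""
        if up_to_lsn > self.flushed_lsn:
            self.flushed_lsn = up_to_lsn
            self.events.append(("log flushed", up_to_lsn))


class BufferManager:
    def __init__(self, log, events):
        self.log = log
        self.events = events
        self.page_lsn = {}    # page -> LSN of the latest update to that page

    def update(self, page, record):
        """Log the change first, then remember which LSN describes it."""
        self.page_lsn[page] = self.log.append(record)

    def write_page(self, page):
        """WAL: the log record must reach disk before the changed page."""
        self.log.flush(self.page_lsn.get(page, 0))
        self.events.append(("page written", page))


events = []
log = LogManager(events)
buffers = BufferManager(log, events)
buffers.update("P1", ("T1", "P1", "before", "after"))
buffers.write_page("P1")
```

In the resulting `events` trace, the log flush always appears before the page write it protects.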
Stealing Frames and Forcing Pages
Two approaches with respect to writing objects:
- Steal approach: changes made to an object O in the buffer pool by a transaction T may be written to disk before T commits. Such writes are executed when another transaction wants to bring in a page, and we say that the second transaction steals a frame from T.
- Force approach: when a transaction commits, all the changes it has made to objects in the buffer pool are immediately forced to disk.
From the standpoint of implementing a recovery manager, it is simplest to use a buffer manager with a no-steal, force approach.
Stealing Frames and Forcing Pages (continued)
- If a no-steal approach is used, we do not have to undo the changes of an aborted transaction.
- If a force approach is used, we do not have to redo the changes of committed transactions.
However, these policies have important drawbacks:
- The no-steal approach assumes that all pages modified by ongoing transactions can be accommodated in the buffer pool. In the presence of large transactions, this assumption is unrealistic.
- The force approach results in excessive page I/O costs. If a highly used page is updated in succession by 20 transactions, it would be written to disk 20 times.
For these reasons, most systems use a steal, no-force approach.
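The relationship between the buffer policies and the work left to the recovery manager can be summarized in a few lines. This is only a restatement of the two rules above as code; the function name is hypothetical.

```python
def recovery_work(steal, force):
    """Which recovery actions a buffer policy makes necessary.

    Undo is needed only if uncommitted changes can reach disk (steal).
    Redo is needed only if committed changes might not be on disk (no-force).
    """
    return {"undo": steal, "redo": not force}

simplest = recovery_work(steal=False, force=True)   # no-steal, force
typical = recovery_work(steal=True, force=False)    # steal, no-force
```

The no-steal, force combination needs neither undo nor redo, while the steal, no-force combination used by most systems needs both.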
Logging (continued)
The log record must go to disk before the changed page!
As with CC-related activities such as lock/unlock and dealing with deadlocks, all log-related activities are handled transparently by the DBMS.
ARIES Recovery
The ARIES recovery algorithm is designed to work with a steal, no-force approach.
There are three phases in the ARIES recovery protocol:
- Analysis: scan the log forward (from the most recent checkpoint) to identify all Xacts that were active, and all dirty pages in the buffer pool, at the time of the crash.
- Redo: redo all updates to dirty pages in the buffer pool, as needed, to ensure that all logged updates are in fact carried out and written to disk.
- Undo: the writes of all Xacts that were active at the crash are undone (by restoring the before value of the update, as found in the log), working backwards in the log.
Finally, the database reflects only the actions of committed transactions.
Some care must be taken to handle the case of a crash occurring during the recovery process!
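The three phases can be illustrated on a toy log. This is a heavily simplified sketch, not the ARIES algorithm itself: it omits checkpoints, LSNs, the dirty page table, and the logging of undo actions, and it redoes every update unconditionally. The record layout and the `recover` function are assumptions made for illustration.

```python
def recover(log, disk):
    """Three-pass recovery over a toy log; returns the set of loser Xacts.

    Update records are ("update", xact, page, before, after).
    Commit records are ("commit", xact).
    """
    # Analysis: find the Xacts that were still active at the crash.
    active = set()
    for record in log:
        if record[0] == "update":
            active.add(record[1])
        elif record[0] == "commit":
            active.discard(record[1])

    # Redo: reapply every logged update in log order.
    for record in log:
        if record[0] == "update":
            _, _, page, _, after = record
            disk[page] = after

    # Undo: scan backwards, restoring before values written by active Xacts.
    for record in reversed(log):
        if record[0] == "update" and record[1] in active:
            _, _, page, before, _ = record
            disk[page] = before

    return active


# Hypothetical crash scenario under steal, no-force:
# T1 committed but its change to A never reached disk,
# while T2's uncommitted change to B was already written out.
log = [("update", "T1", "A", 1000, 1100),
       ("update", "T2", "B", 1000, 1060),
       ("commit", "T1")]
disk = {"A": 1000, "B": 1060}
losers = recover(log, disk)
```

After recovery, `disk` reflects only the committed transaction T1: A holds T1's new value and B is restored to its value before T2's update.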