Transaction Management: Concurrency Control CS634 Class 18, Apr 9, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.

Transaction Management: Concurrency Control CS634 Class 18, Apr 9, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke

Transaction Execution  Example: Reading Uncommitted Data (Dirty Reads)  We are assuming each transaction is single-threaded  Usually the case in practice, though not universal  And, for simplicity, that operations for the whole DB happen in some order, possibly interleaving the transactions  This is not true in reality: in fact, parallel execution of transactions happens on multi-processors,  But it’s close enough to show the important behaviors 2 T1: R(A), W(A), R(B), W(B) T2: R(A), W(A), R(B), W(B)

Transaction Schedule Notation  Example: Reading Uncommitted Data (Dirty Reads) Another notation: Using subscripts for transaction ids  Arrows mark conflicts, yield arcs in PG: T1->T2, T2->T1 R 1 (A) W 1 (A) R 2 (A) W 2 (A) R 2 (B) W 2 (B) R 1 (B) W 1 (B) Note: commits are not involved in locating conflicts 3 T1: R(A), W(A), R(B), W(B) T2: R(A), W(A), R(B), W(B)

Example: RW Conflicts T1:R(A), R(A), W(A), Commit T2:R(A), W(A), Commit  Unrepeatable Reads  Alternatively: R 1 (A) R 2 (A) W 2 (A) C 2 R 1 (A) W 1 (A) C 1  Again T1->T2, T2->T1, cycle in PG, not conflict serializable  See conflicts reaching across a commit here 4

Conflict Serializable Schedules  Two schedules are conflict equivalent if:  Involve the same actions of the same transactions  Every pair of conflicting actions is ordered the same way  Schedule S is conflict serializable if S is conflict equivalent to some serial schedule  Example: T1->T2 only, and conflict serializable, as shown below R 1 (A) R 1 (B) W 1 (C) R 2 (B) W 2 (A) R 2 (C) R 1 (B) C 1 C 2 R 1 (A) R 1 (B) W 1 (C) R 1 (B) C 1 R 2 (B) W 2 (A) R 2 (C) C 2

Dependency Graph  Dependency graph:  one node per transaction  edge from Ti to Tj if action of Ti precedes and conflicts with action of Tj  Theorem: Schedule is conflict serializable if and only if its dependency graph is acyclic  Equivalent serial schedule given by topological sort of dependency graph

Example  A schedule that is not conflict serializable:  The cycle in the graph reveals the problem. The output of T1 depends on T2, and vice-versa. T1: R(A), W(A), R(B), W(B) T2: R(A), W(A), R(B), W(B) R 1 (A) W 1 (A) R 2 (A) W 2 (A) R 2 (B) 2 W(B) 2 R 1 (B) W 1 (B) T1T2 A B Dependency graph

Strict Two-Phase Locking (Strict 2PL)  Protocol steps  Each transaction must obtain a S (shared) lock on object before reading, and an X (exclusive) lock on object before writing.  All locks held are released when the transaction completes  (Non-strict) 2PL: Release locks anytime, but cannot acquire locks after releasing any lock.  Strict 2PL allows only serializable schedules.  It simplifies transaction aborts  (Non-strict) 2PL also allows only serializable schedules, but involves more complex abort processing 8

Strict 2PL Example 9 T1: S(A) R(A) T2: S(A) R(A) X(B) R(B) S(B) R(B) C W(B) C Using subscripted notation: blow-by-blow actions S 1 (A) R 1 (A) S 2 (A) R 2 (A) X 2 (B) R 2 (B) W 2 (B) C 2 R 1 (B) C 1 where S 1 (B) blocked

Aborting Transactions  When Ti is aborted, all its actions have to be undone  if Tj reads an object last written by Ti, Tj must be aborted as well!  cascading aborts can be avoided by releasing locks only at commit  If Ti writes an object, Tj can read this only after Ti commits  In Strict 2PL, cascading aborts are prevented  At the cost of decreased concurrency  No free lunch!  Increased parallelism leads to locking protocol complexity 10

Deadlocks  Cycle of transactions waiting for locks to be released by each other: case of “deadly embrace” T1: X(A) W(A) S(B) [R(B) …] T2:X(B) W(B) S(A) [R(A) …] Using subscripted notation: X 1 (A) W 1 (A) X 2 (B) W 2 (B) … 11

Deadlock Detection  Create a waits-for graph:  Nodes are transactions  Edge from Ti to Tj if Ti is waiting for Tj to release a lock T1: S(A), R(A), S(B) T2: X(B),W(B) X(C) T3: S(C), R(C) T4: X(B) T1T2 T4T3 12 X(A)

More Dynamic Databases  If the set of DB objects changes, Strict 2PL will not ensure serializability  Example:  T1 finds oldest sailor for each of rating=1 and rating=2  T2 does an insertion and a deletion 1. T1 locks all pages with rating = 1, finds oldest sailor (age = 71) 2. Next, T2 inserts a new sailor; rating = 1, age = 96 3. T2 deletes oldest sailor with rating = 2 (age = 80), commits 4. T1 locks all pages with rating = 2, and finds oldest (age = 63)  No serial schedule gives same outcome!

The “Phantom” Problem  T1 implicitly assumes that it has locked the set of all sailor records with rating = 1  Assumption only holds if no sailor records are added while T1 is executing!  Two mechanisms to address the problem  Index locking  Predicate locking

Another phantom example  Table tasks has one row for each worker task, with worker name, task name, number of hours  Rule that no worker has more than 8 hours total  Application A to add a task sums hours for worker, adds task if it fits under 8 hours max  T1 running A sees ‘Joe’ has 6 hours, adds task of 2 hours  Concurrently, T2 running A sees ‘Joe’ has 6 hours, adds task of 1 hour.  Joe ends up with 9 hours of work.  Again, the problem is there is no lock on the set of rows being examined to make a decision

Index Locking  Assume index on the rating field using Alternative (2)  T1 should lock the index page containing the data entries with rating = 1  If there are no records with rating = 1, T1 must lock the index page where such a data entry would be, if it existed!  e.g., lock the page with rating = 0 and beginning of rating=2  If there is no suitable index, T1 must lock all data pages, and lock the file to prevent new pages from being added

Index Locking  Assume index on the rating field using Alternative (2)  Row locking is the industry standard now  T1 should lock all the data entries with rating = 1  If there are no records with rating = 1, T1 must lock the entries adjacent to where data entry would be, if it existed!  e.g., lock the last entry with rating = 0 and beginning of rating=2  If there is no suitable index, T1 must lock the table

Predicate Locking  Grant lock on all records that satisfy some logical predicate  Index locking is a special case of predicate locking  Index supports efficient implementation of the predicate lock  Predicate is specified in WHERE clause  In general, predicate locking is expensive to implement!

Locking for B+ Trees  Naïve solution  Ignore tree structure, just lock pages following 2PL  Very poor performance!  Root node (and many higher level nodes) become bottlenecks  Every tree access begins at the root!  Not needed anyway!  Only row data needs 2PL (contents of tree)  Tree structure also needs protection from concurrent access  But only like other shared data of the server program  Note this modern view is not covered in book  See Graefe, A Survey of B-tree locking techniques (2010)Graefe, A Survey of B-tree locking techniques  B-tree locking is a huge challenge!

Locking vs. Latching  To protect shared data in memory, multithreaded programs use mutex (semaphores)  API: enter_section/leave_section, or lock/unlock  Every Java object contains a mutex, for convenience of Java programming: underlies synchronized methods  Database people call mutexes “latches”  Need background in multi-threaded programming to understand this topic fully  The tree structure needs mutex/latch protection  Latches can be provided by the same lock manager as does 2PL locking—this is assumed in the text  In these slides, will use “lock” in quotes to mean non-2PL lock/latch…

Locking for B+ Trees (contd.)  Searches  Higher levels only direct searches for leaf pages  Insertions  Node on a path from root to modified leaf must be “locked” in X mode only if a split can propagate up to it  Similar point holds for deletions  There are efficient locking protocols that keep the B-tree healthy under concurrent access, and support 2PL on rows

A Simple Tree Locking Algorithm  Search  Start at root and descend  repeatedly, get S “lock” for child then “unlock” parent, end up with S “lock” on leaf page  Get 2PL S lock on row, provide row pointer to caller  Later, caller is done with reading row, arranges release of S “lock”  Insert/Delete  Start at root and descend, obtaining X “locks” as needed  Once child is “locked”, check if it is safe  If child is safe, release all “locks” on ancestors, leaving X “lock” on leaf  Get 2PL X lock on place for new row/old row, insert/delete row, release “lock”  Safe node  If changes will not propagate up beyond this node  Inserts: Node is not full  Deletes: Node is not half-empty  When control gets back to QP, transaction only has 2PL locks on rows

Difference from text  The algorithms described in the text are valid, for example, crabbing down the tree, worrying about full nodes, etc.  What’s different is that the locks for index nodes are shorter lived than described in the text: only 2PL locks on rows are kept until end of transaction, not any locks on index nodes.  Note the admission on pg. 564 that the text’s coverage on this topic is “not state of the art”. Graefe’s paper is.

An Example A B C DE F G H I 20 35 20* 3844 22*23*24*35*36*38*41*44* Do: Search 38* Insert 45* Insert 25* Delete 38* 23

A Variation on Algorithms  Search  As before  Insert/Delete  Set “locks” as if for search, get to leaf, and set 2PL X lock on leaf  If leaf is not safe, release all “locks”, and restart using previous Insert/Delete protocol  This is what happens if the search down the tree happens on a page that is not in buffer—don’t want to hold a latch across a disk i/o (takes too long)

Multiple-Granularity Locks  Hard to decide what granularity to lock  tuples vs. pages vs. files  Shouldn’t have to decide!  Data containers are nested: Tuples Files Pages Database contains

New Lock Modes, Protocol  Allow transactions to lock at each level, but with a special protocol using new intention locks Before locking an item, must set intention locks on ancestors For unlock, go from specific to general (i.e., bottom-up). SIX mode: Like S & IX at the same time. -- ISIX -- IS IX      SX   S X      

Multiple Granularity Lock Protocol  Each transaction starts from the root of the hierarchy  To get S or IS lock on a node, must hold IS or IX on parent node  To get X or IX or SIX on a node, must hold IX or SIX on parent node.  Must release locks in bottom-up order

Examples  T1 scans R, and updates a few tuples:  T1 gets an SIX lock on R, then repeatedly gets an S lock on tuples of R, and occasionally upgrades to X on the tuples.  T2 uses an index to read only part of R:  T2 gets an IS lock on R, and repeatedly gets an S lock on tuples of R.  T3 reads all of R:  T3 gets an S lock on R.  OR, T3 could behave like T2; can use lock escalation to decide which. -- ISIX -- IS IX      SX   S X      

Example ROOT A B C DE F G H I 20 35 20* 3844 22*23*24*35*36*38*41*44* Do: 1) Delete 38* 2) Insert 25* 4) Insert 45* 5) Insert 45*, then 46* 23

Transaction Support in SQL  A transaction is automatically started when user executes a statement or accesses the catalogs  Transaction is either committed (COMMIT) or aborted (ROLLBACK)  New in SQL-99: SAVEPOINT feature SAVEPOINT Actions … ROLLBACK TO SAVEPOINT  SAVEPOINT advantage vs sequence of transactions  Can roll back over multiple savepoints  Lower overhead: no new transaction initiated 31

Setting Transaction Properties in SQL  Access Mode  READ ONLY vs READ WRITE  Isolation Level (decreasing level of concurrency) 32 LevelDirty ReadUnrepeatable Read Phantom READ UNCOMMITTEDPossible READ COMMITTEDNoPossible REPEATABLE READNo Possible SERIALIZABLENo

Syntax SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE SET TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY  Note:  READ UNCOMITTED cannot be READ WRITE 33

More on setting transaction properties Embedded SQL EXEC SQL SET TRANSACTION ISOLATION LEVEL SERIALIZABLE JDBC conn.setAutoCommit(false); conn.setTransactionIsolation (Connection.TRANSACTION_ISOLATION_SERIALIZABLE); PL/SQL COMMIT; ROLLBACK; 34

Aborting a Transaction  If a transaction Ti is aborted, all its actions have to be undone. Not only that, if Tj reads an object last written by Ti, Tj must be aborted as well!  Most systems avoid such cascading aborts by releasing a transaction’s locks only at commit time.  If Ti writes an object, Tj can read this only after Ti commits.  In order to undo the actions of an aborted transaction, the DBMS maintains a log in which every write is recorded. This mechanism is also used to recover from system crashes: all active Xacts at the time of the crash are aborted when the system comes back up.

The Log  The following actions are recorded in the log:  Ti writes an object: the old value and the new value.  Log record must go to disk before the changed page!  Ti commits/aborts: a log record indicating this action.  Log records are chained together by Xact id, so it’s easy to undo a specific Xact.  Log is often duplexed and archived on stable storage.  All log related activities (and in fact, all CC related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS.

Recovering From a Crash  There are 3 phases in the Aries recovery algorithm:  Analysis: Scan the log forward (from the most recent checkpoint) to identify all Xacts that were active, and all dirty pages in the buffer pool at the time of the crash.  Redo: Redoes all updates to dirty pages in the buffer pool, as needed, to ensure that all logged updates are in fact carried out and written to disk.  Undo: The writes of all Xacts that were active at the crash are undone (by restoring the before value of the update, which is in the log record for the update), working backwards in the log. (Some care must be taken to handle the case of a crash occurring during the recovery process!)

Transaction Management: Concurrency Control CS634 Class 18, Apr 9, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.

Similar presentations

Presentation on theme: "Transaction Management: Concurrency Control CS634 Class 18, Apr 9, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Transaction Management: Concurrency Control CS634 Class 18, Apr 9, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.

Similar presentations

Presentation on theme: "Transaction Management: Concurrency Control CS634 Class 18, Apr 9, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke."— Presentation transcript:

Similar presentations

About project

Feedback