Concurrency Control
Nate Nystrom
CS 632
February 6, 2001
Papers
- Berenson, Bernstein, Gray, et al., "A Critique of ANSI SQL Isolation Levels", SIGMOD '95
- Kung and Robinson, "On Optimistic Methods for Concurrency Control", ACM TODS, June 1981
- Agrawal, Carey, and Livny, "Models for Studying Concurrency Control Performance: Alternatives and Implications", SIGMOD '85
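Concurrency control methods
- Locking: by far the most popular method
  - Problems: deadlock, starvation
- Optimistic
  - Problem: high abort rates
- Immediate restart: a transaction whose lock request would block is aborted and restarted instead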
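Blocking locking and immediate restart differ only in what happens when a lock request conflicts. A minimal sketch of a lock table, assuming shared ("S") and exclusive ("X") modes; `LockTable`, its `policy` parameter, and the returned strings are hypothetical names for illustration, not taken from the papers.

```python
class LockTable:
    """Toy lock table with shared ("S") and exclusive ("X") modes.

    policy="block":   a conflicting request must wait (classic locking).
    policy="restart": a conflicting request aborts the requester (immediate restart).
    """

    def __init__(self, policy):
        self.policy = policy
        self.holders = {}  # item -> (mode, set of holding transactions)

    def request(self, txn, item, mode):
        if item not in self.holders:
            self.holders[item] = (mode, {txn})
            return "granted"
        held_mode, owners = self.holders[item]
        # Compatible if both modes are shared, or if txn is the only holder
        # (which covers re-requests and S -> X upgrades).
        if (mode == held_mode == "S") or owners == {txn}:
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.holders[item] = (new_mode, owners | {txn})
            return "granted"
        return "wait" if self.policy == "block" else "restart"

    def release_all(self, txn):
        """Drop every lock txn holds (at commit or abort)."""
        for item in list(self.holders):
            mode, owners = self.holders[item]
            owners = owners - {txn}
            if owners:
                self.holders[item] = (mode, owners)
            else:
                del self.holders[item]
```

Under the "restart" policy the caller would release the victim's locks and rerun it after some restart delay; the sketch leaves that delay to the caller.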
Isolation Levels
- Serializability is expensive to enforce
- Trade correctness for performance: transactions can run at lower isolation levels
  - Repeatable read
  - Read committed
  - Read uncommitted
Basics
- History: a sequence of operations
  - Ex: $r_1(x)\, r_2(y)\, w_1(y)\, c_1\, w_2(x)\, a_2$
- Dependencies: wr (true), rw (anti), ww (output)
- $H$ and $H'$ are equivalent if $H'$ is a reordering of $H$ with the same dependencies as $H$
- $H$ is serializable if there exists a serial $H'$ such that $H \equiv H'$
- Concurrent transactions $T$ and $T'$ conflict if both access the same item and at least one of them writes it
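These definitions lead to the standard conflict-serializability test: build a graph with an edge $T_i \to T_j$ for every conflict in which $T_i$'s operation comes first, then check for a cycle. A minimal sketch, assuming histories written as strings in the slide's notation and, by default, only committed transactions (the committed projection); `parse`, `precedence_edges`, `is_acyclic`, and `conflict_serializable` are hypothetical helper names.

```python
import re
from itertools import combinations


def parse(history):
    """Parse e.g. 'r1(x) w2(y) c1 a2' into (op, txn, item) tuples; item is None for c/a."""
    return [(op, int(t), x or None)
            for op, t, x in re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", history)]


def precedence_edges(ops, committed_only=True):
    """Edge (Ti, Tj) for each conflicting pair where Ti's operation comes first."""
    committed = {t for op, t, _ in ops if op == "c"}
    data = [(op, t, x) for op, t, x in ops
            if op in ("r", "w") and (not committed_only or t in committed)]
    # combinations() keeps list order, so the first element always precedes the second
    return {(t1, t2) for (o1, t1, x1), (o2, t2, x2) in combinations(data, 2)
            if t1 != t2 and x1 == x2 and "w" in (o1, o2)}


def is_acyclic(edges):
    """Depth-first search for a cycle in the precedence graph."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
        succ.setdefault(b, [])
    state = {}  # node -> 1 while on the DFS stack, 2 when finished

    def visit(n):
        state[n] = 1
        for m in succ[n]:
            if state.get(m) == 1 or (m not in state and not visit(m)):
                return False
        state[n] = 2
        return True

    return all(visit(n) for n in succ if n not in state)


def conflict_serializable(history, committed_only=True):
    return is_acyclic(precedence_edges(parse(history), committed_only))
```

On the slide's example only $T_1$ commits, so the committed projection is trivially serializable; if the aborted $T_2$ were counted, $r_2(y)$ before $w_1(y)$ and $r_1(x)$ before $w_2(x)$ would form a cycle.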
ANSI SQL Isolation Levels
- Each level is defined by the anomalies it proscribes
  - Read Uncommitted: none (everything allowed)
  - Read Committed: dirty reads
  - Repeatable Read: dirty reads, fuzzy reads
  - Serializable: dirty reads, fuzzy reads, phantoms
Problems
- Anomalies are ambiguous:
  - $w_1(x) \ldots r_2(x) \ldots$ ($a_1$ and $c_2$ in any order)
  - $w_1(x) \ldots r_2(x) \ldots$ (($c_1$ or $a_1$) and ($c_2$ or $a_2$) in any order)
  - The first is the strict interpretation (an anomaly); the second is the loose interpretation (a phenomenon)
- Anomalies don't prevent some undesirable behavior
  - Ex: phantom is defined to include inserts and updates, but not deletes
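The two readings can be written as two checks over a history. A sketch, assuming histories in the slide's notation; `parse`, `p1_dirty_reads`, and `a1_dirty_reads` are hypothetical names. The loose check flags any read of a write whose transaction has not yet terminated; the strict check keeps only the cases where the writer later aborts and the reader commits.

```python
import re


def parse(history):
    """'w1(x) r2(x) a1 c2' -> [('w', 1, 'x'), ('r', 2, 'x'), ('a', 1, None), ('c', 2, None)]"""
    return [(op, int(t), x or None)
            for op, t, x in re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", history)]


def p1_dirty_reads(ops):
    """Loose interpretation (phenomenon P1): T_j reads x while a writer T_i of x
    has neither committed nor aborted, whatever happens afterwards."""
    active_writers = {}  # item -> set of txns with an unterminated write to it
    hits = []
    for op, t, x in ops:
        if op == "w":
            active_writers.setdefault(x, set()).add(t)
        elif op == "r":
            hits += [(w, t, x) for w in sorted(active_writers.get(x, ())) if w != t]
        else:  # 'c' or 'a' terminates t
            for writers in active_writers.values():
                writers.discard(t)
    return hits


def a1_dirty_reads(ops):
    """Strict interpretation (anomaly A1): additionally T_i aborts and T_j commits."""
    outcome = {t: op for op, t, _ in ops if op in ("c", "a")}
    return [(w, r, x) for w, r, x in p1_dirty_reads(ops)
            if outcome.get(w) == "a" and outcome.get(r) == "c"]
```

For example, $w_1(x)\, r_2(x)\, c_1\, c_2$ is flagged only under the loose interpretation.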
Locking
- T has well-formed writes (reads) if it requests a write (read) lock on an item before writing (reading) it
- T is two-phase if it does not request any lock after releasing a lock
- A lock is long duration if it is held until the transaction commits or aborts; otherwise it is short duration
- Theorem: well-formed two-phase locking guarantees serializability
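Both definitions can be checked on a single transaction's action sequence. A sketch under an assumed action encoding (the tuple format and `check_locking` are hypothetical); it tests one transaction, whereas the theorem needs every transaction in the history to pass both checks.

```python
def check_locking(actions):
    """Return (well_formed, two_phase) for one transaction's action sequence.

    Actions (hypothetical encoding): ("rl", x) read lock, ("wl", x) write lock,
    ("u", x) unlock, ("r", x) read, ("w", x) write.
    """
    held = {}         # item -> "r" or "w": strongest lock currently held
    released = False  # has the transaction released any lock yet?
    well_formed = two_phase = True
    for act, x in actions:
        if act in ("rl", "wl"):
            if released:            # lock requested after a release: not two-phase
                two_phase = False
            if held.get(x) != "w":  # a write lock also covers reads
                held[x] = act[0]
        elif act == "u":
            held.pop(x, None)
            released = True
        elif act == "r":
            well_formed = well_formed and x in held
        elif act == "w":
            well_formed = well_formed and held.get(x) == "w"
    return well_formed, two_phase
```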
Locking Isolation Levels
- Degree 0: well-formed (i.e., short) writes
- Degree 1 (read uncommitted): long duration write locks
- Degree 2 (read committed): short read locks, long write locks
- Repeatable read: short predicate read locks, long item read locks, long write locks
- Degree 3 (serializable): long read locks, long write locks
Dirty Writes
- ANSI definitions lack a prohibition of dirty writes:
  - $w_1(x) \ldots w_2(x) \ldots$ (($c_1$ or $a_1$) and ($c_2$ or $a_2$) in any order)
- With dirty writes allowed, rollback is difficult to implement (with locking CC): undoing $w_1(x)$ by restoring $x$'s before-image would also wipe out $w_2(x)$
- Prohibiting dirty writes serializes transactions in write order (all ww dependencies go forward)
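A dirty write is the write-write analogue of the loose dirty-read phenomenon, so the same kind of scan applies. A sketch, assuming histories in the slide's notation; `parse` and `dirty_writes` are hypothetical helpers. Only a write that follows an unterminated write to the same item is flagged.

```python
import re


def parse(history):
    return [(op, int(t), x or None)
            for op, t, x in re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", history)]


def dirty_writes(ops):
    """Phenomenon P0: T_j writes x while an earlier writer T_i of x is still active.
    Returns (T_i, T_j, x) triples."""
    active_writers = {}  # item -> txns whose write to it has not yet terminated
    hits = []
    for op, t, x in ops:
        if op == "w":
            hits += [(w, t, x) for w in sorted(active_writers.get(x, ())) if w != t]
            active_writers.setdefault(x, set()).add(t)
        elif op in ("c", "a"):
            for writers in active_writers.values():
                writers.discard(t)
    return hits
```

In $w_1(x)\, w_2(x)\, w_2(y)\, c_2\, w_1(y)\, c_1$, only $w_2(x)$ is a dirty write, yet if each transaction sets both items to the same value, the result leaves $x$ with $T_2$'s value and $y$ with $T_1$'s.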
New Definitions
- Use the loose interpretation
- Fix the definition of phantom so that it also covers deletes
- Prohibit dirty writes at every level
- Each level proscribes:
  - Read Uncommitted: dirty writes
  - Read Committed: dirty writes, dirty reads
  - Repeatable Read: dirty writes, dirty reads, fuzzy reads
  - Serializable: dirty writes, dirty reads, fuzzy reads, phantoms
More Problems
- New definitions are too strong: they prohibit some serializable histories
  - $r_1(x)\, w_1(x)\, r_1(y)\, w_1(y)\, r_2(x)\, r_2(y)\, c_1\, c_2$
  - This history is equivalent to the serial history $T_1 T_2$, yet $T_2$ has dirty reads according to the proposed new definitions
- Prohibiting dirty writes is useful for recovery with locking CC, but not helpful for optimistic CC
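The claim can be checked mechanically on the example history: every conflict points from $T_1$ to $T_2$, so the history is conflict-equivalent to the serial history $T_1 T_2$, yet a loose dirty-read check still fires. A self-contained sketch; `parse`, `conflict_edges`, and `has_dirty_read` are hypothetical helper names.

```python
import re
from itertools import combinations


def parse(history):
    return [(op, int(t), x or None)
            for op, t, x in re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", history)]


def conflict_edges(ops):
    """(Ti, Tj) for every conflicting pair of operations where Ti's comes first."""
    data = [(op, t, x) for op, t, x in ops if op in ("r", "w")]
    return {(t1, t2) for (o1, t1, x1), (o2, t2, x2) in combinations(data, 2)
            if t1 != t2 and x1 == x2 and "w" in (o1, o2)}


def has_dirty_read(ops):
    """Loose P1: some read sees a write whose transaction has not yet terminated."""
    active_writers = {}
    for op, t, x in ops:
        if op == "w":
            active_writers.setdefault(x, set()).add(t)
        elif op == "r":
            if active_writers.get(x, set()) - {t}:
                return True
        else:  # commit or abort
            for writers in active_writers.values():
                writers.discard(t)
    return False


H = parse("r1(x) w1(x) r1(y) w1(y) r2(x) r2(y) c1 c2")
print(conflict_edges(H))   # {(1, 2)}: every conflict orders T1 before T2
print(has_dirty_read(H))   # True
```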
Other Isolation Levels
- Cursor stability
  - Prevents lost updates by adding cursor reads: the lock on the cursor's current item is held until the cursor moves or is closed
  - Stronger than read committed, weaker than repeatable read
- Snapshot isolation
  - Reads from and writes to a snapshot of the committed data as of the time the transaction started
  - Stronger than read committed, incomparable to repeatable read
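A sketch of snapshot isolation as a toy multiversion store. It assumes integer commit timestamps and uses the First-committer-wins rule described by Berenson et al.: at commit, a transaction aborts if a concurrent transaction has already committed a write to a key it also wrote. `SnapshotStore`, `SnapshotTxn`, and their methods are hypothetical names.

```python
class SnapshotStore:
    """Toy multiversion store: key -> list of (commit_ts, value), oldest first."""

    def __init__(self):
        self.versions = {}
        self.clock = 0  # timestamp of the most recent commit

    def begin(self):
        return SnapshotTxn(self, self.clock)


class SnapshotTxn:
    def __init__(self, store, start_ts):
        self.store, self.start_ts, self.writes = store, start_ts, {}

    def read(self, key):
        if key in self.writes:  # a transaction sees its own writes
            return self.writes[key]
        # Otherwise: latest version committed no later than our snapshot.
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.start_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value  # private until commit

    def commit(self):
        s = self.store
        # First-committer-wins: abort if any key we wrote has a version
        # committed after our snapshot was taken.
        for key in self.writes:
            history = s.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                return False
        s.clock += 1
        for key, value in self.writes.items():
            s.versions.setdefault(key, []).append((s.clock, value))
        return True
```

Because only write-write conflicts are checked, two transactions that read overlapping keys but write disjoint ones both commit (write skew), which is one way snapshot isolation falls short of serializability.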
Optimistic Concurrency Control
- Divide each transaction into read, validation, and write phases
- Validation checks whether the transaction can be inserted into a serializable history
- Why: lower message cost, little blocking in low-contention environments, no deadlock
- Why not: abort rates can be high; not suitable for interactive or non-restartable transactions
Validation
- Assign transaction $i$ a unique number $t(i)$
- Validation condition: for all $i$ and all $j$ with $t(i) < t(j)$, one of the following must hold:
  1. $i$ completes its write phase before $j$ starts its read phase
  2. $i$ completes its write phase before $j$ starts its write phase, and $WS(i) \cap RS(j) = \emptyset$
  3. $i$ completes its read phase before $j$ completes its read phase, and $WS(i) \cap (RS(j) \cup WS(j)) = \emptyset$
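The three conditions can be checked for one ordered pair of transactions, given when each phase started and ended. A sketch, assuming integer timestamps for the phase boundaries; the `Txn` fields and `valid_pair` are hypothetical. It returns which condition holds, or 0 if none does and $j$ fails validation.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    read_start: int   # read phase begins
    read_end: int     # read phase ends (validation begins)
    write_start: int  # write phase begins
    write_end: int    # write phase ends
    rs: set = field(default_factory=set)  # read set
    ws: set = field(default_factory=set)  # write set


def valid_pair(i, j):
    """Check the validation condition for t(i) < t(j).
    Returns the number of the first condition that holds, or 0 if none does."""
    if i.write_end < j.read_start:
        return 1
    if i.write_end < j.write_start and not (i.ws & j.rs):
        return 2
    if i.read_end < j.read_end and not (i.ws & (j.rs | j.ws)):
        return 3
    return 0
```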
Validation
[Figure: timelines of the read, validate, and write phases of transactions $i$ and $j$ under each of the three conditions; condition (2) annotated $WS(i) \cap RS(j) = \emptyset$, condition (3) annotated $WS(i) \cap (RS(j) \cup WS(j)) = \emptyset$]
Transaction numbers
- What should $t(i)$ be?
- A unique timestamp assigned at the beginning of the validation phase (i.e., at the end of the read phase)
- This guarantees that $i$ completes its read phase before $j$ completes its read phase if $t(i) < t(j)$
Serial Implementation
- Ensure that one of conditions (1) or (2) holds
- At transaction begin, record start tn (the current value of the transaction number counter)
- At transaction end, record finish tn
- Validate against every t in [start tn + 1, finish tn] by checking whether RS intersects WS(t)
- (2) requires that the write phases of concurrent transactions be serial: put validation, assignment of tn, and the write phase in one critical section
- Various optimizations reduce the size of the critical section
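A minimal, single-process sketch of the serial scheme, assuming an in-memory dictionary as the database and a `threading.Lock` standing in for the critical section; `OCCStore`, `OCCTxn`, and their methods are hypothetical names, and none of the critical-section optimizations are attempted. Reads record the read set, writes are buffered, and commit validates, writes, and assigns the transaction number inside one critical section.

```python
import threading


class OCCStore:
    def __init__(self):
        self.data = {}
        self.tnc = 0                # transaction number counter
        self.committed_ws = {}      # tn -> write set of the committed transaction
        self.cs = threading.Lock()  # critical section: validate + write + assign tn

    def begin(self):
        return OCCTxn(self)


class OCCTxn:
    def __init__(self, store):
        self.store = store
        self.start_tn = store.tnc   # start tn
        self.rs, self.writes = set(), {}

    def read(self, key):
        self.rs.add(key)
        return self.writes.get(key, self.store.data.get(key))

    def write(self, key, value):
        self.writes[key] = value    # buffered until the write phase

    def commit(self):
        s = self.store
        with s.cs:
            finish_tn = s.tnc
            for t in range(self.start_tn + 1, finish_tn + 1):
                if s.committed_ws[t] & self.rs:
                    return False    # invalid: the caller should restart
            s.data.update(self.writes)  # write phase
            s.tnc += 1                  # tn assigned only after a successful write phase
            s.committed_ws[s.tnc] = set(self.writes)
            return True
```

Two transactions that both read and then write x share the same start tn; whichever commits first wins, and the other finds the winner's write set in its validation range and must restart.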
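Parallel Implementation
- Ensure that one of (1), (2), or (3) holds
- At transaction end, inside a critical section: record finish tn, take a snapshot of the active set, then add our tid to the active set
- Validate outside the CS against:
  - all t in [start tn + 1, finish tn], by checking whether RS intersects WS(t)
  - all t in our snapshot of active, by checking whether RS or WS intersects WS(t)
- If valid, perform the writes outside the CS, then (inside a CS) assign tn and remove our tid from the active set
- If invalid, remove our tid from the active set and restart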
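The parallel scheme can be sketched on the same kind of toy store, again assuming an in-memory dictionary and a `threading.Lock` for the two short critical sections; `ParallelOCCStore` and `ParallelTxn` are hypothetical names. Only recording finish tn, snapshotting the active set, assigning tn, and leaving the active set happen under the lock; validation and the write phase do not.

```python
import threading


class ParallelOCCStore:
    def __init__(self):
        self.data = {}
        self.tnc = 0             # transaction number counter
        self.committed_ws = {}   # tn -> write set of the committed transaction
        self.active = {}         # tid -> write set of txns past their read phase
        self.next_tid = 0
        self.cs = threading.Lock()

    def begin(self):
        return ParallelTxn(self)


class ParallelTxn:
    def __init__(self, store):
        self.store = store
        self.start_tn = store.tnc
        self.rs, self.writes = set(), {}

    def read(self, key):
        self.rs.add(key)
        return self.writes.get(key, self.store.data.get(key))

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        s, ws = self.store, set(self.writes)
        with s.cs:  # short CS: record finish tn, snapshot active, join active
            finish_tn = s.tnc
            finish_active = list(s.active.values())
            tid, s.next_tid = s.next_tid, s.next_tid + 1
            s.active[tid] = ws
        # Validation runs outside the critical section.
        valid = all(not (s.committed_ws[t] & self.rs)
                    for t in range(self.start_tn + 1, finish_tn + 1))
        valid = valid and all(not (other & (self.rs | ws)) for other in finish_active)
        if valid:
            s.data.update(self.writes)  # write phase, also outside the CS
        with s.cs:  # short CS: assign tn if valid, leave the active set either way
            if valid:
                s.tnc += 1
                s.committed_ws[s.tnc] = ws
            del s.active[tid]
        return valid  # False: the caller should restart the transaction
```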
Performance
- Agrawal: previous studies were flawed
  - Different performance models led to contradictory results
  - Flawed assumptions:
    - Infinite resources
    - Transactions progress at a rate independent of the number of concurrent transactions
- Need a more complete, more realistic model
Logical Queuing Model
[Figure: logical queuing model with terminals, ready queue, CC, object queue, blocked queue, update queue, delay, and "think?" / "more?" decision points; transitions labeled ACCESS, BLOCK, RESTART, UPDATE, COMMIT]
Experiments
- Compare locking, optimistic, and immediate-restart CC
- Low contention (large database)
- Infinite resources
- Limited resources (small database)
- Multiple resources
- Interactive workloads
Limited Resources
- Under low contention, throughput tracks disk utilization
- Under high contention, throughput tracks useful disk utilization
  - High contention leads to aborts and restarts, whose disk work is wasted
Response Time
Multiple Resources
- As resources increase, non-blocking CC scales better than blocking CC
- Blocking CC thrashes waiting for locks
- Optimistic CC thrashes on restarts
- Immediate-restart CC reaches a plateau due to its adaptive restart delay
Conclusions
- Locking has better throughput in medium- to high-contention environments
- If resource utilization is low enough that wasted work can be tolerated, immediate-restart and optimistic CC have better throughput
- Limit the multiprogramming level to avoid thrashing due to blocking and restarts
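Limiting the multiprogramming level amounts to admission control in front of the transaction manager. A minimal sketch, assuming transactions are plain Python callables run on threads; `MPLLimiter` and `peak_concurrency` are hypothetical names, and choosing the limit depends on the workload, which the sketch does not model.

```python
import threading
import time


class MPLLimiter:
    """Admit at most `mpl` transactions at once; the rest wait to enter."""

    def __init__(self, mpl):
        self._slots = threading.BoundedSemaphore(mpl)

    def run(self, txn, *args):
        with self._slots:  # blocks until one of the mpl slots is free
            return txn(*args)


def peak_concurrency(mpl, n_txns, work_s=0.01):
    """Run n_txns dummy transactions through the limiter; return the most that ran at once."""
    limiter, guard = MPLLimiter(mpl), threading.Lock()
    state = {"now": 0, "peak": 0}

    def txn():
        with guard:
            state["now"] += 1
            state["peak"] = max(state["peak"], state["now"])
        time.sleep(work_s)  # stand-in for the transaction's work
        with guard:
            state["now"] -= 1

    threads = [threading.Thread(target=limiter.run, args=(txn,)) for _ in range(n_txns)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return state["peak"]
```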