Published by Christine Wilkerson. Modified over 9 years ago.
1
CS346: Advanced Databases Graham Cormode G.Cormode@warwick.ac.uk Concurrency Control
2
Outline
Chapter: “Concurrency Control Techniques” in Elmasri and Navathe
Concurrency control via locks
– Two-phase locking, other locking schemes, multiple granularity
– Detecting and preventing deadlock
Concurrency control via timestamps
Multi-version concurrency control via locks and timestamps
Why?
– Concurrency is a big issue in distributed systems, finance, telecoms…
– Recommended by the DCS advisory board as a vital topic
– Programming contest: http://db.in.tum.de/sigmod15contest/
3
Concurrency Control Protocols
Concurrency control protocols: rules to ensure serializability
– Enforce isolation property of transaction processing
– Provide database consistency when processing transactions
– Resolve conflicts between transactions
– Decide which transaction prevails in a conflict
Several different types of protocol
– Two-phase locking: one transaction can access a data item at a time (readers may share)
– Timestamps: used to determine the precedence order of operations
– Multi-version: allow multiple versions of an item to exist
– Optimistic: plough on ahead and roll back if needed
4
Locks and Locking
Associate a lock with each data item to limit access
A lock is a variable that describes the state of the item
– Simplest form: locked, or available
Use locks to control access
– Only the transaction “holding” the lock can edit the item
Rely on operating system/processor support to manage locks
– Ensure that only one transaction can grab a lock at a time
5
Binary Locks
Binary locks have two states: locked and unlocked (available)
– Denote locked = 1, unlocked = 0
– lock(X) gives current state of lock for item X, 0 or 1
If lock(X)=0, the item can be accessed on request (lock_item(X))
– lock(X) is set to 1 to indicate it is locked
If lock(X)=1, then X cannot be accessed by any other transaction
– Must wait for the lock to be released, unlock_item(X)
A binary lock enforces mutual exclusion on the item X
– Rely on a “lock manager” to moderate access to lock(X)
lock/unlock must be indivisible units: cannot be preempted
– Implemented with a simple bit per item, plus record of lock holder
– Keep a list of locked items in a lock table
6
Enforcing locks
For locks to be effective, must enforce the following rules:
– A transaction T must hold lock(X) before any read or write to X
– T must unlock_item(X) after it is done reading/writing X
– T should not request lock(X) if it already holds it!
– T cannot unlock_item(X) if it doesn’t hold the lock on X!
These can be enforced by the lock manager of the DBMS
– Manages the locks, tracks who has which locks
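The binary lock and the enforcement rules above can be sketched as a tiny lock table. This is a minimal illustration, not the lock manager of any real DBMS: the class name BinaryLockTable and the use of a Python threading.Lock to make lock_item/unlock_item indivisible are assumptions made for the example.

```python
import threading


class BinaryLockTable:
    """Hypothetical lock table: maps each locked item to its holder."""

    def __init__(self):
        # Internal mutex makes lock_item/unlock_item indivisible.
        self._mutex = threading.Lock()
        # Items absent from the dict are unlocked, i.e. lock(X) = 0.
        self._holder = {}

    def lock(self, item):
        """Return the state of lock(X): 1 if locked, 0 if available."""
        with self._mutex:
            return 1 if item in self._holder else 0

    def lock_item(self, item, txn):
        """Try to take the lock; False means the caller must wait."""
        with self._mutex:
            if item in self._holder:
                return False
            self._holder[item] = txn
            return True

    def unlock_item(self, item, txn):
        """Release the lock; only the current holder may do so."""
        with self._mutex:
            if self._holder.get(item) != txn:
                raise RuntimeError(f"{txn} does not hold the lock on {item}")
            del self._holder[item]
```

A real lock manager would also queue waiting transactions rather than simply returning False.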
7
Shared/Exclusive Locks
Binary locks can be too restrictive
– Only one transaction can access the data item at a time
Can allow multiple transactions access to X, if they only read it
– Shared access to X [read access]
Still only one transaction can access X if it will write it
– Exclusive access to X [write access]
Three states: read-locked, write-locked, unlocked
– Operations: read_lock(X), write_lock(X), unlock(X)
8
Enforcing shared locks
Lock manager’s work is more complex now
– Track how many transactions hold a read lock on an item
– Lock table may include entries like: [figure omitted]
Must enforce more rules:
1. A transaction T must hold a lock on X before any read of X
2. T must write_lock(X) before any write operation to X
3. T must unlock_item(X) after it is done reading/writing X
4. T should not request read_lock(X) if it already holds a lock!
5. T should not request write_lock(X) if it already holds a lock!
   (May later relax: upgrade a read lock to a write lock)
6. T cannot unlock_item(X) if it doesn’t hold the lock on X!
9
Lock conversion
Sometimes we want to relax conditions 4. and 5.
– Allow request for a lock on X when some lock is already held
Lock conversion: change the type of the lock held
– Upgrade: write_lock(X) when a read_lock is held; succeeds if the requester holds the only read_lock on X, else must wait
– Downgrade: read_lock(X) when a write_lock is held; should always be permitted
– Need to update the lock table to reflect the change
Lock conversion is also possible using primitives:
– downgrade = unlock + read_lock
– But someone might ‘steal’ the lock between unlock and read_lock
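Shared/exclusive locking with conversion can be sketched as one lock-table entry per item. The class SharedExclusiveLock and its fields are hypothetical names chosen for this illustration. Requests return False where a real lock manager would queue the transaction.

```python
class SharedExclusiveLock:
    """Hypothetical lock-table entry for a single item X."""

    def __init__(self):
        self.readers = set()   # transactions holding a read lock
        self.writer = None     # transaction holding the write lock, if any

    def read_lock(self, txn):
        if self.writer == txn:
            # Downgrade: the writer swaps its write lock for a read lock.
            self.writer = None
            self.readers.add(txn)
            return True
        if self.writer is not None:
            return False       # write-locked by another transaction: wait
        self.readers.add(txn)
        return True

    def write_lock(self, txn):
        if self.writer not in (None, txn):
            return False       # write-locked by another transaction: wait
        if self.readers - {txn}:
            return False       # other readers present: upgrade must wait
        # Fresh write lock, or upgrade when txn is the only reader.
        self.readers.discard(txn)
        self.writer = txn
        return True

    def unlock(self, txn):
        if self.writer == txn:
            self.writer = None
        elif txn in self.readers:
            self.readers.remove(txn)
        else:
            raise RuntimeError(f"{txn} holds no lock on this item")
```

Because the downgrade happens inside a single call, no other transaction can take the lock between the release and the re-acquire, unlike the unlock + read_lock composition.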
10
Guaranteeing serializability
Locks alone do not guarantee serializability
– Transactions that look reasonable can still have (subtle) bugs
Example transactions: [figure omitted]
– The values of X and Y are used after their locks are released
– Need a protocol to govern the use of locks
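The slide's example transactions were a figure and are not recoverable here. The sketch below therefore assumes a pair in the usual textbook style: T1 computes X := X + Y and T2 computes Y := X + Y, and each releases its read lock before taking its write lock. The function names and starting values are illustrative only.

```python
def serial_t1_then_t2(x, y):
    x = x + y              # T1: X := X + Y
    y = x + y              # T2: Y := X + Y (sees T1's new X)
    return x, y


def serial_t2_then_t1(x, y):
    y = x + y              # T2: Y := X + Y
    x = x + y              # T1: X := X + Y (sees T2's new Y)
    return x, y


def interleaved(x, y):
    y_seen_by_t1 = y       # T1: read_lock(Y); read_item(Y); unlock(Y)
    x_seen_by_t2 = x       # T2: read_lock(X); read_item(X); unlock(X)
    y = x_seen_by_t2 + y   # T2: write_lock(Y); Y := X + Y; unlock(Y)
    x = x + y_seen_by_t1   # T1: write_lock(X); X := X + Y; unlock(X)
    return x, y


print(serial_t1_then_t2(20, 30))   # (50, 80)
print(serial_t2_then_t1(20, 30))   # (70, 50)
print(interleaved(20, 30))         # (50, 50)
```

Every access in the interleaved run happens under a lock, yet its result matches neither serial order, so the schedule is not serializable.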
12
Deadlock and Starvation
Deadlock is when each transaction T is waiting for an item that is locked by some other transaction T’
– Because no one can move, nothing happens
– Example: T1’ wants lock(X), T2’ wants lock(Y), and each holds the item the other wants
Starvation: a transaction T is unable to proceed for an indefinite time while others go ahead normally
– Can occur if T is waiting for a lock on a popular item and everyone else gets it first, or if T keeps getting chosen as the victim to abort
Starvation can be fixed by ensuring everyone gets a chance
– Fix 1: use a “fair” queuing scheme, e.g. first-come, first-served
– Fix 2: assign priorities, increase priority of old transactions
13
Two-Phase Locking
A transaction follows two-phase locking (2PL) if all lock operations precede the first unlock operation in the transaction
– Hence two phases: a 1st growing phase, then a 2nd shrinking phase
– No locks released in growing phase, none taken in shrinking phase
If lock conversion is allowed, then:
– Only upgrades in growing phase
– Only downgrades in shrinking phase
Previous example violates 2PL
– T1 unlocks Y before X is locked
– T2 unlocks X before Y is locked
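The 2PL condition is easy to check mechanically for a single transaction. In this sketch the representation of a transaction as a list of (action, item) pairs is an assumption, and the two sample sequences are illustrative. The first mirrors a transaction that unlocks Y before X is locked.

```python
def follows_2pl(ops):
    """Return True if no lock is requested after the first unlock.

    ops: list of (action, item) pairs in execution order, where action is
    "read_lock", "write_lock" or "unlock" (other actions are ignored).
    """
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True            # first unlock ends the growing phase
        elif action in ("read_lock", "write_lock") and shrinking:
            return False                # lock taken in the shrinking phase
    return True


# Unlocks Y before X is locked: not two-phase.
not_2pl = [("read_lock", "Y"), ("unlock", "Y"),
           ("write_lock", "X"), ("unlock", "X")]

# Same locks, but all acquired before the first unlock: two-phase.
is_2pl = [("read_lock", "Y"), ("write_lock", "X"),
          ("unlock", "Y"), ("unlock", "X")]

print(follows_2pl(not_2pl), follows_2pl(is_2pl))   # False True
```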
14
Two-phase locking example
Modified version of the transactions
– Meet 2PL requirements
– Previous schedule is not allowed: T2 cannot get write_lock(Y), must wait for T1
The transactions can deadlock!
– E.g. T1 holds read_lock on Y, T2 holds read_lock on X
– Both want write_lock on the other’s item
– Can’t go on until one drops a read_lock
15
Properties of 2PL
If every transaction in a schedule follows 2PL, the schedule is guaranteed to be serializable
– Proof omitted from this module
– Means no need to test for serializability
2PL may limit the level of concurrency achievable
– Means transaction T can’t release a lock if it needs a lock later
– Or T must lock an item long before it is needed
2PL does not permit all possible serializable schedules
– Some serializable schedules are prohibited by 2PL
16
2PL variants
Version described so far: basic 2PL
– Other variants: conservative 2PL, strict 2PL, rigorous 2PL
Conservative (static) 2PL based on read set and write set
– Recall: read set is all items read, write set is all items written by T
– Lock all items in read-set and write-set before transaction starts
– If any are unavailable, wait until they all are
– A deadlock-free protocol
But it’s not always possible to know in advance which items need locking
– E.g. can only know some needed items by inspecting others
17
2PL for strict schedules
Strict 2PL guarantees strict schedules [see ‘Transaction Processing’]
– No write locks released until the transaction commits/aborts
– No transaction can read or write an item written by T until T has committed or aborted
– S2PL is not deadlock free
Rigorous 2PL (Strong Strict 2PL) also guarantees strict schedules
– Under SS2PL, T does not release any locks until it commits/aborts
Contrast the different emphasis:
– Conservative locks everything at the start (always in shrinking phase)
– Rigorous releases all locks at the end (always in expanding phase)
The concurrency control system can automate lock requests
– E.g. strict 2PL: lock each item as needed, automatically release at end
– Place transactions in a queue if they need a currently locked item
18
Granularity of Locking
The notion of a database item can apply to different objects
– A single field of a record; a database record; a disk block; a whole file
The size of items is referred to as the data item granularity
– Fine granularity: small sizes
– Coarse granularity: large sizes
Coarse granularity means lower amount of concurrency possible
– Suppose a transaction locks a disk block to modify a record
– Then other records in the same block are also locked
Fine granularity means more items in the database
– More overhead for the lock manager, more operations performed
Picking the right granularity is a significant design issue
– Try to pick a level that matches the needs of transactions
19
Multiple Granularity Level locking
Some systems offer multiple levels of granularity
– E.g. lock a single seat on a flight, or a whole plane
A granularity hierarchy with a multiple granularity 2PL protocol
– Locking becomes more complicated: more cases to consider
– Locks for each node in the hierarchy, e.g. file lock, record lock
– Tricky: a file lock can only be granted once conflicting record locks within the file have been dropped
– Intention locks can be used to check conflicts efficiently
20
Dealing with deadlock
Recall deadlock: no transaction can proceed due to locks
Use a deadlock prevention protocol (not always practical)
– E.g. lock all needed items in advance (conservative 2PL)
– E.g. place a total order on the items in the database: a transaction can only lock items in item order
– Nice idea in theory, but not practical in reality
More aggressive protocols: abort a transaction to break deadlock
– How to pick which transaction to kill?
21
Deadlock Detection
Detect deadlock and abort transactions as needed
– Can be effective if transactions rarely overlap
– I.e. when transactions are short and only lock a few items
Since deadlock comes from cycles of dependency, create the graph
– Wait-for graph: transaction nodes, edges for waiting relationships
– Whenever Ti wants to lock X held by Tj, create edge (Ti → Tj)
– When Tj releases locks on items Ti was waiting for, delete the edge
– There is deadlock if and only if there is a cycle in the wait-for graph
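Checking the wait-for graph for a cycle can be sketched with a depth-first search. The dictionary representation (each transaction mapped to the set of transactions it waits for) and the function name has_deadlock are assumptions for this example.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for: dict mapping a transaction to the set of transactions
    it is waiting for, i.e. an edge Ti -> Tj for each Tj in wait_for[Ti].
    """
    WHITE, GREY, BLACK = 0, 1, 2        # unvisited, on current path, finished
    colour = {}

    def visit(t):
        colour[t] = GREY
        for u in wait_for.get(t, ()):
            state = colour.get(u, WHITE)
            if state == GREY:           # back edge: u is on the current path
                return True
            if state == WHITE and visit(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t)
               for t in list(wait_for))


print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))   # True
print(has_deadlock({"T1": {"T2"}, "T2": {"T3"}}))   # False
```

Running the full search on every new edge is the high-overhead option; a system might instead run it periodically or once transactions have waited for a while.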
22
Deadlock Detection
When to check for a cycle in the graph?
– Every time an edge is added? Could be high overhead
– When the number of current transactions is high enough?
– When several transactions have been waiting for a while?
Victim selection is how to choose which transaction to abort
– Typically prefer younger transactions (less to redo)
23
Concurrency control by Timestamp ordering
Methods seen so far all involve locking (2PL etc.)
Timestamp ordering concurrency control doesn’t use any locks
– Timestamps are used to determine precedence order of operations
– No locks, hence no possibility of deadlock
A timestamp is a (unique) identifier, assigned in increasing order
– Define the timestamp of transaction T as TS(T)
– Either based on a counter, or system time (ensuring no duplicates)
24
Timestamp ordering algorithm
Timestamp ordering: order transactions by their timestamps (TS)
– Ensure schedule is equivalent to serial schedule in timestamp order
– Resolution of conflicting operations can’t violate timestamp order
Each item X is associated with two timestamp values
– read_TS(X): TS of youngest transaction to do a successful read of X
– write_TS(X): TS of youngest transaction to do a successful write to X
Outline of basic timestamp ordering algorithm (TOA)
– For each operation, check that timestamp order is not violated
– If T violates order, T is aborted and restarted with new timestamp
– If T is rolled back, any transaction using writes of T is rolled back
– Can cause cascading rollback: needs extra work to avoid
25
Basic Timestamp Ordering Algorithm
Whenever transaction T issues a write_item(X) operation:
– If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then abort and roll back T, reject the operation
  { A younger transaction has read/written X, violating ordering }
– Else execute the write, set write_TS(X) ← TS(T)
Whenever transaction T issues a read_item(X) operation:
– If write_TS(X) > TS(T), then abort and roll back T, reject the operation
  { A younger transaction has written X, violating ordering }
– Else execute the read, set read_TS(X) ← max(TS(T), read_TS(X))
When TOA detects conflicting operations in the wrong order, it rejects the later one
– Hence, schedules are conflict serializable
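The two checks above translate almost line for line into code. In this sketch the names Item, Abort, to_read and to_write are hypothetical, timestamps are plain integers, and rolling back and restarting an aborted transaction is left to the caller.

```python
class Abort(Exception):
    """Raised when an operation would violate timestamp order."""


class Item:
    def __init__(self):
        self.read_ts = 0    # read_TS(X): largest TS that has read X
        self.write_ts = 0   # write_TS(X): largest TS that has written X


def to_write(item, ts):
    """write_item(X) issued by a transaction with timestamp ts."""
    if item.read_ts > ts or item.write_ts > ts:
        raise Abort("a younger transaction has already read or written X")
    item.write_ts = ts


def to_read(item, ts):
    """read_item(X) issued by a transaction with timestamp ts."""
    if item.write_ts > ts:
        raise Abort("a younger transaction has already written X")
    item.read_ts = max(item.read_ts, ts)
```

For example, after to_read(x, 5) and to_write(x, 7) on a fresh item x, a call to to_read(x, 6) raises Abort because write_TS(X) = 7 > 6.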
26
Strict Timestamp Ordering
Strict TO ensures schedules are strict and conflict serializable
– If T issues a read or write operation on X where TS(T) > write_TS(X), T is delayed until the transaction T’ that wrote X commits or aborts
– Effectively the same as locking X until T’ commits or aborts
– No deadlock, as T only waits for T’ if TS(T) > TS(T’)
Thomas’s write rule modifies the checks on write from basic TO
– If read_TS(X) > TS(T), abort and roll back T, reject the write
– If write_TS(X) > TS(T), don’t execute the write but continue
  { A later transaction has already written X, so this write would be overwritten anyway; any conflict it caused would be detected by the rule above }
– If neither condition holds, do the write and set write_TS(X) ← TS(T)
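Thomas's write rule can be sketched as a three-way decision. The function name thomas_write and the string return values are assumptions chosen for readability.

```python
def thomas_write(read_ts, write_ts, ts):
    """Decide what to do with a write by a transaction with timestamp ts.

    read_ts and write_ts are read_TS(X) and write_TS(X) for the item.
    Returns "abort", "skip" (ignore the outdated write) or "write".
    """
    if read_ts > ts:
        return "abort"   # a younger transaction already read X
    if write_ts > ts:
        return "skip"    # a younger write already superseded this one
    return "write"       # caller performs the write, sets write_TS(X) = ts


print(thomas_write(read_ts=3, write_ts=8, ts=5))   # skip
```

Basic timestamp ordering would abort in the "skip" case, so the rule trades some aborts for silently dropped writes.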
27
Deadlock avoidance via timestamp protocols
Can combine locks with timestamp-based protocols
– Record the (unique) start time of the transaction, TS(T)
– Suppose Ti tries to access an item X but X is locked by Tk
Wait-die protocol: if TS(Ti) > TS(Tk), abort (younger) Ti
– Restart Ti later with the original timestamp TS(Ti)
– Else, Ti is older, and is allowed to wait
– The usurper is aborted if it is younger, else it can wait
Wound-wait protocol: if TS(Ti) < TS(Tk), abort (younger) Tk
– Restart Tk later with the original timestamp TS(Tk)
– Else, Ti is allowed to wait
– The usurper pre-empts the lock holder if it is older, else it can wait
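The two rules differ only in what happens to the younger transaction. In this sketch the function names and return strings are hypothetical, and a smaller timestamp means an older transaction.

```python
def wait_die(ts_requester, ts_holder):
    """Ti (requester) asks for an item locked by Tk (holder)."""
    if ts_requester > ts_holder:
        return "abort requester"   # younger requester dies
    return "wait"                  # older requester may wait


def wound_wait(ts_requester, ts_holder):
    """Ti (requester) asks for an item locked by Tk (holder)."""
    if ts_requester < ts_holder:
        return "abort holder"      # older requester wounds the younger holder
    return "wait"                  # younger requester may wait


# An older transaction (TS 1) requests an item held by a younger one (TS 2).
print(wait_die(1, 2), "|", wound_wait(1, 2))   # wait | abort holder
# A younger transaction (TS 2) requests an item held by an older one (TS 1).
print(wait_die(2, 1), "|", wound_wait(2, 1))   # abort requester | wait
```

In both schemes the aborted transaction keeps its original timestamp on restart, so it eventually becomes the oldest and cannot starve.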
28
Timestamp-based protocol properties
Both protocols prefer older transactions over younger ones
– Older have made more progress, younger have less to lose
Both wound-wait and wait-die are deadlock-free protocols
– Suppose there is a deadlock, then there is a cycle of transactions
– All transactions in the cycle are in the ‘wait’ state
– Wait-die: can only wait if older than the holder
– Wound-wait: can only wait if younger than the holder
– There can’t be a cycle where everyone is older (younger) than the next
– Hence, contradiction: no deadlock possible
Wound-wait and wait-die can be overly aggressive:
– They may abort transactions unnecessarily (when no deadlock would have occurred)
29
Timestamp-free protocols
No waiting algorithm: can’t obtain a lock? Abort immediately!
– Restart after a time delay
– No transaction is ever waiting, so no deadlock
– A lot of needless aborting and restarting
Cautious waiting algorithm: tries to reduce the waste
– Ti tries to lock X which is held by Tk
– If Tk is not blocked, Ti is blocked and allowed to wait
– Else, Tk is blocked, so abort Ti
Cautious waiting is deadlock free
– Suppose there is a cycle of waiting (blocked) transactions
– Consider the time at which each transaction became blocked
– Can only complete a cycle if some T is blocked by a T’ that was already blocked, which the rule forbids
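The cautious waiting rule only needs to know which transactions are currently blocked. The class CautiousWaiting and its method names below are hypothetical.

```python
class CautiousWaiting:
    """Hypothetical bookkeeping for the cautious waiting rule."""

    def __init__(self):
        self.waiting_for = {}   # blocked transaction -> transaction it waits for

    def request_denied(self, ti, tk):
        """Ti asked for a lock held by Tk; decide whether Ti waits or aborts."""
        if tk in self.waiting_for:
            return "abort"      # holder is itself blocked: abort Ti
        self.waiting_for[ti] = tk
        return "wait"           # holder is running: Ti may wait for it

    def unblocked(self, ti):
        """Ti obtained its lock (or was aborted) and no longer waits."""
        self.waiting_for.pop(ti, None)


cw = CautiousWaiting()
print(cw.request_denied("T1", "T2"))   # wait  (T2 is not blocked)
print(cw.request_denied("T2", "T1"))   # abort (T1 is already blocked)
```

The second request would have closed the cycle T1 → T2 → T1, and the rule rejects it because the holder T1 is already blocked.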
30
Multiversion Concurrency Control
We can keep old versions of a data item when it is updated
– The appropriate version can be given to a transaction
– Choose a version that will maintain serializability
Increased space cost: need to keep more versions of the data
– Use garbage collection ideas to remove unneeded versions
Can be more time efficient: less waiting as a version is available
Several realizations possible:
– Multiversion based on timestamp ordering
– Multiversion two-phase locking using certify locks
31
Multiversion Based on Timestamp Ordering
Several versions of X are kept: X1, X2, …, Xk
– For each version, keep the value and two timestamps
– read_TS(Xi): the largest timestamp of a transaction that read Xi
– write_TS(Xi): the timestamp of the transaction that wrote Xi
When a transaction T writes to X, Xk+1 is created
– read_TS(Xk+1) = write_TS(Xk+1) = TS(T)
When T reads from Xi, read_TS(Xi) ← max(read_TS(Xi), TS(T))
32
Multiversion Based on Timestamp Ordering
Rules enforce serializability:
1. If T tries to write X: let Xi be the version with the highest write_TS(Xi) ≤ TS(T)
   – If read_TS(Xi) > TS(T), then abort T and roll back
   – Else create a new version Xk with read_TS(Xk) = write_TS(Xk) = TS(T)
2. If T tries to read X: find the version Xi with the highest write_TS(Xi) ≤ TS(T)
   – Return the value of Xi to T, update read_TS(Xi) ← max(read_TS(Xi), TS(T))
Reads are always successful (rule 2)
Writes may cause an abort (rule 1) if T tries to write a version that should have been read by a later transaction
– Rollback of T can cause cascading rollbacks
– So T cannot commit until every T’ that wrote a version read by T has also committed
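Rules 1 and 2 can be sketched over a plain list of versions. The names Version, Abort, mv_read and mv_write are hypothetical. The sketch also assumes an initial version with timestamp 0 always exists, and it ignores commit ordering and garbage collection.

```python
class Abort(Exception):
    """Raised when a write arrives too late (rule 1)."""


class Version:
    def __init__(self, value, ts):
        self.value = value
        self.write_ts = ts   # TS of the transaction that wrote this version
        self.read_ts = ts    # largest TS of a transaction that read it


def latest_visible(versions, ts):
    """Version Xi with the highest write_TS(Xi) <= ts."""
    visible = [v for v in versions if v.write_ts <= ts]
    return max(visible, key=lambda v: v.write_ts)


def mv_read(versions, ts):
    """Rule 2: reads always succeed and bump read_TS of the version read."""
    v = latest_visible(versions, ts)
    v.read_ts = max(v.read_ts, ts)
    return v.value


def mv_write(versions, ts, value):
    """Rule 1: abort if a younger transaction already read the old version."""
    v = latest_visible(versions, ts)
    if v.read_ts > ts:
        raise Abort("a younger transaction has read the version being replaced")
    versions.append(Version(value, ts))
```

For example, starting from versions = [Version("a", 0)], a read at timestamp 5 returns "a", after which a write at timestamp 3 aborts while a write at timestamp 7 creates a second version.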
33
Multiversion 2PL using certify locks
Add an extra type of lock: certify
– Modes are read-locked, write-locked, certify-locked and unlocked
Allow transactions to read while T holds the write lock
– Two versions of an item: one edited version and one committed
– Transactions can read the committed version while T edits
– T’s writes do not affect the committed version
To commit the edited version, T must obtain certify lock
– Certify is not compatible with read locks: they must be dropped
– When certify is acquired, the new version replaces the old one
Avoids cascading aborts as only committed versions can be read
– Deadlock still possible, and can be handled by previous methods
34
Optimistic Concurrency Control (OCC)
Methods discussed so far check before an operation is allowed
– E.g. whether it is locked, or whether timestamps agree
– This can represent a significant overhead during transactions
Optimistic concurrency control has no checking during execution
– Updates are applied to local copies of the data items
A validation phase checks if any updates violate serializability
– If OK, transaction commits and database updates from local copies
– Else, transaction aborts and is restarted later
Three phases for transaction T in this OCC protocol
– Read phase: T reads committed data items, updates local copies
– Validation phase: check T’s updates don’t violate serializability
– Write phase: if successful, apply the updates to the database
35
Optimistic Concurrency Control
OCC performs all checks together for more efficiency
– Works well if transactions don’t overlap much in general
– A lot of interference leads to a lot of aborts and restarts
– “Optimistic” since we assume the former case holds
Validation checks transaction Ti against other transactions
– The checks require timestamps, read sets and write sets
– For each Tk that is either committed or also in validation, one of the following must hold:
  1. Tk completes its write phase before Ti starts its read phase
  2. Ti starts its write phase after Tk completes its write phase, and read_set(Ti) ∩ write_set(Tk) = ∅
  3. (read_set(Ti) ∪ write_set(Ti)) ∩ write_set(Tk) = ∅, and Tk completes its read phase before Ti completes its read phase
– Else, there could be overlap, so abort Ti
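The three validation conditions can be sketched as a single predicate. The Txn record and its field names are assumptions, phase boundaries are modelled as plain numbers, and math.inf stands for a phase that has not finished yet. Ti's start_write is taken to be the time its write phase would begin if validation succeeds.

```python
import math
from dataclasses import dataclass


@dataclass
class Txn:
    read_set: set
    write_set: set
    start_read: float             # read phase begins
    end_read: float               # read phase ends (validation starts)
    start_write: float            # write phase begins (if validated)
    end_write: float = math.inf   # write phase ends; inf if not finished


def validates_against(ti, tk):
    """True if Ti passes validation with respect to Tk."""
    # 1. Tk finished writing before Ti started reading.
    if tk.end_write < ti.start_read:
        return True
    # 2. Tk finished writing before Ti starts writing,
    #    and Ti read nothing that Tk wrote.
    if tk.end_write < ti.start_write and not (ti.read_set & tk.write_set):
        return True
    # 3. Tk finished reading before Ti finished reading,
    #    and Ti neither read nor wrote anything that Tk wrote.
    if (tk.end_read < ti.end_read
            and not ((ti.read_set | ti.write_set) & tk.write_set)):
        return True
    return False


def validate(ti, others):
    """Ti commits only if it validates against every relevant Tk."""
    return all(validates_against(ti, tk) for tk in others)
```

If validate returns False, Ti is aborted and restarted later. Nothing needs undoing in the database because Ti only updated local copies.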
36
Locks on indexes
May want to apply locking to more complex database objects
– Indexes are a good example: hierarchical, often changing
Directly applying locking ideas doesn’t work well
– Every update wants to lock the root: no concurrent access
Make use of knowledge about index structure
– Read locks for parent nodes can be dropped after the child is found
– If insertion affects a non-full leaf node, only lock on leaf is needed
– Drop locks on parents of non-full internal nodes
Modify index data structures to make them more “lock-friendly”
– E.g. the B-link tree adds more links between internal nodes
– Links make it easier to find data if tree is updated during a search
37
Insertion, deletion and phantom records
Insertion: when a new item is inserted into the database
– The item is given a new unique name by the system
– A lock is created (if needed), given to creating transaction
– Or read and write timestamps are set to TS of creating transaction
Deletion: a transaction tries to delete an item X
– Locks: deleting transaction must hold exclusive (write) lock on X
– Timestamps: ensure no later transaction has read or written X
Phantom problem: a new record X is created by T while T’ is running
– X meets a condition that T’ is selecting on, but is missed by T’
– E.g. T’ is accessing all employees with DNO=5, and X is in dept 5
– Can be hard to detect: X appeared after T’ searched
– Possible solution: lock the index during T’ to delay insertion of X
38
Summary
Concurrency control via locks
– Two-phase locking, shared and exclusive locks
– Conservative, strict, rigorous 2PL, multiple granularity locking
– Detecting and preventing deadlock via wait-for graphs
Concurrency control via timestamps
– Wait-die and wound-wait protocols, cycle-freeness
– Timestamp ordering and Thomas’s write rule
Multiversion concurrency control via locks and timestamps
Optimistic concurrency control: db.in.tum.de/sigmod15contest/
Chapter: “Concurrency Control Techniques” in Elmasri and Navathe