Relational Database Systems 4

Instructor: Prof. James Cheng
Acknowledgement: The slides are extracted from, and modified based on, the slides provided by Prof. Sourav S. Bhowmick of Nanyang Technological University.

Topics to be covered
• ER model
• Relational Algebra
• SQL
• Storage and Index Structures
• Query Processing and Query Optimization
• Transaction Management

The Road Ahead
Two Issues
• Data must be protected from system failure. How do we maintain the integrity of data when the system fails?
• Data must not be corrupted when several error-free queries or database modifications are being done at once. How do we handle this?
Topics
• Recovery
• Concurrency control

Recovery
Type of Crash | Prevention
Wrong data entry | Constraints and data cleaning
Disk crashes | Redundancy, e.g. RAID, archive
Fire, theft, bankruptcy… | Buy insurance, change jobs…
System failures (most frequent) | DATABASE RECOVERY

Transactions
What is it?
• A process that queries and modifies the database.
• Like any program, it executes a number of steps in sequence.
• Several of these steps may modify the database.
State of a transaction
• Each transaction has a state that represents what has happened so far in the transaction: the current place in the transaction's code being executed, and the values of the local variables.
When the system crashes, what happens to the state?
• The internal state is lost: we don't know which parts executed and which didn't.

Preliminaries
Assumption 1
• The DB is composed of elements.
• Usually 1 element = 1 block. It can be smaller (= 1 record) or larger (= 1 relation).
Assumption 2
• Each transaction reads/writes some elements.
Database State
• A value for each of its elements.
Consistent state
• A state that satisfies all constraints of the database schema.
Consistent DB
• A DB in a consistent state.

Transaction Properties (ACID)
• Atomicity: A transaction is either performed in its entirety (possibly in steps) or not performed at all (all or nothing).
• Consistency: If a transaction T starts with the DB in a consistent state and T executes in isolation, then T leaves the DB in a consistent state.
• Isolation: The concurrent execution of transactions results in a state that would be obtained if the transactions were executed serially.
• Durability: Changes applied to the database by a committed transaction must persist in the database.

Primitive Operations of Transactions
Example transaction:
START TRANSACTION
READ(A,t); t := t*2; WRITE(A,t);
READ(B,t); t := t*2; WRITE(B,t);
COMMIT;
Operation | Semantics
READ(X, t) | Copy database element X to the transaction's local variable t
WRITE(X, t) | Copy the transaction's local variable t to element X
INPUT(X) | Read element X into a memory buffer
OUTPUT(X) | Write element X to disk
READ and WRITE are labelled TRANSACTIONS MANAGER on the slide; INPUT and OUTPUT are labelled BUFFER MANAGER.

Buffer Manager
[Figure: READ/WRITE operate on the buffer pool in main memory; INPUT/OUTPUT move pages between the buffer pool and the DISK. Layers shown: page requests from higher-level code, files and access methods, buffer pool manager (buffer pool holding disk pages and free frames), disk space manager, disk.]
• Disk = collection of blocks.
• 1 page corresponds to 1 disk block.
• Data must be in RAM for the DBMS to operate on it.
• Buffer pool = table of <frame#, pageid> pairs.

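Example
Action | t | Mem A | Mem B | Disk A | Disk B
READ(A,t) | 8 | 8 | | 8 | 8
t := t*2 | 16 | 8 | | 8 | 8
WRITE(A,t) | 16 | 16 | | 8 | 8
READ(B,t) | 8 | 16 | 8 | 8 | 8
t := t*2 | 16 | 16 | 8 | 8 | 8
WRITE(B,t) | 16 | 16 | 16 | 8 | 8
OUTPUT(A) | 16 | 16 | 16 | 16 | 8
OUTPUT(B) | 16 | 16 | 16 | 16 | 16
Crash! If the system crashes partway through (for example after OUTPUT(A) but before OUTPUT(B), leaving Disk A = 16 and Disk B = 8), the DB is in an inconsistent state (atomicity is lost).
What to do? Either set both A and B back to 8, or advance both to 16.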
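The trace above can be replayed with a small sketch. The function name `run_until_crash` and the step encoding are assumptions made for illustration only; the sketch simply models a memory buffer and a disk as dictionaries and stops after a chosen step to imitate a crash.

```python
def run_until_crash(disk, crash_after):
    """Replay the example transaction and stop after step `crash_after` (1-based).

    `disk` maps element names to values. Only the disk survives a crash,
    so only the disk is returned.
    """
    mem = {}   # buffer pool: element -> value in main memory
    t = None   # the transaction's local variable
    steps = [
        ("READ", "A"), ("DOUBLE", None), ("WRITE", "A"),
        ("READ", "B"), ("DOUBLE", None), ("WRITE", "B"),
        ("OUTPUT", "A"), ("OUTPUT", "B"),
    ]
    for n, (op, x) in enumerate(steps, start=1):
        if op == "READ":
            mem.setdefault(x, disk[x])  # INPUT(X) if X is not buffered yet
            t = mem[x]
        elif op == "DOUBLE":
            t *= 2
        elif op == "WRITE":
            mem[x] = t                  # changes memory only
        elif op == "OUTPUT":
            disk[x] = mem[x]            # changes the disk
        if n == crash_after:
            break
    return disk


# Crash right after OUTPUT(A): A was doubled on disk, B was not.
print(run_until_crash({"A": 8, "B": 8}, crash_after=7))
```

A crash after step 7 is the inconsistent case the slide warns about; a crash at any earlier step leaves both disk values at 8.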

Solution: Use a Log
Log
• A file that records every single action of each transaction.
• When multiple transactions run concurrently, their log records are interleaved.
How does it help?
After a system crash, use the log to:
• Redo some transactions that did commit.
• Undo other transactions that did not commit.
Three types of logs: undo, redo, undo/redo.

Undo Logging
Idea
• Repairs the database state by undoing the effects of transactions that may not have completed before the crash.
Log Records
• <START T>: transaction T has begun.
• <COMMIT T>: T has committed (successful completion of the transaction).
• <ABORT T>: T has aborted.
• <T,X,v>: update record. T has updated element X (in memory, not on disk), and its old value was v.

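Log Entries
Action | t | Mem A | Mem B | Disk A | Disk B | Log
 | | | | | | <START T>
READ(A,t) | 8 | 8 | | 8 | 8 |
t := t*2 | 16 | 8 | | 8 | 8 |
WRITE(A,t) | 16 | 16 | | 8 | 8 | <T, A, 8>
READ(B,t) | 8 | 16 | 8 | 8 | 8 |
t := t*2 | 16 | 16 | 8 | 8 | 8 |
WRITE(B,t) | 16 | 16 | 16 | 8 | 8 | <T, B, 8>
OUTPUT(A) | 16 | 16 | 16 | 16 | 8 |
OUTPUT(B) | 16 | 16 | 16 | 16 | 16 |
 | | | | | | <COMMIT T>
The slide marks two possible crash points in this trace (Crash!).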

Undo-Logging Rules
Crash cases for the trace of T
• First case (the crash happens before <COMMIT T> is in the log): we UNDO both changes, so A=8 and B=8. The transaction is atomic, since none of its actions has been executed.
• Second case (the crash happens after <COMMIT T> is in the log): we don't undo anything. The transaction is atomic, since both its actions have been executed.
Undo Rules
• U1: If T modifies X, then <T,X,v> must be written to disk before OUTPUT(X).
• U2: If T commits, then its COMMIT log record must be written to disk only after all database elements changed by T have been written to disk, but as soon thereafter as possible.

Undo Logging: order of writing to the disk
Action | t | Mem A | Mem B | Disk A | Disk B | Log
 | | | | | | <START T>
READ(A,t) | 8 | 8 | | 8 | 8 |
t := t*2 | 16 | 8 | | 8 | 8 |
WRITE(A,t) | 16 | 16 | | 8 | 8 | <T, A, 8>
READ(B,t) | 8 | 16 | 8 | 8 | 8 |
t := t*2 | 16 | 16 | 8 | 8 | 8 |
WRITE(B,t) | 16 | 16 | 16 | 8 | 8 | <T, B, 8>
FLUSH LOG | | | | | |
OUTPUT(A) | 16 | 16 | 16 | 16 | 8 |
OUTPUT(B) | 16 | 16 | 16 | 16 | 16 |
 | | | | | | <COMMIT T>
FLUSH LOG | | | | | |
• A and B cannot be copied to disk until the log records for the changes are on disk.
• Flush the log again at the end to make sure the <COMMIT T> record of the log appears on disk.
Order of writing to the disk
1. The log records indicating changed database elements
2. The changed database elements themselves
3. The COMMIT log record

Recovery Using Undo Logging
When the System Crashes
• Certain DB changes made by a given T were written to disk, while other changes made by T never reached the disk.
• Run the Recovery Manager: it uses the log to restore the DB to a consistent state.
Task 1: Decide for each transaction T whether it is completed or not
• <START T> … <COMMIT T> … = yes (Rule U2)
• <START T> … <ABORT T> … = yes
• <START T> … = no
Task 2: Undo all modifications by incomplete transactions

Recovery Using Undo Logging
Recovery Manager
1. Read the log from the end.
2. Remember all T for which it has seen a <COMMIT T> record or an <ABORT T> record.
3. If it sees <T, X, v> then
   (a) if T's <COMMIT T> (or <ABORT T>) record has been seen, do nothing;
   (b) otherwise, change the value of X in the DB to v.
4. Write a log record <ABORT T> for each incomplete T that was not previously aborted.
5. Flush the log.
All the undo commands are idempotent: if we perform them a second time, no harm is done. If there is a system crash during recovery, simply restart recovery from scratch.
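The backward scan described above can be sketched as follows. This is a minimal illustration, not a real recovery manager: the tuple encoding of log records (`("START", T)`, `("UPDATE", T, X, old)`, `("COMMIT", T)`, `("ABORT", T)`) and the name `undo_recover` are assumptions introduced here, and the disk is modelled as a plain dictionary.

```python
def undo_recover(log, disk):
    """Undo-log recovery: scan backward, restore old values of incomplete transactions.

    Returns the log extended with an ABORT record for every incomplete transaction.
    """
    finished = set()     # transactions whose COMMIT or ABORT record was seen
    incomplete = set()
    for rec in reversed(log):
        kind, txn = rec[0], rec[1]
        if kind in ("COMMIT", "ABORT"):
            finished.add(txn)
        elif txn not in finished:
            incomplete.add(txn)
            if kind == "UPDATE":
                _, _, element, old_value = rec
                disk[element] = old_value   # idempotent: safe to repeat
    return log + [("ABORT", t) for t in sorted(incomplete)]


# Crash after OUTPUT(A) but before OUTPUT(B); <COMMIT T> never reached the log.
log = [("START", "T"), ("UPDATE", "T", "A", 8), ("UPDATE", "T", "B", 8)]
disk = {"A": 16, "B": 8}
log = undo_recover(log, disk)
print(disk, log[-1])
```

Running the function a second time on the returned log changes nothing, which mirrors the idempotence remark on the slide.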

Undo Logging: crash cases
The trace of T is the undo-logging trace: <START T>, the updates <T, A, 8> and <T, B, 8>, FLUSH LOG, OUTPUT(A), OUTPUT(B), <COMMIT T>, FLUSH LOG. The slide marks several crash points (CRASH):
• Crash before <COMMIT T> is written (for example after the first FLUSH LOG, or after OUTPUT(B)): the COMMIT record was not written, so T is incomplete and its changes are undone.
• Crash after <COMMIT T> is written but before the final FLUSH LOG. Case 1: it is possible that the log record containing the COMMIT got flushed to disk, in which case T is complete. Case 2: if the COMMIT record never reached the disk, then T is incomplete.
• Crash after the final FLUSH LOG: <COMMIT T> is on disk, so T is complete and nothing is undone.

Checkpointing
When do we stop reading the log?
• We cannot stop until we reach the beginning of the log file. This is impractical.
Solution
• Checkpoint the log periodically.
Steps
1. Stop accepting new transactions.
2. Wait until all current transactions complete (commit or abort).
3. Flush the log to disk.
4. Write a <CKPT> log record and flush the log again.
5. Resume transactions.
Observation
• Any T executed prior to the checkpoint will have finished, so there is no need to undo it during recovery.
• When the backward scan encounters <CKPT>, we have seen all incomplete Ts.

Example
Log Record
<START T1>
<T1, A, 5>
<START T2>
<T2, B, 10>
<T2, C, 15>
<T1, D, 20>
<COMMIT T1>
<COMMIT T2>
<CKPT>
<START T3>
<T3, E, 25>
<T3, F, 30>
CRASH
Limitations
• The database freezes during the checkpoint.
• Active transactions may take a long time to commit or abort.
• We would like to checkpoint while the database is operational.
Solution: nonquiescent checkpointing

Nonquiescent Checkpointing
Steps
1. Write a <START CKPT(T1,…,Tk)> record, where T1,…,Tk are all active transactions.
2. Flush the log to disk.
3. Continue normal operation.
4. When all of T1,…,Tk have completed, write <END CKPT>.
5. Flush the log to disk.
Log Record
<START T1>
<T1, A, 5>
<START T2>
<T2, B, 10>
<START CKPT (T1, T2)>
<T2, C, 15>
<START T3>
<T1, D, 20>
<COMMIT T1>
<T3, E, 25>
<COMMIT T2>
<END CKPT>
<T3, F, 30>
CRASH
If the backward scan meets <END CKPT> first, all incomplete Ts began after the previous <START CKPT>, so the scan can stop there. The slide also marks a second CRASH point, before <END CKPT> has been written.
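How far back an undo recovery has to scan under nonquiescent checkpointing can be sketched as below. The record encoding and the name `undo_scan_start` are assumptions for illustration. When the crash happens during a checkpoint (no <END CKPT> yet), the sketch conservatively goes back to the earliest <START Ti> of every transaction listed in the <START CKPT> record, which may be further than strictly necessary.

```python
def undo_scan_start(log):
    """Return the index of the earliest log record the backward undo scan must visit."""
    seen_end = False
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec[0] == "END_CKPT":
            seen_end = True
        elif rec[0] == "START_CKPT":
            if seen_end:
                # Checkpoint completed: every incomplete T began after this record.
                return i
            # Crash during the checkpoint: listed transactions may be incomplete.
            active = set(rec[1])
            starts = [j for j in range(i)
                      if log[j][0] == "START" and log[j][1] in active]
            return min(starts, default=i)
    return 0  # no checkpoint at all: scan the whole log


log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 5),
    ("START", "T2"), ("UPDATE", "T2", "B", 10),
    ("START_CKPT", ("T1", "T2")),
    ("UPDATE", "T2", "C", 15), ("START", "T3"), ("UPDATE", "T1", "D", 20),
    ("COMMIT", "T1"), ("UPDATE", "T3", "E", 25), ("COMMIT", "T2"),
    ("END_CKPT",), ("UPDATE", "T3", "F", 30),
]
print(undo_scan_start(log))       # crash after <T3, F, 30>
print(undo_scan_start(log[:10]))  # crash before <END CKPT>
```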

Redo Logging
Idea
• Repairs the database state by redoing the effects of transactions that committed before the crash.
• Ignores incomplete transactions.
Log Records
• <START T>, <COMMIT T>, <ABORT T>
• <T,X,v>: update record. T has updated element X (in memory, not on disk), and its new value is v.
Redo Rule
• R1: If T modifies X, then both <T,X,v> and <COMMIT T> must be written to disk before OUTPUT(X).

Redo Logging: order of writing to the disk
Action | t | Mem A | Mem B | Disk A | Disk B | Log
 | | | | | | <START T>
READ(A,t) | 8 | 8 | | 8 | 8 |
t := t*2 | 16 | 8 | | 8 | 8 |
WRITE(A,t) | 16 | 16 | | 8 | 8 | <T, A, 16>
READ(B,t) | 8 | 16 | 8 | 8 | 8 |
t := t*2 | 16 | 16 | 8 | 8 | 8 |
WRITE(B,t) | 16 | 16 | 16 | 8 | 8 | <T, B, 16>
 | | | | | | <COMMIT T>
FLUSH LOG | | | | | |
OUTPUT(A) | 16 | 16 | 16 | 16 | 8 |
OUTPUT(B) | 16 | 16 | 16 | 16 | 16 |
• All log records involving the changes of T appear on disk before A and B are written to disk.
Order of writing to the disk
1. The log records indicating changed database elements
2. The COMMIT log record
3. The changed database elements themselves

Recovery Using Redo Logging
Recovery Manager
1. Identify the committed transactions.
2. Read the log from the beginning.
3. If it sees <T, X, v> then
   (a) if T is not a committed transaction, do nothing;
   (b) if T is committed, write value v for DB element X.
4. Write a log record <ABORT T> for each incomplete T.
5. Flush the log.
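A minimal sketch of the forward scan, under the same illustrative assumptions as elsewhere in these notes: log records are tuples such as `("UPDATE", T, X, new)`, the disk is a dictionary, and `redo_recover` is a name chosen here, not part of any real system.

```python
def redo_recover(log, disk):
    """Redo-log recovery: reapply new values of committed transactions, earliest first.

    Returns the log extended with an ABORT record for every incomplete transaction.
    """
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    aborted = {rec[1] for rec in log if rec[0] == "ABORT"}
    started = [rec[1] for rec in log if rec[0] == "START"]
    for rec in log:                       # read the log from the beginning
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, element, new_value = rec
            disk[element] = new_value
    incomplete = [t for t in started if t not in committed and t not in aborted]
    return log + [("ABORT", t) for t in incomplete]


# <COMMIT T> was flushed, but neither OUTPUT reached the disk before the crash.
log = [("START", "T"), ("UPDATE", "T", "A", 16), ("UPDATE", "T", "B", 16), ("COMMIT", "T")]
disk = {"A": 8, "B": 8}
redo_recover(log, disk)
print(disk)
```

If the COMMIT record is missing, the same call leaves the disk untouched and only appends <ABORT T>, since rule R1 guarantees that none of T's changes were written to disk.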

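Redo Logging: crash cases
The trace of T is the redo-logging trace: <START T>, the updates <T, A, 16> and <T, B, 16>, <COMMIT T>, FLUSH LOG, OUTPUT(A), OUTPUT(B). The slide marks several crash points (CRASH):
• Crash before <COMMIT T> has been flushed to disk: T is treated as incomplete. None of its changes can have reached the disk (Rule R1), so nothing is redone and <ABORT T> is written to the log.
• Crash after the FLUSH LOG (before OUTPUT(A), between OUTPUT(A) and OUTPUT(B), or after OUTPUT(B)): T is committed, so recovery rewrites A = 16 and B = 16, whether or not the values had already reached the disk.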

Checkpointing
Steps
1. Write a <START CKPT(T1,…,Tk)> record, where T1,…,Tk are all active transactions.
2. Flush to disk all blocks of committed transactions (dirty blocks), while continuing normal operation.
3. When all blocks have been written, write <END CKPT> and flush the log.
Log Record
<START T1>
<T1, A, 5>
<COMMIT T1>
<START T2>
<T2, B, 10>
<START CKPT (T2)>
<T2, C, 15>
<START T3>
<T3, D, 20>
<END CKPT>
<COMMIT T2>
<COMMIT T3>
The slide marks three CRASH points. Depending on which COMMIT records have reached the log at the time of the crash, recovery writes <ABORT T3>, or <ABORT T2> and <ABORT T3>, to the log for the incomplete transactions.

Undo/Redo Logging Rules
Undo Logging
• Requires that data be written to disk immediately after a transaction finishes.
• Increases the number of disk I/Os.
Redo Logging
• Requires keeping all modified blocks in buffers until the transaction commits and the log records have been flushed.
• Increases the average number of buffers needed by transactions.
Undo/Redo Logging
• Increased flexibility, at the expense of maintaining more information in the log.
Rule
• UR1: Before modifying a DB element X on disk because of changes made by some T, the update record <T, X, v, w> must appear on disk (v is the old value of X, w is the new value).

Undo/Redo Logging
Action | t | Mem A | Mem B | Disk A | Disk B | Log
 | | | | | | <START T>
READ(A,t) | 8 | 8 | | 8 | 8 |
t := t*2 | 16 | 8 | | 8 | 8 |
WRITE(A,t) | 16 | 16 | | 8 | 8 | <T, A, 8, 16>
READ(B,t) | 8 | 16 | 8 | 8 | 8 |
t := t*2 | 16 | 16 | 8 | 8 | 8 |
WRITE(B,t) | 16 | 16 | 16 | 8 | 8 | <T, B, 8, 16>
FLUSH LOG | | | | | |
OUTPUT(A) | 16 | 16 | 16 | 16 | 8 |
 | | | | | | <COMMIT T>
OUTPUT(B) | 16 | 16 | 16 | 16 | 16 |
Flexibility of OUTPUT
• We can OUTPUT whenever we want: before or after COMMIT.

Recovery Using Undo/Redo Logging
Recovery Manager
• Redo all committed transactions, top-down (earliest log record first).
• Undo all uncommitted transactions, bottom-up (latest log record first).
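The two passes can be sketched as follows. As before, the tuple encoding `("UPDATE", T, X, old, new)` and the name `undo_redo_recover` are assumptions made for illustration, and the disk is a plain dictionary.

```python
def undo_redo_recover(log, disk):
    """Undo/redo recovery: redo committed work top-down, undo the rest bottom-up."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Pass 1: redo committed transactions, earliest record first.
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, element, _old, new = rec
            disk[element] = new
    # Pass 2: undo uncommitted transactions, latest record first.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, element, old, _new = rec
            disk[element] = old
    return disk


# T1 committed but A never reached the disk; T2 did not commit but B did.
log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 8, 16),
    ("START", "T2"), ("UPDATE", "T2", "B", 5, 10),
    ("COMMIT", "T1"),
]
print(undo_redo_recover(log, {"A": 8, "B": 10}))
```

The example uses two hypothetical transactions T1 and T2 to show both passes at work; with the single transaction T from the slides, only one of the two passes does anything.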

Recovery Using Undo/Redo Logging
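The trace of T is the undo/redo-logging trace: <START T>, the updates <T, A, 8, 16> and <T, B, 8, 16>, FLUSH LOG, OUTPUT(A), <COMMIT T>, OUTPUT(B). The slide marks three crash points (CRASH):
• Crash after the FLUSH LOG, or after OUTPUT(A): <COMMIT T> is not in the log, so T is uncommitted and its changes are undone (A = 8, B = 8).
• Crash after <COMMIT T>: if the COMMIT record has reached the disk, T is committed and its changes are redone (A = 16, B = 16); if it has not, T is treated as uncommitted and undone.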

Concurrency Control
Motivation
• Multiple transactions T1, T2, … are running concurrently.
• They read/write some common elements A1, A2, …
• The DB state can become inconsistent even if T1, T2, … individually preserve the correctness of the state and there is no system failure.
Who is responsible?
• Concurrency control: the process of assuring that transactions preserve consistency when executing simultaneously.
• It is the job of the scheduler of the DBMS.
What could go wrong if we don't have it?
• Dirty reads (including inconsistent reads)
• Unrepeatable reads
• Lost updates

Dirty Read & Inconsistent Read
Dirty Read
T1: WRITE(A); ABORT
T2: READ(A)
Inconsistent Read
T1: A := 20; B := 20; WRITE(A); WRITE(B)
T2: READ(A); READ(B)

Unrepeatable Read & Lost Update
Unrepeatable Read
T1: WRITE(A)
T2: READ(A)
Lost Update
T1: READ(A); A := A+5; WRITE(A)
T2: READ(A); A := A*1.3; WRITE(A)

Schedule
What is it?
• Given multiple transactions, a schedule is a sequence of interleaved actions from all transactions.
T1: READ(A, t); t := t+100; WRITE(A, t); READ(B, t); t := t+100; WRITE(B, t)
T2: READ(A, s); s := s*2; WRITE(A, s); READ(B, s); s := s*2; WRITE(B, s)
DB consistent state: A = B

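Serial Schedule
What is it?
• A schedule is serial if its actions consist of all the actions of one transaction, then all the actions of another transaction, and so on.
T1 | T2 | A | B
 | | 25 | 25
READ(A, t); t := t+100; WRITE(A, t) | | 125 |
READ(B, t); t := t+100; WRITE(B, t) | | | 125
 | READ(A, s); s := s*2; WRITE(A, s) | 250 |
 | READ(B, s); s := s*2; WRITE(B, s) | | 250
• The final state is not independent of the order of the transactions: running T2 before T1 from the same initial state would give A = B = 150 instead of 250.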

Serializable Schedule
Motivation
• Every serial schedule will preserve DB consistency.
• Are there any other schedules that are also guaranteed to preserve consistency?
Serializable Schedule
• A schedule is serializable if it is equivalent to a serial schedule.
T1 | T2 | A | B
 | | 25 | 25
READ(A, t); t := t+100; WRITE(A, t) | | 125 |
 | READ(A, s); s := s*2; WRITE(A, s) | 250 |
READ(B, t); t := t+100; WRITE(B, t) | | | 125
 | READ(B, s); s := s*2; WRITE(B, s) | | 250
NOT SERIAL, BUT THE EFFECT IS THE SAME

A Non-Serializable Schedule
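T1 | T2 | A | B
 | | 25 | 25
READ(A, t); t := t+100; WRITE(A, t) | | 125 |
 | READ(A, s); s := s*2; WRITE(A, s) | 250 |
 | READ(B, s); s := s*2; WRITE(B, s) | | 50
READ(B, t); t := t+100; WRITE(B, t) | | | 150
The final state has A = 250 and B = 150, so A = B no longer holds. This is the sort of behavior that a concurrency control mechanism must avoid!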

Ignore Transaction Semantics
Avoid unrealism
• Sometimes transactions' actions can commute accidentally because of the specific updates involved.
• Serializability is undecidable!
• The scheduler should not look at transaction details.
Assumption
• Assume worst-case updates.
• Only care about reads r(A) and writes w(A), not the actual values involved.

Notations
T1: READ(A, t); t := t+100; WRITE(A, t); READ(B, t); t := t+100; WRITE(B, t)
is written as T1: r1(A); w1(A); r1(B); w1(B)
T2: READ(A, s); s := s*2; WRITE(A, s); READ(B, s); s := s*2; WRITE(B, s)
is written as T2: r2(A); w2(A); r2(B); w2(B)

Conflict-Serializability
Commercial systems
• Schedulers in commercial systems enforce a condition called conflict-serializability.
• It is stronger than the general notion of serializability.
• It is based on the idea of a conflict.
Conflict
• A pair of consecutive actions in a schedule such that, if their order is interchanged, then the behavior of at least one of the transactions involved can change.

Conflicts
No Conflicts (Ti and Tj are different transactions)
• ri(X); rj(Y)
• ri(X); wj(Y) for X != Y
• wi(X); rj(Y) for X != Y
• wi(X); wj(Y) for X != Y
Conflicts
• Two actions by the same transaction: ri(X); wi(Y)
• Two writes by Ti, Tj to the same element: wi(X); wj(X)
• Read/write by Ti, Tj to the same element: wi(X); rj(X) and ri(X); wj(X)

Conflict-Serializability
Conflict-Serializable
• A schedule is conflict-serializable if it can be transformed into a serial schedule by a series of swaps of adjacent non-conflicting actions.
r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B);
r1(A); w1(A); r2(A); r1(B); w2(A); w1(B); r2(B); w2(B);
r1(A); w1(A); r1(B); r2(A); w2(A); w1(B); r2(B); w2(B);
r1(A); w1(A); r1(B); r2(A); w1(B); w2(A); r2(B); w2(B);
r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B);

The Precedence Graph Test
How to Test?
• Is a schedule conflict-serializable?
Precedence Condition
Given a schedule S, T1 takes precedence over T2 (T1 <S T2) if there are actions A1 of T1 and A2 of T2 such that:
• A1 is ahead of A2 in S;
• both A1 and A2 involve the same database element; and
• at least one of A1 and A2 is a write action.
These are exactly the conditions under which we cannot swap A1 and A2.
Precedence Graph
• Summarize the precedences using a precedence graph.

The Precedence Graph Test
• Nodes are transactions.
• There is an edge from node Ti to Tj if Ti <S Tj.
• The test: if the graph has no cycles, then the schedule is conflict-serializable.
S: r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B);
Precedence graph: 1 => 2 => 3. T1 <S T2 because w1(B) is ahead of r2(B); T2 <S T3 because w2(A) is ahead of r3(A).
Conflict-serializable

Precedence Graph
S: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B);
Precedence graph: 2 => 1, 1 => 2, 2 => 3. T2 <S T1 because r2(B) is ahead of w1(B); T1 <S T2 because w1(B) is ahead of w2(B); T2 <S T3 because w2(A) is ahead of r3(A).
The edges between 1 and 2 form a cycle.
Not conflict-serializable
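The precedence-graph test can be sketched in a few lines. The parsing of the `r1(A)` / `w2(B)` notation and the function names are assumptions made for this illustration; the test itself follows the three precedence conditions and the no-cycle rule stated on the slides.

```python
import re
from itertools import combinations


def parse(schedule):
    """Turn 'r2(A); w1(B); ...' into a list of (kind, transaction, element) triples."""
    return [(kind, int(txn), elem)
            for kind, txn, elem in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]


def precedence_edges(schedule):
    """Edges (Ti, Tj) such that Ti <S Tj."""
    edges = set()
    # combinations() keeps schedule order, so the first action is ahead of the second.
    for (k1, t1, x1), (k2, t2, x2) in combinations(parse(schedule), 2):
        if t1 != t2 and x1 == x2 and "w" in (k1, k2):
            edges.add((t1, t2))
    return edges


def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> 1 while on the DFS stack, 2 when fully explored

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1 or (nxt not in state and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_edges(schedule))


print(is_conflict_serializable("r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B);"))
print(is_conflict_serializable("r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B);"))
```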

Conflict Serializability
Conflict-serializability is not necessary for serializability
• A serializable schedule need not be conflict-serializable, even under the "worst case update" assumption.
S1: w1(Y); w1(X); w2(Y); w2(X); w3(X);
S2: w1(Y); w2(Y); w2(X); w1(X); w3(X);
• S1 and S2 are equivalent, but S2 cannot be turned into S1 by swaps.
• X has the same value after either S1 or S2.
• Y has the same value after either S1 or S2.
• S1 is serial => S2 is serializable.

Locks
How does the scheduler ensure serializability?
• Locks
• Time stamps
• Validation
The idea of locking
• Each element has a unique lock.
• Each transaction must first acquire the lock before reading/writing that element.
• If the lock is taken by another transaction, then wait.
• The transaction must release the lock(s).

Characteristics of Locks
Consistency of Transactions
• A transaction can only read or write an element if it was previously granted a lock on that element and hasn't released the lock yet.
• If a transaction locks an element, it must later unlock that element.
Legality of Schedules
• No two transactions may have locked the same element without one having first released the lock.
Notation
• li(A) = transaction Ti acquires the lock for element A
• ui(A) = transaction Ti releases the lock for element A

Example
T1 | T2 | A | B
 | | 25 | 25
l1(A); r1(A); A := A+100; w1(A); u1(A); | | 125 |
 | l2(A); r2(A); A := A*2; w2(A); u2(A); | 250 |
 | l2(B); r2(B); B := B*2; w2(B); u2(B); | | 50
l1(B); r1(B); B := B+100; w1(B); u1(B); | | | 150
This is a legal schedule, as T1 and T2 never hold a lock on A or B at the same time. But it is not serializable!

Maintaining Conflict-Serializability
T1 | T2 | A | B
 | | 25 | 25
l1(A); r1(A); A := A+100; w1(A); l1(B); u1(A); | | 125 |
 | l2(A); r2(A); A := A*2; w2(A); | 250 |
 | l2(B) DENIED | |
r1(B); B := B+100; w1(B); u1(B); | | | 125
 | l2(B); u2(A); r2(B); B := B*2; w2(B); u2(B); | | 250
Forcing T2 to wait results in a consistent DB.
How can we guarantee that a legal schedule of consistent transactions is conflict-serializable?

Two-Phase Locking (2PL)
2PL Rule
• In every transaction, all lock actions precede all unlock actions.
Two phases
• First phase: locks are obtained.
• Second phase: locks are relinquished.
A transaction that obeys the 2PL condition is called a two-phase-locked transaction (2PL transaction).
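Checking the 2PL rule for a single transaction amounts to one pass over its actions. The sketch below assumes actions are given as strings in the `l1(A)` / `u1(A)` notation of these slides; `is_two_phase` is a name chosen for illustration.

```python
def is_two_phase(actions):
    """True if no lock action of the transaction follows one of its unlock actions."""
    unlocked = False
    for action in actions:
        if action.startswith("u"):
            unlocked = True              # the second (shrinking) phase has begun
        elif action.startswith("l") and unlocked:
            return False                 # a lock after an unlock violates 2PL
    return True


# T1 unlocks A before locking B: not two-phase.
print(is_two_phase(["l1(A)", "r1(A)", "w1(A)", "u1(A)", "l1(B)", "r1(B)", "w1(B)", "u1(B)"]))
# T1 locks B before unlocking A: two-phase.
print(is_two_phase(["l1(A)", "r1(A)", "w1(A)", "l1(B)", "u1(A)", "r1(B)", "w1(B)", "u1(B)"]))
```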

Example
T1 | T2 | A | B
 | | 25 | 25
l1(A); r1(A); A := A+100; w1(A); u1(A); | | 125 |
 | l2(A); r2(A); A := A*2; w2(A); u2(A); | 250 |
 | l2(B); r2(B); B := B*2; w2(B); u2(B); | | 50
l1(B); r1(B); B := B+100; w1(B); u1(B); | | | 150
Does not obey the 2PL condition: each transaction unlocks A before it locks B.

2PL Transactions
T1 | T2 | A | B
 | | 25 | 25
l1(A); r1(A); A := A+100; w1(A); l1(B); u1(A); | | 125 |
 | l2(A); r2(A); A := A*2; w2(A); | 250 |
 | l2(B) DENIED | |
r1(B); B := B+100; w1(B); u1(B); | | | 125
 | l2(B); u2(A); r2(B); B := B*2; w2(B); u2(B); | | 250

A Risk of Deadlock
T1 | T2 | A | B
 | | 25 | 25
l1(A); r1(A); | | |
 | l2(B); r2(B); | |
A := A+100; | | |
 | B := B*2; | |
w1(A); | | 125 |
 | w2(B); | | 50
l1(B) DENIED | | |
 | l2(A) DENIED | |
It is not possible to allow both T1 and T2 to proceed, since if we do so the final DB state cannot possibly have A = B.

Lock Modes
Background
• A single kind of lock is too simple to be a practical scheme!
• T must take a lock on a database element X even if it only wants to read X and not write it.
• There is no reason why several transactions can't read X at the same time!
Types of Locks
• Shared lock (for READ)
• Exclusive lock (for WRITE)
• Update lock: initially like a shared lock; later it may be upgraded to an exclusive lock.

Shared and Exclusive Locks
Locking scheduler principle
• For any DB element X there can be either one exclusive lock on X, or no exclusive locks but any number of shared locks.
• If we want to write X, we need to have an exclusive lock on X.
• If we want to read X, we may have a shared or an exclusive lock.
Notation
• sli(A) = transaction Ti requests a shared lock on element A
• xli(A) = transaction Ti requests an exclusive lock on element A

Consistency of Transactions
• In any transaction Ti, ri(X) must be preceded by sli(X) or xli(X), with no intervening ui(X).
• In any transaction Ti, wi(X) must be preceded by xli(X), with no intervening ui(X).
2PL of Transactions
• In any 2PL transaction Ti, no action sli(X) or xli(X) can be preceded by an action ui(Y), for any Y.

Legality of Schedules
• An element may either be locked exclusively by one transaction or by several in shared mode, but not both.
• If xli(X) appears in a schedule, then there cannot be a following xlj(X) or slj(X) for some j != i, without an intervening ui(X).
• If sli(X) appears in a schedule, then there cannot be a following xlj(X) for j != i, without an intervening ui(X).

Example
T1 | T2
sl1(A); r1(A); |
 | sl2(A); r2(A);
 | sl2(B); r2(B);
xl1(B) DENIED |
 | u2(A); u2(B);
xl1(B); r1(B); w1(B); |
u1(A); u1(B); |
Legality of schedule
• If sli(X) appears in a schedule, then there cannot be a following xlj(X) for j != i, without an intervening ui(X). Here T2 holds a shared lock on B, so xl1(B) is denied until u2(B).

Compatibility Matrix
What is it?
• Describes lock management policies involving several lock modes.
Lock held in mode | Lock requested: S | Lock requested: X
S | Yes | No
X | No | No

Upgrading Locks
Idea
• Suppose T wants to read X and later write a new value of X.
• T first takes a shared lock on X.
• Later, when T is ready to write the new value, it upgrades the lock to exclusive.
With upgrading
T1 | T2
sl1(A); r1(A); |
 | sl2(A); r2(A);
 | sl2(B); r2(B);
sl1(B); r1(B); |
xl1(B) DENIED |
 | u2(A); u2(B);
xl1(B); w1(B); |
u1(A); u1(B); |
Without upgrading
T1 | T2
sl1(A); r1(A); |
 | sl2(A); r2(A);
 | sl2(B); r2(B);
xl1(B) DENIED |
 | u2(A); u2(B);
xl1(B); r1(B); w1(B); |
u1(A); u1(B); |

Update Locks
Upgrading can deadlock
T1 | T2
sl1(A) |
 | sl2(A)
xl1(A) DENIED |
 | xl2(A) DENIED
DEADLOCK
Update Lock
• An update lock uli(X) gives Ti only the privilege to read X, not to write X.
• Only an update lock can be upgraded to a write lock later; a read lock cannot be upgraded.
• An update lock can be granted on X even if shared locks on X are held.
• Once an update lock on X has been granted, no other locks of any mode can be taken on X.

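Update Locks
Lock held in mode | Lock requested: S | Lock requested: X | Lock requested: U
S | Yes | No | Yes
X | No | No | No
U | No | No | No
With update locks
T1 | T2
ul1(A); r1(A); |
 | ul2(A) DENIED
xl1(A); w1(A); u1(A); |
 | ul2(A); r2(A);
 | xl2(A); w2(A); u2(A);
With shared locks only (deadlock)
T1 | T2
sl1(A) |
 | sl2(A)
xl1(A) DENIED |
 | xl2(A) DENIED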
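A compatibility matrix translates directly into a lookup table. The sketch below encodes the S/X/U rules stated on the update-lock slides; `COMPATIBLE` and `can_grant` are names chosen for illustration, and the check only considers locks held by other transactions (it does not model a transaction upgrading its own lock).

```python
# COMPATIBLE[held][requested] is True if the request can be granted
# while another transaction holds a lock in mode `held`.
COMPATIBLE = {
    "S": {"S": True,  "X": False, "U": True},
    "X": {"S": False, "X": False, "U": False},
    "U": {"S": False, "X": False, "U": False},
}


def can_grant(held_modes, requested):
    """Grant only if the request is compatible with every lock held by others."""
    return all(COMPATIBLE[held][requested] for held in held_modes)


print(can_grant(["S", "S"], "U"))  # an update lock may join existing shared locks
print(can_grant(["U"], "S"))       # nothing may join an update lock
print(can_grant([], "X"))          # an unlocked element can be locked in any mode
```

Note the asymmetry between S and U: U is granted when S is held, but S is not granted when U is held. This is what prevents the two-shared-locks deadlock shown above.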

Architecture of Locking Scheduler
Question
• How does a scheduler use these locking schemes?
Assumptions
• Transactions themselves do not request locks. It is the job of the scheduler to insert lock actions.
• Transactions do not release locks. The scheduler releases them when the transaction manager tells it that the transaction has committed or aborted.
Architecture
• From transactions: READ(A); WRITE(B); COMMIT(T); …
• Scheduler (Part 1): inserts appropriate lock actions ahead of all DB access operations, producing LOCK(A); READ(A); … If T aborts or commits, Part 1 is notified by the transaction manager, releases all locks held by T, and notifies Part 2 about waiting transactions.
• Lock Table: used by both parts of the scheduler.
• Scheduler (Part 2): determines whether T is delayed because a lock has not been granted, and maintains a waiting list. When a lock on X is released, it determines the next transaction or transactions that can now be given a lock on X. It passes on the actions READ(A); WRITE(B); …

Lock Table
What is it?
• A relation that associates database elements with locking information about that element.
• An element that is not locked doesn't appear in the table.
• Its size is proportional to the number of locked elements.
Purpose
• When a lock is requested, check the lock table: grant the lock, or add the transaction to the element's wait list.
• When a lock is released, re-activate a transaction from its wait list.
• When a transaction aborts, release all its locks.
• Check for deadlocks occasionally.

Lock Table (entry for element A)
• Group mode: U. A summary of the most stringent conditions that a transaction requesting a new lock on A faces.
• Waiting: Yes. There is at least one transaction waiting for a lock on A.
• List: all those transactions that either currently hold locks on A or are waiting for a lock on A.
Tran | Mode | Wait? | Tnext | Next
T1 | S | no | |
T2 | U | no | |
T3 | X | yes | |
• Tnext links all entries for a particular transaction, to support efficient commits or aborts.

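Lock Requests
[Figure: the lock-table entry for A, annotated with incoming requests for X and S locks.]
A: Group mode: U, Waiting: Yes
Tran | Mode | Wait? | Tnext | Next
T1 | S | no | |
T2 | U | no | |
T3 | X | yes | |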

Handling Unlocks
A: Group mode: U, Waiting: Yes
Tran | Mode | Wait?
T1 | S | no
T2 | U | no
T3 | X | yes
When T unlocks A
• T's entry on the list for A is deleted.
• If the lock held by T is not the same as the group mode => no change to the group mode.
• Otherwise, examine the entire list to update the group mode.
If Waiting = 'yes', grant one or more locks from the list of requested locks:
• First-come-first-served: grant the lock request that has been waiting the longest.
• Priority to shared locks: first grant all S locks waiting, then grant U. Only grant X if no others are waiting.
• Priority to upgrading: if there is a transaction with a U lock waiting to upgrade it to an X lock, grant that first.

Timestamp Basics
• Each transaction receives a unique timestamp TS(T).
• Timestamps are issued in ascending order.
What is a timestamp?
• The system's clock, or
• a unique counter, incremented by the scheduler.
Main Invariant
• The timestamp order defines the serialization order of the transactions.
• For any two conflicting actions, ensure that their order is the serialized order.

Basics
Requirement
• Associate with each database element X two timestamps and an additional bit.
RT(X)
• Read time of X: the highest timestamp of a transaction that has read X.
WT(X)
• Write time of X: the highest timestamp of a transaction that has written X.
C(X)
• Commit bit of X: true if and only if the most recent transaction to write X has already committed. It is used to prevent dirty reads.

Physically Unrealizable Behaviors
Scheduler's Job
• Assumes that the timestamp order of transactions is also the serial order in which they must appear to execute.
• Assigns TS and updates RT, WT, and C for the DB elements.
• Checks, whenever a read/write occurs, that what happens in real time could have happened if each transaction had executed instantaneously at the moment of its timestamp.
• If not, the behavior is physically unrealizable.
Problems
• Read too late
• Write too late

Read & Write Too Late
Read too late
• T tries to read DB element X but TS(T) < WT(X).
• Timeline: T start, U start, U writes X, T reads X.
• Solution: rollback T.
Write too late
• T wants to write DB element X but TS(T) < RT(X).
• Timeline: T start, U start, U reads X, T writes X.
• Solution: rollback T.
Write too late (a later write already exists)
• TS(T) >= RT(X) but WT(X) > TS(T).
• Solution: don't write X at all!

Dirty Read Problem
Read dirty data
• T tries to read DB element X and WT(X) < TS(T), but the writer U has not committed.
• Timeline: U start, T start, U writes X, T reads X, U aborts.
• Solution: if C(X) is false, then T has to wait for it to become true.
Write dirty data
• T wants to write DB element X but WT(X) > TS(T).
• Timeline: T start, U start, U writes X, T writes X, T commits, U aborts.
• Solution: if C(X) is false, then T has to wait for it to become true.

Rules for TS-Based Scheduling
The Scheduler
Can respond to a read/write request from T in the following ways:
• Grant the request.
• Abort T and restart T with a new timestamp (rollback).
• Delay T and later decide whether to abort T or to grant the request.
Scheduler receives a request rT(X)
• TS(T) >= WT(X): the read is physically realizable.
  • C(X) is true: grant the request. If TS(T) > RT(X), set RT(X) := TS(T); otherwise don't change RT(X).
  • C(X) is false: delay T until C(X) becomes true, or the transaction that wrote X aborts.
• TS(T) < WT(X): the read is physically unrealizable. Rollback T.

Scheduler receives a request wT(X)
• TS(T) >= RT(X) and TS(T) >= WT(X): the write is physically realizable and must be performed.
  • Write the new value for X.
  • Set WT(X) := TS(T).
  • Set C(X) := false.
• TS(T) >= RT(X) but TS(T) < WT(X): the write is physically realizable, but there is already a later value in X.
  • C(X) is true: the previous writer of X is committed, and we simply ignore the write by T. We allow T to proceed and make no change to the DB.
  • C(X) is false: delay T until C(X) becomes true, or the transaction that wrote X aborts.
• TS(T) < RT(X): the write is physically unrealizable. T must be rolled back.

Scheduler receives a request to commit T
• Find all the DB elements X written by T.
• Set C(X) := true.
• If any transactions are waiting for X to be committed, these transactions are allowed to proceed.
Scheduler receives a request to abort T or decides to rollback T
• Any transaction that was waiting on an element X that T wrote must repeat its attempt to read or write, and see whether the action is now legal after T's writes are cancelled.
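The read, write, and commit rules can be collected into a small sketch. The `Element` class and the function names are assumptions made for illustration; each function returns the scheduler's decision as a string rather than actually delaying or restarting anything, and restoring old values on abort is left out.

```python
from dataclasses import dataclass


@dataclass
class Element:
    rt: int = 0             # RT(X): highest timestamp that has read X
    wt: int = 0             # WT(X): highest timestamp that has written X
    committed: bool = True  # C(X): has the most recent writer committed?


def on_read(ts, x):
    if ts < x.wt:
        return "rollback"        # read too late
    if not x.committed:
        return "delay"           # would be a dirty read
    x.rt = max(x.rt, ts)
    return "grant"


def on_write(ts, x):
    if ts < x.rt:
        return "rollback"        # write too late
    if ts < x.wt:
        # A later value is already in X: ignore the write once that value is committed.
        return "ignore" if x.committed else "delay"
    x.wt = ts
    x.committed = False
    return "grant"


def on_commit(written_elements):
    for x in written_elements:
        x.committed = True


A, C = Element(), Element()
print(on_read(150, A), on_read(175, C))  # both granted
print(on_write(200, A))                  # granted, WT(A) becomes 200
print(on_write(150, C))                  # 150 < RT(C) = 175, so rollback
```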

Example
Timestamps: TS(T1) = 200, TS(T2) = 150, TS(T3) = 175. Initially RT = 0 and WT = 0 for A, B, and C.
Action | Effect
r1(B) | TS(T1) >= WT(B): grant the request; RT(B) = 200
r2(A) | TS(T2) >= WT(A): grant the request; RT(A) = 150
r3(C) | TS(T3) >= WT(C): grant the request; RT(C) = 175
w1(B) | TS(T1) >= RT(B) and TS(T1) >= WT(B): the write is physically realizable and must be performed; WT(B) = 200, C(B) := false
w1(A) | TS(T1) >= RT(A) and TS(T1) >= WT(A): the write is performed; WT(A) = 200, C(A) := false
w2(C) | TS(T2) < RT(C): the write is physically unrealizable, T2 must be rolled back (Abort)
w3(A) | TS(T3) >= RT(A) but TS(T3) < WT(A): the write is physically realizable, but there is already a later value in A. When C(A) is true, the previous writer is committed and we simply ignore the write by T3; T3 proceeds and no change is made to the DB

Multiversion Timestamping
The idea
• In the single-version scheme, when transaction T requests r(X) but WT(X) > TS(T), the read is physically unrealizable and T must roll back.
• Instead, keep multiple versions of X: Xt, Xt-1, Xt-2, … where TS(Xt) > TS(Xt-1) > TS(Xt-2) > …
• Let T read an older version, with the appropriate timestamp.

Details
When wT(X) occurs
• Create a new version Xt, where t = TS(T).
When rT(X) occurs
• Find the most recent version Xt such that t <= TS(T).
Notes
• WT(Xt) = t, and it never changes.
• RT(Xt) must still be maintained to check the legality of writes.
• Xt can be deleted if we have a later version Xt1 and all active transactions T have TS(T) > t1.
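Finding the version a read should see is a search over the version timestamps. The sketch below assumes the versions of one element are kept as a list of `(write_timestamp, value)` pairs sorted by timestamp; `read_version` and the sample values are hypothetical.

```python
import bisect


def read_version(versions, ts):
    """Return the most recent version (t, value) with t <= ts.

    `versions` must be sorted by write timestamp.
    """
    write_times = [t for t, _ in versions]
    i = bisect.bisect_right(write_times, ts)
    if i == 0:
        raise LookupError("no version old enough for this reader")
    return versions[i - 1]


# Three versions of A: the initial one, and those written at timestamps 150 and 200.
versions_of_a = [(0, "A0"), (150, "A150"), (200, "A200")]
print(read_version(versions_of_a, 175))  # a reader with TS 175 sees the version from 150
print(read_version(versions_of_a, 225))  # a reader with TS 225 sees the version from 200
```

With these versions, a reader with timestamp 175 does not have to abort even though a transaction with timestamp 200 has already written A.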

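Example
Timestamps: TS(T1) = 150, TS(T2) = 200, TS(T3) = 175, TS(T4) = 225. Initially RT = 0 and WT = 0 for A.
Schedule as extracted: r1(A) (RT=150); w1(A); … ; Abort; r4(A) (RT=225)
T3 could be allowed to read, even though the value it reads is not the "current" value of A.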

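Example
Timestamps: TS(T1) = 150, TS(T2) = 200, TS(T3) = 175, TS(T4) = 225. Initially RT = 0 and WT = 0 for A.
Schedule as extracted: r1(A) (RT=150); w1(A); … ; r4(A) (RT=225)
With multiversion timestamping there are three versions of A. T3 doesn't have to abort, because it can read an earlier version of A.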

Timestamps vs Locking
 | Many conflicts | Few conflicts
Locks | Great | Poor
TS | Poor | Great
Read vs Read/Write
• Timestamping is superior in situations where most transactions are read-only.
• Locking is better for a read/write environment.
Commercial Systems
• The scheduler divides the transactions into read-only transactions and read/write transactions.
• Read/write transactions => use 2PL.
• Read-only transactions => use multiversion timestamping.

Deadlocks
Deadlock
• Each transaction in a set is waiting for a resource (e.g., a lock) currently held by another transaction in the set.
• None can make progress.
Detection and Prevention Techniques
• Timeout
• Waits-for graph
• Ordering elements
• Timestamp-based

Deadlock Detection by Timeout
The idea
• At least one of the transactions needs to be aborted and restarted (rolled back), so that it releases its locks or other resources.
• Put a limit on how long a transaction may be active; if a transaction exceeds this time, roll it back.

Deadlock Detection by Waits-For Graph
Structure
• A node for each transaction that currently holds any lock or is waiting for one.
• An edge from node T to U if there is some DB element A such that: U holds a lock on A, T is waiting for a lock on A, and T cannot get its desired lock on A unless U first releases it.
Detection & Prevention
• Detection of deadlock: occurrence of a cycle.
• Prevention: refuse to allow an action that creates a cycle. Roll back any transaction that may cause a cycle.
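Refusing an action that would create a cycle only requires a reachability check: a new edge from the waiter to the holder closes a cycle exactly when the holder can already reach the waiter. In the sketch below the waits-for graph is a dictionary from a transaction to the set of transactions it waits for; the function names are assumptions made for illustration.

```python
def reaches(waits_for, start, goal):
    """True if `goal` is reachable from `start` along waits-for edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(waits_for.get(node, ()))
    return False


def creates_cycle(waits_for, waiter, holder):
    """Would adding the edge waiter -> holder close a cycle?"""
    return reaches(waits_for, holder, waiter)


# T2 waits for T1, T3 waits for T2, T4 waits for T1.
waits_for = {"T2": {"T1"}, "T3": {"T2"}, "T4": {"T1"}}
print(creates_cycle(waits_for, "T1", "T3"))  # T1 waiting for T3 would close T1 -> T3 -> T2 -> T1
print(creates_cycle(waits_for, "T3", "T4"))  # T3 waiting for T4 is harmless
```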

Example
Transactions
T1: l1(A); r1(A); l1(B); w1(B); u1(A); u1(B);
T2: l2(C); r2(C); l2(A); w2(A); u2(C); u2(A);
T3: l3(B); r3(B); l3(C); w3(C); u3(B); u3(C);
T4: l4(D); r4(D); l4(A); w4(A); u4(D); u4(A);
Schedule
T1 | T2 | T3 | T4
l1(A); r1(A); | | |
 | l2(C); r2(C); | |
 | | l3(B); r3(B); |
 | | | l4(D); r4(D);
 | l2(A); Denied | |
 | | l3(C); Denied |
 | | | l4(A); Denied
l1(B); Denied | | |
Waits-for graph: T2 => T1, T3 => T2, T4 => T1, T1 => T3. The edges among T1, T3, and T2 form a cycle.
Roll back any transaction that causes a cycle.

Deadlock Detection by Timestamps
Limitations of Waits-for Graph
• It can be large.
• Analyzing it for cycles each time a transaction has to wait for a lock can be time-consuming.
Solution
• Associate with each T a timestamp that is used only for deadlock detection.
• It is not the same as the timestamp used in concurrency control.
• If T is rolled back, then it restarts with a new, later concurrency-control timestamp, but its timestamp for deadlock detection never changes.
Policies
• Wait-Die Scheme
• Wound-Wait Scheme

Wait-Die Scheme
A transaction T is waiting for a lock that is held by transaction U.
• If T is older than U (TS(T) < TS(U)), then T is allowed to wait.
• If U is older than T, then T "dies": it is rolled back.
Example, with TS(T1) < TS(T2) < TS(T3) < TS(T4)
T1 | T2 | T3 | T4
l1(A); r1(A); | | |
 | l2(A); Dies | |
 | | l3(B); r3(B); |
 | | | l4(A); Dies
 | | l3(C); w3(C); |
 | | u3(B); u3(C); |
l1(B); w1(B); | | |
u1(A); u1(B); | | |
 | | | l4(A); l4(D);
 | l2(A); Waits | |
 | | | r4(D); w4(A);
 | | | u4(A); u4(D);
 | l2(A); l2(C); | |
 | r2(C); w2(A); | |
 | u2(A); u2(C); | |
T2 and T4 start again. T2 is still older than T4. Assume T4 restarts first.

Wound-Wait Scheme
A transaction T is waiting for a lock that is held by transaction U.
• If T is older than U (TS(T) < TS(U)), then it "wounds" U. The wound is usually fatal: U must roll back and relinquish to T the lock(s) that T needs from U.
• If, by the time the wound takes effect, U has already finished and released its locks => no roll back.
• If U is older than T, then T waits for the lock(s) held by U.
Example
T1 | T2 | T3 | T4
l1(A); r1(A); | | |
 | l2(A); Waits | |
 | | l3(B); r3(B); |
 | | | l4(A); Waits
l1(B); w1(B); | | Wounded |
u1(A); u1(B); | | |
 | l2(A); l2(C); | |
 | r2(C); w2(A); | |
 | u2(A); u2(C); | |
 | | | l4(A); l4(D);
 | | | r4(D); w4(A);
 | | | u4(A); u4(D);
 | | l3(C); w3(C); |
 | | u3(B); u3(C); |
T3 relinquishes its lock and rolls back. T3 then restarts.
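Both timestamp-based schemes reduce to a single comparison made when a lock request is denied. The sketch below is an illustration only: the function names and the returned strings are assumptions, and a smaller timestamp means an older transaction, as on the slides.

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester waits; a younger requester dies (is rolled back)."""
    return "wait" if ts_requester < ts_holder else "die"


def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester wounds the holder; a younger requester waits."""
    return "wound" if ts_requester < ts_holder else "wait"


# Deadlock timestamps with TS(T1) < TS(T2) < TS(T3) < TS(T4).
TS = {"T1": 1, "T2": 2, "T3": 3, "T4": 4}

# T2 requests A, which T1 holds.
print(wait_die(TS["T2"], TS["T1"]), wound_wait(TS["T2"], TS["T1"]))
# T1 requests B, which T3 holds.
print(wait_die(TS["T1"], TS["T3"]), wound_wait(TS["T1"], TS["T3"]))
```

In both schemes the transaction that gets rolled back is always the younger of the two, which is why neither scheme can starve a transaction indefinitely: its deadlock timestamp never changes, so it eventually becomes the oldest.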

Comparison
Commonality
• In both schemes, older transactions kill off newer transactions.
• No starvation: every transaction is guaranteed to complete eventually.
Differences
• Wound-wait: rollback is rare, but rolled-back transactions have done more work by acquiring locks.
• Wait-die: rolls back more transactions, but rolled-back transactions tend to have done little work, as they are still in the lock-gathering stage.
Waits-for vs TS
• The waits-for graph minimizes the number of times we must abort a transaction due to deadlock.
• TS-based schemes may roll back a transaction even when there is no deadlock.
