CSIS 7102 Spring 2004 Lecture 5 : Non-locking based concurrency control (and some more lock-based ones, too) Dr. King-Ip Lin.


Table of contents
- Limitations of locking techniques
- Timestamp ordering
- View serializability
- Optimistic concurrency control
- Graph-based locking
- Multi-version schemes

The story so far
- Two-phase locking (2PL) as a protocol to ensure conflict serializability
  - Once a transaction starts releasing locks, it cannot obtain new locks
  - This ensures that the conflicts between two transactions cannot go in both directions
- Deadlock handling in 2PL
- The phantom problem
- Multi-granularity locking
  - Intention locks
  - Improving concurrency while maintaining correctness
- Levels of isolation
  - Not every transaction needs 2PL to be correct
  - The isolation level under which a transaction runs can be specified
  - Enables even higher concurrency

Limitation of lock-based techniques
- Lock-based techniques ensure correctness
- However, they tend to be a bit "pessimistic"
- Some schedules that are serializable are not allowed under the locking protocol

Limitation of lock-based techniques
Example:
T1: A1 <- Read(X); A1 <- A1 - k; Write(X, A1); A2 <- Read(Y); A2 <- A2 + k; Write(Y, A2)
T2: A1 <- Read(X); A1 <- A1 * 1.01; Write(X, A1); A2 <- Read(Y); A2 <- A2 * 1.01; Write(Y, A2)
Is this schedule serializable?

Limitation of lock-based techniques
However, 2PL does not allow it:
T1: A1 <- Read(X); A1 <- A1 - k; Write(X, A1); A2 <- Read(Y); A2 <- A2 + k; Write(Y, A2)
T2: A1 <- Read(X) is blocked (T1 already holds the X-lock), so T2 cannot proceed with A1 <- A1 * 1.01; Write(X, A1); A2 <- Read(Y); A2 <- A2 * 1.01; Write(Y, A2)

Limitation of lock-based techniques
Why does 2PL block this operation?
- There is a conflict between T1 and T2
- If T2 were allowed to go on, there is a danger that T2 finishes before T1 resumes, which leads to a non-serializable schedule
- Thus, 2PL decides to "play safe"

Limitation of lock-based techniques
But is 2PL playing TOO safe?
T1: A1 <- Read(X); A1 <- A1 - k; Write(X, A1); A2 <- Read(Y); A2 <- A2 + k; Write(Y, A2)
T2: A1 <- Read(X); A1 <- A1 * 1.01; Write(X, A1); A2 <- Read(Y); A2 <- A2 * 1.01; Write(Y, A2)
- If T2's operations on X are allowed to run, the schedule may still be serializable
- Only if T2's operations on Y are also allowed to run before T1 resumes does the schedule become unserializable

Limitation of lock-based techniques
- In some cases, 2PL is playing too safe
- Can we allow more concurrency? (e.g. allow some conflicting operations to go ahead until we can determine that the schedule is not serializable)
- One method: dynamically keep track of the serializability graph
  - Check before each operation whether a cycle would appear
  - Not practical
- A more practical approach: predefine the allowable directions of conflicting operations, so that a cycle is never formed
  - Timestamps
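The serializability-graph check mentioned above can be sketched as follows. This is a minimal illustration, not part of the lecture: the tuple encoding `(transaction, "R" or "W", item)` and all function names are assumptions, and the two schedules are the interleavings of T1 and T2 discussed on the previous slides.

```python
from collections import defaultdict

def conflicts(a, b):
    """Two operations conflict if they touch the same item, belong to
    different transactions, and at least one of them is a write."""
    (t1, op1, x1), (t2, op2, x2) = a, b
    return x1 == x2 and t1 != t2 and "W" in (op1, op2)

def precedence_graph(schedule):
    """Add an edge Ti -> Tj for every conflicting pair where Ti's operation comes first."""
    edges = defaultdict(set)
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            if conflicts(a, b):
                edges[a[0]].add(b[0])
    return edges

def has_cycle(edges):
    """Depth-first search; reaching a node that is still on the stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            if colour[nxt] == GREY or (colour[nxt] == WHITE and visit(nxt)):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(edges))

x_ops = lambda t: [(t, "R", "X"), (t, "W", "X")]
y_ops = lambda t: [(t, "R", "Y"), (t, "W", "Y")]

# T2 works on X after T1, and on Y after T1: only edges T1 -> T2.
serializable_schedule = x_ops("T1") + x_ops("T2") + y_ops("T1") + y_ops("T2")
# T2 works on Y before T1 resumes: edges T1 -> T2 (on X) and T2 -> T1 (on Y).
bad_schedule = x_ops("T1") + x_ops("T2") + y_ops("T2") + y_ops("T1")

print(has_cycle(precedence_graph(serializable_schedule)))
print(has_cycle(precedence_graph(bad_schedule)))
```

Rebuilding or re-checking the graph before every operation is what makes this approach expensive in practice.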

Timestamp ordering
Timestamp (TS): a number associated with each transaction
- Not necessarily real time; can be assigned by a logical counter
- Unique for each transaction
- Should be assigned in increasing order to each new transaction

Timestamp ordering
Timestamps associated with each database item
- Read timestamp (RTS): the largest timestamp of the transactions that have read the item so far
- Write timestamp (WTS): the largest timestamp of the transactions that have written the item so far
After each successful read or write of object O by transaction T, the corresponding timestamp is updated
- After a read: RTS(O) = max(RTS(O), TS(T))
- After a write: WTS(O) = max(WTS(O), TS(T))

Timestamp ordering
Given a transaction T, if T wants to read(X):
- If TS(T) < WTS(X), then the read is rejected and T has to abort
- Else, the read is accepted and RTS(X) is updated
Why is RTS(X) not checked?
For a write-read conflict, which direction does this protocol allow?

Timestamp ordering
If T wants to write(X):
- If TS(T) < RTS(X), then the write is rejected and T has to abort
- If TS(T) < WTS(X), then the write is rejected and T has to abort
- Else, allow the write and update WTS(X) accordingly
For a read-write/write-write conflict, which direction does this protocol allow?
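The read and write rules can be put together in a few lines. The sketch below is only illustrative: the `Item` class, the `Abort` exception, the function names and the starting values are assumptions, while the checks themselves follow the rules above. The usage part replays a schedule in which T1 (TS = 10) and T2 (TS = 20) both update X, then T2 updates Y before T1 gets to it.

```python
class Abort(Exception):
    """Raised when timestamp ordering rejects an operation."""

class Item:
    def __init__(self, value):
        self.value = value
        self.rts = 0  # largest timestamp that has read the item
        self.wts = 0  # largest timestamp that has written the item

def to_read(ts, item):
    # A read is rejected only if a younger transaction has already written the item.
    if ts < item.wts:
        raise Abort("read rejected")
    item.rts = max(item.rts, ts)
    return item.value

def to_write(ts, item, value):
    # A write is rejected if a younger transaction has already read or written the item.
    if ts < item.rts or ts < item.wts:
        raise Abort("write rejected")
    item.value = value
    item.wts = max(item.wts, ts)

# Hypothetical starting values; k = 50.
X, Y = Item(1000), Item(1000)
to_write(10, X, to_read(10, X) - 50)    # T1 on X
to_write(20, X, to_read(20, X) * 1.01)  # T2 on X
to_write(20, Y, to_read(20, Y) * 1.01)  # T2 on Y
try:
    to_read(10, Y)                      # T1 on Y: TS(T1) < WTS(Y)
    t1_aborted = False
except Abort:
    t1_aborted = True
print(t1_aborted, X.rts, X.wts, Y.rts, Y.wts)
```

No operation ever waits here: each one is either applied immediately or rejected, which is why the protocol cannot deadlock.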

Timestamp ordering -- example
Consider the two transactions:
T1 (TS = 10): A1 <- Read(X); A1 <- A1 - k; Write(X, A1); A2 <- Read(Y); A2 <- A2 + k; Write(Y, A2)
T2 (TS = 20): A1 <- Read(X); A1 <- A1 * 1.01; Write(X, A1); A2 <- Read(Y); A2 <- A2 * 1.01; Write(Y, A2)
Initially all RTS and WTS = 0

Timestamp ordering -- example
Consider the following schedule of T1 (TS = 10) and T2 (TS = 20). First, T1 reads and writes X:
- Read(X): TS(T1) > WTS(X) = 0, read allowed; RTS(X) <- 10
- Write(X, A1): TS(T1) > WTS(X) = 0 and TS(T1) = RTS(X) = 10, write allowed; WTS(X) <- 10
After this step: RTS(X) = 10, WTS(X) = 10, RTS(Y) = 0, WTS(Y) = 0

Timestamp ordering -- example
Next, T2 (TS = 20) reads and writes X:
- Read(X): TS(T2) > WTS(X) = 10, read allowed; RTS(X) <- 20
- Write(X, A1): TS(T2) = RTS(X) = 20 and TS(T2) > WTS(X) = 10, write allowed; WTS(X) <- 20
After this step: RTS(X) = 20, WTS(X) = 20, RTS(Y) = 0, WTS(Y) = 0

Timestamp ordering -- example
Next, T1 (TS = 10) reads and writes Y. Both operations are allowed in the same way as before.
Similarly, at the end of this step: RTS(X) = 20, WTS(X) = 20, RTS(Y) = 10, WTS(Y) = 10

Timestamp ordering -- example
Finally, T2 (TS = 20) reads and writes Y:
- Read(Y): TS(T2) > WTS(Y) = 10, read allowed; RTS(Y) <- 20
- Write(Y, A2): TS(T2) = RTS(Y) = 20 and TS(T2) > WTS(Y) = 10, write allowed; WTS(Y) <- 20
After this step: RTS(X) = 20, WTS(X) = 20, RTS(Y) = 20, WTS(Y) = 20

Timestamp ordering -- example
Now, consider a different schedule of T1 (TS = 10) and T2 (TS = 20), again starting with all RTS and WTS = 0. First, T1 reads and writes X:
- Read(X): TS(T1) > WTS(X) = 0, read allowed; RTS(X) <- 10
- Write(X, A1): TS(T1) > WTS(X) = 0 and TS(T1) = RTS(X) = 10, write allowed; WTS(X) <- 10
After this step: RTS(X) = 10, WTS(X) = 10, RTS(Y) = 0, WTS(Y) = 0

Timestamp ordering -- example
Next, T2 (TS = 20) reads and writes X:
- Read(X): TS(T2) > WTS(X) = 10, read allowed; RTS(X) <- 20
- Write(X, A1): TS(T2) = RTS(X) = 20 and TS(T2) > WTS(X) = 10, write allowed; WTS(X) <- 20
After this step: RTS(X) = 20, WTS(X) = 20, RTS(Y) = 0, WTS(Y) = 0

Timestamp ordering -- example
This time T2 (TS = 20) goes on to read and write Y before T1 resumes:
- Read(Y): TS(T2) > WTS(Y) = 0, read allowed; RTS(Y) <- 20
- Write(Y, A2): TS(T2) = RTS(Y) = 20 and TS(T2) > WTS(Y) = 0, write allowed; WTS(Y) <- 20
After this step: RTS(X) = 20, WTS(X) = 20, RTS(Y) = 20, WTS(Y) = 20

Timestamp ordering -- example
Finally, T1 (TS = 10) tries to read Y:
- Read(Y): TS(T1) = 10 < WTS(Y) = 20, read rejected; T1 aborts!
At this point: RTS(X) = 20, WTS(X) = 20, RTS(Y) = 20, WTS(Y) = 20

Timestamp ordering
- Thus, in timestamp ordering, conflicts are allowed only from transactions with smaller timestamps to transactions with larger timestamps
- In other words, the serializability graph has only edges of this kind: (transaction with smaller timestamp) -> (transaction with larger timestamp)
- Thus, no cycles

Timestamp ordering – good & bad
Advantages of timestamp ordering
- No waiting for transactions
- Thus, no deadlocks
Disadvantages
- The schedule may not be recoverable (see previous example). Why? T2 reads the value of X written by T1 and can commit before T1 is aborted.
- Long transactions may be aborted more often

Timestamp ordering – overcoming disadvantages
Solutions for recoverability
- Force all writes to the end of the transaction, and make the writes atomic (no other transaction can access any written item until all are written)
- Block (only) the reading of dirty items (using locks)
- Use the idea of commit dependency (discussed later)
Solutions for starvation
- Assign a new timestamp to an aborted transaction
- Temporarily block short transactions to allow a long transaction to go on (tricky to implement)

Locks -- implementation
Various kinds of support are needed to implement locking
- OS support: lock(X) must be an atomic operation at the OS level, i.e. support for critical sections
- Implementation of read(X)/write(X): code for locking is added automatically
- Lock manager: a module to handle and keep track of locks

Thomas’ write rule
- A write-write conflict may be acceptable in many cases
- Suppose T1 does a write(X) and then T2 does a write(X), and no transaction accesses X in between
- Then T2 only overwrites a value that was never used
- In such a case, it can be argued that the write is acceptable

Thomas’ write rule
In timestamp ordering, this is referred to as the Thomas write rule. If a transaction T issues a write(X):
- If TS(T) < RTS(X), then the write is rejected and T has to abort
- Else, if TS(T) < WTS(X), then the write is ignored
- Else, allow the write and update WTS(X) accordingly
A schedule allowed by the Thomas write rule may not be conflict serializable, but it is known to be view serializable.
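As a small illustration of how this differs from the basic write rule, the sketch below returns a status string instead of always aborting on an outdated write. The dictionary layout and the name `thomas_write` are assumptions made for the example.

```python
def thomas_write(ts, item, value):
    """Apply the Thomas write rule to `item`, a dict with keys value, rts, wts.
    Returns "abort", "ignored" or "written"."""
    if ts < item["rts"]:
        return "abort"    # a younger transaction already read the item
    if ts < item["wts"]:
        return "ignored"  # obsolete write: a younger write is already in place
    item["value"] = value
    item["wts"] = ts
    return "written"

x = {"value": "initial", "rts": 0, "wts": 0}
first = thomas_write(20, x, "from T20")   # accepted
second = thomas_write(10, x, "from T10")  # older write arrives late and is skipped
x["rts"] = 30                             # pretend a transaction with TS = 30 read x
third = thomas_write(25, x, "from T25")   # too late: the value was already read
print(first, second, third, x["value"])
```

The ignored write costs nothing: the transaction with timestamp 10 keeps running, and the database holds the value it would hold had the two writes run in timestamp order.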

View serializability
Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then Ti must also read the initial value of Q in schedule S´.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S and that value was produced by transaction Tj (if any), then Ti must also, in schedule S´, read the value of Q that was produced by Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S´.

View serializability
- View equivalence is also based purely on reads and writes
- Roughly speaking, in two view equivalent schedules, each corresponding read(X) reads the same value (including the initial read)
  - Strictly speaking, the condition is stronger: the value must have been produced by the same transaction
- The final value of each X has to be written by the same transaction in both schedules

View serializability
- A schedule is view serializable if it is view equivalent to a serial schedule
- Conflict serializable => view serializable, but NOT vice versa
- Example: a schedule over T1, T2 and T3 using Read(X) and Write(X), in which T1 reads X before T2 writes X, and T2 writes X before T1 writes X. This schedule is view equivalent to the serial schedule (T1, T2, T3), but it is not conflict serializable (R-W conflict T1 -> T2, W-W conflict T2 -> T1)
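A brute-force check of view serializability can be sketched by comparing a schedule against every serial order. Everything in this block is an assumption made for illustration: the tuple encoding, the function names, and in particular the concrete four-operation schedule, which is a reconstruction consistent with the conflicts stated above (T3's final blind write is assumed).

```python
from itertools import permutations

def view_profile(schedule):
    """Summarise what view equivalence compares: which write each read
    sees (None means the initial value) and who writes each item last."""
    last_writer, reads_from, read_count = {}, {}, {}
    for txn, op, item in schedule:
        if op == "R":
            n = read_count.get((txn, item), 0)
            read_count[(txn, item)] = n + 1
            reads_from[(txn, item, n)] = last_writer.get(item)
        else:
            last_writer[item] = txn
    return reads_from, last_writer

def view_serializable(schedule):
    """Return a serial order that is view equivalent to `schedule`, or None."""
    txns = sorted({t for t, _, _ in schedule})
    target = view_profile(schedule)
    for order in permutations(txns):
        serial = [step for t in order for step in schedule if step[0] == t]
        if view_profile(serial) == target:
            return order
    return None

# Assumed reconstruction of the slide's example.
example = [("T1", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X"), ("T3", "W", "X")]
# T1 and T2 each read one item from the other: no serial order matches.
crossed = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X"),
           ("T2", "R", "Y"), ("T2", "W", "Y"), ("T1", "R", "Y"), ("T1", "W", "Y")]

print(view_serializable(example))
print(view_serializable(crossed))
```

The loop tries up to n! serial orders for n transactions, so this is only usable on toy schedules.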

View serializability
- Blind writes: writes that write values not based on previous reads
- View serializability = conflict serializability + blind writes
- Currently, view serializability is not very practical
  - Determining whether a schedule is view serializable is NP-complete
(Figure: the schedule over T1, T2 and T3 using Read(X) and Write(X), with the blind writes marked.)

Optimistic concurrency control
Timestamp ordering is more optimistic than 2PL
- It does not block operations
- It lets conflicts in one direction proceed immediately
It still has limitations
- Care is needed to handle recoverability
- Overhead in maintaining timestamps (and space)
- It is still a waste of time if there are very few conflicts
Can we be even more optimistic?

Optimistic concurrency control
The most optimistic point of view:
- Assume there is no problem and let the transaction execute
- But before commit, do a final check
- Abort only when a problem is discovered
This is the basis for optimistic concurrency control

Optimistic concurrency control
Each transaction T is divided into 3 phases:
- Read and execution: T reads from the database and executes. However, T writes only to a temporary location (not to the database itself)
- Validation: T checks whether there is a conflict with other transactions, and aborts if necessary
- Write: T actually writes the values in the temporary location to the database
Every transaction goes through the phases in this order

Optimistic concurrency control
Each transaction T is given 3 timestamps:
- Start(T): when the transaction starts
- Validation(T): when the transaction enters the validation phase
- Finish(T): when the transaction finishes
Goal: ensure that the transactions follow a serial schedule ordered by Validation(T)

Optimistic concurrency control
Given two transactions T1 and T2 with Validation(T1) < Validation(T2)
Case 1: Finish(T1) < Start(T2)
Timeline (each transaction runs its Read, Validation and Write phases): Start(T1) < Validation(T1) < Finish(T1) < Start(T2) < Validation(T2) < Finish(T2)
Here, there is no problem of serializability

Optimistic concurrency control
Case 2: Start(T2) < Finish(T1) < Validation(T2)
Timeline: T2 starts before T1 finishes, so T2's Read phase overlaps T1 (potential conflict), but T1 finishes before T2 enters validation
If T2 does not read anything T1 writes, then there is no problem

Optimistic concurrency control
Case 3: Validation(T2) < Finish(T1)
Timeline: T2 enters validation while T1 is still in its Write phase (potential conflict)
If T2 does not read or write anything T1 writes, then there is no problem

Optimistic concurrency control
Validation test: for any transaction T, check every transaction T’ such that Validation(T’) < Validation(T):
- If Finish(T’) > Start(T) and T reads any element that T’ writes, then abort T
- If Finish(T’) > Validation(T) and T writes any element that T’ writes, then abort T
- Otherwise, commit
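The validation test translates almost directly into code. In the sketch below, the `Txn` record, its field names and the integer clock values are assumptions; a transaction that has not finished yet is given `finish = inf`.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    name: str
    start: int
    validation: int
    finish: float = float("inf")  # still in its write phase
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validate(t, others):
    """Return True if t may commit, False if it must abort."""
    for o in others:
        if o.validation >= t.validation:
            continue  # only transactions validated before t matter
        # o was still running after t started: t may have read stale data.
        if o.finish > t.start and t.read_set & o.write_set:
            return False
        # o was still writing when t validated: their writes could interleave.
        if o.finish > t.validation and t.write_set & o.write_set:
            return False
    return True

t1 = Txn("T1", start=1, validation=5, finish=8, write_set={"X"})
case1 = Txn("A", start=9, validation=12, read_set={"X"})                     # starts after T1 finished
case2_ok = Txn("B", start=3, validation=10, read_set={"Y"}, write_set={"X"})
case2_bad = Txn("C", start=3, validation=10, read_set={"X"})
case3_bad = Txn("D", start=3, validation=6, read_set={"Y"}, write_set={"X"})

for t in (case1, case2_ok, case2_bad, case3_bad):
    print(t.name, validate(t, [t1]))
```

All the bookkeeping happens at validation time; during the read phase a transaction only needs to record its read and write sets.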

Optimistic concurrency control
Advantages:
- No blocking
- No overhead during execution (but there is overhead for validation)
- No cascading rollbacks (why?)
Disadvantages:
- Potential starvation of long transactions
- A large number of aborts under high concurrency

Graph-based locking
- Two-phase locking makes no assumptions about the behavior of transactions
- If we have some assumptions or knowledge about how data is accessed, we can make use of it to find more efficient/optimistic locking techniques

Graph-based locking
Suppose we make the following assumptions:
- There is a partial ordering of the database items such that if X < Y, then a transaction must access X before it accesses Y (regardless of whether the transaction uses X or not)
- The graph formed by the partial order is a tree
- Only X-locks are allowed

Graph-based locking
A transaction T must follow these rules:
- The first lock by T can be on any item
- After that, an item X can be locked only when T holds a lock on the parent of X
- Unlock can be done at any time, but...
- ...once an item is unlocked, it cannot be relocked

Graph-based locking
Examples of valid action sequences:
- Lock(B), Lock(E), Lock(D), Unlock(B), Unlock(E), Lock(G), Unlock(D), Unlock(G)
- Lock(D), Lock(H), Unlock(D), Unlock(H)
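A checker for these rules can be sketched as follows. The tree itself is not given in the transcript, so the `PARENT` map below is a hypothetical tree chosen to be consistent with the examples (B is the parent of D and E, D is the parent of G and H, H is the parent of J); the function name and action encoding are also assumptions.

```python
# Hypothetical item tree: child -> parent. The root A has no entry.
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B", "G": "D", "H": "D", "J": "H"}

def follows_tree_protocol(actions, parent=PARENT):
    """Check one transaction's sequence of ("lock" | "unlock", item) actions."""
    held, released = set(), set()
    first_lock = True
    for op, item in actions:
        if op == "lock":
            if item in held or item in released:
                return False  # no relocking
            if not first_lock and parent.get(item) not in held:
                return False  # later locks need the parent to be held
            held.add(item)
            first_lock = False
        else:
            if item not in held:
                return False  # cannot unlock what is not held
            held.remove(item)
            released.add(item)
    return True

def seq(text):
    """Parse "L:B U:B ..." into a list of actions (L = lock, U = unlock)."""
    return [("lock" if a[0] == "L" else "unlock", a[2:]) for a in text.split()]

valid_1 = seq("L:B L:E L:D U:B U:E L:G U:D U:G")
valid_2 = seq("L:D L:H U:D U:H")
skips_parent = seq("L:D L:J")   # J's parent H is not held
relocks = seq("L:B U:B L:B")

for s in (valid_1, valid_2, skips_parent, relocks):
    print(follows_tree_protocol(s))
```

Note that `valid_1` releases B before acquiring G, so it is not two-phase, yet it obeys the tree rules.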

Graph-based locking
Advantages
- No deadlocks
- No need to be two-phase
- Earlier release of locks, thus higher concurrency
Disadvantages
- A transaction may have to lock items that it does not need. Example, from the last slide: if T needs D and J, then it must also lock H
- The schedule may be unrecoverable

Graph-based locking
Solutions for non-recoverability
- Hold X-locks until the end of the transaction
  - But this reduces concurrency significantly
- If one can tolerate cascading aborts, use the notion of commit dependency
  - For every item that is written (but not yet committed), record the transaction T that performed the write
  - If a transaction T’ reads such an item, then T’ has a commit dependency on T
  - T’ cannot commit until T commits
  - T’ must abort if T aborts
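One way to picture commit dependencies is a small tracker that records who wrote each uncommitted item and who read it. This is a sketch under assumptions: the class and method names are invented for the example, and a real system would combine this bookkeeping with its lock or timestamp manager.

```python
class CommitDependencies:
    def __init__(self):
        self.uncommitted_writer = {}  # item -> transaction with an uncommitted write
        self.depends_on = {}          # reader -> set of writers it must wait for
        self.state = {}               # transaction -> "active" | "committed" | "aborted"

    def write(self, txn, item):
        self.state.setdefault(txn, "active")
        self.uncommitted_writer[item] = txn

    def read(self, txn, item):
        self.state.setdefault(txn, "active")
        writer = self.uncommitted_writer.get(item)
        if writer is not None and writer != txn:
            self.depends_on.setdefault(txn, set()).add(writer)

    def _forget_writes(self, txn):
        self.uncommitted_writer = {
            i: t for i, t in self.uncommitted_writer.items() if t != txn}

    def commit(self, txn):
        """Commit only if every transaction txn read from has committed."""
        if any(self.state[w] != "committed" for w in self.depends_on.get(txn, ())):
            return False
        self.state[txn] = "committed"
        self._forget_writes(txn)
        return True

    def abort(self, txn):
        """Abort txn and, in cascade, every active transaction that read from it."""
        self.state[txn] = "aborted"
        self._forget_writes(txn)
        for reader, writers in self.depends_on.items():
            if txn in writers and self.state[reader] == "active":
                self.abort(reader)

cd = CommitDependencies()
cd.write("T1", "X")
cd.read("T2", "X")
early = cd.commit("T2")   # refused: T1 has not committed yet
cd.commit("T1")
late = cd.commit("T2")    # allowed now
print(early, late)
```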

Multi-version schemes
Consider a write-read conflict in a 2PL scheme
- T1 has obtained an X-lock on an item, and T2 has to wait
- Why does T2 wait?
  - Potential conflict that goes both ways
  - Unsure whether the value written by T1 is trustworthy (as T1 has not committed yet)
- What if we kept the old values of the item, so that T2 can choose the appropriate version to read? This leads to multi-version concurrency control

Multi-version timestamp ordering
Each data item Q has a sequence of versions <Q1, Q2, ..., Qm>. Each version Qk contains three data fields:
- Content: the value of version Qk
- W-timestamp(Qk): the timestamp of the transaction that created (wrote) version Qk
- R-timestamp(Qk): the largest timestamp of a transaction that successfully read version Qk
When a transaction Ti creates a new version Qk of Q, Qk's W-timestamp and R-timestamp are initialized to TS(Ti). The R-timestamp of Qk is updated whenever a transaction Tj reads Qk and TS(Tj) > R-timestamp(Qk).

Multi-version timestamp ordering
Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
- If Ti issues a read(Q), then the value returned is the content of version Qk.
- If Ti issues a write(Q) and TS(Ti) < R-timestamp(Qk), then Ti is rolled back. Otherwise, if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten; otherwise a new version of Q is created.
Reads always succeed. A write by Ti is rejected if some other transaction Tj that (in the serialization order defined by the timestamp values) should read Ti's write has already read a version created by a transaction older than Ti.
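The version-selection and write rules above can be sketched in a few lines. The class names, the boolean return value of `write`, and the convention that the initial version carries timestamp 0 (with all transaction timestamps positive) are assumptions made for the example.

```python
class Version:
    def __init__(self, value, wts):
        self.value = value
        self.wts = wts  # W-timestamp: creator's timestamp
        self.rts = wts  # R-timestamp: initialised to the creator's timestamp

class MVItem:
    def __init__(self, initial):
        self.versions = [Version(initial, 0)]

    def _visible(self, ts):
        """The version with the largest W-timestamp <= ts."""
        return max((v for v in self.versions if v.wts <= ts), key=lambda v: v.wts)

    def read(self, ts):
        v = self._visible(ts)
        v.rts = max(v.rts, ts)
        return v.value  # reads never fail

    def write(self, ts, value):
        """Return False if the writer must be rolled back, True otherwise."""
        v = self._visible(ts)
        if ts < v.rts:
            return False  # a younger transaction already read the version we would supersede
        if ts == v.wts:
            v.value = value  # same transaction overwrites its own version
        else:
            self.versions.append(Version(value, ts))
        return True

q = MVItem(100)
w20 = q.write(20, 200)  # creates a version with W-timestamp 20
r10 = q.read(10)        # an older reader still sees the initial version
r25 = q.read(25)
w5 = q.write(5, 50)     # rejected: timestamp 10 already read the initial version
w15 = q.write(15, 150)  # accepted: nobody younger than 15 read the initial version
w22 = q.write(22, 220)  # rejected: timestamp 25 already read the version written at 20
print(w20, r10, r25, w5, w15, w22, q.read(17))
```

The old versions are what let the reader with timestamp 10 succeed even though a younger transaction has already written Q.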