CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 11 Instructor: Haifeng YU.



Slide 2: Today’s Roadmap
- Back to parallel systems
  - A simplified exploration of concurrency control in database systems
  - Every database is a parallel system
- Define “sequential consistency” in databases: serializability
  - Two-phase locking protocol to ensure serializability
- Define “linearizability” in databases: external consistency
  - Two-phase locking ensures external consistency as well

Slide 3: Database is Just an Abstract Data Type
- Abstract data type: a piece of data with allowed operations on the data
  - Integer X: read(), write()
  - Stack: push(), pop()
- By definition, a database is a shared abstract data type
  - Accessed by multiple users
  - Processes may perform various operations (called transactions) on the database
- Database consistency specifies what behavior is allowed when the database is accessed by multiple processes

Slide 4: Transactions
- Operations are called transactions in the database context
- There can be an unbounded number of different kinds of transactions (a database is more flexible than, for example, a stack!)
- Each transaction may contain:
  - StartTransaction();
  - CommitTransaction();
  - AbortTransaction();
  - Read(x);
  - Write(y, value);
- The term "operation" in the textbook refers to Read() and Write(). To avoid confusion, we will call them primitive operations.

An Example Transaction:

    StartTransaction();
    seatBooked = false;
    read(number of available seats on a flight);
    if (number > 0) {
        number--;
        write back number;
        seatBooked = true;
    }
    CommitTransaction();

Slide 5: The Scheduler and Concurrency Control
[Figure: Transaction1, Transaction2, and Transaction3 send Start/Commit/Abort/Read/Write requests to the Scheduler; the Scheduler sends Read/Write operations to the Database.]
- The job of the scheduler is concurrency control (i.e., ensuring the consistency of the database when it is accessed by multiple processes)

Slide 6: The Scheduler and Concurrency Control
[Figure: Transaction1, Transaction2, and Transaction3 send Start/Commit/Abort/Read/Write requests to the Scheduler; the Scheduler sends Read/Write operations to the Database.]
- The scheduler itself is multi-threaded. It may submit reads/writes to the database in parallel.
- We assume that the database ensures sequential consistency for these reads/writes.

Slide 7: Carry Over Definitions from Lecture 3
- A history H is a sequence of invocations and responses of transactions ordered by wall clock time
  - Sequential history
  - Legal sequential history
  - Equivalency between two histories
  - Process order
- A history H is sequentially consistent if it is equivalent to some legal sequential history S that preserves process order

Slide 8: Serializability and Sequential Consistency
- Most databases use serializability as the definition of consistency: a customized version of sequential consistency designed specifically for databases
- Same as sequential consistency except for the following caveats
- Caveat 1:
  - When defining serializability, we assume that all transactions are from different processes (no process issues two transactions)
  - What it means: process order is empty
  - Why reasonable: in DB applications, this is usually the case
  - Why helpful: it simplifies the design of the scheduler and gives it more flexibility to improve performance
  - Corner case: if a user issues two transactions sequentially to the database, the second transaction may not see the effects of the first.
    - This does not violate serializability, but most implementations of the scheduler will not have such behavior

Slide 9: Serializability and Sequential Consistency
- Caveat 2:
  - In sequential consistency, each operation is executed by a single process (each operation is sequential)
  - Transactions are complex enough that we should allow parallel reads/writes within a transaction (as in the book).
    - Each transaction is itself a parallel system!
  - But for this lecture we will assume that each transaction is sequential (this makes no significant difference to the results)
    - Read the book if you are interested in the extension to parallel transactions

Slide 10: Serializability and Sequential Consistency
- Caveat 3: Definition of equivalency
  - Two histories are equivalent if they have the same set of events
    - Same events imply all responses are the same
  - For transactions, responses include all the values written into the database in the transaction and all the values output to the user
  - Transactions may be so complex that we cannot easily make the judgment. Consider the following transaction:

    UpdateX() {
        StartTransaction();
        tmp = Read(X);
        tmp = (4*tmp^2 + 5*tmp + 1);
        Write(X, tmp);
        CommitTransaction();
    }

Slide 11
Both processes run UpdateX() concurrently. In each example, both processes read X before either one writes it.

Example 1: initially X = 1
    process 0: tmp = Read(X);   (returns 1)
    process 0: tmp = 4*tmp^2 + 5*tmp + 1
    process 1: tmp = Read(X);   (returns 1)
    process 1: tmp = 4*tmp^2 + 5*tmp + 1
    Write(X, tmp);              (writes 10)
- A legal sequential history will have a final X value of 451.
- This history is not sequentially consistent.

Example 2: initially X = -0.5
    process 0: tmp = Read(X);   (returns -0.5)
    process 0: tmp = 4*tmp^2 + 5*tmp + 1
    process 1: tmp = Read(X);   (returns -0.5)
    process 1: tmp = 4*tmp^2 + 5*tmp + 1
    Write(X, tmp);              (writes -0.5)
- A legal sequential history will have a final X value of -0.5 (-0.5 is the root of the equation tmp = 4*tmp^2 + 5*tmp + 1).
- This history is sequentially consistent.

Whether the history is sequentially consistent depends on the value of X (and on insight into the code!)
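The arithmetic behind the two examples can be checked with a few lines of Python. This is only an illustrative sketch: the function name `update` is an assumption standing in for the computation inside UpdateX(), and the "interleaved" case models both transactions reading X before either one writes.

```python
def update(x):
    """Value that one run of UpdateX() writes when it reads x."""
    return 4 * x**2 + 5 * x + 1

# Serial execution from X = 1: the second transaction reads the first one's write.
serial_final = update(update(1))

# Interleaved execution from X = 1: both transactions read 1, so both write update(1).
interleaved_final = update(1)

# From X = -0.5 the update leaves X unchanged, so serial and interleaved runs agree.
fixed_point_serial = update(update(-0.5))
fixed_point_interleaved = update(-0.5)
```

With X = 1 the two executions end in different states, while with X = -0.5 they cannot be told apart, which is why plain equivalency depends on the data values.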

Slide 12: Serializability and Sequential Consistency
- Caveat 3: Definition of equivalency
  - Schedulers are not as smart as we are and cannot figure that out
  - So we are going to be more pessimistic and define conflict equivalency
  - Two primitive operations are conflicting if:
    - They are both writes and they write the same data item, or
    - One is a read and the other is a write, and they read/write the same data item
  - Two histories H and H’ are conflict equivalent iff
    - They contain the same set of transactions
    - For any two conflicting primitive operations p1 and p2: p1 is before p2 in H ⇔ p1 is before p2 in H’
  - Conflict equivalency ⇒ equivalency (assuming transactions are deterministic) (why?)
  - The reverse is not true (by the earlier example)
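The definition of conflict equivalency can be sketched in Python as follows. The encoding is an assumption made for illustration: a primitive operation is a tuple `(transaction, kind, item)` with kind `"R"` or `"W"`, a history is a list of such tuples in time order, each tuple appears at most once per history, and operations of the same transaction are not treated as conflicting with each other.

```python
from itertools import combinations

def conflicts(p1, p2):
    """True if two primitive operations conflict (same item, at least one write)."""
    (t1, kind1, item1), (t2, kind2, item2) = p1, p2
    return t1 != t2 and item1 == item2 and "W" in (kind1, kind2)

def conflict_order(history):
    """Set of conflicting pairs (p1, p2) such that p1 is before p2 in the history."""
    return {(p1, p2) for p1, p2 in combinations(history, 2) if conflicts(p1, p2)}

def conflict_equivalent(h1, h2):
    """Same operations, and every conflicting pair is ordered the same way."""
    return sorted(h1) == sorted(h2) and conflict_order(h1) == conflict_order(h2)

# Hypothetical example: T2's read of x precedes T1's write of x in H.
H = [("T1", "R", "x"), ("T2", "R", "x"), ("T1", "W", "x")]
S_t2_first = [("T2", "R", "x"), ("T1", "R", "x"), ("T1", "W", "x")]
S_t1_first = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x")]
```

Here H is conflict equivalent to the serial history that runs T2 first, but not to the one that runs T1 first, because the latter reverses the read/write pair on x.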

Slide 13: Serializability
- A history H is serializable if it is conflict equivalent to some legal sequential history S
  - (For comparison: a history H is sequentially consistent if it is equivalent to some legal sequential history S that preserves process order.)
- Different from linearizability: serializability does not need to preserve the operation partial order.
  - A later transaction may not see the effects of an earlier transaction.
  - This is possible in most commercial databases.
  - But the chance is small because of the way the scheduler is actually implemented. (You actually need to spend some effort to make it happen.)

Slide 14: Serialization Graph
- A serialization graph SG(H) of a history H is a directed graph where:
  - Each transaction is a vertex in the graph
  - A directed edge from W to V exists iff W has a primitive operation p1 and V has a primitive operation p2 where p1 is before p2 in H and p1 and p2 conflict
- Example history:
  - R(x) (by T1), R(x) (by T2), W(x) (by T1), W(y) (by T2), W(y) (by T1), R(x) (by T3), W(x) (by T3)
[Figure: serialization graph of the example history over vertices T1, T2, T3.]
- SG(H) may or may not be transitive
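The edges of SG(H) for the example history can be derived mechanically from the definition. The sketch below assumes a hypothetical encoding of primitive operations as `(transaction, kind, item)` tuples listed in history order, and it does not add edges between operations of the same transaction.

```python
def serialization_graph(history):
    """Return the edge set of SG(H) as pairs (W, V)."""
    edges = set()
    for i, (t1, kind1, item1) in enumerate(history):
        for t2, kind2, item2 in history[i + 1:]:
            # p1 precedes p2, they touch the same item, and at least one is a write.
            if t1 != t2 and item1 == item2 and "W" in (kind1, kind2):
                edges.add((t1, t2))
    return edges

# The example history from the slide.
H = [
    ("T1", "R", "x"), ("T2", "R", "x"), ("T1", "W", "x"),
    ("T2", "W", "y"), ("T1", "W", "y"),
    ("T3", "R", "x"), ("T3", "W", "x"),
]
edges = serialization_graph(H)
```

For this history the edge set is T2 → T1, T1 → T3, and T2 → T3, so this particular graph happens to be transitive and acyclic.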

Slide 15: Serializability Theorem
- Theorem: A history H is serializable iff SG(H) is acyclic.
- If SG(H) is acyclic, then H is serializable:
  - Without loss of generality, let T1 T2 … be a topological sorting of the vertices in SG(H). Let S be the sequential history obtained by executing T1 T2 … sequentially. By definition, S is a legal sequential history. We need to show that H is conflict equivalent to S.
  - Prove by contradiction. Assume H is not conflict equivalent to S. Then there exist W (containing primitive operation p1) and V (containing primitive operation p2) where p1 and p2 conflict and are ordered differently in H and S. Without loss of generality, suppose p1 is before p2 in H. Then there must be an edge from W to V in SG(H), so W is before V in the topological sorting and p1 is before p2 in S as well. Contradiction.

Slide 16: Serializability Theorem
- Theorem: A history H is serializable iff SG(H) is acyclic.
- If H is serializable, then SG(H) is acyclic:
  - Prove by contradiction and assume that SG(H) has a cycle T1 T2 … Tk T1. History H is conflict equivalent to some sequential history S. Because T1 has an edge to T2 in SG(H), T1 has an operation p1 and T2 has an operation p2 where p1 and p2 conflict and p1 is before p2 in H. Since S is conflict equivalent to H, p1 must be before p2 in S as well. Since S is a serial history, T1 must be before T2 in S.
  - By the same argument, T2 is before T3 in S, T3 is before T4 in S, …, and Tk is before T1 in S. This is impossible because S is a serial history, in which the transactions are totally ordered.
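Both directions of the theorem suggest a simple test: topologically sort SG(H), and report "not serializable" if the sort fails because of a cycle. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the function name `serial_order` and the edge-set encoding are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter

def serial_order(transactions, edges):
    """Return a serial order consistent with SG(H), or None if SG(H) has a cycle."""
    sorter = TopologicalSorter({t: set() for t in transactions})
    for src, dst in edges:
        sorter.add(dst, src)  # src must be placed before dst
    try:
        return list(sorter.static_order())
    except CycleError:
        return None

# Acyclic graph: the only topological order is T2, T1, T3.
acyclic = serial_order({"T1", "T2", "T3"},
                       {("T2", "T1"), ("T1", "T3"), ("T2", "T3")})

# Cyclic graph: no serial history can respect both edges.
cyclic = serial_order({"T1", "T2"}, {("T1", "T2"), ("T2", "T1")})
```

When an order is returned, executing the transactions serially in that order gives the history S used in the first half of the proof.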

Slide 17: Serialization Graph and Theorem
- The serialization graph gives us a systematic way to determine whether a history is serializable
  - The determination can always be done in a polynomial number of steps
- But for sequential consistency:
  - We did not have a systematic way
  - In some cases, we have to enumerate all serial histories to compare, which takes an exponential number of steps
- Why did we not discuss this before for sequential consistency?
  - Can you derive a similar theorem for sequential consistency, so that the determination can always be made in a polynomial number of steps?

Slide 18: Ensuring Serializability: All About Performance
- The scheduler can protect the entire database using a single critical section
  - Essentially produces a sequential history
  - Not efficient: readers (i.e., query transactions) should be able to access the database concurrently
- The scheduler can protect the entire database using a Reader/Writer lock (cf. the Reader/Writer problem in Lecture 2)
  - Query transactions obtain the reader lock
  - Update transactions obtain the writer lock
  - But databases are large and each transaction only touches a small portion

Slide 19: Ensuring Serializability: All About Performance
- Partition the database and use separate reader/writer locks for each partition
  - In the extreme, each partition is a data item

Locking individual data items for a transaction:

    AcquireReaderLock(x);
    AcquireWriterLock(y);
    Read(x);
    do some computation;
    Write(y, value);
    ReleaseReaderLock(x);
    ReleaseWriterLock(y);

Slide 20: Ensuring Serializability: All About Performance

Locking individual data items for a transaction:

    AcquireReaderLock(x);
    AcquireWriterLock(y);
    Read(x);
    do some computation;
    Write(y, value);
    ReleaseReaderLock(x);
    ReleaseWriterLock(y);

- But the performance is still not very good
  - We may overestimate the set of data items that a transaction needs to access
  - We hold the locks for too long (imagine that the computation is solving some time-consuming problem)

Slide 21: Ensuring Serializability: All About Performance
- Lock the data items only when we use them:

    AcquireReaderLock(x);
    Read(x);
    ReleaseReaderLock(x);
    do some computation;
    AcquireWriterLock(y);
    Write(y, value);
    ReleaseWriterLock(y);

- This won’t work (even intuitively). Consider the following interleaving, listed in time order:

    process 0: AcquireReaderLock(x); Read(x); ReleaseReaderLock(x);
    process 1: AcquireWriterLock(x); Write(x); ReleaseWriterLock(x);
    process 0: do some computation;
    process 1: AcquireWriterLock(y); Write(y); ReleaseWriterLock(y);
    process 0: AcquireWriterLock(y); Write(y); ReleaseWriterLock(y);

Slide 22: Ensuring Serializability: All About Performance
- Prove that the history is not serializable using the serializability theorem
- It is impossible here to prove that it is not sequentially consistent

    process 0: AcquireReaderLock(x); Read(x); ReleaseReaderLock(x);
    process 1: AcquireWriterLock(x); Write(x); ReleaseWriterLock(x);
    process 0: do some computation;
    process 1: AcquireWriterLock(y); Write(y); ReleaseWriterLock(y);
    process 0: AcquireWriterLock(y); Write(y); ReleaseWriterLock(y);
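A quick way to see the cycle is to list the primitive operations of this interleaving and compute the edges between the two transactions. The sketch assumes a hypothetical `(process, kind, item)` encoding and assumes that process 1 writes y before process 0 does, which is the ordering needed for the history to be non-serializable as the slide claims.

```python
# Primitive operations in time order (locks omitted; only reads and writes matter).
H = [("P0", "R", "x"), ("P1", "W", "x"), ("P1", "W", "y"), ("P0", "W", "y")]

# Edge (a, b): an operation of a precedes a conflicting operation of b.
edges = {
    (a[0], b[0])
    for i, a in enumerate(H)
    for b in H[i + 1:]
    if a[0] != b[0] and a[2] == b[2] and "W" in (a[1], b[1])
}

# With only two transactions, a cycle exists exactly when both directions are present.
has_cycle = ("P0", "P1") in edges and ("P1", "P0") in edges
```

The conflict on x orders process 0 before process 1, while the conflict on y orders process 1 before process 0, so SG(H) contains a cycle.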

Slide 23: A Widely Used Protocol: Two-phase Locking
- Two-phase locking:
  - A transaction must acquire the lock for data item v before reading or writing v
  - A transaction cannot obtain any further locks once it releases any lock
- Growing phase followed by shrinking phase
- May result in deadlock
- Side note: a transaction may “upgrade” a reader lock to a writer lock. This is considered a new lock acquisition as well.
- In the previous example, process 0 will not release the lock on x until the end.
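The two rules can be checked on a single transaction's action sequence. The following is a simplified sketch, not a scheduler: the action encoding `(kind, item)` is an assumption, reader and writer locks are not distinguished, and lock upgrades are not modeled.

```python
def obeys_two_phase_locking(actions):
    """Check one transaction's actions against the two-phase locking rules."""
    held = set()
    shrinking = False  # becomes True once any lock has been released
    for kind, item in actions:
        if kind == "lock":
            if shrinking:
                return False  # acquired a lock after releasing one
            held.add(item)
        elif kind == "unlock":
            shrinking = True
            held.discard(item)
        elif item not in held:
            return False  # read or write without holding the item's lock
    return True

# Locks everything up front, releases at the end: two-phase.
lock_all_first = [("lock", "x"), ("lock", "y"), ("read", "x"),
                  ("write", "y"), ("unlock", "x"), ("unlock", "y")]

# Locks each item only while using it: releases x before locking y.
lock_when_used = [("lock", "x"), ("read", "x"), ("unlock", "x"),
                  ("lock", "y"), ("write", "y"), ("unlock", "y")]
```

The first sequence has a growing phase followed by a shrinking phase; the second acquires the lock on y after releasing the lock on x and therefore violates the protocol.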

Slide 24: Correctness of Two-phase Locking
- Lemma 1: Let H be a history produced by two-phase locking. Suppose that SG(H) contains an edge from W to V. Then there exists some data item x such that W unlocks x before V locks x in H.
- Proof: By definition of SG(H), if there is an edge from W to V, then there exist two primitive operations p1 (in W) and p2 (in V) such that they are conflicting and p1 is before p2 in H. Let x be the data item that p1 and p2 read or write.
  - By the two-phase locking rule, W needs to lock x before p1 occurs and V needs to lock x before p2 occurs. Because p1 and p2 conflict, at least one of these locks is a writer lock, so the two locks cannot be held at the same time. The only possibility is that V locks x after W unlocks x.

Slide 25: Correctness of Two-phase Locking
- Lemma 2: Let H be a history produced by two-phase locking. Suppose that SG(H) contains the path T_1 → T_2 → … → T_n. Then there exist data items x and y (x and y do not need to be distinct) such that T_1 unlocks x before T_n locks y in H.
- Proof: Use induction on n. Lemma 1 proves the case for n = 2. Assume the lemma holds for n-1; we will prove that it still holds for n.
  - By the inductive assumption, we know that there exist x and z such that T_1 unlocks x before T_{n-1} locks z in H. Because there is an edge from T_{n-1} to T_n in SG(H), Lemma 1 tells us that we can find a data item y such that T_{n-1} unlocks y before T_n locks y.
  - By the two-phase locking rule, T_{n-1} can only unlock y after it locks z. (key step!) Thus T_1 unlocks x before T_n locks y.

Slide 26: Correctness of Two-phase Locking
- Theorem: Every history H generated by two-phase locking is serializable.
  - Prove by contradiction and assume H is not serializable. Then SG(H) contains a cycle T_1 → T_2 → … → T_n → T_1. By Lemma 2, we can find data items x and y such that T_1 unlocks x before T_1 locks y. But this violates the two-phase locking rule.

Slide 27: Linearizability in Databases
- (From Lecture 3) A history H is linearizable if
  1. It is equivalent to some legal sequential history S, and
  2. The operation partial order induced by H is a subset of the operation partial order induced by S
- As for sequential consistency, we will customize the definition for the database context
  - Caveat 1: Assume that all transactions are from different processes
  - Caveat 2: Transactions may be parallel (we do not consider these)
  - Caveat 3: Conflict equivalent instead of equivalent

Slide 28: Linearizability in Databases
- For databases, linearizability is sometimes called external consistency. A history is externally consistent if:
  1. It is conflict equivalent to some legal sequential history S, and
  2. The operation partial order induced by H is a subset of the operation partial order induced by S
- Two-phase locking actually ensures external consistency
  - Cf. slide 8: “most implementations of the scheduler will not have such behavior” that violates external order

Slide 29: Two-Phase Locking Preserves External Consistency
- Theorem: Any history H generated by two-phase locking is externally consistent.
- Proof: For each transaction T in H, we define its linearization point to be the time immediately after it acquires its last lock. Obviously, by the two-phase locking rule, T has not released any locks at its linearization point. (This is where we leverage the two-phase locking property.) We construct a legal sequential history S to be all transactions ordered by their linearization points.

Slide 30: Two-Phase Locking Preserves External Consistency
- Claim 1: The operation (transaction) partial order induced by H is a subset of the operation (transaction) partial order induced by S
- Proof: Suppose W → V belongs to the transaction partial order induced by H. This means that W finishes before V starts. Obviously W finishes acquiring all its locks before V finishes acquiring all its locks. Thus W’s linearization point is before V’s, and W is before V in S.

Slide 31: Two-Phase Locking Preserves External Consistency
- Claim 2: H is conflict equivalent to S.
  - H and S contain the same set of transactions (obvious)
  - For any two conflicting primitive operations p1 and p2: p1 is before p2 in H ⇔ p1 is before p2 in S
- Proof: It is sufficient to prove that p1 is before p2 in H ⇒ p1 is before p2 in S (why?)
  - Let x be the data item accessed by both p1 and p2. Let W be the transaction containing p1 and V be the transaction containing p2. Because p1 is before p2 in H, W must unlock x before V locks x. We have the following timeline:

    W’s linearization point < W unlocks x < V locks x < V’s linearization point

  - W will be before V in S, and thus p1 will be before p2 in S
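The construction of S from linearization points can be sketched as a sort. The input format is an assumption made for illustration: a dictionary mapping each transaction to the wall clock times at which it acquired its locks, with all times distinct.

```python
def serial_order_by_linearization_point(lock_times):
    """Order transactions by the time of their last lock acquisition."""
    return sorted(lock_times, key=lambda txn: max(lock_times[txn]))

# Hypothetical run: W acquires locks at times 1 and 3, V at times 2 and 6.
# W must release x (after time 3) before V can lock it (at time 6).
lock_times = {"V": [2, 6], "W": [1, 3]}
S = serial_order_by_linearization_point(lock_times)
```

In this run W's linearization point (3) precedes V's (6), so W is placed before V in S even though V started acquiring locks before W finished.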

Slide 32: Summary
- Define “sequential consistency” in databases: serializability
  - Two-phase locking protocol to ensure serializability
- Define “linearizability” in databases: external consistency
  - Two-phase locking ensures external consistency as well