Published by Patricia Henry. Modified over 9 years ago.
1
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 11 Instructor: Haifeng YU
2
Today’s Roadmap
- Back to parallel systems
- A simplified exploration of concurrency control in database systems
  - Every database is a parallel system
  - http://research.microsoft.com/~philbe/ccontrol/
- Define “sequential consistency” in databases: serializability
- Two-phase locking protocol to ensure serializability
- Define “linearizability” in databases: external consistency
- Two-phase locking ensures external consistency as well
3
Database is Just an Abstract Data Type
- Abstract data type: a piece of data with allowed operations on the data
  - Integer X: read(), write()
  - Stack: push(), pop()
- By definition, a database is a shared abstract data type
  - Accessed by multiple users
  - Processes may perform various operations (called transactions) on the database
- Database consistency specifies what behavior is allowed when the database is accessed by multiple processes
4
Transactions
- Operations are called transactions in the database context
- There can be infinitely many different kinds of transactions (a database is more flexible than, for example, a stack!)
- Each transaction may contain: StartTransaction(); CommitTransaction(); AbortTransaction(); Read(x); Write(y, value);
- The term "operation" in the textbook refers to Read() and Write(). To avoid confusion, we will call them primitive operations.

An Example Transaction:
    StartTransaction();
    seatBooked = false;
    read(number of available seats on a flight);
    if (number > 0) {
        number--;
        write back number;
        seatBooked = true;
    }
    CommitTransaction();
5
The Scheduler and Concurrency Control
[Figure: Transaction1, Transaction2 and Transaction3 issue Start/Commit/Abort/Read/Write to the Scheduler; the Scheduler issues Read/Write to the Database.]
- The job of the scheduler is concurrency control (i.e., ensuring the consistency of the database when it is accessed by multiple processes)
6
The Scheduler and Concurrency Control
[Figure: Transaction1, Transaction2 and Transaction3 issue Start/Commit/Abort/Read/Write to the Scheduler; the Scheduler issues Read/Write to the Database.]
- The scheduler itself is multi-threaded and may submit reads/writes to the database in parallel.
- We assume that the database ensures sequential consistency for these reads/writes.
7
Carry Over Definitions from Lecture 3
- A history H is a sequence of invocations and responses of transactions ordered by wall clock time
- Sequential history
- Legal sequential history
- Equivalency between two histories
- Process order
- A history H is sequentially consistent if it is equivalent to some legal sequential history S that preserves process order
8
Serializability and Sequential Consistency
- Most databases use serializability as the definition of consistency: a customized version of sequential consistency specially designed for databases
- Same as sequential consistency except for the following caveats
- Caveat 1: When defining serializability, we assume that all transactions are from different processes (no process issues two transactions)
  - What it means: process order is empty
  - Why reasonable: in DB applications, this is usually the case
  - Why helpful: simplifies the design of the scheduler and gives it more flexibility to improve performance
  - Corner case: if a user issues two transactions sequentially to the database, the second transaction may not see the effects of the first. This does not violate serializability, but most implementations of the scheduler will not have such behavior
9
Serializability and Sequential Consistency
- Caveat 2: In sequential consistency, each operation is executed by a single process (each operation is sequential)
  - Transactions are complex enough that we should allow parallel reads/writes within a transaction (as in the book). Each transaction is itself a parallel system!
  - But for this lecture we will assume that each transaction is sequential (this makes no significant difference to the results)
  - Read the book if you are interested in the extension to parallel transactions
10
Serializability and Sequential Consistency
- Caveat 3: Definition of equivalency
  - Two histories are equivalent if they have the same set of events
  - Same events imply all responses are the same
  - For transactions, responses include all the values written into the database in the transaction and all the values output to the user
  - Transactions may be so complex that we cannot easily make the judgment. Consider the following transaction:

    UpdateX() {
        StartTransaction();
        tmp = Read(X);
        tmp = (4*tmp^2 + 5*tmp + 1);
        Write(X, tmp);
        CommitTransaction();
    }
11
Example 1: initially x = 1

    process 0                      process 1
    tmp = Read(X);  (1)            tmp = Read(X);  (1)
    tmp = 4tmp^2+5tmp+1            tmp = 4tmp^2+5tmp+1
    Write(X, tmp);  (10)

- A legal sequential history will have a final x value of 451.
- This history is not sequentially consistent.

Example 2: initially x = -0.5

    process 0                      process 1
    tmp = Read(X);  (-0.5)         tmp = Read(X);  (-0.5)
    tmp = 4tmp^2+5tmp+1            tmp = 4tmp^2+5tmp+1
    Write(X, tmp);  (-0.5)

- A legal sequential history will have a final x value of -0.5 (-0.5 is the root of the equation tmp = 4tmp^2+5tmp+1).
- This history is sequentially consistent.

Whether the history is sequentially consistent depends on the value of x (and on insight into the code!)
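The arithmetic in these two examples can be checked with a few lines of Python. This is only an illustrative sketch: `update` is a hypothetical helper standing in for the computation inside UpdateX(), and the variable names are not from the slides.

```python
def update(x):
    """The computation inside one UpdateX() transaction: x -> 4x^2 + 5x + 1."""
    return 4 * x**2 + 5 * x + 1

# Two transactions executed one after the other (a legal sequential history).
serial_from_1 = update(update(1))
# Both transactions read the initial value, so the value written is update(1).
interleaved_from_1 = update(1)

serial_from_half = update(update(-0.5))
interleaved_from_half = update(-0.5)

print(serial_from_1, interleaved_from_1)        # 451 10
print(serial_from_half, interleaved_from_half)  # -0.5 -0.5
```

With x = 1 the two outcomes differ, while with x = -0.5 they coincide because -0.5 is a fixed point of the update.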
12
Serializability and Sequential Consistency
- Caveat 3: Definition of equivalency
  - Schedulers are not as smart as we are and cannot figure that out
  - So we are going to be more pessimistic and define conflict equivalency
- Two primitive operations are conflicting if:
  - They are both writes and they write the same data item, or
  - One is a read and the other is a write, and they read/write the same data item
- Two histories H and H’ are conflict equivalent iff
  - They contain the same set of transactions
  - For any two conflicting primitive operations p1 and p2: p1 is before p2 in H ⇔ p1 is before p2 in H’
- Conflict equivalency ⇒ equivalency (assuming transactions are deterministic) (why?)
  - The reverse is not true (by the earlier example)
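The two definitions on this slide translate almost directly into code. The sketch below is a minimal illustration under some assumptions of my own: a primitive operation is a hypothetical `(transaction, kind, item)` tuple, a history is a list of such tuples in time order, and no tuple appears twice in a history.

```python
from itertools import combinations

def conflicting(p, q):
    """p and q are (transaction, kind, item) tuples; kind is 'R' or 'W'."""
    return p[2] == q[2] and "W" in (p[1], q[1])

def conflict_pairs(history):
    """All ordered pairs (p, q) of conflicting operations with p before q."""
    return {(p, q) for p, q in combinations(history, 2) if conflicting(p, q)}

def conflict_equivalent(h1, h2):
    same_transactions = {p[0] for p in h1} == {p[0] for p in h2}
    return same_transactions and conflict_pairs(h1) == conflict_pairs(h2)

h = [("T1", "R", "x"), ("T2", "R", "y"), ("T1", "W", "y")]
# Swapping the first two operations (they do not conflict) changes nothing.
swapped = [("T2", "R", "y"), ("T1", "R", "x"), ("T1", "W", "y")]
# Moving W(y) ahead of R(y) reverses a conflicting pair.
reordered = [("T1", "R", "x"), ("T1", "W", "y"), ("T2", "R", "y")]

print(conflict_equivalent(h, swapped))    # True
print(conflict_equivalent(h, reordered))  # False
```

The check never looks at the values read or written, which is what makes it cheap enough for a scheduler and also why it is more pessimistic than plain equivalency.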
13
Serializability
- A history H is serializable if it is conflict equivalent to some legal sequential history S
- (For comparison: a history H is sequentially consistent if it is equivalent to some legal sequential history S that preserves process order.)
- Different from linearizability: serializability does not need to preserve the operation partial order
  - A later transaction may not see the effects of an earlier transaction
  - This is possible in most commercial databases, but the chance is small because of the way the scheduler is actually implemented. (You actually need to spend some effort to increase that chance.)
14
Serialization Graph
- A serialization graph SG(H) of a history H is a directed graph where:
  - Each transaction is a vertex in the graph
  - A directed edge from W to V exists iff W has a primitive operation p1 and V has a primitive operation p2 where p1 is before p2 in H and p1 and p2 conflict
- Example history:
    R(x) (by T1), R(x) (by T2), W(x) (by T1), W(y) (by T2), W(y) (by T1), R(x) (by T3), W(x) (by T3)
  [Figure: SG(H) for the example history, with vertices T1, T2 and T3.]
- SG(H) may or may not be transitive
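As a sanity check, the edges of SG(H) for the example history can be computed mechanically. The following is a small sketch, not the lecture's own code: operations are written as hypothetical `(transaction, kind, item)` tuples, and edges are only drawn between distinct transactions (an assumption the definition leaves implicit).

```python
from itertools import combinations

# The example history, as (transaction, kind, item) tuples in time order.
H = [("T1", "R", "x"), ("T2", "R", "x"), ("T1", "W", "x"),
     ("T2", "W", "y"), ("T1", "W", "y"), ("T3", "R", "x"), ("T3", "W", "x")]

def serialization_graph(history):
    """Return the edge set of SG(history) as (from, to) transaction pairs."""
    edges = set()
    for p, q in combinations(history, 2):  # p occurs before q in the history
        different_txn = p[0] != q[0]
        conflict = p[2] == q[2] and "W" in (p[1], q[1])
        if different_txn and conflict:
            edges.add((p[0], q[0]))
    return edges

print(sorted(serialization_graph(H)))
# [('T1', 'T3'), ('T2', 'T1'), ('T2', 'T3')]
```

For this particular history the graph happens to be transitive (T2 to T1 to T3, plus T2 to T3) and contains no cycle.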
15
Serializability Theorem
- Theorem: A history H is serializable iff SG(H) is acyclic.
- If SG(H) is acyclic, then H is serializable:
  - Without loss of generality, let T1, T2, … be a topological sort of the vertices in SG(H). Let S be the sequential history obtained by executing T1, T2, … sequentially. By definition, S is a legal sequential history.
  - We need to show that H is conflict equivalent to S. Prove by contradiction.
  - Assume H is not. Then there exist W (containing primitive operation p1) and V (containing primitive operation p2) where p1 and p2 conflict and are ordered differently in H and S.
  - Without loss of generality, suppose p1 is before p2 in H. Then there must be an edge from W to V in SG(H), so W precedes V in the topological sort and p1 is before p2 in S as well. Contradiction.
16
Serializability Theorem
- Theorem: A history H is serializable iff SG(H) is acyclic.
- If H is serializable, then SG(H) is acyclic:
  - Prove by contradiction and assume that SG(H) has a cycle T1 → T2 → … → Tk → T1. History H is conflict equivalent to some sequential history S.
  - Because T1 has an edge to T2 in SG(H), T1 has an operation p1 and T2 has an operation p2 where p1 and p2 conflict and p1 is before p2 in H.
  - Since S is conflict equivalent to H, p1 must be before p2 in S as well.
  - Since S is a serial history, T1 must be before T2 in S.
  - By the same argument, T2 is before T3 in S, T3 is before T4 in S, …, Tk is before T1 in S.
  - This is impossible, however, because S is a serial history: T1 cannot be before itself.
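The theorem suggests a simple test: try to topologically sort SG(H). The sketch below uses Kahn's algorithm, which is one common way to do this and not necessarily the one the lecture has in mind. The function name and the representation of the graph (a set of vertices plus a set of `(from, to)` pairs) are assumptions made for illustration.

```python
def serial_order(vertices, edges):
    """Return a topological order of the graph, or None if it has a cycle."""
    indegree = {v: 0 for v in vertices}
    for _, v in edges:
        indegree[v] += 1
    ready = sorted(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        u = ready.pop(0)
        order.append(u)
        for a, b in sorted(edges):
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    # Vertices on a cycle never reach indegree 0, so they are left out.
    return order if len(order) == len(vertices) else None

acyclic = serial_order({"T1", "T2", "T3"},
                       {("T2", "T1"), ("T1", "T3"), ("T2", "T3")})
cyclic = serial_order({"T1", "T2"}, {("T1", "T2"), ("T2", "T1")})
print(acyclic)  # ['T2', 'T1', 'T3']
print(cyclic)   # None
```

When an order is returned, executing the transactions in that order gives a sequential history S of the kind used in the first half of the proof.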
17
Serialization Graph and Theorem
- The serialization graph gives us a systematic way to determine whether a history is serializable
  - The determination can always be done in a polynomial number of steps
- But for sequential consistency:
  - We did not have a systematic way
  - In some cases, we have to enumerate all serial histories to compare, which takes an exponential number of steps
- Why did we not discuss this before for sequential consistency?
  - Can you derive a similar theorem for sequential consistency, so that we can always make the determination in a polynomial number of steps?
18
Ensuring Serializability: All About Performance
- The scheduler can protect the entire database using a single critical section
  - Essentially produces a sequential history
  - Not efficient: readers (i.e., query transactions) should be able to access the database concurrently
- The scheduler can protect the entire database using a Reader/Writer lock (c.f. the Reader/Writer problem in Lecture 2)
  - Query transactions obtain the reader lock
  - Update transactions obtain the writer lock
  - But databases are large and each transaction only touches a small portion
19
Ensuring Serializability: All About Performance
- Partition the database and use separate reader/writer locks for each partition
  - In the extreme, each partition is a data item

Locking individual data items for a transaction:
    AcquireReaderLock(x);
    AcquireWriterLock(y);
    Read(x);
    do some computation;
    Write(y, value);
    ReleaseReaderLock(x);
    ReleaseWriterLock(y);
20
Ensuring Serializability: All About Performance

Locking individual data items for a transaction:
    AcquireReaderLock(x);
    AcquireWriterLock(y);
    Read(x);
    do some computation;
    Write(y, value);
    ReleaseReaderLock(x);
    ReleaseWriterLock(y);

- But the performance is still not very good
  - We may overestimate the set of data items that a transaction needs to access
  - We hold the locks for too long (imagine that the computation is solving some time-consuming problem)
21
Ensuring Serializability: All About Performance

Lock the data items only when we use them:
    AcquireReaderLock(x);
    Read(x);
    ReleaseReaderLock(x);
    do some computation;
    AcquireWriterLock(y);
    Write(y, value);
    ReleaseWriterLock(y);

- This won’t work (even intuitively):

    process 0                      process 1
    AcquireReaderLock(x);
    Read(x);
    ReleaseReaderLock(x);
                                   AcquireWriterLock(x);
                                   Write(x);
                                   ReleaseWriterLock(x);
    do some computation;
                                   AcquireWriterLock(y);
                                   Write(y);
                                   ReleaseWriterLock(y);
    AcquireWriterLock(y);
    Write(y);
    ReleaseWriterLock(y);
22
Ensuring Serializability: All About Performance
- Prove that the history is not serializable using the serializability theorem
- It is impossible here to prove that it is not sequentially consistent

    process 0                      process 1
    AcquireReaderLock(x);
    Read(x);
    ReleaseReaderLock(x);
                                   AcquireWriterLock(x);
                                   Write(x);
                                   ReleaseWriterLock(x);
    do some computation;
                                   AcquireWriterLock(y);
                                   Write(y);
                                   ReleaseWriterLock(y);
    AcquireWriterLock(y);
    Write(y);
    ReleaseWriterLock(y);
23
A Widely Used Protocol: Two-phase Locking
- Two-phase locking:
  - A transaction must acquire the lock for data item v before reading or writing v
  - A transaction cannot obtain any further locks once it releases any lock
- Growing phase followed by shrinking phase
- May result in deadlock
- Side note: a transaction may “upgrade” a reader lock to a writer lock. This is considered a new lock acquisition as well.
- In the previous example, process 0 will not release the lock on x until the end.
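The second rule (no lock may be acquired after any lock has been released) is easy to check on the lock trace of a single transaction. Below is a minimal sketch; the trace format, a list of hypothetical `("lock" | "unlock", item)` events, is an assumption made for illustration, and the first rule (holding the lock before each access) is not checked here.

```python
def is_two_phase(trace):
    """trace: list of ('lock' | 'unlock', item) events of ONE transaction."""
    released_any = False
    for action, _item in trace:
        if action == "unlock":
            released_any = True
        elif action == "lock" and released_any:
            return False  # a lock acquired after a release breaks the rule
    return True

# Process 0 from the earlier example: releases x, then locks y.
early_release = [("lock", "x"), ("unlock", "x"), ("lock", "y"), ("unlock", "y")]
# The same work under two-phase locking: x is held until y is locked.
two_phase = [("lock", "x"), ("lock", "y"), ("unlock", "x"), ("unlock", "y")]

print(is_two_phase(early_release), is_two_phase(two_phase))  # False True
```

A lock upgrade would be recorded as another "lock" event in this format, which matches the side note that an upgrade counts as a new acquisition.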
24
Correctness of Two-phase Locking
- Lemma 1: Let H be a history produced by two-phase locking. Suppose that SG(H) contains an edge from W to V. Then there exists some data item x such that W unlocks x before V locks x in H.
- Proof:
  - By definition of SG(H), if there is an edge from W to V, there exist two primitive operations p1 (in W) and p2 (in V) such that they are conflicting and p1 is before p2 in H.
  - Let x be the data item that p1 and p2 read or write.
  - By the two-phase locking rule, W needs to lock x before p1 occurs and V needs to lock x before p2 occurs.
  - Because p1 and p2 conflict, at least one of them is a write, so W and V cannot hold their locks on x at the same time. The only possibility is that V locks x after W unlocks x.
25
Correctness of Two-phase Locking
- Lemma 2: Let H be a history produced by two-phase locking. Suppose that SG(H) contains the path T_1 → T_2 → … → T_n. Then there exist data items x and y (x and y do not need to be distinct) such that T_1 unlocks x before T_n locks y in H.
- Proof: Use induction on n.
  - Lemma 1 proves the case for n = 2.
  - Assume the lemma holds for n-1; we will prove that it still holds for n.
  - By the inductive assumption, there exist x and z such that T_1 unlocks x before T_{n-1} locks z in H.
  - Because there is an edge from T_{n-1} to T_n in SG(H), Lemma 1 tells us that we can find a data item y such that T_{n-1} unlocks y before T_n locks y.
  - By the two-phase locking rule, T_{n-1} can only unlock y after it locks z. (key step!)
  - Thus T_1 unlocks x before T_n locks y.
26
Correctness of Two-phase Locking
- Theorem: Every history H generated by two-phase locking is serializable.
- Prove by contradiction and assume H is not. Then SG(H) contains a cycle T_1 → T_2 → … → T_n → T_1.
- By Lemma 2, we can find data items x and y such that T_1 unlocks x before T_1 locks y.
- But this violates the two-phase locking rule.
27
Linearizability in Databases
- (From Lecture 3) A history H is linearizable if
  1. It is equivalent to some legal sequential history S, and
  2. The operation partial order induced by H is a subset of the operation partial order induced by S
- As for sequential consistency, we will customize the definition for the database context
  - Caveat 1: Assume that all transactions are from different processes
  - Caveat 2: Transactions may be parallel (we do not consider these)
  - Caveat 3: Conflict equivalent instead of equivalent
28
Linearizability in Databases
- For databases, linearizability is sometimes called external consistency.
- A history is externally consistent if:
  1. It is conflict equivalent to some legal sequential history S, and
  2. The operation partial order induced by H is a subset of the operation partial order induced by S
- Two-phase locking actually ensures external consistency
  - C.f. slide 8: “most implementations of the scheduler will not have such behavior” that violates external order
29
Two-Phase Locking Preserves External Consistency
- Theorem: Any history H generated by two-phase locking is externally consistent.
- Proof:
  - For each transaction T in H, we define its linearization point to be the time immediately after it acquires its last lock.
  - Obviously, by the two-phase locking rule, T has not released any locks at its linearization point. (This is where we leverage the two-phase locking property.)
  - We construct a legal sequential history S to be all transactions ordered by their linearization points.
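The construction of S can be illustrated with a short sketch. It assumes, purely for illustration, that we have recorded the wall-clock time of every lock acquisition for each transaction and that all timestamps are distinct; the function name and the input format are hypothetical.

```python
def serial_order_by_linearization(lock_times):
    """lock_times: {transaction: [time of each lock acquisition]}."""
    # The linearization point is taken right after the LAST lock acquisition.
    point = {t: max(times) for t, times in lock_times.items()}
    return sorted(point, key=point.get)

# Hypothetical timestamps: W acquires its last lock at time 3, V at time 7.
order = serial_order_by_linearization({"V": [5, 7], "W": [1, 3]})
print(order)  # ['W', 'V']
```

Running the transactions one at a time in the returned order gives the sequential history S used in the rest of the proof.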
30
Two-Phase Locking Preserves External Consistency
- Claim 1: The operation (transaction) partial order induced by H is a subset of the operation (transaction) partial order induced by S
- Proof:
  - Suppose W → V belongs to the transaction partial order induced by H. This means that W finishes before V starts.
  - Obviously W finishes acquiring all its locks before V finishes acquiring all its locks.
  - Thus W’s linearization point is before V’s, and W is before V in S.
31
Two-Phase Locking Preserves External Consistency
- Claim 2: H is conflict equivalent to S.
  - H and S contain the same set of transactions (obvious)
  - For any two conflicting primitive operations p1 and p2: p1 is before p2 in H ⇔ p1 is before p2 in S
- Proof:
  - It is sufficient to prove that p1 is before p2 in H ⇒ p1 is before p2 in S (why?)
  - Let x be the data item accessed by both p1 and p2. Let W be the transaction containing p1 and V be the transaction containing p2.
  - Because p1 is before p2 in H, W must unlock x before V locks x.
  - We have the following order in time: W’s linearization point, W unlocks x, V locks x, V’s linearization point.
  - So W will be before V in S, and thus p1 will be before p2 in S.
32
Summary
- Define “sequential consistency” in databases: serializability
- Two-phase locking protocol to ensure serializability
- Define “linearizability” in databases: external consistency
- Two-phase locking ensures external consistency as well