CS346: Advanced Databases


1 CS346: Advanced Databases
Graham Cormode Transaction Processing

2 Outline
Chapter: “Introduction to Transaction Processing Concepts and Theory” in Elmasri and Navathe. Introduce the concepts of transactions and concurrency control. ACID properties: atomicity, consistency, isolation, durability. System logging, commit points, and failure recovery. Schedules, serializability, and conflicts. Transactions in SQL. CS346 Advanced Databases

3 Why?
Transaction processing is an important part of databases. Jim Gray won the Turing Award for his work on transactions. A meeting of theory and practice: theory explains how to produce effective sequences of transactions, and simple protocols schedule transactions in practice. An introduction to the topic of concurrency control and locking. Also of importance in distributed systems and in managing distributed data.

4 Transaction Processing
Transaction processing systems: airline reservations; banking and credit card processing; e-commerce, online purchasing, and auctions; stock markets. Common requirements: many concurrent users making concurrent requests; high availability and fast response time; don’t sell the same item (seat, share) to two different people! Based on the idea of atomic transactions: a transaction either succeeds or is declined.

5 Transaction Processing Concepts
Many examples of databases so far look like single-user systems. In practice, most databases are multiuser: hundreds or thousands (or more) of users submitting transactions. They make use of the wide availability of parallelism in modern systems: multithreaded execution per core, multiple cores per CPU, multiple CPUs per system (cluster). We will study the management of concurrent access to shared resources. Here, the data items are the shared resources.

6 Transactions
Transaction: the logical unit of database processing, including at least one insertion, deletion or retrieval operation. May form part of a program, or be specified via SQL. A complex program may be broken into many basic transactions. The programmer may explicitly specify the start and end of a transaction. Distinguish between read-only and read-write transactions: read-only transactions seem easier, but still need a consistent view of the data.

7 Databases and items
Transaction processing adopts a very simple model of a database: a database is a collection of named data items. The granularity of the database is the size of a single data item. Can work at the level of a single database record. A higher-level item: a single disk block or a whole file. Lower-level items: an individual field (attribute) of a record. A data item may correspond to a basic concept, e.g. a seat on a flight. Each data item has a unique name (identifier) used internally, e.g. the disk block address, which is not used by the programmer.

8 Basic Data Operations
With this simplified database model, the basic operations are: Read(X), which reads the item named X into local memory, and Write(X), which writes the value held in local memory back to the item named X. These cover the various substeps of data access: mapping from the name (X) to the relevant disk block containing X; moving data to/from disk via OS calls and buffers; managing cache memory to speed up operations. Transactions are formed by a sequence of read/write operations.

9 Example Transactions
All operations of a transaction must complete successfully for the transaction to be successful. Read (write) set: the set of items read (written) by a transaction. Read set (a) = {X, Y}. Read set (b) = {X}. Write set (a) = {X, Y}. Write set (b) = {X}. Need concurrency control and recovery: what happens if we try to run (a) and (b) at the same time?

10 Need for concurrency control
E.g. airline booking: don’t sell the same seat twice! Previous transaction (a): move N reservations from X to Y. Transaction (b): reserve M seats on the flight corresponding to X. Bad things can happen with concurrency due to interleaving: lost updates, temporary updates, incorrect aggregation, unrepeatable reads.

11 Lost Updates
Lost updates occur when two transactions are interleaved. If T1 and T2 run as shown, the update to X from T1 is lost. E.g. if X = 80 to begin, N = 5 and M = 4, this order results in X = 84 rather than X = 79.
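The arithmetic above can be replayed with a small simulation. This is a minimal sketch, not the slide's figure: it assumes T1 computes X − N and T2 computes X + M, models only item X (T1's update of Y is left out), and the exact interleaving chosen is an assumption consistent with the stated result.

```python
def run(schedule, x=80, n=5, m=4):
    """Execute (op, txn) steps against a single shared item X."""
    db = {"X": x}
    local = {}                   # each transaction's private copy of X
    change = {1: -n, 2: +m}      # assumed: T1 subtracts N, T2 adds M
    for op, txn in schedule:
        if op == "r":            # read X, then compute the new value locally
            local[txn] = db["X"] + change[txn]
        else:                    # "w": write the locally computed value back
            db["X"] = local[txn]
    return db["X"]

serial = [("r", 1), ("w", 1), ("r", 2), ("w", 2)]        # T1 entirely, then T2
interleaved = [("r", 1), ("r", 2), ("w", 1), ("w", 2)]   # T2 reads before T1 writes

print(run(serial))        # 79
print(run(interleaved))   # 84: T1's update has been overwritten
```

T2 bases its write on the value it read before T1 wrote, so T1's subtraction of N never reaches the final state.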

12 Temporary Update (Dirty Read)
Happens if a value is read in the middle of a transaction that later fails. If a transaction (T1) fails, it is rolled back to the previous state. Meanwhile, another transaction (T2) may read, and then update, the intermediate value. The value of X read by T2 is dirty data, as it has not been committed. Hence this is sometimes called the dirty read problem.

13 Incorrect Summary
One transaction computes an aggregate while another updates the values. The aggregate can include some values from before the update and others from after it. This generates a result that corresponds to neither the state before nor the state after the update. In the example, the correct result is the same before and after T1.

14 Unrepeatable read
Concurrency can cause problems even for read-only transactions. Suppose a transaction reads item X twice at different times, and the value of X is changed by another transaction in between. The first transaction gets different values for the same item! Can arise in booking transactions: check availability, then update.

15 Transaction recovery
Transactions should be “all or nothing”: this is called atomic. A transaction either completes successfully (and correctly): commit. Or it has no effect on the database or on other transactions: abort. If a transaction fails after some operations, it must be undone by rolling back the earlier operations. There are many possible reasons for transaction failure.

16 Reasons for transaction failure
Computer failure: disk error, memory read error, crash. Transaction/system error: divide by zero, integer overflow; may also have out-of-bounds parameters or a program bug. Local errors or exceptions during the transaction: e.g. can’t find the referenced item, or insufficient funds for a balance transfer. Concurrency control enforcement: the system may decide to abort a transaction to ensure correctness, or may need to abort to resolve “deadlock” between transactions. Disk failure: data on disk has been corrupted. Physical problems: fire, theft, flood, operator error. PEBCAK: problem exists between chair and keyboard.

17 Transaction states
To ensure transaction atomicity, the system needs to track the state: the recovery manager needs to keep track of each operation. Transactions can be in one of a number of states. Active state: after starting execution, can read and write. Partially committed state: after it has finished its operations; need to reach a point where system failure would still leave the data in a consistent state. Committed state: the transaction is completed, and a commit point is made. Failed state: if a check fails or the transaction is aborted; may have to roll back some writes. Terminated state: the transaction leaves the system. Failed or aborted transactions may be started (afresh) later.

18 System Log
To recover from transaction failures, the system keeps a log that tracks all transaction operations that affect the database. The system log is a sequential, append-only file kept on disk, so it is more likely to survive a system failure or crash. Use memory to buffer the most recent updates. Write out buffers to disk when they are full. Ensure buffers are flushed to disk at a commit point. Periodically back up the log to archival storage. The log consists of a sequence of log records.

19 System log records
[start_transaction, T]: T is a unique transaction id. [write_item, T, X, old_value, new_value]: transaction T affects item X; technically, only old_value is needed for rollback. [read_item, T, X]: the read entry is not strictly needed for rollback, but may be included for other purposes, e.g. auditing. [commit, T]: T has successfully completed, and can be committed. [abort, T]: T has been aborted.

20 Failure recovery
Recovering from failure means either undoing or redoing steps. Undo: undo each WRITE operation by tracing backwards through the log and writing the old_value. Redo: repeat each WRITE operation using new_value. Redo is needed if a failure means the writes may not have all completed; it ensures that all operations have been applied successfully.

21 Commit Points
Commit points mark the successful completion of transactions: all operations of transaction T have been executed successfully AND the effect of all operations is recorded in the log. The transaction is then committed and is permanently recorded: write a [commit, T] record in the log. If a system failure occurs: find all transactions T that have started but not committed, and roll back their associated operations to undo their effect. May have to redo some transactions to ensure correctness.
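The recovery steps described on the last few slides can be sketched over an in-memory log. This is a minimal illustration under stated assumptions: log records are Python tuples in the shape given on the "System log records" slide, the database is a dict, the function name `recover` is hypothetical, and schedules are assumed strict so that restoring old_value is a valid undo.

```python
def recover(db, log):
    """Undo writes of uncommitted transactions, redo writes of committed ones.

    Returns the ids of the transactions that were rolled back.
    """
    started = {rec[1] for rec in log if rec[0] == "start_transaction"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo pass: trace backwards through the log and restore old_value.
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] not in committed:
            _, _, item, old_value, _ = rec
            db[item] = old_value

    # Redo pass: replay forwards using new_value, in case writes were lost.
    for rec in log:
        if rec[0] == "write_item" and rec[1] in committed:
            _, _, item, _, new_value = rec
            db[item] = new_value

    return sorted(started - committed)

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 80, 75),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Z", 1, 2),
    ("commit", "T2"),
    ("write_item", "T1", "Y", 10, 15),   # crash happens before T1 commits
]
# Assumed on-disk state after the crash: T1's write to X reached disk,
# T2's write to Z did not.
db = {"X": 75, "Y": 10, "Z": 1}
print(recover(db, log))   # ['T1']
print(db)                 # {'X': 80, 'Y': 10, 'Z': 2}
```

Both passes are idempotent, so running recovery again after a second crash gives the same state.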

22 ACID properties of transaction processing
Atomicity: a transaction is an atomic unit of processing. It is either performed completely, or not at all. Controlled by the transaction recovery subsystem of the DBMS. Consistency: transactions should preserve database consistency. If a transaction is done fully, it should keep the DB in a consistent state. Responsibility of the programmers and of integrity constraints. Isolation: the effect of a transaction should be independent of other transactions. It should be as if it is the only transaction executing. Enforced by the concurrency control system. Durability: changes made must persist in the database. Changes made by a committed transaction should not be lost because of any failure. Enforced by the transaction recovery subsystem.

23 Schedules of operations
The order of execution of operations is called the schedule. Schedule S orders the operations of n transactions T1, T2, ... Tn. Operations from different transactions can be interleaved. Operations from the same transaction must be in order. S is a total order: for any two operations, one is before the other. The main concern is the interleaving of read and write operations. Notation: b, r, w, e, c, a for begin, read, write, end, commit, abort. Can omit begin and end without loss of clarity. Use the transaction id (number) as a subscript for each operation. E.g. S = r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)

24 Conflicts
Two operations in a schedule conflict if: they belong to different transactions; they access the same item X; at least one operation is a write_item(X). Example: S = r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y). r1(X) and w2(X) are in conflict; r2(X) and w1(X) are in conflict. r1(X) and r2(X) do not conflict with each other (why?). w2(X) and w1(Y) do not conflict (why?). r1(X) and w1(X) do not conflict (why?). Two operations conflict if swapping them results in a different outcome. E.g. swapping r1(X) and w2(X) can change the value of X read by T1.
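The three conditions can be applied mechanically to the example schedule. A minimal sketch follows; the helper names `parse`, `show`, and `conflicts` are hypothetical, and the parser assumes the compact notation without spaces inside an operation.

```python
import re
from itertools import combinations

def parse(schedule):
    """Turn 'r1(X); w2(X)' into [('r', 1, 'X'), ('w', 2, 'X')]."""
    return [(op, int(txn), item)
            for op, txn, item in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]

def show(op):
    return f"{op[0]}{op[1]}({op[2]})"

def conflicts(schedule):
    """List every conflicting pair, earlier operation first."""
    return [(show(a), show(b))
            for a, b in combinations(parse(schedule), 2)
            if a[1] != b[1]                 # different transactions
            and a[2] == b[2]                # same item
            and "w" in (a[0], b[0])]        # at least one write

S = "r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)"
print(conflicts(S))
# [('r1(X)', 'w2(X)'), ('r2(X)', 'w1(X)'), ('w1(X)', 'w2(X)')]
```

The pairs marked "(why?)" on the slide each fail exactly one condition: two reads, two different items, or the same transaction.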

25 Complete Schedule
A schedule S of n transactions is a complete schedule if: the operations in S are exactly those of T1, T2, ... Tn, including a commit or abort operation as the last in each transaction (so there are no active transactions at the end of the schedule); every pair of operations in Ti is in the same order in S as it is in Ti; and the order of every pair of conflicting operations is specified in S. Don’t have to specify the order of nonconflicting operations, hence S can be a partial order on the operations. In live systems, schedules are rarely complete: new transactions are always starting. Define the committed projection of a schedule S, C(S): only the operations in S belonging to committed transactions.

26 Recoverability of Schedules
Some schedules are easier to recover from than others. Attempt to characterize schedules that are easily recoverable. Recoverable schedule: once T is committed, never have to undo T. Helps ensure the durability property. Nonrecoverable schedules should not be allowed by the DBMS. Formal definition for schedule S to be recoverable: T reads from transaction T’ if item X is first written by T’ and later read by T. S is recoverable if no transaction T in S commits until all transactions T’ that T has read from have committed. In addition, T’ must not have been aborted before T reads X.

27 Recoverable schedules
There is always a way to recover a recoverable schedule, but it may still be quite complex to do so. Example: S = r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1. Schedule S is recoverable by the previous definition. Note: S suffers from lost updates, but this does not affect recoverability. The following schedule is not recoverable (why?): r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1. Possible fixes to the schedule: postpone the commit c2: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2. Or abort both: r1(X); w1(X); r2(X); r1(Y); w2(X); a1; a2.
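The recoverability definition can be checked with a single pass over a schedule. This is a sketch under simplifying assumptions: a read is attributed to the most recent other transaction that wrote the item, a writer is identified by its transaction id only, and the function names are hypothetical.

```python
import re

def parse(schedule):
    """Turn 'r1(X); c1' into [('r', 1, 'X'), ('c', 1, '')]."""
    return [(op, int(txn), item) for op, txn, item in
            re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", schedule)]

def is_recoverable(schedule):
    last_writer = {}    # item -> transaction that wrote it most recently
    reads_from = {}     # txn  -> set of transactions it has read from
    finished = {}       # txn  -> 'c' or 'a'
    for op, txn, item in parse(schedule):
        if op == "w":
            last_writer[item] = txn
        elif op == "r":
            writer = last_writer.get(item)
            if writer not in (None, txn) and finished.get(writer) != "a":
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "c":
            # T may commit only after everything it read from has committed.
            if any(finished.get(w) != "c" for w in reads_from.get(txn, ())):
                return False
            finished[txn] = "c"
        else:
            finished[txn] = "a"
    return True

print(is_recoverable("r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1"))  # True
print(is_recoverable("r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1"))         # False
print(is_recoverable("r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2"))  # True
print(is_recoverable("r1(X); w1(X); r2(X); r1(Y); w2(X); a1; a2"))         # True
```

In the second schedule T2 reads X after w1(X) and commits while T1 is still active, which is exactly what the definition forbids.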

28 Cascading rollback and strict schedule
Cascading rollback is when an uncommitted transaction has to be rolled back because it read from a transaction that failed. E.g. T2 in the previous example: r1(X); w1(X); r2(X); r1(Y); w2(X); a1; a2. Try to avoid cascading rollback, as it can be quite time consuming. Can define a cascadeless schedule: every transaction only reads from committed transactions. E.g. move back the r2(X): r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); c2. A strict schedule is the most restrictive type: don’t read or write X until the last transaction to write X commits. Simple to undo writes: just restore the old value of X. If not strict, undoing the write of an aborted transaction is not enough.

29 Relation between concepts
Ordering: Strict ⇒ Cascadeless ⇒ Recoverable (every strict schedule is cascadeless, and every cascadeless schedule is recoverable). Strict: don’t read or write X after T has written X, until T commits. Cascadeless: transactions only read from committed transactions. Recoverable: transactions commit only after the transactions they have read from commit.
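The cascadeless and strict conditions can be tested in one pass by tracking which items currently hold an uncommitted write. A minimal sketch, assuming the compact schedule notation; `classify` is a hypothetical helper, it tracks only the latest uncommitted writer of each item, and the third test schedule is an illustrative addition rather than one from the slides.

```python
import re

def parse(schedule):
    """Turn 'r1(X); c1' into [('r', 1, 'X'), ('c', 1, '')]."""
    return [(op, int(txn), item) for op, txn, item in
            re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", schedule)]

def classify(schedule):
    """Return (cascadeless, strict) for a schedule."""
    dirty = {}                       # item -> txn holding an uncommitted write
    cascadeless = strict = True
    for op, txn, item in parse(schedule):
        if op in "ca":               # txn ends: its writes are no longer pending
            dirty = {k: v for k, v in dirty.items() if v != txn}
            continue
        writer = dirty.get(item)
        if writer is not None and writer != txn:
            strict = False           # touched an item with a pending write
            if op == "r":
                cascadeless = False  # read from an uncommitted transaction
        if op == "w":
            dirty[item] = txn
    return cascadeless, strict

print(classify("r1(X); w1(X); r2(X); r1(Y); w2(X); a1; a2"))          # (False, False)
print(classify("r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); c2"))   # (True, True)
print(classify("w1(X); w2(X); c1; c2"))                               # (True, False)
```

The last schedule shows why the inclusion is proper: overwriting an uncommitted value breaks strictness without any transaction reading dirty data.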

30 Serializability
Recoverability did not consider correctness (isolation). Serializability is concerned with this property. There are some simple approaches to serializability: consider two transactions T1 and T2 submitted at the same time, and either do T1 entirely before T2, or vice versa. Not great: this limits throughput and can cause blocking.

31 Serial and non-serial schedules
A schedule S is serial if, for every transaction T in S, all operations of T are executed consecutively, with no operations from other transactions interleaved; otherwise, it is nonserial.

32 Serial Schedules
Only one transaction is active at a time in a serial schedule. If transactions are independent, every serial schedule is correct. Serial schedules limit concurrency by prohibiting interleaving: must wait for all I/O to finish... very slow. Serial schedules are considered unacceptable in practice. Instead, accept schedules that are equivalent to serial ones in effect. Which schedules on the previous slide are equivalent to serial? Which suffer from the lost update problem? Serializable schedule: one that is equivalent to a serial one. Consider this to be our definition of “correctness”. Need to define equivalence of schedules! There are several alternate definitions with different properties.

33 Conflict serializable
Reminder: two ops in a schedule conflict if they are in different transactions, they access the same item X, and at least one op is a write_item(X). Result equivalent: two schedules are result equivalent if they produce the same final state. This may happen by chance given a particular initial state. Conflict equivalence is the most commonly used definition: the order of any two conflicting operations is the same in both schedules. S is conflict serializable if it is conflict equivalent to some serial schedule S’. That is, the nonconflicting operations can be reordered to make S’. A & D are conflict equivalent: r2(X) follows w1(X) in both. r1(Y); w1(Y) in D doesn’t conflict with T2, so can be moved earlier.

34 Testing for conflict serializability
Create a serialization graph of the read and write operations: a directed graph with nodes T1, ... Tn, and a directed edge (Tj → Tk) if an op in Tj precedes a conflicting op in Tk. S is conflict serializable if and only if its serialization graph has no cycles.
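The test can be written directly from this description: collect an edge for every conflicting pair, then look for a cycle with depth-first search. A minimal sketch with hypothetical function names; the second test schedule is an illustrative reordering, not necessarily one of the slide's schedules A to D.

```python
import re

def parse(schedule):
    """Keep only reads and writes: 'r1(X); c1' -> [('r', 1, 'X')]."""
    return [(op, int(txn), item)
            for op, txn, item in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]

def serialization_graph(schedule):
    """Edge (Tj, Tk) if an op of Tj precedes a conflicting op of Tk."""
    ops = parse(schedule)
    edges = set()
    for i, (op1, t1, x1) in enumerate(ops):
        for op2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}                                 # 1 = on the DFS stack, 2 = done
    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1 or (nxt not in state and visit(nxt)):
                return True
        state[node] = 2
        return False
    return any(node not in state and visit(node) for node in graph)

def conflict_serializable(schedule):
    return not has_cycle(serialization_graph(schedule))

S = "r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)"
print(sorted(serialization_graph(S)))     # [(1, 2), (2, 1)]
print(conflict_serializable(S))           # False
print(conflict_serializable("r1(X); w1(X); r2(X); w2(X); r1(Y); w1(Y)"))  # True
```

S is the lost-update interleaving from earlier: r1(X) before w2(X) forces T1 first, while r2(X) before w1(X) forces T2 first, hence the cycle.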

35 Serialization Graph
Graphs for schedules A, B, C, D, showing the name of the item causing each edge as its label. When the graph of S has no cycles, can create an equivalent serial schedule S’ from it: when there is an edge (Ti → Tj), Ti must appear before Tj in S’; else, resolve the ordering arbitrarily [make a total order from the partial order].
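Turning the partial order given by the edges into a total order is a topological sort. A minimal sketch using the standard-library `graphlib` module (Python 3.9 or later); the function name `serial_order` and the edge sets in the usage lines are illustrative assumptions.

```python
from graphlib import TopologicalSorter, CycleError

def serial_order(transactions, edges):
    """Return one serial order consistent with the edges, or None on a cycle.

    An edge (i, j) means Ti must appear before Tj in the serial schedule.
    """
    predecessors = {t: set() for t in transactions}
    for before, after in edges:
        predecessors[after].add(before)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None          # not conflict serializable: no serial order exists

print(serial_order([1, 2], {(2, 1)}))            # [2, 1]
print(serial_order([1, 2], {(1, 2), (2, 1)}))    # None
```

When several orders are consistent with the edges, any of them is an acceptable S’, which is why the sorter is free to break ties arbitrarily.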

36 Serializability Example
Reminder: two ops in a schedule conflict if they are in different transactions, they access the same item X, and at least one op is a write_item(X).

37 Serializability
In practice, it is hard to check for serializability. The interleaving of concurrent operations is controlled by the OS, and the DBMS can’t specify the exact order of execution of parallel tasks. Don’t want to check for serializability after the fact. Instead, design schedules based on protocols that ensure all realized schedules are serializable. It is not feasible to mark the start and end of schedules in live systems, so consider only the committed projection of a schedule, i.e. the operations from committed transactions. The most common technique is two-phase locking (2PL) [later], which prevents transactions that could interfere with each other. Other protocols: timestamp ordering, optimistic concurrency control.

38 View Equivalence
View equivalence is a weaker notion than conflict equivalence, based on the view of the data witnessed by each schedule. Schedules S and S’ are said to be view equivalent if: both S and S’ include all operations of the same set of transactions; for any operation ri(X) in S, if the read value of X was written by operation wj(X), the same condition must hold for S’; and if wk(Y) is the last operation to write to Y in S, then wk(Y) must also be the last to write to Y in S’. Read operations see the same view in both schedules. The final write is the same in both, so the same state is reached. S is view serializable if it is view equivalent to a serial schedule.

39 View and Conflict Serializability
The constrained write assumption (CWA, no blind writes): every wi(X) in Ti is preceded by ri(X). This implies that the computation of the new value of X depends on the old value. A blind write is when X is written without reading it first. View and conflict serializability coincide if CWA holds. Unconstrained write assumption: blind writes are allowed. E.g. r1(X); w2(X); w1(X); w3(X); c1; c2; c3 is view equivalent to the serial schedule r1(X); w1(X); c1; w2(X); c2; w3(X); c3, but it is not conflict equivalent to any serial schedule. Testing for view serializability is NP-hard.
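The blind-write example can be checked against the view-equivalence conditions by comparing, for each schedule, which writer every read sees and which transaction writes each item last. A minimal sketch: the function names are hypothetical, writers are identified by transaction id only, and both schedules are assumed to contain the same operations.

```python
import re
from collections import Counter

def parse(schedule):
    """Keep only reads and writes: 'r1(X); c1' -> [('r', 1, 'X')]."""
    return [(op, int(txn), item)
            for op, txn, item in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]

def view(schedule):
    """Return (reads-from facts, final writer of each item)."""
    last_writer = {}         # item -> txn of the latest write so far
    nth_read = Counter()     # (txn, item) -> how many reads seen so far
    reads = set()
    for op, txn, item in parse(schedule):
        if op == "r":
            nth_read[(txn, item)] += 1
            # None as the writer means the read sees the initial value.
            reads.add((txn, item, nth_read[(txn, item)], last_writer.get(item)))
        else:
            last_writer[item] = txn
    return reads, last_writer

def view_equivalent(s1, s2):
    return view(s1) == view(s2)

S = "r1(X); w2(X); w1(X); w3(X); c1; c2; c3"
print(view_equivalent(S, "r1(X); w1(X); c1; w2(X); c2; w3(X); c3"))  # True
print(view_equivalent(S, "w2(X); c2; r1(X); w1(X); c1; w3(X); c3"))  # False
```

In both S and the first serial schedule, r1(X) sees the initial value and T3 writes last; the blind writes of T2 and T1 are overwritten, so their relative order does not affect any view.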

40 Venn diagram of schedules

41 Other types of schedule equivalence
In some situations, can relax the definition of equivalence, such as for debit-credit transactions (e.g. bank account updates). All transactions add to or subtract from the value of a data item. Can have correct schedules that are not serializable, because addition and subtraction commute. Consider two transactions that want to move money. T1: r1(X); X ← X – 10; w1(X); r1(Y); Y ← Y + 10; w1(Y). T2: r2(Y); Y ← Y – 20; w2(Y); r2(X); X ← X + 20; w2(X). Schedule S: r1(X); w1(X); r2(Y); w2(Y); r1(Y); w1(Y); r2(X); w2(X). S is not serializable but is correct because of the transaction semantics.
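Executing S and the two serial orders on the same starting balances shows that they agree. A minimal sketch: the starting balances of 100 are an assumption, and the `delta` table encodes the updates of T1 and T2 as written on the slide.

```python
import re

# Amount each transaction adds to the item between its read and its write.
delta = {(1, "X"): -10, (1, "Y"): +10, (2, "Y"): -20, (2, "X"): +20}

def run(schedule, balances):
    db = dict(balances)
    local = {}                                # (txn, item) -> value to write
    for op, txn, item in re.findall(r"([rw])(\d+)\((\w+)\)", schedule):
        key = (int(txn), item)
        if op == "r":
            local[key] = db[item] + delta[key]
        else:
            db[item] = local[key]
    return db

start = {"X": 100, "Y": 100}                  # assumed starting balances
S = "r1(X); w1(X); r2(Y); w2(Y); r1(Y); w1(Y); r2(X); w2(X)"
T1_then_T2 = "r1(X); w1(X); r1(Y); w1(Y); r2(Y); w2(Y); r2(X); w2(X)"
T2_then_T1 = "r2(Y); w2(Y); r2(X); w2(X); r1(X); w1(X); r1(Y); w1(Y)"

print(run(S, start))             # {'X': 110, 'Y': 90}
print(run(T1_then_T2, start))    # {'X': 110, 'Y': 90}
print(run(T2_then_T1, start))    # {'X': 110, 'Y': 90}
```

S has conflicts in both directions (T1 before T2 on X, T2 before T1 on Y), so its serialization graph has a cycle, yet each read-modify-write pair stays adjacent and the additions commute.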

42 Transaction Support in SQL
SQL allows the definition of atomic transactions. There is no explicit begin_transaction, but a transaction must end with COMMIT or ROLLBACK. The access mode of the transaction can be specified: READ ONLY or READ WRITE (default). The diagnostics area keeps error data on the n most recent statements. The isolation level defines how strict to be with transactions: SERIALIZABLE (default), or the lower levels REPEATABLE READ, READ COMMITTED, READ UNCOMMITTED. Lower levels may allow a transaction to read a value that has not been committed, or to read a value twice in a transaction and get two different values.

43 Sample SQL transaction
So you will know what one looks like:
EXEC SQL WHENEVER SQLERROR GO TO UNDO;
EXEC SQL SET TRANSACTION READ WRITE DIAGNOSTICS SIZE 5 ISOLATION LEVEL SERIALIZABLE;
EXEC SQL INSERT INTO EMPLOYEE (FNAME, LNAME, SSN, DNO, SALARY) VALUES ('Robert','Smith',' ',2,35000);
EXEC SQL UPDATE EMPLOYEE SET SALARY = SALARY * 1.1 WHERE DNO = 2;
EXEC SQL COMMIT;
GOTO THE_END;
UNDO: EXEC SQL ROLLBACK;
THE_END: ...
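The same commit-or-rollback control flow can be tried out with Python's built-in sqlite3 module. This is a sketch, not a translation of the embedded SQL: SQLite has no SET TRANSACTION statement, the table definition and its CHECK constraint are assumptions added so that the second call fails, and the SSN values are hypothetical placeholders.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # Assumed schema; the CHECK constraint exists only to provoke an error.
    conn.execute(
        "CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY, "
        "DNO INTEGER, SALARY REAL CHECK (SALARY < 40000))"
    )
    conn.commit()
    return conn

def hire_and_raise(conn, ssn):
    """Insert an employee and raise department 2 salaries, all or nothing."""
    try:
        conn.execute(
            "INSERT INTO EMPLOYEE (FNAME, LNAME, SSN, DNO, SALARY) "
            "VALUES (?, ?, ?, ?, ?)",
            ("Robert", "Smith", ssn, 2, 35000),
        )
        conn.execute("UPDATE EMPLOYEE SET SALARY = SALARY * 1.1 WHERE DNO = 2")
        conn.commit()            # plays the role of EXEC SQL COMMIT
        return True
    except sqlite3.Error:        # plays the role of WHENEVER SQLERROR GO TO UNDO
        conn.rollback()          # plays the role of EXEC SQL ROLLBACK
        return False

conn = make_db()
print(hire_and_raise(conn, "placeholder-1"))   # True
print(hire_and_raise(conn, "placeholder-2"))   # False: the raise violates the CHECK
print(conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0])   # 1
```

The second call's INSERT succeeds but its UPDATE fails, and the rollback removes the inserted row as well, which is the atomicity property in action.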

44 Summary
Saw the concepts of transactions and concurrency control. ACID properties: atomicity, consistency, isolation, durability. System logging, commit points, and failure recovery. Schedules, serializability, and conflicts. Multiple definitions of serializability, and how to check them. Transactions in SQL. Chapter: “Introduction to Transaction Processing Concepts and Theory” in Elmasri and Navathe.

