CS346: Advanced Databases

Presentation transcript:

CS346: Advanced Databases Graham Cormode G.Cormode@warwick.ac.uk Transaction Processing

Outline
Chapter: “Introduction to Transaction Processing Concepts and Theory” in Elmasri and Navathe
Introduce the concepts of transactions and concurrency control
ACID properties: atomicity, consistency, isolation, durability
System logging, commit points, and failure recovery
Schedules, serializability, and conflicts
Transactions in SQL

Why?
Transaction processing is an important part of databases
  Jim Gray won the Turing Award for his work on transactions
A meeting of theory and practice
  Theory explains how to produce effective sequences of transactions
  Simple protocols to schedule transactions in practice
An introduction to the topic of concurrency control and locking
  Of importance in distributed systems and managing distributed data

Transaction Processing
Transaction processing systems:
  Airline reservations
  Banking/credit card processing
  Ecommerce / online purchasing / auctions
  Stock markets
Common requirements:
  Many concurrent users making concurrent requests
  High availability, fast response time
  Don’t sell the same item (seat, share) to two different people!
Based on the idea of atomic transactions
  A transaction either succeeds or is declined

Transaction Processing Concepts
Many examples of databases so far look like single-user systems
In practice, most databases are multiuser
  Hundreds or thousands (or more) of users submitting transactions
Make use of the wide availability of parallelism in modern systems
  Multithreaded execution per core
  Multiple cores per CPU
  Multiple CPUs per system (cluster)
Will study management of concurrent access to shared resources
  Here, data items are shared resources

Transactions
Transaction: the logical unit of database processing
  Including at least one insertion, deletion or retrieval operation
  May form part of a program, or be specified via SQL
  A complex program may be broken into many basic transactions
  Programmer may explicitly specify the start and end of a transaction
Distinguish between read-only and read-write transactions
  Read-only transactions seem easier, but still need a consistent view of the data

Databases and items
Transaction processing adopts a very simple model of a database
  Here: a database is a collection of named data items
The granularity of the database is the size of a single data item
  Can work at the level of a single database record
  A higher-level item: a single disk block or a whole file
  Lower-level items: an individual field (attribute) of a record
A data item may correspond to a basic concept, e.g. a seat on a flight
Each data item has a unique name (identifier) used internally
  E.g. the disk block address, which is not used by the programmer

Basic Data Operations
With this simplified database model, the basic operations are:
  Read(X): read the item named X into local memory
  Write(X): write the item named X from local memory back to the database
These cover the various substeps of data access
  Mapping from the name (X) to the relevant disk block containing X
  Moving data to/from disk via OS calls, buffers, etc.
  Managing cache memory to speed up operations
Transactions are formed by a sequence of read/write operations

Example Transactions
All operations of a transaction must complete successfully for the transaction to be successful
Read (write) set: the set of items read (written) by a transaction
  Read set (a) = {X, Y}. Read set (b) = {X}
  Write set (a) = {X, Y}. Write set (b) = {X}
Need concurrency control and recovery
  What happens if we try to run (a) and (b) at the same time?

Need for concurrency control
E.g. airline booking: don’t sell the same seat twice!
  Previous transaction (a): move N reservations from X to Y
  Transaction (b): reserve M seats on the flight corresponding to X
Bad things can happen with concurrency due to interleaving
  Lost updates
  Temporary updates
  Incorrect aggregation
  Unrepeatable reads

Lost Updates
Lost updates: when two transactions are interleaved
  If T1 and T2 run as shown, the update to X from T1 is lost
  E.g. if X = 80 to begin, N = 5 and M = 4, this order results in X = 84 rather than X = 79
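
The arithmetic on this slide can be replayed in a few lines of Python. This is only a sketch: the slide's figure is not reproduced in the transcript, so the interleaving below (T2 reads X before T1 writes it back) and the starting value of Y are assumptions chosen to match the numbers given.

```python
# Shared database as a dict of named data items.
# X = 80, N = 5, M = 4 come from the slide; Y = 20 is an assumed value.
db = {"X": 80, "Y": 20}
N, M = 5, 4

def run_interleaved(db):
    """T1 moves N from X to Y; T2 adds M to X; T2 reads X too early."""
    db = dict(db)
    t1_x = db["X"] - N      # r1(X), then T1 computes X - N locally
    t2_x = db["X"] + M      # r2(X): still sees 80, computes X + M locally
    db["X"] = t1_x          # w1(X): X = 75
    t1_y = db["Y"]          # r1(Y)
    db["X"] = t2_x          # w2(X): overwrites T1's update, X = 84
    db["Y"] = t1_y + N      # w1(Y)
    return db

def run_serial(db):
    """T1 runs to completion, then T2."""
    db = dict(db)
    db["X"] -= N
    db["Y"] += N
    db["X"] += M
    return db

print(run_interleaved(db)["X"])  # 84: T1's subtraction is lost
print(run_serial(db)["X"])       # 79
```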

Temporary Update (Dirty Read)
Happens if a value is read in the middle of a transaction that fails
  If a transaction (T1) fails, it is rolled back to the previous state
  Meanwhile, another transaction may update the intermediate value
  The value of X read by T2 is dirty data, as it has not been committed
  Hence this is sometimes called the dirty read problem

Incorrect Summary
One transaction computes an aggregate while another updates
  Can include some values before update, others after update
  Generates a result that doesn’t correspond to before or after
In the example, the correct result is the same before and after T1
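
A small sketch of the incorrect summary problem, under the assumption that T1 is the earlier "move N from X to Y" transaction and that a second, hypothetical transaction sums X and Y. The starting values are illustrative.

```python
db = {"X": 80, "Y": 20}   # assumed starting values
N = 5

def interleaved_sum(db):
    """The summing transaction reads X after T1's update but Y before it."""
    db = dict(db)
    db["X"] -= N            # T1: r1(X); w1(X)
    total = db["X"]         # summary reads the new X (75)
    total += db["Y"]        # summary reads the old Y (20)
    db["Y"] += N            # T1: r1(Y); w1(Y)
    return total

correct = db["X"] + db["Y"]     # same before and after T1, since T1 only moves N
print(correct, interleaved_sum(db))  # 100 95
```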

Unrepeatable read
Concurrency can cause problems even with read-only transactions
  Suppose a transaction reads item X twice at different times
  The value of X is changed by another transaction in between
  The first transaction gets different values for the same item!
Can arise in booking transactions: check availability, then update

Transaction recovery
Transactions should be “all or nothing”: called atomic
  A transaction either completes successfully (and correctly): commit
  Or has no effect on the database or other transactions: abort
If a transaction fails after some operations, it must be undone
  Roll back the earlier operations
Many possible reasons for transaction failure

Reasons for transaction failure
Computer failure: disk error, memory read error, crash
Transaction/system error: divide by zero, integer overflow
  May also have out-of-bounds parameters, program bug
Local errors or exceptions during the transaction
  E.g., can’t find the referenced item
  E.g., insufficient funds for balance transfer
Concurrency control enforcement
  System may decide to abort a transaction to ensure correctness
  May need to abort to resolve “deadlock” between transactions
Disk failure: data on disk has become corrupted
Physical problems: fire, theft, flood, operator error
PEBCAK: problem exists between chair and keyboard

Transaction states
To ensure transaction atomicity, the system needs to track the state
  The recovery manager needs to keep track of each operation
Transactions can be in one of a number of states
  Active state: after starting execution, can read and write
  Partially committed state: after it has finished its operations
    Need to reach a point where system failure would still leave the data in a consistent state
  Committed state: transaction is completed, a commit point is made
  Failed state: if a check fails or the transaction is aborted
    May have to roll back some writes
  Terminated state: the transaction leaves the system
Failed or aborted transactions may be started (afresh) later

System Log
To recover from transaction failures, the system keeps a log
  Tracks all transaction operations that affect the database
The system log is a sequential, append-only file kept on disk
  So more likely to survive system failure/crash
Use memory to buffer the most recent updates
  Write out buffers to disk when they are full
  Ensure buffers are flushed to disk at a commit point
Periodically back up the log to archival storage
The log consists of a sequence of log records

System log records
[start_transaction, T]: T is a unique transaction id
[write_item, T, X, old_value, new_value]
  Transaction T affects item X
  Technically, only old_value is needed for rollback
[read_item, T, X]: read entry not strictly needed for rollback
  May be included for other purposes, e.g. auditing
[commit, T]: T has successfully completed, and can be committed
[abort, T]: T has been aborted

Failure recovery
Recovering from failure means either undoing or redoing steps
Undo: undo each WRITE operation
  Trace backwards through the log and write the old_value
Redo: repeat each WRITE operation using new_value
  Needed if a failure means the writes may not have all completed
  Ensures that all operations have been applied successfully
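
Undo and redo can be sketched against an in-memory log. The tuple layout mirrors the [write_item, T, X, old_value, new_value] records; the `undo` and `redo` helpers and the sample values are hypothetical, not part of any particular DBMS.

```python
def undo(db, log, txn):
    """Trace backwards through the log, restoring old_value for txn's writes."""
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] == txn:
            _, _, item, old_value, _ = rec
            db[item] = old_value

def redo(db, log, txn):
    """Scan forwards through the log, reapplying new_value for txn's writes."""
    for rec in log:
        if rec[0] == "write_item" and rec[1] == txn:
            _, _, item, _, new_value = rec
            db[item] = new_value

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 80, 75),
    ("write_item", "T1", "Y", 20, 25),
]
db = {"X": 75, "Y": 25}

undo(db, log, "T1")
after_undo = dict(db)   # {'X': 80, 'Y': 20}
redo(db, log, "T1")
after_redo = dict(db)   # {'X': 75, 'Y': 25}
print(after_undo, after_redo)
```

Undo walks the log backwards so that, if an item was written more than once, the oldest old_value is the one left in place.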

Commit Points
Commit points mark successful completion of transactions
  All operations of transaction T have been executed successfully
  AND the effect of all operations is recorded in the log
The transaction is then committed and is permanently recorded
  Write a [commit, T] record in the log
If a system failure occurs:
  Find all transactions T that have started but not committed
  Roll back their associated operations to undo their effect
  May have to redo some transactions to ensure correctness
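
The recovery steps after a system failure might look as follows. This is a simplified sketch with a hypothetical `recover` helper: it only rolls back transactions with no commit record and does not attempt the redo step.

```python
def recover(db, log):
    """Undo every write by a transaction that started but never committed."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    started = [rec[1] for rec in log if rec[0] == "start_transaction"]
    uncommitted = [t for t in started if t not in committed]
    for rec in reversed(log):                 # newest record first
        if rec[0] == "write_item" and rec[1] in uncommitted:
            db[rec[2]] = rec[3]               # restore old_value
    return uncommitted

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 80, 75),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Z", 10, 14),
    ("commit", "T2"),
    ("write_item", "T1", "Y", 20, 25),
    # crash here: the log holds no [commit, T1] record
]
db = {"X": 75, "Y": 25, "Z": 14}
rolled_back = recover(db, log)
print(rolled_back, db)  # ['T1'] {'X': 80, 'Y': 20, 'Z': 14}
```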

ACID properties of transaction processing
Atomicity: a transaction is an atomic unit of processing
  It is either performed completely, or not at all
  Controlled by the transaction recovery subsystem of the DBMS
Consistency: transactions should preserve database consistency
  If a transaction is done fully, it should keep the DB in a consistent state
  Responsibility of the programmers, integrity constraints
Isolation: effect should be independent of other transactions
  It should be as if it is the only transaction executing
  Enforced by the concurrency control system
Durability: changes made must persist in the database
  Changes made by a transaction should not be lost by any failure
  Enforced by the transaction recovery subsystem

Schedules of operations
The order of execution of operations is called the schedule
  Schedule S orders the operations of n transactions T1, T2, ... Tn
  Operations from different transactions can be interleaved
  Operations from the same transaction must be in order
  S is a total order: for any two operations, one is before the other
The main concern is the interleaving of read and write operations
Notation: b, r, w, e, c, a for begin, read, write, end, commit, abort
  Can omit begin and end without loss of clarity
  Use transaction id (number) as a subscript for each operation
  E.g. S = r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)

Conflicts
Two operations in a schedule conflict if:
  They belong to different transactions
  They access the same item X
  At least one operation is a write_item(X)
Example: S = r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)
  r1(X) and w2(X) are in conflict; r2(X) and w1(X) are in conflict
  r1(X) and r2(X) do not conflict with each other (why?)
  w2(X) and w1(Y) do not conflict (why?)
  r1(X) and w1(X) do not conflict (why?)
Two operations conflict if swapping them results in a different outcome
  E.g. swapping r1(X) and w2(X) can change the value of X read by T1
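
The three conditions translate directly into a predicate. The sketch below uses a hypothetical `parse` helper for the r/w notation and lists every conflicting pair in the example schedule S.

```python
import re
from itertools import combinations

def parse(schedule):
    """Turn 'r1(X); w2(X)' into [('r', 1, 'X'), ('w', 2, 'X')]."""
    return [(k, int(t), x)
            for k, t, x in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]

def conflicts(a, b):
    """Different transactions, same item, at least one write."""
    return a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0])

S = parse("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)")
pairs = [(a, b) for a, b in combinations(S, 2) if conflicts(a, b)]
for a, b in pairs:
    print(a, "conflicts with", b)
```

Besides the two pairs named on the slide, the check also reports w1(X) and w2(X), which satisfy all three conditions.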

Complete Schedule
A schedule S of n transactions is a complete schedule if:
  The operations in S are exactly those of T1, T2, ... Tn, including a commit or abort operation as the last in each transaction
    So no active transactions at the end of the schedule
  Every pair of operations in Ti is in the same order in S as it is in Ti
  The order of every pair of conflicting operations is specified in S
    Don’t have to specify the order of nonconflicting operations
    Hence can be a partial order on the operations
In live systems, schedules are rarely complete
  New transactions are always starting
Define the committed projection of a schedule S, C(S)
  Only operations in S belonging to committed transactions

Recoverability of Schedules
Some schedules are easier to recover from than others
  Attempt to characterize schedules that are easily recoverable
Recoverable schedule: once T is committed, never have to undo T
  Helps ensure the durability property
  Nonrecoverable schedules should not be allowed by the DBMS
Formal definition for schedule S to be recoverable:
  T reads from transaction T’ if item X is first written by T’ and later read by T
  S is recoverable if no transaction T in S commits until all transactions T’ that T reads from have committed
  AND T’ must not have been aborted before T reads X

Recoverable schedules
There is always a way to recover a recoverable schedule
  But it may still be quite complex to do so
Example: S = r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;
  Schedule S is recoverable by the previous definition
  Note: S suffers from lost updates; this does not affect recoverability
The following schedule is not recoverable (why?)
  r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1
Possible fixes to the schedule:
  Postpone the commit c2: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2
  Abort both: r1(X); w1(X); r2(X); r1(Y); w2(X); a1; a2
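
The recoverability definition can be checked mechanically. The following is a sketch with hypothetical helper names; it assumes that an abort simply removes the aborted transaction's writes, so a later read sees the previous surviving writer.

```python
import re

def parse(schedule):
    """'w1(X); c1' -> [('w', 1, 'X'), ('c', 1, None)]."""
    return [(k, int(t), x or None)
            for k, t, x in re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", schedule)]

def is_recoverable(schedule):
    writers = {}       # item -> transactions with surviving writes, oldest first
    reads_from = {}    # T -> set of transactions T' that T has read from
    committed = set()
    for kind, t, item in parse(schedule):
        if kind == "w":
            writers.setdefault(item, []).append(t)
        elif kind == "r":
            stack = writers.get(item, [])
            if stack and stack[-1] != t:
                reads_from.setdefault(t, set()).add(stack[-1])
        elif kind == "a":
            for stack in writers.values():
                stack[:] = [w for w in stack if w != t]
        elif kind == "c":
            # T may commit only after everything it read from has committed.
            if not reads_from.get(t, set()) <= committed:
                return False
            committed.add(t)
    return True

print(is_recoverable("r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1"))  # True
print(is_recoverable("r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1"))         # False
print(is_recoverable("r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2"))  # True
```

In the second schedule T2 reads X after T1 wrote it and then commits before T1 does, which is what breaks recoverability.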

Cascading rollback and strict schedule
Cascading rollback is when an uncommitted transaction has to be rolled back because it read from a transaction that failed
  E.g. T2 in the previous example r1(X); w1(X); r2(X); r1(Y); w2(X); a1; a2
  Try to avoid cascading rollback: it can be quite time consuming
Can define a cascadeless schedule:
  Every transaction only reads from committed transactions
  E.g. move back the r2(X): r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); c2
A strict schedule is the most restrictive type
  Don’t read or write X until the last transaction to write X commits
  Simple to undo writes: just restore the old value of X
  If not strict, undoing the write of an aborted transaction is not enough

Relation between concepts
Ordering: Strict ⇒ Cascadeless ⇒ Recoverable
  Strict: don’t read or write X after T has written X, until T commits
  Cascadeless: transactions only read from committed transactions
  Recoverable: transactions commit only after the transactions they have read from commit
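
A rough classifier for the two stronger properties, written as a sketch. Helper names are hypothetical, and as a simplification a writer that has aborted is treated like one that has committed (its write no longer blocks later operations). The third schedule is an illustrative example that is not taken from the slides.

```python
import re

def parse(schedule):
    """'w1(X); c1' -> [('w', 1, 'X'), ('c', 1, None)]."""
    return [(k, int(t), x or None)
            for k, t, x in re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", schedule)]

def classify(schedule):
    """Return (cascadeless, strict) for a schedule."""
    last_writer = {}     # item -> most recent transaction to write it
    finished = set()     # transactions that have committed or aborted
    cascadeless = strict = True
    for kind, t, item in parse(schedule):
        if kind in ("c", "a"):
            finished.add(t)
            continue
        w = last_writer.get(item)
        uncommitted_writer = w is not None and w != t and w not in finished
        if uncommitted_writer:
            strict = False               # read or write of uncommitted data
            if kind == "r":
                cascadeless = False      # dirty read
        if kind == "w":
            last_writer[item] = t
    return cascadeless, strict

print(classify("r1(X); w1(X); r2(X); r1(Y); w2(X); a1; a2"))            # (False, False)
print(classify("r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); c2"))     # (True, True)
print(classify("w1(X); w2(X); c1; c2"))                                 # (True, False)
```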

Serializability
Recoverability did not consider correctness (isolation)
  Serializability is concerned with this property
There are some simple approaches to serializability
  Consider two transactions T1 and T2 submitted at the same time
  Either do T1 entirely before T2, or vice versa
  Not great: limits throughput, can cause blocking

Serial and non-serial schedules
A schedule S is serial if, for every transaction T in S, all operations in T are executed consecutively, with no operations from other transactions in between; otherwise, it is nonserial

Serial Schedules
Only one transaction is active at a time in a serial schedule
  If transactions are independent, every serial schedule is correct
Serial schedules limit concurrency by prohibiting interleaving
  Must wait for all I/O to finish... very slow
  Serial schedules are considered unacceptable in practice
Accept schedules that are equivalent to serial ones in effect
  Which schedules on the previous slide are equivalent to serial?
  Which suffer from the lost update problem?
Serializable schedule: one that is equivalent to a serial one
  Consider this to be our definition of “correctness”
  Need to define equivalence of schedules!
  There are several alternate definitions with different properties

Conflict serializable
Two ops in a schedule conflict if:
  They are in different transactions
  They access the same item X
  At least one op is a write_item(X)
Result equivalent: two schedules are result equivalent if they produce the same final state
  May happen by chance given a particular initial state
Conflict equivalent is the most commonly used definition
  The order of any two conflicting operations is the same in both schedules
S is conflict serializable if it is conflict equivalent to some serial schedule S’
  That is, the nonconflicting operations can be reordered to make S’
A & D are conflict equivalent: r2(X) follows w1(X) in both
  r1(Y); w1(Y) in D doesn’t conflict with T2, so they can be moved earlier

Testing for conflict serializability
Create a serialization graph of the read and write operations
  A directed graph with nodes T1, ... Tn
  Directed edge (Tj → Tk) if an op in Tj precedes a conflicting op in Tk
S is conflict serializable if and only if its serialization graph has no cycles
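
A minimal sketch of the test: build the edge set from conflicting pairs, then look for a cycle with a depth-first search. The function names are hypothetical, and the second schedule is an illustrative example rather than one of the schedules from the slides' figures.

```python
import re

def parse(schedule):
    """Turn 'r1(X); w2(X)' into [('r', 1, 'X'), ('w', 2, 'X')]."""
    return [(k, int(t), x)
            for k, t, x in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]

def serialization_graph(ops):
    """Edge (Tj, Tk) whenever an op of Tj precedes a conflicting op of Tk."""
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> "visiting" while on the DFS stack, "done" afterwards

    def visit(node):
        if state.get(node) == "visiting":
            return True               # back edge: cycle found
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        if any(visit(nxt) for nxt in graph[node]):
            return True
        state[node] = "done"
        return False

    return any(visit(node) for node in graph)

def conflict_serializable(schedule):
    return not has_cycle(serialization_graph(parse(schedule)))

print(conflict_serializable("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)"))  # False
print(conflict_serializable("r1(X); w1(X); r2(X); w2(X); r1(Y); w1(Y)"))  # True
```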

Serialization Graph
Graphs for schedules A, B, C, D:
  Showing the name of the item causing the edges as its label
Can create an equivalent serial schedule S’ from the graph of S
  When there is an edge (Ti → Tj), Ti must appear before Tj in S’
  Else, resolve ordering arbitrarily [make total order from partial order]
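
Turning an acyclic serialization graph into a serial order is a topological sort. A sketch using the standard-library `graphlib` module (Python 3.9 or later); the `serial_order` helper and the sample edge sets are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def serial_order(transactions, edges):
    """Return one serial order consistent with the edges, or None if cyclic."""
    predecessors = {t: set() for t in transactions}
    for ti, tj in edges:
        predecessors[tj].add(ti)     # edge Ti -> Tj: Ti must come before Tj
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None

order = serial_order([1, 2, 3], {(1, 2), (3, 2)})
print(order)                                   # T1 and T3 in some order, then T2
print(serial_order([1, 2], {(1, 2), (2, 1)}))  # None: no equivalent serial schedule
```

Where the graph leaves two transactions unordered, any choice is acceptable, which is the "resolve ordering arbitrarily" step.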

Serializability Example
Two ops in a schedule conflict if:
  They are in different transactions
  They access the same item X
  At least one op is a write_item(X)

Serializability
In practice, it is hard to check for serializability
  Interleaving of concurrent operations is controlled by the OS
  DBMS can’t specify the exact order of execution of parallel tasks
Don’t want to check for serializability after the fact
  Instead, design schedules based on protocols
  These ensure that all realized schedules are serializable
Not feasible to mark start and end of schedules in live systems
  Consider only the committed projection of a schedule
  I.e. the operations from committed transactions
Most common technique is two-phase locking (2PL) [later]
  Prevent transactions that could interfere with each other
  Other protocols: timestamp ordering, optimistic concurrency control

View Equivalence
View equivalence is a weaker notion than conflict equivalence
  Based on the view of the data witnessed by each schedule
Schedules S and S’ are said to be view equivalent if:
  Both S and S’ include all operations of the same set of transactions
  For any operation ri(X) in S, if the read value of X was written by operation wj(X), the same condition must hold for S’
  If wk(Y) is the last operation to write to Y in S, then wk(Y) must also be the last to write to Y in S’
Read operations see the same view in both schedules
The final write is the same in both, so the same state is reached
S is view serializable if it is view equivalent to a serial schedule

View and Conflict Serializability
The constrained write assumption (CWA): no blind writes
  CWA holds if every wi(X) in Ti is preceded by ri(X)
  Implies computation of the new value of X depends on the old value
  A blind write is when X is written without reading it first
View and conflict serializability coincide if CWA holds
Unconstrained write assumption: blind writes are allowed
  E.g. r1(X); w2(X); w1(X); w3(X); c1; c2; c3 is view equivalent to the serial schedule r1(X); w1(X); c1; w2(X); c2; w3(X); but is not conflict equivalent to any serial schedule
Testing for view serializability is NP-hard
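
The blind-write example can be verified by brute force over all serial orders. This is a sketch with hypothetical helper names, and it assumes each transaction writes a given item at most once, so a writer can be identified by its transaction id alone. Trying every permutation is exponential, in keeping with the hardness result above.

```python
import re
from itertools import permutations

def parse(schedule):
    """Keep only reads and writes; commit records are ignored."""
    return [(k, int(t), x)
            for k, t, x in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]

def view(ops):
    """Reads-from relation plus the final writer of each item."""
    last_writer, reads, count = {}, set(), {}
    for kind, t, item in ops:
        if kind == "r":
            n = count[t] = count.get(t, 0) + 1      # n-th read issued by T
            reads.add((t, n, item, last_writer.get(item)))
        else:
            last_writer[item] = t
    return reads, last_writer

def view_serial_order(ops):
    """Return a transaction order whose serial schedule is view equivalent."""
    txns = sorted({t for _, t, _ in ops})
    for order in permutations(txns):
        serial = [op for t in order for op in ops if op[1] == t]
        if view(serial) == view(ops):
            return order
    return None

def conflict_edges(ops):
    return {(a[1], b[1]) for i, a in enumerate(ops) for b in ops[i + 1:]
            if a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0])}

S = parse("r1(X); w2(X); w1(X); w3(X); c1; c2; c3")
print(view_serial_order(S))                    # (1, 2, 3)
edges = conflict_edges(S)
print((1, 2) in edges and (2, 1) in edges)     # True: cycle between T1 and T2
```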

Venn diagram of schedules
http://en.wikipedia.org/wiki/Schedule_%28computer_science%29

Other types of schedule equivalence
In some situations, can relax the definition of equivalence
  Such as debit-credit transactions (e.g. bank account updates)
  All transactions add to or subtract from the value of a data item
Can have correct schedules that are not serializable
  Because addition and subtraction commute
Consider two transactions that want to move money
  T1: r1(X); X ← X – 10; w1(X); r1(Y); Y ← Y + 10; w1(Y);
  T2: r2(Y); Y ← Y – 20; w2(Y); r2(X); X ← X + 20; w2(X);
  Schedule S: r1(X); w1(X); r2(Y); w2(Y); r1(Y); w1(Y); r2(X); w2(X)
S is not serializable but is correct because of transaction semantics
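
Running schedule S and the serial order T1 then T2 side by side shows that they reach the same final state. A sketch only: the starting balances are assumed, and the `run` helper hard-codes the amounts from T1 and T2.

```python
# Amount each transaction adds to an item between its read and its write.
DELTA = {(1, "X"): -10, (1, "Y"): +10, (2, "Y"): -20, (2, "X"): +20}

def run(schedule, db):
    """Execute (kind, txn, item) steps; each transaction has private local copies."""
    db = dict(db)
    local = {1: {}, 2: {}}
    for kind, t, item in schedule:
        if kind == "r":
            local[t][item] = db[item]
        else:
            db[item] = local[t][item] + DELTA[(t, item)]
    return db

S = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "Y"), ("w", 2, "Y"),
     ("r", 1, "Y"), ("w", 1, "Y"), ("r", 2, "X"), ("w", 2, "X")]
serial = [("r", 1, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 1, "Y"),
          ("r", 2, "Y"), ("w", 2, "Y"), ("r", 2, "X"), ("w", 2, "X")]

db = {"X": 100, "Y": 100}   # assumed starting balances
print(run(S, db))           # {'X': 110, 'Y': 90}
print(run(serial, db))      # {'X': 110, 'Y': 90}
```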

Transaction Support in SQL
SQL allows the definition of atomic transactions
  No explicit begin_transaction, but must COMMIT or ROLLBACK
The access mode of the transaction can be specified
  READ ONLY or READ WRITE (default)
The diagnostic area keeps error data on n previous statements
The isolation level defines how strict to be with transactions
  SERIALIZABLE (default)
  Lower levels: REPEATABLE READ, READ COMMITTED, READ UNCOMMITTED
  Lower levels may allow a transaction to read a value that has not been committed, or to read a value twice in a transaction and get two different values

Sample SQL transaction
So you will know what one looks like:
  EXEC SQL WHENEVER SQLERROR GO TO UNDO;
  EXEC SQL SET TRANSACTION READ WRITE DIAGNOSTICS SIZE 5 ISOLATION LEVEL SERIALIZABLE;
  EXEC SQL INSERT INTO EMPLOYEE (FNAME, LNAME, SSN, DNO, SALARY)
    VALUES ('Robert', 'Smith', '991004321', 2, 35000);
  EXEC SQL UPDATE EMPLOYEE SET SALARY = SALARY * 1.1 WHERE DNO = 2;
  EXEC SQL COMMIT;
  GOTO THE_END;
  UNDO: EXEC SQL ROLLBACK;
  THE_END: ...
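
For comparison, the same commit-or-rollback pattern can be tried out with Python's built-in `sqlite3` module. This is a loose analogue rather than a translation: SQLite has no SET TRANSACTION or DIAGNOSTICS SIZE statement, the table definition is assumed, and the `hire_and_raise` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY, "
    "DNO INTEGER, SALARY REAL)"
)
conn.commit()

def hire_and_raise(conn, ssn):
    """Insert an employee and raise department 2 salaries, all or nothing."""
    try:
        conn.execute(
            "INSERT INTO EMPLOYEE (FNAME, LNAME, SSN, DNO, SALARY) "
            "VALUES ('Robert', 'Smith', ?, 2, 35000)",
            (ssn,),
        )
        conn.execute("UPDATE EMPLOYEE SET SALARY = SALARY * 1.1 WHERE DNO = 2")
        conn.commit()        # plays the role of EXEC SQL COMMIT
        return True
    except sqlite3.Error:
        conn.rollback()      # plays the role of the UNDO label
        return False

first = hire_and_raise(conn, "991004321")    # succeeds and commits
second = hire_and_raise(conn, "991004321")   # duplicate key: rolled back
rows = conn.execute("SELECT SSN, SALARY FROM EMPLOYEE").fetchall()
print(first, second, len(rows))  # True False 1
```

The second call fails on the duplicate primary key, so its raise is never applied and the salary stays at the value committed by the first call.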

Summary
Saw the concepts of transactions and concurrency control
ACID properties: atomicity, consistency, isolation, durability
System logging, commit points, and failure recovery
Schedules, serializability, and conflicts
Multiple definitions of serializability, and checking
Transactions in SQL
Chapter: “Introduction to Transaction Processing Concepts and Theory” in Elmasri and Navathe