Transactions and Their Distribution Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems October 20, 2015.



Administrivia
- Please read the PNUTS paper for discussion next week
- Upcoming schedule:
  - lectures April 14, 19
  - no class April 21
  - midterm April 26
- Project demo signups will be done online, for finals week

Recall: ACID Semantics
- Atomicity: operations are atomic, either committing or aborting as a single entity
- Consistency: the state of the data is internally consistent
- Isolation: all operations act as if they were run by themselves
- Durability: all writes stay persistent!

Providing Atomicity and Consistency
- Database systems provide transactions with the ability to abort upon some failure condition
  - Based on transaction logging: record all operations and undo them as necessary
- Database systems also use the log to recover from crashes
  - Redo logged updates that may not have reached disk
  - Then undo all of the steps of any partially complete transaction
  - This is part of a protocol called ARIES
- These can be the basis of persistent storage, and we can use middleware like J2EE to build distributed transactions with the ability to abort database operations if necessary

The Need for Isolation
- Suppose eBay seller S has a bank account that we're depositing money into, as people buy:

    S = Accounts.Get(1234)
    Write S.bal = S.bal + $50

- What if two purchases occur simultaneously, from two different servers on different continents?

Concurrent Deposits
- This update code is represented as a sequence of read and write operations on "data items" (which for now should be thought of as individual accounts):

    Deposit 1                  Deposit 2
    Read(S.bal)                Read(S.bal)
    S.bal := S.bal + $50       S.bal := S.bal + €10
    Write(S.bal)               Write(S.bal)

  where S is the data item representing the seller's account #1234

A "Bad" Concurrent Execution
Only one action (e.g. a read or a write) can actually happen at a time for a given database, and we can interleave deposit operations in many ways:

    Deposit 1: Read(S.bal); S.bal := S.bal + $50; Write(S.bal)
    Deposit 2: Read(S.bal); S.bal := S.bal + €10; Write(S.bal)

[Figure: the steps of Deposit 1 and Deposit 2 interleaved along a time axis, marked BAD!]
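
One interleaving that goes wrong is the classic lost update, where both deposits read the balance before either one writes it back. The sketch below replays that schedule next to a serial one. The starting balance of 100, the labels `D1` and `D2`, and ignoring the currency difference are all assumptions made for illustration, not part of the slides.

```python
def run(schedule, balance=100):
    """Replay (txn, op) steps against one shared balance."""
    amounts = {"D1": 50, "D2": 10}   # deposit sizes (currency ignored)
    local = {}                        # each transaction's private copy
    for txn, op in schedule:
        if op == "read":
            local[txn] = balance      # Read(S.bal)
        elif op == "add":
            local[txn] += amounts[txn]
        elif op == "write":
            balance = local[txn]      # Write(S.bal)
    return balance

serial = [("D1", "read"), ("D1", "add"), ("D1", "write"),
          ("D2", "read"), ("D2", "add"), ("D2", "write")]
interleaved = [("D1", "read"), ("D2", "read"), ("D1", "add"),
               ("D2", "add"), ("D1", "write"), ("D2", "write")]

serial_result = run(serial)            # both deposits applied: 160
interleaved_result = run(interleaved)  # D1's deposit is overwritten: 110
```

In the interleaved schedule, Deposit 2 writes a value computed from a stale read, so the $50 deposit disappears.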

A "Good" Execution
- The previous execution would have been fine if the accounts were different (i.e. one were S and one were T), i.e., if the transactions were independent
- The following execution is a serial execution, and executes one transaction after the other (time runs downward):

    Deposit 1                  Deposit 2
    Read(S.bal)
    S.bal := S.bal + $50
    Write(S.bal)
                               Read(S.bal)
                               S.bal := S.bal + $10
                               Write(S.bal)

  GOOD!

Good Executions
- An execution is "good" if it is serial (transactions are executed atomically and consecutively) or serializable (i.e. equivalent to some serial execution)

    Deposit 1                  Deposit 3
    read(S.bal)
                               read(T.bal)
    S.bal := S.bal + $50
                               T.bal := T.bal + €10
    write(S.bal)
                               write(T.bal)

- Equivalent to executing Deposit 1 then 3, or vice versa
- Why would we want to do this instead?

Concurrency Control
- A means of ensuring that transactions are serializable
- There are many methods, of which we'll see one:
  - Lock-based concurrency control (2-phase locking)
  - Optimistic concurrency control (no locks; based on timestamps)
  - Multiversion CC
  - …

Lock-Based Concurrency Control
- Strict Two-phase Locking (Strict 2PL) Protocol:
  - Each transaction must obtain:
    - an S (shared) lock on an object before reading
    - an X (exclusive) lock on an object before writing
  - An owner of an S lock can upgrade it to X if no one else is holding the lock
  - All locks held by a transaction are released when the transaction completes
  - Locks are handled in a "growing" phase, then a "shrinking" phase
- (Non-strict) 2PL variant: locks may be released at any time, but a transaction cannot acquire locks after releasing any lock
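
The S/X rules above can be sketched as a toy lock table. This is a simplified illustration, not how any particular DBMS implements it: the class name `LockManager` and its methods are hypothetical, and a conflicting request simply returns `False` instead of blocking.

```python
class LockManager:
    """Toy Strict 2PL lock table: requests succeed or fail, nothing waits."""

    def __init__(self):
        self.holders = {}   # object -> {txn: "S" or "X"}

    def acquire(self, txn, obj, mode):
        held = self.holders.setdefault(obj, {})
        others = {t: m for t, m in held.items() if t != txn}
        if mode == "S":
            # Shared locks coexist only with other shared locks.
            ok = all(m == "S" for m in others.values())
        else:
            # An X request (including an S -> X upgrade) needs sole ownership.
            ok = not others
        if ok and held.get(txn) != "X":   # never downgrade an X lock
            held[txn] = mode
        return ok

    def release_all(self, txn):
        """Strict 2PL: all locks are dropped together when txn completes."""
        for held in self.holders.values():
            held.pop(txn, None)
```

Because `release_all` is the only way to give up a lock, the shrinking phase collapses to a single point at commit or abort, which is what makes the protocol strict.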

Benefits of Strict 2PL
- Strict 2PL allows only serializable schedules
  - Additionally, it simplifies transaction aborts
- (Non-strict) 2PL also allows only serializable schedules, but involves more complex abort processing

Aborting a Transaction
- If a transaction T_i is aborted, all its actions have to be undone
  - Not only that, if T_j reads an object last written by T_i, T_j must be aborted as well!
- Most systems avoid such cascading aborts by releasing a transaction's locks only at commit time
  - If T_i writes an object, T_j can read this only after T_i commits
- Actions are undone by consulting the transaction log mentioned earlier

The Transaction Log
- The following actions are recorded in the log:
  - T_i writes an object: the old value and the new value
    - Log record must go to disk before the changed page does!
  - T_i commits/aborts: a log record indicating this action
- Log records are chained together by transaction id, so it's easy to undo a specific transaction
- Log is often mirrored and archived on stable storage
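
A minimal sketch of such a log follows, assuming an in-memory dict stands in for the database and that stored values are never `None` (so `None` can mean "key did not exist"). The names `Log`, `write`, and `abort` are hypothetical helpers for illustration only.

```python
class Log:
    """Toy write-ahead log with per-transaction back-pointers."""

    def __init__(self):
        self.records = []   # (txn, kind, key, old, new, prev_index)
        self.last = {}      # txn -> index of its most recent record

    def append(self, txn, kind, key=None, old=None, new=None):
        prev = self.last.get(txn)   # chain to this txn's previous record
        self.records.append((txn, kind, key, old, new, prev))
        self.last[txn] = len(self.records) - 1

def write(db, log, txn, key, value):
    log.append(txn, "update", key, db.get(key), value)  # log record first
    db[key] = value                                      # then the data

def abort(db, log, txn):
    i = log.last.get(txn)
    while i is not None:            # follow the chain, newest record first
        _, kind, key, old, _, prev = log.records[i]
        if kind == "update":
            if old is None:
                db.pop(key, None)   # key did not exist before this write
            else:
                db[key] = old       # restore the logged old value
        i = prev
    log.append(txn, "abort")
```

The back-pointer chain means an abort touches only the aborting transaction's records, even when other transactions' records are interleaved in the log.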

Another Benefit of the Log: Recovering From a Crash
- 3 phases in the ARIES recovery algorithm:
  - Analysis: Scan the log forward (from the most recent checkpoint) to identify all pending transactions and unwritten pages
  - Redo: Redo all updates to unwritten pages in the buffer pool, to ensure that all logged updates are in fact carried out and written to disk
  - Undo: Undo all writes done by incomplete transactions by working backwards in the log
- (Care must be taken to handle the case of a crash occurring during the recovery process!)
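
The three phases can be sketched in a heavily simplified form. This is not ARIES itself: it assumes no checkpoints (the whole log is scanned), a log holding only update and commit records, and no compensation log records, so a crash during recovery is not handled. The function `recover` and its record layout are hypothetical.

```python
def recover(log, disk, page_lsn):
    """log: (lsn, txn, kind, key, old, new) tuples in LSN order.
    disk: key -> value as found after the crash.
    page_lsn: key -> LSN of the last update known to be on disk."""
    # Analysis: transactions with updates but no commit record are pending.
    losers = set()
    for lsn, txn, kind, *_ in log:
        if kind == "update":
            losers.add(txn)
        elif kind == "commit":
            losers.discard(txn)

    # Redo: repeat every logged update newer than what disk holds.
    for lsn, txn, kind, key, old, new in log:
        if kind == "update" and page_lsn.get(key, -1) < lsn:
            disk[key] = new
            page_lsn[key] = lsn

    # Undo: walk the log backwards, restoring old values for pending txns.
    for lsn, txn, kind, key, old, new in reversed(log):
        if kind == "update" and txn in losers:
            disk[key] = old
    return losers
```

Redo is applied to all transactions, including ones that will be undone afterwards, so the undo phase always starts from the exact state the log describes.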

A Danger with Locks: Deadlocks
- Deadlock: Cycle of transactions waiting for locks to be released by each other
- Two ways of dealing with deadlocks:
  - Deadlock prevention
  - Deadlock detection

Deadlock Prevention
- Assign priorities based on timestamps (older = higher)
- Assume T_i wants a lock that T_j holds; do one of:
  - Wait-Die: If T_i has higher priority, T_i waits for T_j; otherwise T_i aborts
  - Wound-Wait: If T_i has higher priority, T_j aborts; otherwise T_i waits
- Waiting only ever goes in one priority direction (higher waits for lower in Wait-Die, lower waits for higher in Wound-Wait), so no cycle of waiting transactions can form
- If a transaction restarts, make sure it keeps its original timestamp
  - Keeps it from always getting aborted!
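
The two policies reduce to a comparison of timestamps. In this sketch a smaller timestamp means an older, higher-priority transaction; the function names and the returned strings are illustrative choices, not part of the slides.

```python
def wait_die(ts_requester, ts_holder):
    """Requester T_i wants a lock held by T_j (smaller ts = older)."""
    if ts_requester < ts_holder:
        return "requester waits"    # older transaction is allowed to wait
    return "requester aborts"       # younger transaction dies

def wound_wait(ts_requester, ts_holder):
    """Requester T_i wants a lock held by T_j (smaller ts = older)."""
    if ts_requester < ts_holder:
        return "holder aborts"      # older transaction wounds the holder
    return "requester waits"        # younger transaction waits
```

For example, with an old transaction at timestamp 1 and a young one at timestamp 5, `wait_die(5, 1)` aborts the young requester, while `wound_wait(1, 5)` aborts the young holder.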

Database Transactions and Concurrency Control, Summarized
The basic goal was to guarantee ACID properties:
- Transactions and logging provide Atomicity and Consistency
- Locks ensure Isolation
- The transaction log (and RAID, backups, etc.) is also used to ensure Durability
So far, we've been in the realm of databases. How does this extend to the distributed context?

Distributed Transactions
- We generally rely on a middleware layer called application servers, aka TP monitors, to provide transactions across systems
  - Tuxedo, iPlanet, WebSphere, etc.
- For atomicity, two-phase commit protocol
- For isolation, need distributed concurrency control
[Figure: architecture diagram with Client, Web Server, App Server, Workflow Controller, Msg Queue, two Transact Servers, and DB]

Two-Phase Commit (2PC)
- Site at which a transaction originates is the coordinator; other sites at which it executes are subordinates
- Two rounds of communication, initiated by coordinator:
  - Voting: Coordinator sends prepare messages, waits for yes or no votes
  - Then, decision or termination: Coordinator sends commit or rollback messages, waits for acks
- Any site can decide to abort a transaction!

Steps in 2PC
When a transaction wants to commit:
- Coordinator sends prepare message to each subordinate
- Subordinate force-writes an abort or prepare log record and then sends a no (abort) or yes (prepare) message to coordinator
- Coordinator considers votes:
  - If unanimous yes votes, force-writes a commit log record and sends commit message to all subordinates
  - Else, force-writes abort log record, and sends abort message
- Subordinates force-write abort/commit log records based on message they get, then send ack message to coordinator
- Coordinator writes end log record after getting all acks
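
These steps can be sketched as a single-process simulation. It assumes no failures and no real network: "messages" are method calls and "force-writes" are list appends. The `Subordinate` class and the `two_phase_commit` function are hypothetical names for illustration.

```python
class Subordinate:
    def __init__(self, name, vote_yes=True):
        self.name, self.vote_yes, self.log = name, vote_yes, []

    def prepare(self):
        # Force-write a prepare or abort record before voting.
        self.log.append("prepare" if self.vote_yes else "abort")
        return "yes" if self.vote_yes else "no"

    def decide(self, decision):
        # A "no" voter has already logged abort, so it does not log twice.
        if self.log[-1] != decision:
            self.log.append(decision)
        return "ack"

def two_phase_commit(subordinates):
    coord_log = []
    votes = [s.prepare() for s in subordinates]          # round 1: voting
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    coord_log.append(decision)                            # force-write decision
    acks = [s.decide(decision) for s in subordinates]     # round 2: decision
    if all(a == "ack" for a in acks):
        coord_log.append("end")                           # all acks received
    return decision, coord_log
```

A single "no" vote is enough to flip the outcome: `two_phase_commit([Subordinate("a"), Subordinate("b", vote_yes=False)])` returns an abort decision, and subordinate `a` logs `prepare` followed by `abort`.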

Illustration of 2PC

    Coordinator                     Subordinate 1 and Subordinate 2 (each)
    force-write begin log entry
    send "prepare"
                                    force-write prepared log entry
                                    send "yes"
    force-write commit log entry
    send "commit"
                                    force-write commit log entry
                                    send "ack"
    write end log entry

Comments on 2PC
- Every message reflects a decision by the sender; to ensure that this decision survives failures, it is first recorded in the local log
- All log records for a transaction contain its ID and the coordinator's ID
  - The coordinator's abort/commit record also includes IDs of all subordinates
- Theorem: there exists no distributed commit protocol that can recover without communicating with other processes, in the presence of multiple failures!

What if a Site Fails in the Middle?
- If we have a commit or abort log record for transaction T, but not an end record, we must redo/undo T
  - If this site is the coordinator for T, keep sending commit/abort msgs to subordinates until acks have been received
- If we have a prepare log record for transaction T, but not commit/abort, this site is a subordinate for T
  - Repeatedly contact the coordinator to find status of T, then write commit/abort log record; redo/undo T; and write end log record
- If we don't have even a prepare log record for T, unilaterally abort and undo T
  - This site may be the coordinator! If so, subordinates may send it messages about T, and T needs to be undone at those sites as well
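
The restart rules amount to a lookup on which log records survive for T. The sketch below returns a description of the action rather than performing it; the function name, the set-of-strings input, and the wording of the results are assumptions for illustration.

```python
def restart_action(log_kinds, is_coordinator):
    """log_kinds: set of record kinds found in the local log for T."""
    if "end" in log_kinds:
        return "nothing to do"
    if "commit" in log_kinds or "abort" in log_kinds:
        outcome = "commit" if "commit" in log_kinds else "abort"
        action = "redo" if outcome == "commit" else "undo"
        if is_coordinator:
            # Keep resending the decision until every ack has arrived.
            return f"{action} T; resend {outcome} until all acks arrive"
        return f"{action} T"
    if "prepare" in log_kinds:
        # Voted yes but never learned the outcome: must ask.
        return "ask coordinator for status of T"
    return "abort and undo T unilaterally"
```

Only the `prepare`-without-decision case depends on another site, which is where blocking comes from.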

Blocking for the Coordinator
- If coordinator for transaction T fails, subordinates who have voted yes cannot decide whether to commit or abort T until coordinator recovers
  - T is blocked
  - Even if all subordinates know each other (extra overhead in prepare msg) they are blocked unless one of them voted no

Link and Remote Site Failures
- If a remote site does not respond during the commit protocol for transaction T, either because the site failed or the link failed:
  - If the current site is the coordinator for T, it should abort T
  - If the current site is a subordinate, and has not yet voted yes, it should abort T
  - If the current site is a subordinate and has voted yes, it is blocked until the coordinator responds!

Observations on 2PC
- Ack msgs are used to let the coordinator know when it's done with a transaction; until it receives all acks, it must keep T in the transaction-pending table
- If the coordinator fails after sending prepare msgs but before writing commit/abort log records, when it comes back up it aborts the transaction

From Distributed Commits to Distributed Concurrency Control
- What we saw were the steps involved in preserving atomicity and consistency in a distributed fashion
- Let's briefly look at distributed isolation (locking)…

Distributed Locking
How do we manage locks across many sites?
- Centralized: One site does all locking
  - Vulnerable to single site failure
- Primary Copy: All locking for an object done at the primary copy site for this object
  - Reading requires access to locking site as well as site where the object is stored
  - We'll see how this is used in PNUTS
- Fully Distributed: Locking for a copy done at site where the copy is stored
  - Locks at all sites holding the object being written

Distributed Deadlock Detection
- Each site maintains a local waits-for graph
- A global deadlock might exist even if the local graphs contain no cycles:
[Figure: local waits-for graphs over T1 and T2 at SITE A and SITE B, and the combined GLOBAL graph]
Three solutions:
- Centralized (send all local graphs to one site)
- Hierarchical (organize sites into a hierarchy and send local graphs to parent in the hierarchy)
- Timeout (abort transaction if it waits too long)
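
The centralized option can be sketched as merging the local edge lists and checking the result for a cycle. The edge direction (waiter, holder) and the specific example edges at each site are assumptions chosen to match the T1/T2 scenario; the function `has_cycle` is a hypothetical helper.

```python
def has_cycle(edges):
    """edges: (waiter, holder) pairs from one or more waits-for graphs."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
    done, on_path = set(), set()

    def visit(node):
        if node in on_path:
            return True              # reached a node already on this path
        if node in done:
            return False             # already explored, no cycle through it
        on_path.add(node)
        found = any(visit(n) for n in graph.get(node, ()))
        on_path.discard(node)
        done.add(node)
        return found

    return any(visit(n) for n in list(graph))

site_a = [("T1", "T2")]   # assumed: at site A, T1 waits for T2
site_b = [("T2", "T1")]   # assumed: at site B, T2 waits for T1

local_a = has_cycle(site_a)               # no cycle at site A alone
local_b = has_cycle(site_b)               # no cycle at site B alone
global_deadlock = has_cycle(site_a + site_b)
```

Neither site sees a cycle on its own; the deadlock only shows up once the graphs are combined.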

Summary of Transactions and Concurrency
- There are many (especially monetary) transfers that need atomicity and isolation
- Transactions and concurrency control provide these features
  - In a distributed, 3-tier setting they run in an Application Server
  - Similar features are provided in a 2-tier setting for applications that run directly in the DBMS
- Two-phase locking ensures isolation
- Two-phase commit is a voting scheme for doing distributed commit