Distributed Transaction Recovery


Goal
The all-or-nothing semantics of atomicity must be extended to a transaction's updates on multiple servers. A distributed system can fail partially. The solution is to set up a special handshake protocol between the servers, taken from a family of distributed commit protocols. The protocol implies that, under certain circumstances, a failed server must communicate with other servers during its restart to find out the system-wide decision about the termination status of in-doubt (uncertain) transactions.

Goal
Servers are no longer autonomous in the sense that they can always recover independently, "in splendid isolation". Commit protocols make very few assumptions about how the various servers implement their local recovery. The only requirement is that all parties of the distributed system follow the rules of the commit protocol. The 2PC protocol has been standardized and is generally accepted in the software industry.

Distributed Commit
The problem: how can we ensure atomicity of a distributed transaction?
Example: consider a chain of stores belonging to a supermarket. Suppose a manager of the chain wants to query all the stores, find the inventory of pants at each, and issue instructions to move pants from store to store in order to balance the inventory. The whole operation is done by a single global transaction T that has a component Ti at the i-th store and a component T0 at the office where the manager is located.

Distributed Commit
The sequence of activities performed by T is summarized below:
1. Component T0 is created at the site of the manager.
2. T0 sends messages to all the stores instructing them to create components Ti.
3. Each Ti executes a query at store i to discover the number of pants in inventory and reports this number to T0.
4. T0 takes these numbers and determines what shipments of pants are desired. T0 sends messages such as "store 10 should ship 100 pants to store 7" to the appropriate stores.
5. Stores receiving instructions update their inventory and perform the shipments.

Failed Coordination of a Distributed Transaction (1)
[Figure: state diagrams of T1 and T2, each with the states Active, Committed, and Aborted and a crash/recover transition. Both T1 and T2 are ready to commit.]

Failed Coordination of a Distributed Transaction (2)
[Figure: state diagrams of T1 and T2, each with the states Active, Committed, and Aborted and a crash/recover transition. T1 has committed; T2 crashes and recovers.]
What we need is some new state that has more flexibility than this simple state diagram.

Transaction State Diagram with Durable Prepared State
[Figure: state diagram with the states Active, Precommitted, Committed, and Aborted and a crash/recover transition.]

Precommitted state
Precommitted: a transaction becomes precommitted as a result of a request by the coordinator of the distributed transaction (the transaction manager, TM). From the precommitted state, the transaction can enter either the committed state or the aborted state. After a system crash and subsequent recovery, a transaction that was in the precommitted state returns to the precommitted state.
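
The state rules above can be sketched as a small state machine. This is only an illustration: the names State, TRANSITIONS, can_move, and state_after_recovery are hypothetical, and the rule that a transaction that was still active at the crash is rolled back is an assumption about the local recovery scheme.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PRECOMMITTED = "precommitted"
    COMMITTED = "committed"
    ABORTED = "aborted"


# Transitions allowed during normal operation.
TRANSITIONS = {
    State.ACTIVE: {State.PRECOMMITTED, State.ABORTED},
    State.PRECOMMITTED: {State.COMMITTED, State.ABORTED},
    State.COMMITTED: set(),
    State.ABORTED: set(),
}


def can_move(src, dst):
    """True if a transaction may go from src to dst."""
    return dst in TRANSITIONS[src]


def state_after_recovery(state_before_crash):
    """State of a local transaction after a crash and recovery."""
    if state_before_crash is State.ACTIVE:
        # Assumption: no durable promise was made, so recovery undoes it.
        return State.ABORTED
    # Precommitted, committed, and aborted are durable and survive the crash.
    return state_before_crash
```

The point of the durable precommitted state is visible in state_after_recovery: the site comes back still able to go either way.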

Two-Phase Commit Protocol
We assume that the atomicity mechanisms at each site ensure that either the local component commits or it has no effect on the database state at that site, i.e. local components of a global transaction are atomic. By enforcing the rule that either all components of a distributed transaction commit or none does, we make the distributed transaction itself atomic.

Two-Phase Commit Protocol
Some assumptions about the 2PC protocol:
- Each site logs actions at that site, but there is no global log.
- One site, the coordinator, plays a special role in deciding whether or not the distributed transaction can commit.
- The 2PC protocol involves sending certain messages between the coordinator and the other sites. As each message is sent, it is logged at the sending site, to aid in recovery should it be necessary.

Two-Phase Commit Protocol: Phase 1
1. The coordinator places a log record <Prepare T> on the log at its site.
2. The coordinator sends the message <prepare T> to each component's site.
3. Each site receiving the message <prepare T> decides whether to commit or abort its component of T. The site can delay if the component has not yet completed its activity, but must eventually send a response.
4. If a site wants to commit its component of T, it must enter a state called precommitted. Once in the precommitted state, the site cannot abort its component of T without a directive to do so from the coordinator.

Two-Phase Commit Protocol: Phase 1
The following steps are done to become precommitted:
1. Perform whatever steps are necessary to be sure the local component of T will not have to abort, even if there is a system failure followed by recovery at the site. All appropriate actions regarding the log must therefore be taken so that T will be redone rather than undone in a recovery.
2. Place the record <Ready T> on the local log and flush the log to disk.
3. Send the message <ready T> to the coordinator.
If a site wants to abort its component of T, it logs the record <Don't commit T> and sends the message <don't commit T> to the coordinator. It is safe to abort the component at this time, since T will surely abort even if only one component wants to abort.
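
The ordering in these steps matters: the <Ready T> record has to be on disk before the <ready T> message leaves the site. A minimal sketch of that ordering, assuming a plain append-only text file as the log; force_log, vote_ready, and the send callback are hypothetical names, not part of any real transaction manager.

```python
import os


def force_log(path, record):
    """Append a record to the log file and force it to stable storage."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(record + "\n")
        log.flush()             # push Python's buffer to the operating system
        os.fsync(log.fileno())  # ask the operating system to write it to disk


def vote_ready(path, send):
    """Become precommitted: log first, then tell the coordinator."""
    force_log(path, "<Ready T>")  # the promise must be durable first
    send("<ready T>")             # only then is the vote sent
```

If the site crashed between the two calls, recovery would find <Ready T> in the log and treat T as in doubt, which is safe; sending first and logging second would not be.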

Two-Phase Commit Protocol: Phase 2
If the coordinator has received <ready T> from all components of T, then it decides to commit T. The coordinator:
- logs <Commit T> at its site, and
- sends the message <commit T> to all sites involved in T.
If the coordinator has received <don't commit T> from one or more sites, then it decides to abort T. The coordinator:
- logs <Abort T> at its site, and
- sends the message <abort T> to all sites involved in T.

Two-Phase Commit Protocol: Phase 2
If a site receives a <commit T> message, it commits the component of T at that site, logging <Commit T> as it does.
If a site receives the message <abort T>, it aborts the component of T at that site and writes the log record <Abort T>.
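
Both phases can be traced with a small in-memory simulation of a failure-free run. This is a sketch under simplifying assumptions: logs are plain Python lists, messages are implied rather than sent, and two_phase_commit is a hypothetical name.

```python
def two_phase_commit(votes):
    """Simulate one failure-free 2PC run.

    votes maps a site name to True (<ready T>) or False (<don't commit T>).
    Returns (decision, coordinator_log, site_logs).
    """
    # Phase 1: the coordinator logs <Prepare T>, then sends <prepare T>.
    coordinator_log = ["<Prepare T>"]
    site_logs = {}
    for site, ready in votes.items():
        # Each site logs its vote before replying to the coordinator.
        site_logs[site] = ["<Ready T>" if ready else "<Don't commit T>"]

    # Phase 2: commit only if every site answered <ready T>.
    decision = "commit" if all(votes.values()) else "abort"
    record = "<Commit T>" if decision == "commit" else "<Abort T>"
    coordinator_log.append(record)  # log the decision, then send it
    for log in site_logs.values():
        log.append(record)          # each site logs the outcome it applies
    return decision, coordinator_log, site_logs
```

For example, two_phase_commit({"store7": True, "store10": False}) yields the decision "abort", and the log of store10 ends with <Abort T>.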

Failure model
- Message losses: a message does not arrive at the destination process because of a network failure.
- Message duplications: some network component may end up duplicating a message.
- Transient process failures: one or more of the involved processes, participants or coordinators, exhibit a soft crash and need to be restarted, but without any damage to data on secondary storage.
Idea: distribute the responsibilities for handling the various failure classes between the transactional federation and the underlying communication system.

Failure model
2PC does not make any assumptions about the underlying communication system (datagrams as well as session-oriented protocols). Omission failures versus commission failures: the latter would lead to the class of distributed consensus protocols known as Byzantine agreement. We take into account only omission failures.

Prepared log entry for in-doubt transactions
A participant that replies "yes" to the coordinator's poll in the first message round can crash afterwards. Such a participant can no longer perform local crash recovery as if there were no distributed transactions at all. The restarted participant needs to check back with the coordinator before it can decide whether to consider the transaction committed. Thus, every participant that votes "yes" must actually be prepared to go either way: redo or undo the updates.

Two Phase Commit (cont.)
There are three places in 2PC where a process is waiting for a message:
1. The participant is waiting for the <prepare T> message from the coordinator.
2. The coordinator is waiting for the <ready T> message from all the participants.
3. The participant is waiting for the <commit T> or <abort T> message from the coordinator.

Two Phase Commit (cont.)
Timeout actions:
- Case 1: the participant aborts unilaterally after the timeout.
- Case 2: the coordinator aborts unilaterally after the timeout.
- Case 3: the site can be blocked.
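
The three timeout cases amount to a lookup from who is waiting, and for which message, to the resulting action. A small sketch with hypothetical names (TIMEOUT_ACTIONS, on_timeout) and made-up string labels:

```python
# (role, message awaited) -> action on timeout, one entry per case
TIMEOUT_ACTIONS = {
    ("participant", "prepare"): "unilateral abort",
    ("coordinator", "ready"): "unilateral abort",
    ("participant", "decision"): "blocked",
}


def on_timeout(role, awaited):
    """Return the action a process takes when its wait times out."""
    return TIMEOUT_ACTIONS[(role, awaited)]
```

Only the last entry is problematic: a participant that has already voted to commit cannot decide on its own.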

Restart and Termination Protocols
2PC is robust with respect to failures in the sense that each failed and restarted process resumes its work in the last remembered state. 2PC is not robust in the sense of guaranteeing that all processes make active progress toward a global commit or rollback.
Two extensions of the basic 2PC:
- A restart protocol specifies how a failed and restarted process should proceed.
- A termination protocol specifies how a process should behave upon a timeout while it is waiting for some message.

Restart and Termination Protocols
As all participants follow the same protocol, we have to specify four cases:
- The coordinator restart protocol: the continuation of the coordinator's protocol after a coordinator restart.
- The coordinator termination protocol: the coordinator's behavior upon timeout.
- The participant restart protocol: the continuation of the participant's protocol after a participant restart.
- The participant termination protocol: the participant's behavior upon timeout.

Termination protocol for 2PC
The simplest termination protocol is the following: the participant P remains blocked until it can re-establish communication with the coordinator. Then the coordinator can tell P the appropriate decision. Drawback: P may be unnecessarily blocked for a long time.
Alternatively, participants can exchange messages directly (without the mediation of the coordinator). We assume that the coordinator attaches the list of the participants' identities to the <prepare T> message it sends to each of them. The fact that participants do not know each other before receiving this message is of no consequence, since a participant that times out before receiving it may unilaterally decide to abort the transaction.

Termination protocol for 2PC
The cooperative termination protocol: a participant P (the initiator) that times out while in its uncertainty period sends a <decision request> message to every other process Q (a responder) to inquire whether Q either knows the decision or can unilaterally reach one. There are three cases:
- Q has already decided Commit (or Abort): Q sends a commit (or abort) to P, and P decides accordingly.
- Q has not voted yet: Q can unilaterally decide Abort. It then sends an <abort T> to P, and P decides Abort.
- Q has voted Yes but has not yet reached a decision: Q is uncertain as well and therefore cannot help P reach a decision.
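
The initiator's side of the cooperative termination protocol can be sketched as a function over the answers it collects. The state labels and the name cooperative_termination are hypothetical. The sketch assumes that a responder which has voted Yes but is itself undecided cannot help, so the initiator stays blocked when every answer is of that kind.

```python
def cooperative_termination(responder_states):
    """Decide for initiator P from the states reported by the responders Q.

    Each state is "committed", "aborted", "not_voted", or "uncertain".
    Returns "commit", "abort", or "blocked".
    """
    for state in responder_states:
        if state == "committed":
            return "commit"  # Q knows the decision was Commit
        if state in ("aborted", "not_voted"):
            # A Q that has not voted yet decides Abort unilaterally.
            return "abort"
    # Assumption: every responder reached is uncertain too, so P cannot decide.
    return "blocked"
```

In a correct run the answers never contain both "committed" and "aborted", so the order in which they are inspected does not change the result.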

2PC with restart and termination protocols
Statechart for 2PC with restart and termination protocols:
- F transitions are triggered during restart after a process failure. Once the process's last state is determined from the log entries on the process's stable log, the transition is made without any further preconditions.
- T transitions are triggered upon timeout and are also made without further preconditions.

2PC statechart - coordinator
[Figure: coordinator statechart with the states initial, collecting, C-pending, A-pending, and forgotten; the transitions are labeled prepare_to_commit, commit, abort, and ack, plus T or F transitions.]

2PC statechart - participant
[Figure: participant statechart with the states initial, prepared, committed, and aborted; the transitions are labeled Prepared/no, Prepared/yes, commit/ack, and abort/ack, plus T or F transitions.]

Communication Topologies for 2PC
The communication topology of a protocol is the specification of who sends messages to whom.
Centralized 2PC
[Figure: centralized 2PC, showing the coordinator and the participants.]

Communication Topologies for 2PC
- Decentralized 2PC (reduces time complexity)
- Linear 2PC (reduces message complexity)

Three Phase Commit (3PC)
The protocol is non-blocking in the absence of total site failures. The protocol may cause inconsistent decisions to be reached in the event of communication failures.
We assume that:
- Only site failures occur.
- All operational sites can communicate with each other.
- A process that times out waiting for a message from process q knows that q is down and therefore that no processing can be taking place there (no other process can be communicating with q).

Three Phase Commit (3PC)
Which of the processes involved in a transaction takes the role of the coordinator? The choice of the coordinator depends on:
- The transaction initiator: client PC, application server, Web server, database server, ORB.
- The reliability and speed of the participants, and how many participants the transaction involves.
- The communication topology and protocol.
The simplest way: the transaction initiator is chosen as the coordinator. If the initiator is a client, it often makes more sense to choose a data server as the coordinator.

Three Phase Commit (3PC)
Nonblocking (NB) property: if any operational process is uncertain, then no process (whether operational or failed) can have decided to Commit.
If the operational sites discover they are all uncertain, they can decide Abort, safe in their knowledge that the failed processes had not decided Commit. When the failed processes recover, they can be told to decide Abort. This way blocking is prevented.
Does 2PC obey the NB property? No.
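
The NB property can be written as a predicate over a snapshot of process states, which makes it easy to see why 2PC fails it. The snapshot format and the name satisfies_nb are assumptions made for this sketch.

```python
def satisfies_nb(snapshot):
    """Check the nonblocking property on a snapshot of the system.

    snapshot maps a process name to (operational, state), where operational
    is a bool and state is a label such as "uncertain" or "committed".
    """
    operational_uncertain = any(
        up and state == "uncertain" for up, state in snapshot.values()
    )
    someone_committed = any(
        state == "committed" for _, state in snapshot.values()
    )
    # NB forbids the combination: an operational process is uncertain
    # while some process, operational or failed, has decided Commit.
    return not (operational_uncertain and someone_committed)
```

In 2PC the coordinator can log its Commit decision and crash before any participant hears of it, which gives a snapshot such as {"coordinator": (False, "committed"), "p": (True, "uncertain")} for which the predicate is False.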

Three Phase Commit (3PC)
1. The coordinator sends a <prepare T> message to all participants.
2. When a participant receives a <prepare T> message, it responds with a YES or NO message, depending on its vote. If a participant sends NO, it decides Abort and stops.
3. The coordinator collects the vote messages from all participants. If any of them was NO or if the coordinator voted NO, then the coordinator decides Abort, sends <abort T> to all participants that voted YES, and stops. Otherwise, the coordinator sends <pre-commit T> messages to all participants.

Three Phase Commit (3PC)
4. A participant that voted YES waits for a <pre-commit T> or <abort T> message from the coordinator. If it receives an <abort T>, the participant decides Abort and stops. If it receives a <pre-commit T>, it responds with an ACK message to the coordinator.
5. The coordinator collects the ACKs. When they have all been received, it decides Commit, sends <commit T> to all participants, and stops.
6. A participant waits for a <commit T> message from the coordinator. When it receives that message, it decides Commit and stops.
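
A failure-free run of the six steps can be traced by listing the messages exchanged. This is an illustrative sketch only: three_phase_commit and the (sender, receiver, kind) tuples are hypothetical, and timeouts and crashes are left out entirely.

```python
def three_phase_commit(votes, coordinator_vote=True):
    """Trace one failure-free 3PC run.

    votes maps a participant name to True (YES) or False (NO).
    Returns (decision, messages) with messages as (sender, receiver, kind).
    """
    c = "coordinator"
    messages = [(c, p, "prepare") for p in votes]                 # step 1
    messages += [(p, c, "YES" if v else "NO")                     # step 2
                 for p, v in votes.items()]
    yes_voters = [p for p, v in votes.items() if v]

    # Step 3: any NO vote, including the coordinator's own, means Abort.
    if not coordinator_vote or len(yes_voters) < len(votes):
        messages += [(c, p, "abort") for p in yes_voters]
        return "abort", messages

    messages += [(c, p, "pre-commit") for p in votes]             # step 3
    messages += [(p, c, "ACK") for p in votes]                    # step 4
    messages += [(c, p, "commit") for p in votes]                 # step 5
    return "commit", messages                 # step 6: participants commit
```

Compared with 2PC, a committing run costs one extra round trip per participant (pre-commit and ACK), which is the price of the nonblocking property.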

3PC: timeout actions
What a process should do when it times out depends on the message it was waiting for:
- Case (1): in step (2), participants wait for the <prepare T> message.
- Case (2): in step (3), the coordinator waits for the votes.
- Case (3): in step (4), participants wait for a <pre-commit T> or <abort T> message.
- Case (4): in step (5), the coordinator waits for ACKs.
- Case (5): in step (6), participants wait for the <commit T> message.

3PC: timeout actions
Cases (1) and (2), in which a participant waits for <prepare T> or the coordinator waits for the votes, are handled exactly as in 2PC.
In case (4), in which the coordinator waits for ACKs, the coordinator may decide Commit while some failed participant is uncertain.
In cases (3) and (5), in which a participant waits for <pre-commit T> or <abort T>, or for <commit T>, the participant must communicate with other processes to reach a consistent decision.
Why can't the participant in case (5) ignore the timeout and simply decide Commit? It could violate condition NB: the coordinator may have failed after sending <pre-commit T> to p but before sending it to q, so q is still uncertain.

Termination protocol for 3PC
The state of a process relative to the messages it has sent or received:
- Aborted: the process has not voted, has voted NO, or has received an <abort T>.
- Uncertain: the process is in its uncertainty period.
- Committable: the process has received a <pre-commit T>, but has not yet received a <commit T>.
- Committed: the process has received a <commit T>.
When a participant times out in case (3) or (5), it initiates an election protocol. The election protocol involves all processes that are operational and results in the "election" of a new coordinator (the old one must have failed, otherwise no participant would have timed out).

Termination protocol for 3PC
The coordinator sends a <state-req> message to all participants that participated in the election. Each participant responds to this message by sending its state. The coordinator collects these states and proceeds according to the following termination protocol:
TR1: If some process is Aborted, the coordinator decides Abort, sends <abort T> messages to all participants, and stops.
TR2: If some process is Committed, the coordinator decides Commit, sends <commit T> messages to all participants, and stops.

Termination protocol for 3PC
TR3: If all processes that reported their state are Uncertain, the coordinator decides Abort, sends <abort T> messages to all participants, and stops.
TR4: If some process is Committable but none is Committed, the coordinator first sends <pre-commit T> messages to all processes that reported Uncertain, and waits for acknowledgements from these processes. After having received these acknowledgements, the coordinator decides Commit, sends <commit T> messages to all participants, and stops.
A participant that receives a <commit T> (or <abort T>) message decides Commit (or Abort) and stops.
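
Rules TR1 to TR4 can be sketched as one function that the newly elected coordinator applies to the collected states. The name termination_decision, the lowercase state labels, and the returned rule tag are assumptions for illustration; the extra pre-commit round of TR4 is only indicated by the tag.

```python
def termination_decision(states):
    """Apply TR1 to TR4 to the states reported to the new coordinator.

    states is a list of "aborted", "uncertain", "committable", "committed".
    Returns (decision, rule).
    """
    if "aborted" in states:
        return "abort", "TR1"
    if "committed" in states:
        return "commit", "TR2"
    if all(state == "uncertain" for state in states):
        return "abort", "TR3"
    # Remaining case: some process is committable and none is committed.
    # The coordinator first sends <pre-commit T> to the uncertain processes
    # and waits for their acknowledgements before deciding Commit.
    return "commit", "TR4"
```

For example, termination_decision(["uncertain", "committable"]) returns ("commit", "TR4").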

Correctness of 3PC and its Termination Protocol
Theorem: In the absence of total failures, 3PC and its termination protocol never cause processes to block.
Theorem: Under 3PC and its termination protocol, all operational processes reach the same decision.

Tree 2PC algorithm
Which of the processes involved in a transaction takes the role of the coordinator?
LAN vs WAN: in a LAN, everybody can talk to everybody; in a WAN, the initiator does not know all participants (and probably cannot communicate with them directly and efficiently).
Observation: during the execution of a transaction, the involved processes dynamically form a tree with the transaction initiator as the root; each edge in the tree corresponds to a dynamically established communication link.

Tree 2PC algorithm
In the hierarchical commit protocol, the message flow and the writing of log entries follow from the two roles of an intermediate node: participant with regard to its caller and coordinator for its subtree.
[Figure: process tree with Process 0 as the initiator and Processes 1 to 5 below it, showing the communication during the commit protocol.]
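
The double role of an intermediate node maps naturally onto recursion: a node collects the votes of its subtree as a coordinator and reports a single combined vote to its caller as a participant. In the sketch below, tree_prepare is a hypothetical name and the tree shape used in the usage note is made up, since the shape of the tree in the figure is not recoverable here.

```python
def tree_prepare(node, children, local_votes):
    """Return True if node and its whole subtree vote yes.

    children maps a node to the list of nodes it called;
    local_votes maps a node to its own vote (True means yes).
    """
    # Coordinator role: send prepare to every child and collect the votes.
    child_votes = [
        tree_prepare(child, children, local_votes)
        for child in children.get(node, [])
    ]
    # Participant role: report one combined vote to the caller.
    return local_votes[node] and all(child_votes)
```

With children = {0: [1, 2], 1: [3, 4], 2: [5]} and every local vote True, tree_prepare(0, children, votes) is True; a single False anywhere in the tree makes the root's result False, so the initiator decides to abort.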

[Figure: message flow during the commit protocol between the initiator and Processes 1 to 5: prepare messages, Yes votes, then commit messages.]

Optimized Algorithms for Distributed Commit
What we can do:
- Reduce the number of messages and the number of log writes.
- Shorten the "critical path" from the beginning of the commit protocol to the point when local locks can be released.
- Reduce the probability of blocking.

Optimized Algorithms for Distributed Commit
The number of messages and forced log writes can be reduced by introducing specific conventions for the presumed behavior of a process (i.e. its default reaction) in the absence of more explicit information.
Example: if a participant did not force a commit or abort log entry and thus could lose this information in a local crash, this participant could by default contact the coordinator to obtain the missing information (but the coordinator may have already truncated its log and forgotten the transaction).

Optimized Algorithms for Distributed Commit
Basic 2PC with n participants: 4n messages and 2n+2 forced log entries (the coordinator forces a begin and a commit or rollback log entry).
When a certain piece of information about a transaction's behavior is missing, there are two variations of 2PC:
- Presumed abort (PA)
- Presumed commit (PC)
Presumed-abort 2PC is considered the method of choice and has been selected for the industry standard XA.
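
The cost figures for basic 2PC are simple arithmetic in the number of participants n. The breakdown in the comments (four messages per participant, two forced log entries per participant plus two at the coordinator) is the usual reading of these totals and is stated here as an assumption; basic_2pc_cost is a hypothetical name.

```python
def basic_2pc_cost(n):
    """Return (messages, forced_log_writes) for basic 2PC with n participants."""
    # Assumed per participant: prepare, vote, decision, and acknowledgement.
    messages = 4 * n
    # Assumed: two forced entries per participant (prepared, then commit or
    # abort) plus two at the coordinator (begin, then commit or rollback).
    forced_log_writes = 2 * n + 2
    return messages, forced_log_writes
```

For three participants this gives basic_2pc_cost(3) == (12, 8), which is the baseline the presumed-abort and presumed-commit variants try to improve on.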

Heterogeneous Commit Coordinators
As networks grow together, applications increasingly must access servers and data residing on heterogeneous systems, performing transactions that involve multiple nodes of a network. These nodes have different TP monitors and different commit protocols. If each transaction manager supports a standard and open commit protocol, then portability and interoperability among them are relatively easy to achieve.
There are two standard commit protocols and application programming interfaces:
- LU6.2 (IBM), embodied by CICS and the CICS API
- OSI-TP, combined with X/Open DTP

Closed versus Open Transaction Managers
Some transaction managers have an open commit protocol: resource managers can participate in the commit decision, and the commit message formats and protocols are public. Examples: CICS, DECdtm, TOPEND (NCR), Transarc TM.
Some transaction managers have a closed commit protocol: resource managers cannot participate in the commit decision. The term is used to describe TP systems that have only private protocols and therefore cannot cooperate with other transaction processing systems. Examples: IBM's IMS, Tandem's TMF.