CS 603 Handling Failure in Commit February 20, 2002.

Slides:



Advertisements
Similar presentations
6.852: Distributed Algorithms Spring, 2008 Class 7.
Advertisements

CS542: Topics in Distributed Systems Distributed Transactions and Two Phase Commit Protocol.
(c) Oded Shmueli Distributed Recovery, Lecture 7 (BHG, Chap.7)
1 ICS 214B: Transaction Processing and Distributed Data Management Lecture 12: Three-Phase Commits (3PC) Professor Chen Li.
Consensus Algorithms Willem Visser RW334. Why do we need consensus? Distributed Databases – Need to know others committed/aborted a transaction to avoid.
CIS 720 Concurrency Control. Timestamp-based concurrency control Assign a timestamp ts(T) to each transaction T. Each data item x has two timestamps:
ICS 421 Spring 2010 Distributed Transactions Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/16/20101Lipyeow.
Distributed Transaction Processing Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington, and earlier version of.
Termination and Recovery MESSAGES RECEIVED SITE 1SITE 2SITE 3SITE 4 initial state committablenon Round 1(1)CNNN-NNNN Round 2FAILED(1)-CNNN--NNN Round 3FAILED.
Transaction Processing Lecture ACID 2 phase commit.
Two phase commit. What we’ve learnt so far Sequential consistency –All nodes agree on a total order of ops on a single object Crash recovery –An operation.
CS 582 / CMPE 481 Distributed Systems
CS 603 Distributed Transactions February 18, 2002.
Non-blocking Atomic Commitment Aaron Kaminsky Presenting Chapter 6 of Distributed Systems, 2nd edition, 1993, ed. Mullender.
The Atomic Commit Problem. 2 The Problem Reaching a decision in a distributed environment Every participant: has an opinion can veto.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
1 Distributed Databases CS347 Lecture 16 June 6, 2001.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
1 CS 194: Distributed Systems Distributed Commit, Recovery Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering.
CS 603 Three-Phase Commit February 22, Centralized vs. Decentralized Protocols What if we don’t want a coordinator? Decentralized: –Each site broadcasts.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Distributed Transactions Transaction may access data at several sites. Each site has a local.
Failure and Availibility DS Lecture # 12. Optimistic Replication Let everyone make changes –Only 3 % transactions ever abort Make changes, send updates.
1 ICS 214B: Transaction Processing and Distributed Data Management Distributed Database Systems.
Distributed Commit. Example Consider a chain of stores and suppose a manager – wants to query all the stores, – find the inventory of toothbrushes at.
CMPT 401 Summer 2007 Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
Commit Protocols. CS5204 – Operating Systems2 Fault Tolerance Causes of failure: process failure machine failure network failure Goals : transparent:
CS162 Section Lecture 10 Slides based from Lecture and
III. Current Trends: 2 - Distributed DBMSsSlide 1/47 III. Current Trends Distributed DBMSs: Advanced Concepts 3C13/D63C13/D6.
Distributed Transactions March 15, Transactions What is a Distributed Transaction?  A transaction that involves more than one server  Network.
Distributed Txn Management, 2003Lecture 3 / Distributed Transaction Management – 2003 Jyrki Nummenmaa
Chapter 19 Recovery and Fault Tolerance Copyright © 2008.
Distributed Transactions Chapter 13
Distributed Txn Management, 2003Lecture 4 / Distributed Transaction Management – 2003 Jyrki Nummenmaa
EEC 688/788 Secure and Dependable Computing Lecture 7 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Operating Systems Distributed Coordination. Topics –Event Ordering –Mutual Exclusion –Atomicity –Concurrency Control Topics –Event Ordering –Mutual Exclusion.
1 Distributed and Replicated Data Seif Haridi. 2 Distributed and Replicated Data Purpose –Increase performance (parallel processing) –Increase safety.
Distributed Transaction Management, Fall 2002Lecture Distributed Commit Protocols Jyrki Nummenmaa
Fault Tolerance CSCI 4780/6780. Distributed Commit Commit – Making an operation permanent Transactions in databases One phase commit does not work !!!
University of Tampere, CS Department Distributed Commit.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 8 Fault.
XA Transactions.
Commit Algorithms Hamid Al-Hamadi CS 5204 November 17, 2009.
More on Fault Tolerance Chapter 7. Topics Group Communication Virtual Synchrony Atomic Commit Checkpointing, Logging, Recovery.
Distributed Transactions Chapter – Vidya Satyanarayanan.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 8 Fault.
Concurrency Control and Reliable Commit Protocol in Distributed Database Systems Jian Jia Chen 2002/05/09 Real-time and Embedded System Lab., CSIE, National.
Committed:Effects are installed to the database. Aborted:Does not execute to completion and any partial effects on database are erased. Consistent state:
Two-Phase Commit Brad Karp UCL Computer Science CS GZ03 / M th October, 2008.
Revisiting failure detectors Some of you asked questions about implementing consensus using S - how does it differ from reaching consensus using P. Here.
Multi-phase Commit Protocols1 Based on slides by Ken Birman, Cornell University.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Topics in Distributed Databases Database System Implementation CSE 507 Some slides adapted from Navathe et. Al and Silberchatz et. Al.
Chapter 8 – Fault Tolerance Section 8.5 Distributed Commit Heta Desai Dr. Yanqing Zhang Csc Advanced Operating Systems October 14 th, 2015.
Distributed Databases – Advanced Concepts Chapter 25 in Textbook.
More on Fault Tolerance
Chapter 19: Distributed Databases
Outline Introduction Background Distributed DBMS Architecture
Database System Implementation CSE 507
Two phase commit.
Commit Protocols CS60002: Distributed Systems
Outline Introduction Background Distributed DBMS Architecture
Outline Announcements Fault Tolerance.
Outline Introduction Background Distributed DBMS Architecture
Assignment 5 - Solution Problem 1
Distributed Transactions
Distributed Databases Recovery
Chapter 19: Distributed Transaction Recovery
CIS 720 Concurrency Control.
Presentation transcript:

CS 603 Handling Failure in Commit February 20, 2002

Two-Phase Commit (Lamport ’76, Gray ’79) Central coordinator initiates protocol –Phase 1: Coordinator asks if participants can commit Participants respond yes/no –Phase 2: If all votes yes, coordinator sends Commit Participants respond when done Blocks on failure –Participants must replace coordinator –If participant and coordinator fail, wait for recovery While blocked, transaction must remain Isolated –Prevents other transactions from completing

Formal Recovery Models (Skeen & Stonebraker ’83) Formal models for commit protocols –Transaction state –Failure Protocol types –Commit –Termination –Recovery Necessary Conditions for non-blocking –Leads to Three-Phase Protocol

Background Transaction has “commit point” –Failure after  transaction visible –Failure before  abort as part of recovery Protocol needed to ensure commit/abort decision unanimous –Any site can abort before first site commits –No site can abort after first site commits Non-Blocking Protocol: Failure doesn’t cause operational sites to suspend

Transaction Model Network: –Point to point communication –Maximum delay T or timeout If timeout, sender can assume recipient of network failure If network failure, sender doesn’t know if message received Each site can be viewed as Finite State Automaton –Sites have three state “classes”: Initial: Allowed to abort the transaction Abort: Can’t transition to non-abort Commit: Can’t transition to non-commit –Nondeterministic with respect to protocol, i.e., State transitions may occur independent of protocol due to local actions –State diagram is acyclic (assures termination) –Site transitions asynchronous –State transitions atomic

State Transition Graph for Two-Phase Commit c1c1 a1a1 c2c2 a2a2 q1q1 w1w1 p2p2 q2q2 xact request/ start xact no/ - start xact/ no start xact/ yes commit/ - abort/ - yes/ abort yes/ commit CoordinatorParticipant

Transaction Model Global Transaction State –Vector of local states –Outstanding messages in the network –Final if all local states in final state –Inconsistent if vector contains both commit and abort states –Global state transition occurs when local state transition occurs Reachable State Graph –Possible global transitions –Protocol correct if and only if reachable state graph has: No inconsistent states All terminal states are final states Local states potentially concurrent if a reachable global state contains both local states –Concurrency set C(s) is all states potentially concurrent with s Sender set S(s) = {local states t | t sends m and s can receive m}

Reachable Global State Graph for Two-Phase Commit Leaf nodes are terminal states –All contain only final states No nodes have both “abort” and “commit” –Protocol consistent Therefore 2-phase commit is operationally correct

Failure Models Site failure assumed when expected message not received in time –Modeled as “failure transition” Assumption: A site knows it has failed –Allow multiple failure transitions from a state Different types of failure Independent Recovery –Transition directly to final state without communication –Paper discusses only two site case

Single Site Failure Lemma 1: If a protocol contains a local state s with both abort and commit in C(s), cannot independently recover –C(s) has both abort and commit –s cannot fail to either abort or commit 2PC: C(p 2 ) has both commit and abort! Rule 1: If C(s) has commit, insert failure transition from s to commit, else insert failure from s to abort

Modified 2PC with Rule 1 c1c1 a1a1 c2c2 a2a2 q1q1 w1w1 p2p2 q2q2 xact request/ start xact no/ - start xact/ no start xact/ yes commit/ done abort/ - yes/ abort yes/ commit CoordinatorParticipant p1p1 done/ -

Handling Timeout Rule 2: For each intermediate state s –if t is in S(s) and t had a failure transition to a commit (abort) state, then –assign a timeout transition from s to a commit (abort) state Assumption: Failed state will independently recover –Rule 1 forces transition to commit / abort –Rule 2 forces “live” transaction to do same

Modified 2PC with Rules 1,2 c1c1 a1a1 c2c2 a2a2 q1q1 w1w1 p2p2 q2q2 xact request/ start xact no/ - start xact/ no start xact/ yes commit/ done abort/ - yes/ abort yes/ commit CoordinatorParticipant p1p1 done/ -

Theorem: Sufficient Conditions for Handling Failure Rules 1 and 2 are sufficient for designing protocols resilient to a single site failure Proof: Let P be protocol s.t. there is no s where C(s) contains commit and abort –P’ is P modified by Rules 1 and 2 –Site 1 fails in state s 1 when Site 2 in s 2 Transitions to f 1 inconsistent with f 2 Case 1: Site t 2 in final state f 2 –Implies f 2 in C(s 1 ) – violates Rule 1 Case 2: Site 2 in nonfinal state, timeouts to f 2 –Implies s 1  S(s 2 ) – violates Rule 2

Two site failure Theorem: No protocol using independent recovery resilient to arbitrary two site failures –Holds only if failures concurrent (both sites fail without knowing other has failed) Proof: Assume path in global state graph G 0, …, G m, all sites recover to abort from G 0, to commit from G m Let G k be first state where first site j recovers to commit –j recovers to abort in G k-1 –j was only site to transition between G k and G k-1 –All other sites will recover same in G k and G k-1 –So either G k or G k-1 inconsistent if j and another site fail Key: No non-blocking recovery –Doesn’t mean operational sites have to block