CSS434 Distributed Transactions and Replication

CSS434 Distributed Transactions and Replication
Textbook Ch 14 - 15
Professor: Munehiro Fukuda
CSS434 Replication

Outline
Distributed transactions
  Two-phase commit protocol
File replication
  Group communication revisited
  Primary-copy replication
  Active replication
  Read-any-write-all protocol
  Available-copies protocol
  Quorum-based protocol

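Distributed Transaction Example: Banking Transaction
The client runs the transaction
  T = openTransaction
    a.withdraw(4);
    c.deposit(4);
    b.withdraw(3);
    d.deposit(3);
  closeTransaction
Account A is held at BranchX, account B at BranchY, and accounts C and D at BranchZ. Each branch server the transaction touches becomes a participant and joins T; every operation carries the transaction identifier, e.g. b.withdraw(T, 3).
Note: the coordinator is in one of the servers, e.g. BranchX.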

Transaction Commitment
How can all participant servers either commit a transaction or abort it?
One-phase atomic commit protocol: the coordinator keeps asking all participants to commit until they all return an acknowledgment. A participant has no way to initiate an abort.
Two-phase commit protocol:
  Phase 1: the coordinator calls for the participants' votes.
  Phase 2: the coordinator completes a commit or an abort according to the outcome of the vote.

Two-Phase Commit Protocol Operations
canCommit?(trans) -> Yes / No: call from the coordinator to a participant asking whether it can commit a transaction. The participant replies with its vote.
doCommit(trans): call from the coordinator to a participant telling it to commit its part of a transaction.
doAbort(trans): call from the coordinator to a participant telling it to abort its part of a transaction.
haveCommitted(trans, participant): call from a participant to the coordinator confirming that it has committed the transaction.
getDecision(trans) -> Yes / No: call from a participant to the coordinator asking for the decision on a transaction after it has voted Yes but has still received no reply after some delay. Used to recover from server crashes or delayed messages.
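The operations above can be sketched as plain method calls. This is a minimal, single-process sketch, not a networked implementation: the class names, the `willing` flag, and the in-memory decision table are hypothetical, and haveCommitted is invoked synchronously by the coordinator for brevity, whereas in the protocol the participant sends it.

```python
from typing import Optional


class Participant:
    """Hypothetical participant server holding its part of a transaction."""

    def __init__(self, name: str, willing: bool = True):
        self.name = name
        self.willing = willing  # whether this server is able to commit its part
        self.state = "INIT"

    def can_commit(self, trans: str) -> bool:
        # Phase 1 vote. A real participant saves its changes to stable storage
        # before voting Yes; a No vote lets it abort its part right away.
        self.state = "READY" if self.willing else "ABORT"
        return self.willing

    def do_commit(self, trans: str) -> None:
        self.state = "COMMIT"

    def do_abort(self, trans: str) -> None:
        self.state = "ABORT"


class Coordinator:
    def __init__(self):
        self.decisions: dict[str, bool] = {}  # would live on stable storage
        self.committed_by: dict[str, set[str]] = {}

    def run(self, trans: str, participants: list[Participant]) -> bool:
        # Phase 1: send canCommit? to every participant and collect the votes.
        votes = [p.can_commit(trans) for p in participants]
        decision = all(votes)
        self.decisions[trans] = decision  # record the outcome before phase 2
        # Phase 2: send doCommit or doAbort to every participant.
        for p in participants:
            if decision:
                p.do_commit(trans)
                self.have_committed(trans, p.name)
            else:
                p.do_abort(trans)
        return decision

    def have_committed(self, trans: str, participant: str) -> None:
        # haveCommitted: once all participants confirm, the coordinator may forget T.
        self.committed_by.setdefault(trans, set()).add(participant)

    def get_decision(self, trans: str) -> Optional[bool]:
        # getDecision: asked by an uncertain participant; None means "not decided yet".
        return self.decisions.get(trans)
```

A single No vote is enough to abort the whole transaction, which is exactly what the one-phase protocol cannot express.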

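Two-Phase Commit Protocol Communication
Step 1 (coordinator, status: prepared to commit, waiting for votes): sends canCommit?
Step 2 (participant, status: prepared to commit, uncertain): replies Yes.
Step 3 (coordinator, status: committed): sends doCommit.
Step 4 (participant, status: committed): sends haveCommitted; the coordinator is then done.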

Two-Phase Commit Protocol State Transition
Coordinator (states INIT, WAIT, ABORT, COMMIT):
  INIT -> WAIT on Client_wants_to_commit, sending CanCommit?
  WAIT -> ABORT on a Vote-No, sending doAbort.
  WAIT -> COMMIT on Vote-Yes from all workers, sending doCommit.
Worker 1 and Worker 2 (identical; states INIT, READY, ABORT, COMMIT):
  INIT -> READY on CanCommit?, replying Vote-Yes.
  INIT -> ABORT on CanCommit?, replying Vote-No.
  READY -> COMMIT on doCommit, replying Ack.
  READY -> ABORT on doAbort.
Other possible cases:
  The coordinator didn't receive Vote-Yes from every worker. -> It times out and sends a doAbort.
  A worker didn't receive a CanCommit?. -> All workers eventually receive a doAbort.
  A worker didn't receive a doCommit. -> It times out and checks the other workers' status.
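The transitions and timeout rules above fit naturally into lookup tables keyed by (state, event). This is a sketch under assumptions: the event names such as `timeout` and `all_votes_yes` are hypothetical labels, not part of the slide, and message sending is represented only by an action string.

```python
# (state, event) -> (next state, action). Event names are hypothetical labels.
COORDINATOR = {
    ("INIT", "client_wants_to_commit"): ("WAIT", "send CanCommit? to all workers"),
    ("WAIT", "vote_no"): ("ABORT", "send doAbort"),
    ("WAIT", "timeout"): ("ABORT", "send doAbort"),  # not all Yes votes arrived
    ("WAIT", "all_votes_yes"): ("COMMIT", "send doCommit"),
}

WORKER = {
    ("INIT", "can_commit_willing"): ("READY", "reply Vote-Yes"),
    ("INIT", "can_commit_unwilling"): ("ABORT", "reply Vote-No"),
    ("INIT", "do_abort"): ("ABORT", "abort local part"),  # CanCommit? never arrived
    ("READY", "do_commit"): ("COMMIT", "send Ack"),
    ("READY", "do_abort"): ("ABORT", "abort local part"),
    # An uncertain worker must not decide alone: it stays READY and asks others.
    ("READY", "timeout"): ("READY", "ask the coordinator or other workers for the decision"),
}


def step(table: dict, state: str, event: str) -> tuple[str, str]:
    """Return (next_state, action), or raise if the event is illegal in this state."""
    try:
        return table[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event!r}") from None


def run(table: dict, events: list[str], state: str = "INIT") -> str:
    """Replay a sequence of events and return the final state."""
    for event in events:
        state, _ = step(table, state, event)
    return state
```

The key design point is the READY/timeout entry: after voting Yes, a worker cannot move to ABORT or COMMIT on its own and can only wait or ask.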

File Replication Concepts
Difference between replication and caching:
  A replica is associated with a server, whereas a cache is associated with a client.
  A replica focuses on availability, while a cache focuses on locality.
  A replica is more persistent than a cache.
  A cache is contingent upon a replica.
Advantages:
  Increased availability and reliability
  Performance enhancement (response time and network traffic)
  Scalability and autonomous operation
Requirements:
  Naming: clients need not be aware of multiple replicas.
  Consistency: data must be consistent among replicated files.
  Replication control: explicit vs. implicit (lazy) replication.
  ACID: Atomicity, Consistency, Isolation, and Durability.

File Replication Basic Architectural Model
Request: the front end sends a client request to a replica manager.
Coordination: the request is delivered to each replica manager in some order.
Execution: each replica manager processes the request but does not yet commit it permanently.
Agreement: the replica managers agree on whether the execution will be committed (e.g. with the two-phase commit protocol).
Response: a reply is returned to the front end.
(Figure: clients, each behind a front end, connected to several replica managers. Examples: DNS, web servers.)

Review: Group Communication
Group membership service:
  Create and destroy a group.
  Add a replica manager to, or withdraw one from, a group.
  Detect failures.
  Notify members of group membership changes.
  Provide clients with a group address.
Message delivery:
  Absolute ordering
  Consistent ordering
(Figure: a client communicating with a group of replica managers.)

Review: Group Communication Example: ISIS
(Figure: processes p1 to p4 multicasting as the group view changes; a process joins the group, a member crashes partway through a multicast and later rejoins. Should the partially delivered multicast be deleted or delivered? ISIS multicasts to the available processes.)
In ISIS, if P4 receives this partially multicast message at the same time as it learns that P3 has crashed, it forwards the message to all the others and immediately sends a flush message. In other words, P1, P2, and P4 receive this multicast message as if P3 were still alive.

Review: Group Communication Absolute Ordering - Linearizability
Rule: mi must be delivered before mj if Ti < Tj.
Implementation:
  A clock synchronized among all machines.
  A sliding time window: delivery is committed for messages whose timestamps fall within the window.
Example: distributed simulation.
Drawbacks:
  The constraint is too strict.
  No clock is absolutely synchronized.
  There is no guarantee of catching all tardy messages.
(Figure: mi with timestamp Ti and mj with timestamp Tj, where Ti < Tj, delivered to every receiver with mi first.)
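One way to picture the sliding window is a receiver that holds each message back for `window` time units and then delivers in timestamp order. This is a minimal sketch assuming perfectly synchronized clocks and a fixed window length, both hypothetical simplifications; it also shows the tardy-message drawback, since a message that arrives after a later one was delivered can no longer be placed correctly.

```python
import heapq


class AbsoluteOrderReceiver:
    """Delivers messages in global timestamp order once they are older than
    `window` time units (assumes synchronized clocks; window is hypothetical)."""

    def __init__(self, window: float):
        self.window = window
        self.pending: list[tuple[float, str]] = []  # min-heap by timestamp
        self.last_delivered = float("-inf")
        self.late: list[tuple[float, str]] = []  # tardy messages that broke the order

    def receive(self, ts: float, msg: str) -> None:
        if ts < self.last_delivered:
            # A later message was already delivered: absolute order is lost.
            self.late.append((ts, msg))
        else:
            heapq.heappush(self.pending, (ts, msg))

    def deliver(self, now: float) -> list[str]:
        # Commit delivery of every held-back message older than the window.
        out = []
        while self.pending and self.pending[0][0] <= now - self.window:
            ts, msg = heapq.heappop(self.pending)
            self.last_delivered = ts
            out.append(msg)
        return out
```

A larger window catches more tardy messages at the cost of delivery latency, which is why no finite window can guarantee catching them all.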

Review: Group Communication Total Ordering - Sequential Consistency
Rule: all receivers receive messages in the same order (regardless of their timestamps).
Implementation:
  A message is sent to a sequencer, assigned a sequence number, and then multicast to the receivers.
  Each receiver retrieves messages in increasing sequence-number order.
Example: replicated database updates.
Drawback: a centralized algorithm.
(Figure: mi and mj, with Ti < Tj, delivered with mj first, but in the same order at every receiver.)
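The sequencer-based implementation above can be sketched as follows. This is a minimal, single-process sketch: the `Sequencer` and `TotalOrderReceiver` names are hypothetical, and the network multicast is replaced by calling `receive` on each receiver, possibly in different arrival orders.

```python
import heapq
import itertools


class Sequencer:
    """Central sequencer: stamps each message with the next sequence number."""

    def __init__(self):
        self._counter = itertools.count(1)

    def assign(self, msg: str) -> tuple[int, str]:
        return next(self._counter), msg


class TotalOrderReceiver:
    """Holds back out-of-order messages and delivers them in sequence order."""

    def __init__(self):
        self.next_seq = 1
        self.holdback: list[tuple[int, str]] = []  # min-heap by sequence number
        self.delivered: list[str] = []

    def receive(self, seq: int, msg: str) -> None:
        heapq.heappush(self.holdback, (seq, msg))
        # Deliver every message whose turn has come, in increasing order.
        while self.holdback and self.holdback[0][0] == self.next_seq:
            _, m = heapq.heappop(self.holdback)
            self.delivered.append(m)
            self.next_seq += 1
```

Because only the sequencer's numbering matters, two receivers that see the messages arrive in different orders still deliver them identically, even if the numbering disagrees with the senders' timestamps.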

Multi-copy Update Problem
Keeping the basic architectural model and the group communication models in mind, how can we update multiple copies across replica servers?
  Read-only replication: allow only immutable files to be replicated.
  Primary-backup replication: designate one copy as the primary copy and all the others as secondary copies.
  Active-backup replication: access any or all of the replicas.
    Read-any-write-all protocol
    Available-copies protocol
    Quorum-based consensus

Primary-Copy Replication
Request: the front end sends a request to the primary replica manager.
Coordination: the primary takes each request atomically, in the order in which it receives them.
Execution: the primary executes the request and stores the result.
Agreement: the primary sends the update to all the backups and receives an acknowledgment from each of them.
Response: the primary replies to the front end.
Advantages: easy to implement, linearizable, and tolerates n-1 crashes.
Disadvantage: large overhead, especially when a failed primary must be replaced with a backup.
(Figure: front ends send requests to the primary replica manager, which propagates updates to the backup replica managers. Example: Sun NIS, formerly Yellow Pages.)
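The five steps can be followed in a small in-memory sketch. It assumes, hypothetically, that `managers[0]` is the primary, that failure detection is simply an `alive` flag, and that fail-over promotes the first surviving backup; a lock stands in for the primary taking requests atomically.

```python
import threading


class ReplicaManager:
    def __init__(self, name: str):
        self.name = name
        self.store: dict[str, object] = {}
        self.alive = True  # hypothetical failure flag

    def apply(self, key: str, value: object) -> bool:
        self.store[key] = value
        return True  # acknowledgment


class PrimaryCopyService:
    """managers[0] is the primary; the rest are backups (hypothetical layout)."""

    def __init__(self, managers: list[ReplicaManager]):
        self.managers = managers
        self._lock = threading.Lock()  # the primary handles one request at a time

    @property
    def primary(self) -> ReplicaManager:
        return self.managers[0]

    def write(self, key: str, value: object) -> bool:
        with self._lock:  # coordination: requests are processed atomically, in order
            self.primary.apply(key, value)  # execution at the primary
            # Agreement: send the update to every live backup and collect acks.
            acks = [b.apply(key, value) for b in self.managers[1:] if b.alive]
            return all(acks)  # response to the front end

    def read(self, key: str) -> object:
        with self._lock:
            return self.primary.store.get(key)

    def fail_over(self) -> None:
        # Drop crashed managers; the first survivor becomes the new primary.
        self.managers = [m for m in self.managers if m.alive]
```

Because every backup already holds each acknowledged update, a promoted backup can serve reads immediately, which is why the scheme survives up to n-1 crashes.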

Active Replication
Request: the front end multicasts the request to all replica managers.
Coordination: all replica managers process requests in the same (sequential) order.
Execution: every replica manager executes the request.
Agreement: no agreement is needed.
Response: each replica manager replies to the front end.
Advantages: achieves sequential consistency and copes with up to f Byzantine failures when there are n >= 2f + 1 replica managers, i.e. fewer than half of them.
Disadvantage: not linearizable.
(Figure: each front end multicasts to all replica managers.)
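To see how the front end copes with Byzantine replicas, here is a sketch in which every replica runs the same deterministic state machine and the front end accepts a reply once f + 1 replicas agree on it. It assumes, hypothetically, that a totally ordered multicast has already delivered the request to every replica (modeled as a plain loop), and that a faulty replica lies by adding 999 to its result.

```python
from collections import Counter


class Replica:
    def __init__(self, faulty: bool = False):
        self.state = 0
        self.faulty = faulty  # a Byzantine replica may return arbitrary results

    def execute(self, op: str, arg: int) -> int:
        # Deterministic state machine: same requests in the same order -> same state.
        if op == "add":
            self.state += arg
        return self.state + 999 if self.faulty else self.state


def active_request(replicas: list[Replica], op: str, arg: int) -> int:
    # Stand-in for a totally ordered multicast to all replicas.
    replies = [r.execute(op, arg) for r in replicas]
    f = (len(replicas) - 1) // 2  # tolerated Byzantine failures when n >= 2f + 1
    value, count = Counter(replies).most_common(1)[0]
    if count >= f + 1:  # at most f liars, so f + 1 matching replies include a correct one
        return value
    raise RuntimeError("fewer than f + 1 identical replies")
```

With three replicas and one liar, two correct replies always outvote the faulty one.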

Read-Any-Write-All Protocol
Read: lock any one of the replicas and read from it.
Write: lock all of the replicas and write to all of them.
Provides sequential consistency.
A write cannot proceed if even one replica has failed.
(Figure: a front end reads from any one replica manager and writes to all of them.)
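The protocol's availability weakness is easy to see in code. This is a minimal sketch assuming a single exclusive lock per replica instead of separate read and write locks, and an `up` flag as a hypothetical failure indicator; writers acquire locks in list order to avoid deadlocking one another.

```python
import random
import threading


class RM:
    def __init__(self):
        self.lock = threading.Lock()  # simplification: one exclusive lock per replica
        self.value = None
        self.up = True  # hypothetical failure flag


def read_any(rms: list[RM]):
    live = [rm for rm in rms if rm.up]
    if not live:
        raise RuntimeError("no replica available")
    rm = random.choice(live)  # lock any one replica for a read
    with rm.lock:
        return rm.value


def write_all(rms: list[RM], value) -> None:
    if not all(rm.up for rm in rms):
        raise RuntimeError("write rejected: every replica must be reachable")
    # Lock every replica in a fixed (list) order to avoid deadlock between writers.
    for rm in rms:
        rm.lock.acquire()
    try:
        for rm in rms:
            rm.value = value
    finally:
        for rm in reversed(rms):
            rm.lock.release()
```

Reads remain available while any replica is up, but a single crashed replica blocks every write.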

Available-Copies Protocol
Read: lock any one of the replicas and read from it.
Write: lock all available replicas and write to them.
Recovering replica: brings itself up to date by copying from other servers before accepting any user request.
Better availability than read-any-write-all.
Cannot cope with network partitions (the two sub-networks can become inconsistent).
(Figure: a front end reads from any one replica manager and writes to all available ones; one replica manager has failed.)
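A sketch of the write and recovery rules, with locking omitted for brevity. It assumes a per-replica `version` counter, a hypothetical addition used so that a recovering replica can tell which peer holds the newest state.

```python
class RM:
    def __init__(self):
        self.value = None
        self.version = 0  # assumed version counter used during recovery
        self.up = True  # hypothetical failure flag


def read_any(rms: list[RM]):
    live = [rm for rm in rms if rm.up]
    if not live:
        raise RuntimeError("no replica available")
    return live[0].value  # any available replica will do


def write_available(rms: list[RM], value) -> None:
    live = [rm for rm in rms if rm.up]
    if not live:
        raise RuntimeError("no replica available")
    version = max(rm.version for rm in live) + 1
    for rm in live:  # crashed replicas are simply skipped
        rm.value, rm.version = value, version


def recover(rm: RM, rms: list[RM]) -> None:
    # Before accepting requests, copy the newest state from a live peer.
    peers = [p for p in rms if p.up and p is not rm]
    if peers:
        latest = max(peers, key=lambda p: p.version)
        rm.value, rm.version = latest.value, latest.version
    rm.up = True
```

Note that nothing here stops two partitions from each treating the other side as "unavailable" and writing independently, which is exactly the partition problem the slide mentions.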

Available Copies Protocol Example 1: Gossip
Categorized as a lazy available-copies protocol; tardy messages are ignored.
Query: the front end (timestamp Tf) sends Query, Tf to RMi (timestamp Ti).
  If (Tf <= Ti) return the value with Ti
  else { wait for RMi to be updated, or query RMj/RMk }
Update: the front end sends Update, Tf to RMj (timestamp Tj), which returns an update id.
  If (Tf > Tj) update RMj
  else { update the client, or ignore and update RMj }
Gossip from RMj to RMk:
  If (Tj > Tk) update RMk
  else discard the gossip message
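The query and gossip rules can be sketched with vector timestamps in place of the scalar T values, an assumption borrowed from the gossip architecture as described by Coulouris et al. To keep the sketch short, the data is a hypothetical append-only set of posts, an update whose dependencies are missing is refused rather than held back, and logs are never garbage-collected.

```python
from typing import Optional


def leq(a: list[int], b: list[int]) -> bool:
    return all(x <= y for x, y in zip(a, b))


def merge(a: list[int], b: list[int]) -> list[int]:
    return [max(x, y) for x, y in zip(a, b)]


class GossipRM:
    def __init__(self, index: int, n: int):
        self.index = index
        self.ts = [0] * n  # entry j: number of updates from RM j seen here
        self.log: dict[tuple[int, int], str] = {}  # (origin RM, seq no.) -> post

    def update(self, prev: list[int], item: str) -> Optional[list[int]]:
        if not leq(prev, self.ts):
            return None  # this RM lacks updates the FE has seen; retry elsewhere
        self.ts[self.index] += 1
        self.log[(self.index, self.ts[self.index])] = item
        new_ts = list(prev)
        new_ts[self.index] = self.ts[self.index]
        return new_ts  # the update's timestamp, merged into the FE's own

    def query(self, prev: list[int]):
        if leq(prev, self.ts):  # RM has seen everything the FE has seen
            return sorted(self.log.values()), list(self.ts)
        return None  # stale: wait for gossip or ask another RM

    def receive_gossip(self, log: dict, ts: list[int]) -> None:
        if leq(ts, self.ts):
            return  # nothing new: discard the gossip message
        for uid, item in log.items():
            self.log.setdefault(uid, item)
        self.ts = merge(self.ts, ts)
```

The front end keeps its own timestamp and merges every returned timestamp into it, so it never reads a value older than one it has already seen.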

Available Copies Protocol Example 2: Bayou
Categorized as a lazy available-copies protocol; tardy messages are reordered or merged.
Each replica holds committed updates (C0, C1, C2, ..., CN) followed by tentative updates (T0, T1, T2, T3, ..., Tn, Tn+1); the primary RM decides the commit order.
Example: the secretary and other employees book 3pm (sent first); the executive books 3pm (sent later).
To make a tentative update committed:
  Perform a dependency check.
    Check conflicts.
    Check priority.
  Run the merge procedure.
    Cancel tentative updates.
    Change tentative updates.
(Figure: clients send updates through front ends to replica managers, one of which is the primary RM.)
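The committed/tentative split and the dependency check with a merge procedure can be sketched for a meeting calendar. Assumptions, all hypothetical: tentative writes are kept in arrival order rather than timestamp order, each write carries a list of alternative slots as its merge procedure, and priority checking is not modeled. The replica's view is recomputed by replaying committed writes and then tentative ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    wid: str  # who booked (also used as the write id)
    slot: str  # requested slot
    alternatives: tuple[str, ...] = ()  # hypothetical merge-procedure choices


def apply_write(calendar: dict[str, str], w: Write) -> None:
    # Dependency check: is the requested slot still free?
    if w.slot not in calendar:
        calendar[w.slot] = w.wid
        return
    # Merge procedure: change the booking to a free alternative ...
    for alt in w.alternatives:
        if alt not in calendar:
            calendar[alt] = w.wid
            return
    # ... or cancel it (the write becomes a no-op).


class BayouReplica:
    def __init__(self):
        self.committed: list[Write] = []  # order fixed by the primary RM
        self.tentative: list[Write] = []  # may still be reordered

    def receive(self, w: Write) -> None:
        self.tentative.append(w)

    def commit(self, wid: str) -> None:
        # The primary announced that write `wid` is the next one to commit.
        w = next(x for x in self.tentative if x.wid == wid)
        self.tentative.remove(w)
        self.committed.append(w)

    def view(self) -> dict[str, str]:
        # Replay committed writes, then tentative ones, from an empty calendar.
        calendar: dict[str, str] = {}
        for w in self.committed + self.tentative:
            apply_write(calendar, w)
        return calendar
```

For example, if the executive's booking arrives first it tentatively gets 3pm, but once the primary commits the secretary's booking first, replay moves the executive to the alternative slot. Tentative results shown to users can therefore change.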

Network Partitions
Well-known solution: quorum-based protocols, which require #replicas in read quorum + #replicas in write quorum > n.
Read:
  Retrieve the read quorum.
  Select the replica with the latest version.
  Perform the read on it.
Write:
  Retrieve the write quorum.
  Find the latest version and increment it.
  Perform the write on the entire write quorum.
If a sufficient number of replicas cannot be gathered for the read or write quorum, the operation must be aborted.
Read-any-write-all is the special case r = 1, w = n.
(Figure: a front end accessing overlapping read and write quorums of replica managers.)
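The read and write procedures can be sketched with version numbers on each copy. Besides r + w > n, the sketch also enforces w > n/2, the commonly added condition that any two write quorums overlap so that "find the latest version" in a write quorum is reliable. The `Copy` and `QuorumStore` names and the `up` flag are hypothetical; quorums are random subsets of the live copies.

```python
import random
from dataclasses import dataclass


@dataclass
class Copy:
    version: int = 0
    value: object = None
    up: bool = True  # hypothetical reachability flag


class QuorumStore:
    def __init__(self, n: int, r: int, w: int):
        if r + w <= n:
            raise ValueError("r + w must exceed n so read and write quorums overlap")
        if 2 * w <= n:
            raise ValueError("w must exceed n / 2 so write quorums overlap")
        self.copies = [Copy() for _ in range(n)]
        self.r, self.w = r, w

    def _quorum(self, size: int) -> list[Copy]:
        live = [c for c in self.copies if c.up]
        if len(live) < size:
            raise RuntimeError("quorum unavailable: operation aborted")
        return random.sample(live, size)  # any `size` reachable copies will do

    def read(self) -> object:
        quorum = self._quorum(self.r)
        return max(quorum, key=lambda c: c.version).value  # latest version wins

    def write(self, value: object) -> None:
        quorum = self._quorum(self.w)
        new_version = max(c.version for c in quorum) + 1
        for c in quorum:
            c.version, c.value = new_version, value
```

Because every read quorum intersects every write quorum, a read always sees the latest version even though the quorums are chosen at random. The minority side of a partition simply cannot form a write quorum.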

Network Partitions System Example: Coda
Normal case: read-any, write-all protocol. Whenever a client writes back its file, the file version is incremented at each server, e.g. Version[1,1,1] becomes Version[2,2,2] after a write (W).
Network disconnection: a client writes back its file only to the available servers. For example, writes made in two partitions can leave Version[2,2,3] on one side and Version[3,3,2] on the other. Such version conflicts are detected, and resolved automatically where possible, when the network is reconnected.
Client disconnection:
  A client caches as many files as possible (hoarding).
  A client works locally while disconnected (emulation).
  A client writes back updated files to the servers on reconnection (reintegration).
(Figure: a client moving among the hoarding, emulation, and reintegration states, connected to Servers 1, 2, and 3.)
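The conflict check on reconnection amounts to comparing version vectors (Coda calls them CVVs) element by element: one copy is newer only if it is at least as large in every position. A minimal sketch; the function name and the result strings are hypothetical.

```python
def compare(v1: list[int], v2: list[int]) -> str:
    """Compare two version vectors element by element."""
    ge = all(a >= b for a, b in zip(v1, v2))
    le = all(a <= b for a, b in zip(v1, v2))
    if ge and le:
        return "equal"
    if ge:
        return "first dominates"  # second copy is simply stale
    if le:
        return "second dominates"
    return "conflict"  # concurrent updates in different partitions
```

A dominated copy can be brought up to date by copying, while a conflict means each side missed the other's update and some form of resolution is required.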

Paper Review by Students
ISIS System
Gossip Architecture
Bayou System
Coda
Discussions:
  What if a message is lost in ISIS group communication?
  What if another crash occurs while unstable/flush messages are being exchanged?
  What performance drawbacks does Gossip have?
  What problems remain for users in Bayou?
  Why doesn't Coda use read/write quorums?

Non-Turn-In Exercises
The following state transition diagram describes the two-phase commit protocol. Assume that worker 1 crashed when the coordinator sent a commit message. Trace the diagram: specifically, redraw the appropriate dashed arrows as thick solid arrows with your pen or pencil.
(Diagram: the coordinator states INIT, WAIT, ABORT, and COMMIT, and the worker states INIT, READY, ABORT, and COMMIT for Worker 1 and Worker 2, with the same transitions as on the Two-Phase Commit Protocol State Transition slide.)

Non-Turn-In Exercises
Textbook p762, Q17.1: In a decentralized variant of the two-phase commit protocol, the participants communicate directly with one another instead of indirectly via the coordinator. In phase 1, the coordinator sends its vote to all the participants. In phase 2, if the coordinator's vote is No, the participants just abort the transaction; if it is Yes, each participant sends its vote to the coordinator and the other participants, each of which decides on the outcome according to the votes and carries it out. Calculate the number of messages and the number of rounds it takes. What are its advantages or disadvantages compared with the centralized variant?
Textbook p816, Q18.10: Explain why allowing backups to process read operations directly (i.e., without contacting the primary) leads to sequentially consistent rather than linearizable executions in primary-copy replication.
Textbook p816, Q18.11: Could the gossip architecture be used for the distributed computer game described below? The players move figures around a common scene. The state of the game is replicated at the players' workstations and at a server, which contains services controlling the game overall, such as collision detection. Updates are multicast to all replicas.
The quorum-based replication protocol can address network partition problems. Why didn't Coda use this protocol? Explain the reason.
What if a message is lost in ISIS group communication? Describe a solution.