Presentation is loading. Please wait.

Presentation is loading. Please wait.

CSS434 Distributed Transactions and Replication

Similar presentations


Presentation on theme: "CSS434 Distributed Transactions and Replication"— Presentation transcript:

1 CSS434 Distributed Transactions and Replication
Textbook Ch Professor: Munehiro Fukuda CSS434 Replication

2 Outline Distributed transaction File replication
Two-phase commitment protocol File replication Group communication revisited Primary copy replication Active replication Read-any-write-all protocol Available copy protocol Quorum-based protocol CSS434 Replication

3 Distributed Transaction Example: Banking Transaction
. BranchZ BranchX participant C D Client BranchY B A join T a.withdraw(4); c.deposit(4); b.withdraw(3); d.deposit(3); openTransaction b.withdraw(T, 3); closeTransaction T = Note: the coordinator is in one of the servers, e.g. BranchX CSS434 Replication

4 Transaction Commitment
How can all participant servers either commit a transaction or abort it? One-phase atomic commit protocol The coordinator keep requesting all participants to commit until they return an acknowledgment. No chance of a participant to initiate an abort. Two-phase commit protocol Phase 1: calls for participants’ vote. Phase 2: Complete a commit or an abort according to output of vet. CSS434 Replication

5 Two-Phase Commit Protocol operations
canCommit?(trans)-> Yes / No Call from coordinator to participant to ask whether it can commit a transaction. Participant replies with its vote. doCommit(trans) Call from coordinator to participant to tell participant to commit its part of a transaction. doAbort(trans) Call from coordinator to participant to tell participant to abort its part of a transaction. haveCommitted(trans, participant) Call from participant to coordinator to confirm that it has committed the transaction. getDecision(trans) -> Yes / No Call from participant to coordinator to ask for the decision on a transaction after it has voted Yes but has still had no reply after some delay. Used to recover from server crash or delayed messages. CSS434 Replication

6 Two-Phase Commit Protocol Communication
canCommit? Yes doCommit haveCommitted Coordinator 1 3 (waiting for votes) committed done prepared to commit step Participant 2 4 (uncertain) status CSS434 Replication

7 Two-Phase Commit Protocol State Transition
INIT WAIT ABORT COMMIT Client_wants_to_commit CanCommit? Vote-No doAbort Vote-Yes doCommit Coordinator INIT READY ABORT COMMIT CanCommit? Vote-Yes doCommit Ack doAbort Vote-No Worker 1 INIT READY ABORT COMMIT CanCommit? Vote-Yes doCommit Ack doAbort Vote-No Worker 2 Another possible cases: The coordinator didn’t receive all vote-Yes. → Time out and send a doAbort. A worker didn’t receive a CanCommit?. → All workers eventually receive a doAbort. A worker didn’t receive a doCommit. → Time out and check the other work’s status. CSS434 Replication

8 File Replication Concepts
Difference between replication and caching A replica is associated with a server, whereas a cache with client. A replicate focuses on availability, while a cache on locality A replicate is more persistent than a cache is A cache is contingent upon a replica Advantages Increased availability/reliability Performance enhancement (response time and network traffic) Scalability and autonomous operation Requirements Naming: no need to be aware of multiple replicas. Consistency: data consistency among replicated files. Replication control: explicit v.s. implicit/lazy replication ACID: Atomicity, Consistency, Isolation, and Durability CSS434 Replication

9 File Replication Basic Architectural Model
Request: send a client request to a server. Coordination: deliver the request to each replica manger in some order. Execution: process a client request but not permanently commit it. Agreement: agree if the execution will be committed (ex. Two-phase commit protocol) Response: respond to the front end Client Replica Manger Front End Replica Manger Client Front End Replica Manger Ex: DNS Web server CSS434 Replication

10 Review: Group Communication
Group membership service Create and destroy a group. Add or withdraw a replica manager to/from a group. Detect a failure. Notify members of group membership changes. Provide clients with a group address. Message delivery Absolute ordering Consistent ordering Replica Manger Replica Manger Client Replica Manger Replica Manger group CSS434 Replication

11 Review: Group Communication Example: ISIS
Group view multicast p1 Joins the group multicast p2 crashed rejoins multicast p3 p4 Deleted or delivered? Multicast to available processes In ISIS, if P4 receives this partially multicast message at the same time when it knows p3 has been crashed, it forwards it to all the others and immediately sends a flush message. In other words, P1, P2, and P4 receive this multicast message as if P3 was still alive. CSS434 Replication

12 Review: Group Communication Absolute Ordering - Linearizability
Rule: Mi must be delivered before mj if Ti < Tj Implementation: A clock synchronized among machines A sliding time window used to commit message delivery whose timestamp is in this window. Example: Distributed simulation Drawback Too strict constraint No absolute synchronized clock No guarantee to catch all tardy messages Ti < Tj Ti mi Tj mi mj mj CSS434 Replication

13 Review: Group Communication Total Ordering - Sequential Consistency
Rule: Messages received in the same order (regardless of their timestamp). Implementation: A message sent to a sequencer, assigned a sequence number, and finally multicast to receivers A message retrieved in incremental order at a receiver Example: Replicated database update Drawback: A centralized algorithm Ti < Tj Ti Tj mj mj mi mi CSS434 Replication

14 Multi-copy Update Problem
Keep in mind the basic architecture and group communication models, how can we update multiple copies over replica servers? Read-only replication Allow the replication of only immutable files. Primary backup replication Designate one copy as the primary copy and all the others as secondary copies. Active backup replication Access any or all of replicas Read-any-write-all protocol Available-copies protocol Quorum-based consensus CSS434 Replication

15 Primary-Copy Replication
Request: The front end sends a request to the primary replica. Coordination:. The primary takes the request atomically. Execution: The primary executes and stores the results. Agreement: The primary sends the updates to all the backups and receives an ask from them. Response: reply to the front end. Advantage: an easy implementation, linearizable, coping with n-1 crashes. Disadvantage: large overhead especially if the failing primary must be replaced with a backup. Client Replica Manger Front End Primary Backup Replica Manger Client Front End Replica Manger Backup Ex: Sun NIS (Yellow Page) CSS434 Replication

16 Active Replication Request: The front end multicasts to all replicas.
Coordination:. All replica take the request in the sequential order. Execution: Every replica executes the request. Agreement: No agreement needed. Response: Each replies to the front. Advantage: achieve sequential consistency, cope with (n/2 – 1) byzantine failures Disadvantage: no more linearizable Client Replica Manger Front End Replica Manger Client Front End Replica Manger CSS434 Replication

17 Read-Any-Write-All Protocol
Read from any one of them Read Lock any one of replicas for a read Write Lock all of replicas for a write Sequential consistency Intolerable for even 1 failing replica upon a write. Client Replica Manger Front End Write to all of them Replica Manger Client Front End Replica Manger Replica Manger CSS434 Replication

18 Available-Copies Protocol
Read Lock any one of replicas for a read Write Lock all available replicas for a write Recovering replica Bring itself up to date by coping from other servers before accepting any user request. Better availability Cannot cope with network partition. (Inconsistency in two sub-divided network groups) Read from any one of them Client Replica Manger Front End Write to all available replicats Replica Manger X Client Front End Replica Manger Replica Manger CSS434 Replication

19 Available Copies Protocol Example 1: Gossip
If (Tj > Tk) update RMk else discard the gossip message Categorized in lazy available copies protocol Tardy messages are ignored RMk Gossip RMi (Ti) RMj (Tj) Update, Tf Query, Tf Value, Ti Update id If (Tf < Ti) return value else { waits for RMi to be updated or query RMj/RMk} If (Tf > Tj) update RMj else { update Client or ignore and update RMj} FE (Tf) FE Query Value Update Client Client CSS434 Replication

20 Available Copies Protocol Example 2: Bayou
Committed Tentative Categorized in lazy available copies protocol Tardy messages are reordered or merged. Primary RM FE Client Tn T3 Secretary and other employees: book 3pm Executive: book 3pm Sent first Sent later T1 T0 C0 C1 C2 CN T0 T1 T2 T3 Tn Tn+1 To make a tentative update committed: Perform a dependency check Check conflicts Check priority Merge Procedure Cancel tentative updates Change tentative updates CSS434 Replication

21 Network Partitions Well-known Solution: Quorum-Based Protocols
#replicas in read quorum + #replicas in write quorum > n Read Retrieve the read quorum Select the one with the latest version. Perform a read on it Write Retrieve the write quorum. Find the latest version and increment it. Perform a write on the entire write quorum. If a sufficient number of replicas from read/write quorum, the operation must be aborted. Client Replica Manger Front End Read quorum Write quorum Read-any-write-all: r = 1, w = n CSS434 Replication

22 Network Partitions System example: Coda
Normal case: Read-any, write-all protocol Whenever a client writes back its file, it increments the file version at each server. Version[1,1,1] W Version[2,2,2] Network disconnection: A client writes back its file to only available servers. Version conflicts are detected and resolved automatically when network is reconnected W Version[2,2,3] Version[3,3,2] Client disconnection: A client caches as many files as possible (in hoard walking). A client works in local if disconnected (in emulation mode). A client writes back updated files to servers (in reintegration mode). hoard reintegration emulation Server 2 Server 3 Server 1 CSS434 Replication

23 Paper Review by Students
ISIS System Gossip Architecture Bayou System Coda Discussions What if a message is lost in ISIS group communication? What if another crash occurs when unstable/flush messages are exchanged? What performance drawbacks does Gossip have? What problems remain to users in Bayou? Why doesn’t Coda use read/write quorum? CSS434 Replication

24 Non-Turn-In Exercises
The following state transition diagram describes the two-phase commitment protocol. Let’s assume that worker1 crashed when a coordinate sent a commit message. Trace this diagram. To be specific, make appropriate dashed arrows “thick and solid arrows” with your pen or pencil. INIT WAIT ABORT COMMIT Client_wants_to_commit CanCommit? Vote-No doAbort Vote-Yes doCommit Coordinator INIT READY ABORT COMMIT CanCommit? Vote-Yes doCommit Ack doAbort Vote-No Worker 1 INIT READY ABORT COMMIT CanCommit? Vote-Yes doCommit Ack doAbort Vote-No Worker 2 CSS434 Replication

25 Non-Turn-In Exercises
Textbook p762, Q17.1: In a decentralized variant of the two-phase commit protocol, the participants communicate directly with one another instead of indirectly via the coordinator. In phase 1, the coordinator sends its vote to all the participants. In phase 2, if the coordinator’s vote is No, the participants just abort the transaction; if it is Yes, each participant sends its vote to the coordinator and the other participants, each of which decides on the outcome according to the vote and carries it out. Calculate the number of messages and the number of rounds it takes. What are its advantages or disadvantages in comparison with the centralized variant? Textbook p816, Q18.10: Explain why allowing backups to process read operations directly, (i.e., without contacting a primary), leads to sequentially consistent rather than linearizable executions in a primary-copy replication. Textbook p816, Q18.11: Could the gossip architecture be used for a distributed computer game as describe below? The players move figures around a common scene. The state of the game is replicated at the players’ workstations and at a server, which contains services controlling the game overall, such as collision detection. Updates are multicast to all replicas. The quorum-based replication protocol can address network partition problems. Why didn’t Coda use this protocol? Explain the reason. What if a message is lost in ISIS group communication? Describe a solution. CSS434 Replication


Download ppt "CSS434 Distributed Transactions and Replication"

Similar presentations


Ads by Google