Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 DISTRIBUTED SYSTEMS.

Presentation transcript:

DISTRIBUTED SYSTEMS: Principles and Paradigms, Second Edition
Andrew S. Tanenbaum, Maarten van Steen
Chapter 8/B: Fault Tolerance
Modified by Dr. Gheith Abandah

Overview
Introduction to Fault Tolerance
  Basic Concepts
  Failure Models
  Failure Masking by Redundancy
Process Resilience
  Design Issues
  Failure Masking and Replication
  Agreement in Faulty Systems
  Failure Detection
Reliable Client-Server Communication
  Point-to-Point Communication
  RPC Semantics in the Presence of Failures
Reliable Group Communication
  Basic Reliable-Multicasting Schemes
  Scalability in Reliable Multicasting
  Atomic Multicast
Distributed Commit
  Two-Phase Commit
  Three-Phase Commit
Recovery
  Checkpointing
  Message Logging
  Recovery-Oriented Computing

Reliable Client-Server Communication
Error detection
  Frame packets to allow for bit-error detection
  Use frame numbering to detect packet loss
Error correction
  Add enough redundancy that corrupted packets can be corrected automatically
  Request retransmission of lost packets, or of the last N packets
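
A minimal sketch of the error-detection ideas above, assuming a CRC-32 checksum for bit errors and a 4-byte frame number for loss detection. The frame layout and the function names (make_frame, parse_frame, missing_frames) are hypothetical, chosen only for illustration.

```python
import zlib

def make_frame(seq: int, payload: bytes) -> bytes:
    """Prefix a 4-byte frame number and append a 4-byte CRC-32 checksum."""
    body = seq.to_bytes(4, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")

def parse_frame(frame: bytes):
    """Return (seq, payload), or None if the checksum does not match."""
    body, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(body) != received_crc:
        return None  # bit error detected: drop the frame
    return int.from_bytes(body[:4], "big"), body[4:]

def missing_frames(received_seqs):
    """Frame numbers absent between the smallest and largest number seen."""
    seen = set(received_seqs)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]
```

The receiver would request retransmission of whatever missing_frames reports.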

Reliable RPC
Five different classes of failures can occur in RPC systems:
1. The client is unable to locate the server: Report back to the client.
2. The request message from the client to the server is lost: Resend the message (some number of retries).
3. The server crashes after receiving a request: Difficult, see next slide.
4. The reply message from the server to the client is lost: The client re-requests with the same sequence number.
5. The client crashes after sending a request: Use a new epoch number after the client reboots.
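
A small sketch of case 4 (lost reply), assuming the server caches replies by client and sequence number so that a re-request is answered without re-executing the operation. The classes, the doubling "operation", and the reply_lost loss pattern are all hypothetical stand-ins.

```python
class Server:
    def __init__(self):
        self.replies = {}     # (client_id, seq) -> cached reply
        self.executions = 0   # how often the operation really ran

    def handle(self, client_id, seq, arg):
        key = (client_id, seq)
        if key not in self.replies:       # first time this request is seen
            self.executions += 1
            self.replies[key] = arg * 2   # stand-in for the real operation
        return self.replies[key]          # duplicates get the cached reply

def call(server, client_id, seq, arg, reply_lost, max_retries=3):
    """reply_lost holds one boolean per attempt: True means the reply was lost."""
    for _, lost in zip(range(max_retries + 1), reply_lost):
        reply = server.handle(client_id, seq, arg)
        if not lost:
            return reply
    raise TimeoutError("no reply after retries")
```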

Server Crashes (1)
A server in client-server communication. (a) The normal case. (b) Crash after execution. (c) Crash before execution.

Server Crashes (2)
We need to decide what we expect from the server:
  At-least-once semantics: The server guarantees it will carry out an operation at least once, no matter what.
  At-most-once semantics: The server guarantees it will carry out an operation at most once.
Three events can happen at the server:
  Send the completion message (M)
  Print the text (P)
  Crash (C)

Server Crashes (3)
These events can occur in six different orderings:
1. M → P → C: A crash occurs after sending the completion message and printing the text.
2. M → C (→ P): A crash happens after sending the completion message, but before the text could be printed.
3. P → M → C: A crash occurs after printing the text and sending the completion message.
4. P → C (→ M): The text is printed, after which a crash occurs before the completion message could be sent.
5. C (→ P → M): A crash happens before the server could do anything.
6. C (→ M → P): A crash happens before the server could do anything.

Server Crashes (4)
Different combinations of client and server strategies in the presence of server crashes.
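
The combinations can be explored with a short sketch built on the events M, P, and C from the preceding slides. It assumes that a recovered server executes a reissued request exactly once, and that the client sees an acknowledgement only if M happened before the crash. The function name, the strategy labels, and the outcome labels (OK, DUP, ZERO) are illustrative choices.

```python
def outcome(server_order, crash_after, client_strategy):
    """
    server_order: "MP" (send completion message, then print) or "PM".
    crash_after: number of events (0, 1 or 2) completed before the crash.
    client_strategy: "always", "never", "when_acked" or "when_not_acked".
    Returns "OK" (printed once), "DUP" (printed twice) or "ZERO" (never printed).
    """
    done = server_order[:crash_after]
    printed = 1 if "P" in done else 0
    acked = "M" in done
    reissue = {
        "always": True,
        "never": False,
        "when_acked": acked,
        "when_not_acked": not acked,
    }[client_strategy]
    total = printed + (1 if reissue else 0)  # assumed: a reissue prints once
    return {0: "ZERO", 1: "OK", 2: "DUP"}[total]
```

Under these assumptions, every pairing of a client strategy with a server order has at least one crash point that does not yield OK.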


Reliable Multicasting
Observation
  If we can stick to a local-area network, reliable multicasting is “easy”.
Principle
  Let the sender log messages submitted to channel c:
  If P sends message m, m is stored in a history buffer.
  Each receiver acknowledges the receipt of m, or requests retransmission from P when it notices that a message was lost.
  Sender P removes m from the history buffer when everyone has acknowledged receipt.
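
The history-buffer principle can be sketched as follows, assuming a fixed, known set of receivers and sender-assigned sequence numbers. The Sender class and its method names are hypothetical.

```python
class Sender:
    def __init__(self, receivers):
        self.receivers = set(receivers)
        self.history = {}   # seq -> message, kept until everyone has acked
        self.acks = {}      # seq -> receivers that have acknowledged so far
        self.next_seq = 0

    def multicast(self, message):
        """Store the message in the history buffer and return its number."""
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = message
        self.acks[seq] = set()
        return seq

    def on_ack(self, receiver, seq):
        """Record an acknowledgement; drop m once all receivers have acked."""
        if seq not in self.history:
            return
        self.acks[seq].add(receiver)
        if self.acks[seq] == self.receivers:
            del self.history[seq]
            del self.acks[seq]

    def retransmit(self, seq):
        """Serve a retransmission request from the history buffer."""
        return self.history[seq]
```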

Scalable Reliable Multicasting: Feedback Suppression
Basic idea
  Let a process P suppress its own feedback when it notices that another process Q is already asking for a retransmission.
Assumptions
  All receivers listen to a common feedback channel to which feedback messages are submitted.
  Process P schedules its own feedback message randomly, and suppresses it when it observes another feedback message.

Basic Reliable-Multicasting Schemes
A simple solution to reliable multicasting when all receivers are known and are assumed not to fail. (a) Message transmission. (b) Reporting feedback.

Scalable Reliable Multicasting: Feedback Suppression
Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.
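
A simplified model of feedback suppression, assuming every receiver hears the first retransmission request after the same propagation delay. The function names and the uniform random timer are assumptions made for illustration, not part of any particular protocol specification.

```python
import random

def schedule(receivers, max_delay, seed=None):
    """Give each receiver a random delay before it would send its request."""
    rng = random.Random(seed)
    return {r: rng.uniform(0, max_delay) for r in receivers}

def feedback_senders(timers, propagation_delay):
    """
    timers: receiver -> delay before sending a retransmission request.
    A receiver suppresses its request if it hears another one first.
    Returns the receivers that actually send feedback, sorted by name.
    """
    first = min(timers.values())
    heard_at = first + propagation_delay  # when the first request is heard
    return sorted(r for r, t in timers.items() if t == first or t < heard_at)
```

With a long propagation delay relative to the timers, more receivers fire before they can hear the first request, so suppression becomes less effective.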

Scalable Reliable Multicasting: Hierarchical Solutions
The essence of hierarchical reliable multicasting. Each local coordinator forwards the message to its children and later handles retransmission requests.

Atomic Multicast (1)
Basic model
  We have a multicast channel c with two (possibly overlapping) groups:
    The sender group SND(c) of processes that submit messages to channel c
    The receiver group RCV(c) of processes that can receive messages from channel c
Simple reliability: If process P ∈ RCV(c) at the time message m was submitted to c, and P does not leave RCV(c), m should be delivered to P.
Atomic multicast: How can we ensure that a message m submitted to channel c is delivered to process P ∈ RCV(c) only if m is delivered to all members of RCV(c)?

Atomic Multicast (2)
Reliable multicasting in the presence of process failures, in terms of process groups and changes to group membership.
A message is delivered only to the non-faulty members of the current group.
All members should agree on the current group membership → virtually synchronous multicast.

Message Ordering (1)
Four different orderings are distinguished:
  Unordered multicasts
  FIFO-ordered multicasts
  Causally-ordered multicasts
  Totally-ordered multicasts

Message Ordering (2)
Unordered: Three communicating processes in the same group. The ordering of events per process is shown along the vertical axis.

Message Ordering (3)
FIFO-ordered from one process: Four processes in the same group with two different senders, and a possible delivery order of messages under FIFO-ordered multicasting.
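
FIFO-ordered delivery can be sketched with a per-sender hold-back queue, assuming each sender numbers its messages from 0. The FifoReceiver class is hypothetical. Messages from different senders may still interleave in any order.

```python
from collections import defaultdict

class FifoReceiver:
    def __init__(self):
        self.expected = defaultdict(int)  # sender -> next number to deliver
        self.held = defaultdict(dict)     # sender -> {seq: message} held back
        self.delivered = []               # delivery order seen by the application

    def receive(self, sender, seq, message):
        """Hold the message back until all earlier ones from its sender arrived."""
        self.held[sender][seq] = message
        while self.expected[sender] in self.held[sender]:
            msg = self.held[sender].pop(self.expected[sender])
            self.delivered.append((sender, msg))
            self.expected[sender] += 1
```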

Message Ordering (4)
Six different versions of virtually synchronous reliable multicasting.

Implementing Virtual Synchrony (1)
Example from Isis (uses TCP for reliable point-to-point communication): (a) Process 4 notices that process 7 has crashed and sends a view change.

Implementing Virtual Synchrony (2)
(b) Process 6 sends out all its unstable messages (to ensure that all processes have received all messages from the previous view), followed by a flush message to enforce the new view.

Implementing Virtual Synchrony (3)
(c) Process 6 installs the new view when it has received a flush message from everyone else.
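
The three steps (a) to (c) can be sketched from the point of view of a single group member. This is a rough model under simplifying assumptions: no further failures occur during the view change, and the Member class, its fields, and the message tuples are hypothetical.

```python
class Member:
    def __init__(self, pid, view, unstable=()):
        self.pid = pid
        self.view = frozenset(view)      # currently installed view
        self.unstable = list(unstable)   # messages not yet known to be received by all
        self.pending = None              # proposed view awaiting installation
        self.flushed = set()             # members whose flush has arrived

    def on_view_change(self, new_view):
        """Return what to multicast: unstable messages first, then a flush."""
        self.pending = frozenset(new_view)
        self.flushed = set()
        out = [("msg", m) for m in self.unstable] + [("flush", self.pid)]
        self.unstable = []
        return out

    def on_flush(self, sender):
        """Install the pending view once everyone else in it has flushed."""
        if self.pending is None:
            return False
        self.flushed.add(sender)
        if self.flushed >= self.pending - {self.pid}:
            self.view, self.pending = self.pending, None
            return True
        return False
```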


Distributed Commit
Essential issue
  Given a computation distributed across a process group, how can we ensure that either all processes commit to the final result, or none of them do (atomicity)?
One-phase commit is not sufficient
Two-phase commit
Three-phase commit

Two-Phase Commit (1)
Model
  The client who initiated the computation acts as coordinator; the processes required to commit are the participants.
Phase 1a: The coordinator sends vote-request to the participants (also called a pre-write).
Phase 1b: When a participant receives vote-request, it returns either vote-commit or vote-abort to the coordinator. If it sends vote-abort, it aborts its local computation.
Phase 2a: The coordinator collects all votes; if all are vote-commit, it sends global-commit to all participants, otherwise it sends global-abort.
Phase 2b: Each participant waits for global-commit or global-abort and handles it accordingly.
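
The decision rule of the two phases can be sketched as plain functions. Treating a missing vote (for example, after a timeout) as an abort is an assumption of this sketch, and the function names are hypothetical. Message passing and logging are left out.

```python
def participant_vote(can_commit):
    """Phase 1b: the participant's reply to vote-request."""
    return "vote-commit" if can_commit else "vote-abort"

def coordinator_decision(participants, votes):
    """
    Phase 2a. votes maps participant -> "vote-commit" or "vote-abort".
    A participant missing from votes is assumed to have timed out.
    """
    if all(votes.get(p) == "vote-commit" for p in participants):
        return "global-commit"
    return "global-abort"

def participant_state(own_vote, decision):
    """Phase 2b: final local state after the coordinator's decision."""
    if own_vote == "vote-abort":
        return "ABORT"  # already aborted locally in phase 1b
    return "COMMIT" if decision == "global-commit" else "ABORT"
```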

Two-Phase Commit (2)
(a) The finite state machine for the coordinator in 2PC. (b) The finite state machine for a participant.

Two-Phase Commit (3)
Failing participant
  A participant crashes in state S, and recovers to S:
  Initial state: No problem, the participant was unaware of the protocol.
  Ready state: The participant is waiting to either commit or abort. After recovery, the participant needs to know which state transition it should make → log the coordinator’s decision.
  Abort state: Merely make entry into the abort state idempotent, e.g., by removing the workspace of results.
  Commit state: Also make entry into the commit state idempotent, e.g., by copying the workspace to storage.
Observation
  When distributed commit is required, having participants use temporary workspaces to keep their results allows for simple recovery in the presence of failures.

Two-Phase Commit (4)
Result: If all participants are in the READY state, the protocol blocks. Apparently, the coordinator has failed.
Note: The protocol prescribes that we need the decision from the coordinator.
Alternative: When P recovers in the READY state, it can check the state of another participant Q → no need to log the coordinator’s decision.
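
The alternative can be sketched as a lookup on the state of the contacted participant Q. The mapping follows the usual textbook treatment of 2PC (Q in INIT means it never voted, so aborting is safe). The function names and the "block" label are hypothetical.

```python
def action_from_peer(q_state):
    """What a participant P in READY may do after learning Q's state."""
    return {
        "COMMIT": "commit",  # the coordinator must have decided to commit
        "ABORT": "abort",    # the coordinator must have decided to abort
        "INIT": "abort",     # Q has not voted, so no global commit is possible
        "READY": "contact another participant",
    }[q_state]

def resolve(peer_states):
    """Ask peers in turn; block if every one of them is also in READY."""
    for state in peer_states:
        action = action_from_peer(state)
        if action in ("commit", "abort"):
            return action
    return "block"
```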

Three-Phase Commit (1)
The states of the coordinator and each participant satisfy the following two conditions:
1. There is no single state from which it is possible to make a transition directly to either a COMMIT or an ABORT state.
2. There is no state in which it is not possible to make a final decision, and from which a transition to a COMMIT state can be made.
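
Condition 1 can be checked mechanically on a state-transition graph. The sketch below encodes the participant state machines as they are usually drawn for 2PC and 3PC (the figures themselves are not part of this transcript, so the dictionaries are a reconstruction). Only condition 1 is checked, and the names are hypothetical.

```python
# state -> set of states reachable in one transition
TWO_PC_PARTICIPANT = {
    "INIT": {"READY", "ABORT"},
    "READY": {"COMMIT", "ABORT"},
    "COMMIT": set(),
    "ABORT": set(),
}

THREE_PC_PARTICIPANT = {
    "INIT": {"READY", "ABORT"},
    "READY": {"PRECOMMIT", "ABORT"},
    "PRECOMMIT": {"COMMIT"},
    "COMMIT": set(),
    "ABORT": set(),
}

def violates_condition_1(fsm):
    """True if some state can move directly to both COMMIT and ABORT."""
    return any({"COMMIT", "ABORT"} <= successors for successors in fsm.values())
```

In this encoding the READY state of 2PC violates the condition, while the extra PRECOMMIT state of 3PC removes the violation.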

Three-Phase Commit (2)
The client acts as coordinator.
Phase 1a: The coordinator sends vote-request to the participants.
Phase 1b: When a participant receives vote-request, it returns either vote-commit or vote-abort to the coordinator. If it sends vote-abort, it aborts its local computation.
Phase 2a: The coordinator collects all votes; if all are vote-commit, it sends prepare-commit to all participants, otherwise it sends global-abort and halts.
Phase 2b: Each participant waits for prepare-commit, or waits for global-abort, after which it halts.
Phase 3a (prepare to commit): The coordinator waits until all participants have sent ready-commit, and then sends global-commit to all.
Phase 3b (prepare to commit): Each participant waits for global-commit.
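
The coordinator side of these phases can be sketched as a function that returns the messages it multicasts, in order. It assumes a missing vote counts as an abort and models no timeouts: if some ready-commit is missing, the coordinator simply has not yet sent global-commit. The function name and arguments are hypothetical.

```python
def coordinator_messages(participants, votes, ready_commits):
    """
    votes: participant -> "vote-commit" or "vote-abort" (missing = no vote).
    ready_commits: participants that have replied ready-commit so far.
    Returns the coordinator's multicast messages in the order sent.
    """
    sent = ["vote-request"]                          # phase 1a
    if not all(votes.get(p) == "vote-commit" for p in participants):
        sent.append("global-abort")                  # phase 2a, abort branch
        return sent
    sent.append("prepare-commit")                    # phase 2a, commit branch
    if set(participants) <= set(ready_commits):      # phase 3a
        sent.append("global-commit")
    return sent
```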

Three-Phase Commit (3)
(a) The finite state machine for the coordinator in 3PC. (b) The finite state machine for a participant.

Three-Phase Commit (4)
Can P find out what it should do after crashing in the ready or pre-commit state, even if other participants or the coordinator have failed?
Reasoning
  Essence: The coordinator and the participants, on their way to commit, never differ by more than one state transition.
  Consequence: If a participant times out in the ready state, it can find out from the coordinator or other participants whether it should abort or enter the pre-commit state.
  Observation: If a participant has already made it to the pre-commit state, it can always safely commit (but it is not allowed to do so, for the sake of other processes that may be failing).
  Observation: We may need to elect another coordinator to send off the final COMMIT.


Recovery: Background
Essence
  When a failure occurs, we need to bring the system into an error-free state:
  Forward error recovery: Find a new state from which the system can continue operation.
  Backward error recovery: Bring the system back into a previous error-free state.
Practice
  Use backward error recovery, requiring that we establish recovery points.
Observation
  Recovery in distributed systems is complicated by the fact that processes need to cooperate in identifying a consistent state from which to recover.

Recovery – Stable Storage
Two disks after recovery: (a) Stable storage. (b) Crash after drive 1 is updated (copy from the primary). (c) Bad spot (copy the good d).
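
The two recovery cases in the caption can be sketched with two in-memory "drives". The sketch assumes each block carries a CRC-32 checksum and that drive 1 is always written first, so a valid block on drive 1 is the newer one. The block layout and function names are hypothetical.

```python
import zlib

def make_block(data: bytes):
    """A block is stored together with its checksum."""
    return (data, zlib.crc32(data))

def is_good(block):
    data, crc = block
    return zlib.crc32(data) == crc

def recover(drive1, drive2):
    """Compare the drives block by block and repair them in place."""
    for i in range(len(drive1)):
        b1, b2 = drive1[i], drive2[i]
        good1, good2 = is_good(b1), is_good(b2)
        if good1 and (not good2 or b1 != b2):
            drive2[i] = b1   # case (b), or a bad spot on drive 2
        elif good2 and not good1:
            drive1[i] = b2   # case (c): bad spot on drive 1
```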

Consistent Recovery State
Requirement: Every message that has been received is also shown to have been sent in the state of the sender.
Recovery line: Assuming processes regularly checkpoint their state, the most recent consistent global checkpoint.

Independent Checkpointing
The domino effect: If checkpointing is done at the “wrong” instants, the recovery line may lie at system startup time → cascaded rollback.
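
A cascaded rollback can be illustrated with a small search for the recovery line. The sketch assumes each checkpoint records the cumulative sets of message identifiers sent and received, and that checkpoint 0 of every process is the empty initial state. The data layout and the function name are hypothetical.

```python
def recovery_line(history):
    """
    history: process -> list of checkpoints, oldest first, each a dict
    {"sent": set_of_ids, "received": set_of_ids}.
    Returns process -> index of the checkpoint on the recovery line.
    """
    line = {p: len(cps) - 1 for p, cps in history.items()}
    changed = True
    while changed:
        changed = False
        sent = set().union(*(history[p][i]["sent"] for p, i in line.items()))
        for p, i in line.items():
            # A received message that nobody has recorded as sent is an
            # inconsistency, so roll this process back by one checkpoint.
            if not history[p][i]["received"] <= sent:
                line[p] = i - 1
                changed = True
                break
    return line

# P2 sends a, P1 receives a and checkpoints, P1 sends b,
# P2 receives b and checkpoints: both latest checkpoints are unusable.
example = {
    "P1": [{"sent": set(), "received": set()},
           {"sent": set(), "received": {"a"}}],
    "P2": [{"sent": set(), "received": set()},
           {"sent": {"a"}, "received": {"b"}}],
}
```

In the example, rolling back P2 removes the record of sending a, which in turn forces P1 back to its initial state.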

Characterizing Message-Logging Schemes
Incorrect replay of messages after recovery, leading to an orphan process.
Alternative: Instead of taking an (expensive) checkpoint, try to replay your (communication) behavior from the most recent checkpoint → store messages in a log.
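
The replay idea can be sketched for a single process, assuming its behavior is deterministic given the order of received messages and that the log survives the crash. The LoggedProcess class and its arithmetic "state" are hypothetical stand-ins for real application state.

```python
class LoggedProcess:
    def __init__(self):
        self.state = 0
        self.checkpoint = 0
        self.log = []   # messages received since the last checkpoint

    def deliver(self, value, replaying=False):
        if not replaying:
            self.log.append(value)             # log before processing
        self.state = self.state * 2 + value    # order-sensitive stand-in for work

    def take_checkpoint(self):
        self.checkpoint = self.state
        self.log.clear()                       # older messages are no longer needed

    def recover(self):
        """Restore the last checkpoint, then replay the log in its original order."""
        self.state = self.checkpoint
        for value in self.log:
            self.deliver(value, replaying=True)
```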