Fault Tolerance CSCI 4780/6780: Reliable Group Communication

Presentation transcript:

Fault Tolerance CSCI 4780/6780

Reliable Group Communication
Reliable multicasting is important for several applications
Transport layer protocols rarely offer reliable multicasting
What is reliable multicasting?
  – Communication sent to the group should reach each member
  – What happens if a process crashes (or enters) during multicasting?
Two cases: multicasting with non-faulty processes and multicasting with faulty processes

Basic Reliable Multicasting
Group is assumed to be stable
Communication may be faulty
  – Underlying unreliable multicasting service
Straightforward if the number of processes is small
Sequence number for each message
Use acknowledgements
  – Either positive or negative
Retransmission on negative ack or on timeout
Positive acks scale poorly (one ack per receiver per message)

Basic Reliable-Multicasting Schemes
A simple solution to reliable multicasting when all receivers are known and are assumed not to fail
a) Message transmission
b) Reporting feedback
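The basic scheme (sequence numbers, a history buffer at the sender, and retransmission on a negative ack) can be sketched as a small in-memory simulation. This is only an illustration under the slide's assumptions (stable group, known receivers that do not fail). The class names `Sender` and `Receiver`, the `lost_by` parameter used to simulate message loss, and the direct method calls standing in for the network are all hypothetical, not part of the original slides.

```python
class Receiver:
    """Hypothetical receiver: delivers messages in sequence-number order."""

    def __init__(self, name):
        self.name = name
        self.expected = 0    # next sequence number to deliver
        self.pending = {}    # out-of-order messages held back: seq -> payload
        self.delivered = []  # payloads delivered so far, in order

    def receive(self, seq, payload):
        """Accept a message; return the sequence numbers to negatively ack."""
        if seq < self.expected:
            return []  # duplicate of an already delivered message
        self.pending[seq] = payload
        # A gap below seq means earlier messages were missed.
        missing = [s for s in range(self.expected, seq) if s not in self.pending]
        # Deliver everything that is now contiguous.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        return missing


class Sender:
    """Hypothetical sender: numbers messages and keeps them for retransmission."""

    def __init__(self):
        self.next_seq = 0
        self.history = {}  # seq -> payload

    def multicast(self, payload, receivers, lost_by=()):
        """Send to all receivers; names in lost_by simulate a lost message."""
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = payload
        for r in receivers:
            if r.name in lost_by:
                continue  # simulated loss on the unreliable multicast service
            # Each returned sequence number acts as a negative ack.
            for missing in r.receive(seq, payload):
                r.receive(missing, self.history[missing])  # retransmission
        return seq
```

In this sketch a receiver only notices a loss when a later message arrives, which is exactly the "arbitrary wait" drawback of negative acks; a timeout-based variant would need a clock, which is omitted here.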

Positive Vs. Negative Feedback

                              Positive Ack           Negative Ack
When is the ack sent?         Upon receiving msg.    Upon noticing missed msg.
When is msg retransmitted?    Upon timeout           Upon receiving a negative ack
Drawback                      High msg loads         Arbitrary wait

Can we do better than both of them?
Hybrid scheme that has the strengths of both but can mask the drawbacks
  – Negative ack on each msg, but positive ack on every nth msg
  – A process that does not send the positive ack will be sent all msgs in that cycle again
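A minimal sketch of the sender side of the hybrid scheme follows, assuming a fixed group and a cycle length `n`. The class `HybridSender` and its method names are hypothetical, and `close_cycle` stands in for whatever timeout the sender would use after the nth message; the slides do not specify these details.

```python
class HybridSender:
    """Hypothetical sender: negative acks per msg, positive ack every nth msg."""

    def __init__(self, group, n):
        self.group = set(group)  # names of all group members
        self.n = n               # cycle length
        self.cycle = []          # (seq, payload) pairs sent in the current cycle
        self.acked = set()       # members that positively acked this cycle
        self.next_seq = 0

    def send(self, payload):
        """Record a message; return (seq, ack_requested)."""
        seq = self.next_seq
        self.next_seq += 1
        self.cycle.append((seq, payload))
        # A positive ack is requested only on the nth message of the cycle.
        return seq, len(self.cycle) == self.n

    def on_negative_ack(self, seq):
        """Return the message to retransmit, or None if it is not buffered."""
        for s, payload in self.cycle:
            if s == seq:
                return (s, payload)
        return None

    def on_positive_ack(self, member):
        self.acked.add(member)

    def close_cycle(self):
        """End the cycle; return {member: msgs to resend} for silent members."""
        silent = self.group - self.acked
        resend = {m: list(self.cycle) for m in sorted(silent)}
        self.cycle, self.acked = [], set()  # buffer can now be released
        return resend
```

The design point is that the sender handles one positive ack per member per cycle instead of one per message, while the positive ack still bounds how long a silent loss can go unnoticed.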

Nonhierarchical Feedback Control
Goal: reducing feedback overheads
Only negative feedback
Feedback is multicast to all members
Retransmissions are multicast too
Feedback time has to be carefully adjusted
Can unnecessarily interrupt other processes (those that did not miss the msg)
Processes that regularly miss msgs form a separate group
  – Retransmission to that group

Nonhierarchical Feedback Control
Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.
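The suppression effect in the figure can be illustrated with a small calculation. The sketch below assumes each receiver that missed a message schedules its negative ack after some delay, and that a multicast negative ack reaches every other receiver after a single fixed `propagation_delay`. The function names, the uniform random delay, and the fixed propagation delay are all assumptions made for illustration.

```python
import random


def draw_timers(receivers, max_delay=1.0, seed=None):
    """Give each receiver a random delay before it would send its negative ack."""
    rng = random.Random(seed)
    return {r: rng.uniform(0.0, max_delay) for r in receivers}


def simulate_nack_suppression(timers, propagation_delay=0.0):
    """Return (senders, suppressed) for receivers that all missed one message.

    timers maps receiver name -> delay before its negative ack would fire.
    A receiver is suppressed only if the first negative ack reaches it
    before its own timer fires.
    """
    first_fire = min(timers.values())
    cutoff = first_fire + propagation_delay
    senders = sorted(r for r, t in timers.items() if t <= cutoff)
    suppressed = sorted(r for r, t in timers.items() if t > cutoff)
    return senders, suppressed
```

With zero propagation delay exactly one request goes out (barring ties); as the propagation delay grows relative to the spread of the timers, more receivers fire before they hear the first request, which is why the feedback time has to be carefully adjusted.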

Hierarchical Feedback Control
Nonhierarchical feedback control may suffice for small multicast groups
Overheads are still too heavy for large groups
Limited geographic scalability
Setting: one sender, a large number of receivers
Receivers are partitioned into subgroups
Each subgroup has a coordinator
The coordinator is responsible for retransmissions within its subgroup
Constructing and maintaining the multicast tree is notoriously difficult

Hierarchical Feedback Control
The essence of hierarchical reliable multicasting.
a) Each local coordinator forwards the message to its children.
b) A local coordinator handles retransmission requests.
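The two roles of a local coordinator, forwarding down the tree and serving retransmission requests locally, can be sketched as follows. This is a simplified illustration: the `Coordinator` class, the `drop` parameter used to simulate loss, and the rule that a coordinator missing a message asks its parent are assumptions, and tree construction and buffer cleanup are left out entirely.

```python
class Coordinator:
    """Hypothetical local coordinator in a multicast tree."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []  # coordinators of child subgroups
        self.history = {}   # seq -> payload, kept for local retransmission
        self.served = []    # log of (requester, seq) retransmissions handled

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def forward(self, seq, payload, drop=()):
        """(a) Store the message and forward it to the children.

        Names in drop simulate coordinators that lose the message; their
        subtrees do not receive it either.
        """
        if self.name in drop:
            return
        self.history[seq] = payload
        for child in self.children:
            child.forward(seq, payload, drop)

    def retransmit(self, seq, requester):
        """(b) Serve a retransmission request from within the subgroup."""
        if seq not in self.history:
            if self.parent is None:
                raise KeyError(f"message {seq} was never sent")
            # The coordinator itself missed the message: ask its parent.
            self.history[seq] = self.parent.retransmit(seq, self.name)
        self.served.append((requester, seq))
        return self.history[seq]
```

Because each request is served by the nearest coordinator that holds the message, the original sender sees at most one request per child subgroup rather than one per receiver.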