

Group Communication
Group-oriented activities are steadily increasing. There are many types of groups:
• Open and closed groups
• Peer-to-peer and hierarchical groups
We focus on how the members of a group communicate.

Major issues
• Different types of multicast
• Atomic multicast
• Ordered multicast
• Dynamic groups
• How to communicate when the membership constantly changes
• Failure handling
• Keeping track of membership and membership changes

Atomic multicast
A multicast is called atomic when the message is delivered either to every correct member or to no member at all.
How can we implement atomic multicast?

Basic vs. reliable multicast
Basic multicast does not consider crash failures. Reliable multicast does.
Three criteria for basic multicast:
• Liveness. Each process must receive every message.
• Integrity. No spurious message received.
• No duplicate. Accepts exactly one copy of a message.

Reliable atomic multicast
Sender's program:
  i := 0;
  do i ≠ n → send message to i; i := i + 1 od
Receiver's program:
  if m is new → accept it; multicast m
  [] m is duplicate → discard m
  fi
Tolerates process crashes.
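
The sender and receiver programs above can be sketched as a small in-memory simulation. This is only an illustration under assumptions: the names `Process`, `originate`, `send_to_all` and the `crash_after` parameter are hypothetical, message passing is modelled as direct method calls, and a crash is modelled as the sender stopping partway through its loop.

```python
class Process:
    """A group member that relays every new message (hypothetical sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.crashed = False
        self.accepted = []   # payloads accepted so far
        self.seen = set()    # ids of messages already received

    def receive(self, msg_id, payload, group):
        if self.crashed or msg_id in self.seen:
            return                                   # duplicate: discard m
        self.seen.add(msg_id)
        self.accepted.append(payload)                # m is new: accept it
        send_to_all(self, msg_id, payload, group)    # ... and multicast m


def send_to_all(sender, msg_id, payload, group, crash_after=None):
    """Sender's loop over members 0..n-1; may crash before reaching index crash_after."""
    for i, member in enumerate(group):
        if crash_after is not None and i == crash_after:
            sender.crashed = True                    # simulated crash mid-multicast
            return
        if member is not sender:
            member.receive(msg_id, payload, group)


def originate(sender, msg_id, payload, group, crash_after=None):
    """The original sender accepts its own message, then runs the sender loop."""
    sender.seen.add(msg_id)
    sender.accepted.append(payload)
    send_to_all(sender, msg_id, payload, group, crash_after)
```

If the sender crashes after reaching at least one correct member, that member's relay carries the message to everyone else; if it crashes before reaching anyone, no correct member accepts it.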

Multicast support in networks
Sometimes, certain features available in the infrastructure of a distributed system simplify the implementation of multicast. Examples are:
• Multicast on an Ethernet LAN
• IP multicast

IP Multicast
Although multicasts can be implemented using point-to-point communications, there are some practical forms of multicasts that make use of the inherent multicasting ability of the underlying medium.
IP multicast is a bandwidth-conserving technology that reduces traffic by simultaneously delivering a single stream of information to multiple clients. Applications that take advantage of multicast include distance learning, videoconferencing, and distribution of software, stock quotes, and news. The source sends only one copy, which is replicated by the routers.

Distribution trees
Class D addresses are assigned to multicast groups. Routers maintain and update distribution trees when members join or leave a group.
Concern: too much load on routers. Application-layer multicast addresses this by moving message replication from the routers to the end hosts.
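
As a small illustration of what a Class D address is: IPv4 Class D covers 224.0.0.0 through 239.255.255.255, that is, addresses whose four leading bits are 1110. The helper name `is_class_d` below is hypothetical; the sketch uses only Python's standard `ipaddress` module.

```python
import ipaddress


def is_class_d(address: str) -> bool:
    """Return True if the IPv4 address lies in 224.0.0.0/4 (leading bits 1110)."""
    ip = ipaddress.IPv4Address(address)
    return (int(ip) >> 28) == 0b1110
```

For IPv4 addresses this agrees with the standard library's own `IPv4Address.is_multicast` property.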

Ordered multicasts
• Total order
• Causal order
• Local order (a.k.a. single-source FIFO)
Definitions? Applications?
• Total order multicast is useful in the consistent update of replicated servers.
• Causal order multicast is relevant in implementing bulletin boards.
• Local order multicast is useful in updating cache memories in multi-computers.
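
The slide leaves the definitions open. One common way to obtain local (single-source FIFO) order is a sequence number per sender with a holdback buffer at each receiver. The sketch below assumes that approach; the class name `FifoReceiver` and its fields are hypothetical.

```python
from collections import defaultdict


class FifoReceiver:
    """Delivers each sender's messages in the order that sender numbered them."""

    def __init__(self):
        self.next_seq = defaultdict(int)    # sender -> next expected sequence number
        self.holdback = defaultdict(dict)   # sender -> {seq: payload} awaiting delivery
        self.delivered = []                 # (sender, payload) in delivery order

    def receive(self, sender, seq, payload):
        if seq < self.next_seq[sender]:
            return                          # already delivered: ignore the duplicate
        self.holdback[sender][seq] = payload
        # Deliver for as long as the next expected message from this sender is buffered.
        while self.next_seq[sender] in self.holdback[sender]:
            msg = self.holdback[sender].pop(self.next_seq[sender])
            self.delivered.append((sender, msg))
            self.next_seq[sender] += 1
```

Messages from different senders are not ordered relative to each other here; only the order chosen by each individual sender is preserved.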

Implementing total order multicast
Basic multicast using a sequencer
  {The sequencer S}
  define seq: integer (initially 0)
  do receive m → multicast (m, seq); seq := seq + 1; deliver m od
[Figure: the sequencer]
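
A minimal sketch of the sequencer idea follows. The slide only gives the sequencer's loop; the member-side holdback buffer that delivers in sequence-number order is an assumption of this sketch, and the names `Sequencer`, `Member` and `stamp` are hypothetical.

```python
class Sequencer:
    """Assigns consecutive sequence numbers to the messages it receives."""

    def __init__(self):
        self.seq = 0

    def stamp(self, m):
        stamped = (m, self.seq)   # this pair is what gets multicast to the group
        self.seq += 1
        return stamped


class Member:
    """Delivers stamped messages strictly in sequence-number order."""

    def __init__(self):
        self.expected = 0         # next sequence number to deliver
        self.holdback = {}        # seq -> message that arrived early
        self.delivered = []

    def receive(self, m, seq):
        self.holdback[seq] = m
        while self.expected in self.holdback:
            self.delivered.append(self.holdback.pop(self.expected))
            self.expected += 1
```

Because every member delivers by the single sequencer's numbering, all members see the same order even when the network reorders the stamped messages.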

Implementing basic total order multicast
Distributed implementation without a sequencer. Uses the idea of 2PC.
[Figure: processes p, q, r]

Implementing basic total order multicast
Step 1. Sender i sends (m, ts) to all.
Step 2. Receiver j saves it in a holdback queue and sends an ack (a, ts).
Step 3. The sender receives all acks and picks the largest ts. It then sends (m, ts, commit) to all.
Step 4. Each receiver removes m from the holdback queue and delivers it in ascending order of timestamps.
Why does it work?
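
The four steps can be sketched as a single-threaded simulation. Several details are assumptions, since the slide does not fix them: each receiver proposes a timestamp from a Lamport-style logical clock, timestamps are (clock, pid) pairs so that ties cannot occur, and a receiver delivers a committed message only once no message with a smaller timestamp is still uncommitted. All names (`TotalOrderProcess`, `demo`, and so on) are hypothetical.

```python
class TotalOrderProcess:
    """One group member; timestamps are (clock, pid) pairs."""

    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.holdback = {}    # msg_id -> [timestamp, committed?, payload]
        self.delivered = []

    def new_timestamp(self):
        # Step 1 (sender side): choose the initial ts of an outgoing message.
        self.clock += 1
        return self.clock

    def on_message(self, msg_id, payload, ts):
        # Step 2: hold the message back and ack with a proposed timestamp.
        self.clock = max(self.clock, ts) + 1
        proposal = (self.clock, self.pid)
        self.holdback[msg_id] = [proposal, False, payload]
        return proposal

    def on_commit(self, msg_id, final_ts):
        # Step 4: record the agreed timestamp, then deliver whatever is safe.
        self.clock = max(self.clock, final_ts[0])
        entry = self.holdback[msg_id]
        entry[0], entry[1] = final_ts, True
        while self.holdback:
            head = min(self.holdback, key=lambda k: self.holdback[k][0])
            ts, committed, payload = self.holdback[head]
            if not committed:
                break         # a message with a smaller timestamp is still undecided
            del self.holdback[head]
            self.delivered.append(payload)


def demo():
    p, q, r = (TotalOrderProcess(i) for i in range(3))
    ts_a, ts_b = p.new_timestamp(), q.new_timestamp()
    # Messages "a" (from p) and "b" (from q) reach the members in different orders.
    acks_a = [p.on_message("a", "a", ts_a), q.on_message("a", "a", ts_a)]
    acks_b = [r.on_message("b", "b", ts_b), p.on_message("b", "b", ts_b),
              q.on_message("b", "b", ts_b)]
    acks_a.append(r.on_message("a", "a", ts_a))
    # Step 3: each sender picks the largest timestamp among its acks.
    final_a, final_b = max(acks_a), max(acks_b)
    for proc in (p, q):       # commits also arrive in different orders
        proc.on_commit("a", final_a)
        proc.on_commit("b", final_b)
    r.on_commit("b", final_b)
    r.on_commit("a", final_a)
    return [proc.delivered for proc in (p, q, r)]
```

The key observation is that a committed timestamp is never smaller than any proposal for the same message, so once the smallest entry in a holdback queue is committed, no undecided message can later be ordered in front of it.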

Implementing basic causal order multicast
Use vector clocks. The recipient i delivers a message from j iff
1. $VC_j(j) = LC_j(i) + 1$
2. $\forall k : k \neq j :: VC_k(j) \le LC_k(i)$
VC = incoming vector clock
LC = local vector clock
Note the slight difference in the implementation of the vector clocks.
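
The two delivery conditions can be sketched as follows. This sketch assumes that a process increments only its own clock component when it multicasts, that the message carries a copy of the sender's vector clock, and that a sender delivers its own message immediately. The names `CausalProcess`, `can_deliver` and `pending` are hypothetical.

```python
class CausalProcess:
    """Delivers multicasts in causal order using vector clocks."""

    def __init__(self, pid, n):
        self.pid = pid
        self.lc = [0] * n      # local vector clock LC
        self.pending = []      # received messages that cannot be delivered yet
        self.delivered = []

    def multicast(self, payload):
        self.lc[self.pid] += 1               # count this multicast
        self.delivered.append(payload)       # the sender delivers its own message
        return (self.pid, list(self.lc), payload)

    def can_deliver(self, j, vc):
        # Condition 1: this is the next message from j.
        # Condition 2: everything j had seen from others has been delivered here.
        return vc[j] == self.lc[j] + 1 and all(
            vc[k] <= self.lc[k] for k in range(len(vc)) if k != j)

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:                      # a delivery may unblock other messages
            progress = False
            for m in list(self.pending):
                j, vc, payload = m
                if self.can_deliver(j, vc):
                    self.pending.remove(m)
                    self.lc[j] = vc[j]
                    self.delivered.append(payload)
                    progress = True
```

For example, if a reply arrives before the message it responds to, condition 2 fails and the reply waits in `pending` until the earlier message has been delivered.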

Reliable multicast
Tolerates process crashes. The additional requirements are as follows:
• Only correct processes are required to receive the messages from all correct processes in the group.
• Multicasts by faulty processes will either be received by every correct process, or by none at all.

A theorem on reliable multicast
In an asynchronous distributed system, total order reliable multicasts cannot be implemented when even a single process undergoes a crash failure.
Why? Because such an implementation could be used to solve consensus, which would violate the FLP impossibility result.

Scalable Reliable Multicast
IP multicast or application layer multicast has to detect the loss of messages and use retransmission for achieving reliability. For large groups (like distance learning applications) scalability is a major problem.

Scalable Reliable Multicast
Difficult to scale:
• Sender state explosion
• Message implosion
State: receiver 1, receiver 2, … receiver n

Scalable Reliable Multicast
If omission failures are rare, then receivers only report the non-receipt of messages, using a NACK. This has the potential to trigger selective point-to-point retransmission. The reduction of acknowledgements is the underlying principle of Scalable Reliable Multicast (SRM).
If several members of a group fail to receive a message, then each such member waits for a random period of time before sending its NACK. This helps to suppress redundant NACKs. The sender multicasts the missing copy only once.
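
The effect of the random wait can be sketched with a small simulation. It assumes that a NACK is multicast to the whole group, so that other members can hear it, and that a member cancels its own NACK if it hears one before its timer fires. The function name `nacks_sent` and its parameters are hypothetical.

```python
import random


def nacks_sent(missing_receivers, max_wait, propagation=0.0, rng=None):
    """Return the receivers that actually send a NACK.

    Each receiver that missed the message waits a random time in [0, max_wait].
    The earliest timer fires first; its NACK reaches the others `propagation`
    time units later, and any timer that fires before then also sends.
    """
    rng = rng or random.Random()
    timers = {r: rng.uniform(0, max_wait) for r in missing_receivers}
    if not timers:
        return []
    first = min(timers.values())
    return sorted(r for r, t in timers.items() if t <= first + propagation)
```

With a small propagation delay relative to `max_wait`, only a few of the receivers that missed the message end up sending a NACK; as the propagation delay approaches `max_wait`, suppression stops helping and every such receiver sends one.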