Hwajung Lee. A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types of groups:

Slides:



Advertisements
Similar presentations
COS 461 Fall 1997 Group Communication u communicate to a group of processes rather than point-to-point u uses –replicated service –efficient dissemination.
Advertisements

CS425/CSE424/ECE428 – Distributed Systems – Fall 2011 Material derived from slides by I. Gupta, M. Harandi, J. Hou, S. Mitra, K. Nahrstedt, N. Vaidya.
1 CS 194: Distributed Systems Process resilience, Reliable Group Communication Scott Shenker and Ion Stoica Computer Science Division Department of Electrical.
DISTRIBUTED SYSTEMS II FAULT-TOLERANT BROADCAST Prof Philippas Tsigas Distributed Computing and Systems Research Group.
CS542 Topics in Distributed Systems Diganta Goswami.
Sliding window protocol The sender continues the send action without receiving the acknowledgements of at most w messages (w > 0), w is called the window.
Reliable Group Communication Quanzeng You & Haoliang Wang.
Page 1 Introduction Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content of this presentation is licensed.
UNIT-IV Computer Network Network Layer. Network Layer Prepared by - ROHIT KOSHTA In the seven-layer OSI model of computer networking, the network layer.
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Group Communications Group communication: one source process sending a message to a group of processes: Destination is a group rather than a single process.
Group Communication Phuong Hoai Ha & Yi Zhang Introduction to Lab. assignments March 24 th, 2004.
1 Principles of Reliable Distributed Systems Lecture 5: Failure Models, Fault-Tolerant Broadcasts and State-Machine Replication Spring 2005 Dr. Idit Keidar.
© Chinese University, CSE Dept. Distributed Systems / Distributed Systems Topic 9: Time, Coordination and Replication Dr. Michael R. Lyu Computer.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 13 Wenbing Zhao Department of Electrical and Computer Engineering.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Lecture 12 Synchronization. EECE 411: Design of Distributed Software Applications Summary so far … A distributed system is: a collection of independent.
Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport (1978) Presented by: Yoav Kantor.
Lab 1 Bulletin Board System Farnaz Moradi Based on slides by Andreas Larsson 2012.
CIS 720 Distributed algorithms. “Paint on the forehead” problem Each of you can see other’s forehead but not your own. I announce “some of you have paint.
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
1 A Modular Approach to Fault-Tolerant Broadcasts and Related Problems Author: Vassos Hadzilacos and Sam Toueg Distributed Systems: 526 U1580 Professor:
Dec 4, 2007 Reliable Multicast Group Neelofer T. CMSC 621.
TOTEM: A FAULT-TOLERANT MULTICAST GROUP COMMUNICATION SYSTEM L. E. Moser, P. M. Melliar Smith, D. A. Agarwal, B. K. Budhia C. A. Lingley-Papadopoulos University.
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
Reliable Communication in the Presence of Failures Based on the paper by: Kenneth Birman and Thomas A. Joseph Cesar Talledo COEN 317 Fall 05.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Replication with View Synchronous Group Communication Steve Ko Computer Sciences and Engineering.
7/26/ Design and Implementation of a Simple Totally-Ordered Reliable Multicast Protocol in Java.
Farnaz Moradi Based on slides by Andreas Larsson 2013.
Practical Byzantine Fault Tolerance
Group Communication Group oriented activities are steadily increasing. There are many types of groups:  Open and Closed groups  Peer-to-peer and hierarchical.
Farnaz Moradi Based on slides by Andreas Larsson 2013.
Dealing with open groups The view of a process is its current knowledge of the membership. It is important that all processes have identical views. Inconsistent.
IM NTU Distributed Information Systems 2004 Replication Management -- 1 Replication Management Yih-Kuen Tsay Dept. of Information Management National Taiwan.
Dealing with open groups The view of a process is its current knowledge of the membership. It is important that all processes have identical views. Inconsistent.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Replication Steve Ko Computer Sciences and Engineering University at Buffalo.
Replication (1). Topics r Why Replication? r System Model r Consistency Models – How do we reason about the consistency of the “global state”? m Data-centric.
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
Totally Ordered Broadcast in the face of Network Partitions [Keidar and Dolev,2000] INF5360 Student Presentation 4/3-08 Miran Damjanovic
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Sliding window protocol The sender continues the send action without receiving the acknowledgements of at most w messages (w > 0), w is called the window.
D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University.
Fault Tolerance. Basic Concepts Availability The system is ready to work immediately Reliability The system can run continuously Safety When the system.
The Totem Single-Ring Ordering and Membership Protocol Y. Amir, L. E. Moser, P. M Melliar-Smith, D. A. Agarwal, P. Ciarfella.
Reliable Communication Smita Hiremath CSC Reliable Client-Server Communication Point-to-Point communication Established by TCP Masks omission failure,
V1.7Fault Tolerance1. V1.7Fault Tolerance2 A characteristic of Distributed Systems is that they are tolerant of partial failures within the distributed.
Building Dependable Distributed Systems, Copyright Wenbing Zhao
Hwajung Lee.  Improves reliability  Improves availability ( What good is a reliable system if it is not available?)  Replication must be transparent.
Revisiting failure detectors Some of you asked questions about implementing consensus using S - how does it differ from reaching consensus using P. Here.
Lecture 10: Coordination and Agreement (Chap 12) Haibin Zhu, PhD. Assistant Professor Department of Computer Science Nipissing University © 2002.
Failure detection The design of fault-tolerant systems will be easier if failures can be detected. Depends on the 1. System model, and 2. The type of failures.
Building a Reliable IP Multicast Distributed System Karl Thomas Rees CS 560.
Fault Tolerance (2). Topics r Reliable Group Communication.
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
Distributed Systems Lecture 7 Multicast 1. Previous lecture Global states – Cuts – Collecting state – Algorithms 2.
Distributed Computing Systems Replication Dr. Sunny Jeong. Mr. Colin Zhang With Thanks to Prof. G. Coulouris,
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Reliable multicast Tolerates process crashes. The additional requirements are: Only correct processes will receive multicasts from all correct processes.
Fault Tolerance Prof. Orhan Gemikonakli
Reliable group communication
Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2013
Alternating Bit Protocol
Advanced Operating System
Active replication for fault tolerance
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
EEC 688/788 Secure and Dependable Computing
Seminar Mobilkommunikation Reliable Multicast in Wireless Networks
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Presentation transcript:

Hwajung Lee

A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types of groups:  Open group (anyone can join, customers of Walmart )  Closed groups (membership is closed, class of 2000 )  Peer-to-peer group (all have equal status, graduate students of CS department, members in a videoconferencing / netmeeting )  Hierarchical groups (one or more members are distinguished from the rest. President and the employees of a company, distance learning ).

Various forms of multicast to communicate with the members. Two different examples are  Atomic multicast  Ordered multicast Dynamic groups  How to correctly communicate when the membership constantly changes?  Keeping track of membership changes

A multicast is atomic, when the message is delivered to every correct member, or to no member at all. In general, processes may crash, yet the atomicity of multicast is to be guaranteed. How can we implement atomic multicast?

Basic multicast does not consider failures. Reliable multicast handles failures. Three criteria for basic multicast : Liveness. Each process must receive every message Integrity. No spurious message received No duplicate. Accepts exactly one copy of a message

Sender’s programReceiver’s program i:=0; if m is new  do i ≠ n  accept it; send message to member[i];multicast m ; i:= i+1  m is duplicate  discard m od fi Tolerates process crashes. Why does it work?

Sometimes, certain features available in the infrastructure of a network simplify the implementation of multicast. Examples are Multicast on an ethernet LAN IP multicast for wide area networks

IP multicast is a bandwidth-conserving technology where the router reduces traffic by replicating a single stream of information and forwarding them to multiple clients. It is a form of pruned broadcast. Sender sends a single copy to a special multicast IP address (Class D) that represents a group, where other members register.

Source is the root of a spanning tree Routers maintain & update distribution trees whenever members join / leave a group All multicasts are Routed via a Rendezvous point Too much load on routers. Application layer multicast overcomes this.

Total order multicast. Every member must receive all updates in the same order. Example: consistent update of replicated data on servers Causal order multicast. If a, b are two updates and a happened before b, then every member must accept a before accepting b. Example: implementation of a bulletin board. Local order (a.k.a. S ingle source FIFO ). Example: video distribution, distance learning using “push technology.”

First method. Basic multicast using a sequencer { The sequencer S} define seq: integer (initially 0} do receive m  multicast (m, seq); seq := seq+1; deliver m od sequencer

Second method. Basic multicast without a sequencer. Uses the idea of 2PC (two-phase commit) p q r

Step 1. Sender i sends (m, ts) to all Step 2. Receiver j saves it in a holdback queue, and sends an ack (a, ts) Step 3. Receive all acks, and pick the largest ts. Then send (m, ts, commit) to all. Step 4. Receiver removes it from the holdback queue and delivers m in the ascending order of timestamps. Why does it work?

Basic multicast only. Use vector clocks. Recipient i will deliver a message from j iff 1. VC j (j) = LC j (i) + 1 {LC = local vector clock} 2.  k: k≠j :: VC k (j) ≤ LC k (i) VC = incoming vector clock LC = Local vector clock Note the slight difference in the implementation of the vector clocks

Tolerates process crashes. The additional requirements are: Only correct processes are required to receive the messages from all correct processes in the group. Multicasts by faulty processes will either be received by every correct process, or by none at all.

Theorem. In an asynchronous distributed system, total order reliable multicasts cannot be implemented when even a single process undergoes a crash failure. (Hint) The implementation will violate the FLP impossibility result. Complete the arguments!

IP multicast or application layer multicast provides unreliable datagram service. Reliability requires the detection of the message omission followed by retransmission. This can be done using ack. However, for large groups (as in distance learning applications or software distribution) scalability is a major problem.

If omission failures are rare, then receivers will only report the non-receipt of messages using NACK. The reduction of acknowledgements is the underlying principle of Scalable Reliable Multicasts (SRM).

If several members of a group fail to receive a message, then each such member waits for a random period of time before sending its NACK. This helps to suppress redundant NACKs. Sender multicasts the missing copy only once. Use of cached copies in the network and selective point-to-point retransmission further reduces the traffic.

Source sending m[0], m[1], m[2] … Missed m[7] and sent NACK Missed m[7] and sent NACK Missed m[7] and sent NACK m[7] cached here m[7] cached here m[7]

The view of a process is its current knowledge of the membership. It is important that all processes have identical views. Inconsistent views can lead to problems. Example: Four members (0,1,2,3) will send out 144 s. Assume that 3 left the group but only 2 knows about it. So, 0 will send 144/4 = 36 s (first quarter 1-36) 1 will send 144/4 = 48 s (second quarter 37-71) 2 will send 144/3 = 48 s (last one-third ) 3 has left. The mails will not be delivered!

Views can change unpredictably, and no member may have exact information about who joined and who leaved at any given time. In a managed group, views and their changes should propagate in the same order to all members. Example. Current view (of all processes) v 0 (g) = {0, 1, 2, 3}. Let 1, 2 leave and 4 join the group concurrently. This view change can be serialized in many ways:  {0,1,2,3}, {0,1,3} {0,3,4}, OR  {0,1,2,3}, {0,2,3}, {0,3}, {0,3,4}, OR  {0,1,2,3}, {0,3}, {0,3,4} To make sure that every member observe these changes in the same order, changes in the view should be sent via total order multicast.

{Process 0}: ▪ v 0 (g); v 0 (g) = {0.1,2,3}, ▪ send m1,... ; ▪ v 1 (g); ▪ send m2, send m3; v 1 (g) = {0,1,3}, ▪ v 2 (g) ; {Process 1}: v 2 (g) = {0,3,4} ▪ v 0 (g); ▪ send m4, send m5; ▪ v 1 (g); ▪ send m6; ▪ v 2 (g)...;

If a process j joins and continues its membership in a group g that already contains a process i, then eventually j appears in all views delivered by process i. If a process j permanently leaves a group g that contains a process i, then eventually j is excluded from all views delivered by process i.

Rule. With respect to each message, all correct processes have the same view. m sent in view V  m received in view V

Agreement. If a correct process k delivers a message m in v i (g ) before delivering the next view v i+1 (g), then every correct process j  v i (g)  v i+1 (g) must deliver m before delivering v i+1 (g). Integrity. If a process j delivers a view v i (g ), then v i (g ) must include j. Validity. If a process k delivers a message m in view v i (g) and another process j  v i (g) does not deliver that message m, then the next view v i+1 (g) delivered by k must exclude j. v i (g ) v i+1 (g), m v i (g ) v i+1 (g), m Sender k Receiver j

Let process 1 deliver m and then crash. Possibility 1. No one delivers m, but each delivers the new view {0,2,3}. Possibility 2. Processes 0, 2, 3 deliver m and then deliver the new view {0,2,3} Possibility 3. Processes 2, 3 deliver m and then deliver the new view {0,2,3} but process 0 first delivers the view {0,2,3} and then delivers m. Are these acceptable? {0,1,2,3}{0,2,3} m m m Possibility 3

 Group communication system developed by Danny Dolev and his group at the Hebrew University of Jerusalem.  Deals with open group  Supports scalable reliable multicast  Tolerates network partition

1. IP multicast (or ethernet LAN) used to support high bandwidth multicast. 2. Acks are piggybacked and message loss is detected transparently, leading to selective retransmission. Example: (Notation: a2B1 denotes the ack of A2 piggybacked on B1) A process that receives A1, A2, a2B1, b3C1 … suspects that it did not receive message B2, and sends a NACK to request a retransmission

Causal mode (maintains causal order) Agreed mode (maintains total order that does not conflict with the causal order) Safe mode (Delivers a message only after the lower levels of the system have acknowledged its reception at all the destination machines. All messages are delivered relative to a safe message)

Each partition assumes that the machines in the other partition have failed, and maintains virtual synchrony within its own partition only. After repair, consistency is restored in the entire system. Dealing with partition

Assume A was sending a safe message M, and the configuration changed to {A, B, C}  {A, B}, {C}. All but C sent ack to A, B. To deliver M, A, B must receive the new view {A,B} first. After the delivery of M, {A, B}, {C}  {A, B, D}, {C} occurs. If C acked M and also received acks from A and B before the partition, then C may deliver M before it receives the new view {C}. Otherwise, C will ignore message M as spurious without contradicting any guarantee.