Distributed Systems: Group Communication

Slides:



Advertisements
Similar presentations
COS 461 Fall 1997 Group Communication u communicate to a group of processes rather than point-to-point u uses –replicated service –efficient dissemination.
Advertisements

Reliable Communication in the Presence of Failures Kenneth Birman, Thomas Joseph Cornell University, 1987 Julia Campbell 19 November 2003.
1 CS 194: Distributed Systems Process resilience, Reliable Group Communication Scott Shenker and Ion Stoica Computer Science Division Department of Electrical.
Reliable Group Communication Quanzeng You & Haoliang Wang.
Virtual Synchrony Jared Cantwell. Review Multicast Causal and total ordering Consistent Cuts Synchronized clocks Impossibility of consensus Distributed.
Virtual Synchrony Ki Suh Lee Some slides are borrowed from Ken, Jared (cs ) and Justin (cs )
Virtual Synchrony Scott Phung Nov 15, 2011 Some slides borrowed from Jared (‘09)
Distributed Systems Spring 2009
Group Communications Group communication: one source process sending a message to a group of processes: Destination is a group rather than a single process.
Computer Science Lecture 17, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
CS 582 / CMPE 481 Distributed Systems Replication.
Outline Why distributed computing? Atomic Broadcast The atom system Relevance for e-textiles What’s next? Q&A.
1 Principles of Reliable Distributed Systems Lecture 5: Failure Models, Fault-Tolerant Broadcasts and State-Machine Replication Spring 2005 Dr. Idit Keidar.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 13 Wenbing Zhao Department of Electrical and Computer Engineering.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman.
CIS 720 Distributed algorithms. “Paint on the forehead” problem Each of you can see other’s forehead but not your own. I announce “some of you have paint.
A Randomized Error Recovery Algorithm for Reliable Multicast Zhen Xiao Ken Birman AT&T Labs – Research Cornell University.
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
1 Causal Delivery Advanced Networks PhD. Saúl Pomares Hernández.
Reliable Communication in the Presence of Failures Based on the paper by: Kenneth Birman and Thomas A. Joseph Cesar Talledo COEN 317 Fall 05.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Replication with View Synchronous Group Communication Steve Ko Computer Sciences and Engineering.
Advanced Computer Networks Topic 2: Characterization of Distributed Systems.
Group Communication Group oriented activities are steadily increasing. There are many types of groups:  Open and Closed groups  Peer-to-peer and hierarchical.
2007/1/15http:// Lightweight Probabilistic Broadcast M2 Tatsuya Shirai M1 Dai Saito.
Event Ordering Greg Bilodeau CS 5204 November 3, 2009.
Hwajung Lee. A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types of groups:
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
CS603 Fault Tolerance - Communication April 17, 2002.
D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University.
The Totem Single-Ring Ordering and Membership Protocol Y. Amir, L. E. Moser, P. M Melliar-Smith, D. A. Agarwal, P. Ciarfella.
Building Dependable Distributed Systems, Copyright Wenbing Zhao
Reliable Communication in the Presence of Failures Kenneth P. Birman and Thomas A. Joseph Presented by Gloria Chang.
Fault Tolerance (2). Topics r Reliable Group Communication.
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Reliable multicast Tolerates process crashes. The additional requirements are: Only correct processes will receive multicasts from all correct processes.
Exercises for Chapter 11: COORDINATION AND AGREEMENT
Fault Tolerance Chap 7.
Replication & Fault Tolerance CONARD JAMES B. FARAON
Last Class: Introduction
Primary-Backup Replication
The consensus problem in distributed systems
When Is Agreement Possible
Distributed Systems – Paxos
EEC 688/788 Secure and Dependable Computing
Reliable group communication
COT 5611 Operating Systems Design Principles Spring 2012
CS 425 / ECE 428 Distributed Systems Fall 2017 Indranil Gupta (Indy)
EECS 498 Introduction to Distributed Systems Fall 2017
Advanced Operating System
EEC 688/788 Secure and Dependable Computing
Active replication for fault tolerance
PERSPECTIVES ON THE CAP THEOREM
EEC 688/788 Secure and Dependable Computing
Time And Global Clocks CMPT 431.
Event Ordering.
EEC 688/788 Secure and Dependable Computing
Replication State Machines via Primary-Backup
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Lecture 9: Ordered Multicasting
Distributed Systems (15-440)
COT 5611 Operating Systems Design Principles Spring 2014
Last Class: Fault Tolerance
Presentation transcript:

Distributed Systems: Group Communication Julia Proft and Utkarsh Mall

The Process Group Approach to Reliable Distributed Computing Communications of the ACM, Dec. 1993 Ken Birman, Cornell University Reviews a decade of research on the Isis system. By naming our system ‘The Isis Toolkit’ we wanted to evoke this very old image of something that picks up the pieces and restores a computing system to life. Isis “glues the system together.”

Timeline Year Event Contributor(s) 1978 Time, Clocks, and the Ordering of Events in a Distributed System Lamport 1982 Byzantine Generals Problem Lamport, Shostak, and Pease 1983 Impossibility of Distributed Fault Tolerant Consensus Fischer, Lynch, and Patterson Virtual Synchrony and the Isis Toolkit Birman et al. 1984 State Machine Replication Lamport, Schneider 1985 Distributed Process Groups (V System) Cheriton, Deering, and Zwaenepoel 1987- 1993 Bulk of development on the Isis Toolkit

Motivation Problem: the construction of reliable distributed software. Issues of reliability have been left to the application programmers, who are “largely unable to respond to the challenge”; solutions to the problems are “probably beyond the ability of a typical distributed applications programmer.” Solution: programming with distributed groups of cooperating programs, implemented in the computing environment itself or the operating system. “The only practical approach”!

Process Groups Anonymous groups Explicit groups Application publishes data to a topic Other processes subscribe to this topic Properties needed for automatic, reliable operation: Ability to address group Atomic message delivery Ordered message delivery Access to history of group Explicit groups Direct cooperation between members Share responsibility for responding to requests Membership changes published to the group

Example: the Robot Operating System (ROS) ROS Master Image Processing Node Camera Node /image_data topic Subscribe Publish Register /gestures topic Input

Advantages Fault tolerance Consistency Ease of development Transparent adaptation to failure and recovery State machine replication Consistency Ordered and atomic message delivery Consistent view of group membership Ease of development Need not worry about communication protocol Leave fault tolerance and consistency to the OS

Problems Unreliable communication Membership changes Delivery ordering State transfer Failure atomicity

Unreliable communication UDP: packets lost, duplicated, delivered out of order RPC: sender cannot distinguish reason for failure TCP: broken channels result in inconsistent behavior How to recover consistently from message loss?

Membership changes Group membership changes do not happen instantaneously How to make sure messages reach the latest group members?

Delivery ordering Messages need to be ordered by causality How to deliver in causal ordering? Context record on each message

State transfer Processes joining group must get latest state How to handle inconsistencies from concurrent messages? Might need to change the diagram

Failure atomicity Need to achieve all-or-nothing message delivery How to handle mid-transmission failures? Protocol needs exactly once delivery

Close Synchrony A synchronous execution model. Multicasts to a process group are delivered to all members Send and delivery events occur as a single, instantaneous event

Close Synchrony Execution runs in genuine lockstep. Processes receive/multicast membership changes in the same relative order

Close Synchrony Unreliable Communication Membership changes Multicast is always reliable Consistent membership at any logical instant Concurrent multicasts are distinct events Happens instantaneously Multicast is a single logical event Unreliable Communication Membership changes Delivery Ordering State Transfer Failure Atomicity

Problems with Close Synchrony In the real world, events are not instantaneous! Expensive: execution runs in genuine lockstep! Impossible to achieve in presence of failures (why?) What do we do? Speed of system same as the slowest member

Virtual Synchrony Asynchronous Close Synchrony Synchronization needed only for events sensitive to ordering

Virtual Synchrony Group Membership Service Group Communication Service Replicated service within the process group itself Membership change needs to be done synchronously Group Communication Service Uses Lamport’s happened before relationship CBcast (Causal Broadcast) or ABcast (Atomic Broadcast) Multicasts are going to be a total event ordering equivalent to some close synchrony execution How is VS with abcast different from CS?

Vector Clocks Array of clocks, indexed by processes in the process group Protocol: VT(pi) = clock maintained by process pi VT(pi) initialized to zero For each send(m) at pi, VT(pi)[i]+=1 and VT(m) = VT(pi) If pj delivers a message, received from pi: For k in 1..n: VT(pj)[k] = max(VT(m)[k],VT(pi)[k]) Ordering VT1 ≤ VT2 iff ∀i, VT1[i] ≤ VT2[i] VT1 < VT2 iff VT1 ≤ VT2 and ∃i, VT1[i] < VT2[i] Why use vc instead of lamport clock

CBcast Uses vector clocks to detect causality Delivery of received messages delayed until “happened before” messages are delivered Protocol: pj on receiving message m from pi, delays delivery until VT(m)[k] = VT(pj)[k]+1 if k=i VT(m)[k] ≤ VT(pj)[k] otherwise When m is delivered follow vector clock protocol Delayed messages stored in CBcast delay queue Concurrent messages delivered out of order Fast because asynchronous Implementation details

ABcast Stronger ordering guarantee than CBcast Total message ordering within a group Messages can only be delivered if, no prior ABcast is undelivered Slow Protocol: A process pi holding token CBcasts message If pi is not holding the token CBcast but mark undeliverable Token holder delivers and CBcasts a set-order Other follow the set-order Implementation details

Virtual Synchrony Unreliable Communication Membership changes Group communication service Group membership service ABcast, CBcast Group communication service, group membership service Unreliable Communication Membership changes Delivery Ordering State Transfer Failure Atomicity

Isis An implementation of virtual synchrony Used by Also provides New York/Swiss stock exchange French air traffic control system (PHIDIAS) Also provides monitoring facilities: site failures, triggers Automated recovery Styles of group Write applications

Discussion Questions How is virtual synchrony with ABcast different from close synchrony?

Takeaways Close synchrony with process groups provides: Ease of development Consistency Fault tolerance Virtual synchrony: Faster asynchronous system

Bimodal Multicast (1999) Ken Birman PhD Berkeley ‘81 Mark Hayden → Cornell University Mark Hayden PhD Cornell ‘98 → Compaq Research → North Fork Networks → Lefthand Networks → Ventura Networks Öznur Özkasap PhD Ege ‘00 → Koç University Spent two years (and completed dissertation) at Cornell Zhen Xiao PhD Cornell ‘01 → AT&T Research → IBM Research → Peking University Mihai Budiu PhD CMU ‘03 → Microsoft Research → Barefoot Networks → VMware Research Spent a year at Cornell Yaron Minsky PhD Cornell ‘02 → Jane Street Fun fact: introduced Jane Street to OCaml

Motivation Virtual synchrony Best effort reliability protocols Costly protocol Unstable under stress Not scalable Best effort reliability protocols Scalable Starts re-multicasting under low levels of noise No membership check No end-to-end guarantee Multicast with stable throughput e.g. Streaming Media, teleconferencing

Design Two step protocol Optimistic Dissemination Protocol Unreliable Multicast like IP multicast Two-Phase Anti-Entropy Protocol Random gossip Unicast lost messages Cheaper than re-multicasting Read about ip mutlicast

Advantages PBcast (Probabilistic Broadcast) Atomicity (Almost all or almost none) Scalability Throughput Stability Add bimodal diagram

Performance

Performance

Takeaways Bimodal Multicast Stable throughput Scalability at cost of “weaker” reliability Predictable reliability Predictable load

C A P CAP Conjecture Consistency Availability Partition Tolerance Client receives the latest the version of state Availability Client request always gets a response Partition Tolerance Can tolerate network partition C A P Enforced Consistency Eventual Consistency In presence of partition, choose a trade-off between Consistency and Availability.

Acknowledgments Many slides/diagrams borrowed from Scott P., CS 6410 Fall 2011 Some diagrams borrowed from Ken Birman, CS 614 Fall 2006 Vector Clock, CBcast and ABcast borrowed from Birman, Schiper, Stephenson, Lightweight causal and atomic group multicast, 1991