Algorithm for Virtually Synchronous Group Communication

Presentation transcript:

Algorithm for Virtually Synchronous Group Communication Idit Keidar, Roger Khazan, Nancy Lynch MIT Lab for Computer Science Theory of Distributed Systems Group

Virtual Synchrony (layer diagram): Application; Virtual Synchrony; Multicast Service; Membership Service

Virtual Synchrony. Synchronization of messages and views: a powerful abstraction for replication. Semantics: VS [Birman, Joseph 87], EVS, SVS. Processes that go together through the same views deliver the same sets of messages.
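The delivery guarantee stated on this slide can be written as a small check over recorded executions. The following is a minimal sketch, not the authors' specification: the history format (an ordered list of `(view_id, delivered_messages)` pairs per process) and the function names `transitions` and `vs_violations` are assumptions made for illustration.

```python
def transitions(history):
    """Map each (old_view, new_view) step to the messages delivered in old_view.

    `history` is a hypothetical trace format: an ordered list of
    (view_id, delivered_messages) pairs for one process.
    """
    return {
        (old_view, new_view): set(delivered)
        for (old_view, delivered), (new_view, _) in zip(history, history[1:])
    }


def vs_violations(histories):
    """Return (p, q, step) triples where p and q made the same view change
    but delivered different message sets in the old view."""
    violations = []
    procs = sorted(histories)
    for i, p in enumerate(procs):
        steps_p = transitions(histories[p])
        for q in procs[i + 1:]:
            steps_q = transitions(histories[q])
            # Only steps taken by both processes are constrained.
            for step in sorted(steps_p.keys() & steps_q.keys()):
                if steps_p[step] != steps_q[step]:
                    violations.append((p, q, step))
    return violations


good = {
    "p": [(1, {"m1", "m2"}), (2, set())],
    "q": [(1, {"m1", "m2"}), (2, set())],
    "r": [(1, {"m1"}), (3, set())],  # r moved to a different view: unconstrained
}
bad = dict(good, q=[(1, {"m1"}), (2, set())])
```

In `good`, process r delivered fewer messages in view 1, but it did not move to view 2 with p and q, so the property does not constrain it. In `bad`, p and q move together from view 1 to view 2 with different delivery sets, which the check reports.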

Example: Virtual Synchrony

Project Goals: High-quality design of a virtually synchronous (VS) group communication system (GCS) for WANs. Mathematical quality (precise, formal, well-documented); useful semantics; efficient algorithm; scalable architecture; modular design. Specification, algorithm, proof, performance analysis.

Publications: ICDCS’00 (International Conference on Distributed Computing Systems); submitted to SICOMP (SIAM Journal on Computing). ICSE’00 (International Conference on Software Engineering); invited to ACM TOSEM.

Virtual Synchrony: How To? Before moving into a new view, processes exchange synch messages (“flush”) to agree on which msgs to deliver in the old view. They need to know which synch msgs to use, since there may be several view proposals.
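As a rough illustration of the flush step, the sketch below assumes each synch message simply lists the messages its sender has received in the old view, and that the agreed delivery set is the union of those lists. The slides do not state the exact agreement rule, so the union rule and the names `agreed_delivery_set` and `missing_after_flush` are assumptions for illustration only.

```python
def agreed_delivery_set(sync_msgs):
    """Union of the messages reported by all processes moving together.

    `sync_msgs` maps a process name to the set of old-view messages it
    reported in its synch message (hypothetical format).
    """
    agreed = set()
    for received in sync_msgs.values():
        agreed |= received
    return agreed


def missing_after_flush(sync_msgs):
    """For each process, the messages it must still obtain and deliver
    before it may install the new view."""
    agreed = agreed_delivery_set(sync_msgs)
    return {proc: agreed - received for proc, received in sync_msgs.items()}


sync = {"p": {"m1", "m2"}, "q": {"m1"}}
todo = missing_after_flush(sync)
```

Here q learns from p's synch message that it has not yet received `m2`, so it must recover and deliver `m2` in the old view before moving on.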

Example: Synchronization Msgs

Problematic Scenario

Existing Solutions. Limit reconfiguration: do not allow joins during reconfiguration. When a process wants to join: first, deliver a view without the joiner; then, start a new reconfiguration. Use a common id to identify synch msgs for the same view proposal.

Limited Reconfiguration

Problems with Existing Solutions. Limited reconfiguration: obsolete views are delivered to the application; this creates overhead and limits the usefulness of virtual synchrony. Use of a common id to identify synch msgs: pre-agreement or dissemination is required, which is costly, especially in WANs.

Our Idea: Don’t limit reconfiguration. Issue a locally unique id per process for each view proposal. Tag synch msgs with these local ids. A view includes a vector of the latest local ids, so a view is a triple, e.g., <4, {p, q, r}, [8, 9, 3]>. Procs use the sync msgs identified by the view; hence, procs use the right sync msgs.
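The selection rule on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' C++ implementation: the `View` and `SyncMsg` classes, their field names, and the mapping of the vector [8, 9, 3] onto members p, q, r in that order are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class View:
    view_id: int
    members: frozenset
    local_ids: dict  # member -> latest local id (the vector in the triple)


@dataclass
class SyncMsg:
    sender: str
    local_id: int        # locally unique id the sender issued for a proposal
    received: frozenset  # old-view messages the sender reports


def select_sync_msgs(view, inbox):
    """Keep, per member, the synch message whose local id matches the view."""
    chosen = {}
    for msg in inbox:
        if view.local_ids.get(msg.sender) == msg.local_id:
            chosen[msg.sender] = msg
    return chosen


# The example triple from the slide: <4, {p, q, r}, [8, 9, 3]>.
view = View(4, frozenset("pqr"), {"p": 8, "q": 9, "r": 3})
inbox = [
    SyncMsg("p", 7, frozenset({"m1"})),        # from an obsolete proposal
    SyncMsg("p", 8, frozenset({"m1", "m2"})),
    SyncMsg("q", 9, frozenset({"m1"})),
    SyncMsg("r", 3, frozenset({"m1", "m2"})),
]
chosen = select_sync_msgs(view, inbox)
```

Because each process tags its synch messages with its own ids and the view carries the vector, p's stale message with id 7 is ignored without any common proposal id being agreed on beforehand.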

Our Algorithm Allows Joiners

No Common Sync Ids Required

Transient Failure

Implementation: VS library (C++), linked with the application. Uses the [KSMD,00] membership service, implemented in C++, with a socket interface to members. Reliable FIFO layer (developed at the Hebrew University): uses IP multicast and recovers lost messages; a library linked with VS.

Group Communication: a Useful “Building Block”. Group abstraction: processes interact in a group; the group is dynamic (fail/join/partition/merge). Reliable group multicast. Group membership: generates “views” that tell each process who it is connected to. Systems: Ensemble, Horus, Isis, Newtop, Psync, Sphynx, Relacs, Totem, Transis.

Example: Group Communication