Virtual Synchrony Ki Suh Lee Some slides are borrowed from Ken, Jared (cs6410 2009) and Justin (cs614 2005)

Slides:



Advertisements
Similar presentations
Impossibility of Distributed Consensus with One Faulty Process
Advertisements

Reliable Communication in the Presence of Failures Kenneth Birman, Thomas Joseph Cornell University, 1987 Julia Campbell 19 November 2003.
CS 542: Topics in Distributed Systems Diganta Goswami.
Teaser - Introduction to Distributed Computing
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Fall 2003 URL: Distributed System Architectures.
1 CS 194: Elections, Exclusion and Transactions Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
Failure Detection The ping-ack failure detector in a synchronous system satisfies – A: completeness – B: accuracy – C: neither – D: both.
6.852: Distributed Algorithms Spring, 2008 Class 7.
Distributed Systems Overview Ali Ghodsi
Computer Science Lecture 18, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
Lab 2 Group Communication Andreas Larsson
Distributed Systems 2006 Group Communication I * *With material adapted from Ken Birman.
Virtual Synchrony Jared Cantwell. Review Multicast Causal and total ordering Consistent Cuts Synchronized clocks Impossibility of consensus Distributed.
Virtual Synchrony Scott Phung Nov 15, 2011 Some slides borrowed from Jared (‘09)
Ken Birman Cornell University. CS5410 Fall
CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 3 – Distributed Systems.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Asynchronous Consensus (Some Slides borrowed from ppt on Web.(by Ken Birman) )
Group Communications Group communication: one source process sending a message to a group of processes: Destination is a group rather than a single process.
Computer Science Lecture 17, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
Group Communication Phuong Hoai Ha & Yi Zhang Introduction to Lab. assignments March 24 th, 2004.
Reliable Distributed Systems Virtual Synchrony. A powerful programming model! Called virtual synchrony It offers Process groups with state transfer, automated.
1 Principles of Reliable Distributed Systems Lecture 5: Failure Models, Fault-Tolerant Broadcasts and State-Machine Replication Spring 2005 Dr. Idit Keidar.
Distributed Systems 2006 Retrofitting Reliability* *With material adapted from Ken Birman.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 13 Wenbing Zhao Department of Electrical and Computer Engineering.
EEC 688/788 Secure and Dependable Computing Lecture 13 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman.
Ken Birman Cornell University. CS5410 Fall
CS603 Communication Mechanisms 14 January Types of Communication Shared Memory Message Passing Stream-oriented Communications Remote Procedure Call.
Distributed Systems Principles and Paradigms
TOTEM: A FAULT-TOLERANT MULTICAST GROUP COMMUNICATION SYSTEM L. E. Moser, P. M. Melliar Smith, D. A. Agarwal, B. K. Budhia C. A. Lingley-Papadopoulos University.
Reliable Communication in the Presence of Failures Based on the paper by: Kenneth Birman and Thomas A. Joseph Cesar Talledo COEN 317 Fall 05.
Lab 2 Group Communication Farnaz Moradi Based on slides by Andreas Larsson 2012.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Replication with View Synchronous Group Communication Steve Ko Computer Sciences and Engineering.
Consistent and Efficient Database Replication based on Group Communication Bettina Kemme School of Computer Science McGill University, Montreal.
Distributed Systems and Algorithms Sukumar Ghosh University of Iowa Spring 2011.
Farnaz Moradi Based on slides by Andreas Larsson 2013.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Replication Steve Ko Computer Sciences and Engineering University at Buffalo.
Event Ordering Greg Bilodeau CS 5204 November 3, 2009.
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
November NC state university Group Communication Specifications Gregory V Chockler, Idit Keidar, Roman Vitenberg Presented by – Jyothish S Varma.
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
CSE 60641: Operating Systems Implementing Fault-Tolerant Services Using the State Machine Approach: a tutorial Fred B. Schneider, ACM Computing Surveys.
D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University.
The Totem Single-Ring Ordering and Membership Protocol Y. Amir, L. E. Moser, P. M Melliar-Smith, D. A. Agarwal, P. Ciarfella.
Building Dependable Distributed Systems, Copyright Wenbing Zhao
1 © R. Guerraoui Distributed algorithms Prof R. Guerraoui Assistant Marko Vukolic Exam: Written, Feb 5th Reference: Book - Springer.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Replication Steve Ko Computer Sciences and Engineering University at Buffalo.
Reliable Communication in the Presence of Failures Kenneth P. Birman and Thomas A. Joseph Presented by Gloria Chang.
Multi-phase Commit Protocols1 Based on slides by Ken Birman, Cornell University.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Lecture 9: Multicast Sep 22, 2015 All slides © IG.
Group Communication A group is a collection of users sharing some common interest.Group-based activities are steadily increasing. There are many types.
Distributed Systems Lecture 7 Multicast 1. Previous lecture Global states – Cuts – Collecting state – Algorithms 2.
Chapter 8 Fault Tolerance. Outline Introductions –Concepts –Failure models –Redundancy Process resilience –Groups and failure masking –Distributed agreement.
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
Replication & Fault Tolerance CONARD JAMES B. FARAON
The consensus problem in distributed systems
CS514: Intermediate Course in Operating Systems
CS 425 / ECE 428 Distributed Systems Fall 2017 Indranil Gupta (Indy)
Active replication for fault tolerance
EEC 688/788 Secure and Dependable Computing
Distributed Systems: Group Communication
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
CS514: Intermediate Course in Operating Systems
EEC 688/788 Secure and Dependable Computing
Last Class: Fault Tolerance
Presentation transcript:

Virtual Synchrony Ki Suh Lee Some slides are borrowed from Ken, Jared (cs ) and Justin (cs )

The Process Group Approach to Reliable Distributed Computing Ken Birman – Professor, Cornell University – Isis – Quicksilver – Live Object

Understanding the Limitations of Causally and Totally Ordered Communication David Cheriton – Stanford – PhD – Waterloo – Billionaire Dale Skeen – PhD – UC Berkeley – Distributed pub/sub communication – 3-phase commit protocol

Recap… End-to-End Argument Multicast Partial/Total Ordering – Happens-before relation Logical/Physical Clocks Distributed snapshop Consensus

Recap Asynchronous vs. synchronous Failure model – Crash-stop (fail-stop) Failures – Byzantine Failures

Distributed computing 1978 Lamport’s “Time, Clocks, and the Ordering of Events in a Distributed System”“Time, Clocks, and the Ordering of Events in a Distributed System 1983 Schneider’s State machine replication 1985 FLP’s the impossibility of asynchronous fault-tolerant consensus 1981 transactional serializability (2PC) 1981 Non-blocking 3PC

Motivation Distributed system with – Fault-tolerance – Reliability – Easy programmability

Virtual Synchrony In the early 1980’s Key idea: equate “group” with “data abstraction” – Each group implements some object – An application can belong to many groups

Virtual Synchrony The user sees what looks like a synchronous execution – Simplifies the developer’s task Process groups with state transfer, automated fault detection and membership reporting Ordered reliable multicast, in several flavors Extremely good performance

Historical Aside Isis (Virtual synchrony) – Weaker properties – not quite “FLP consensus” – Much higher performance (orders of magnitude) – Simple Dynamic membership control Paxos (state machine) – Closer to FLP definition of consensus – Slower (by orders of magnitude) – Sometimes can make progress in partitioning situations where virtual synchrony can’t – Complex dynamic membership control

Programming with groups Many systems just have one group – E.g. replicated bank servers – Cluster mimics one highly reliable server But we can also use groups at finer granularity – E.g. to replicate a shared data structure – Now one process might belong to many groups A further reason that different processes might see different inputs and event orders

ISIS

Assumptions Fail-stop model Clocks are not synchronized Unreliable network Network partitions is rare Failure detection subsystem – Consistent system-wide view

Difficulties Conventional message passing technologies – TCP, UDP, RPC, … Group addressing Logical time and causal dependency Message delivery ordering State transfer (membership change) Fault tolerance …

No Reliable Multicast p q r IdealReality UDP, TCP, Multicast not good enough What is the correct way to recover?

Membership Churn p q r Receives new membership Never sent Membership changes are not instant How to handle failure cases?

Message Ordering p q r 12 Everybody wants it! How can you know if you have it? How can you get it?

State Transfers New nodes must get current state Does not happen instantly How do you handle nodes failing/joining? p q r

Failure Atomicity p q r IdealReality x ? Nodes can fail mid-transmit Some nodes receive message, others do not Inconsistencies arise!

Process Groups Distributed groups of cooperating programs Pub/sub style of interaction Requirements – Group communication – Group membership as input – Synchronization

Process Groups Anonymous group – Group addressing – All or none delivery – Message Ordering Explicit group – Members cooperate directly – Consistent views of group membership

Process groups The group view gives a simple leader election rule A group can easily solve consensus A group can easily do consistent snapshot

Close Synchrony Lock-step execution model – Implementing synchronous model in asynchronous environment – Order of events is preserved – A multicast is delivered to its full membership

Close Synchrony p q r s t u

Not practical – Impossible in the presence of failures – Expensive We want close synchrony with high throughput. => Virtual Synchrony

Virtual Synchrony Relax synchronization requirements where possible – Different orders among concurrent events won’t matter as long as they are delivered.

Asynchronous Execution p q r s t u

ABCAST Atomic delivery ordering Stronger Ordering, but costly locking or token passing Not all applications need this…

CBCAST Two messages can be sent to concurrently only when their effects on the group are independent If m1 causally precedes m2, then m1 should be delivered before m2. Weaker then ABCAST Fast!

When to use CBCAST? When any conflicting multicasts are uniquely ordered along a single causal chain …..This is Virtual Synchrony p r s t Each thread corresponds to a different lock

Benefits Assuming a closely synchronous execution model Asynchronous, pipelined communication Failure handling through a system membership list

Isis toolkit A collection of higher-level mechanisms for process groups Still used in – New York and Swiss Stock Exchange – French Air Traffic Control System – US Navy AEGIS

Problems Message delivery is atomic, but not durable Incidental ordering – Limited to ensure communication-level semantics – Not enough to ensure application-level consistency. Violates end-to-end argument.

Limitations Can’t say “for sure” Can’t say the “whole story” Can’t say “together” Can’t say “efficiently”

Can’t say “for sure” Causal relationships at semantic level are not recognizable External or ‘hidden’ communication channel.

Can’t say “together”, “whole story” Serializable ordering, semantic ordering are not ensured

Can’t say “efficiently” No efficiency gain over state-level techniques False Causality Not scalable – Overhead of message reordering – Buffering requirements grow quadratically

False Causality What if m2 happened to follow m1, but was not causally related?

Discussion Virtual Synchrony good! But, not perfect