Totally Ordered Broadcast in the Face of Network Partitions [Keidar and Dolev, 2000]
INF5360 Student Presentation, 4/3-08, Miran Damjanovic

Overview
- COReL (Introduction)
- COReL (The Model)
- COReL (System Architecture)
- COReL (Problem Definition: Service Guarantees)
- COReL (Algorithm)
- Related Work
- Summary/Conclusion

COReL (Introduction)
- COReL (Consistent Object Replication Layer): an algorithm for Totally Ordered Broadcast when the network partitions or processes fail. To achieve this, the algorithm uses an underlying GCS to multicast messages to all connected members.
- GCS (Group Communication Service): provides reliable multicast and membership services among multicast groups, detecting failures and removing faulty members from the membership (failure detector).
- Totally Ordered Broadcast: for all messages m1 and m2 and all processes pi and pj, if m1 is received at pi before m2 is, then m2 is not received at pj before m1 is.
- COReL guarantees that if a majority of processes (a primary component) form a connected component, they eventually deliver all messages sent by any of them, in the same order.
- Processes using COReL are allowed to initiate messages even when they are not members of a majority component. Thus, messages can eventually become totally ordered even if their initiator is never a member of a majority component.
- COReL recovers lost messages (in scenarios where the GCS disconnects processes that later reconnect) and extends the order achieved by the GCS to a global total order.
- The algorithm imposes no extra message overhead: all the information needed is piggybacked on regular messages.
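
The total order requirement can be phrased as a predicate over the delivery logs of two processes. A minimal sketch in Python (illustrative only; the message names and log representation are assumptions, not the paper's notation):

    # Check that two processes' delivery logs never order a common
    # pair of messages differently (messages absent from one log are
    # ignored, since the property only constrains received messages).
    def consistent_total_order(log_i, log_j):
        pos_j = {m: k for k, m in enumerate(log_j)}
        common = [m for m in log_i if m in pos_j]
        return all(pos_j[a] < pos_j[b] for a, b in zip(common, common[1:]))

    assert consistent_total_order(["m1", "m2", "m3"], ["m1", "m3"])
    assert not consistent_total_order(["m1", "m2"], ["m2", "m1"])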

COReL (The Model)
- There is no bound on message transmission time (an asynchronous system).
- The underlying communication network provides datagram message delivery.
- Crashed processes may later recover with their stable storage intact; live processes are considered correct and crashed processes are considered faulty.
- Malicious failures are not considered; instead there is a message integrity property.
- Message Integrity: if a message m is delivered by a process p, then there is a causally preceding send event of m at some process q.

COReL (System Architecture)

- All copies of COReL are members of one multicast group.
- The multicast group undergoes view changes when processes are added to or removed from the group due to failures.
- A view v is a pair consisting of a view set v.set and a view identifier v.id.
- A process p is a member of a view v if p is in v.set.
- Changes in views are reported to COReL through special view messages.
- If view v is the latest view that process p received before a send (receive) event e, then we say that e occurs at p in v.
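
A view can be sketched as a simple immutable pair, matching the (v.set, v.id) description above (a hypothetical illustration, not COReL's actual data structures):

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class View:
        id: int                                              # view identifier v.id
        members: frozenset = field(default_factory=frozenset)  # view set v.set

        def has_member(self, p) -> bool:
            return p in self.members

    v = View(id=7, members=frozenset({"p", "q", "r"}))
    assert v.has_member("p") and not v.has_member("s")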

COReL (System Architecture) - Properties of the GCS
- No Duplication: every message delivered at a process p is delivered only once at p.
- Total Order: a unique logical timestamp (TS) is attached to every message when it is delivered. The TS total order preserves the causal partial order, and the GCS delivers messages at each process in TS order, possibly with gaps.
- Virtual Synchrony: any two processes undergoing the same two consecutive views in a group G deliver the same set of messages in G within the former view.
- Property 3.5: let p and q be members of a view v. If p delivers a message m before m' in v, and if q delivers m' and later sends a message m'', such that p then delivers m'', then q also delivers m before m'.
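
The Total Order property permits gaps but never reordering; a minimal sketch (the (timestamp, message) representation is an assumption):

    # deliveries: list of (timestamp, message) pairs in delivery order.
    def delivered_in_ts_order(deliveries):
        timestamps = [ts for ts, _ in deliveries]
        return all(a < b for a, b in zip(timestamps, timestamps[1:]))

    # A gap (TS 3 missing) is allowed; a reordering is not.
    assert delivered_in_ts_order([(1, "a"), (2, "b"), (4, "d")])
    assert not delivered_in_ts_order([(2, "b"), (1, "a")])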

COReL (Problem Definition: Service Guarantees)
Safety
- Property 3.6: at each process, messages become totally ordered in an order which is a prefix of some common global total order.
- Property 3.7: messages are totally ordered by each process in an order which preserves the causal partial order.
Liveness
- Let P be a set of processes such that P = v.set for a view v. Assume that after some time t no member of P delivers any further views, so that the last view delivered by each p in P is v, and that every message sent by a process in P is delivered by every other process in P. Under these circumstances, COReL guarantees that every message sent by a process in P is eventually totally ordered by all the members of P.

COReL (Algorithm)
Reliable Multicast
- Each process logs every message that it receives from the GCS on stable storage.
- When components re-merge, a Recovery Procedure is invoked.
- Messages are logged before the message send event is considered complete.
Message Ordering
- COReL orders messages according to their TS, provided by the GCS.
- When a message is retransmitted, only the "old" (original) TS is valid.
- Order Rule 1: once a message is acknowledged by all the members of a primary component, members of this component are allowed to totally order the message.
- Each instance of COReL maintains a local message queue, MQ; incoming messages within each component are appended to the end of this queue.
- When components merge, retransmitted messages from other components may interleave with messages in MQ, but never precede already-ordered messages.
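
Order Rule 1 can be sketched as follows (the class layout and names are assumptions for illustration, not the paper's code):

    class CorelInstance:
        def __init__(self, primary_members):
            self.primary = set(primary_members)  # current primary component
            self.mq = []                         # local message queue (MQ)
            self.acks = {}                       # message id -> set of ackers

        def on_deliver(self, msg_id, stable_log):
            stable_log.append(msg_id)  # log on stable storage first
            self.mq.append(msg_id)     # MQ grows in GCS timestamp order

        def on_ack(self, msg_id, sender):
            self.acks.setdefault(msg_id, set()).add(sender)

        def may_totally_order(self, msg_id):
            # Order Rule 1: acked by all members of the primary component.
            return self.acks.get(msg_id, set()) >= self.primary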

COReL (Algorithm) - The Colors Model
- Red: the process has no knowledge about the message's total order, and no knowledge that the message has a different color elsewhere.
- Yellow: messages that are received and acknowledged in the context of a primary component (PM) and might have become green at other members of the PM.
- Green: the process knows the message's total order. A process marks a message green when it knows that all the other members of the PM know that the message is yellow. Green means totally ordered.
- The MQ is divided into three zones: a green prefix, then a yellow zone, and then a red suffix.
- When components merge, members of the last PM enforce the order of all green and yellow messages before the red ones; red messages from different components are interleaved according to the TS order.
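
The three zones give MQ a monotone color structure; a minimal sketch of this invariant (illustrative):

    from enum import IntEnum

    class Color(IntEnum):
        GREEN = 0   # total order known
        YELLOW = 1  # acked in a primary component, maybe green elsewhere
        RED = 2     # no ordering knowledge yet

    def zones_valid(mq_colors):
        # Colors along MQ may only move away from green, never back.
        return all(a <= b for a, b in zip(mq_colors, mq_colors[1:]))

    assert zones_valid([Color.GREEN, Color.GREEN, Color.YELLOW, Color.RED])
    assert not zones_valid([Color.YELLOW, Color.GREEN])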

COReL (Algorithm) - Invariants of the algorithm
- Causal: if a process p has in its MQ a message m, originally sent by process q, then for every message m' that q sent before m (or had in its MQ before sending m), the MQ of p contains m' before m.
- No Changes in Green: new green messages are appended to the end of the green prefix of p's MQ, and this is the only way the order of green messages is allowed to change; otherwise the order of green messages in MQ is never altered.
- Agreed Green: if p and q are processes running the algorithm, then at any point during the execution, the green prefix of the MQ of either p or q is a prefix of the other's.
- Yellow: if a process p has marked a message m green in the context of a PM, and a process q knows of that PM, then q has m marked as green or yellow. Furthermore, the prefixes of the two processes' MQs ending at message m are equal.
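
The Agreed Green invariant, for instance, is a simple prefix relation between two green prefixes (a hypothetical check, not from the paper):

    def agreed_green(green_p, green_q):
        shorter, longer = sorted((green_p, green_q), key=len)
        return longer[:len(shorter)] == shorter

    assert agreed_green(["m1", "m2"], ["m1", "m2", "m3"])
    assert not agreed_green(["m1", "m2"], ["m1", "m3"])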

COReL (Algorithm) - Handling view changes
- When a view change is delivered, the View Change Handler is invoked (Figure 3.5).
- The primary component bit is set to FALSE; the handler blocks regular messages and stops initiating regular messages.
- If the new view introduces new members, the Recovery Procedure (Section 5.5.2) is invoked in order to bring all the members of the new view to a common state.
- The members of the new view exchange state messages and report on the last PM they know of and its green and yellow lines. In this way every process knows which messages every other member has, and missing messages are retransmitted.
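
A minimal skeleton of these first handler steps (all names here are assumptions, not the paper's code):

    from types import SimpleNamespace

    def handle_view_change(state):
        state.primary_bit = False  # leave the primary component
        state.blocking = True      # block regular messages meanwhile
        # State message: last known PM and its green and yellow lines.
        return {"last_pm": state.last_pm,
                "green_line": state.green_line,
                "yellow_line": state.yellow_line}

    state = SimpleNamespace(primary_bit=True, blocking=False,
                            last_pm=3, green_line="m5", yellow_line="m7")
    report = handle_view_change(state)  # multicast to the new view's members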

COReL (Algorithm) - Handling view changes (2)
- The new green line is the maximum of the green lines of all the members, and the new yellow line is the minimum of the yellow lines of the members that know of the last PM.
- Processes that need to retransmit messages do so according to the Retransmission Rule, and with their original ACKs.
- If the new view is a majority, the members will try to establish a new PM (Section 5.5.1); if not (process faults, new members), the view is simply established and no retransmission takes place.
- If the GCS reports a new view change before the View Change Handler protocol is over, the protocol is immediately restarted for the new view; nothing needs to be undone.
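
The line recomputation is a max/min over the reported lines; a minimal sketch assuming lines are represented as indices into the ordered prefix:

    def merge_lines(green_lines, yellow_lines_of_pm_knowers):
        # green_lines: green line reported by every member of the view.
        # yellow_lines_of_pm_knowers: yellow lines of the members that
        # know of the last PM.
        new_green = max(green_lines)                  # maximum of green lines
        new_yellow = min(yellow_lines_of_pm_knowers)  # minimum of yellow lines
        return new_green, new_yellow

    assert merge_lines([4, 6, 5], [8, 7]) == (6, 7)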

COReL (Related Work)
- Consensus algorithms: Paxos runs multiple instances of consensus; Atomic Broadcast can be seen as a sequence of consensus decisions.
- The total ordering protocol in Amir, 1995.
- Fekete et al., 1997 made some simplifications: simpler to present, but less efficient.
- The correctness of the algorithm is proved in Keidar, 1994.

Summary
- Motive: propose an algorithm for Totally Ordered Broadcast that is resilient to both process failures and network partitions.
- Conclusion: the algorithm proposed is briefly described; its correctness is proven in another paper (Keidar, 1994). Nevertheless, the paper delivers on the above motive.
- What makes it different from similar algorithms?