Distributed Systems 2006 Group Communication I * *With material adapted from Ken Birman.



Plan (We skip Section )
Tracking group membership: We’ll base it on 2PC and 3PC
Fault-tolerant multicast: We’ll use membership
Ordered multicast: We’ll base it on fault-tolerant multicast
Tools for solving practical replication and availability problems: We’ll base them on ordered multicast
Robust Web Services: We’ll build them with these tools
2PC and 3PC: Our first “tools” (lowest layer)

Basic use

Replication
A fundamental concept with many uses
  – If we can solve this core problem, we can apply the solution in many settings
Replicate data or a service for high availability
Replicate data so that group members can share loads and improve scalability
Replicate locking or synchronization state
Replicate membership information in a data center so that we can route requests
Replicate management information or parameters to tune performance
Replication is a basic primitive… But one missing in many development toolkits
  – We find replication mechanisms inside the middleware (e.g., IBM WebSphere uses replication, as does Microsoft’s Windows Clustering technology…)
  – End users are often given weaker solutions (e.g., publish-subscribe)

Who “does” the replication?
We think of replication as happening inside groups
  – Could be a group of identical components
  – Or just a group of processes that asked to join in order to replicate a data structure
Members might be different programs...
Sometimes we know who might be a replica ahead of time (the static model), sometimes not (the dynamic model)

Let’s focus on update ordering
We want to
  – Replicate data
  – Update it while accessing it
What sorts of issues must be addressed?

Drill down: Life of a group
[Figure: timelines for processes p, q, r, s, t, with updates u0 and u1]
q does an update. It needs to reach the three members
Initial group membership is {r, s, t}
Now s goes offline for a while. Maybe it crashed
If p tries to “reliably” multicast to s, it won’t get an ack and will wait indefinitely. But how can p be sure that s has failed? If p is wrong, s will be missing an update!
Now s is back online and presumably should receive the update q is sending.
Here we see the update ordering issue in a “pure” form. Which came first, p’s update or the one from q?

Questions to ask about order
Who should receive an update?
What update ordering to use?
How expensive is the ordering property?

Questions to ask about order
Delivery order for concurrent updates
  – Issue is more subtle than it looks!
  – We can fix a system-wide order, but…
      Sometimes nobody notices out-of-order delivery
      System-wide ordering is expensive
      If we care about speed we may need to look closely at the cost of ordering

Ordering example
System replicates variables x, y
  – Process p sends “x = x/2”
  – Process q sends “x = 83”
  – Process r sends “y = 17”
  – Process s sends “z = x/y”
To what degree is ordering needed?

Ordering example
x=x/2 and x=83
These clearly “conflict”
  – If we execute x=x/2 first, then x=83, x will have value 83.
  – In the opposite order, x is left equal to 41.5
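The conflict can be checked by applying the two updates in both orders. This is a minimal sketch; the helper names and the starting value of x are assumptions made for illustration, not part of the slides.

```python
# Hypothetical replica state and update functions mirroring the slide's example.
def halve(state):
    """The update "x = x/2"."""
    return {**state, "x": state["x"] / 2}

def set_83(state):
    """The update "x = 83"."""
    return {**state, "x": 83}

def apply_all(state, updates):
    """Apply the updates to a copy of the state, in the given order."""
    for update in updates:
        state = update(state)
    return state

initial = {"x": 10}  # the starting value is an assumption; any number works

halve_first = apply_all(initial, [halve, set_83])["x"]
set_first = apply_all(initial, [set_83, halve])["x"]
print(halve_first, set_first)  # 83 41.5
```

Two replicas that receive these updates in different orders end up with different values of x, which is why an agreed delivery order is needed for this pair.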

Ordering example
x=x/2 and y=17
These don’t seem to conflict
  – After the fact, nobody can tell what order they were performed in; they commute

Ordering example
z=x/y
This conflicts with updates to x, updates to y and with other updates to z
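One common way to decide mechanically whether two updates conflict is to compare their read and write sets. The sketch below is an assumption about how this could be done; the `Update` class and the `conflicts` function are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Update:
    name: str
    reads: frozenset   # variables the update reads
    writes: frozenset  # variables the update writes

def conflicts(a, b):
    """Two updates conflict if either writes a variable the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & (a.reads | a.writes))

# The four updates from the ordering example.
p = Update("x = x/2", frozenset({"x"}), frozenset({"x"}))
q = Update("x = 83", frozenset(), frozenset({"x"}))
r = Update("y = 17", frozenset(), frozenset({"y"}))
s = Update("z = x/y", frozenset({"x", "y"}), frozenset({"z"}))

print(conflicts(p, q))  # True: both write x
print(conflicts(p, r))  # False: they touch disjoint variables, so they commute
print(conflicts(s, r))  # True: s reads y, which r writes
```

Only conflicting pairs need an agreed order, so a test like this shows where the cheaper orderings are enough.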

Single updater
In many systems, there is only one process that can update a given type of data
  – For example, the variable might be “sensor values” for a temperature sensor
  – Only the process monitoring the sensor does updates, although perhaps many processes want to read the data and we replicate it to exploit parallelism
  – Here the only “ordering” that matters is the FIFO ordering of the updates emitted by that process

Single updater
If p is the only update source, the need is a bit like the TCP “FIFO” ordering
[Figure: timelines for processes p, r, s, t]
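FIFO delivery from a single sender is commonly obtained with per-sender sequence numbers and a hold-back buffer. The following is a sketch under that assumption; the `FifoReceiver` class and its method names are hypothetical.

```python
class FifoReceiver:
    """Delivers each sender's messages in the order they were sent."""

    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number expected
        self.pending = {}    # sender -> {seq: payload} received early, held back
        self.delivered = []  # (sender, payload) pairs in delivery order

    def receive(self, sender, seq, payload):
        expected = self.next_seq.setdefault(sender, 0)
        if seq < expected:
            return  # duplicate of an already delivered message
        buffered = self.pending.setdefault(sender, {})
        buffered[seq] = payload
        # Deliver every message that is now in sequence.
        while self.next_seq[sender] in buffered:
            n = self.next_seq[sender]
            self.delivered.append((sender, buffered.pop(n)))
            self.next_seq[sender] = n + 1

receiver = FifoReceiver()
receiver.receive("p", 1, "temp=21")  # arrives early, held back
receiver.receive("p", 0, "temp=20")  # fills the gap, both are delivered
print(receiver.delivered)  # [('p', 'temp=20'), ('p', 'temp=21')]
```

Nothing here orders messages from different senders relative to each other, which is exactly why this is the cheapest ordering.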

Mutual exclusion
Another important case
Arises in systems that use locks to control access to shared data
  – This is very common, for example in transactional systems
  – Very often without locks, a system rapidly becomes corrupted
Suppose that before performing conflicting operations, processes must lock the variables
  – This means that there will never be any true concurrency
  – And it simplifies our ordering requirement

Mutual exclusion
[Figure: timelines for processes p, q, r, s, t; dark blue when holding the lock]
How is this case similar to “FIFO” with one sender? How does it differ?

Mutual exclusion
Are these updates in “FIFO” order?
  – No, the sender isn’t always the same
  – But yes in the sense that there is a unique path through the system (corresponding to the lock) and the updates are ordered along that path
Here updates are ordered by Lamport’s happens-before relation: →
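The happens-before relation can be tested with vector timestamps, one standard representation. This sketch assumes three processes p, q, r at indices 0, 1, 2; the timestamps and function names are illustrative assumptions, not taken from the slides.

```python
def happens_before(a, b):
    """True if vector timestamp a precedes b: a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    """True if neither timestamp precedes the other."""
    return a != b and not happens_before(a, b) and not happens_before(b, a)

# p updates while holding the lock, then passes the lock to q.
u_p = (1, 0, 0)
# q updates after receiving the lock, so its timestamp includes p's event.
u_q = (1, 1, 0)
# A hypothetical update by r that did not go through the lock.
u_r = (0, 0, 1)

print(happens_before(u_p, u_q))  # True: ordered along the lock's path
print(concurrent(u_p, u_r))      # True: no path connects them
```

With locking, every pair of conflicting updates looks like `u_p` and `u_q`: the lock hand-off creates the path that orders them.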

Distributed Systems Types of ordering we may want Deliver updates in an order matching the FIFO order in which they were sent Deliver updates in an order matching the  order in which they were sent For conflicting concurrent updates, pick an order and use that order at all replicas Ordered with respect to all other kinds of communication (delivered either before or entirely after) Cheapest More costly Most costly Still cheap

Distributed Systems Types of ordering we may want Deliver updates in an order matching the FIFO order in which they were sent Deliver updates in an order matching the  order in which they were sent For conflicting concurrent updates, pick an order and use that order at all replicas Ordered with respect to all other kinds of communication (delivered either before or entirely after) fbcast abcast gbcast cbcast
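To illustrate "pick an order and use that order at all replicas", here is a sketch of one well-known technique, a fixed sequencer that stamps every update with a global number. It is not claimed to be how abcast is implemented; the `Sequencer` and `Replica` classes are hypothetical.

```python
import itertools

class Sequencer:
    """Assigns consecutive global sequence numbers to updates."""

    def __init__(self):
        self._counter = itertools.count()

    def stamp(self, update):
        return (next(self._counter), update)

class Replica:
    """Applies stamped updates strictly in sequence-number order."""

    def __init__(self):
        self.x = 0.0
        self._held = {}  # seq -> update received out of order
        self._next = 0   # next sequence number to apply

    def receive(self, stamped):
        seq, update = stamped
        self._held[seq] = update
        while self._next in self._held:
            self.x = self._held.pop(self._next)(self.x)
            self._next += 1

sequencer = Sequencer()
m1 = sequencer.stamp(lambda x: 83)     # "x = 83"
m2 = sequencer.stamp(lambda x: x / 2)  # "x = x/2"

a, b = Replica(), Replica()
a.receive(m1); a.receive(m2)  # arrives in order
b.receive(m2); b.receive(m1)  # arrives reversed, m2 is held back
print(a.x, b.x)  # 41.5 41.5
```

The extra hop through the sequencer is one source of the higher cost of total ordering compared with FIFO or causal delivery.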

Now continue to “drill down”
We drilled down on ordering
But what about failure?
  – We have dynamic group membership
  – Can we build reliable multicast?
[Figure: timelines for processes p, q, r, s, t]

Unreliable multicast
Suppose that to send a multicast, a process just uses an unreliable protocol
  – Perhaps IP multicast
  – Perhaps UDP point-to-point
  – Perhaps TCP…
Some messages might get dropped
  – If so it eventually finds out and resends them (various options for how to do it)

Concerns if sender crashes
Perhaps it sent some message and only one process has seen it
We would prefer to ensure that
  – All receivers in the “current view” of the group receive any message that any receiver receives (unless the sender and all receivers crash, erasing the evidence…)

An interrupted multicast
[Figure: timelines for processes p, q, r, s]
A message from q to r was “dropped”
Since q has crashed, it won’t be resent

Guarantees and multicast
Failure-atomic multicasts
  – If a process delivers the multicast and remains operational, it is delivered to all other operational members
Dynamically uniform multicasts
  – If a process delivers a message, all destinations that remain operational will deliver a copy of the message too
  – Very costly
Non-uniform multicasts
  – If a message has been delivered to a set of members that fail, the message may not be delivered to the remaining members
Flushing
  – A message is “unstable” if some receiver has it but others don’t
  – Flush lets applications pause until multicasts sent/received are delivered
  – Non-uniform multicast becomes dynamically uniform if flush is called before delivering a message!

Non-uniform, failure-atomic group multicast
Simple approach
  – Add membership list to message
      Assume a group membership service (GMS) for the group
  – Send message to all members
      Hardware IP multicast or reliable stream-style protocol
  – Members echo messages to all other members
      A flurry of O(n^2) messages
Failure-atomic
  – Assume Pi receives and delivers the message and remains operational
  – If Pj does not receive it, either Pi or Pj or both have failed, and the GMS reports so
Non-uniform
  – Assume the sender and Pi fail after Pi has received the message
      Pi may then have delivered a message that no surviving member ever receives
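The echo step can be simulated in a few lines. This is a simplified, single-threaded sketch in which a function call stands in for a network send; the `Member` class, the `receive` function and the crash scenario are assumptions chosen for illustration.

```python
class Member:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.delivered = set()  # IDs of messages already delivered

def receive(member, msg_id, members, network):
    """Deliver a message once, then echo it to every other listed member."""
    if not member.alive or msg_id in member.delivered:
        return  # crashed members receive nothing, duplicates are ignored
    member.delivered.add(msg_id)
    for name in members:  # the membership list travels with the message
        if name != member.name:
            receive(network[name], msg_id, members, network)

network = {name: Member(name) for name in ["p", "q", "r", "s"]}
members = list(network)

# The sender q reaches only p and then crashes before sending to r and s.
network["q"].alive = False
receive(network["p"], "m1", members, network)

print(sorted(n for n, m in network.items() if "m1" in m.delivered))
# ['p', 'r', 's']
```

Because p survived and echoed the message, every operational member delivers it even though the sender crashed mid-multicast. Had p crashed as well before echoing, r and s would never see the message, which is the non-uniform case.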

Optimizations
Members use GMS to decide whether to retransmit
  – Need to save copy of message
Three phases
  – Send message, get ack over reliable stream
  – Send OK to delete saved copy
      Retain message ID to avoid duplicates
  – Send OK to delete saved IDs
Delay phase two and three messages
  – Piggyback if process sends phase one again anytime soon...
Also need to optimize failure case

Dynamically uniform failure-atomic group multicast
Extend non-uniform protocol
  – Don’t deliver message until it is known that processes in destination group have a copy
Need additional round of messages
  – Very costly: need to wait for round trip to slowest member
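The extra round can be sketched as "store first, deliver only after every destination has acknowledged". The code below is a simplified illustration under that assumption; `UniformMember`, `uniform_multicast` and the `reachable` flag are hypothetical names, and a real system would rely on the GMS to remove an unresponsive member instead of simply giving up.

```python
class UniformMember:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.stored = {}     # msg_id -> payload, held but not yet delivered
        self.delivered = []  # payloads in delivery order

    def on_message(self, msg_id, payload):
        """Round 1: store a copy and acknowledge; None models a lost ack."""
        if not self.reachable:
            return None
        self.stored[msg_id] = payload
        return ("ack", self.name, msg_id)

    def on_deliver(self, msg_id):
        """Round 2: deliver the stored copy to the application."""
        self.delivered.append(self.stored.pop(msg_id))

def uniform_multicast(msg_id, payload, members):
    """Return True if the message was delivered everywhere, else False."""
    acks = [m.on_message(msg_id, payload) for m in members]
    if not all(ack is not None for ack in acks):
        return False  # some destination lacks a copy, so nobody delivers yet
    for m in members:
        m.on_deliver(msg_id)
    return True

group = [UniformMember("p"), UniformMember("q"), UniformMember("r")]
print(uniform_multicast("m1", "x = 83", group))  # True
```

The sender cannot start round two until the slowest member has answered round one, which is where the cost noted above comes from.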

View-synchronous failure atomicity
What if we, e.g., use views to subdivide work and a process fails while multicasting?
  – Need to synchronize on view change
  – Implementing flush
      Easy if P1 is new coordinator
  – Otherwise add a new round to the new-view protocol in which members flush
All members see messages delivered “in” the same view

Summary
We studied (reliable) multicast
  – Examples of use
  – Implementation
Still need ordering
Tracking group membership: We’ll base it on 2PC and 3PC
Fault-tolerant multicast: We’ll use membership
Ordered multicast: We’ll base it on fault-tolerant multicast
Tools for solving practical replication and availability problems: We’ll base them on ordered multicast
Robust Web Services: We’ll build them with these tools
2PC and 3PC: Our first “tools” (lowest layer)