Distributed Systems 2006 Group Communication I * *With material adapted from Ken Birman.

Plan (we skip Section 16.2.5)
The layers we will build, from lowest to highest:
– 2PC and 3PC: our first “tools” (lowest layer)
– Tracking group membership: we’ll base it on 2PC and 3PC
– Fault-tolerant multicast: we’ll use membership
– Ordered multicast: we’ll base it on fault-tolerant multicast
– Tools for solving practical replication and availability problems: we’ll base them on ordered multicast
– Robust Web Services: we’ll build them with these tools

Basic use

Replication
A fundamental concept with many uses
– If we can solve this core problem, we can apply the solution in many settings
Replicate data or a service for high availability
Replicate data so that group members can share loads and improve scalability
Replicate locking or synchronization state
Replicate membership information in a data center so that we can route requests
Replicate management information or parameters to tune performance
Replication is a basic primitive, but one missing in many development toolkits
– We find replication mechanisms inside the middleware (e.g., IBM WebSphere uses replication, as does Microsoft’s Windows Clustering technology)
– End users are often given weaker solutions (e.g., publish-subscribe)

Who “does” the replication?
We think of replication as happening inside groups
– Could be a group of identical components
– Or just a group of processes that asked to join in order to replicate a data structure
Members might be different programs
Sometimes we know ahead of time who might be a replica (the static model), sometimes not (the dynamic model)

Let’s focus on update ordering
We want to
– Replicate data
– Update it while accessing it
What sorts of issues must be addressed?

Drill down: life of a group
(Figure: timelines for processes p, q, r, s, t, with updates u0 and u1.)
– Initial group membership is {r, s, t}
– q does an update. It needs to reach the three members
– Now s goes offline for a while. Maybe it crashed
– If p tries to “reliably” multicast to s, it won’t get an ack and will wait indefinitely. But how can p be sure that s has failed? If p is wrong, s will be missing an update!
– Now s is back online and presumably should receive the update q is sending
– Here we see the update ordering issue in a “pure” form. Which came first, p’s update or the one from q?

Questions to ask about order
– Who should receive an update?
– What update ordering to use?
– How expensive is the ordering property?

Questions to ask about order
Delivery order for concurrent updates
– The issue is more subtle than it looks!
– We can fix a system-wide order, but:
  – Sometimes nobody notices out-of-order delivery
  – System-wide ordering is expensive
  – If we care about speed, we may need to look closely at the cost of ordering

Ordering example
System replicates variables x, y
– Process p sends “x = x/2”
– Process q sends “x = 83”
– Process r sends “y = 17”
– Process s sends “z = x/y”
To what degree is ordering needed?

Ordering example
x=x/2 and x=83: these clearly “conflict”
– If we execute x=x/2 first, then x=83, x will have value 83
– In the opposite order, x is left equal to 41.5

Ordering example
x=x/2 and y=17: these don’t seem to conflict
– After the fact, nobody can tell what order they were performed in: they commute

Ordering example
z=x/y: this conflicts with updates to x, updates to y, and other updates to z
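The conflict and commute cases above can be checked mechanically. The sketch below is only an illustration: the helper names (`halve_x`, `set_x_83`, `set_y_17`, `compute_z`, `commute`) and the starting values are assumptions, not part of the original slides. Two updates are treated as conflicting when applying them in the two possible orders gives different replica states for the given starting state.

```python
# Each update is modelled as a pure function from one replica state to the next.
def halve_x(state):
    return {**state, "x": state["x"] / 2}

def set_x_83(state):
    return {**state, "x": 83.0}

def set_y_17(state):
    return {**state, "y": 17.0}

def compute_z(state):
    return {**state, "z": state["x"] / state["y"]}

def commute(op_a, op_b, state):
    """True if both application orders leave the replica in the same state."""
    return op_b(op_a(state)) == op_a(op_b(state))

# Hypothetical starting values, chosen only for the demonstration.
start = {"x": 10.0, "y": 5.0}

x_updates_conflict = not commute(halve_x, set_x_83, start)   # different x
x_and_y_commute = commute(halve_x, set_y_17, start)          # same state
z_conflicts_with_x = not commute(compute_z, halve_x, start)  # different z
```

This tests commutativity for one starting state only. A real system would have to decide conflicts from the variables an update reads and writes, not from a single trial.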

Single updater
In many systems, there is only one process that can update a given type of data
– For example, the variable might be “sensor values” for a temperature sensor
– Only the process monitoring the sensor does updates, although perhaps many processes want to read the data and we replicate it to exploit parallelism
– Here the only “ordering” that matters is the FIFO ordering of the updates emitted by that process

Single updater
If p is the only update source, the need is a bit like the TCP “FIFO” ordering
(Figure: processes p, r, s, t.)
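A minimal sketch of how a receiver could enforce per-sender FIFO delivery, assuming each sender numbers its updates 0, 1, 2, ... The class name `FifoReceiver` and the message format are assumptions made for illustration, not something the slides define. Messages that arrive early are held back until the gap is filled.

```python
class FifoReceiver:
    """Delivers each sender's messages in the order that sender emitted them."""

    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number we expect
        self.held = {}       # sender -> {seq: payload} for early arrivals
        self.delivered = []  # (sender, payload) in delivery order

    def receive(self, sender, seq, payload):
        expected = self.next_seq.get(sender, 0)
        if seq < expected:
            return  # duplicate of something already delivered
        pending = self.held.setdefault(sender, {})
        pending[seq] = payload
        # Deliver every consecutive message that is now available.
        while expected in pending:
            self.delivered.append((sender, pending.pop(expected)))
            expected += 1
        self.next_seq[sender] = expected


# The sensor process p sends two readings; the network reorders them.
replica = FifoReceiver()
replica.receive("p", 1, "temp=21")   # early: held back
replica.receive("p", 0, "temp=20")   # fills the gap, both are delivered
```

Nothing here orders messages from different senders relative to each other, which is exactly why FIFO ordering is the cheapest option.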

Mutual exclusion
Another important case
Arises in systems that use locks to control access to shared data
– This is very common, for example in transactional systems
– Very often, without locks a system rapidly becomes corrupted
Suppose that before performing conflicting operations, processes must lock the variables
– This means that there will never be any true concurrency
– And it simplifies our ordering requirement

Mutual exclusion
(Figure: timelines for processes p, q, r, s, t, shown dark blue when holding the lock.)
How is this case similar to “FIFO” with one sender? How does it differ?

Mutual exclusion
Are these updates in “FIFO” order?
– No, the sender isn’t always the same
– But yes in the sense that there is a unique path through the system (corresponding to the lock) and the updates are ordered along that path
Here updates are ordered by Lamport’s happens-before relation: →
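One way to see why the lock path orders the updates is to stamp events with Lamport logical clocks. The sketch below is an illustration under assumptions: the `Process` class and the idea of passing the lock as a timestamped message are hypothetical modelling choices, not something the slides prescribe. Because each lock hand-off is a message, every update by the new holder gets a larger timestamp than every update by the previous holder.

```python
class Process:
    """A process with a Lamport logical clock."""

    def __init__(self, name):
        self.name = name
        self.clock = 0

    def local_event(self):
        # Any local step (here: an update made while holding the lock).
        self.clock += 1
        return self.clock

    def send(self):
        # Sending is an event; the timestamp travels with the message.
        self.clock += 1
        return self.clock

    def receive(self, msg_ts):
        # Jump past the sender's timestamp so that send -> receive is respected.
        self.clock = max(self.clock, msg_ts) + 1
        return self.clock


p, q, r = Process("p"), Process("q"), Process("r")

u1 = p.local_event()   # p updates while holding the lock
token = p.send()       # p passes the lock to q
q.receive(token)
u2 = q.local_event()   # q updates
token = q.send()       # q passes the lock to r
r.receive(token)
u3 = r.local_event()   # r updates
```

Here `u1 < u2 < u3`, matching the order along the lock path even though three different processes issued the updates.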

Types of ordering we may want
– Deliver updates in an order matching the FIFO order in which they were sent (cheapest)
– Deliver updates in an order matching the → (happens-before) order in which they were sent (still cheap)
– For conflicting concurrent updates, pick an order and use that order at all replicas (more costly)
– Ordered with respect to all other kinds of communication, delivered either entirely before or entirely after (most costly)

Types of ordering we may want
– fbcast: deliver updates in an order matching the FIFO order in which they were sent
– cbcast: deliver updates in an order matching the → (happens-before) order in which they were sent
– abcast: for conflicting concurrent updates, pick an order and use that order at all replicas
– gbcast: ordered with respect to all other kinds of communication, delivered either entirely before or entirely after
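To make the happens-before ordering concrete, here is a sketch of the standard vector-timestamp delivery rule for causal multicast. It is written in the spirit of cbcast but is not claimed to be the exact protocol the slides have in mind. The class name `CausalProcess`, the message tuple layout, and the fixed group size are assumptions for illustration. A message from sender i is delivered only when it is the next one from i and everything it causally depends on has already been delivered.

```python
class CausalProcess:
    """Group member that delivers multicasts in causal (happens-before) order."""

    def __init__(self, pid, group_size):
        self.pid = pid
        self.vt = [0] * group_size  # vt[k] = multicasts from k delivered here
        self.pending = []           # received but not yet deliverable
        self.delivered = []

    def multicast(self, payload):
        self.vt[self.pid] += 1
        self.delivered.append(payload)  # a sender delivers its own message
        return (self.pid, list(self.vt), payload)

    def _deliverable(self, sender, vt):
        next_from_sender = vt[sender] == self.vt[sender] + 1
        deps_satisfied = all(
            vt[k] <= self.vt[k] for k in range(len(vt)) if k != sender
        )
        return next_from_sender and deps_satisfied

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:  # one delivery may unblock other held messages
            progress = False
            for held in list(self.pending):
                sender, vt, payload = held
                if self._deliverable(sender, vt):
                    self.pending.remove(held)
                    self.vt[sender] = vt[sender]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
m1 = p0.multicast("x = 83")
p1.receive(m1)
m2 = p1.multicast("x = x/2")   # causally after m1
p2.receive(m2)                 # arrives first, so it is held back
p2.receive(m1)                 # now both are delivered, m1 before m2
```

Concurrent multicasts (neither happens before the other) may still be delivered in different orders at different members. Agreeing on one order for those is the extra cost that abcast pays.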

Now continue to “drill down”
We drilled down on ordering. But what about failure?
– We have dynamic group membership
– Can we build reliable multicast?
(Figure: processes p, q, r, s, t.)

Unreliable multicast
Suppose that to send a multicast, a process just uses an unreliable protocol
– Perhaps IP multicast
– Perhaps UDP point-to-point
– Perhaps TCP…
Some messages might get dropped
– If so, the sender eventually finds out and resends them (various options for how to do it)

Concerns if sender crashes
Perhaps it sent some message and only one process has seen it
We would prefer to ensure that
– All receivers in the “current view” of the group receive any message that any receiver receives, unless the sender and all receivers crash, erasing the evidence

An interrupted multicast
(Figure: processes p, q, r, s.)
A message from q to r was “dropped”
Since q has crashed, it won’t be resent

Guarantees and multicast
Failure-atomic multicasts
– If a process remains operational, the multicast is delivered to all other operational members
Dynamically uniform multicasts
– If a process delivers a message, all destinations that remain operational will deliver a copy of the message too
– Very costly
Non-uniform multicasts
– If a message has been delivered to a set of members that fail, the message may not be delivered to the remaining members
Flushing
– A message is “unstable” if some receiver has it but others don’t
– Flush lets applications pause until multicasts sent/received are delivered
– Non-uniform multicast becomes dynamically uniform if flush is called before delivering a message!

Non-uniform, failure-atomic group multicast
Simple approach
– Add the membership list to the message (assume a GMS for the group)
– Send the message to all members (hardware IP multicast or a reliable stream-style protocol)
– Members echo messages to all other members: a flurry of O(n^2) messages
Failure-atomic
– Assume Pi receives and delivers the message and remains operational
– If Pj does not receive it, either Pi or Pj or both have failed, and the GMS reports so
Non-uniform
– Assume the sender and Pi fail after Pi has received the message: the remaining members may then never receive it
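The echo step of the simple approach can be simulated in a few lines. This is a sketch under stated assumptions: the function name `echo_multicast`, its parameters, and the crash model (a crashed member neither receives nor echoes, and the links between operational members are reliable) are hypothetical and chosen only to illustrate the idea. It is not a faithful implementation of any particular toolkit.

```python
def echo_multicast(members, initial_recipients, crashed=frozenset()):
    """Simulate the echo phase of the simple failure-atomic multicast.

    members:            the group view carried inside the message
    initial_recipients: members the sender reached before (possibly) crashing
    crashed:            members that have failed

    Returns (set of members that delivered, number of echo messages sent).
    """
    delivered = set()
    echoes_sent = 0
    inbox = [m for m in initial_recipients if m not in crashed]
    while inbox:
        member = inbox.pop()
        if member in delivered:
            continue  # duplicate copy, ignored
        delivered.add(member)
        # On first receipt, echo the message to every other member in the view.
        for other in members:
            if other == member:
                continue
            echoes_sent += 1
            if other not in crashed and other not in delivered:
                inbox.append(other)
    return delivered, echoes_sent


# q multicasts to the view {p, q, r, s}, reaches only p and s, then crashes.
view = ["p", "q", "r", "s"]
delivered, echoes_sent = echo_multicast(view, ["p", "s"], crashed={"q"})
```

In this run r still gets the message through the echoes from p and s, at the price of one echo per delivering member to each of the others, which is where the O(n^2) message count comes from. The sketch does not model a member that delivers and then crashes before echoing, which is precisely the situation that makes the protocol non-uniform.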

Optimizations
Members use the GMS to decide whether to retransmit
– Need to save a copy of the message
Three phases
– Send message, get ack over reliable stream
– Send OK to delete saved copy (retain the message ID to avoid duplicates)
– Send OK to delete saved IDs
Delay phase two and three messages
– Piggyback them if the process sends a phase one message again anytime soon
Also need to optimize the failure case

Dynamically uniform failure-atomic group multicast
Extend the non-uniform protocol
– Don’t deliver a message until it is known that the processes in the destination group have a copy
Need an additional round of messages
– Very costly: need to wait for a round trip to the slowest member

View-synchronous failure atomicity
What if we, e.g., use views to subdivide work and a process fails while multicasting?
– Need to synchronize on view change
Implementing flush
– Easy if P1 is the new coordinator
– Otherwise add a new round to the new-view protocol in which members flush
All members see messages delivered “in” the same view

Summary
We studied (reliable) multicast
– Examples of use
– Implementation
Still need ordering
The layers, from lowest to highest:
– 2PC and 3PC: our first “tools” (lowest layer)
– Tracking group membership: we’ll base it on 2PC and 3PC
– Fault-tolerant multicast: we’ll use membership
– Ordered multicast: we’ll base it on fault-tolerant multicast
– Tools for solving practical replication and availability problems: we’ll base them on ordered multicast
– Robust Web Services: we’ll build them with these tools
