Multicast Protocols Jed Liu 28 February 2002
2 Introduction Recall Atomic Broadcast: All correct processors receive same set of messages. All messages delivered in same order to all processors. Any message sent by a correct processor is eventually delivered to all processors.
3 Introduction (cont’d) But what happens if the network partitions? Atomic Broadcast becomes unsolvable! Define Totally Ordered Broadcast: if a majority of the processes form a connected component, guarantee Atomic Broadcast for this component only. COReL is an implementation of this.
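The majority condition above can be sketched as a small predicate. This is only an illustration: the function name and the representation of processes as a set of identifiers are assumptions, not part of the slides.

```python
def is_majority_component(component, all_processes):
    """Return True if `component` holds a strict majority of `all_processes`.

    Hypothetical helper: processes are represented by hashable identifiers.
    """
    universe = set(all_processes)
    members = set(component) & universe
    # Strict majority: more than half of all processes are connected.
    return 2 * len(members) > len(universe)
```

A strict majority is used so that two disjoint components can never both satisfy the condition at once.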
4 The Model Network uses datagram message delivery. Asynchronous fail-stop model. Stable storage. Communication links are transient. Message Integrity: Messages cannot be corrupted or generated by the network spontaneously.
5 The System Architecture [Figure: three layers, top to bottom: Application; COReL (Totally Ordered Broadcast); Group Communication Service. Labels on the arrows between layers: application messages; COReL messages; messages with TS; views; Totally Ordered Broadcast messages, delivered totally ordered.]
6 Properties of the GCS No Duplication: Every message delivered at a process p is delivered only once at p. Total Order: A logical, globally unique timestamp is attached to every message when it is delivered. Causal order is preserved. GCS delivers messages in TS order. Virtual Synchrony: Any two processes undergoing the same two consecutive views in a group G deliver the same set of messages in G within the former view.
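As a rough illustration of the No Duplication and Total Order properties, the sketch below delivers a batch of received messages once each, in timestamp order. It assumes messages are `(timestamp, payload)` pairs and that a repeated timestamp means a duplicate (timestamps being globally unique); the function name is hypothetical and this is not how a real GCS is implemented.

```python
def deliver_in_ts_order(received):
    """Deliver each message once, ordered by its globally unique timestamp.

    `received` is an iterable of (timestamp, payload) pairs (assumed shape).
    """
    seen = set()
    delivered = []
    for ts, payload in sorted(received, key=lambda m: m[0]):
        if ts in seen:
            continue  # same timestamp seen already: a duplicate, skip it
        seen.add(ts)
        delivered.append((ts, payload))
    return delivered
```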
7 Properties of the GCS (cont’d) [Figure: timelines of processes P and Q, which are in the same view, with events labelled "deliver m", "deliver m’", "deliver m”" and "send m”". Conclusion: Q also delivers m.]
8 Guarantees Made by COReL Safety: At each process, messages become totally ordered in an order which is a prefix of some common global total order. Total ordering of messages preserves the causal partial order. Liveness: Messages are eventually totally ordered by the members of a view.
9 The COReL Algorithm GCS supplies a unique timestamp for each message that gets delivered to COReL. On delivery, the message gets written to stable storage, and an acknowledgement is sent. Within a majority component, messages are ordered in TS order. Concurrent messages are ordered such that messages from the majority component come first.
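The delivery step described above (persist first, then acknowledge) might look like the following. This is a minimal sketch: the class name, the in-memory list standing in for stable storage, and the tuple used as an acknowledgement are all assumptions made for illustration.

```python
class DeliveryLog:
    """Hypothetical per-process handler for messages delivered by the GCS."""

    def __init__(self, process_id):
        self.process_id = process_id
        self.stable_storage = []  # stand-in for a real on-disk log
        self.acks_sent = []

    def on_deliver(self, ts, payload):
        # Write the message to stable storage before acknowledging it,
        # so an acknowledged message survives a crash of this process.
        self.stable_storage.append((ts, payload))
        ack = ("ack", self.process_id, ts)
        self.acks_sent.append(ack)
        return ack
```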
10 The Primary Component Use the notion of a primary component to allow members of one network component to continue ordering messages when a partition occurs. (Can be a majority, or in general, a quorum.) Ordering Rule: Members of the current primary component PM are allowed to totally order a message once the message has been acknowledged by all members of PM.
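The Ordering Rule reduces to a subset check. The sketch below assumes acknowledgements are tracked as a set of process identifiers per message; the function and parameter names are hypothetical.

```python
def can_totally_order(acked_by, primary_members):
    """Ordering Rule: every member of the primary component has acknowledged.

    `acked_by` is the set of processes whose acknowledgement has been seen
    for one message (assumed bookkeeping, not specified in the slides).
    """
    return set(primary_members) <= set(acked_by)
```

Acknowledgements from processes outside PM are ignored by the check; only the members of PM matter.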
11 The Colours Model Green: messages that have been totally ordered according to the Ordering Rule. Yellow: messages received and acknowledged in the context of a primary component. May have become green at other members of the primary component. Red: no knowledge about message’s total order.
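One way to represent the three colours is an enumeration plus a classification function. The two boolean inputs are an assumed simplification of the knowledge a process has about a message; the names are illustrative only.

```python
from enum import Enum


class Colour(Enum):
    RED = 1      # no knowledge about the message's total order
    YELLOW = 2   # acknowledged in a primary component, maybe green elsewhere
    GREEN = 3    # totally ordered according to the Ordering Rule


def colour_of(acked_in_primary, ordered_by_rule):
    """Classify a message from two (assumed) facts known to this process."""
    if ordered_by_rule:
        return Colour.GREEN
    if acked_in_primary:
        return Colour.YELLOW
    return Colour.RED
```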
12 Invariants Order of green messages determines the global total order of those messages. Order of such messages cannot change, and processes have to agree on the order. Causal order of messages is preserved.
13 View Changes Set the primary component bit to FALSE. Stop handling regular messages and stop sending regular messages. If new view v contains new members, run a Recovery Procedure. If v is a majority, establish a new primary component. Continue handling regular messages and sending regular messages.
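The view-change steps can be sketched as a single handler. The `state` dictionary, its keys, and the two callbacks `run_recovery` and `establish_primary` are hypothetical placeholders; "majority" is assumed here to mean a strict majority of all processes.

```python
def on_view_change(state, new_view, all_processes, run_recovery, establish_primary):
    """Apply the view-change steps in order (illustrative sketch only)."""
    new_view = set(new_view)
    state["in_primary"] = False        # primary component bit := FALSE
    state["handling_regular"] = False  # stop handling/sending regular messages

    if new_view - set(state["view"]):  # the view contains new members
        run_recovery(new_view)

    if 2 * len(new_view) > len(set(all_processes)):  # the view is a majority
        # The callback is expected to set the primary bit on success.
        establish_primary(new_view)

    state["view"] = new_view
    state["handling_regular"] = True   # continue regular messages
    return state
```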
14 State Variables Last_Committed_Primary Number of last primary component that the process has committed to establish. Last_Attempted_Primary Number of last primary component that the process has attempted to establish.
15 Recovery Procedure Send state message to members of new group. Wait for state messages from all other group members. Find the set of Representatives in the group: the processes with the largest Last_Committed_Primary in the group. Get Representatives to agree on the set of green messages and the set of yellow messages. The set of green messages is determined by the union; the set of yellow messages is determined by the intersection.
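The selection of Representatives and the union/intersection step can be sketched as follows. The shape of the state messages (a dictionary per process with `last_committed_primary`, `green` and `yellow` fields) is an assumption, as is the reading that the union and intersection are taken over the Representatives' own sets.

```python
def pick_representatives(states):
    """Processes with the largest Last_Committed_Primary in the group."""
    top = max(s["last_committed_primary"] for s in states.values())
    return {p for p, s in states.items() if s["last_committed_primary"] == top}


def agree_on_colours(states):
    """Return (representatives, green set, yellow set) for the new group.

    Assumption: green is the union and yellow the intersection of the
    sets reported by the representatives.
    """
    reps = pick_representatives(states)
    green = set().union(*(states[p]["green"] for p in reps))
    yellow = set.intersection(*(set(states[p]["yellow"]) for p in reps))
    return reps, green, yellow
```

For example, with two representatives reporting green sets `{1, 2}` and `{1}` and yellow sets `{3, 4}` and `{2, 3}`, the agreed green set is `{1, 2}` and the agreed yellow set is `{3}`.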
16 Recovery Procedure (cont’d) A deterministically chosen representative retransmits green and yellow messages to get all group members to agree on the set of green and yellow messages. Non-representatives re-colour a yellow message as red if it is not yellow at any representative. Retransmit red messages as necessary to get all group members to agree on the state and colour of their message queues.
17 View Change During Recovery? If a view change arrives in the middle of recovery, we immediately restart recovery with the new view. No need to undo anything. If the view change only removes processes from the group, no need to retransmit messages.
18 Establishing a New Primary Component Attempt: Record attempt on stable storage and send attempt message to all other members. Wait for attempt messages from all other members. Commit: Record commit on stable storage. Mark all non-green messages as yellow. Send a commit message. Establish: When commit messages from all other members arrive, set primary component bit to TRUE and mark all messages as green.
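The three steps can be walked through in a single-process simulation. This sketch assumes no failures and no view change during the run, models stable storage as a list, and replaces real message exchange with loops over all members; the `Member` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    colours: dict = field(default_factory=dict)     # message id -> colour
    stable_log: list = field(default_factory=list)  # stand-in for disk
    last_attempted_primary: int = 0
    last_committed_primary: int = 0
    in_primary: bool = False


def establish_primary(members, number):
    """Run Attempt, Commit, Establish for all members (failure-free sketch)."""
    # Attempt: record on stable storage, then (conceptually) send attempt.
    for m in members:
        m.stable_log.append(("attempt", number))
        m.last_attempted_primary = number
    # Commit: reached once attempt messages from all members have arrived.
    for m in members:
        m.stable_log.append(("commit", number))
        m.last_committed_primary = number
        for msg in list(m.colours):
            if m.colours[msg] != "green":
                m.colours[msg] = "yellow"
    # Establish: reached once commit messages from all members have arrived.
    for m in members:
        m.in_primary = True
        for msg in list(m.colours):
            m.colours[msg] = "green"
```

Each phase completes at every member before the next begins, which mirrors the "wait for messages from all other members" steps; a real implementation would have to handle a view change between any two of these steps.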
19 View Change while Establishing? A process marks the messages in its message queue as green only when it knows that all other members have marked them as yellow. If a failure occurs during the protocol, the invariants are not violated.
20 COReL Summary An algorithm for totally-ordered multicast in an asynchronous environment. Resilient to network partitions and communication link failures. But only live in the primary component! Allows members of minority components to initiate messages. These messages can become totally ordered even if the originating process is never a member of the primary component.
21 Transis Another multicast protocol that deals with network partitions. Regulates network flow to avoid flooding and message loss. Uses a sliding-window algorithm similar to that used in TCP.
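For readers unfamiliar with sliding-window flow control, the sketch below shows the general idea with cumulative acknowledgements. It is a generic illustration, not the Transis algorithm; the class and method names are assumptions.

```python
class SlidingWindowSender:
    """Generic sender-side sliding window with cumulative acknowledgements."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.base = 0      # oldest sequence number not yet acknowledged
        self.next_seq = 0  # next sequence number to assign

    def can_send(self):
        # At most `window_size` messages may be outstanding at a time.
        return self.next_seq - self.base < self.window_size

    def send(self):
        """Return the sequence number sent, or None if the window is full."""
        if not self.can_send():
            return None
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def ack(self, seq):
        """Cumulative ack: everything up to and including `seq` is received."""
        if seq >= self.base:
            self.base = min(seq + 1, self.next_seq)
```

Limiting the number of unacknowledged messages is what keeps the sender from flooding the network.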
22 The Persistent Replication Services Layer (PRSL) Built on top of Transis. Provides applications with long term services such as message logging and replaying, and reconciliation of states among recovered and reconnected endpoints. With just Transis: Message delivery only guaranteed within the current group. No end-to-end acknowledgement at application level, so no guarantee that any destination actually acted on the message.
23 Replication Groups The basis of PRSL operations. A static set of processes defined at startup time. Different from multicast groups — can only change through startup and shutdown of members.
24 Replication Group Operations Uniform multicast. Totally ordered uniform multicast. Stable multicast. Explicit application-level acknowledgement. Startup/shutdown for adding/removing a member to/from the replication group.