IP Multicasting
Behzad Akbari, Fall 2008
These slides are based on the slides of J. Kurose (UMASS) and Shivkumar (RPI)
Broadcast Routing
Goal: deliver packets from a source to all other nodes
Two approaches: source duplication vs. in-network duplication
[Figure: source duplication vs. in-network duplication across routers R1-R4]
Source duplication is inefficient (duplicate creation/transmission), and raises a question: how does the source determine the recipient addresses?
In-network duplication
- flooding: when a node receives a broadcast packet, it sends a copy to all neighbors. Problems: cycles and broadcast storms
- controlled flooding: a node only broadcasts a packet if it hasn't broadcast the same packet before. Either the node keeps track of packet ids already broadcast, or it uses reverse path forwarding (RPF): only forward a packet if it arrived on the shortest path between the node and the source
- spanning tree: no redundant packets received by any node
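The controlled-flooding rule above can be sketched in a few lines. This is a toy simulation on a hypothetical three-node cycle; the `Node` class and topology are illustrative, not part of any real protocol implementation.

```python
# Sketch of controlled flooding: a node re-broadcasts a packet only if it
# has not broadcast that packet id before, so flooding terminates even
# when the topology contains cycles.
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.seen = set()       # packet ids already broadcast
        self.received = 0       # how many copies arrived at this node

    def receive(self, pkt_id, sender):
        self.received += 1
        if pkt_id in self.seen:
            return              # already broadcast: drop to avoid a storm
        self.seen.add(pkt_id)
        for n in self.neighbors:
            if n is not sender:
                n.receive(pkt_id, self)

# Triangle topology A-B, B-C, C-A: a cycle exists, yet flooding stops.
a, b, c = Node("A"), Node("B"), Node("C")
a.neighbors = [b, c]; b.neighbors = [a, c]; c.neighbors = [a, b]
a.seen.add(1)                   # the source marks its own packet as sent
for n in a.neighbors:
    n.receive(1, a)
print(b.seen, c.seen)           # both nodes got and re-broadcast packet 1
```

Without the `seen` check, the same packet would circulate around the cycle forever.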
Spanning Tree
- first construct a spanning tree
- nodes forward copies only along the spanning tree
[Figure: nodes A-G; (a) broadcast initiated at A, (b) broadcast initiated at D]
Spanning Tree: Creation
- pick a center node
- each node sends a unicast join message to the center node
- the message is forwarded until it arrives at a node already belonging to the spanning tree
[Figure: (a) stepwise construction of the spanning tree, joins numbered 1-5; (b) the constructed spanning tree]
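The join procedure above can be sketched directly. The topology, node names, and `next_hop` table below are hypothetical; the point is that each join follows unicast next hops toward the center and stops at the first node already on the tree.

```python
# Sketch of center-based spanning-tree creation: each joining node
# unicasts toward the center E; the path taken becomes a tree branch.
next_hop = {  # hypothetical unicast next hop toward center E
    "A": "B", "B": "E", "C": "B", "D": "E", "F": "E", "G": "F",
}
tree_edges = set()
on_tree = {"E"}                       # the center starts as the whole tree

def join(node):
    while node not in on_tree:
        nxt = next_hop[node]
        tree_edges.add((node, nxt))   # path taken becomes a new branch
        on_tree.add(node)
        node = nxt                    # stop once we hit an on-tree node

for n in ["A", "C", "G", "D"]:
    join(n)
print(sorted(tree_edges))
```

Note that C's join stops at B (already on the tree after A joined), so no duplicate branch toward E is created.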
Multicast Routing: Problem Statement
Goal: find a tree (or trees) connecting routers having local mcast group members
- tree: not all paths between routers used
- source-based: different tree from each sender to receivers
- shared-tree: same tree used by all group members
[Figure: source-based trees vs. shared tree]
(3.3 Network Layer: Multicast Routing Algorithms)
Approaches for building mcast trees
- source-based tree: one tree per source (shortest path trees; reverse path forwarding)
- group-shared tree: group uses one tree (minimal-cost spanning (Steiner) trees; center-based trees)
…we first look at basic approaches, then specific protocols adopting these approaches
Shortest Path Tree
- mcast forwarding tree: tree of shortest-path routes from the source to all receivers
- computed with Dijkstra's algorithm
[Figure: source S and routers R1-R7; shading distinguishes routers with and without attached group members; numbered links indicate the order in which the algorithm adds each forwarding link]
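A minimal Dijkstra sketch makes the tree construction concrete. The graph and link costs below are invented for illustration (they only loosely echo the figure); `parent` records the tree edges of the shortest-path tree rooted at S.

```python
import heapq

# Dijkstra's algorithm computing the shortest-path tree from source S.
def shortest_path_tree(graph, source):
    dist = {source: 0}
    parent = {}                         # parent[v] = u means tree edge u->v
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                    # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(pq, (dist[v], v))
    return parent, dist

graph = {                               # hypothetical link costs
    "S":  {"R1": 2, "R2": 1},
    "R1": {"S": 2, "R2": 3, "R4": 5},
    "R2": {"S": 1, "R1": 3, "R5": 4},
    "R4": {"R1": 5},
    "R5": {"R2": 4},
}
parent, dist = shortest_path_tree(graph, "S")
print(parent, dist)
```

Multicast forwarding then uses only the edges in `parent`; links not on any shortest path from S carry no copies.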
Reverse Path Forwarding
- relies on a router's knowledge of the unicast shortest path from it to the sender
- each router has simple forwarding behavior:
  if (mcast datagram received on the incoming link on the shortest path back to the source)
  then flood the datagram onto all outgoing links
  else ignore the datagram
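The RPF decision above is a one-line check. A minimal sketch, with hypothetical interface names and a toy unicast table standing in for the real routing table:

```python
# RPF check: forward a multicast datagram only if it arrived on the
# interface this router would use to reach the source via unicast.
def rpf_forward(unicast_next_hop, source, arrival_iface, all_ifaces):
    """Return the interfaces to flood on, or [] to drop the datagram."""
    if unicast_next_hop.get(source) != arrival_iface:
        return []                     # failed the RPF check: ignore it
    return [i for i in all_ifaces if i != arrival_iface]

# Router with three interfaces; its shortest path back to S is via if0.
next_hop = {"S": "if0"}
print(rpf_forward(next_hop, "S", "if0", ["if0", "if1", "if2"]))  # flood
print(rpf_forward(next_hop, "S", "if1", ["if0", "if1", "if2"]))  # drop
```

A datagram arriving off the reverse path is a duplicate that took a longer route, so dropping it is safe.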
Reverse Path Forwarding: example
[Figure: source S and routers R1-R7; arrows mark which datagrams will be forwarded and which will not]
- result is a source-specific reverse SPT
- may be a bad choice with asymmetric links
Reverse Path Forwarding: pruning
- forwarding tree contains subtrees with no mcast group members
- no need to forward datagrams down such a subtree
- "prune" msgs sent upstream by a router with no downstream group members
[Figure: source S and routers R1-R7; P marks prune messages sent upstream; the remaining links carry multicast forwarding]
Shared-Tree: Steiner Tree
- Steiner tree: minimum-cost tree connecting all routers with attached group members
- problem is NP-complete, but excellent heuristics exist
- not used in practice:
  - computational complexity
  - information about the entire network needed
  - monolithic: rerun whenever a router needs to join/leave
Note: see L. Wei and D. Estrin, "A Comparison of Multicast Trees and Algorithms," TR USC-CS-93-560, Dept. of Computer Science, Sept. 1993, for a comparison of heuristic approaches.
Center-based trees
- single delivery tree shared by all
- one router identified as the "center" of the tree
- to join:
  - edge router sends a unicast join-msg addressed to the center router
  - join-msg is "processed" by intermediate routers and forwarded toward the center
  - join-msg either hits an existing tree branch for this center, or arrives at the center
  - path taken by the join-msg becomes a new branch of the tree for this router
Note: it's always nice to see a PhD dissertation with impact. The earliest discussion of center-based trees for multicast appears to be D. Wall, "Mechanisms for Broadcast and Selective Broadcast," PhD dissertation, Stanford U., June 1980.
Center-based trees: an example
Suppose R6 is chosen as the center:
[Figure: routers R1-R7; numbered arrows show the order in which join messages are generated and the paths they take toward R6]
IP Multicast Architecture
- Service model
- Hosts
- Host-to-router protocol (IGMP)
- Routers
- Multicast routing protocols (various)
Internet Group Management Protocol
IGMP: "signaling" protocol to establish, maintain, and remove groups on a subnet.
- Objective: keep the router up-to-date with the group membership of the entire LAN
- Routers need not know who all the members are, only that members exist
- Each host keeps track of which mcast groups it is subscribed to
- The socket API informs the IGMP process of all joins
How IGMP Works
- On each link, one router is elected the "querier"
- The querier periodically sends a Membership Query message to the all-systems group (224.0.0.1), with TTL = 1
- On receipt, hosts start random timers (between 0 and 10 seconds) for each multicast group to which they belong
[Figure: querier router Q on a link with attached hosts]
How IGMP Works (cont.)
- When a host's timer for group G expires, it sends a Membership Report to group G, with TTL = 1
- Other members of G hear the report and stop (suppress) their timers
- Routers hear all reports, and time out non-responding groups
[Figure: querier router Q and hosts that are members of group G]
How IGMP Works (cont.)
- Normal case: only one report message per group present is sent in response to a query
- Query interval is typically 60-90 seconds
- When a host first joins a group, it sends immediate reports instead of waiting for a query
IGMPv2:
- Hosts may send a "Leave group" message to the all-routers (224.0.0.2) address
- The querier responds with a Group-specific Query message to see if any group members remain
- Lower leave latency
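The report-suppression behavior described above can be sketched as a small simulation. The host names and the fixed random seed are illustrative; the point is that only the host whose random timer expires first actually reports.

```python
import random

# Sketch of IGMP report suppression: each member of a group picks a
# random timer in [0, max_delay]; the first timer to expire triggers a
# Membership Report, which the other members hear and suppress, so the
# router normally sees exactly one report per group per query.
def simulate_query(members, max_delay=10.0, rng=None):
    rng = rng or random.Random(42)               # seed only for demo
    timers = {h: rng.uniform(0, max_delay) for h in members}
    reporter = min(timers, key=timers.get)       # first timer to expire
    return [reporter]                            # all others suppress

reports = simulate_query(["h1", "h2", "h3", "h4"])
print(reports)      # one membership report reaches the router
```

This is why IGMP scales with the number of groups on a LAN rather than the number of hosts.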
IP Multicast Architecture
- Service model
- Hosts
- Host-to-router protocol (IGMP)
- Routers
- Multicast routing protocols
Multicast Routing
- Basic objective: build a distribution tree for multicast packets
- The "leaves" of the distribution tree are the subnets containing at least one group member (detected by IGMP)
- The multicast service model makes it hard: anonymity; dynamic join/leave
Routing Techniques
Flood and prune:
- begin by flooding traffic to the entire network
- prune branches with no receivers
- Examples: DVMRP, PIM-DM
Link-state multicast protocols:
- routers advertise groups for which they have receivers to the entire network
- compute trees on demand
- Example: MOSPF
Routing Techniques (…)
Core-based protocols:
- specify a "meeting place", aka "core" or "rendezvous point" (RP)
- sources send initial packets to the core
- receivers join the group at the core
- requires a mapping between multicast group address and "meeting place"
- Examples: CBT, PIM-SM
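The group-to-RP mapping requirement can be illustrated with a deterministic hash, so that every router independently maps a group to the same RP. This is a simplified sketch; PIM-SM's actual bootstrap/hash mechanism differs, and the candidate-RP addresses are invented.

```python
import hashlib

# Illustrative group-to-RP mapping: hash the group address over a
# candidate-RP list so every router computes the same "meeting place".
def group_to_rp(group_addr, candidate_rps):
    h = int(hashlib.sha256(group_addr.encode()).hexdigest(), 16)
    return candidate_rps[h % len(candidate_rps)]

rps = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]      # hypothetical RPs
rp = group_to_rp("239.1.2.3", rps)
print(rp)
```

Determinism is the key property: sources and receivers that have never communicated still agree on where to meet.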
Routing Techniques (…) Tree building methods: Data-driven: calculate the tree only when the first packet is seen. Eg: DVMRP, MOSPF Control-driven: Build tree in background before any data is transmitted. Eg: CBT Join-styles: Explicit-join: The leaves explicitly join the tree. Eg: CBT, PIM-SM Implicit-join: All subnets are assumed to be receivers unless they say otherwise (eg via tree pruning). Eg: DVMRP, MOSPF
Shared vs. Source-based Trees
Source-based trees:
- separate shortest-path tree for each sender
- (S,G) state at intermediate routers
- Eg: DVMRP, MOSPF, PIM-DM, PIM-SM
Shared trees:
- single tree shared by all members
- data flows on the same tree regardless of sender
- (*,G) state at intermediate routers
- Eg: CBT, PIM-SM
Source-based Trees
[Figure: two sources S and several receivers R; each source has its own shortest-path distribution tree through the routers]
A Shared Tree
[Figure: the same sources S and receivers R; all traffic flows through a single tree rooted at the rendezvous point RP]
Shared vs. Source-Based Trees
Shortest-path trees:
- low delay, better load distribution
- more state at routers (per-source state)
- efficient in dense-area multicast
Shared trees:
- higher delay (bounded by a factor of 2), traffic concentration
- choice of core affects efficiency
- per-group state at routers
- efficient for sparse-area multicast
Distance-Vector Multicast Routing
DVMRP consists of two major components:
- a conventional distance-vector routing protocol (like RIP)
- a protocol for determining how to forward multicast packets, based on the unicast routing table
A DVMRP router forwards a packet if:
- the packet arrived from the link used to reach the source of the packet (the reverse path forwarding check, RPF)
- and downstream links have not pruned the tree
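The two DVMRP forwarding conditions can be combined in one small sketch. Interface names and the prune table are hypothetical; the point is that a packet must pass the RPF check and is then flooded only to downstream links that have not pruned (S,G).

```python
# Sketch of a DVMRP-style forwarding decision: the RPF check plus
# per-(S,G) prune state for downstream links.
def dvmrp_forward(rpf_iface, arrival_iface, downstream, pruned):
    if arrival_iface != rpf_iface:
        return []                        # fails reverse-path check: drop
    return [l for l in downstream if l not in pruned]

# if2 previously sent Prune(S1,G1) upstream to this router.
prune_state = {("S1", "G1"): {"if2"}}
out = dvmrp_forward("if0", "if0", ["if1", "if2", "if3"],
                    prune_state[("S1", "G1")])
print(out)           # packet goes to if1 and if3 only
```

A later Graft for (S1,G1) would simply remove `if2` from the prune set, restoring forwarding on that branch.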
Example Topology
[Figure: example topology with source S and three group members G]
Flood with Truncated Broadcast
[Figure: S's first packet is flooded through the topology toward the group members G]
Prune
[Figure: routers with no downstream members send Prune(S,G) messages upstream toward S]
Graft
[Figure: a new member reports group membership (Report(G)) via IGMP; its router sends Graft(S,G) messages upstream to re-attach to the tree]
Steady State
[Figure: the resulting pruned distribution tree from S to the group members G]
DVMRP limitations Like distance-vector protocols, affected by count-to-infinity and transient looping Shares the scaling limitations of RIP. New scaling limitations: (S,G) state in routers: even in pruned parts! Broadcast-and-prune has an initial broadcast. No hierarchy: flat routing domain
Multicast Backbone (MBone)
- An overlay network of IP multicast-capable routers using DVMRP
- Tools: sdr (session directory), vic, vat, wb
[Figure: MBone routers connected by tunnels over physical links, with ordinary hosts and routers in between]
MBone Tunnels
A method for sending multicast packets through multicast-ignorant routers.
The IP multicast packet is encapsulated in a unicast IP packet (IP-in-IP) addressed to the far end of the tunnel:
- the tunnel acts like a virtual point-to-point link
- intermediate routers see only the outer header
- the tunnel endpoint recognizes IP-in-IP (protocol type = 4) and de-capsulates the datagram for processing
- each end of the tunnel is manually configured with the unicast address of the other end
[Figure: outer IP header (dest = unicast) wrapping the inner IP header (dest = multicast) plus the transport header and data]
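The encapsulation step can be sketched by hand-packing an outer IPv4 header. This is a simplified illustration (the checksum is left at zero and the addresses are invented), not a working tunnel endpoint; the key detail is the protocol field set to 4 for IP-in-IP.

```python
import struct

# Illustrative IP-in-IP encapsulation for an MBone-style tunnel: the
# multicast datagram becomes the payload of a unicast IPv4 packet whose
# protocol field is 4 (IP-in-IP).
def encapsulate(inner_packet, src_ip, dst_ip):
    ver_ihl, tos = 0x45, 0                        # IPv4, 20-byte header
    total_len = 20 + len(inner_packet)
    ident, flags_frag, ttl, proto = 0, 0, 64, 4   # protocol 4 = IP-in-IP
    header = struct.pack("!BBHHHBBH4s4s",
                         ver_ihl, tos, total_len, ident, flags_frag,
                         ttl, proto, 0,           # checksum 0 for brevity
                         bytes(map(int, src_ip.split("."))),
                         bytes(map(int, dst_ip.split("."))))
    return header + inner_packet

inner = b"\x45\x00..."        # placeholder for the multicast datagram
pkt = encapsulate(inner, "10.0.0.1", "10.0.0.2")
print(len(pkt), pkt[9])       # 20-byte outer header; byte 9 is protocol 4
```

Intermediate routers route on the outer unicast destination only; the far tunnel endpoint strips the outer 20 bytes and processes the inner multicast datagram.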
Protocol Independent Multicast (PIM) Support for both shared and per-source trees Dense mode (per-source tree) Similar to DVMRP Sparse mode (shared tree) Core = rendezvous point (RP) Independent of unicast routing protocol Just uses unicast forwarding table
PIM Protocol Overview Basic protocol steps Routers with local members Join toward Rendezvous Point (RP) to join shared tree Routers with local sources encapsulate data in Register messages to RP Routers with local members may initiate data-driven switch to source-specific shortest path trees PIM v.2 Specification (RFC2362)
PIM Example: Build Shared Tree
[Figure: Receivers 1-3 send Join messages toward the RP; routers on the paths install (*,G) state, forming the shared tree]
Data Encapsulated in Register
[Figure: Source 1's router unicasts the data, encapsulated in a Register message, to the RP; the RP de-capsulates and forwards it down the shared tree]
RP Sends Join to High-Rate Source
[Figure: the RP sends a Join message toward S1, creating (S1,G) state on the path from the RP back to the source]
Build Source-Specific Distribution Tree
[Figure: Join messages toward S1 install (S1,G) state alongside the existing (*,G) state, building a source-specific tree for the high-data-rate source]
Forward on "Longest-Match" Entry
[Figure: routers hold both (S1,G) and (*,G) entries; S1's traffic follows the source-specific distribution tree, while other sources still use the shared tree]
The source-specific entry is a "longer match" for source S1 than the shared-tree entry, which can be used by any source.
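The longest-match rule amounts to preferring an (S,G) entry over a (*,G) entry for the same group. A minimal sketch with a hypothetical multicast routing table (interface names are invented):

```python
# Sketch of the "longest match" lookup: a source-specific (S,G) entry is
# preferred over the shared-tree (*,G) entry for the same group.
def lookup(mrib, source, group):
    return mrib.get((source, group)) or mrib.get(("*", group))

mrib = {
    ("*",  "G"): {"iif": "toward_RP", "oifs": ["if1", "if2"]},
    ("S1", "G"): {"iif": "toward_S1", "oifs": ["if1", "if2"]},
}
print(lookup(mrib, "S1", "G")["iif"])   # S1's packets use the source tree
print(lookup(mrib, "S2", "G")["iif"])   # any other source: shared tree
```

The incoming-interface (`iif`) difference is what matters: the same group's packets are RPF-checked against the source for (S1,G) but against the RP for (*,G).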
Prune S1 off Shared Tree
[Figure: Prune(S1) messages remove S1's traffic from the shared tree while the source-specific distribution tree keeps delivering it]
S1 is pruned off the shared tree wherever the incoming interfaces of the (S1,G) and (*,G) entries differ.
Reliable Multicast Transport Problems: Retransmission can make reliable multicast as inefficient as replicated unicast Ack-implosion if all destinations ack at once Source does not know # of destinations “Crying baby”: a bad link affects entire group Heterogeneity: receivers, links, group sizes Not all multicast applications need strong reliability of the type provided by TCP. Some can tolerate reordering, delay, etc
Reliability Models
Reliability requires redundancy to recover from loss or other failure modes. Two types of redundancy:
Spatial redundancy: independent backup copies
- forward error correction (FEC) codes
- problem: adds overhead, and since the FEC data is itself carried in packets, it cannot recover from erasure of all packets
Temporal redundancy: retransmit if packets are lost or in error
- lazy: trades off response time for reliability
- design of status reports and retransmission optimization (see next slide) important
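A toy FEC example shows the spatial-redundancy trade-off concretely: one XOR parity packet lets a receiver rebuild any single lost packet of a block, but cannot help if the whole block is erased. Packet contents are invented for illustration.

```python
# Toy FEC: XOR parity over a block of equal-length packets. XOR-ing all
# received packets with the parity reconstructs one missing packet.
def xor_parity(packets):
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = bytes(a ^ b for a, b in zip(parity, p))
    return parity

data = [b"pkt1", b"pkt2", b"pkt3"]
parity = xor_parity(data)             # one extra packet of overhead

# Receiver got pkt1 and pkt3 plus the parity; recover the missing pkt2.
recovered = xor_parity([data[0], data[2], parity])
print(recovered)
```

The overhead here is one extra packet per block; real FEC schemes (e.g., Reed-Solomon) trade more overhead for tolerance of multiple losses, but no scheme survives losing every packet of the block.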