Introduction to IP Multicast David Meyer Cisco Systems
Agenda IP Multicast Technology and Concepts IP Multicast Host-to-Router Protocols IP Multicast Routing Protocols Protocol Independent Multicast—PIM General Multicast Concepts Inter-domain Multicast Routing Issues and Solutions
Introduction to Multicast Why multicast? When sending the same data to multiple receivers Better bandwidth utilization Less host/router processing Receivers’ addresses can be unknown Applications Video/audio conferencing Resource discovery/service advertisement Stock quote distribution e.g., vat, vic, IP/TV, PointCast
Unicast/Multicast Unicast Host Router Multicast Host Router
IP Multicast Service Model RFC 1112 Each multicast group identified by a class D IP address Members of the group could be present anywhere in the Internet Members join and leave the group and indicate this to the routers Senders and receivers are distinct: i.e., a sender need not be a member Routers listen to all multicast addresses and use multicast routing protocols to manage groups
IP Multicast Service Model (Cont.) IP group addresses Class D address—high-order 4 bits are 1110 (224.0.0.0/4) Range from 224.0.0.0 through 239.255.255.255 Well-known addresses designated by IANA Reserved use: 224.0.0.0 through 224.0.0.255 224.0.0.1—all multicast systems on subnet 224.0.0.2—all routers on subnet Transient addresses, assigned and reclaimed dynamically Global scope: 224.0.1.0 through 238.255.255.255 Limited scope: 239.0.0.0 through 239.255.255.255 Site-local scope: 239.253.0.0/16 Organization-local scope: 239.192.0.0/14
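To make the ranges above concrete, here is a minimal Python sketch that classifies a group address by scope; the range boundaries come from the slide above, while the helper name and labels are our own:

```python
import ipaddress

# Ranges from the slide above, most specific first; labels are our own.
SCOPES = [
    (ipaddress.ip_network("224.0.0.0/24"),   "link-local (reserved by IANA)"),
    (ipaddress.ip_network("239.253.0.0/16"), "site-local scope"),
    (ipaddress.ip_network("239.192.0.0/14"), "organization-local scope"),
    (ipaddress.ip_network("239.0.0.0/8"),    "administratively (limited) scoped"),
    (ipaddress.ip_network("224.0.0.0/4"),    "global scope"),
]

def classify(addr: str) -> str:
    ip = ipaddress.ip_address(addr)
    if not ip.is_multicast:              # class D: high-order bits are 1110
        return "not a multicast address"
    for net, name in SCOPES:
        if ip in net:
            return name
    return "unknown"

print(classify("224.0.0.2"))    # link-local (reserved by IANA)
print(classify("239.255.1.1"))  # administratively (limited) scoped
print(classify("232.1.1.1"))    # global scope
```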
IP Multicast Service Model (Cont.) Mapping IP group addresses to data link multicast addresses RFC 1112 defines OUI 0x01005e Low-order 23 bits of the IP address map into the low-order 23 bits of the IEEE address (e.g., 224.2.2.2 maps to 01005e.020202) Ethernet and FDDI use this mapping Token Ring uses the functional address c000.4000.0000
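A small Python sketch of the RFC 1112 mapping just described; since only 23 of the 28 significant group-address bits survive, 32 different groups share each MAC address:

```python
import ipaddress

def group_mac(group: str) -> str:
    """Map an IP multicast group to its IEEE MAC address per RFC 1112:
    OUI 01-00-5e, then the low-order 23 bits of the group address."""
    low23 = int(ipaddress.ip_address(group)) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    return "%06x.%06x" % (mac >> 24, mac & 0xFFFFFF)

print(group_mac("224.2.2.2"))    # 01005e.020202
print(group_mac("225.130.2.2"))  # 01005e.020202 -- 32 groups share one MAC,
                                 # which is why hosts need IP-level filtering
```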
IP Multicast Service Model (Cont.) Host-to-Router Protocols (IGMP) Hosts Routers Multicast Routing Protocols (PIM)
Internet Group Management Protocol—IGMP How hosts tell routers about group membership Routers solicit group membership from directly connected hosts RFC 1112 specifies the first version of IGMP IGMP v2 and IGMP v3 enhancements Supported on UNIX systems, PCs, and Macs
Internet Group Management Protocol—IGMP IGMP v1 Queries Querier sends IGMP query messages to 224.0.0.1 with TTL = 1 One router on the LAN is designated/elected to send queries Query interval 60–120 seconds Reports An IGMP report sent by one host suppresses sending by others Restricted to one report per group per LAN Unsolicited reports sent by a host when it first joins the group
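A rough sketch of the host side of this report-suppression scheme; the event loop and packet I/O are left as stand-ins, and the class and method names are our own:

```python
import random

MAX_RESPONSE_DELAY = 10.0   # seconds; illustrative, not the protocol constant

class MemberState:
    """IGMPv1 report suppression for one group on one host (sketch)."""
    def __init__(self, group):
        self.group = group
        self.report_at = None           # pending report timer, if any

    def on_query(self, now):
        # Answer a query after a random delay so that members on the
        # LAN don't all report at once.
        if self.report_at is None:
            self.report_at = now + random.uniform(0, MAX_RESPONSE_DELAY)

    def on_report_heard(self, group):
        # Someone else reported first: suppress our report, since one
        # report per group per LAN is enough.
        if group == self.group:
            self.report_at = None

    def tick(self, now, send_report):
        if self.report_at is not None and now >= self.report_at:
            send_report(self.group)     # addressed to the group itself
            self.report_at = None
```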
Periodically Sends IGMP Query to 224.0.0.1 IGMP—Joining a Group 224.2.0.1 224.5.5.5 224.2.0.1 224.2.0.1 Host 1 Host 2 Host 3 Sends Report to 224.2.0.1 Sends Report to 224.5.5.5 Periodically Sends IGMP Query to 224.0.0.1
Internet Group Management Protocol—IGMP IGMP v2: Host sends a leave message if it leaves the group and is the last member (reduces leave latency in comparison to v1) Router sends group-specific queries to make sure no members are present before it stops forwarding data for the group on that subnet Standard querier election IGMP v3: In design phase Lets hosts listen to only a specified subset of the sources sending to the group
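A sketch of how the querier side of IGMPv2 leave processing could look; packet I/O is a stand-in, the constants are illustrative rather than the RFC defaults, and a real implementation would wait a query-response interval between steps rather than run synchronously:

```python
class IgmpV2Querier:
    """IGMPv2 leave handling on the querier (sketch)."""
    LAST_MEMBER_QUERY_COUNT = 2

    def __init__(self, send_group_query, stop_forwarding):
        self.send_group_query = send_group_query  # sends a group-specific query
        self.stop_forwarding = stop_forwarding    # prunes the group off this subnet
        self.members = set()                      # groups with known members

    def on_report(self, group):
        self.members.add(group)

    def on_leave(self, group):
        # Don't trust the Leave alone: probe for remaining members
        # before stopping forwarding for the group on this subnet.
        self.members.discard(group)
        for _ in range(self.LAST_MEMBER_QUERY_COUNT):
            self.send_group_query(group)
        if group not in self.members:             # no report came back
            self.stop_forwarding(group)
```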
IGMP—Leaving a Group 224.2.0.1 224.2.0.1 224.2.0.1 Host 1 Host 2 Sends Report for 224.2.0.1 Sends Leave for 224.2.0.1 to 224.0.0.2 Sends Leave for 224.5.5.5 to 224.0.0.2 Sends Group Specific IGMP Query to 224.2.0.1 Sends Group Specific IGMP Query to 224.5.5.5
Multicast Routing Protocols (Reverse Path Forwarding) What is RPF? A router forwards a multicast datagram if received on the interface used to send unicast datagrams to the source Unicast B C Receiver Source A F D E Multicast
Multicast Routing Protocols (Reverse Path Forwarding) If the RPF check succeeds, the datagram is forwarded If the RPF check fails, the datagram is typically silently discarded When a datagram is forwarded, it is sent out each interface in the outgoing interface list Packet is never forwarded back out the RPF interface!
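The RPF rule above amounts to a few lines of logic. A hedged Python sketch, with the unicast routing table abstracted to a lookup function:

```python
def rpf_check(source, arrival_interface, unicast_lookup):
    """Accept a multicast packet only if it arrived on the interface this
    router would use to send unicast traffic back to the source.
    `unicast_lookup` is a stand-in for the unicast routing table."""
    return arrival_interface == unicast_lookup(source)

def forward(packet, arrival_interface, oif_list, unicast_lookup, send):
    if not rpf_check(packet.source, arrival_interface, unicast_lookup):
        return                                 # RPF failure: silently discard
    for oif in oif_list:
        if oif != arrival_interface:           # never back out the RPF interface
            send(packet, oif)
```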
Multicast Routing Protocols—Characteristics Shortest Path or Source Distribution Tree Source Notation: (S, G) S = Source G = Group B A D F C E Receiver 1 Receiver 2
Multicast Routing Protocols—Characteristics Shared Distribution Tree Source 1 Notation: (*, G) * = All Sources G = Group Source 2 A B D (Shared Root) F C E Receiver 1 Receiver 2
Multicast Routing Protocols—Characteristics Distribution trees Source tree Uses more memory O(S x G) but you get optimal paths from source to all receivers, minimizes delay Shared tree Uses less memory O(G) but you may get suboptimal paths from source to all receivers, may introduce extra delay Protocols PIM, DVMRP, MOSPF, CBT
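A quick back-of-envelope with made-up numbers to make the O(S x G) versus O(G) tradeoff concrete:

```python
# Made-up numbers: 1,000 groups, 10 active sources per group.
groups, sources_per_group = 1_000, 10
print("source-tree (S,G) entries:", groups * sources_per_group)  # O(S x G): 10000
print("shared-tree (*,G) entries:", groups)                      # O(G):      1000
```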
Multicast Routing Protocols—Characteristics Types of multicast protocols Dense-mode Broadcast and prune behavior Similar to radio broadcast Sparse-mode Explicit join behavior Similar to pay-per-view
Multicast Routing Protocols—Characteristics Dense-mode protocols Assumes dense group membership Branches that are pruned don’t get data Pruned branches can later be grafted to reduce join latency DVMRP—Distance Vector Multicast Routing Protocol Dense-mode PIM—Protocol Independent Multicast
Multicast Routing Protocols—Characteristics Sparse-mode protocols Assumes group membership is sparsely populated across a large region Uses either source or shared distribution trees Explicit join behavior—assumes no one wants the packet unless asked Joins propagate from receiver to source or Rendezvous Point (Sparse mode PIM) or Core (Core Based Tree)
DVMRP Broadcast and prune Source trees created on demand based on the RPF rule Uses its own routing table, e.g., poison reverse Tunnels to overcome incongruent topologies Many implementations: mrouted, Bay, Cisco, … Draft: draft-ietf-idmr-dvmrp-v3-06.{txt,ps}
Dense Mode PIM Broadcast and prune “ideal” for dense groups Source trees created on demand based on RPF rule If the source goes inactive, the tree is torn down Fewer implementations than DVMRP Draft: draft-ietf-idmr-pim-dm-spec-05.txt
Dense Mode PIM Branches that don’t want the data are pruned Grafts to join an existing source tree Uses asserts to determine the forwarder for a multi-access LAN Prunes on non-RPF P2P links Rate-limited prunes on RPF P2P links
Dense Mode PIM Example Source Link Data Control D F I B C A E G H Receiver 1 Receiver 2
Dense Mode PIM Example Source Initial Flood of Data and Creation of State D F I B C A E G H Receiver 1 Receiver 2
Dense Mode PIM Example Source Prune to Non-RPF Neighbor D F I B C A E Receiver 1 Receiver 2
Dense Mode PIM Example Source C and D Assert to Determine Forwarder for the LAN, C Wins D F I B C A E G H Asserts Receiver 1 Receiver 2
Dense Mode PIM Example Source I Gets Pruned E’s Prune is Ignored G’s Prune is Overridden D F I B C A E G H Prune Join Override Prune Receiver 1 Receiver 2
Dense Mode PIM Example Source New Receiver, I Sends Graft D F I B C A H Graft Receiver 1 Receiver 2 Receiver 3
Dense Mode PIM Example Source D F I B C A E G H Receiver 1 Receiver 2
Sparse Mode PIM Explicit join model Receivers join to the Rendezvous Point (RP) Senders register with the RP Data flows down the shared tree and goes only to places that need the data from the sources Last-hop routers can join the source tree, if the data rate warrants, by sending joins toward the source RPF check for the shared tree uses the RP RPF check for the source tree uses the source
Sparse Mode PIM Only one RP is chosen for a particular group RP statically configured or dynamically learned (Auto-RP, PIM v2 candidate RP advertisements) Data forwarded based on the source state (S, G) if it exists, otherwise use the shared state (*, G) Draft: draft-ietf-idmr-pim-sm-specv2-00.txt Draft: draft-ietf-idmr-pim-arch-04.txt
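A minimal sketch of that forwarding decision, assuming a hypothetical `mrib` table keyed by (source, group) for source-tree state and (None, group) for shared-tree state:

```python
def lookup(mrib, source, group):
    """Prefer (S,G) source-tree state; fall back to the shared (*,G)
    entry rooted at the RP. `mrib` is a hypothetical dict of entries."""
    return mrib.get((source, group)) or mrib.get((None, group))

def forward(mrib, packet, iif, send):
    entry = lookup(mrib, packet.source, packet.group)
    if entry is None or iif != entry.rpf_interface:
        return                        # no state, or RPF failure: drop
    for oif in entry.oif_list:
        send(packet, oif)             # replicate down the tree
```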
Sparse Mode PIM Example Link Data Control Source A B RP D C E Receiver 1 Receiver 2
Sparse Mode PIM Example Receiver 1 Joins Group G C Creates (*, G) State, Sends (*, G) Join to the RP Source A B RP D Join C E Receiver 1 Receiver 2
Sparse Mode PIM Example RP Creates (*, G) State Source A B RP D C E Receiver 1 Receiver 2
Sparse Mode PIM Example Source Sends Data A Sends Registers to the RP Source Register A B RP D C E Receiver 1 Receiver 2
Sparse Mode PIM Example RP De-encapsulates Registers Forwards Data Down the Shared Tree Sends Joins Towards the Source Source Join Join A B RP D C E Receiver 1 Receiver 2
Sparse Mode PIM Example RP Sends Register-Stop Once Data Arrives Natively Source Register-Stop A B RP D C E Receiver 1 Receiver 2
Sparse Mode PIM Example C Sends (S, G) Joins to Join the Shortest Path (SPT) Tree Source A B RP D (S, G) Join C E Receiver 1 Receiver 2
Sparse Mode PIM Example When C Receives Data Natively, It Sends Prunes Up the RP Tree for the Source. RP Deletes (S, G) OIF and Sends Prune Towards the Source Source (S, G) Prune A B RP D (S, G) RP Bit Prune C E Receiver 1 Receiver 2
Sparse Mode PIM Example New Receiver 2 Joins E Creates State and Sends (*, G) Join Source A B RP D (*, G) Join C E Receiver 1 Receiver 2
Sparse Mode PIM Example C Adds Link Towards E to the OIF List of Both (*, G) and (S, G) Data from Source Arrives at E Source A B RP D C E Receiver 1 Receiver 2
Sparse Mode PIM Example New Source Starts Sending D Sends Registers, RP Sends Joins RP Forwards Data to Receivers through Shared Tree Source Register Source 2 A B RP D C E Receiver 1 Receiver 2
Inter-domain Multicast Routing
Agenda Introduction First Some Basic Technology Basic Host Model Basic Router Model Data Distribution Concepts What Are the Deployment Obstacles What Are the Non-technical Issues What Are the Technical Scaling Issues
Agenda (Cont.) Potential Solutions (Cisco-Specific) Multi-level RP, Anycast Clusters, MSDP Using Directory Services Industry Solutions BGMP and MASC Possible Deployment Scenarios References
Introduction—Level Set This presentation focuses on large-scale multicast routing in the Internet The problems/solutions presented are related to inter-ISP deployment of IP multicast We believe the current set of deployed technology is sufficient for enterprise environments
Introduction—Why Would You Want to Deploy IP Multicast? You don’t want the same data traversing your links many times— bandwidth saver You want to join and leave groups dynamically without notifying all data sources—pay-per-view
Introduction—Why Would You Want to Deploy IP Multicast? You want to discover a resource but don’t know who is providing it, or, if you do, don’t want to configure it—expanding ring search Reduce startup latency for subscribers
Introduction—Why Would ISPs Want to Deploy IP Multicast? Internet Service Providers are seeing revenue potential for deploying IP multicast Initial applications Radio station transmissions Real-time stock quote service Future applications Distance learning Entertainment
Basic Host Model Strive to make the host model simple When sourcing data, just send the data Map the network layer address to a link layer address Routers will figure out where receivers are and are not When receiving data, need to perform two actions Tell routers what group you’re interested in (via IGMP) Tell your LAN controller to receive on the mapped link-layer address
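On most hosts this entire model hides behind one socket option. A minimal receiver sketch in Python (the group and port are examples); `IP_ADD_MEMBERSHIP` performs both host actions at once, sending the IGMP report and programming the LAN controller for the mapped MAC address:

```python
import socket
import struct

GROUP, PORT = "224.2.2.2", 5004      # example group and port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Join the group: the kernel sends the IGMP report and programs the NIC.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    data, sender = sock.recvfrom(1500)
    print(f"{len(data)} bytes from {sender}")
```

A sender needs none of this: it can simply sendto() the group address without joining, matching the service model above.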
Basic Host Model Hosts can be receivers and not send to the group Hosts can send but not be receivers of the group Or they can be both
Basic Host Model There are some protocol and architectural issues Multiple IP group addresses map into a single link-layer address You need IP-level filtering Hosts join groups, which means they receive traffic from all sources sending to the group Wouldn’t it be better if hosts could say which sources they were willing to receive from?
Basic Host Model There are some protocol and architectural issues (continued) You can apply access control to sources, but you can’t apply access control to receivers in a scalable way
Basic Router Model Since hosts can send any time to any group, routers must be prepared to receive on all link-layer group addresses And know when to forward or drop packets
Basic Router Model What does a router keep track of? Interfaces leading to receivers Sources, when utilizing source distribution trees Prune state, depending on the multicast routing protocol (e.g., dense mode)
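Sketched as a data structure (the field names are our own; real implementations differ):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MrouteEntry:
    """One multicast forwarding entry (sketch)."""
    group: str                                   # the class D group address
    source: Optional[str] = None                 # None for a shared (*,G) entry
    rpf_interface: str = ""                      # incoming interface, per the RPF check
    oif_list: set = field(default_factory=set)   # interfaces leading to receivers
    pruned: bool = False                         # dense-mode prune state, if applicable
```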
Data Distribution Concepts Routers maintain state to deliver data down a distribution tree Source trees Router keeps (S,G) state so packets can flow from the source to all receivers Trades off low delay from source against router state
Data Distribution Concepts Shared trees Router keeps (*,G) state so packets flow from the root of the tree to all receivers Trades off higher delay from source against less router state
Data Distribution Concepts How is the tree built? On demand, in response to data arrival Dense-mode protocols (PIM-DM and DVMRP) MOSPF Explicit control Sparse-mode protocols (PIM-SM and CBT)
Data Distribution Concepts Building distribution trees requires knowledge of where members are Flood data to find out where members are not (dense-mode protocols) Flood group membership information (MOSPF), and build the tree as data arrives Send explicit joins and keep join state (sparse-mode protocols)
Data Distribution Concepts Construction of source trees requires knowledge of source locations In dense-mode protocols you learn them when data arrives (at each depth of the tree) Same with MOSPF In sparse-mode protocols you learn them when data arrives on the shared tree (in leaf routers only) Ignored, since routing is based on the direction from the RP Paid attention to when moving to a source tree
Data Distribution Concepts To build a shared tree you need to know where the core (RP) is Can be learned dynamically in the routing protocol (Auto-RP, PIMv2) Can be configured in the routers Could use a directory service
Data Distribution Concepts Source trees make sense for Broadcast radio transmissions Expanding ring search Generic few-sources-to-many-receiver applications High-rate, low-delay application requirements Per source policy from a service provider’s point of view Per source access control
Data Distribution Concepts Shared trees make sense for Many low-rate sources Applications that don’t require low delay Consistent policy and access control across most participants in a group When most of the source trees overlap topologically with the shared tree
Deployment Obstacles— Non-Technical Issues How to bill for the service Is the service what runs on top of multicast? Or is it the transport itself? Do you bill based on sender or receiver, or both? How to control access Should sources be rate-controlled (unlike in unicast routing)? Should receivers be rate-controlled?
Deployment Obstacles— Non-Technical Issues Making your peers fan out instead of you (saves replication in your network) Closest exit vs. latest entrance—all a wash Multicast-related security holes Network-wide denial-of-service attacks Eavesdropping is simpler since receivers are unknown
Deployment Obstacles— Technical Issues Source tree state will become a problem as IP multicast gains popularity When policy and access control per source are the rule rather than the exception Group state will become a problem as IP multicast gains popularity e.g., 10,000 three-member groups across the Internet
Deployment Obstacles— Technical Issues Hopefully we can upper bound the state in routers based on their switching capacity
Deployment Obstacles— Technical Issues ISPs don’t want to depend on competitor’s RP Do we connect shared trees together? Do we have a single shared tree across domains? Do we use source trees only for inter-domain groups?
Deployment Obstacles— Technical Issues Unicast and multicast topologies may not be congruent across domains Due to physical/topological constraints Due to policy constraints Need inter-domain routing protocol that distinguishes unicast versus multicast policy
How to Control Multicast Routing Table State in the Network? Fundamental problem of learning group membership Flood and Prune DVMRP PIM-DM Broadcast Membership MOSPF DWRs Rendezvous Mechanism PIM-SM BGMP
Rendezvous Mechanism Why not use sparse-mode PIM? Where to put the root of the shared tree (the RP)? The ISP third-party RP problem If you did use sparse-mode PIM Group-to-RP mappings would have to be distributed throughout the Internet
Rendezvous Mechanism Let’s try using sparse-mode PIM for inter-domain multicast Four possibilities for distributing group-to-RP mappings (1) Multi-level RP (2) Anycast clusters (3) MSDP (4) Directory service
Group-to-RP mapping distribution: (1) Multi-level RP Idea: connect shared trees in a hierarchy Level-0 RPs are inside domains They propagate joins from downstream routers to a Level-1 RP that may be in another domain Level-0 shared trees are connected via a Level-1 RP If there are multiple Level-1 RPs, iterate up to Level-2 RPs
Group-to-RP mapping distribution: (1) Multi-Level RP Problems Requires PIM protocol changes If you don’t locate the Level-0 RP at the border, intermediate PIM routers think there may be two RPs for the group Still has the third-party problem: there is ultimately one node at the root of the hierarchy Data has to flow all the way to the highest-level RP
Group-to-RP mapping distribution: (2) Anycast clusters Idea: connect shared trees in clusters Shares burden among ISPs Each RP in each domain is a border router Build RP clusters at interconnect points (or in dense-mode clouds) Group allocation is per cluster, not per user or per domain
Group-to-RP mapping distribution: (2) Anycast clusters The closest border router in the cluster is used as the RP Routers within a domain will use that domain’s RP Provided you have an RP for that group range at an interconnect point If not, you use the closest RP at the interconnect point (could be an RP in another domain)
Group-to-RP mapping distribution: (3) MSDP Idea: connect domains together If you can’t connect shared trees together easily, then don’t Multicast Source Discovery Protocol Rather than connecting trees, make the sources known to all trees
Group-to-RP mapping distribution: (3) MSDP An RP in a domain has an MSDP peering session with an RP in another domain Runs over TCP Source Active (SA) messages indicate active sending sources in a domain A logical topology is built for the sole purpose of distributing SA messages
Group-to-RP mapping distribution: (3) MSDP How it works Source goes active in a PIM-SM domain Its packets get PIM-registered to the domain’s RP The RP sends an SA message to its MSDP peers Those peers forward the SA to their peers, away from the originating RP If a peer in another domain has receivers for the group to which the source is sending, it joins the source (flood-and-join model)
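A sketch of that flood-and-join logic at an RP, with TCP sessions and MSDP peer-RPF details elided (the names are our own):

```python
class MsdpRp:
    """Source-Active handling at an RP (sketch)."""
    def __init__(self, peers, have_receivers, join_source):
        self.peers = peers                    # MSDP peering sessions (run over TCP)
        self.have_receivers = have_receivers  # is there (*,G) state for this group?
        self.join_source = join_source        # send a PIM (S,G) join toward the source

    def on_local_register(self, source, group):
        # A source went active in our own domain: advertise it to peers.
        for peer in self.peers:
            peer.send_sa(source, group)

    def on_sa(self, source, group, from_peer):
        # Flood the SA onward, away from the peer it came from...
        for peer in self.peers:
            if peer is not from_peer:
                peer.send_sa(source, group)
        # ...and join the source if our domain has receivers (flood-and-join).
        if self.have_receivers(group):
            self.join_source(source, group)
```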
Group-to-RP mapping distribution: (3) MSDP No shared tree across domains So each domain can depend solely on its own RP (no third-party problem) No need to store SA state at each MSDP peer Could encapsulate data in SA messages for low-rate bursty sources Could cache SA state to speed up join latency
Group-to-RP mapping distribution: (4) Directory Services Idea: enable a single shared tree across domains, or use source trees only
Group-to-RP mapping distribution: (4) Directory Services (a) single shared tree across domains Put the RP in the client’s domain Optimal placement of the RP if the domain has a multicast source or receiver active Policy for the RP is consistent with policy for the domain’s unicast prefixes Use the directory to find the RP address for a given group
Group-to-RP mapping distribution: (4) Directory Services Example Receiver host sends an IGMP report for 224.1.1.1 First-hop router resolves 1.1.1.224.pim.mcast.net via DNS An A record is returned with the IP address of the RP First-hop router sends a PIM join toward the RP
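A sketch of that lookup in Python; the reversed-name construction follows the example above, and the final resolution step is commented out since no such zone is guaranteed to exist:

```python
import ipaddress
import socket

def rp_lookup_name(group: str) -> str:
    """Reverse the group address into the example's lookup name,
    e.g. 224.1.1.1 -> 1.1.1.224.pim.mcast.net."""
    octets = str(ipaddress.ip_address(group)).split(".")
    return ".".join(reversed(octets)) + ".pim.mcast.net"

name = rp_lookup_name("224.1.1.1")
print(name)                           # 1.1.1.224.pim.mcast.net
# rp = socket.gethostbyname(name)     # would return the RP's A record,
                                      # if such a zone were populated
```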
Group-to-RP mapping distribution: (4) Directory Services All routers have consistent RP addresses via DNS When dynamic DNS is widely deployed it will be easier to change A records In the meantime, use loopback addresses on routers and move them around in your domain
Group-to-RP mapping distribution: (4) Directory Services When domain group allocation exists, a domain can be authoritative for a DNS zone 1.224.pim.mcast.net 128/17.1.224.pim.mcast.net
Group-to-RP mapping distribution: (4) Directory Services (b) avoid using shared trees at all Build PIM-SM source trees across domains Put multiple A records in DNS to describe sources for the group:
1.0.2.224.sources.pim.mcast.net  IN CNAME dino-ss20
                                 IN CNAME jmeylor-sun
dino-ss20                        IN A     171.69.58.81
jmeylor-sun                      IN A     171.69.127.178
Standards Solutions Ultimate scalability of both routing and group allocation can be achieved using BGMP/MASC Use BGP4+ (MBGP) to deal with topology non-congruency
Border Gateway Multicast Protocol (BGMP) Use a PIM-like protocol between domains (“BGP for multicast”) BGMP builds a shared tree of domains for a group So we can use a rendezvous mechanism at the domain level The shared tree is bidirectional The root of the shared tree of domains is at the root domain
Border Gateway Multicast Protocol (BGMP) Runs in routers that border a multicast routing domain Runs over TCP, like BGP Joins and prunes travel across domains Can build unidirectional source trees The M-IGP (multicast interior gateway protocol) tells the border routers about group membership
Multicast Address Set Claim (MASC) How does one determine the root domain for a given group? Group prefixes are temporarily leased to domains Allocated by the ISP, which in turn received them from its upstream provider
Multicast Address Set Claim (MASC) Claims for group allocation resolve collisions Group allocations are advertised across domains Lots of machinery for aggregating group allocations
Multicast Address Set Claim (MASC) Tradeoff between aggregation and anticipated demand for group addresses Group prefix allocations are not assigned to domains—they are leased Applications must know that group addresses may go away Work in progress
Using BGP4+ (MBGP) for Non-Congruency Issues Multiprotocol extensions to BGP4—RFC 2283 MBGP allows you to build a unicast RIB and a multicast RIB independently with one protocol Can use the existing or a new (multicast) peering topology MBGP carries unicast prefixes of multicast-capable sources
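A toy illustration of the dual-RIB RPF lookup this enables; the prefixes and interface names are made up:

```python
import ipaddress

# With MBGP the router carries two RIBs; the multicast RPF check consults
# the multicast RIB first, which may differ from the unicast path when the
# topologies aren't congruent.
unicast_rib   = {ipaddress.ip_network("171.69.0.0/16"): "serial0"}
multicast_rib = {ipaddress.ip_network("171.69.0.0/16"): "tunnel0"}

def longest_match(rib, addr):
    matches = [n for n in rib if addr in n]
    return rib[max(matches, key=lambda n: n.prefixlen)] if matches else None

def rpf_interface(source):
    src = ipaddress.ip_address(source)
    return longest_match(multicast_rib, src) or longest_match(unicast_rib, src)

print(rpf_interface("171.69.58.81"))   # tunnel0 -- not the unicast path
```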
MBGP Deployment Scenarios 1) Single public interconnect for ISPs to multicast peer Each ISP administers its own RP at the interconnect That RP, as well as all border routers, runs MBGP The interconnect runs dense-mode PIM Each ISP runs PIM-SM internally
MBGP Deployment Scenarios 2) Multiple interconnect points between ISPs ISPs can multicast peer for any groups as long as their respective RPs are colocated on the same interconnect Otherwise, use MSDP so that sources known to RPs at a given interconnect can tell RPs at other interconnects where to join
MBGP Deployment Scenarios 3) Address ranges depend on DNS to rendezvous or build trees ISPs decide which domains will have RPs that they will administer ISPs decide which groups will use source trees and don’t need RPs ISPs administer the DNS databases