1 Sonia Fahmy and Minseok Kwon Department of Computer Sciences Purdue University For slides, technical reports, and implementations, please see: Characterizing Overlay Multicast Networks
2 Why Overlays? Overlay networks help overcome deployment barriers to network-level solutions The advantages of overlays include flexibility, adaptivity, and ease of deployment Applications Application-level multicast (e.g., End System Multicast/Narada) Inter-domain routing pathology solutions (e.g., Resilient Overlay Networks) Content distribution Peer-to-peer networks
3 Overlay Multicast [Figure: overlay multicast tree. Legend: Overlay link; Source; Routers and underlying links; Receivers]
4 Why Characterize Overlays? Overlay multicast consumes additional network bandwidth and increases latency over IP multicast → quantify the overlay performance penalty. Little work has been done on characterizing overlay multicast tree structure, especially for large trees. Such characterization gives insight into overlay properties and their causes, and a deeper understanding of different overlay multicast approaches → better overlay design. [Figure: Real data from ESM experiments; Simulations; Analytical models; Characterizing Overlay Networks]
5 Our Hypothesis Observations: Many high-degree, high-bandwidth routers are heavily utilized in the upper levels of ESM/TAG trees, where overlay links tend to be longer. Many hosts are connected to lower-degree, low-bandwidth routers, clustered close together at the lower levels of the trees. This lowers multicast cost. Causes: Topology (power-law/small-world); Overlay host distribution; Overlay protocol (full/partial info/overhead, delay/bandwidth/diameter/degree, source-based/shared tree)
6 Overlay Tree Metrics Overlay cost = number of underlying hops traversed by every overlay link. Link stress = total number of identical copies of a packet over the same underlying link. Overlay cost = ∑ stress(i) for all router-to-router links i. Number of hops and delays between parent and child hosts in an overlay tree. Degree of hosts = host contribution to the link stress of the host-to-first-router link. Degree of routers and hop-by-hop delays of underlying links traversed by overlay links. Mean bottleneck bandwidth between the source and receivers. Relative Delay Penalty (RDP), mean/longest latency.
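The link stress and overlay cost definitions above can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the topology, the host and router names (S, A, B, r1, r2, r3), and the helper functions `link_stress` and `overlay_cost` are all hypothetical. It assumes each overlay link is given together with the underlying node sequence it follows, and it counts only router-to-router links toward the cost.

```python
from collections import Counter

def link_stress(overlay_links, routes):
    """Return {underlying link: number of overlay links whose route crosses it}."""
    stress = Counter()
    for overlay_link in overlay_links:
        path = routes[overlay_link]            # node sequence from parent host to child host
        for u, v in zip(path, path[1:]):
            stress[frozenset((u, v))] += 1     # treat underlying links as undirected
    return stress

def overlay_cost(stress, hosts):
    """Sum stress over router-to-router links (links that touch no host)."""
    return sum(count for link, count in stress.items() if not (link & hosts))

# Hypothetical topology: hosts S, A, B attached to routers r1, r2, r3.
hosts = {"S", "A", "B"}
overlay_links = [("S", "A"), ("A", "B")]       # overlay tree S -> A -> B
routes = {
    ("S", "A"): ["S", "r1", "r2", "A"],
    ("A", "B"): ["A", "r2", "r3", "B"],
}

stress = link_stress(overlay_links, routes)
cost = overlay_cost(stress, hosts)
```

In this toy case the access link between A and r2 carries two copies of each packet (one arriving from S, one forwarded to B), which matches the idea that a host's degree shows up as stress on its host-to-first-router link.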
7 Metrics: Examples Overlay cost = 12. Link stress on A = 2. RDP of B = ( )/20 = 2. [Figure: example overlay tree with a source and receivers A, B, C; legend: Overlay link, Source, Receivers; delay labels 15 ms, 10 ms, 20 ms]
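As a rough sketch of how RDP is computed, the snippet below takes RDP to be the delay along the overlay path from the source to a receiver divided by the direct unicast delay between the same two hosts. The function name `relative_delay_penalty`, the receivers X and Y, and all delay values are hypothetical; they are not the numbers from the example figure.

```python
def relative_delay_penalty(overlay_hop_delays_ms, unicast_delay_ms):
    """Ratio of the delay along the overlay path to the direct unicast delay."""
    return sum(overlay_hop_delays_ms) / unicast_delay_ms

# Hypothetical delays (not the values on the slide):
# overlay path source -> X -> Y takes 30 ms + 30 ms; direct unicast source -> Y takes 40 ms.
rdp_y = relative_delay_penalty([30, 30], 40)   # 60 / 40
rdp_x = relative_delay_penalty([30], 30)       # X is a direct child of the source
mean_rdp = (rdp_x + rdp_y) / 2
```

A receiver that is a direct child of the source has an RDP of 1 under this definition; receivers deeper in the tree accumulate the delay of every overlay link above them.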
8 Overlay Tree Structure Questions: What do overlay multicast trees look like? Why? How much additional cost do they incur over IP multicast? Methodology: Use overlay trees (65 hosts) in ESM experiments (from CMU) in November. Use public traceroute servers and synthesize approximate routes (most university hosts are connected to the Internet2 backbone network). PlanetLab experiments and tree/traceroute data.
9 Results: End System Multicast Number of hops between two hosts versus level of host in overlay trees Distributions of per-hop delay for different overlay tree levels (a) Tree level 1 (b) Tree levels 4-6
10 Overlay Tree Structure: Simulations Topologies: a topology with 4 thousand routers connected in ways consistent with router-level power-law and small-world properties; a GT-ITM topology with 4 thousand routers. Delays and bandwidths follow realistic distributions. Overlay multicast algorithms: ESM (End System Multicast) [SIGCOMM 2001]: each host has an upper degree bound (we use 6) on the number of its neighbors. TAG (Topology-Aware Grouping) [extended NOSSDAV 2002]: uses ulimit=6 and bwthresh=100 kbps for partial path matching. MDDBST (Minimum Diameter Degree-Bounded Spanning Tree) [NOSSDAV 2001, INFOCOM 2003]: minimizes the number of hops in the longest path, and bounds the degree of hosts in overlay trees (degree bound = edge bw/min bw).
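The MDDBST degree bound (edge bw/min bw) can be illustrated with a small helper. This is only a sketch under assumptions: it reads "edge bw" as the bandwidth of a host's access link and "min bw" as the minimum bandwidth one child needs, it rounds the ratio down to a whole number of children, and the names `degree_bound` and `can_accept_child` are hypothetical.

```python
def degree_bound(edge_bw_kbps, min_bw_kbps):
    """Largest whole number of overlay neighbors the access link can serve (assumed floor)."""
    return int(edge_bw_kbps // min_bw_kbps)

def can_accept_child(current_degree, edge_bw_kbps, min_bw_kbps):
    """True if adding one more neighbor keeps the host within its degree bound."""
    return current_degree < degree_bound(edge_bw_kbps, min_bw_kbps)

# Hypothetical hosts: a well-provisioned host and a constrained one.
fast_host_bound = degree_bound(1000, 100)      # 1000 kbps access link, 100 kbps per child
slow_host_bound = degree_bound(250, 100)       # 250 kbps access link, 100 kbps per child
slow_host_has_room = can_accept_child(2, 250, 100)
```

A fixed bound such as the value 6 used for ESM corresponds to skipping `degree_bound` and comparing `current_degree` against that constant directly.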
11 Results: Number of Hops Uniform host distribution Non-uniform host distribution MDDBST less clear than ESM because it minimizes max. cost
12 Results: Isolation of Topology Effects Router degrees; Clustering (small world)
13 Results: Latency and Bandwidth Relative delay penalty (RDP) ESM achieves a good balance, but scalability is a concern Mean bottleneck bandwidth
14 Overlay Multicast Tree Cost Network Model: L_O(h,k,n) denotes the overlay cost for an overlay O when n is the number of hosts. We only count hops in router subsequences. We use n instead of m. Why an underlying tree model? Simple analysis; consistency with real topologies [Radoslavov00]; transformation from a graph to a k-ary tree with minimum cost tree. Why least cost tree? Modeling and analysis are simplified; many overlay multicast algorithms optimize a delay-related metric, which is typically also optimized by underlying intra-domain routing protocols; a lower bound on the overlay tree cost can be computed. [Figure: tree labeled with h and k; legend: Source, Host, Receiver]
15 Network Models with Unary Nodes Self-similar Tree Model (k=2, θ=1, h=3) Unary node with only one child Number of unary nodes created between adjacent nodes at levels i-1 and i Branching node To incorporate the number-of-hops distribution, use a self-similar tree model [SODA2002]
16 Receivers at Leaf Nodes [Figure: panels (a) and (b) showing a tree rooted at the source; labels: k, α, h, Level l; legend: Overlay link, Receiver]
17 Receivers at Leaf Nodes The overlay cost in (a): The overlay cost in (b): where if otherwise The sum of (a) and (b) n^(1-θ) is observed where
18 Receivers at Leaf Nodes where θ=0.15
19 Receivers at Leaf or Non-leaf Nodes [Figure: panels (a) and (b) showing a tree of height h; labels: α, β, kp, k(1-p), L_υ(h-1,k,n), L_υ(h-2,k,n), L_υ(h-3,k,n), Level l, (A), (B)]
20 Receivers at Leaf or Non-leaf Nodes The overlay cost in (a): The overlay cost in (b): where The sum of (a) and (b)
21 Receivers at Leaf or Non-leaf Nodes where θ=0.15
22 Cost Model Validation The analytical results are validated using traceroute-based simulation topologies and our earlier topologies. Normalized overlay cost via simulations: ESM and MDDBST have n^0.8 to n^0.9; TAG has a slightly higher cost due to partial path matching. Cost with GT-ITM/uniform hosts is slightly higher than with power-law/small-world. The normalized overlay tree cost for the real ESM tree is n^0.945.
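An exponent such as the ones quoted above can be estimated from (n, cost) measurements with a least-squares line fit in log-log space. The sketch below is one common way to do this, not necessarily the procedure used for these results; the function name `fit_power_law_exponent` is hypothetical and the data points are synthetic, generated from n^0.9 purely to check the fit.

```python
import math

def fit_power_law_exponent(ns, costs):
    """Slope of the least-squares line through (log n, log cost)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(c) for c in costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic data, not measurements: costs follow n^0.9 exactly.
group_sizes = [10, 100, 1000, 10000]
normalized_costs = [n ** 0.9 for n in group_sizes]
exponent = fit_power_law_exponent(group_sizes, normalized_costs)
```

With real simulation output the points will scatter around the line, so the fitted slope is only an estimate and depends on the range of group sizes sampled.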
23 Related Work Chuang and Sirbu (1998) found that the ratio between the total number of multicast links and the average unicast path length exhibits a power law (m^0.8). Chalmers and Almeroth (2001) found the ratio to be around m^0.7 and that multicast trees have a high frequency of unary nodes. Phillips et al. (1999), Adjih et al. (2002), and Mieghem et al. (2001) mathematically model the efficiency of IP multicast. Radoslavov (2000) characterized real and generated topologies with respect to neighborhood size growth, robustness, and increase in path lengths due to link failure. They analyzed the impact of topology on heuristic overlay multicast strategies. Jin and Bestavros (2002) have shown that both Internet AS-level and router-level graphs exhibit small-world behavior. They also outlined how small-world behavior affects the overlay multicast tree size. Overlay multicast algorithms include End System Multicast (2000, 2001), CAN-based multicast (2002), MDDBST (2001, 2003), TAG (2001), etc.
24 Conclusions We have investigated the efficiency of overlay multicast using theoretical models, experimental data, and simulations. We find that: The number of routers/delay between parent and child hosts tends to decrease as the level of the host in the ESM/TAG overlay tree increases → lower cost. Routing features in overlay multicast protocols and non-uniform host distribution, along with power-law and small-world topology characteristics, contribute to these phenomena. We can quantify potential bandwidth savings of overlay multicast compared to unicast (n^0.9 to n^0.8).
25 Ongoing Work We are conducting larger scale simulations and experimental data analysis using PlanetLab. We are examining other and more dynamic metrics with other overlay protocols, e.g., NICE, Hypercast We will precisely formulate the relationship between the overlay trees, overlay protocols and Internet topology characteristics We are investigating the possibility of inter-overlay cooperation to further reduce the overlay performance penalty