Published by John Cowan. Modified over 11 years ago.
1
Dr. Multicast for Data Center Communication Scalability. Ymir Vigfusson, Hussam Abu-Libdeh, Mahesh Balakrishnan, Ken Birman (Cornell University); Yoav Tock (IBM Research Haifa). HotNets, October 5, 2008
2
IP Multicast in Data Centers IPMC is not used in data centers
3
IP Multicast in Data Centers IPMC is not used in data centers Would speed up products that use multicast
4
IP Multicast in Data Centers Why is IP multicast rarely used?
5
IP Multicast in Data Centers Why is IP multicast rarely used? o Limited IPMC scalability on switches/routers and NICs
6
IP Multicast in Data Centers Why is IP multicast rarely used? o Limited IPMC scalability on switches/routers and NICs o Broadcast storms: Loss triggers a horde of NACKs, which triggers more loss, etc. o Disruptive even to non-IPMC applications.
7
IP Multicast in Data Centers IP multicast has a bad reputation
8
IP Multicast in Data Centers IP multicast has a bad reputation o Works great up to a point, after which it breaks catastrophically
9
IP Multicast in Data Centers Bottom line: o Administrators have no control over multicast use... o Without control, they opt for never.
11
Dr. Multicast
12
Dr. Multicast (MCMD) Policy: Permits data center operators to selectively enable and control IPMC Transparency: Standard IPMC interface, system calls are overloaded. Performance: Uses IPMC when possible, otherwise point-to-point unicast Robustness: Distributed, fault-tolerant service
13
Terminology Process: Application that joins logical IPMC groups Logical IPMC group: A virtualized abstraction Physical IPMC group: As usual UDP multi-send: New kernel-level system-call Collection: Set of logical IPMC groups with identical membership
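The "collection" definition above can be illustrated with a minimal sketch. It assumes membership is available as a mapping from each logical group to its set of member processes; the function name `build_collections` and the sample addresses are illustrative and do not come from the talk.

```python
from collections import defaultdict

def build_collections(membership):
    """Group logical IPMC groups whose member sets are identical.

    membership: dict mapping logical group name -> iterable of process ids.
    Returns a dict mapping frozenset(members) -> sorted list of group names.
    """
    collections = defaultdict(list)
    for group, members in membership.items():
        # frozenset ignores ordering, so identical memberships share one key
        collections[frozenset(members)].append(group)
    return {members: sorted(groups) for members, groups in collections.items()}

# Hypothetical sample data: two groups share the same members
membership = {
    "239.1.1.1": {"p1", "p2"},
    "239.1.1.2": {"p2", "p1"},
    "239.1.1.3": {"p3"},
}
result = build_collections(membership)
```

Here the first two logical groups fall into one collection, so a single physical resource could in principle serve both.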
14
Acceptable Use Policy Assume a higher-level network management tool compiles policy into primitives Explicitly allow a process to use IPMC groups o allow-join(process,logical IPMC) o allow-send(process,logical IPMC) UDP multi-send always permitted Additional restraints o max-groups(process,limit) o force-udp(process,logical IPMC)
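One way the four primitives above might be held in memory is sketched below. The class and method names are hypothetical, and the sketch assumes that a send not cleared for IPMC falls back to UDP multi-send, which the slide says is always permitted.

```python
class Policy:
    """Hypothetical in-memory store for the policy primitives on the slide."""

    def __init__(self):
        self.join_ok = set()     # (process, group) pairs from allow-join
        self.send_ok = set()     # (process, group) pairs from allow-send
        self.forced_udp = set()  # (process, group) pairs from force-udp
        self.max_groups = {}     # process -> limit from max-groups

    def allow_join(self, process, group):
        self.join_ok.add((process, group))

    def allow_send(self, process, group):
        self.send_ok.add((process, group))

    def force_udp(self, process, group):
        self.forced_udp.add((process, group))

    def set_max_groups(self, process, limit):
        self.max_groups[process] = limit

    def may_join(self, process, group, joined_count):
        """True if the process may join via IPMC, given how many it already joined."""
        limit = self.max_groups.get(process)
        if limit is not None and joined_count >= limit:
            return False
        return (process, group) in self.join_ok

    def transport_for_send(self, process, group):
        """Return 'ipmc' or 'udp-multisend' (assumed fallback, always permitted)."""
        if (process, group) in self.forced_udp:
            return "udp-multisend"
        if (process, group) in self.send_ok:
            return "ipmc"
        return "udp-multisend"

policy = Policy()
policy.allow_join("web", "g1")
policy.allow_send("web", "g1")
policy.allow_send("web", "g2")
policy.force_udp("web", "g2")   # force-udp overrides allow-send in this sketch
policy.set_max_groups("web", 1)
```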
15
Overview Library module Mapping module Gossip layer Optimization questions Results
16
MCMD Library Module Transparent. Overloads the IPMC functions o setsockopt(), send(), etc. Translation. Each logical IPMC group maps to a set of P-IPMC/unicast addresses. o Two extremes
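The translation step can be sketched as a table lookup. This is only an illustration: the function name `translate`, the two dictionaries, and the addresses are assumptions, and the real library intercepts socket calls rather than exposing a function like this.

```python
def translate(logical_group, mapping, members):
    """Return the list of (kind, address) targets for one logical send.

    mapping: dict logical group -> physical IPMC address, for groups that
             were granted one (hypothetical representation).
    members: dict logical group -> list of member unicast addresses.
    """
    physical = mapping.get(logical_group)
    if physical is not None:
        # One packet to the physical IPMC group reaches every member
        return [("ipmc", physical)]
    # Otherwise fall back to one unicast packet per member
    return [("unicast", addr) for addr in members.get(logical_group, [])]

mapping = {"L1": "239.0.0.7"}
members = {"L1": ["10.0.0.1", "10.0.0.2"], "L2": ["10.0.0.3", "10.0.0.4"]}
targets_l1 = translate("L1", mapping, members)
targets_l2 = translate("L2", mapping, members)
```

The application only ever names the logical group; which branch is taken depends on the map currently handed out by the agent.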
17
MCMD Mapping Role MCMD Agent runs on each machine o Contacted by the library modules o Provides a mapping One agent is elected leader: o Allocates IPMC resources according to the current policy
18
MCMD Mapping Role Allocating IPMC resources: An optimization problem [Figure: first panel labeled "Procs", "L-IPMC", with a box marked "This box intentionally left BLACK"; second panel labeled "Procs", "Collections", "L-IPMC"]
19
MCMD Gossip Layer Runs system-wide as part of the agent Automatic failure detection Group membership fully replicated via gossip o Node reports its own state o Future: Replicate more selectively o Leader runs optimization algorithm on data and reports the mapping
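A rough sketch of how fully replicated membership might be merged during a gossip exchange follows. It assumes each node stamps its own state with a version counter and that the higher version wins; the slides do not describe the actual reconciliation rule, so treat `merge` and the data layout as hypothetical.

```python
def merge(local, remote):
    """Merge a peer's view into ours; per node, the higher version wins.

    Both views map node id -> (version, set of logical groups joined).
    Returns a new dict and leaves the inputs untouched.
    """
    merged = dict(local)
    for node, (version, groups) in remote.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, groups)
    return merged

# Hypothetical views held by two agents before they gossip
view_a = {"n1": (2, {"g1"}), "n2": (1, {"g2"})}
view_b = {"n2": (3, {"g2", "g3"}), "n3": (1, set())}
merged = merge(view_a, view_b)
```

Because each node only ever writes its own entry, a per-node version comparison is enough to resolve conflicts in this sketch.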
20
MCMD Gossip Layer But gossip is slow... Implications: o Slow propagation of group membership o Slow propagation of new maps o We assume a low rate of membership churn Remedy: Broadcast module o Leader broadcasts urgent messages o Bounded bandwidth of urgent channel o Trade-off between latency and scalability
21
Overview Library module Mapping module Gossip layer Optimization questions Results
22
Optimization Questions First step: compress logical IPMC groups [Figure: first panel labeled "Procs", "L-IPMC", with a box marked "BLACK"; second panel labeled "Procs", "Collections", "L-IPMC"]
23
Optimization Questions How compressible are subscriptions? o Multi-objective optimization: Minimize number of collections Minimize bandwidth overhead on network o Thm: The general problem is NP-complete o Thm: In uniform random allocation, "little" compression opportunity. o Social preferences o Lots of duplicates due to replication (e.g. for load balancing)
24
Optimization Questions Which collections get an IPMC address? o Thm: Assigning P-IPMC addresses greedily to collections ordered by decreasing traffic*size minimizes bandwidth. Tiling heuristic: o Sort L-IPMC by traffic*size o Greedily collapse identical groups o Assign IPMC to collections in reverse order of traffic*size, UDP-multisend to the rest Building tilings incrementally
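The tiling heuristic can be sketched roughly as follows. Several details are assumptions rather than the authors' algorithm: the sketch collapses identical groups first and then ranks the resulting collections, it sums the traffic of collapsed groups, it reads "reverse order" as largest traffic*size first (matching the theorem), and the name `tile` and the sample data are illustrative.

```python
def tile(groups, num_ipmc):
    """Split collections into those given a P-IPMC address and the rest.

    groups: dict group name -> (set of member processes, traffic rate).
    num_ipmc: number of physical IPMC addresses available.
    Returns (ipmc, multisend); each is a list of
    (members, [group names], weight) in decreasing weight order,
    where weight = summed traffic * number of members.
    """
    collapsed = {}
    for name, (members, traffic) in groups.items():
        key = frozenset(members)  # identical memberships collapse together
        names, total = collapsed.get(key, ([], 0.0))
        collapsed[key] = (names + [name], total + traffic)

    ranked = sorted(
        ((m, sorted(n), t * len(m)) for m, (n, t) in collapsed.items()),
        key=lambda c: c[2],
        reverse=True,
    )
    # Heaviest collections get the scarce IPMC addresses; the rest use
    # UDP multi-send
    return ranked[:num_ipmc], ranked[num_ipmc:]

# Hypothetical workload: groups "a" and "b" have identical membership
groups = {
    "a": ({"p1", "p2", "p3"}, 10.0),
    "b": ({"p3", "p2", "p1"}, 5.0),
    "c": ({"p1", "p2"}, 20.0),
    "d": ({"p4"}, 1.0),
}
ipmc, multisend = tile(groups, num_ipmc=1)
```

With one address available, the collapsed collection containing "a" and "b" (weight 45.0) receives it, while "c" (40.0) and "d" (1.0) fall back to UDP multi-send.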
25
Experimental Results
26
Overhead (max. throughput) Insignificant overhead when mapping L-IPMC to P-IPMC.
27
Overhead (CPU utilization) Insignificant overhead when mapping L-IPMC to P-IPMC.
28
Network Overhead Gossip layer uses constant background bandwidth; urgent channel behaves well
29
Latency Latency of propagation of joins/leaves and new maps
30
Policy control A malfunctioning node bombards an existing IPMC group. MCMD policy prevents ill effects. [Figure annotations: "Traffic starts", "New policy"]
31
Conclusion IPMC has been a bad citizen...
32
Conclusion IPMC has been a bad citizen... Dr. Multicast has the cure! Opportunity for big performance enhancements and policy control.
33
Thank you!
35
Overhead Insignificant overhead when mapping L-IPMC to P-IPMC.
36
Policy control A malfunctioning node bombards an existing IPMC group. MCMD policy prevents ill effects.
38
Overhead Linux kernel module increases UDP-multisend throughput by 17% (compared to user-space UDP-multisend)
39
Latency of events Gossip: 99% of nodes aware of change within 9 epochs (now 1 sec)
40
Conclusions Policy: Allows data center operators to enable and control IPMC Transparency: Standard IPMC interface, system calls are overloaded. Performance: Uses IPMC when possible, otherwise point-to-point UDP Robustness: Distributed, fault-tolerant service
41
Results Library Module o Insignificant slowdown o Linux kernel module provides 17% speed-up for UDP multi-send
42
Optimization questions [Figure: first panel labeled "Users", "Topics", with a box marked "This box intentionally left BLACK"; second panel labeled "Users", "Groups", "Topics"] Multi-objective: o Minimize number of groups o Minimize bandwidth overhead on network Thm: This problem is NP-complete o Reduction to Minimum Normal Set Basis
43
MCMD Library Layer Overloads the IPMC functions o setsockopt(), send(), etc. Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy Notifies MCMD immediately about joins/leaves Learns about new mappings from MCMD Keeps statistics about group traffic rates
44
MCMD Library Layer Overloads the IPMC functions o setsockopt(), send(), etc. Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy Caches translation maps Maintains a connection to MCMD for updates
46
Overview Library module Mapping module Gossip layer Optimization questions Results