1
Measuring a (MapReduce) Data Center
Srikanth Kandula, Sudipta Sengupta, Albert Greenberg, Parveen Patel, Ronnie Chaiken
2
Typical Data Center Network
Servers
Top-of-rack (ToR) switch: 24- or 48-port, 1G to server, 10 Gbps up, ~$7K
Aggregation (Agg) switches: modular switch, chassis + up to 10 blades, >140 10G ports, $150K-$200K
IP routers
Less bandwidth up the hierarchy
Clunky routing
Proposed alternatives: e.g., VL2, BCube, FatTree, Portland, DCell
3
What does traffic in a datacenter look like?
Goal: a realistic model of data center traffic; compare proposals
How to measure a datacenter?
(Macro-) Who talks to whom? Congestion, its impact
(Micro-) Flow details: sizes, durations, inter-arrivals, flux
4
How to measure?
1. SNMP reports: per port in/out octets; sample every few minutes; miss server- or flow-level info
2. Packet traces: not native on most switches; hard to set up (port-spans)
3. Sampled NetFlow: tradeoff: CPU overhead on switch for detailed traces
Use the end-hosts: share load; auto managed already
[Diagram: router, Agg. switches, ToR, servers; MapReduce scripts + distributed FS]
Measured 1500 servers for several months
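To make the SNMP option concrete, here is a minimal sketch of how two per-port octet-counter samples turn into an average link utilization. The function name, the 64-bit counter width, and the sample readings are illustrative assumptions, not values from the talk.

```python
def link_utilization(octets_start, octets_end, interval_s, link_bps, counter_bits=64):
    """Average utilization (0..1) of a link between two octet-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Octet counters wrap at 2**counter_bits; modular subtraction
    # tolerates a single wrap between the two samples.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return (delta_octets * 8) / (interval_s * link_bps)


# Hypothetical readings taken 300 s apart on a 1 Gbps server port.
util = link_utilization(1_000_000_000, 16_000_000_000, 300, 1_000_000_000)
print(f"average utilization: {util:.2f}")
```

Because the result is an average over the whole polling interval, samples taken every few minutes cannot show which server or flow produced the bytes, which is the limitation the slide points out.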
5
Who Talks To Whom?
[Figure: traffic between server pairs; axes "Server From" and "Server To"; scale 0, .2 Kbps, 20 Kbps, 3 Mbps, .4 Gbps, 1 Gbps]
Two patterns dominate
Most of the communication happens within racks
Scatter, Gather
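The within-rack observation can be checked from a server-to-server traffic matrix once each server is mapped to its rack. The following is a small sketch under assumed inputs: the dictionary layout, server names, and byte counts are hypothetical and only illustrate the computation.

```python
def within_rack_fraction(traffic, rack_of):
    """Fraction of all bytes exchanged between servers in the same rack.

    traffic: dict mapping (src_server, dst_server) -> bytes sent
    rack_of: dict mapping server -> rack identifier
    """
    total = 0
    local = 0
    for (src, dst), nbytes in traffic.items():
        total += nbytes
        if rack_of[src] == rack_of[dst]:
            local += nbytes
    return local / total if total else 0.0


# Hypothetical example: s1 and s2 share rack r1, s3 sits in rack r2.
rack_of = {"s1": "r1", "s2": "r1", "s3": "r2"}
traffic = {("s1", "s2"): 700, ("s2", "s1"): 100, ("s1", "s3"): 200}
print(within_rack_fraction(traffic, rack_of))
```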
6
Flows
are small: 80% of bytes in flows < 200MB
are short-lived: 50% of bytes in flows < 25s
turn over quickly: median inter-arrival at ToR = 10^-2 s
which lead to…
Traffic Engineering schemes should react faster, few elephants
Localized traffic: additional bandwidth alleviates hotspots
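The flow statistics quoted here are byte-weighted fractions and a median of inter-arrival gaps. A minimal sketch of both computations follows; the function names and sample flow records are assumptions made for illustration, not the authors' tooling or data.

```python
from statistics import median


def byte_fraction_below(flow_sizes, threshold):
    """Fraction of total bytes carried by flows smaller than threshold."""
    total = sum(flow_sizes)
    if total == 0:
        return 0.0
    return sum(size for size in flow_sizes if size < threshold) / total


def median_inter_arrival(start_times):
    """Median gap between consecutive flow start times."""
    ordered = sorted(start_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two flows")
    return median(gaps)


# Hypothetical flow sizes in MB and flow start times in milliseconds.
sizes_mb = [10, 50, 140, 800]
starts_ms = [0, 10, 30, 40]
print(byte_fraction_below(sizes_mb, 200))
print(median_inter_arrival(starts_ms))
```

The same `byte_fraction_below` helper works for durations: pass per-flow byte counts filtered by duration instead of size to get a figure comparable to the "bytes in flows < 25s" statistic.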
7
Congestion, its Impact
Are links busy? Often!
[Figure: contiguous duration of >70% link utilization (seconds)]
Who are the culprits?
Are apps impacted?
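The metric plotted on this slide, the contiguous duration for which a link stays above 70% utilization, can be derived from a utilization time series by measuring runs of consecutive samples over the threshold. This is a sketch assuming evenly spaced samples; the function name, sampling period, and data are hypothetical.

```python
def congestion_episodes(utilization, threshold=0.7, sample_period_s=1.0):
    """Durations (seconds) of each contiguous run of samples above threshold."""
    episodes = []
    run = 0  # number of consecutive samples above the threshold so far
    for u in utilization:
        if u > threshold:
            run += 1
        elif run:
            episodes.append(run * sample_period_s)
            run = 0
    if run:  # the series ended while still congested
        episodes.append(run * sample_period_s)
    return episodes


# Hypothetical per-second utilization samples for one link.
samples = [0.2, 0.8, 0.9, 0.5, 0.75, 0.71, 0.95, 0.1]
print(congestion_episodes(samples))
```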
8
Congestion, its Impact
Are links busy? Often!
Who are the culprits? Apps (Extract, Reduce)
Are apps impacted? Marginally
11
Measurement Alternatives
Link Utilizations (e.g., from SNMP) → Tomography → Server 2 Server Traffic Matrix
+ make do with easier-to-measure data
– under-constrained problem
heuristics
a) gravity
b) max sparse
c) tomography + Job Information
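As an illustration of the gravity heuristic, here is a sketch of the textbook gravity model: traffic from endpoint i to endpoint j is estimated as proportional to the total bytes i sends times the total bytes j receives. This is the generic form of the technique, not necessarily the exact formulation used in the talk, and the per-endpoint totals below are made up.

```python
def gravity_estimate(bytes_out, bytes_in):
    """Gravity-model traffic matrix: T[i][j] = out[i] * in[j] / total.

    bytes_out[i]: total bytes sent by endpoint i
    bytes_in[j]:  total bytes received by endpoint j
    Assumes sum(bytes_out) == sum(bytes_in), so the row and column sums
    of the estimate reproduce the measured totals.
    """
    total = sum(bytes_out)
    if total == 0:
        return [[0.0 for _ in bytes_in] for _ in bytes_out]
    return [[out * inn / total for inn in bytes_in] for out in bytes_out]


# Hypothetical totals for three endpoints (arbitrary byte units).
estimate = gravity_estimate([60, 30, 10], [50, 25, 25])
for row in estimate:
    print(row)
```

The estimate spreads each sender's bytes across all receivers in proportion to their volume, so it cannot reproduce a sparse or rack-local pattern; the under-constrained problem noted on the slide is why a heuristic of this kind is needed at all.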
12
A first look at traffic in a (map-reduce) data center
Some insights:
traffic stays mostly within high bandwidth regions
flows are small, short-lived and turn over quickly
net highly-utilized often with moderate impact on apps
measuring @ end-hosts is feasible, necessary (?)
→ a model for data center traffic