
1 Measuring a (MapReduce) Data Center. Srikanth Kandula, Sudipta Sengupta, Albert Greenberg, Parveen Patel, Ronnie Chaiken.

2 Typical Data Center Network. Servers attach to a top-of-rack (ToR) switch: 24- or 48-port, 1 Gbps to each server, 10 Gbps uplinks, roughly $7K. ToR switches feed modular aggregation switches (chassis plus up to 10 blades, >140 10G ports, $150K-$200K), which in turn connect to IP routers. There is less bandwidth up the hierarchy and routing is clunky; recent proposals such as VL2, BCube, FatTree, Portland, and DCell target these problems.
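To make "less bandwidth up the hierarchy" concrete, here is a minimal back-of-the-envelope sketch in Python of the oversubscription arithmetic this slide implies. The per-rack server count and uplink count below are illustrative assumptions, not measured values from the deck.

```python
# Hedged sketch: rack-level oversubscription for a ToR like the one on this
# slide. Port counts are illustrative assumptions, not measured values.

def oversubscription(servers_per_rack=40, server_nic_gbps=1,
                     uplinks_per_tor=2, uplink_gbps=10):
    """Ratio of traffic a rack could generate to its ToR uplink capacity."""
    demand = servers_per_rack * server_nic_gbps   # e.g. 40 x 1 Gbps = 40 Gbps
    uplink = uplinks_per_tor * uplink_gbps        # e.g. 2 x 10 Gbps = 20 Gbps
    return demand / uplink                        # 2.0 -> 2:1 oversubscribed

if __name__ == "__main__":
    print(f"ToR oversubscription: {oversubscription():.1f}:1")
```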

3 Goal: what does traffic in a datacenter look like? We want a realistic model of data center traffic against which proposals can be compared. To get there: how do you measure a datacenter? At the macro level, who talks to whom, and how much congestion is there and what is its impact? At the micro level, what are the flow details: sizes, durations, inter-arrivals, flux?

4 How to measure? Three instrumentation points are available:
1. SNMP reports per port (in/out octets): sampled every few minutes, but misses server- and flow-level information.
2. Packet traces: not native on most switches and hard to set up (port spans).
3. Sampled NetFlow: trades CPU overhead on the switch for detailed traces.
Instead, use the end hosts to share the load: they are already auto-managed, and per-flow logs can be written to the distributed file system and rolled up with MapReduce scripts. With this approach, we measured 1,500 servers for several months.
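Below is a minimal sketch of how the end-host approach could work: each server logs per-flow byte counts, and a small MapReduce-style roll-up aggregates them per source server. The record format and the pure-Python map/reduce driver are assumptions for illustration, not the authors' actual scripts.

```python
# Minimal sketch of the end-host approach: servers log per-flow byte counts to
# the distributed file system; a MapReduce-style script rolls them up.
# The record format (src, dst, bytes, start, end) is a hypothetical example.
from collections import defaultdict

def map_flow(record):
    """Emit (source server, bytes) for each logged flow record."""
    src, dst, nbytes, start, end = record
    yield src, nbytes

def reduce_bytes(key, values):
    """Sum bytes sent per source server."""
    return key, sum(values)

def run(records):
    groups = defaultdict(list)
    for rec in records:
        for k, v in map_flow(rec):
            groups[k].append(v)
    return dict(reduce_bytes(k, vs) for k, vs in groups.items())

# Example: two flows from server "s1", one from "s2" (times in seconds).
flows = [("s1", "s7", 2_000_000, 0.0, 1.2),
         ("s1", "s9", 500_000, 0.5, 0.9),
         ("s2", "s7", 10_000, 1.0, 1.1)]
print(run(flows))   # {'s1': 2500000, 's2': 10000}
```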

5 Who Talks To Whom? [Figure: server-to-server traffic matrix, "server from" vs. "server to", with per-pair rates ranging from 0 to 1 Gbps.] Two patterns dominate: most of the communication happens within racks, and the rest follows scatter/gather patterns.
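One way to derive this "who talks to whom" view from the end-host flow records is sketched below: build a server-to-server traffic matrix and compute the fraction of bytes that stay within a rack. The rack_of mapping and the flow records are made-up examples.

```python
# Sketch: server-to-server traffic matrix plus the within-rack byte fraction.
# The rack assignment and flow records below are illustrative assumptions.
from collections import defaultdict

rack_of = {"s1": "r1", "s2": "r1", "s7": "r1", "s9": "r2"}

flows = [("s1", "s7", 2_000_000),   # intra-rack
         ("s1", "s9", 500_000),     # inter-rack
         ("s2", "s7", 10_000)]      # intra-rack

matrix = defaultdict(int)
intra = total = 0
for src, dst, nbytes in flows:
    matrix[(src, dst)] += nbytes
    total += nbytes
    if rack_of[src] == rack_of[dst]:
        intra += nbytes

print(f"{intra / total:.0%} of bytes stay within a rack")
```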

6 Flows are small (80% of bytes are in flows < 200 MB), short-lived (50% of bytes are in flows < 25 s), and turn over quickly (median inter-arrival at a ToR is 10^-2 s). This leads to two observations: traffic engineering schemes should react faster, since there are few elephants; and because traffic is localized, additional bandwidth alleviates hotspots.
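The flow-level statistics quoted above could be computed from per-flow records along the following lines; the numbers in the example data are invented, only the metrics mirror the slide.

```python
# Sketch: fraction of bytes in flows smaller than 200 MB, fraction in flows
# shorter than 25 s, and median flow inter-arrival at one ToR.
# The flow records (bytes, duration_s, arrival_time_s) are made-up data.
from statistics import median

flows = [(50_000_000, 3.0, 0.000),
         (300_000_000, 40.0, 0.012),
         (1_000_000, 0.5, 0.021),
         (80_000_000, 10.0, 0.030)]

total = sum(b for b, _, _ in flows)
small = sum(b for b, _, _ in flows if b < 200 * 1024**2) / total
short = sum(b for b, d, _ in flows if d < 25) / total

arrivals = sorted(t for _, _, t in flows)
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]

print(f"bytes in flows < 200 MB: {small:.0%}")
print(f"bytes in flows < 25 s:   {short:.0%}")
print(f"median inter-arrival:    {median(gaps):.3f} s")
```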

7 Congestion, its Impact. Are links busy? Who are the culprits? Are apps impacted? [Figure: distribution of the contiguous duration (seconds) for which link utilization stays above 70%.] Links are busy often!

8 Congestion, its Impact. Are links busy? Often! Who are the culprits? The applications themselves (Extract, Reduce phases). Are apps impacted? Only marginally.
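One way to produce the "contiguous duration above 70% utilization" statistic from SNMP-style utilization samples is sketched below; the five-minute sampling interval is an assumption.

```python
# Sketch: given a per-link utilization time series (e.g. from SNMP octet
# counters sampled every few minutes), find maximal runs above 70% and
# report their durations. The sampling interval is an assumed parameter.

def busy_episodes(utilization, interval_s=300, threshold=0.7):
    """Return durations (seconds) of maximal runs with utilization > threshold."""
    episodes, run = [], 0
    for u in utilization:
        if u > threshold:
            run += interval_s
        elif run:
            episodes.append(run)
            run = 0
    if run:
        episodes.append(run)
    return episodes

# Example: one 10-minute and one 5-minute episode of >70% utilization.
print(busy_episodes([0.2, 0.8, 0.9, 0.3, 0.75, 0.1]))   # [600, 300]
```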

9-11 Measurement Alternatives. Network tomography infers the server-to-server traffic matrix from link utilizations (e.g., from SNMP). The upside is that it makes do with easier-to-measure data; the downside is that the problem is under-constrained, so heuristics are needed: (a) gravity models, (b) maximum sparsity, and (c) tomography augmented with job information.
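Of the heuristics listed, the gravity model is the simplest to illustrate: it spreads each source's outgoing bytes across destinations in proportion to each destination's share of total received bytes. A minimal sketch follows, with made-up per-server totals.

```python
# Sketch of a gravity-model estimate of the server-to-server traffic matrix
# from per-server totals only. The totals below are illustrative, not measured.

def gravity(out_bytes, in_bytes):
    """Estimate T[s][d] proportional to out_bytes[s] * in_bytes[d]."""
    total = sum(in_bytes.values())
    return {(s, d): out_bytes[s] * in_bytes[d] / total
            for s in out_bytes for d in in_bytes}

out_bytes = {"s1": 300, "s2": 100}   # bytes sent per server
in_bytes = {"s7": 320, "s9": 80}     # bytes received per server

for (s, d), est in gravity(out_bytes, in_bytes).items():
    print(f"{s} -> {d}: {est:.0f} bytes (estimated)")
```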

12 A first look at traffic in a (MapReduce) data center. Some insights: traffic stays mostly within high-bandwidth regions; flows are small, short-lived, and turn over quickly; the network is highly utilized often, with moderate impact on apps; measuring at end hosts is feasible, and perhaps necessary. These observations point toward a model for data center traffic.

