Measuring a (MapReduce) Data Center. Srikanth Kandula, Sudipta Sengupta, Albert Greenberg, Parveen Patel, Ronnie Chaiken.

Presentation transcript:


Typical Data Center Network
[Diagram: IP routers at the top, aggregation switches (Agg) below them, top-of-rack switches (ToR) below those, servers at the bottom]
Top-of-rack switch: 24 or 48 ports, 1G to server, 10 Gbps up, ~$7K.
Aggregation switch: modular switch, chassis + up to 10 blades, >140 10G ports, $150K-$200K.
Less bandwidth up the hierarchy.
Clunky routing.
Proposals, e.g., VL2, BCube, FatTree, PortLand, DCell.

What does traffic in a datacenter look like?
Goal: a realistic model of data center traffic; compare proposals.
How to measure a datacenter?
(Macro-) Who talks to whom? Congestion, its impact.
(Micro-) Flow details: sizes, durations, inter-arrivals, flux.

How to measure?
1. SNMP reports: per port in/out octets, sampled every few minutes; miss server- or flow-level info.
2. Packet traces: not native on most switches; hard to set up (port-spans).
3. Sampled NetFlow: tradeoff of CPU overhead on switch for detailed traces.
Use the end-hosts to share load; auto managed already.
Measured 1500 servers for several months.
[Diagram: Router, Agg. switches, ToR, servers; MapReduce + Scripts + Distr. FS]
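The SNMP option above reports per-port octet counters every few minutes. A minimal sketch of turning two such samples into an average link utilization follows; the function name, the counter width, and the sample readings are all assumptions made for illustration, not part of the talk. It also shows why this data is coarse: one number per port per polling interval says nothing about individual servers or flows.

```python
def link_utilization(octets_start, octets_end, interval_s, link_bps, counter_bits=64):
    """Average utilization (0..1) of a link between two polls of an octet counter.

    counter_bits is the assumed width of the counter; the modulo handles
    a single wrap-around between the two polls.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    modulus = 2 ** counter_bits
    delta_octets = (octets_end - octets_start) % modulus
    return delta_octets * 8 / (interval_s * link_bps)


# Hypothetical readings: two polls 300 s apart on a 1 Gbps port.
util = link_utilization(1_000_000_000, 16_000_000_000, 300, 1e9)
print(f"{util:.2f}")  # prints 0.40
```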

Who Talks To Whom?
[Figure: server-to-server traffic matrix; axes "Server From" and "Server To"; scale 0, 0.2 Kbps, 20 Kbps, 3 Mbps, 0.4 Gbps, 1 Gbps]
Two patterns dominate:
1. Most of the communication happens within racks.
2. Scatter, Gather.
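The "within racks" observation can be checked against any server-to-server traffic matrix. The sketch below assumes a hypothetical layout: a dictionary of byte counts keyed by (source, destination) and a server-to-rack mapping. The names and the numbers are invented for illustration and are not measurements from the talk.

```python
def within_rack_fraction(traffic, rack_of):
    """Fraction of all bytes exchanged between servers in the same rack.

    traffic: dict mapping (src_server, dst_server) -> bytes
    rack_of: dict mapping server -> rack identifier
    """
    total = 0
    local = 0
    for (src, dst), nbytes in traffic.items():
        total += nbytes
        if rack_of[src] == rack_of[dst]:
            local += nbytes
    return local / total if total else 0.0


# Made-up example: s1 and s2 share a rack, s3 sits in another one.
rack_of = {"s1": "r1", "s2": "r1", "s3": "r2"}
traffic = {("s1", "s2"): 800, ("s1", "s3"): 150, ("s3", "s2"): 50}
print(within_rack_fraction(traffic, rack_of))  # prints 0.8
```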

Flows
are small: 80% of bytes in flows < 200 MB.
are short-lived: 50% of bytes in flows < 25 s.
turn over quickly: median inter-arrival at ToR = [value missing] s.
which lead to…
Traffic engineering schemes should react faster; few elephants.
Localized traffic → additional bandwidth alleviates hotspots.
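The flow statistics above are byte-weighted ("80% of bytes in flows < 200 MB"), which differs from counting flows. A small sketch of both kinds of summary, under assumed inputs: a list of flow sizes and a list of flow arrival times. The function names and the sample data are hypothetical.

```python
from statistics import median


def byte_fraction_below(flow_sizes, threshold):
    """Fraction of total bytes carried by flows smaller than threshold."""
    total = sum(flow_sizes)
    if total == 0:
        return 0.0
    return sum(size for size in flow_sizes if size < threshold) / total


def median_inter_arrival(arrival_times):
    """Median gap between consecutive flow arrivals (same unit as the input)."""
    ordered = sorted(arrival_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps)


# Made-up flow sizes in MB and arrival times in milliseconds.
sizes_mb = [10, 50, 140, 300, 500]
print(byte_fraction_below(sizes_mb, 200))      # prints 0.2
print(median_inter_arrival([0, 10, 30, 40]))   # prints 10
```

The same byte-weighted helper applies to flow durations by passing per-flow byte counts filtered on duration instead of size.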

Congestion, its Impact
Are links busy? Often!
Who are the culprits?
Are apps impacted?
[Figure: contiguous duration of >70% link utilization (seconds)]

Congestion, its Impact
Are links busy? Often!
Who are the culprits? Apps (Extract, Reduce).
Are apps impacted? Marginally.
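The congestion metric used here is the contiguous duration for which a link stays above 70% utilization. A minimal sketch of extracting those episodes from a utilization time series, assuming evenly spaced samples; the function name, the sampling interval, and the sample values are illustrative assumptions.

```python
def congestion_episodes(utilization, interval_s, threshold=0.7):
    """Durations (seconds) of contiguous runs with utilization above threshold.

    utilization: utilization samples in 0..1, taken every interval_s seconds
    """
    episodes = []
    run = 0
    for sample in utilization:
        if sample > threshold:
            run += 1
        else:
            if run:
                episodes.append(run * interval_s)
            run = 0
    if run:  # series ended while still congested
        episodes.append(run * interval_s)
    return episodes


# Made-up samples, one every 10 s.
samples = [0.2, 0.8, 0.9, 0.5, 0.75, 0.71, 0.95, 0.1]
print(congestion_episodes(samples, 10))  # prints [20, 30]
```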

Measurement Alternatives
Tomography: Link Utilizations (e.g., from SNMP) → Server 2 Server Traffic Matrix
+ make do with easier-to-measure data
- under-constrained problem → heuristics:
a) gravity
b) max sparse
c) tomography + Job Information
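As an illustration of heuristic a), the textbook gravity model estimates traffic from node i to node j as proportional to the bytes i sends times the bytes j receives. The sketch below is that generic form under assumed inputs (per-node send and receive totals), not necessarily the exact variant evaluated in the talk; the function name and the numbers are hypothetical.

```python
def gravity_estimate(out_bytes, in_bytes):
    """Gravity-model traffic matrix: tm[i][j] = out_bytes[i] * in_bytes[j] / total.

    out_bytes[i]: bytes sent by node i; in_bytes[j]: bytes received by node j.
    Assumes sum(out_bytes) == sum(in_bytes), i.e. all traffic stays inside the set.
    """
    total = sum(out_bytes)
    if total == 0:
        return [[0.0 for _ in in_bytes] for _ in out_bytes]
    return [[sent * received / total for received in in_bytes] for sent in out_bytes]


# Made-up totals for two nodes.
tm = gravity_estimate([60, 40], [30, 70])
print(tm)  # prints [[18.0, 42.0], [12.0, 28.0]]
```

Each row sums to the node's send total and each column to the receive total, so the estimate is consistent with the measured totals while spreading traffic as evenly as those totals allow. That spreading is the opposite of the rack-local, scatter-gather structure described earlier in the talk, which is one reason extra information such as job placement can help.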

A first look at traffic in a (map-reduce) data center.
Some insights:
Traffic stays mostly within high-bandwidth regions.
Flows are small, short-lived, and turn over quickly.
The network is highly utilized often, with moderate impact on apps.
Measuring at end-hosts is feasible, necessary (?)
→ a model for data center traffic