Network Traffic Characteristics of Data Centers in the Wild

Network Traffic Characteristics of Data Centers in the Wild
Theophilus Benson*, Aditya Akella*, David A. Maltz+
*University of Wisconsin, Madison; +Microsoft Research

The Importance of Data Centers
“A 1-millisecond advantage in trading applications can be worth $100 million a year to a major brokerage firm.”
Internal users: line-of-business apps, production test beds.
External users: web portals, web services, multimedia applications, chat/IM.

The Case for Understanding Data Center Traffic
Better understanding enables better techniques:
Better traffic engineering: avoid data losses, improve application performance.
Better quality-of-service: better control over jitter, enabling multimedia applications.
Better energy saving: reduce the data center's energy footprint and operating expenditures.
Initial stab: network-level traffic and its relationship to applications.

Canonical Data Center Architecture
[Figure: three-tier topology — Core (L3) switches, Aggregation (L2) switches, Edge (L2) top-of-rack switches, and racks of application servers. The three layers appear in both 2-tier and 3-tier designs; SNMP covers the switch links, packet sniffers cover the edge links.]

Dataset: Data Centers Studied

  DC Role             DC Name   Location     Number of Devices
  Universities        EDU1      US-Mid       22
                      EDU2                   36
                      EDU3                   11
  Private Enterprise  PRV1                   97
                      PRV2      US-West      100
  Commercial Clouds   CLD1                   562
                      CLD2                   763
                      CLD3      US-East      612
                      CLD4      S. America   427
                      CLD5

10 data centers in 3 classes: 5 commercial clouds, 3 universities, 2 private enterprises.
University and private data centers serve internal users; they are small and local to a campus.
Cloud data centers serve external users; they are large and globally diverse.
Big data centers are used for the clouds, while enterprise/university data centers are located closer to their users.

Dataset: Collection
SNMP: poll SNMP MIBs for bytes-in, bytes-out, and discards; more than 10 days of data, averaged over 5-minute intervals.
Packet traces: Cisco port span, 12 hours.
Topology: Cisco Discovery Protocol.
Availability by data center: SNMP statistics are available for all 10 data centers; packet traces and topology are available for the university and private enterprise data centers (EDU1-EDU3, PRV1-PRV2) but not for the clouds (CLD1-CLD5).
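As a rough sketch (not the study's actual tooling) of how per-link utilization can be derived from this kind of SNMP data, the Python snippet below converts successive bytes-in counter readings into average utilization per polling interval; the sample layout, function name, and link speed are illustrative assumptions.

```python
# Sketch: convert cumulative SNMP bytes-in counter samples into average
# link utilization per polling interval. Assumes each link is represented
# by a list of (timestamp_seconds, cumulative_bytes) samples and that the
# link speed is known; all names here are illustrative.

def utilization(samples, link_speed_bps):
    """Return one utilization value (0..1) per interval between samples."""
    utils = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        delta_bytes = b1 - b0
        if delta_bytes < 0:        # counter wrapped or reset; skip interval
            continue
        bits_per_second = delta_bytes * 8 / (t1 - t0)
        utils.append(bits_per_second / link_speed_bps)
    return utils

# Example: a 1 Gbps link polled every 5 minutes (300 s), as in the dataset.
samples = [(0, 0), (300, 9_000_000_000), (600, 21_000_000_000)]
print(utilization(samples, link_speed_bps=1e9))   # -> [0.24, 0.32]
```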

Canonical Data Center Architecture: Measurement Vantage Points
[Figure: the same three-tier topology annotated with data sources — SNMP and topology data cover ALL links; packet sniffers cover the edge (top-of-rack) links to the racks of application servers.]

Applications
Start at the bottom of the architecture: analyze the running applications using the packet traces.
Use the BroID tool for application identification, and quantify the amount of traffic from each application.
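For illustration only: once packets have been labeled with an application by a classifier such as BroID, quantifying per-application traffic reduces to a simple aggregation. The sketch below assumes a hypothetical list of (application, bytes) records; it is not the authors' pipeline.

```python
# Sketch: aggregate traffic volume per identified application and report
# each application's share of total bytes. The (application, bytes) records
# stand in for the hypothetical output of a classifier such as BroID.
from collections import defaultdict

def traffic_share(records):
    """records: iterable of (application_name, byte_count) tuples."""
    totals = defaultdict(int)
    for app, nbytes in records:
        totals[app] += nbytes
    grand_total = sum(totals.values())
    return {app: nbytes / grand_total for app, nbytes in totals.items()}

records = [("HTTP", 4_000), ("LDAP", 1_000), ("HTTP", 3_000), ("SMB", 2_000)]
print(traffic_share(records))   # {'HTTP': 0.7, 'LDAP': 0.1, 'SMB': 0.2}
```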

Applications
The application mix differs across data centers (visible as differences between the bars), and that mix shapes the results examined next.
Applications are clustered: PRV2_2 hosts the secured portions of applications, while PRV2_3 hosts the unsecured portions.

Analyzing Packet Traces
Goal: the transmission patterns of the applications; packet-level properties are crucial for understanding the effectiveness of candidate techniques.
Traffic at the edge is ON-OFF: binned at both 15 ms and 100 ms granularities, the ON-OFF behavior persists.
This is important for understanding which assumptions hold and which types of techniques can be used.
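A minimal sketch of one way such ON-OFF periods could be extracted, by binning packet timestamps at a fixed 15 ms granularity (one of the two bin sizes above) and measuring runs of busy and idle bins; this is an assumed reconstruction, not the paper's exact procedure.

```python
# Sketch: bin packet arrival timestamps (in seconds) into fixed-width bins
# and extract ON-period and OFF-period lengths as runs of busy/idle bins.
# The 15 ms bin width follows the slide; everything else is illustrative.

def on_off_periods(arrivals, bin_width=0.015):
    if not arrivals:
        return [], []
    n_bins = int(max(arrivals) / bin_width) + 1
    busy = [False] * n_bins
    for t in arrivals:
        busy[int(t / bin_width)] = True
    on_periods, off_periods = [], []
    run_length, state = 1, busy[0]
    for b in busy[1:]:
        if b == state:
            run_length += 1
        else:
            (on_periods if state else off_periods).append(run_length * bin_width)
            run_length, state = 1, b
    (on_periods if state else off_periods).append(run_length * bin_width)
    return on_periods, off_periods

ons, offs = on_off_periods([0.001, 0.002, 0.050, 0.051, 0.052])
print(ons, offs)   # ON runs and the OFF gap between them, in seconds
```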

Data-Center Traffic is Bursty
What is the arrival process? Understanding it narrows the range of acceptable models.
The ON periods, OFF periods, and packet inter-arrival times are all heavy-tailed and best fit by lognormal distributions across the data centers; this differs from the Pareto behavior of WAN traffic, so new models are needed.
Best-fit distributions per trace (OFF periods, ON periods, inter-arrivals): PRV2_1 through PRV2_4 are lognormal throughout; EDU1 through EDU3 are largely lognormal, with Weibull providing the best fit for some of their distributions.
(Speaker note: the curve fitting is of lognormal parameters to the observed distributions; "not fit" means the candidate distribution differs from the data.)
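As a hedged illustration of this kind of curve fitting, the sketch below uses SciPy to fit lognormal and Weibull candidates to a set of ON-period durations and compares them with a Kolmogorov-Smirnov statistic; the synthetic data and the choice of SciPy are assumptions, not the authors' methodology.

```python
# Sketch: fit candidate heavy-tailed distributions (lognormal, Weibull) to
# observed ON-period durations and compare goodness of fit with the
# Kolmogorov-Smirnov statistic. The data below is synthetic, for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
on_periods = rng.lognormal(mean=-4.0, sigma=1.0, size=5000)   # stand-in data

candidates = {"lognormal": stats.lognorm, "weibull": stats.weibull_min}
for name, dist in candidates.items():
    params = dist.fit(on_periods, floc=0)              # fix location at zero
    ks_stat, _ = stats.kstest(on_periods, dist.cdf, args=params)
    print(f"{name:10s} KS statistic = {ks_stat:.3f}")  # smaller = closer fit
```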

Packet Size Distribution
Bimodal, with peaks around 200 B and 1400 B.
The small packets are mostly TCP acknowledgements and keep-alive packets; persistent connections are important to these applications.

Canonical Data Center Architecture (revisited)
[Figure: three-tier topology — Core (L3), Aggregation (L2), Edge (L2) top-of-rack switches, and racks of application servers.]

Intra-Rack Versus Extra-Rack
Quantify the amount of traffic that uses the interconnect, to give perspective for the interconnect analysis.
Intra-rack traffic stays within a rack; extra-rack traffic leaves the rack through the edge switch uplinks.
Derived from the edge (top-of-rack) switch counters:
Extra-Rack = sum of uplink traffic
Intra-Rack = sum of server-link traffic - Extra-Rack
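Applied per top-of-rack switch, the two equations above translate directly into code; the sketch below assumes per-port byte counts as input.

```python
# Sketch: split a top-of-rack switch's traffic into intra-rack and
# extra-rack volumes using the two equations above. The per-port byte
# counts passed in are assumed, illustrative inputs.

def rack_split(uplink_bytes, server_link_bytes):
    extra_rack = sum(uplink_bytes)                     # traffic leaving the rack
    intra_rack = sum(server_link_bytes) - extra_rack   # traffic staying inside
    return intra_rack, extra_rack

intra, extra = rack_split(uplink_bytes=[40, 60],
                          server_link_bytes=[120, 150, 130])
total = intra + extra
print(f"intra-rack: {intra / total:.0%}, extra-rack: {extra / total:.0%}")
```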

Intra-Rack Versus Extra-Rack: Results
Clouds: most traffic (75%) stays within a rack, due to colocation of applications and their dependent components.
Other data centers: more than 50% of traffic leaves the rack, reflecting un-optimized placement.

Extra-Rack Traffic on the DC Interconnect
Utilization: core > aggregation > edge, because traffic from many links is aggregated onto a few.
The tail of the core utilization distribution differs across data centers.
Hot-spots are links with > 70% utilization; the prevalence of hot-spots differs across data centers.

Persistence of Core Hot-Spots
Low persistence: PRV2, EDU1, EDU2, EDU3, CLD1, CLD3.
High persistence, low prevalence: PRV1, CLD2 (2-8% of links are hot-spots more than 50% of the time).
High persistence, high prevalence: CLD4, CLD5 (15% of links are hot-spots more than 50% of the time).

Prevalence of Core Hot-Spots
[Figure: fraction of core links that are hot-spots over time (0-50 hours); y-axis ranges of roughly 0-0.6%, 0-6%, and 0-24% for the three classes of data centers.]
Low persistence: very few concurrent hot-spots.
High persistence: few concurrent hot-spots.
High prevalence: still fewer than 25% of links are hot-spots at any time.
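Given per-interval core-link utilizations (e.g., from the SNMP data), hot-spot persistence and prevalence can be computed as in the sketch below, using the > 70% utilization definition from the slides; the data layout and variable names are assumptions.

```python
# Sketch: compute hot-spot persistence (fraction of intervals in which a
# link is a hot-spot) and prevalence (fraction of links that are hot-spots
# in a given interval). util[link] is a per-interval utilization series in
# [0, 1]; the layout and values are assumptions.

HOTSPOT_THRESHOLD = 0.70

def persistence(util):
    return {link: sum(u > HOTSPOT_THRESHOLD for u in series) / len(series)
            for link, series in util.items()}

def prevalence(util):
    links = list(util)
    n_intervals = len(util[links[0]])
    return [sum(util[link][t] > HOTSPOT_THRESHOLD for link in links) / len(links)
            for t in range(n_intervals)]

util = {
    "core-1": [0.80, 0.90, 0.75, 0.80],
    "core-2": [0.20, 0.80, 0.30, 0.20],
    "core-3": [0.50, 0.50, 0.95, 0.40],
}
print(persistence(util))   # fraction of time each link is a hot-spot
print(prevalence(util))    # fraction of links that are hot-spots per interval
```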

Observations from the Interconnect
Link utilization is low at the edge and aggregation layers; the core is the most utilized.
Hot-spots exist (> 70% utilization), but fewer than 25% of links are hot-spots.
Loss occurs on less-utilized links (< 70% utilization), implicating momentary bursts.
Time-of-day variations exist; the variation is an order of magnitude larger at the core.
Next, apply these results to evaluate DC design requirements.

Assumption 1: Larger Bisection
Many proposals argue for larger bisection bandwidth: VL2 [SIGCOMM '09], Monsoon [PRESTO '08], Fat-Tree [SIGCOMM '08], PortLand [SIGCOMM '09], Hedera [NSDI '10].
Motivation: congestion at oversubscribed core links.

Argument for Larger Bisection
Congestion occurs at oversubscribed core links; increasing the number of core links would eliminate the congestion (VL2, Monsoon, Fat-Tree, PortLand, Hedera).

Calculating Bisection Demand
[Figure: three-tier topology highlighting the core bisection links (the bottleneck) and the application/server links at the edge.]
If  Σ traffic(app links) / Σ capacity(bisection links) > 1, then more devices are needed at the bisection.
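As a worked form of this condition, the sketch below computes the ratio of aggregate application-link traffic to aggregate bisection capacity; all numbers are placeholders, not measurements from the study.

```python
# Sketch: evaluate the bisection-demand condition from the slide. If the
# sum of traffic on the application/server links exceeds the sum of
# bisection-link capacity, the bisection is the bottleneck and more
# devices/links are needed there. All numbers are placeholders.

def bisection_demand(app_link_traffic_bps, bisection_capacity_bps):
    return sum(app_link_traffic_bps) / sum(bisection_capacity_bps)

ratio = bisection_demand(
    app_link_traffic_bps=[200e6] * 40,    # 40 server links at 200 Mbps each
    bisection_capacity_bps=[10e9] * 4,    # 4 bisection links at 10 Gbps each
)
print(f"bisection demand ratio = {ratio:.2f}")
print("more bisection needed" if ratio > 1 else "existing bisection suffices")
```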

Bisection Demand
Given our data (current applications and DC designs): NO, more bisection is not required; the aggregate bisection is only 30% utilized.
Instead, the existing network needs to be better utilized: load balance across paths and migrate VMs across racks.

Related Work
IMC '09 [Kandula '09]: traffic is unpredictable; most traffic stays within a rack.
Cloud measurements [Wang '10, Li '10]: study application performance using end-to-end measurements.

Insights Gained
75% of traffic stays within a rack (clouds); applications are not uniformly placed.
Half of all packets are small (< 200 B); keep-alive messages are integral to application design.
At most 25% of core links are highly utilized; an effective routing algorithm that load balances across paths and migrates VMs can reduce utilization.
Questioned popular assumptions: Do we need more bisection? No. Is centralization feasible? Yes.

Looking Forward
Data centers currently run two networks, data and storage: what is the impact of convergence?
Public cloud data centers: what is the impact of random VM placement?
The current work is network-centric: what role does the application play?

Questions? Email: tbenson@cs.wisc.edu