The Nature of Datacenter Traffic: Measurements & Analysis
Srikanth Kandula, Sudipta Sengupta, Albert Greenberg, Parveen Patel, Ronnie Chaiken (Microsoft Research), IMC, November 2009
Presented by Abhishek Ray

Outline
- Introduction
- Data & Methodology
- Application Workload
- Traffic Characteristics
- Tomography
- Conclusion

Introduction
- Datacenters run large-scale analysis and mining of data sets, processing petabytes of data
- This paper characterizes the traffic in one such cluster: a detailed view of the traffic mix, and of congestion conditions and patterns

Contribution
- Measurement instrumentation: measures traffic at the servers rather than at the switches
- Traffic characteristics: flows, congestion, and the rate of change of the traffic mix
- Accuracy of tomography inference when applied to datacenter traffic
- Scale: a cluster of ~1,500 servers, 20 servers per rack, measured over 2 months

Data & Methodology
- ISPs measure with SNMP counters, sampled flow records, and deep packet inspection
- This work instead measures at the servers inside the data center
- Servers host the computation, storage, and network stacks
- This enables linking network traffic with application-level logs

Socket-level events at each server
- Collected with ETW (Event Tracing for Windows)
- One event per application read or write
- Each event therefore aggregates over several packets
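To make the aggregation concrete, here is a minimal sketch in Python (not the authors' instrumentation; the (timestamp, src, dst, num_bytes) event tuple is a hypothetical stand-in for what ETW emits) of rolling socket-level events up into per-server-pair byte counts:

```python
from collections import defaultdict

def aggregate_events(events):
    """Sum bytes per (src, dst) server pair from socket-level events.

    Each event is (timestamp, src, dst, num_bytes); one event corresponds
    to an application read or write and so covers several packets.
    """
    traffic = defaultdict(int)
    for _ts, src, dst, num_bytes in events:
        traffic[(src, dst)] += num_bytes
    return dict(traffic)

events = [
    (0.001, "srv01", "srv02", 65536),
    (0.004, "srv01", "srv02", 131072),
    (0.009, "srv03", "srv01", 4096),
]
print(aggregate_events(events))
# {('srv01', 'srv02'): 196608, ('srv03', 'srv01'): 4096}
```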

ETW – Event Tracing for Windows

Application Workload
- Jobs are written in Scope, a SQL-like programming language
- Jobs consist of phases of different types: Extract, Partition, Aggregate, Combine
- Workloads range from short interactive programs to long-running programs
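As a toy illustration (my own sketch, not Scope code; the function names are invented), the phase types map onto a familiar extract/partition/aggregate pattern:

```python
from collections import defaultdict

def extract(lines):
    """Extract phase: parse raw input into records."""
    return [line.strip().lower() for line in lines if line.strip()]

def partition(records, n_workers):
    """Partition phase: spread records across workers by hash."""
    buckets = defaultdict(list)
    for record in records:
        buckets[hash(record) % n_workers].append(record)
    return buckets

def aggregate(bucket):
    """Aggregate phase: count occurrences within one partition."""
    counts = defaultdict(int)
    for record in bucket:
        counts[record] += 1
    return dict(counts)

buckets = partition(extract(["apple\n", "banana\n", "apple\n"]), n_workers=2)
print({worker: aggregate(bucket) for worker, bucket in buckets.items()})
```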

Traffic Characteristics

Patterns

Work-Seeks-Bandwidth and Scatter-Gather patterns in the datacenter traffic exchanged between server pairs

Work-seeks-bandwidth
- Job placement favors proximity: traffic is exchanged within the same server, within servers in the same rack, or within servers in the same VLAN
Scatter-gather patterns
- Data is divided into small parts; each server works on its particular part
- The results are then aggregated

How much traffic is exchanged between server pairs?

Server pairs in the same rack are more likely to exchange more bytes
Probability of exchanging no traffic:
- 89% for servers within the same rack
- 99.5% for servers in different racks
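A sketch of how such fractions can be computed, assuming a server-pair traffic matrix and a server-to-rack mapping (both data layouts are my assumptions, not the paper's):

```python
import itertools

def zero_traffic_fractions(tm, rack_of):
    """Fraction of ordered server pairs exchanging no traffic,
    split into same-rack and cross-rack pairs.

    tm: {(src, dst): bytes exchanged}; rack_of: {server: rack id}.
    """
    same = same_zero = diff = diff_zero = 0
    for src, dst in itertools.permutations(sorted(rack_of), 2):
        is_zero = tm.get((src, dst), 0) == 0
        if rack_of[src] == rack_of[dst]:
            same += 1
            same_zero += is_zero
        else:
            diff += 1
            diff_zero += is_zero
    return same_zero / same, diff_zero / diff

tm = {("a1", "a2"): 10_000, ("a1", "b1"): 500}
rack_of = {"a1": "A", "a2": "A", "b1": "B"}
print(zero_traffic_fractions(tm, rack_of))  # (0.5, 0.75)
```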

How many other servers does a server correspond with?

Within its own rack, a server either talks to almost all the other servers or to only a few of them
Outside its rack, a server either talks to no servers at all or to 1-10% of them

Congestion within the Datacenter

Goal: run the network at as high a utilization as possible without adversely affecting throughput
Low network utilization can indicate either that:
- The application by nature demands more of other resources, such as CPU and disk, than the network
- Applications could be re-written to make better use of available network bandwidth

Where and when does congestion happen in the data center?

Congestion prevalence:
- 86% of links experience congestion lasting at least 10 seconds
- 15% of links experience congestion lasting at least 100 seconds
Short congestion periods are highly correlated across many tens of links and are due to brief spurts of high demand from the application
Long-lasting congestion periods tend to be more localized to a small set of links
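One plausible way to extract such congestion episodes from a per-link utilization time series (the 0.7 threshold and 1-second sampling are my assumptions; the paper's exact definition may differ):

```python
import itertools

def congestion_episodes(utilization, sample_period_s=1.0, threshold=0.7):
    """Duration in seconds of each maximal run of samples >= threshold."""
    durations = []
    runs = itertools.groupby(utilization, key=lambda u: u >= threshold)
    for congested, run in runs:
        if congested:
            durations.append(sum(1 for _ in run) * sample_period_s)
    return durations

link_util = [0.2, 0.8, 0.9, 0.75, 0.3, 0.95, 0.1]  # one sample per second
print(congestion_episodes(link_util))  # [3.0, 1.0]
```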

Length of Congestion Events

Compares the rates of flows that overlap high utilization periods with the rates of all flows

Impact of high utilization

Congestion can cause read failures, and a read failure kills the job
To attribute network traffic to the applications that generate it, the authors merge the network event logs with application-level logs that describe which job and phase were active at that time
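A sketch of that merge, with hypothetical log formats: each network event is charged to the job/phase whose interval contains its timestamp:

```python
from collections import defaultdict

def attribute_traffic(net_events, phase_log):
    """Charge each network event's bytes to the job/phase active at that time.

    net_events: [(timestamp, num_bytes)]
    phase_log:  [(start, end, job, phase)] with non-overlapping intervals
    """
    per_phase = defaultdict(int)
    for ts, num_bytes in net_events:
        for start, end, job, phase in phase_log:
            if start <= ts < end:
                per_phase[(job, phase)] += num_bytes
                break
    return dict(per_phase)

net_events = [(5.0, 1000), (12.0, 2500)]
phase_log = [(0.0, 10.0, "job42", "extract"), (10.0, 20.0, "job42", "reduce")]
print(attribute_traffic(net_events, phase_log))
```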

Reduce phase: data in each partition, present at multiple servers in the cluster, has to be pulled to the server that handles the reduce for that partition (e.g., counting the number of records that begin with 'A')
Extract phase: reads in the input data; moves the largest amount of data
Evaluation phase: also implicated in high-utilization periods
Conclusion: high-utilization epochs are caused by application demand and have a moderate negative impact on job performance

Flow Characteristics

Traffic mix changes frequently

How traffic changes over time within the data center

Change in traffic between successive intervals:
- The 10th and 90th percentiles are 37% and 149%
- The median change in traffic is roughly 82%
- Even when the total traffic in the matrix remains the same, the server pairs involved in these traffic exchanges change appreciably
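A sketch (my formulation) of how such change percentiles can be computed from successive traffic matrices:

```python
import statistics

def relative_changes(tm_prev, tm_next):
    """Percent change per server pair between two successive TMs.

    Pairs with no traffic in tm_prev are skipped to avoid division by zero.
    """
    changes = []
    for pair, old in tm_prev.items():
        if old > 0:
            new = tm_next.get(pair, 0)
            changes.append(100.0 * abs(new - old) / old)
    return changes

tm_t0 = {("a", "b"): 100, ("b", "c"): 50}
tm_t1 = {("a", "b"): 140, ("b", "c"): 10}
print(statistics.median(relative_changes(tm_t0, tm_t1)))  # 60.0
```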

Short bursts cause spikes at the shorter time-scale (dashed line) that smooth out at the longer time-scale (solid line), whereas gradual changes appear the other way around: smoothed out at shorter time-scales yet pronounced at the longer time-scale
Variability is a key aspect of datacenter traffic

Inter-arrival times in the entire cluster, at Top-of-Rack switches and at servers

Flow inter-arrivals at both servers and top-of-rack switches show spikes spaced roughly 15 ms apart
This is likely due to the stop-and-go behavior of the application, which rate-limits the creation of new flows
The median arrival rate of all flows in the cluster is 10^5 flows per second, i.e., 100 flows every millisecond
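Computing inter-arrival times and the arrival rate from a list of flow start timestamps is straightforward; a minimal sketch with made-up timestamps:

```python
# Flow start timestamps in seconds; the values here are made up.
starts = sorted([0.000, 0.015, 0.030, 0.031, 0.045])
inter_arrivals = [b - a for a, b in zip(starts, starts[1:])]
arrival_rate = (len(starts) - 1) / (starts[-1] - starts[0])  # flows/second
print(inter_arrivals, arrival_rate)
```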

Tomography
Network tomography methods infer traffic matrices (TMs) from link counts
If the methods used in ISP networks were applicable to datacenters, they would help unravel the nature of traffic without instrumenting every server
Why is this hard?
- The number of origin-destination flows grows quadratically, n(n-1), while link measurements are far fewer, so the inference problem is under-constrained
- Gravity model assumption: the amount of traffic a node (origin) sends to another node (destination) is proportional to the traffic volume received by the destination
- Scalability
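A minimal sketch of the gravity model estimate (node names and volumes are made up; full tomogravity additionally refines this estimate against observed link counts, which is omitted here):

```python
def gravity_tm(out_bytes, in_bytes):
    """Gravity model: traffic i->j is proportional to i's sending volume
    times j's share of the total receiving volume.

    Self-pairs are skipped, so row sums only approximate out_bytes.
    """
    total_in = sum(in_bytes.values())
    return {(i, j): out_bytes[i] * in_bytes[j] / total_in
            for i in out_bytes for j in in_bytes if i != j}

out_bytes = {"a": 300, "b": 100, "c": 0}   # bytes sent per node
in_bytes = {"a": 50, "b": 150, "c": 200}   # bytes received per node
tm = gravity_tm(out_bytes, in_bytes)
print(tm[("a", "b")])  # 300 * 150 / 400 = 112.5
```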

Methodology
Compute the ground-truth TM from the server measurements, derive the corresponding link counts, and measure how well the TM estimated by tomography from those link counts approximates the true TM

Tomogravity and Sparsity Maximization

Tomogravity: communication is likely to be between nodes within the same job rather than across all nodes; the gravity model, not being aware of these job clusters, introduces traffic across clusters, resulting in many spurious non-zero TM entries
Sparsity maximization: error rates start at several hundred percent

Comparison of the TMs estimated by various tomography methods with the ground truth

Ground-truth TMs are sparser than tomogravity-estimated TMs, and denser than sparsity-maximized TMs

Conclusion
The measurements capture both:
- Macroscopic patterns: which servers talk to which others, when, and for what reasons
- Microscopic characteristics: flow durations, inter-arrival times
There is a tighter coupling between network, computing, and storage in datacenter applications
Congestion and negative application impact do occur, demanding improvements: a better understanding of traffic and mechanisms that steer demand

My Take
- More data should be examined, over a period of one year instead of 2 months
- I would certainly like to see similar mining of the data and applications running at the datacenters of companies like Google, Yahoo, etc.

Related Work
T. Benson, A. Anand, A. Akella, and M. Zhang: Understanding Data Center Traffic Characteristics. In SIGCOMM WREN Workshop, 2009.
A. Greenberg, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. Maltz, P. Patel, and S. Sengupta: VL2: A Scalable and Flexible Data Center Network. In ACM SIGCOMM, 2009.

Thank You