Profiling Network Performance in Multi-tier Datacenter Applications
Minlan Yu, Princeton University
Joint work with Albert Greenberg, Dave Maltz, Jennifer Rexford, Lihua Yuan, Srikanth Kandula, Changhoon Kim

Data Center Architecture
Applications inside data centers are multi-tier
(Diagram: a front-end server fans out to aggregators, which fan out to workers)

Challenges of Datacenter Diagnosis
Multi-tier applications
– Hundreds of application components
– Tens of thousands of servers
Evolving applications
– Add new features, fix bugs
– Change components while the app is still in operation
Human factors
– Developers may not understand the network well
– Nagle's algorithm, delayed ACK, etc.

Where Are the Performance Problems?
Network or application?
– App team: why low throughput and high delay?
– Net team: no equipment failure or congestion
Network and application together: their interactions
– Network stack is not configured correctly
– Small application writes delayed by TCP
– TCP incast: synchronized writes cause packet loss
Goal: a diagnosis tool to understand network-application interactions

Today's Diagnosis Methods
Ad hoc, application-specific
– Dig throughput/delay problems out of application logs
Significant overhead, coarse-grained
– Capture packet traces for manual inspection
– Use switch counters to check link utilization
Goal: a diagnosis tool that runs everywhere, all the time

Full Knowledge of Data Centers
Direct access to the network stack
– Directly measure rather than relying on inference
– E.g., # of fast retransmission packets
Application-server mapping
– Know which application runs on which servers
– E.g., which app to blame for sending a lot of traffic
Network topology and routing
– Know which application uses which resource
– E.g., which app is affected if a link is congested

SNAP: Scalable Net-App Profiler

Outline
SNAP architecture
– Passively measure real-time network stack info
– Systematically identify performance problems
– Correlate across connections to pinpoint problems
SNAP deployment
– Operators: characterize performance problems
– Developers: identify problems for applications
SNAP validation and overhead

SNAP Architecture
Step 1: Network-stack measurements

What Data to Collect?
Goals:
– Fine-grained: in milliseconds or seconds
– Low overhead: low CPU overhead and data volume
– Generic across applications
Two types of data:
– Poll TCP statistics → network performance
– Event-driven socket logging → application expectation
– Both exist in today's Linux and Windows systems

TCP Statistics
Instantaneous snapshots
– #Bytes in the send buffer
– Congestion window size, receiver window size
– Snapshots taken with Poisson sampling
Cumulative counters
– #FastRetrans, #Timeout
– RTT estimation: #SampleRTT, #SumRTT
– RwinLimitTime
– Calculate the difference between two polls
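A minimal sketch of the polling idea above: Poisson-spaced snapshots plus diffing of cumulative counters. The read_tcp_counters() helper is a placeholder for the OS interface (e.g., Linux TCP_INFO or Windows TCP ESTATS), and the counter names and intervals are illustrative rather than SNAP's actual implementation.

```python
import random
import time

def read_tcp_counters(conn_id):
    """Placeholder: return cumulative TCP counters for one connection.
    In practice these come from the OS network stack; the zeros here
    are illustrative only."""
    return {"fast_retrans": 0, "timeouts": 0,
            "sample_rtt": 0, "sum_rtt": 0.0, "rwin_limit_time": 0.0}

def poll_connection(conn_id, mean_interval_s=0.5, num_polls=10):
    """Poisson-spaced polls; report the per-interval change in counters."""
    prev = read_tcp_counters(conn_id)
    for _ in range(num_polls):
        # Exponentially distributed gaps between polls give Poisson
        # sampling in time (PASTA property).
        time.sleep(random.expovariate(1.0 / mean_interval_s))
        cur = read_tcp_counters(conn_id)
        delta = {k: cur[k] - prev[k] for k in cur}
        prev = cur
        yield delta

for delta in poll_connection(conn_id=1):
    print(delta)
```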

SNAP Architecture
Step 2: Performance problem classification

Life of a Data Transfer
– The application generates the data
– The data is copied to the send buffer
– TCP sends the data into the network
– The receiver receives the data and ACKs
(Diagram: Sender App → Send Buffer → Network → Receiver)

Classifying Socket Performance
Sender application
– Bottlenecked by CPU, disk, etc.
– Slow due to app design (small writes)
Send buffer
– Send buffer not large enough
Network
– Fast retransmission
– Timeout
Receiver
– Not reading fast enough (CPU, disk, etc.)
– Not ACKing fast enough (delayed ACK)

Identifying Performance Problems
– Sender app: not any other problem (by elimination)
– Send buffer: send buffer is almost full (sampled snapshots)
– Network: #FastRetrans, #Timeout (direct counters)
– Receiver window: RwinLimitTime (direct counter)
– Delayed ACK (inference): diff(SumRTT) > diff(SampleRTT) * MaxQueuingDelay
Measurement styles: direct measurement, sampling, inference
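A hedged sketch of the per-interval classification rules above. The counter names match the polled statistics from the earlier slide, but the send-buffer fullness test and the MaxQueuingDelay threshold are placeholder values, not SNAP's exact parameters.

```python
MAX_QUEUING_DELAY = 0.01  # seconds; placeholder threshold, not SNAP's value

def classify_interval(prev, cur, send_buf_bytes, send_buf_limit):
    """Classify one connection over one polling interval.
    `prev` / `cur` are dicts of cumulative TCP counters from two polls."""
    diff = lambda k: cur[k] - prev[k]
    problems = []
    if send_buf_bytes >= 0.9 * send_buf_limit:            # sampled snapshot
        problems.append("send buffer limited")
    if diff("fast_retrans") > 0 or diff("timeouts") > 0:  # direct counters
        problems.append("network loss")
    if diff("rwin_limit_time") > 0:                       # direct counter
        problems.append("receiver window limited")
    # Inference: total RTT grew much faster than expected queuing delay
    if diff("sum_rtt") > diff("sample_rtt") * MAX_QUEUING_DELAY:
        problems.append("delayed ACK suspected")
    if not problems:
        problems.append("sender application limited")     # by elimination
    return problems
```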

SNAP Architecture
Step 3: Correlation across connections

Pinpoint Problems via Correlation
Correlation over a shared switch/link/host
– Packet loss on all the connections going through one switch/host
– Pinpoint the problematic switch

Pinpoint Problems via Correlation
Correlation over an application
– The same application has problems on all machines
– Report aggregated application behavior

Correlation Algorithm
Input:
– A set of connections (sharing a resource or an application)
– Correlation interval M, aggregation interval t
Solution:
– Aggregate each connection's problem time within each aggregation interval t
– Compute linear correlation across connections over each correlation interval M
(Diagram: per-interval vectors time(t1, c1..c6), time(t2, c1..c6), time(t3, c1..c6), …)
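A minimal sketch of the correlation step, assuming each connection's per-aggregation-interval "problem time" has already been computed. The pairwise Pearson correlation (statistics.correlation, Python 3.10+) and the flagging threshold are illustrative choices, not SNAP's exact algorithm.

```python
from statistics import correlation  # Pearson correlation, Python 3.10+

def correlated_connections(problem_time, threshold=0.8):
    """problem_time: dict conn_id -> list of per-interval problem durations,
    all aligned to the same aggregation intervals within one correlation
    interval M. Returns connection pairs whose problems move together."""
    flagged = []
    ids = list(problem_time)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            try:
                r = correlation(problem_time[a], problem_time[b])
            except Exception:   # e.g., a constant series has no correlation
                continue
            if r >= threshold:
                flagged.append((a, b, r))
    return flagged

# Example: two connections sharing a congested switch show similar spikes
series = {
    "c1": [0.0, 0.4, 0.5, 0.1, 0.6],
    "c2": [0.1, 0.5, 0.4, 0.0, 0.7],
    "c3": [0.3, 0.0, 0.1, 0.3, 0.0],
}
print(correlated_connections(series))
```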

SNAP Architecture

SNAP Deployment

SNAP Deployment
Production data center
– 8K machines, 700 applications
– Ran SNAP for a week, collected petabytes of data
Operators: profiling the whole data center
– Characterize the sources of performance problems
– Key problems in the data center
Developers: profiling individual applications
– Pinpoint problems in app software, the network stack, and their interactions

Performance Problem Overview
A small number of apps suffer from significant performance problems

Problem          >5% of the time   >50% of the time
Sender app       567 apps          551
Send buffer      1                 1
Network          30                6
Recv win limit   22                8
Delayed ACK      154               144

Performance Problem Overview
Delayed ACK should be disabled
– ~2% of connections have delayed ACK more than 99% of the time
– 129 delay-sensitive apps have delayed ACK more than 50% of the time
(Diagram: A sends data to B; if B has data to send, the ACK piggybacks on B's data; if B has nothing to send, the ACK is delayed)
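One illustrative way a delay-sensitive receiver can sidestep the delayed-ACK penalty on Linux is the TCP_QUICKACK socket option sketched below. This is Linux-specific and is not the mechanism the deck's production systems used; the host and port are placeholders.

```python
import socket

# TCP_QUICKACK asks the kernel to ACK immediately instead of delaying.
# It is not sticky, so delay-sensitive receivers typically re-set it
# around each recv(). (Linux-only; illustrative, not SNAP's fix.)
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("server.example", 9000))   # placeholder endpoint

while True:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
    chunk = sock.recv(4096)
    if not chunk:
        break
    # ... hand the chunk to the application ...
```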

Classifying Socket Performance (recap)
– Sender app: bottlenecked by CPU, disk, etc., or slow due to app design (small writes)
– Send buffer: not large enough
– Network: fast retransmission, timeout
– Receiver: not reading fast enough (CPU, disk, etc.), not ACKing fast enough (delayed ACK)

Send Buffer and Recv Window
Problems on a single connection
– Send buffer: some apps use the default 8 KB, which is too small
– Receiver window: the fixed 64 KB maximum is not enough for some apps
(Diagram: the app process writes bytes into the TCP send buffer; the receiving app reads bytes out of the TCP recv buffer)

Need Buffer Autotuning
Problems of sharing buffer memory at a single host
– More send buffer problems on machines with more connections
– How to set buffer sizes cooperatively?
Auto-tuning send buffer and recv window
– Dynamically allocate buffer across applications
– Based on the congestion window of each application
– Tune the send buffer and recv window together
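A toy sketch of the cooperative allocation idea, assuming a fixed per-host buffer budget is divided across connections in proportion to each connection's congestion window. The budget, minimum floor, and proportional rule are invented for illustration; they are not the autotuner proposed in the talk.

```python
def allocate_buffers(cwnd_bytes, host_budget_bytes, min_bytes=8 * 1024):
    """cwnd_bytes: dict conn_id -> current congestion window in bytes.
    Split the host's buffer budget in proportion to each connection's
    cwnd, with a small floor so idle connections can still make progress."""
    total = sum(cwnd_bytes.values()) or 1
    alloc = {}
    for cid, cwnd in cwnd_bytes.items():
        share = host_budget_bytes * cwnd / total
        alloc[cid] = max(min_bytes, int(share))
    return alloc

# Example: a host with a 1 MB budget and three connections
print(allocate_buffers({"c1": 64_000, "c2": 512_000, "c3": 16_000},
                       host_budget_bytes=1_000_000))
```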

Classifying Socket Performance (recap)
– Sender app: bottlenecked by CPU, disk, etc., or slow due to app design (small writes)
– Send buffer: not large enough
– Network: fast retransmission, timeout
– Receiver: not reading fast enough (CPU, disk, etc.), not ACKing fast enough (delayed ACK)

Packet Loss in a Day in the Datacenter
– Packet loss bursts every hour
– 2-4 am is the backup time
(Figure: packet loss over 24 hours)

Types of Packet Loss vs. Throughput
(Scatter plot: one point for each connection at each interval, throughput vs. type of loss)
– Low-throughput connections see mostly timeouts: the traffic is too small to keep enough packets in flight to trigger fast retransmission
– High-throughput connections see mostly fast retransmissions, but why are there still timeouts? And why a peak at 1M/sec?
Takeaway: operators should reduce the number and effect of packet losses (especially timeouts) for small flows

Recall: SNAP Diagnosis
SNAP diagnosis steps:
– Correlate connection performance to pinpoint applications with problems
– Expose socket and TCP stats
– Find the root cause with operators and developers
– Propose potential solutions

Spread Writes over Multiple Connections
SNAP diagnosis:
– More timeouts than fast retransmissions
– Small packet sending rate
Root cause:
– The application uses two connections to avoid head-of-line blocking
– Low-rate small requests get more timeouts
(Diagram: requests and responses split across the two connections)

Spread Writes over Multiple Connections (cont.)
Solution:
– Use one connection; assign an ID to each request
– Combine data to reduce timeouts
(Diagram: Req 1, Req 2, Req 3 and Response 2 multiplexed on a single connection)
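A small sketch of the suggested fix, assuming a simple length-prefixed framing with a request ID so multiple requests share one connection and small writes can be batched. The wire format is made up for illustration and is not the application's actual protocol.

```python
import struct

def frame(request_id: int, payload: bytes) -> bytes:
    """Frame = 4-byte request ID + 4-byte length + payload, so several
    requests can be combined into one write on a single connection."""
    return struct.pack("!II", request_id, len(payload)) + payload

def combine(requests):
    """requests: iterable of (request_id, payload). Concatenating frames
    lets one send() carry several small requests, avoiding many tiny
    packets that are prone to timeouts."""
    return b"".join(frame(rid, data) for rid, data in requests)

batch = combine([(1, b"GET /a"), (2, b"GET /b"), (3, b"GET /c")])
# sock.sendall(batch)   # one connection, one write, three requests
```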

Congestion Window Allows Sudden Bursts
SNAP diagnosis:
– Significant packet loss
– Congestion window is too large after an idle period
Root cause:
– Slow start restart is disabled

Slow Start Restart
Slow start restart
– Reduces the congestion window when the connection goes idle, to prevent a sudden burst
(Figure: congestion window vs. time; the window drops after an idle period)
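A rough model of the behavior in the figure, loosely following RFC 2861-style congestion-window validation: the window is halved for every retransmission timeout's worth of idle time, down to a restart window. The constants are illustrative; on Linux the feature is controlled by the net.ipv4.tcp_slow_start_after_idle sysctl.

```python
def cwnd_after_idle(cwnd, idle_time, rto, restart_window):
    """Approximate slow-start restart: halve cwnd once per RTO of idle
    time, never dropping below the restart window (RFC 2861 style)."""
    halvings = int(idle_time / rto)
    for _ in range(halvings):
        cwnd = max(restart_window, cwnd // 2)
    return cwnd

# E.g., a 64-segment window idle for several RTOs shrinks back toward 4
print(cwnd_after_idle(cwnd=64, idle_time=1.0, rto=0.2, restart_window=4))
```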

Slow Start Restart (cont.)
However, developers disabled it because:
– They intentionally keep the congestion window large over a persistent connection to reduce delay
– E.g., if the congestion window is large, it takes just 1 RTT to send 64 KB of data
Potential solution:
– New congestion control for delay-sensitive traffic

Classifying Socket Performance (recap)
– Sender app: bottlenecked by CPU, disk, etc., or slow due to app design (small writes)
– Send buffer: not large enough
– Network: fast retransmission, timeout
– Receiver: not reading fast enough (CPU, disk, etc.), not ACKing fast enough (delayed ACK)

Timeout and Delayed ACK
SNAP diagnosis:
– Congestion window drops to one after a timeout
– The single retransmitted packet is then held up by the receiver's delayed ACK
Solution:
– Drop the congestion window to two instead

Nagle and Delayed ACK
SNAP diagnosis:
– Delayed ACK combined with small application writes
(Sequence diagram: the app issues two writes, W1 and W2, each smaller than the MSS; Nagle holds the segment for W2 until the ACK for W1 arrives, and the receiver delays that ACK by up to 200 ms before the app reads W1 and W2)
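A minimal sketch of one common mitigation: disabling Nagle's algorithm with TCP_NODELAY so small writes are not held back waiting for the delayed ACK. Whether to disable Nagle, disable delayed ACK, or batch writes in the application is exactly the trade-off the talk raises; the endpoint below is a placeholder and this is only illustrative.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle: send small writes immediately instead of waiting for
# the ACK of the previous segment (which a delayed-ACK receiver may
# hold for up to ~200 ms).
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.connect(("server.example", 9000))   # placeholder endpoint

# Two sub-MSS writes now go out back-to-back rather than serialized
# behind the delayed ACK.
sock.sendall(b"W1: small request header")
sock.sendall(b"W2: small request body")
```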

Send Buffer and Delayed ACK
SNAP diagnosis: delayed ACK with the send buffer set to zero
(Diagram: with a send buffer, the application sees "send complete" as soon as data is copied into the buffer, before the ACK arrives; with the send buffer set to zero, "send complete" comes only after the ACK, so a delayed ACK directly stalls the sending application)

SNAP Validation and Overhead

Correlation Accuracy
– Inject two real problems
– Mix labeled data with real production data
– Correlate over a shared machine
– Successfully identified the labeled machines
– 2.7% of machines have ACC > …
– …% of machines have ACC > …

SNAP Overhead
Data volume
– Socket logs: 20 bytes per socket
– TCP statistics: 120 bytes per connection per poll
CPU overhead
– Log socket calls: event-driven, < 5%
– Read the TCP table
– Poll TCP statistics

Reducing CPU Overhead
CPU overhead
– Comes from polling TCP statistics and reading the TCP table
– Increases with the number of connections and the polling frequency
– E.g., 35% CPU for polling 5K connections at a 50 ms interval; 5% for polling 1K connections at a 500 ms interval
Adaptive tuning of polling frequency
– Reduce the polling frequency to stay within a target CPU budget
– Devote more polling to more problematic connections
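A hedged sketch of the adaptive-polling idea: the base interval is scaled so measured CPU tracks the target, and connections recently flagged with problems are polled more often. The controller, scaling rule, and 4x weighting are invented for illustration; the talk does not specify the exact policy.

```python
def next_poll_interval(current_interval, measured_cpu, target_cpu,
                       min_interval=0.05, max_interval=5.0):
    """Scale the polling interval so measured CPU tracks the target.
    If we are over budget, poll less often; if well under, poll more."""
    scale = measured_cpu / target_cpu if target_cpu > 0 else 1.0
    interval = current_interval * scale
    return min(max_interval, max(min_interval, interval))

def per_connection_interval(base_interval, had_problem_recently):
    """Spend more of the polling budget on problematic connections."""
    return base_interval / 4 if had_problem_recently else base_interval

base = next_poll_interval(current_interval=0.5, measured_cpu=0.08,
                          target_cpu=0.05)
print(base)                                   # back off: ~0.8 s
print(per_connection_interval(base, True))    # but poll flagged conns faster
```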

Conclusion
A simple, efficient way to profile data centers
– Passively measure real-time network stack information
– Systematically identify components with problems
– Correlate problems across connections
Deploying SNAP in a production data center
– Characterize data center performance problems, helping operators improve the platform and tune the network
– Discover app-net interactions, helping developers pinpoint application problems

Class Discussion
Is TCP a good fit for data centers?
– How should TCP be optimized for data centers?
– What should a new transport protocol look like?
How should we diagnose data center performance problems?
– What kind of network/application data do we need?
– How do we diagnose virtualized environments?
– How do we perform active measurement?

Backup

T-RAT: TCP Rate Analysis Tool
Goal
– Analyze TCP packet traces
– Determine the rate-limiting factors for different connections
Seven classes of rate-limiting factors