Copyright © 2005 Department of Computer Science
Solving the TCP-Incast Problem with Application-Level Scheduling
Maxim Podlesny, University of Waterloo
Carey Williamson, University of Calgary

Motivation
Emerging IT paradigms:
–Data centers, grid computing, HPC, multi-core
–Cluster-based storage systems, SAN, NAS
–Large-scale data management "in the cloud"
–Data manipulation via "service-oriented computing"
Cost and efficiency advantages from IT trends, economies of scale, and a specialization marketplace
Performance advantages from parallelism:
–Partition/aggregation, MapReduce, BigTable, Hadoop
–Think RAID at Internet scale! (1000x)

Problem Statement
–High-speed, low-latency network (RTT ≤ 0.1 ms)
–Highly multiplexed link (e.g., 1000 flows)
–Highly synchronized flows on the bottleneck link
–Limited switch buffer size (e.g., 32 KB)
Consequences: TCP retransmission timeouts and TCP throughput degradation
How can we provide high goodput for data center applications?

Related Work
–E. Krevat et al., "On Application-based Approaches to Avoiding TCP Throughput Collapse in Cluster-based Storage Systems", Proc. SuperComputing 2007
–A. Phanishayee et al., "Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems", Proc. FAST 2008
–Y. Chen et al., "Understanding TCP Incast Throughput Collapse in Datacenter Networks", Proc. WREN 2009
–V. Vasudevan et al., "Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication", Proc. ACM SIGCOMM 2009
–M. Alizadeh et al., "Data Center TCP", Proc. ACM SIGCOMM 2010
–A. Shpiner et al., "A Switch-based Approach to Throughput Collapse and Starvation in Data Centers", Proc. IWQoS 2010

Summary of Related Work
–Data centers have specific network characteristics
–The TCP-incast throughput collapse problem emerges
–Possible solutions:
  –Tweak TCP timers and/or parameters for this environment
  –Redesign (or replace!) TCP in this environment
  –Rewrite applications for this environment (Facebook)
  –Increase switch buffer sizes (extra queueing delay!)
  –Smart edge coordination for uploads/downloads

Data Center System Model
(Figure: a client connected through a switch to N servers.)
–A client requests a logical data block of size S (e.g., 1 MB) from N servers
–Each server returns a Server Request Unit (SRU) (e.g., 32 KB)
–The switch has a small buffer B; packet size S_DATA; link capacity C

Performance Comparisons
Internet vs. data center network:
–Internet propagation delay: milliseconds
–Data center propagation delay: 0.1 ms
–With a 1 KB packet size and 1 Gbps link capacity, the packet transmission time is about 0.01 ms
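The slide's arithmetic can be checked directly (a minimal sketch, assuming 1 KB means 1000 bytes; with 1024 bytes the result is 0.0082 ms, still roughly 0.01 ms):

```python
# Transmission time of one packet on the bottleneck link.
packet_size_bits = 1000 * 8      # 1 KB packet, in bits
link_capacity_bps = 1e9          # 1 Gbps link

tx_time_ms = packet_size_bits / link_capacity_bps * 1e3
print(tx_time_ms)  # 0.008 ms, i.e., roughly the 0.01 ms quoted above
```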

Analysis Overview (1 of 2)
–Determine the maximum TCP flow concurrency (n) that can be supported without any packet loss
–Arrange the servers into k groups of (at most) n servers each, staggering the group schedules

Analysis Overview (2 of 2)
–Determine the maximum TCP flow concurrency (n) that can be supported without any packet loss:
  –Determine the flow size in packets (based on SRU and MSS)
  –Determine the maximum outstanding packets per flow (W_max)
  –Determine the maximum flow concurrency (based on B and W_max)
–Arrange the servers into k groups of (at most) n servers each, staggering the group schedules
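The last two steps can be sketched as follows. This is not the paper's exact bound, just the simplest form of the idea: n flows, each holding up to W_max outstanding packets, must fit in the switch buffer; the remaining servers fall into later groups. All numbers in the example are hypothetical.

```python
import math

def max_concurrency(buffer_pkts, w_max):
    """Largest n such that n flows, each with up to w_max
    outstanding packets, fit in the switch buffer (no loss)."""
    return max(1, buffer_pkts // w_max)

def num_groups(n_servers, n):
    """Number of staggered groups of at most n servers each."""
    return math.ceil(n_servers / n)

# Example: a 32-packet buffer, W_max = 8 packets per flow, N = 64 servers.
n = max_concurrency(32, 8)
k = num_groups(64, n)
print(n, k)  # 4 16
```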

Determining W_max
Recall TCP slow-start dynamics:
–The initial TCP congestion window (cwnd) is 1 packet
–ACKs cause cwnd to double every RTT (1, 2, 4, 8, 16, …)
Consider a TCP transfer of an arbitrary SRU size in packets (e.g., 21):
–Determine the peak power-of-2 cwnd value (W_A)
–Determine the "residual window" for the last RTT (W_B)
–W_max depends on both W_A and W_B (e.g., W_A + W_B/2)
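The window bookkeeping above can be sketched with an idealized slow-start model (cwnd starts at 1 and doubles every RTT; the exact rule combining W_A and W_B into W_max is the paper's, and is only illustrated here):

```python
def slow_start_windows(sru_packets):
    """Per-RTT window sizes for transferring sru_packets
    under idealized slow start (cwnd = 1, 2, 4, 8, ...)."""
    windows, sent, cwnd = [], 0, 1
    while sent < sru_packets:
        w = min(cwnd, sru_packets - sent)  # last RTT may be partial
        windows.append(w)
        sent += w
        cwnd *= 2
    return windows

# The slide's example: a 21-packet SRU.
ws = slow_start_windows(21)              # [1, 2, 4, 8, 6]
if len(ws) >= 2 and ws[-1] < 2 * ws[-2]:
    W_A, W_B = ws[-2], ws[-1]            # partial ("residual") last window
else:
    W_A, W_B = ws[-1], 0                 # SRU ends exactly on a full window
print(W_A, W_B)  # 8 6
```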

Scheduling Overview
(Figure: the N servers are partitioned into successive groups of n servers each.)

Scheduling Details
–Lossless scheduling of server responses: at most n servers respond simultaneously, scheduled as k groups of responding servers
–Server i (1 ≤ i ≤ N) starts responding at a time determined by its group
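A natural form of the staggered start time can be sketched as below. This is an assumption, not necessarily the paper's exact expression: servers are taken in order in groups of n, and group g starts after g−1 rounds of duration T̂ (the per-SRU completion time used for scheduling).

```python
import math

def start_time(i, n, t_hat):
    """Scheduled start time of server i (1 <= i <= N):
    group g = ceil(i / n) starts at (g - 1) * t_hat."""
    group = math.ceil(i / n)
    return (group - 1) * t_hat

# With n = 4 servers per group and t_hat = 2 ms:
starts = [start_time(i, 4, 2.0) for i in (1, 4, 5, 9)]
print(starts)  # [0.0, 0.0, 2.0, 4.0]
```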

Theoretical Results
The maximum goodput of an application in a data center with lossless scheduling depends on:
–S: size of a logical data block
–T: actual completion time of an SRU
–T̂: SRU completion time used for scheduling
–k: number of server groups used
–d_max: real-system scheduling variance
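One plausible form of this goodput bound, reconstructed from the variable list by treating the block transfer as k scheduled rounds, each budgeted the scheduling time T̂ plus the variance d_max (an assumption, not necessarily the paper's exact expression):

```latex
G_{\max} \;=\; \frac{S}{k\,\bigl(\hat{T} + d_{\max}\bigr)}
```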

Analytical Model Results

Results for 10 KB Fixed SRU Size (1 of 2)

Results for 10 KB Fixed SRU Size (2 of 2)

Results for Varied SRU Size (1 MB / N)

Effect of TCP Timer Granularity

Summary and Conclusions
–Application-level scheduling addresses TCP-incast throughput collapse
–Main idea: schedule the servers' responses so that no losses occur
–Lossless scheduling achieves the maximum goodput
–Otherwise, goodput is non-monotonic and highly sensitive to network configuration parameters

Future Work
–Implementing and testing our solution in real data centers
–Evaluating our solution for different application traffic scenarios