Experience with Loss-Based Congestion Controlled TCP Stacks
Yee-Ting Li, University College London

Introduction
- Next-generation applications need to transport very large volumes of data.
- Network hardware is capable of gigabits per second.
- Current 'Vanilla' TCP cannot sustain such rates over long-distance, high-throughput paths.
- New TCP stacks have been introduced to rectify this problem.
- We investigate the performance, bottlenecks and deployability of the new algorithms.

Transmission Control Protocol
- Connection oriented.
- Reliable transport of data.
- Window based.
- Congestion and flow control to prevent network collapse.
- Provides 'fairness' between competing streams.
- 20 years old: originally designed for kbit/sec pipes.

TCP Algorithms
- Based on two algorithms that determine the rate at which data is sent:
  - Slow start: probe for the initial bandwidth.
  - Congestion avoidance: maintain a steady-state transfer rate.
- We focus on the steady state: probe for increases in available bandwidth, whilst backing off if congestion is detected (through loss).
- The rate is maintained through a 'congestion window' (cwnd) that regulates the number of unacknowledged packets allowed on the connection.
- The window size needed to fill a path approximately equals its bandwidth-delay product: window = bandwidth x delay (see the sketch below).
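
As a back-of-the-envelope illustration (mine, not from the slides), here is the bandwidth-delay product for the two test paths described later, in Python:

    # Window needed to keep a path full: bandwidth x delay.
    def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
        return bandwidth_bps * rtt_s / 8  # bits -> bytes

    for rtt_ms in (120, 6):  # DataTAG and MB-NG RTTs (from the slides)
        w = bdp_bytes(1e9, rtt_ms / 1000)  # both paths are 1 Gbit/sec
        print(f"1 Gbit/s, {rtt_ms:3d} ms RTT -> {w / 1e6:5.2f} MB "
              f"(~{w / 1500:.0f} x 1500-byte packets)")

At 120 ms this is roughly 15 MB, i.e. about 10,000 full-sized packets in flight.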

Algorithms
- Congestion avoidance:
  - For every packet (ACK) received by the sender: cwnd <- cwnd + 1/cwnd
  - When loss is detected (through duplicate ACKs): cwnd <- cwnd / 2
- Growth of cwnd is determined by:
  - The RTT of the connection: when the RTT is high, cwnd grows slowly (growth is clocked by ACKs).
  - The loss rate on the path: high loss means cwnd never reaches a large value.
  - The capacity of the link: low loss allows a large cwnd value.
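
A minimal simulation sketch (my own illustration, not the authors' code) of the Vanilla rule above, driven by an assumed uniform random loss rate:

    import random

    def vanilla_peak_cwnd(loss_prob: float, acks: int, seed: int = 1) -> float:
        # Apply cwnd += 1/cwnd per ACK, cwnd /= 2 per loss; return peak cwnd.
        random.seed(seed)
        cwnd = peak = 2.0
        for _ in range(acks):
            if random.random() < loss_prob:
                cwnd = max(cwnd / 2, 1.0)   # multiplicative decrease
            else:
                cwnd += 1.0 / cwnd          # additive increase, ACK-clocked
            peak = max(peak, cwnd)
        return peak

    # Higher loss caps the achievable window, as the slide notes.
    print(vanilla_peak_cwnd(1e-5, 200_000), vanilla_peak_cwnd(1e-3, 200_000))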

Current Methods of Achieving High Throughput
- Today's workaround is to stripe a transfer over multiple parallel TCP flows.
- Advantages:
  - Achieves good throughput.
  - No changes to kernels required.
- Disadvantages:
  - Have to manually tune the number of flows.
  - May induce extra loss on lossy networks.
  - Need to reprogram/recompile software.

New TCP Stacks
- Modify the congestion control algorithm to improve response times.
- All are based on modifying the cwnd growth and decrease values. Define:
  - a = increase in data packets per window of ACKs
  - b = decrease factor upon congestion
- To maintain compatibility (and hence network stability and fairness), each stack switches mode from Vanilla to the new TCP behaviour only above some small cwnd value.

HSTCP
- Designed by Sally Floyd.
- Determines a and b as functions of cwnd:
  - a <- a(cwnd)
  - b <- b(cwnd)
- Gives a gradual improvement in throughput as we approach larger bandwidth-delay products.
- The current implementation is focused on performance up to 10 Gbit/sec: it sets a linear (log-log) relation between loss and throughput (the response function).
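
For reference, a hedged sketch of one way a(cwnd) and b(cwnd) can be constructed, following RFC 3649 (the parameters are from the RFC; the code is mine, not the kernel implementation):

    import math

    LOW_W, HIGH_W = 38.0, 83000.0   # cwnd range where HSTCP departs from Vanilla
    LOW_P, HIGH_P = 1e-3, 1e-7      # loss rates corresponding to LOW_W and HIGH_W
    HIGH_DECREASE = 0.1             # b(HIGH_W)

    def log_frac(w: float) -> float:
        return (math.log(w) - math.log(LOW_W)) / (math.log(HIGH_W) - math.log(LOW_W))

    def b_hs(w: float) -> float:
        # Decrease factor: interpolated in log(w) from 0.5 down to 0.1.
        if w <= LOW_W:
            return 0.5
        return 0.5 + log_frac(w) * (HIGH_DECREASE - 0.5)

    def a_hs(w: float) -> float:
        # Increase per window of ACKs, chosen so average throughput follows
        # the target response function w(p), log-log linear between endpoints.
        if w <= LOW_W:
            return 1.0
        p = math.exp(math.log(LOW_P) + log_frac(w) * (math.log(HIGH_P) - math.log(LOW_P)))
        b = b_hs(w)
        return w * w * p * 2.0 * b / (2.0 - b)

    print(a_hs(83000.0), b_hs(83000.0))  # roughly a ~ 72 packets/window, b ~ 0.1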

Scalable TCP
- Designed by Tom Kelly.
- Defines a and b to be constants:
  - Increase: cwnd <- cwnd + a (per ACK)
  - Decrease: cwnd <- cwnd - b x cwnd (on loss)
- Intrinsic scaling property: the same recovery performance over any link (beyond the initial threshold); see the note below.
- Recommended settings: a = 1/100, b = 1/8.
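
A minimal sketch of the update rules with the recommended constants (my code, not Kelly's patch; the legacy-window threshold below is an assumed illustrative value):

    A, B = 1 / 100, 1 / 8
    LEGACY_WND = 16.0  # assumed switch-over point back to Vanilla behaviour

    def scalable_on_ack(cwnd: float) -> float:
        if cwnd < LEGACY_WND:
            return cwnd + 1.0 / cwnd  # small windows: plain Vanilla increase
        return cwnd + A               # +0.01 per ACK, i.e. ~1% growth per RTT

    def scalable_on_loss(cwnd: float) -> float:
        return max(cwnd * (1.0 - B), 1.0)  # shed 12.5% rather than 50%

With these constants, recovering the window after a single backoff takes about log(1/(1-b)) / log(1+a), roughly 13 round-trip times whatever the window size; this is the scaling property referred to above.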

H-TCP
- Designed by Doug Leith and Robert Shorten.
- Defines a mode switch: immediately after congestion, behave as normal Vanilla TCP; once a predefined period Δ_L has elapsed, switch to a high-performance a (sketched below).
- With Δ_i the time since flow i's last congestion event:
  - Δ_i ≤ Δ_L: a = 1
  - Δ_i > Δ_L: a = 1 + (Δ_i - Δ_L) + ((Δ_i - Δ_L)/20)^2
- Upon loss (with B_i^max(k) the maximum throughput of flow i during congestion epoch k):
  - If |[B_i^max(k+1) - B_i^max(k)] / B_i^max(k)| > 0.2: b = 0.5
  - Else: b = RTT_min / RTT_max
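
A sketch of the rules exactly as stated on this slide (my code; the Δ_L value below is an assumed example, not a recommendation from the talk):

    DELTA_L = 1.0  # seconds of Vanilla-like behaviour after each loss (assumed)

    def htcp_a(delta: float) -> float:
        # Increase parameter as a function of time since the last loss.
        if delta <= DELTA_L:
            return 1.0
        d = delta - DELTA_L
        return 1.0 + d + (d / 20.0) ** 2

    def htcp_b(bmax_prev: float, bmax_now: float,
               rtt_min: float, rtt_max: float) -> float:
        # Window factor applied on loss: conservative halving if achieved
        # throughput shifted by more than 20% between congestion epochs,
        # otherwise back off only by the RTT ratio (just drain the queue).
        if abs(bmax_now - bmax_prev) / bmax_prev > 0.2:
            return 0.5
        return rtt_min / rtt_max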

Implementation
- Each new stack comes with its own implementation; small differences between implementations mean we would be comparing kernel differences rather than just the algorithmic differences.
- This led to the development of a 'test platform' kernel: altAIMD.
  - Implements all three stacks via a simple sysctl switch.
  - Also incorporates switches for certain undesirable kernel 'features': moderate_cwnd() and the IFQ.
- Extra features were added for testing/evaluation purposes:
  - Appropriate Byte Counting (RFC 3465)
  - Inducible packet loss (at the receiver)
  - Web100 TCP logging (cwnd etc.)

Networks Under Test
- DataTAG (StarLight - CERN, via Cisco 7600 and Juniper routers): bottleneck capacity 1 Gbit/sec, RTT 120 msec.
- MB-NG (Manchester - UCL, via Cisco 7600 routers): bottleneck capacity 1 Gbit/sec, RTT 6 msec.

Graph/Demo
- Mode switch between stacks under a constant packet-drop rate.
- [Figure: cwnd traces for Vanilla TCP, Scalable TCP and HS-TCP]

Comparison Against Theory
- [Figure: measured response functions (throughput vs. loss rate) compared against theory]

Self-Similar Background Tests
- Results were skewed: we ended up not comparing differences in the TCP algorithms, so the results were not useful.

SACK…
- Looking at what's happening at the algorithmic level: strange hiccups appear in cwnd, and the only correlation is with SACK arrivals.
- [Figure: Scalable TCP on MB-NG with 200 Mbit/sec CBR background]

SACKs
- Supply the sender with information about which segments the receiver holds; the sender infers the missing packets to resend.
- Aid recovery during loss and prevent timeouts.
- The current implementation in Linux 2.4 and 2.6 walks through the entire SACK list for each arriving SACK:
  - Very CPU intensive (see the sketch below).
  - Can be interrupted by the arrival of the next SACK, which causes the SACK implementation to misbehave.
- Tests were conducted with Tom Kelly's SACK fast-path patch: it improves SACK processing, but is still not sufficient.
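
To see why this is costly at high bandwidth-delay products, here is a toy model of the walk (my illustration, not the kernel code): with a 10,000-packet window, every arriving SACK visits every outstanding segment.

    def sack_walk(retrans_queue, sack_blocks):
        # Naive tagging pass: every segment is checked against every SACK
        # block (at most 4 blocks), so the cost is O(window) per SACK.
        sacked = 0
        for seg_start, seg_end in retrans_queue:
            for blk_start, blk_end in sack_blocks:
                if blk_start <= seg_start and seg_end <= blk_end:
                    sacked += 1  # would mark this segment as SACKed
        return sacked

    # ~10,000 segments in flight at 1 Gbit/s x 120 ms (see the BDP sketch):
    queue = [(i * 1448, (i + 1) * 1448) for i in range(10_000)]
    print(sack_walk(queue, [(0, 50 * 1448)]))  # 50 matches, 10,000 iterations

At gigabit rates SACKs can arrive faster than such a walk completes, which matches the interruption problem described above.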

SACK Processing Overhead
- Periods of Web100 silence are due to high CPU utilisation: logging is done in userspace, and kernel time is taken up by TCP SACK processing.
- TCP resets cwnd.

Congestion Window Moderation
- The Linux TCP implementation adds the 'feature' moderate_cwnd().
- The idea is to prevent large bursts of data packets under 'dubious' conditions, i.e. when a single ACK acknowledges more than 3 packets (typically an ACK covers 2).
- It adjusts cwnd to the known number of packets 'in flight' (plus an extra 3 packets).
- At large cwnd sizes (high bandwidth-delay products), throughput can be diminished as a result.
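
A schematic of that clamping behaviour (my reading of the slide, not the kernel source):

    def moderate_cwnd(cwnd: int, packets_in_flight: int, newly_acked: int) -> int:
        # If one ACK covers suspiciously many packets (> 3), clamp cwnd to
        # what is known to be in flight, plus a small burst allowance.
        if newly_acked > 3:
            return min(cwnd, packets_in_flight + 3)
        return cwnd

    # At high BDP a single stretch ACK can throw away most of the window:
    print(moderate_cwnd(10_000, 4_000, 5))  # -> 4003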

CPU Load and Throughput

moderate_cwnd(): Vanilla TCP
- [Figure: cwnd and throughput with moderate_cwnd ON vs. OFF; 90% TCP AF]

moderate_cwnd(): HS-TCP
- [Figure: cwnd and throughput with moderate_cwnd ON vs. OFF; 70% and 90% TCP AF]

moderate_cwnd(): Scalable TCP
- [Figure: cwnd and throughput with moderate_cwnd ON vs. OFF; 70% and 90% TCP AF]

Multiple Streams
- [Figure: aggregate bandwidth and coefficient of variation (CoV)]

10 TCP Flows versus Self-Similar Background
- [Figure: aggregate bandwidth and CoV]

10 TCP Flows versus Self-Similar Background
- [Figure: background loss and per-TCP-flow bandwidth]

Impact
- Fairness is usually the ratio of throughput achieved by one stack against another; fairness against Vanilla TCP is then just how much more throughput a new stack gets than Vanilla.
- This doesn't really address the deployability of the stacks in real life: how do these stacks affect the existing traffic (mostly Vanilla TCP)?
- We redefine fairness in terms of Impact:
  - Consider the effect on the background traffic only, under the different stacks.
  - Vary the number of TCP flows to determine the impact on Vanilla flows.
- BW_impact = (throughput of n Vanilla flows) / (throughput of the (n-1) Vanilla flows competing with 1 new-TCP flow)
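
A trivial worked example of the metric under my reading of the formula above, with hypothetical numbers:

    def bw_impact(vanilla_only_bw: float, vanilla_with_new_bw: float) -> float:
        # Numerator: aggregate throughput of n Vanilla flows alone.
        # Denominator: aggregate throughput of the remaining (n-1) Vanilla
        # flows once one flow is replaced by a new-stack flow.
        return vanilla_only_bw / vanilla_with_new_bw

    # Hypothetical: 10 Vanilla flows share 900 Mbit/s; with one replaced by
    # an aggressive stack, the other nine get only 500 Mbit/s.
    print(bw_impact(900.0, 500.0))  # 1.8: higher values mean more damage

Higher scores mean the new stack displaces more of the existing Vanilla traffic than another Vanilla flow would.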

Impact of 1 TCP Flow
- [Figure: throughput and impact]

1 New TCP Flow: Impact
- [Figure: CoV]

Impact of 10 TCP Flows
- [Figure: throughput and impact]

10 TCP Flows: Impact
- [Figure: CoV]

WAN Tests

Summary
- Compared the actual algorithmic differences between TCP stacks through a test-platform kernel.
- Problems with the SACK implementation make it difficult to maintain high throughput (>500 Mbit/sec) under loss.
- Other kernel implementation problems also hinder performance.
- Compared the stacks under different artificial (and hence repeatable) conditions, for both single and multiple streams.
- We need to study a wider range of networks and move the tests onto real production environments.