Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London.


1 Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

2 Introduction
Transport of data for next-generation applications.
Network hardware is capable of gigabits per second.
Current ‘Vanilla’ TCP is not capable over long distances at high throughputs.
New TCP stacks have been introduced to rectify this problem.
An investigation into the performance, bottlenecks and deployability of the new algorithms.

3 Transmission Control Protocol
Connection-oriented.
Reliable transport of data.
Window-based.
Congestion and flow control to prevent network collapse.
Provides ‘fairness’ between competing streams.
20 years old; originally designed for kbit/sec pipes.

4 TCP Algorithms
Based on two algorithms that determine the rate at which data is sent:
Slowstart: probe for the initial bandwidth.
Congestion avoidance: maintain a steady-state transfer rate.
Focus on steady state: probe for increases in available bandwidth, whilst backing off if congestion is detected (through loss).
Maintained through a ‘congestion window’ (cwnd) that regulates the number of unacknowledged packets allowed on the connection.
The size of the window approximately equals the bandwidth-delay product, which determines the window size needed to obtain a given bandwidth at a certain delay:
Window = Bandwidth × Delay
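A quick sketch of the bandwidth-delay product calculation above (the helper name and the MSS value are illustrative, not from the talk):

```python
def bdp_window(bandwidth_bps, rtt_s, mss_bytes=1460):
    """Window (in packets) needed to fill a path: Bandwidth x Delay."""
    bdp_bytes = bandwidth_bps / 8.0 * rtt_s
    return bdp_bytes / mss_bytes

# E.g. a 1 Gbit/sec path with a 120 msec RTT needs a window of
# roughly ten thousand packets to stay full.
window = bdp_window(1e9, 0.120)
```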

5 Algorithms
Congestion avoidance:
For every packet (ACK) received by the sender: cwnd ← cwnd + 1/cwnd.
When loss is detected (through dupacks): cwnd ← cwnd / 2.
Growth of cwnd is determined by:
The RTT of the connection: when the RTT is high, cwnd grows slowly (because of ACK clocking).
The loss rate on the line: high loss means that cwnd never achieves a large value.
The capacity of the link: allows a large cwnd value (when loss is low).
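A minimal sketch of the Vanilla AIMD rules above (a toy model, not the kernel code):

```python
def on_ack(cwnd):
    """Additive increase: +1/cwnd per ACK, so about +1 packet per RTT."""
    return cwnd + 1.0 / cwnd

def on_loss(cwnd):
    """Multiplicative decrease: halve the window when loss is detected."""
    return cwnd / 2.0

cwnd = 10.0
for _ in range(10):       # roughly one window's worth of ACKs
    cwnd = on_ack(cwnd)   # cwnd grows by ~1 packet over the RTT
cwnd = on_loss(cwnd)      # a single loss halves it again
```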

6 Current Methods of Achieving High Throughput
Advantages:
Achieves good throughput.
No changes to kernels required.
Disadvantages:
Have to manually tune the number of flows.
May induce extra loss on lossy networks.
Need to reprogram/recompile software.

7 New TCP Stacks
Modify the congestion control algorithm to improve response times.
All are based on modifying the cwnd growth and decrease values.
Define:
a = increase in data packets per window of ACKs
b = decrease factor upon congestion
To maintain compatibility (and hence network stability and fairness), for small cwnd values: mode switch from Vanilla to the new TCP.

8 HSTCP
Designed by Sally Floyd.
Determines a and b as a function of cwnd:
a ← a(cwnd)
b ← b(cwnd)
Gradual improvement in throughput as we approach larger bandwidth-delay products.
The current implementation is focused on performance up to 10 Gb/sec – it sets a linear relation between loss and throughput (the response function).
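A toy sketch of cwnd-dependent AIMD parameters in the spirit of HSTCP. The low-window threshold of 38 packets is taken from RFC 3649, but the growth/decay formulas here are invented purely for illustration – the real HSTCP derives its a(cwnd) and b(cwnd) tables from the target response function:

```python
LOW_WINDOW = 38  # below this cwnd, behave exactly like Vanilla TCP

def a(cwnd):
    """Increase parameter: grows with cwnd (hypothetical formula)."""
    if cwnd <= LOW_WINDOW:
        return 1.0
    return 1.0 + (cwnd - LOW_WINDOW) ** 0.4 / 10.0

def b(cwnd):
    """Decrease factor: shrinks as cwnd grows (hypothetical formula)."""
    if cwnd <= LOW_WINDOW:
        return 0.5
    return max(0.1, 0.5 - 0.1 * (cwnd / 10000.0))

def on_loss(cwnd):
    """Back off less aggressively at large windows than Vanilla's halving."""
    return cwnd * (1.0 - b(cwnd))
```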

9 Scalable TCP
Designed by Tom Kelly.
Defines a and b to be constant:
a: cwnd ← cwnd + a (per ACK)
b: cwnd ← cwnd − b × cwnd
An intrinsic scaling property gives the same performance over any link (beyond the initial threshold).
Recommended settings: a = 1/100, b = 1/8.
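With constant a and b, the Scalable TCP update rules above become a two-line sketch (using the recommended settings):

```python
A = 1.0 / 100  # increase per ACK: cwnd grows by ~1% per RTT
B = 1.0 / 8    # decrease factor: cwnd drops to 7/8 on loss

def on_ack(cwnd):
    return cwnd + A            # cwnd <- cwnd + a (per ACK)

def on_loss(cwnd):
    return cwnd - B * cwnd     # cwnd <- cwnd - b * cwnd
```

Because the per-ACK increase is a fixed fraction of the window, recovery after a loss takes a fixed number of RTTs regardless of link capacity, which is the scaling property claimed above.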

10 H-TCP
Designed by Doug Leith and Robert Shorten.
Defines a mode switch so that after congestion we do normal Vanilla.
After a predefined period Δ_L, switch to a high-performance a:
Δ_i ≤ Δ_L: a = 1
Δ_i > Δ_L: a = 1 + (Δ − Δ_L) + [(Δ − Δ_L)/20]²
Upon loss, drop by:
|[B_i_max(k+1) − B_i_max(k)] / B_i_max(k)| > 0.2: b = 0.5
Else: b = RTT_min / RTT_max
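The H-TCP rules can be transcribed almost directly from the slide. Here Δ is the time since the last congestion event, and Δ_L = 1 second is an assumed value for the threshold:

```python
DELTA_L = 1.0  # seconds of Vanilla behaviour after congestion (assumed value)

def htcp_a(delta):
    """Increase parameter a as a function of time since the last loss."""
    if delta <= DELTA_L:
        return 1.0                        # normal Vanilla behaviour
    t = delta - DELTA_L
    return 1.0 + t + (t / 20.0) ** 2      # high-performance mode

def htcp_b(bmax_prev, bmax_now, rtt_min, rtt_max):
    """Backoff factor: a conservative 0.5 if the measured maximum
    throughput changed by more than 20%, else adaptive RTT_min/RTT_max."""
    if abs((bmax_now - bmax_prev) / bmax_prev) > 0.2:
        return 0.5
    return rtt_min / rtt_max
```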

11 Implementation
All the new stacks have their own implementations.
Small differences between implementations mean that we would be comparing kernel differences rather than just the algorithmic differences.
This led to the development of a ‘test platform’ kernel: altAIMD.
It implements all three stacks via a simple sysctl switch.
It also incorporates switches for certain undesirable kernel ‘features’:
moderate_cwnd()
IFQ
Extra features were added for testing/evaluation purposes:
Appropriate Byte Counting (RFC 3465)
Inducible packet loss (at the receiver)
Web100 TCP logging (cwnd etc.)

12 Networks Under Test
DataTAG (StarLight–CERN, via Cisco 7600 and Juniper routers): bottleneck capacity 1 Gb/sec, RTT 120 msec.
MB-NG (Manchester–UCL, via Cisco 7600 routers): bottleneck capacity 1 Gb/sec, RTT 6 msec.

13 Graph/Demo
Mode switch between stacks on a constant packet drop: Vanilla TCP, Scalable TCP, HS-TCP.

14 Comparison against theory
Response function.

15 Self-Similar Background Tests
The results are skewed: we are not comparing differences in the TCP algorithms, so the results are not useful.

16 SACK…
Looking into what’s happening at the algorithmic level: strange hiccups in cwnd, whose only correlation is SACK arrivals.
Scalable TCP on MB-NG with 200 Mbit/sec CBR background.

17 SACKs
SACKs supply the sender with information about which segments the receiver has.
The sender infers the missing packets to resend.
This aids recovery during loss and prevents timeouts.
The current implementation in 2.4 and 2.6 walks through the entire SACK list for each SACK – very CPU intensive.
It can be interrupted by the arrival of the next SACK, which causes the SACK implementation to misbehave.
Tests were conducted with Tom Kelly’s SACK fast-path patch: it improves SACK processing, but is still not sufficient.

18 SACK Processing Overhead
Periods of Web100 silence due to high CPU utilisation.
Logging is done in userspace – kernel time is taken up by TCP SACK processing.
TCP resets cwnd.

19 Congestion Window Moderation
The Linux TCP implementation adds the ‘feature’ of moderate_cwnd().
The idea is to prevent large bursts of data packets under ‘dubious’ conditions: when an ACK acknowledges more than 3 packets (typically 2 are acknowledged), cwnd is adjusted to the known number of packets ‘in-flight’ (plus an extra 3 packets).
Under large cwnd sizes (high bandwidth-delay products), throughput can be diminished as a result.
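A sketch of the moderation logic described above (simplified: the real Linux tcp_moderate_cwnd() works on internal socket state; the burst allowance of 3 packets is taken from the slide):

```python
BURST = 3  # extra packets allowed beyond those known to be in flight

def moderate_cwnd(cwnd, packets_in_flight):
    """Clamp cwnd to the in-flight count plus a small burst allowance."""
    return min(cwnd, packets_in_flight + BURST)

# With a high bandwidth-delay product, a large window can be clamped
# hard, which is how throughput gets diminished:
clamped = moderate_cwnd(8000, 4000)   # -> 4003
```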

20 CPU Load and Throughput

21 moderate_cwnd(): Vanilla TCP
Plots of cwnd and throughput with moderate_cwnd OFF vs ON; 90% TCP AF.

22 moderate_cwnd(): HS-TCP
Plots with moderate_cwnd OFF vs ON; 70% and 90% TCP AF.

23 moderate_cwnd(): Scalable-TCP
Plots with moderate_cwnd OFF vs ON; 70% and 90% TCP AF.

24 Multiple Streams
Aggregate BW and CoV.

25 10 TCP Flows versus Self-Similar Background
Aggregate BW and CoV.

26 10 TCP Flows versus Self-Similar Background
BG loss per TCP BW.

27 Impact
Fairness: the ratio of the throughput achieved by one stack against another.
This means that fairness against Vanilla TCP is defined by how much more throughput a new stack gets than Vanilla.
It doesn’t really consider the deployability of the stacks in real life – how do these stacks affect the existing traffic (mostly Vanilla TCP)?
Redefine fairness in terms of the Impact: consider the effect on the background traffic only, under the different stacks.
Vary the number of TCP flows to determine impact(vanilla flows):
BW_impact = [throughput of n Vanilla flows] / [throughput of (n−1) Vanilla flows + 1 new TCP flow]
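The impact metric reduces to one division over two measured aggregates; a sketch with made-up throughput numbers (the orientation of the ratio follows the slide's formula):

```python
def bw_impact(n_vanilla_bw, mixed_bw):
    """Impact = throughput of n Vanilla flows divided by throughput of
    (n-1) Vanilla flows + 1 new-stack flow."""
    return n_vanilla_bw / mixed_bw

# Hypothetical measurements: 10 Vanilla flows achieve 900 Mbit/sec
# aggregate; 9 Vanilla + 1 Scalable flow achieve 920 Mbit/sec.
impact = bw_impact(900.0, 920.0)
```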

28 Impact of 1 TCP Flow
Throughput and throughput impact.

29 1 New TCP Flow
Impact and CoV.

30 Impact of 10 TCP Flows
Throughput and throughput impact.

31 10 TCP Flows
Impact and CoV.

32 WAN Tests

33 Summary
Comparison of the actual TCP differences through the test-platform kernel.
Problems with the SACK implementations mean that it is difficult to maintain high throughput (>500 Mbit/sec) under loss.
Other problems exist with the kernel implementation that hinder performance.
The stacks were compared under different artificial (and hence repeatable) conditions: single stream and multiple streams.
Need to study over a wider range of networks.
Move the tests onto real production environments.

