Congestion models for bursty TCP traffic Damon Wischik + Mark Handley University College London DARPA grant W911NF
History of TCP (Transmission Control Protocol)
1974: First draft of TCP/IP [“A protocol for packet network interconnection”, Vint Cerf and Robert Kahn]
1983: ARPANET switches on TCP/IP
1986: Congestion collapse
1988: Congestion control for TCP [“Congestion avoidance and control”, Van Jacobson]
“A Brief History of the Internet”, the Internet Society
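TCP algorithm

    // Body of the ACK handler; seqno is the cumulative ACK just received.
    // The enclosing function signature is not shown on the slide.
    if (seqno > _last_acked) {
        // ACK for new data.
        if (!_in_fast_recovery) {
            // Normal operation: advance and grow the window.
            _last_acked = seqno;
            _dupacks = 0;
            inflate_window();
            send_packets(now);
            _last_sent_time = now;
            return;
        }
        if (seqno < _recover) {
            // Partial ACK during fast recovery: deflate the window by the
            // newly acknowledged data, add one MSS, retransmit the next hole.
            uint32_t new_data = seqno - _last_acked;
            _last_acked = seqno;
            if (new_data < _cwnd)
                _cwnd -= new_data;
            else
                _cwnd = 0;
            _cwnd += _mss;
            retransmit_packet(now);
            send_packets(now);
            return;
        }
        // Full ACK: everything up to _recover is acknowledged, so leave
        // fast recovery.
        uint32_t flightsize = _highest_sent - seqno;
        _cwnd = min(_ssthresh, flightsize + _mss);
        _last_acked = seqno;
        _dupacks = 0;
        _in_fast_recovery = false;
        send_packets(now);
        return;
    }
    // From here on the ACK does not advance _last_acked (a duplicate ACK).
    if (_in_fast_recovery) {
        // Each further duplicate ACK inflates the window by one MSS.
        _cwnd += _mss;
        send_packets(now);
        return;
    }
    _dupacks++;
    if (_dupacks != 3) {
        send_packets(now);
        return;
    }
    // Third duplicate ACK: halve the window, fast-retransmit, and enter
    // fast recovery.
    _ssthresh = max(_cwnd / 2, (uint32_t)(2 * _mss));
    retransmit_packet(now);
    _cwnd = _ssthresh + 3 * _mss;
    _in_fast_recovery = true;
    _recover = _highest_sent;
    }  // closes the enclosing function

[Figure: traffic rate (0-100 kB/sec) against time (0-8 sec)]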
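The slide calls inflate_window() without showing it. A minimal sketch of what such a function might look like, assuming standard Reno-style growth (slow start below ssthresh, roughly one MSS per round trip above it). The TcpWindow struct and the free-function form are hypothetical stand-ins for the member variables _cwnd, _ssthresh and _mss; the authors' simulator may do this differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the sender state used on the slide.
struct TcpWindow {
    uint32_t cwnd;      // congestion window, bytes
    uint32_t ssthresh;  // slow-start threshold, bytes
    uint32_t mss;       // maximum segment size, bytes
};

// Assumed Reno-style window growth, applied once per ACK of new data.
void inflate_window(TcpWindow& w) {
    assert(w.mss > 0);
    if (w.cwnd < w.ssthresh || w.cwnd == 0) {
        // Slow start: one MSS per ACK, which doubles cwnd every round trip.
        w.cwnd += w.mss;
    } else {
        // Congestion avoidance: mss*mss/cwnd per ACK, which adds up to
        // about one MSS per round trip. Always grow by at least one byte.
        uint64_t inc = static_cast<uint64_t>(w.mss) * w.mss / w.cwnd;
        w.cwnd += static_cast<uint32_t>(std::max<uint64_t>(inc, 1));
    }
}
```

With an MSS of 1000 bytes, a window of 10000 bytes above ssthresh grows by 100 bytes per ACK under this sketch.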
Motivation
We want higher throughput for TCP flows
This requires faster routers and lower packet drop probabilities
–high-throughput TCP flows with large round trip time are especially sensitive to drop, since it takes them a long time to recover
[Figure: throughput (0 to 100 Mb/s) against time (60 seconds), with packet drops marked x]
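A rough worked example of why recovery is slow, under standard assumptions that are not on the slide: after a single drop the window halves, then grows by one packet per round trip, so climbing back from W/2 to W takes about W/2 round trips. The function names, the 1500-byte packet size and the 100 ms round trip time are illustrative choices only.

```cpp
#include <cassert>

// Window, in packets, needed to sustain rate_bps over a path with
// round trip time rtt_s.
double window_packets(double rate_bps, double rtt_s, double packet_bytes) {
    assert(rate_bps > 0 && rtt_s > 0 && packet_bytes > 0);
    return rate_bps * rtt_s / (8.0 * packet_bytes);
}

// Time to climb back from W/2 to W at one packet per round trip.
double recovery_time_s(double rate_bps, double rtt_s, double packet_bytes) {
    double w = window_packets(rate_bps, rtt_s, packet_bytes);
    return (w / 2.0) * rtt_s;
}
```

For a 100 Mb/s flow with a 100 ms round trip time and 1500-byte packets, the window is about 833 packets and recovery_time_s returns about 41.7 seconds. Recovery time grows with the square of the round trip time, since both the window and the time per increment scale with it.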
Motivation
Such a network is hard to build
–Buffering becomes an ever-harder challenge as router speeds increase: DRAM access speeds double every 10 years, which cannot keep up with ever-faster linecards
–Larger buffers aren’t even very good at reducing drop probability!
Motivation
Objectives
–Understand better the nature of congestion at core routers
–Redesign TCP based on this understanding
–Rethink the buffer size for core routers
Three Modes of Congestion
Theory predicts three qualitatively different modes of congestion, depending on buffer size.
–TCP uses feedback control to adjust its rate
–The feedback loop is mediated by queues at routers
–By changing buffer size, we change the nature of the traffic and the mode of congestion
A major difference between the modes is synchronization
[Figure: individual flow rates summing to the aggregate traffic rate, in two cases. Synchronization: all flows get drops at the same time. Desynchronization: drops are evenly spread.]
Mode I: small buffers
e.g. a buffer of 25 packets
System is stable
TCP flows are desynchronized
Queue size oscillations are very rapid
Steady losses
Primal fluid model
[Figure: drop probability, queue size, utilization; queue size over 0 to 5 sec]
Mode II: intermediate buffers
e.g. the McKeown √N rule
System is unstable
TCP flows are synchronized
Queue size flips suddenly from empty to full, or full to empty
Queue-based AQM cannot work
Buffer is small enough that RTT is approximately constant
Primal fluid model
[Figure: drop probability, queue size, utilization; queue size over 0 to 5 sec]
Mode III: large buffers
e.g. the bandwidth-delay-product rule of thumb
System is unstable (although it can be stabilized by e.g. RED)
TCP flows are synchronized
Queue size varies fluidly
RTT varies
Primal-dual fluid model
[Figure: drop probability, queue size, utilization; queue size over 0 to 5 sec]
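To put the sizing rules named on the mode slides side by side, here is a small sketch. It assumes the usual statements of the two rules: the bandwidth-delay product is line rate times round trip time, and the √N rule divides that by the square root of the number of flows. The function names and the example link are illustrative, not taken from the talk.

```cpp
#include <cassert>
#include <cmath>

// Bandwidth-delay-product rule of thumb, in packets.
double bdp_packets(double line_rate_bps, double rtt_s, double packet_bytes) {
    assert(line_rate_bps > 0 && rtt_s > 0 && packet_bytes > 0);
    return line_rate_bps * rtt_s / (8.0 * packet_bytes);
}

// sqrt(N) rule: the bandwidth-delay product divided by sqrt(number of flows).
double sqrt_n_packets(double line_rate_bps, double rtt_s,
                      double packet_bytes, double n_flows) {
    assert(n_flows >= 1);
    return bdp_packets(line_rate_bps, rtt_s, packet_bytes) / std::sqrt(n_flows);
}
```

For a hypothetical 10 Gb/s link with a 100 ms round trip time, 1500-byte packets and 10000 flows, the bandwidth-delay product is about 83333 packets and the √N rule gives about 833 packets. Both are far above the small-buffer example of 25 packets, which does not depend on the line rate.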
Conclusion
We therefore proposed:
–A buffer of only 30 packets is sufficient for a core router, regardless of the line rate.
–Random queue size fluctuations are only ever of the order of 30 packets; larger buffers just lead to persistent queues and synchronization
–A buffer of 30 packets gives >95% utilization, and keeps the system stable
Other researchers ran simulations with buffers this small, and found very poor performance
Problem: TCP burstiness
Slow access links serve to pace out TCP packets
Fast access links allow a TCP flow to send its entire window back-to-back
We had only simulated slow access links, and our theory only covered paced TCP traffic. Other researchers simulated faster access links.
[Figure: number of packets sent (0 to 25) against time (0 to 5 s), for slow access links and for fast access links]
TCP burstiness
[Figure: drop probability, queue size and utilization, for slow access links and for fast access links]
TCP burstiness
Queueing theory suggests that queueing behaviour is governed by the buffer size B when TCP traffic is paced, but that it is governed by B/W for very bursty TCP traffic, where W is the mean window size
For bursty traffic, the buffer should be up to 15 times bigger than we proposed
[Figure: drop probability, queue size and utilization. Slow access links: B=300pkt is intermediate. Fast access links: B=300pkt is small.]
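The B versus B/W distinction can be written down directly. This is only a sketch of the quantity the slide says governs queueing behaviour. The enum, the function name and the example window of 15 packets are assumptions for illustration, and the thresholds that separate small, intermediate and large buffers are not reproduced here.

```cpp
#include <cassert>

enum class Traffic { Paced, VeryBursty };

// Quantity that governs queueing behaviour according to the slide:
// B packets for paced traffic, B/W window-sized bursts for very bursty
// traffic, where W is the mean window size in packets.
double governing_buffer(double buffer_packets, double mean_window_packets,
                        Traffic traffic) {
    assert(buffer_packets >= 0 && mean_window_packets > 0);
    if (traffic == Traffic::Paced) {
        return buffer_packets;
    }
    return buffer_packets / mean_window_packets;
}
```

With B = 300 packets and an assumed mean window of 15 packets, paced traffic sees a buffer of 300 packets while very bursty traffic sees room for only 20 bursts, which is consistent with the same buffer counting as intermediate in one case and small in the other.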
TCP burstiness
The aggregate of paced TCP traffic looks Poisson over short timescales. This drives our original model of the three modes of congestion, and is supported by theory and measurements [Bell Labs 2002, CAIDA 2004]
We predict that the aggregate of very bursty TCP traffic should look like a batch Poisson process.
[Figure: drop probability, queue size and utilization, for slow access links and for fast access links]
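The difference between Poisson and batch Poisson arrivals shows up in the variance of the packet count per interval. The sketch below is a toy model, not the authors' simulation. It assumes every burst has the same fixed size (real windows vary), and the function name, parameters and seed are illustrative.

```cpp
#include <cassert>
#include <random>

struct Stats {
    double mean;
    double variance;
};

// Packets arriving per interval when bursts of `batch` packets arrive as
// a Poisson process. batch = 1 gives plain Poisson (paced) traffic.
// packets_per_interval is the mean packet count per interval.
Stats arrival_stats(double packets_per_interval, int batch,
                    int intervals, unsigned seed) {
    assert(packets_per_interval > 0 && batch >= 1 && intervals > 1);
    std::mt19937 rng(seed);
    // Bursts arrive at rate (packet rate / batch size) so that the mean
    // packet rate is the same for every batch size.
    std::poisson_distribution<int> bursts(packets_per_interval / batch);
    double sum = 0.0;
    double sum_sq = 0.0;
    for (int i = 0; i < intervals; ++i) {
        double pkts = static_cast<double>(bursts(rng)) * batch;
        sum += pkts;
        sum_sq += pkts * pkts;
    }
    double mean = sum / intervals;
    double variance = sum_sq / intervals - mean * mean;
    return {mean, variance};
}
```

Both kinds of traffic have the same mean rate, but in this model the variance of the batch process is larger by a factor equal to the batch size. Calling arrival_stats(20.0, 1, 200000, 1u) and arrival_stats(20.0, 10, 200000, 1u) should give variances near 20 and 200 respectively, which is one way to see why a queue fed by bursty traffic needs more room.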
Limitations/concerns
Surely bottlenecks are at the access network, not the core network?
–Unwise to rely on this!
–The small-buffer theory seems to work for as few as 200 flows
We need more measurement of short-timescale Internet traffic statistics
Limited validation of predictions about buffer size [McKeown et al. at Stanford, Level3, Internet2]
Proper validation needs
–goodly amount of traffic
–full measurement kit
–ability to control buffer size
Conclusion
There are three qualitatively different modes of congestion
–Buffer size determines which mode is in operation, for paced traffic
–Buffer size divided by mean window size determines the mode, for very bursty traffic
–We have a collection of rules of thumb for quantifying whether a buffer is small, intermediate or big, and for quantifying how burstiness depends on access speeds
These modes of congestion have several consequences
–UTILIZATION: very small buffers can cut maximum utilization by 20%; synchronization in intermediate buffers can cut it by 4%
–SYNCHRONIZED LOSSES: intermediate and large buffers lead to synchronized losses, which are detrimental to real-time traffic
–QUEUEING DELAY: large buffers lead to queueing delay; and while end systems can recover from loss, they can never recover lost time