Sizing Router Buffers. Isaac Keslassy (Technion), Guido Appenzeller & Nick McKeown (Stanford)
Routers Need Packet Buffers: It’s well known that routers need packet buffers. It’s less clear why, and how much. The goal of this work is to answer the question: how much buffering do routers need? Given that queueing delay is the only variable part of packet delay in the Internet, you’d think we’d know the answer already!
How Much Buffer Does a Router Need? [Figure: Source → Router → Destination; bottleneck link capacity C, two-way propagation delay 2T.] Universally applied rule-of-thumb: a router needs a buffer size $B = 2T \times C$, where 2T is the two-way propagation delay (or just 250 ms) and C is the capacity of the bottleneck link. Context: mandated in backbone and edge routers; appears in RFPs and IETF architectural guidelines; usually attributed to Villamizar and Song, “High Performance TCP in ANSNET”, CCR, 1994; already known to the inventors of TCP [Van Jacobson, 1988]. Has major consequences for router design.
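A minimal sketch of the rule-of-thumb arithmetic. The function name is hypothetical; it takes the round-trip time 2T in seconds and the bottleneck capacity C in bits per second. With 2T = 250 ms and C = 10 Gb/s it gives 2.5 Gbit, about the 300 Mbytes quoted for a 10 Gb/s linecard.

```python
def rule_of_thumb_buffer_bits(rtt_s: float, capacity_bps: float) -> float:
    """Rule-of-thumb buffer B = 2T * C; rtt_s is the two-way delay 2T in seconds."""
    return rtt_s * capacity_bps


B = rule_of_thumb_buffer_bits(0.250, 10e9)  # 2T = 250 ms, C = 10 Gb/s
print(f"{B / 1e9:.2f} Gbit = {B / 8 / 1e6:.1f} Mbyte")
```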
Example: 10 Gb/s Linecard. Requires 300 Mbytes of buffering. Must read and write a 40-byte packet every 32 ns. Memory technologies: DRAM requires 4 devices, but is too slow; SRAM requires 80 devices, 1 kW, $2000. The problem gets harder at 40 Gb/s, hence RLDRAM, FCRAM, etc.
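The 32 ns figure follows from the line rate: back-to-back 40-byte packets arrive every 320 bits / C. A small sketch (hypothetical function name) that reproduces it and shows why the problem tightens at 40 Gb/s, where the buffer memory gets a quarter of the time per packet:

```python
def packet_time_ns(packet_bytes: int, line_rate_bps: float) -> float:
    """Time for one packet to arrive at full line rate, in nanoseconds."""
    return packet_bytes * 8 / line_rate_bps * 1e9


for rate_bps in (10e9, 40e9):
    t = packet_time_ns(40, rate_bps)
    print(f"{rate_bps / 1e9:.0f} Gb/s: a 40-byte packet arrives every {t:.0f} ns")
```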
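Main Result in This Talk: The rule of thumb is wrong for a core router today. The required buffer is $\frac{2T \times C}{\sqrt{n}}$ instead of $2T \times C$, where n is the number of long-lived flows sharing the link.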
Outline of this Talk: The “Rule-of-Thumb” on Buffer Sizing is incorrect: where the rule of thumb comes from, and why it is incorrect for a core router in the Internet today. Real Buffer Requirements in case of Congestion. Real Buffer Requirements without Congestion. Experimental results from real Networks.
TCP: [Figure: Source → Router → Dest; access link C’ > bottleneck link C; only W = 2 packets may be outstanding.] The TCP congestion window controls the sending rate. The sender sends packets and the receiver sends ACKs. The sending rate is controlled by the window W: at any time, only W unacknowledged packets may be outstanding. The sending rate of TCP is $R = \frac{W}{RTT}$.
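A sketch of the rate equation $R = W/RTT$, assuming fixed 1500-byte packets (an assumption; the slides do not fix a packet size). The second, hypothetical helper inverts the equation: to keep a link of capacity C busy, the window must be at least $C \times RTT$ packets' worth of data.

```python
def tcp_rate_bps(window_pkts: float, rtt_s: float, pkt_bytes: int = 1500) -> float:
    """Window-limited TCP sending rate R = W / RTT, in bits per second."""
    return window_pkts * pkt_bytes * 8 / rtt_s


def window_to_fill_pkts(capacity_bps: float, rtt_s: float, pkt_bytes: int = 1500) -> float:
    """Smallest window, in packets, that keeps a link of capacity C busy: C * RTT."""
    return capacity_bps * rtt_s / (pkt_bytes * 8)


print(tcp_rate_bps(2, 0.1))            # W = 2 packets, 100 ms RTT
print(window_to_fill_pkts(10e6, 0.1))  # 10 Mb/s bottleneck, 100 ms RTT
```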
Single TCP Flow: Router with large enough buffers for full link utilization. For every W ACKs received, send W+1 packets. [Figure: Source → Router with buffer B → Dest, C’ > C; window size vs. time t, with the RTT marked.]
Required buffer is the height of the sawtooth.
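The sawtooth argument can be checked with a per-RTT fluid sketch of one flow in congestion avoidance. Assumptions (not from the slides): the window grows by one packet per RTT, halves when it no longer fits in pipe + buffer, and slow start, timeouts and packet granularity are ignored; the function name is hypothetical. With $B = 2T \times C$ (buffer equal to the pipe) the link stays fully busy and the queue peaks at exactly B; with no buffer, utilization falls to about 75%.

```python
def simulate_single_flow(pipe: float, buffer: float, rounds: int = 10_000):
    """Per-RTT fluid sketch of one TCP flow through a bottleneck.

    pipe   -- bandwidth-delay product 2T*C, in packets
    buffer -- router buffer B, in packets
    Returns (link utilization, peak queue occupancy in packets).
    """
    w = (pipe + buffer) / 2.0  # start just after a window halving
    sent = 0.0                 # packets delivered
    elapsed = 0.0              # time, in packet transmission times at rate C
    peak_queue = 0.0
    for _ in range(rounds):
        if w > pipe + buffer:  # window exceeds pipe + buffer: a drop occurs
            w /= 2.0           # multiplicative decrease
        sent += w
        # One round lasts one RTT: 2T (= pipe packet-times) plus the
        # queueing delay (w - pipe) packet-times when w exceeds the pipe.
        elapsed += max(pipe, w)
        peak_queue = max(peak_queue, w - pipe)
        w += 1.0               # additive increase: +1 packet per RTT
    return sent / elapsed, peak_queue


for b in (0, 50, 100):
    u, q = simulate_single_flow(pipe=100, buffer=b)
    print(f"B = {b:3d} pkts: utilization {u:.3f}, peak queue {q:.0f} pkts")
```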
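Origin of Rule-of-Thumb: Before and after reducing the window size, the sending rate of the TCP sender is the same. Inserting the rate equation, we get $\frac{W_{old}}{RTT_{old}} = \frac{W_{new}}{RTT_{new}}$ with $W_{new} = W_{old}/2$. The RTT is part two-way propagation delay 2T and part queueing delay B/C. Just before the window is reduced the buffer is full, so $RTT_{old} = 2T + B/C$; we know that after reducing the window the queueing delay is zero, so $RTT_{new} = 2T$. Hence $\frac{W_{old}}{2T + B/C} = \frac{W_{old}/2}{2T}$, which gives $B = 2T \times C$.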
Rule-of-Thumb: The rule-of-thumb makes sense for one flow. A typical backbone link has > 20,000 flows. Does the rule-of-thumb still hold? Answer: if flows are perfectly synchronized, then yes; if flows are desynchronized, then no.
Outline of this Talk: The “Rule-of-Thumb” on Buffer Sizing is incorrect. Real Buffer Requirements in case of Congestion: correct buffer requirements for a congested router; result: $B = \frac{2T \times C}{\sqrt{n}}$. Real Buffer Requirements without Congestion. Experimental results from real Networks.
If Flows Are Synchronized: The aggregate window has the same dynamics as a single flow’s window. Therefore buffer occupancy has the same dynamics, and the rule-of-thumb still holds.
When Are Flows Synchronized? Small numbers of flows tend to synchronize. Large aggregates of flows are not synchronized: for > 200 flows, synchronization disappears. Measurements in the core give no indication of synchronization.
If Flows Are Not Synchronized: [Figure: probability distribution over buffer size, with buffer B marked.]
Central Limit Theorem: The CLT tells us that the more variables (congestion windows of flows) we have, the narrower the Gaussian (fluctuation of the sum of windows). The width of the Gaussian decreases as $1/\sqrt{n}$, so the buffer size should also decrease as $1/\sqrt{n}$.
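A Monte Carlo sketch of the CLT argument, under assumptions of our own: each of n desynchronized flows has a sawtooth window sampled at a random phase, modeled as uniform on [w/2, w], with w chosen so the windows sum to a fixed pipe on average (each flow's share shrinks as 1/n). The spread of the aggregate window, which the buffer must absorb, then shrinks as $1/\sqrt{n}$: quadrupling n halves it. Function names are hypothetical.

```python
import math
import random
import statistics


def aggregate_window_std(n: int, pipe: float, samples: int, rng: random.Random) -> float:
    """Std. dev. of the sum of n independent sawtooth windows (Monte Carlo)."""
    w = 4.0 * pipe / (3.0 * n)  # mean of U[w/2, w] is 3w/4, so the sum averages `pipe`
    totals = [sum(rng.uniform(w / 2, w) for _ in range(n)) for _ in range(samples)]
    return statistics.pstdev(totals)


def predicted_std(n: int, pipe: float) -> float:
    """Closed form sqrt(n) * (w/2) / sqrt(12); since w ~ 1/n this is ~ 1/sqrt(n)."""
    w = 4.0 * pipe / (3.0 * n)
    return math.sqrt(n) * (w / 2) / math.sqrt(12)


rng = random.Random(1)
for n in (25, 100, 400):
    sim = aggregate_window_std(n, 10_000, 2_000, rng)
    print(f"n = {n:3d}: simulated {sim:6.1f} pkts, predicted {predicted_std(n, 10_000):6.1f} pkts")
```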
Required Buffer Size: [Figure: simulation results.]
Summary: Flows in the core are desynchronized. For desynchronized flows, routers need only buffers of $B = \frac{2T \times C}{\sqrt{n}}$.
Outline of this Talk: The “Rule-of-Thumb” on Buffer Sizing is incorrect. Real Buffer Requirements in case of Congestion. Real Buffer Requirements without Congestion: correct buffer requirements for an over-provisioned network; result: even smaller buffers. Experimental results from real Networks.
Short Flows: So far we have assumed a congested router with long flows in congestion-avoidance mode. What about flows in slow start? Do buffer requirements differ? Answer: yes, however: the required buffer in such cases is independent of line speed and RTT (the same for 1 Mbit/s or 40 Gbit/s); in mixes of flows, long flows drive buffer requirements; the short-flow result is relevant for uncongested routers.
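A Single, Short-Lived TCP Flow: flow length 62 packets, RTT ~140 ms. [Figure: packets sent per RTT in slow start, bursts of 2, 4, 8, 16 and 32 packets between the SYN and the received FIN ACK; the total span is the Flow Completion Time (FCT).]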
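The bursts in this example follow from slow start doubling the window every RTT. A sketch assuming no losses, an initial window of 2 packets, and no receiver-window limit (all assumptions; the function name is hypothetical): a 62-packet flow is sent as one burst per RTT, and the burst sizes depend only on the flow length, not on line speed or RTT.

```python
def slow_start_bursts(flow_pkts: int, initial_window: int = 2) -> list[int]:
    """Burst sizes, one per RTT, for a loss-free flow that stays in slow start."""
    bursts = []
    window = initial_window
    remaining = flow_pkts
    while remaining > 0:
        burst = min(window, remaining)  # the last burst may be partial
        bursts.append(burst)
        remaining -= burst
        window *= 2                     # slow start doubles the window each RTT
    return bursts


bursts = slow_start_bursts(62)
print(bursts)  # [2, 4, 8, 16, 32]
print(f"{len(bursts)} RTTs of data, about {len(bursts) * 0.140:.2f} s at RTT = 140 ms")
```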
Average Queue Length (S is the burst-size distribution of flows)
Queue Distribution: We derived closed-form estimates of the queue distribution using effective bandwidth, which gives a very good closed-form approximation. Buffer requirements for short flows are small and independent of line speed and RTT. In mixes of flows, long flows dominate buffer requirements.
Outline of this Talk: The “Rule-of-Thumb” on Buffer Sizing is incorrect. Real Buffer Requirements in case of Congestion. Real Buffer Requirements without Congestion. Results from Real Networks: lab results with a physical router; experiments on production networks with real traffic.
Experimental Evaluation Overview: Simulation with ns2: over 10,000 simulations that cover a range of settings; simulation time 30 s to 5 minutes; bandwidth 10 Mb/s to 1 Gb/s; latency 20 ms to 250 ms. Physical router: Cisco GSR with OC3 line card, in collaboration with the University of Wisconsin. Experimental results presented here: long flows, utilization; mixes of flows, Flow Completion Time (FCT); mixes of flows, heavy-tailed flow distribution; short flows, queue distribution.
Long Flows - Utilization (I): Small buffers are sufficient. [Figure: link utilization on an OC3 line with ~100 ms RTT; labeled values 99.9%, 99.5% and 98.0%, with a 2× marker.]
Long Flows – Utilization (II): Model vs. ns2 vs. Physical Router (GSR 12000, OC3 Line Card). Table columns: TCP Flows | Router Buffer (multiple of 2T×C/√n, Pkts, RAM) | Link Utilization (Model, Sim, Exp). 100 flows: router buffer 0.5× (64 pkts, 1 Mb), 1× (129 pkts, 2 Mb), 2× (258 pkts, 4 Mb), 3× (387 pkts, 8 Mb); link utilization entries 96.9%, 99.9%, 100%, 94.7%, 99.3%, 99.8%, 94.9%, 98.1%, 99.7%. 400 flows: router buffer entries 32, 128 and 192 pkts, 512 kb; link utilization entries 99.2%, 99.5%.
Short Flows – Queue Distribution: Model vs. Physical Router, OC3 Line Card
Experiments with Live Traffic (I): Stanford University gateway, the link from the Internet to student dormitories. Estimated 400 concurrent flows, 25 Mb/s, on a 7200 VXR (shared-memory router). Table columns: TCP Flows | Router Buffer (Pkts) | Link Utilization (Model, Exp). 400 flows; router buffer 0.8× (46 pkts), 1.2× (65 pkts), 1.5× (85 pkts), >>2× (500 pkts); link utilization entries 95.9%, 99.5%, 99.9%, 100%, 97.4%, 97.6%, 98.5%. Thanks to Sunia Yang, Wayne Sung and the Stanford Backbone Team.
Experiment with Live Traffic (II): Internet2 link, Indianapolis to Kansas City. Link setup: 10 Gb/s link, T640; default buffer ~1000 ms; flows of 1 Gb/s; loss requirement < $10^{-8}$. Experiment: reduced buffer to 10 ms (1%), nothing happened; reduced buffer to 5 ms (0.5%), nothing happened; next: buffer of 2 ms (0.2%). Experiment ongoing. Thanks to Stanislav Shalunov of Internet2 and Guy Almes (now at NSF).
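Buffer sizes quoted in milliseconds translate into memory as time × line rate. A sketch (hypothetical function name) converting the Internet2 settings into bytes, assuming the buffer drains at the full 10 Gb/s line rate:

```python
def buffer_bytes(buffer_s: float, line_rate_bps: float) -> float:
    """Bytes needed to hold `buffer_s` seconds of traffic at the given line rate."""
    return buffer_s * line_rate_bps / 8


for ms in (1000, 10, 5, 2):
    mbytes = buffer_bytes(ms / 1000, 10e9) / 1e6
    print(f"{ms:>4} ms at 10 Gb/s -> {mbytes:,.2f} MB")
```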
Outline: The Rule of Thumb. The buffer requirements for a congested router. Buffer requirements for short flows (slow-start). Experimental Verification. Conclusion.
Impact on Router Design: 10 Gb/s linecard with 200,000 × 56 kb/s flows. Rule-of-thumb: buffer = 2.5 Gbits, requiring external, slow DRAM. Becomes: buffer = 6 Mbits, which can use on-chip, fast SRAM. Completion time halved for short flows. 40 Gb/s linecard with 40,000 × 1 Mb/s flows. Rule-of-thumb: buffer = 10 Gbits. Becomes: buffer = 50 Mbits. For more details: “Sizing Router Buffers”, Guido Appenzeller, Isaac Keslassy and Nick McKeown, to appear at SIGCOMM 2004.
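The linecard numbers follow from $B = \frac{2T \times C}{\sqrt{n}}$ with 2T = 250 ms (the rule-of-thumb default; an assumption for both linecards). A sketch with a hypothetical function name: 2.5 Gbit / √200,000 ≈ 5.6 Mbit (rounded to 6 Mbits above) and 10 Gbit / √40,000 = 50 Mbit.

```python
import math


def small_buffer_bits(rtt_s: float, capacity_bps: float, n_flows: int) -> float:
    """Buffer for n desynchronized long flows: 2T * C / sqrt(n), in bits."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)


for capacity_bps, flows in ((10e9, 200_000), (40e9, 40_000)):
    rule_of_thumb = 0.25 * capacity_bps          # 2T * C with 2T = 250 ms
    small = small_buffer_bits(0.25, capacity_bps, flows)
    print(f"{capacity_bps / 1e9:.0f} Gb/s, {flows:,} flows: "
          f"rule of thumb {rule_of_thumb / 1e9:.1f} Gbit, 2TC/sqrt(n) {small / 1e6:.1f} Mbit")
```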
Open Questions: Since buffers can be made much smaller than the rule-of-thumb, can we make all-optical buffers? How small can buffers be? What is the congestion control algorithm that minimizes the buffer size?