1
FAST TCP. Cheng Jin, David Wei, Steven Low. netlab.CALTECH.edu
2
Acknowledgments. Caltech: Bunn, Choe, Doyle, Hegde, Jayaraman, Newman, Ravot, Singh, X. Su, J. Wang, Xia. UCLA: Paganini, Z. Wang. CERN: Martin. SLAC: Cottrell. Internet2: Almes, Shalunov. MIT Haystack Observatory: Lapsley, Whitney. TeraGrid: Linda Winkler. Cisco: Aiken, Doraiswami, McGugan, Yip. Level(3): Fernes. LANL: Wu.
3
Outline: Motivation & approach; FAST architecture; Window control algorithm; Experimental evaluation. (Skipped: theoretical foundation.)
4
Performance at large windows. ns-2 simulation (J. Wang, Caltech, June 02): capacity = 155 Mbps, 622 Mbps, 2.5 Gbps, 5 Gbps, 10 Gbps; 100 ms round-trip latency; 100 flows. Experiment (C. Jin, D. Wei, S. Ravot, et al., Caltech, Nov 02) on the DataTAG network, CERN (Geneva) - StarLight (Chicago) - SLAC/Level3 (Sunnyvale): capacity = 1 Gbps; 180 ms round-trip latency; 1 flow. Average utilization: Linux TCP 19% (txq=100), 27% (txq=10000); FAST 95%.
5
Congestion control: source rates x_i(t) respond to congestion measures p_l(t) fed back from the links. Example congestion measures p_l(t): loss probability (Reno), queueing delay (Vegas).
6
TCP/AQM. Congestion control is a distributed asynchronous algorithm to share bandwidth. It has two components: TCP adapts the sending rate (window) x_i(t) to congestion, and AQM adjusts and feeds back the congestion information p_l(t). Together they form a distributed feedback control system whose equilibrium and stability depend on both the TCP and AQM algorithms, and on delay, capacity, routing, and the number of connections. Examples: TCP: Reno, Vegas; AQM: DropTail, RED, REM/PI, AVQ.
7
Difficulties at large windows. Equilibrium problem: at the packet level, additive increase is too slow and multiplicative decrease too drastic; at the flow level, the required loss probability is too small. Dynamic problem: at the packet level, the window must oscillate on a binary (loss) signal; at the flow level, the control is unstable at large windows.
8
Packet & flow level (Reno TCP). Packet level: on ACK, W ← W + 1/W; on loss, W ← W - 0.5 W. Flow level: these rules determine the equilibrium (average window in pkts, given by the Mathis formula) and the dynamics.
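For reference, the equilibrium these rules imply (the Mathis formula the slide points to); this is a reconstruction from the AIMD(1, 0.5) rules, not text recovered from the slide:

\[
\bar W \;\approx\; \sqrt{\tfrac{3}{2p}} \;\approx\; \frac{1.22}{\sqrt{p}}\ \text{pkts},
\qquad
x \;=\; \frac{\bar W}{T} \;\approx\; \frac{1.22}{T\sqrt{p}}\ \text{pkts/sec}
\]

where p is the loss probability and T the round-trip time.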
9
Reno TCP: the packet level was designed and implemented first; the flow level was understood afterwards. Flow-level dynamics determines equilibrium (performance, fairness) and stability. The approach here: design the flow-level equilibrium and stability, then implement the flow-level goals at the packet level.
10
Reno TCP: the packet level was designed and implemented first; the flow level was understood afterwards. Flow-level dynamics determines equilibrium (performance, fairness) and stability. The packet-level designs of FAST, HSTCP, and STCP are guided by flow-level properties.
11
Packet-level update rules. Reno, AIMD(1, 0.5): on ACK, W ← W + 1/W; on loss, W ← W - 0.5 W. HSTCP, AIMD(a(w), b(w)): on ACK, W ← W + a(w)/W; on loss, W ← W - b(w) W. STCP, MIMD(a, b): on ACK, W ← W + 0.01; on loss, W ← W - 0.125 W. FAST: a periodic, equation-based update (described later).
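A minimal sketch of the per-ACK / per-loss rules listed above, restated as code; the function names are illustrative, and the HSTCP functions a(w), b(w) are table-driven in the real protocol and are simply passed in here:

```python
# Hedged sketch of the packet-level window updates on the slide.
# w is the congestion window in packets.

def reno_update(w, loss):          # Reno: AIMD(1, 0.5)
    return w - 0.5 * w if loss else w + 1.0 / w

def hstcp_update(w, loss, a, b):   # HSTCP: AIMD(a(w), b(w)); a, b supplied by the caller
    return w - b * w if loss else w + a / w

def stcp_update(w, loss):          # STCP: MIMD with a = 0.01, b = 0.125
    return w - 0.125 * w if loss else w + 0.01

# FAST does not adjust per ACK or per loss; it recomputes the whole
# window periodically from RTT measurements (see the window control
# algorithm later in the deck).
```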
12
Flow level: Reno, HSTCP, STCP, and FAST have a similar form of flow-level equilibrium: a Mathis-formula-style throughput in pkts/sec with constant 1.225 (Reno), 0.120 (HSTCP), or 0.075 (STCP).
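A hedged reconstruction of the equilibria behind these constants, assuming the response-function forms given in the companion Infocom 2004 paper (the exponents are not in the extracted slide text):

\[
x_i \;=\; \frac{\alpha_i}{T_i\, p_i^{k_i}}\ \text{pkts/sec},
\qquad
(\alpha_i, k_i) \;=\; (1.225,\ 0.5)\ \text{Reno},\quad (0.120,\ 0.835)\ \text{HSTCP},\quad (0.075,\ 1)\ \text{STCP}
\]

where T_i is flow i's round-trip time and p_i its loss probability.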
13
Flow level: Reno, HSTCP, STCP, and FAST differ in their gain and utility function U_i, which determine equilibrium and stability, and in their congestion measure p_i: loss probability (Reno, HSTCP, STCP) versus queueing delay (Vegas, FAST). Yet they share a common flow-level dynamics: window adjustment = control gain × flow-level goal.
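A hedged sketch of that common structure, in the notation of the companion paper (symbols assumed, not recovered from the slide):

\[
\dot x_i(t) \;=\; \kappa_i(t)\left(1 - \frac{p_i(t)}{u_i(t)}\right)
\]

where \kappa_i(t) is the control gain, p_i(t) the congestion measure, and u_i(t) = U_i'(x_i(t)) the marginal utility (the flow-level goal); equilibrium is reached when p_i = u_i.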
14
Implementation strategy. The common flow-level dynamics is: window adjustment = control gain × flow-level goal, i.e. a small adjustment when close to the target and a large one when far away. This requires estimating how far the current state is from the target, but it is scalable. The alternative, a window adjustment independent of p_i that depends only on the current window, is difficult to scale.
15
Outline: Motivation & approach; FAST architecture; Window control algorithm; Experimental evaluation. (Skipped: theoretical foundation.)
16
Architecture: window control operates at the RTT timescale; loss recovery operates at a sub-RTT (<RTT) timescale.
17
Architecture: each component is designed independently and can be upgraded asynchronously.
18
Architecture: each component is designed independently and can be upgraded asynchronously. The window control component is the focus of what follows.
19
FAST TCP basic idea: use queueing delay as the congestion measure. Delay provides finer-grained congestion information, scales correctly with network capacity, and allows operation with low queueing delay. (Figure: congestion window versus queueing delay, FAST compared with loss-based TCP.)
20
Window control algorithm: full utilization regardless of the bandwidth-delay product; globally stable with exponential convergence; fairness: weighted proportional fairness, with a per-flow weight parameter.
21
Window control algorithm: each update drives the measured backlog (packets the flow keeps buffered in the network) toward the target backlog.
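A minimal sketch of the periodic FAST window update as published in the companion Infocom 2004 paper; the function name and the default gamma value here are illustrative:

```python
def fast_window_update(w, base_rtt, avg_rtt, alpha, gamma=0.5):
    """One periodic FAST window update (names and default gamma are illustrative).

    w        -- current congestion window, in packets
    base_rtt -- minimum observed RTT (propagation delay estimate)
    avg_rtt  -- current average RTT measurement
    alpha    -- target backlog: packets the flow aims to keep buffered
    gamma    -- smoothing gain in (0, 1]
    """
    # Window that would leave exactly alpha packets queued at the current delay.
    target = (base_rtt / avg_rtt) * w + alpha
    # Move a fraction gamma of the way toward that target...
    new_w = (1.0 - gamma) * w + gamma * target
    # ...but never more than double the window in one update.
    return min(2.0 * w, new_w)
```

At equilibrium the measured backlog w * (1 - base_rtt/avg_rtt) settles at alpha, which is the target backlog referred to on this slide.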
22
Outline: Motivation & approach; FAST architecture; Window control algorithm; Experimental evaluation (Abilene-HENP network, Haystack Observatory, DummyNet).
23
Abilene test, OC48/OC192 (Yang Xia, Harvey Newman, Caltech). Periodic losses every 10 minutes.
24
(Yang Xia, Harvey Newman, Caltech) Periodic losses every 10 minutes.
25
(Yang Xia, Harvey Newman, Caltech) Periodic losses every 10 minutes. FAST backs off to make room for Reno.
26
“Ultrascale” protocol development: FAST TCP. FAST TCP is based on TCP Vegas; it uses end-to-end delay and loss to dynamically adjust the congestion window and defines an explicit equilibrium. Experiment (Yang Xia, Caltech): capacity = OC-192, 9.5 Gbps; 264 ms round-trip latency; 1 flow. Bandwidth use across the four protocols compared (Linux TCP, Westwood+, BIC TCP, FAST): 30%, 40%, 50%, and 79%, with FAST reaching 79%.
27
Haystack Experiments Lapsley, MIT Haystack
28
Haystack, 1 flow (Atlanta -> Japan). Iperf was used to generate traffic; the sender is a 2.6 GHz Xeon. The window was constant; burstiness in the rate is due to host processing and ACK spacing. (Lapsley, MIT Haystack)
29
Haystack – 2 Flows from 1 machine (Atlanta -> Japan) Lapsley, MIT Haystack
30
Linux loss recovery. After a timeout, all outstanding packets are marked as lost. 1. SACKs reduce the number of packets treated as lost. 2. Lost packets are retransmitted slowly because cwnd is capped at 1 (a bug).
31
DummyNet experiments: experiments using an emulated network, with an 800 Mbps bottleneck emulated in DummyNet. Sender PC: dual Xeon 2.6 GHz, 2 GB, Intel GbE, Linux 2.4.22. DummyNet PC: dual Xeon 3.06 GHz, 2 GB, FreeBSD 5.1 (800 Mbps bottleneck). Receiver PC: dual Xeon 2.6 GHz, 2 GB, Intel GbE, Linux 2.4.22.
32
Dynamic sharing, 3 flows: FAST vs Linux TCP. Dynamic sharing on DummyNet: capacity = 800 Mbps, delay = 120 ms, 3 flows, iperf throughput, Linux 2.4.x (HSTCP implementation from UCL).
33
Dynamic sharing, 3 flows: FAST, Linux TCP, HSTCP, and BIC compared; FAST shows steady throughput.
34
Throughput, loss, and queue traces over 30 minutes for FAST, Linux TCP, STCP, and HSTCP. Dynamic sharing on DummyNet: capacity = 800 Mbps, delay = 120 ms, 14 flows, iperf throughput, Linux 2.4.x (HSTCP implementation from UCL).
35
Throughput, loss, and queue traces over 30 minutes for FAST, Linux TCP, HSTCP, and BIC. FAST keeps the queue small, leaving room for mice!
36
Average queue vs buffer size. DummyNet: capacity = 800 Mbps, delay = 200 ms, 1 flow; buffer size: 50, …, 8000 pkts. (S. Hegde, B. Wydrowski, et al., Caltech)
37
Is a large queue necessary for high throughput?
38
FAST TCP: motivation, architecture, algorithms, performance. IEEE Infocom, March 2004. Release: April 2004. Source freely available for any non-profit use: netlab.caltech.edu/FAST
39
Aggregate throughput versus ideal performance. DummyNet: capacity = 800 Mbps; delay = 50-200 ms; #flows = 1-14; 29 experiments.
40
Aggregate throughput: small-window (800 pkts) and large-window (8000 pkts) cases. DummyNet: capacity = 800 Mbps; delay = 50-200 ms; #flows = 1-14; 29 experiments.
41
Fairness (Jain’s index): HSTCP ~ Reno. DummyNet: capacity = 800 Mbps; delay = 50-200 ms; #flows = 1-14; 29 experiments.
42
Stability: stable in diverse scenarios. DummyNet: capacity = 800 Mbps; delay = 50-200 ms; #flows = 1-14; 29 experiments.
43
FAST TCP: motivation, architecture, algorithms, performance. IEEE Infocom, March 2004. Release: April 2004. Source freely available for any non-profit use: netlab.caltech.edu/FAST
44
BACKUP Slides
45
IP rights: Caltech owns IP rights applicable more broadly than TCP, to leave all options open. The IP will be freely available if FAST TCP becomes an IETF standard. Code is available on the FAST website for any non-commercial use.
46
WAN in Lab Caltech: John Doyle, Raj Jayaraman, George Lee, Steven Low (PI), Harvey Newman, Demetri Psaltis, Xun Su, Yang Xia Cisco: Bob Aiken, Vijay Doraiswami, Chris McGugan, Steven Yip netlab.caltech.edu NSF
47
Key personnel. Caltech: Steven Low (CS/EE), Harvey Newman (Physics), John Doyle (EE/CDS), Demetri Psaltis (EE), Raj Jayaraman (CS), Xun Su (Physics), Yang Xia (Physics), George Lee (CS), 2 grad students, 3 summer students. Cisco: Bob Aiken, Vijay Doraiswami, Chris McGugan, Steven Yip, and Cisco engineers.
48
Spectrum of tools, arranged by abstraction vs cost: math (Mathis formula, optimization, control theory, nonlinear and stochastic models), simulation (NS, SSFNet, QualNet, JavaSim), emulation (DummyNet, EmuLab, ModelNet, WAIL), live networks (PlanetLab, Abilene, NLR, DataTAG, CENIC, WAIL, etc.), and WAN in Lab. We use them all.
49
Spectrum of tools: math, simulation, emulation, live networks, and WAN in Lab compared along distance, speed, realism, traffic, configurability, monitoring, and cost. Monitoring is critical in development, e.g. Web100.
50
Goal: a state-of-the-art hybrid WAN. High speed and large distance (2.5G to 10G, 50-200 ms). Wireless devices connected by an optical core. Controlled & repeatable experiments. Reconfigurable & evolvable. Built-in monitoring capability.
51
WAN in Lab, 5-year plan: 6 Cisco ONS 15454, 4 routers, tens of servers, wireless devices, 800 km fiber, ~100 ms RTT. (V. Doraiswami, Cisco; R. Jayaraman, Caltech)
52
WAN in Lab, year-1 plan: 3 Cisco ONS 15454, 2 routers, tens of servers, wireless devices. (V. Doraiswami, Cisco; R. Jayaraman, Caltech)
53
Hybrid network scenarios: ad hoc networks, cellular networks, sensor networks. How does the optical core support wireless edges? (X. Su, Caltech)
54
Experiments: transport & network layer (TCP, AQM, TCP/IP interaction); wireless hybrid networking (wireless media delivery, fixed wireless access, sensor networks); optical control plane; grid computing (UltraLight).
55
WAN in Lab unique capabilities. Capacity: 2.5-10 Gbps. Delay: 0-100 ms round trip, up to 0-400 ms. Configurable & evolvable (topology, rate, delays, routing), so always at the cutting edge. Flexible, active debugging with passive monitoring and AQM. Integral part of research & academic networks: transition from theory to implementation, demonstration, and deployment, and from lab to marketplace. Global resource: part of the global infrastructure of UltraLight, led by Newman. Experiment connectivity: Caltech research & production networks, CalREN-2/Abilene, StarLight (Chicago), SURFNet (Amsterdam), CERN (Geneva), over multi-Gbps links with 50-200 ms delay.
56
Network debugging. Performance problems in a real network will be missed by simulation, might be missed by emulation, and are hard to debug on a live network. WAN in Lab offers passive monitoring inside the network, and active debugging is possible.
57
Passive monitoring (D. Wei, Caltech). A fiber splitter feeds a DAG capture card, which timestamps packet headers (GPS-synchronized) and stores them to RAID, with no overhead on the system under test. Full information can be captured at OC48 (the University of Waikato's DAG card captures at OC48 speed), with filtering if necessary. Required disk speed = 2.5 Gbps × 40/1500 ≈ 66 Mbps (40-byte headers out of 1500-byte packets). Monitors are synchronized by GPS or cheaper alternatives; data is stored for offline analysis.
58
Passive monitoring (D. Wei, Caltech): fiber splitter, DAG card, GPS-synchronized header timestamps, RAID storage; monitors deployed alongside servers and routers, complemented by Web100 and MonALISA.
59
UltraLight testbed UltraLight team (Newman)
60
Status. Hardware: optical transport design finalized; IP infrastructure design finalized (almost); wireless infrastructure design finalized; price negotiation, ordering, and delivery in summer 04. Software: passive monitoring (summer student); management software 2005-. Physical lab: renovation to be completed by summer 04.
61
Timeline, 2003-2007: NSF funds 10/03; ARO funds 5/04; hardware design, physical building, and fund raising through 2004; usable testbed 12/04 (monitoring, traffic generation, connected to UltraLight); useful testbed 12/05; expansion, support, and management thereafter.
62
Lab layout: WAN in Lab / Net Lab within the CS Dept, Jorgensen Laboratory. (G. Lee, R. Jayaraman, E. Nixon, Caltech)
63
Summary. A testbed driven by the research agenda of a rich and strong networking effort, with an integrated approach: theory + implementation + experiments. "A network that can break." An integral part of real testbeds and of the global infrastructure of UltraLight, led by Harvey Newman (Caltech). Integrated monitoring & measurement facility: fiber-splitter passive monitors, MonALISA.