1 FAST TCP Cheng Jin, David Wei, Steven Low (netlab.CALTECH.edu)

2 Acknowledgments  Caltech: Bunn, Choe, Doyle, Hegde, Jayaraman, Newman, Ravot, Singh, X. Su, J. Wang, Xia  UCLA: Paganini, Z. Wang  CERN: Martin  SLAC: Cottrell  Internet2: Almes, Shalunov  MIT Haystack Observatory: Lapsley, Whitney  TeraGrid: Linda Winkler  Cisco: Aiken, Doraiswami, McGugan, Yip  Level(3): Fernes  LANL: Wu

3 Outline  Motivation & approach  FAST architecture  Window control algorithm  Experimental evaluation skip: theoretical foundation

4 Performance at large windows
 ns-2 simulation (J. Wang, Caltech, June 02): capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps; 100 ms round trip latency; 100 flows
 DataTAG Network experiment (C. Jin, D. Wei, S. Ravot, et al., Caltech, Nov 02): CERN (Geneva) – StarLight (Chicago) – SLAC/Level3 (Sunnyvale); capacity = 1Gbps; 180 ms round trip latency; 1 flow; txq=100
[figure: average utilization of Linux TCP vs FAST at 1G and 10Gbps; values shown: 19%, 27% (txq=100), 95% (txq=10000)]

5 Congestion control [diagram: sources with rates x_i(t), links with congestion measures p_l(t)] Example congestion measure p_l(t): loss (Reno), queueing delay (Vegas)

6 TCP/AQM  Congestion control is a distributed asynchronous algorithm to share bandwidth  It has two components: TCP adapts the sending rate (window) x_i(t) to congestion; AQM adjusts & feeds back congestion information p_l(t)  They form a distributed feedback control system: equilibrium & stability depend on both TCP and AQM, and on delay, capacity, routing, #connections  Examples: TCP: Reno, Vegas; AQM: DropTail, RED, REM/PI, AVQ
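A toy sketch of this feedback loop (the price-update form, the utility choice, and all parameters are assumptions for illustration, not FAST's or any AQM's actual code): one source picks its rate from the congestion price, and one link integrates its excess arrival rate into that price; the pair settles at full utilization.

```c
/* Toy illustration of the TCP/AQM feedback loop (hypothetical parameters):
 * the link integrates excess rate into a congestion "price" p, and the
 * source chooses its rate from the price, here x = alpha / p (a
 * Vegas/FAST-like utility).  The pair converges to x ~= c. */
#include <stdio.h>

int main(void) {
    double c = 100.0;      /* link capacity (pkts per tick), assumed  */
    double alpha = 50.0;   /* source parameter, assumed               */
    double gamma = 0.001;  /* price step size, assumed                */
    double p = 1.0;        /* initial congestion price                */
    for (int t = 0; t < 20000; t++) {
        double x = alpha / p;      /* TCP: rate from congestion price */
        p += gamma * (x - c);      /* AQM: integrate excess rate      */
        if (p < 1e-6) p = 1e-6;    /* keep the price positive         */
    }
    printf("equilibrium rate ~= %.1f, price ~= %.3f\n", alpha / p, p);
    return 0;
}
```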

7 Difficulties at large window  Equilibrium problem: packet level: AI too slow, MD too drastic; flow level: required loss probability too small  Dynamic problem: packet level: must oscillate on a binary signal; flow level: unstable at large window

8 Packet & flow level (Reno TCP)  Packet level: ACK: W ← W + 1/W; Loss: W ← W − 0.5W  Flow level: equilibrium and dynamics; equilibrium throughput x ≈ 1.225/(T·√p) pkts/sec (Mathis formula)
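To make the flow-level equilibrium concrete, and to see why slide 7 calls the required loss probability "too small", here is a small worked computation that inverts the Mathis formula; the 10 Gbps / 100 ms / 1500-byte numbers are illustrative, not taken from the deck.

```c
/* Loss probability Reno needs to sustain a given rate, from the Mathis
 * formula x = 1.225 / (T * sqrt(p))  =>  p = (1.225 / (x*T))^2.
 * Link parameters below are illustrative. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double rate_bps = 10e9;            /* target throughput: 10 Gbps     */
    double rtt = 0.1;                  /* round-trip time: 100 ms        */
    double pkt_bits = 1500.0 * 8.0;    /* packet size: 1500 bytes        */
    double x = rate_bps / pkt_bits;    /* target rate in packets/sec     */
    double p = pow(1.225 / (x * rtt), 2.0);
    printf("window = %.0f pkts, required loss probability = %.2e\n",
           x * rtt, p);               /* ~83,000 pkts and p on the order of 1e-10 */
    return 0;
}
```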

9 Reno TCP  Packet level: designed and implemented first  Flow level: understood afterwards  Flow level dynamics determine equilibrium (performance, fairness) and stability  Approach: design flow level equilibrium & stability, then implement flow level goals at the packet level

10 Reno TCP  Packet level: designed and implemented first  Flow level: understood afterwards  Flow level dynamics determine equilibrium (performance, fairness) and stability  Packet level design of FAST, HSTCP, STCP guided by flow level properties

11 Packet level  Reno AIMD(1, 0.5): ACK: W ← W + 1/W; Loss: W ← W − 0.5W  HSTCP AIMD(a(w), b(w)): ACK: W ← W + a(w)/W; Loss: W ← W − b(w)·W  STCP MIMD(a, b): ACK: W ← W + 0.01; Loss: W ← W − 0.125W  FAST: delay-based update (see slide 21)
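These per-ACK / per-loss rules map almost one-to-one into code. A schematic sketch only (not kernel code); hstcp_a() and hstcp_b() are hypothetical stubs standing in for HSTCP's window-dependent increase/decrease tables (RFC 3649), returning the Reno endpoints as placeholders.

```c
#include <stdio.h>

static double hstcp_a(double w) { (void)w; return 1.0; }  /* placeholder for a(w) */
static double hstcp_b(double w) { (void)w; return 0.5; }  /* placeholder for b(w) */

static double reno_ack(double w)   { return w + 1.0 / w; }        /* AIMD(1, 0.5)      */
static double reno_loss(double w)  { return w - 0.5 * w; }
static double hstcp_ack(double w)  { return w + hstcp_a(w) / w; } /* AIMD(a(w), b(w))  */
static double hstcp_loss(double w) { return w - hstcp_b(w) * w; }
static double stcp_ack(double w)   { return w + 0.01; }           /* MIMD(0.01, 0.125) */
static double stcp_loss(double w)  { return w - 0.125 * w; }

int main(void) {
    double w = 1000.0;   /* example congestion window in packets */
    printf("Reno:  ack -> %.3f, loss -> %.1f\n", reno_ack(w), reno_loss(w));
    printf("HSTCP: ack -> %.3f, loss -> %.1f\n", hstcp_ack(w), hstcp_loss(w));
    printf("STCP:  ack -> %.3f, loss -> %.1f\n", stcp_ack(w), stcp_loss(w));
    return 0;
}
```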

12 Flow level: Reno, HSTCP, STCP, FAST  Similar flow level equilibrium throughput (pkts/sec, generalized Mathis formula) with α = 1.225 (Reno), 0.120 (HSTCP), 0.075 (STCP)

13 Flow level: Reno, HSTCP, STCP, FAST  Different gain κ_i and utility U_i: they determine equilibrium and stability  Different congestion measure p_i: loss probability (Reno, HSTCP, STCP); queueing delay (Vegas, FAST)  Common flow level dynamics: window adjustment = (control gain) × (flow level goal)

14 Implementation strategy  Common flow level dynamics: window adjustment = (control gain) × (flow level goal)  Small adjustment when close to the target, large when far away: need to estimate how far the current state is from the target; scalable  Window adjustment independent of p_i: depends only on the current window; difficult to scale

15 Outline  Motivation & approach  FAST architecture  Window control algorithm  Experimental evaluation skip: theoretical foundation

16 Architecture [diagram: FAST components; window control at the RTT timescale, loss recovery at <RTT timescale]

17 Architecture Each component  designed independently  upgraded asynchronously

18 Architecture Each component  designed independently  upgraded asynchronously Window Control

19 FAST TCP basic idea: use delay as the congestion measure  Delay provides finer congestion info  Delay scales correctly with network capacity  Can operate with low queueing delay [figure: congestion window and queueing delay, FAST vs loss-based TCP]
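To make "delay as congestion measure" concrete: queueing delay is the amount by which observed RTTs exceed the propagation delay. A minimal sketch, assuming a min-filter for the base RTT and an exponentially weighted average for the smoothed RTT (the weight and samples are assumptions, not FAST's exact estimator):

```c
/* Minimal delay-based congestion estimator (illustrative): track the
 * smallest RTT seen as the propagation delay and a smoothed average RTT;
 * their difference estimates the queueing delay. */
#include <stdio.h>

struct delay_est {
    double base_rtt;   /* smallest RTT observed (propagation delay) */
    double avg_rtt;    /* exponentially smoothed RTT                */
};

static void delay_est_init(struct delay_est *e) {
    e->base_rtt = 1e9;
    e->avg_rtt = 0.0;
}

static double delay_est_update(struct delay_est *e, double rtt_sample) {
    if (rtt_sample < e->base_rtt)
        e->base_rtt = rtt_sample;
    if (e->avg_rtt == 0.0)
        e->avg_rtt = rtt_sample;
    else
        e->avg_rtt = 0.875 * e->avg_rtt + 0.125 * rtt_sample; /* assumed weight */
    return e->avg_rtt - e->base_rtt;   /* estimated queueing delay */
}

int main(void) {
    struct delay_est e;
    delay_est_init(&e);
    double samples[] = {0.100, 0.102, 0.110, 0.125, 0.130}; /* seconds, made up */
    for (int i = 0; i < 5; i++)
        printf("queueing delay ~ %.3f s\n", delay_est_update(&e, samples[i]));
    return 0;
}
```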

20 Window control algorithm  Full utilization regardless of bandwidth-delay product  Globally stable, exponential convergence  Fairness: weighted proportional fairness, parameter α

21 Window control algorithm [equation: window update annotated with target backlog (α) and measured backlog]
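A sketch of the window update behind these labels, following the per-update rule published in the Infocom 2004 paper cited later in the deck; the γ and α values, the fixed RTT, and the starting window are illustrative, not the slides'.

```c
/* FAST window update per the Infocom 2004 paper this deck cites:
 *   w <- min( 2w, (1 - gamma)*w + gamma*((baseRTT/RTT)*w + alpha) )
 * alpha is the target backlog (packets the flow aims to keep queued);
 * w*(RTT - baseRTT)/RTT is the measured backlog. */
#include <stdio.h>

static double fast_update(double w, double base_rtt, double rtt,
                          double alpha, double gamma) {
    double next = (1.0 - gamma) * w + gamma * ((base_rtt / rtt) * w + alpha);
    return (next < 2.0 * w) ? next : 2.0 * w;   /* at most double per update */
}

int main(void) {
    double w = 100.0;         /* initial window (pkts), illustrative        */
    double base_rtt = 0.100;  /* propagation RTT: 100 ms, assumed           */
    double rtt = 0.120;       /* observed RTT held fixed here for clarity   */
    double alpha = 200.0, gamma = 0.5;   /* assumed values                  */
    for (int i = 0; i < 50; i++)
        w = fast_update(w, base_rtt, rtt, alpha, gamma);
    /* fixed point: measured backlog = alpha, i.e. w = alpha/(1 - baseRTT/RTT) */
    printf("window converges toward %.0f pkts (got %.0f)\n",
           alpha / (1.0 - base_rtt / rtt), w);
    return 0;
}
```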

22 Outline  Motivation & approach  FAST architecture  Window control algorithm  Experimental evaluation Abilene-HENP network Haystack Observatory DummyNet

23 Abilene Test: OC48, OC192 (Yang Xia, Harvey Newman, Caltech); periodic losses every 10 minutes

24 (Yang Xia, Harvey Newman, Caltech) Periodic losses every 10 minutes

25 (Yang Xia, Harvey Newman, Caltech) Periodic losses every 10 minutes; FAST backs off to make room for Reno

26 “Ultrascale” protocol development: FAST TCP  Based on TCP Vegas  Uses end-to-end delay and loss to dynamically adjust the congestion window  Defines an explicit equilibrium [figure: BW use of Linux TCP, Westwood+, BIC TCP, FAST; values shown: 30%, 40%, 50%, 79%; capacity = OC-192 9.5Gbps; 264 ms round trip latency; 1 flow] (Yang Xia, Caltech)

27 Haystack Experiments Lapsley, MIT Haystack

28 Haystack: 1 flow (Atlanta → Japan). Iperf used to generate traffic; sender is a Xeon 2.6 GHz. Window was constant; burstiness in rate is due to host processing and ACK spacing. Lapsley, MIT Haystack

29 Haystack: 2 flows from 1 machine (Atlanta → Japan). Lapsley, MIT Haystack

30 Linux loss recovery: on timeout, all outstanding packets are marked as lost. 1. SACKs reduce the number of packets marked lost. 2. Lost packets are retransmitted slowly because cwnd is capped at 1 (bug).

31 DummyNet Experiments  Experiments using an emulated network  800 Mbps emulated bottleneck in DummyNet
 Sender PC: dual Xeon 2.6 GHz, 2 GB, Intel GbE, Linux 2.4.22
 DummyNet PC: dual Xeon 3.06 GHz, 2 GB, FreeBSD 5.1, 800 Mbps bottleneck
 Receiver PC: dual Xeon 2.6 GHz, 2 GB, Intel GbE, Linux 2.4.22

32 Dynamic sharing: 3 flows [figure: FAST vs Linux] Dynamic sharing on Dummynet:  capacity = 800Mbps  delay = 120ms  3 flows  iperf throughput  Linux 2.4.x (HSTCP: UCL)

33 Dynamic sharing: 3 flows [figure: FAST, Linux, HSTCP, BIC; steady throughput]

34 Dynamic sharing on Dummynet [figure: throughput, loss, and queue over 30 min for FAST, Linux, STCP, HSTCP]  capacity = 800Mbps  delay = 120ms  14 flows  iperf throughput  Linux 2.4.x (HSTCP: UCL)

35 [figure: throughput, loss, and queue over 30 min for FAST, Linux, HSTCP, BIC] Room for mice!

36 Average queue vs buffer size. Dummynet:  capacity = 800Mbps  delay = 200ms  1 flow  buffer size: 50, …, 8000 pkts (S. Hegde, B. Wydrowski, et al., Caltech)

37 Is large queue necessary for high throughput?

38  FAST TCP: motivation, architecture, algorithms, performance. IEEE Infocom, March 2004  α-release: April 2004; source freely available for any non-profit use: netlab.caltech.edu/FAST

39 Aggregate throughput [figure: measured aggregate throughput vs ideal performance] Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

40 Aggregate throughput [figure: small window (800 pkts) vs large window (8000 pkts)] Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

41 Fairness [figure: Jain’s index; HSTCP ~ Reno] Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts
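For reference, Jain's index is (Σ x_i)² / (n · Σ x_i²), which equals 1 when all flows get equal throughput and approaches 1/n when one flow takes everything. A small sketch with made-up throughput numbers:

```c
/* Jain's fairness index: J = (sum x_i)^2 / (n * sum x_i^2).
 * Throughput numbers below are made up for illustration. */
#include <stdio.h>

static double jain_index(const double *x, int n) {
    double sum = 0.0, sum_sq = 0.0;
    for (int i = 0; i < n; i++) {
        sum += x[i];
        sum_sq += x[i] * x[i];
    }
    return (sum * sum) / (n * sum_sq);
}

int main(void) {
    double even[]   = {100.0, 100.0, 100.0, 100.0};  /* Mbps, illustrative */
    double skewed[] = {310.0, 50.0, 25.0, 15.0};
    printf("even: %.3f  skewed: %.3f\n",
           jain_index(even, 4), jain_index(skewed, 4));
    return 0;
}
```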

42 Stability Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts stable in diverse scenarios

43  FAST TCP: motivation, architecture, algorithms, performance. IEEE Infocom, March 2004  α-release: April 2004; source freely available for any non-profit use: netlab.caltech.edu/FAST

44 BACKUP Slides

45 IP Rights  Caltech owns IP rights, applicable more broadly than TCP; leave all options open  IP freely available if FAST TCP becomes an IETF standard  Code available on the FAST website for any non-commercial use

46 WAN in Lab Caltech: John Doyle, Raj Jayaraman, George Lee, Steven Low (PI), Harvey Newman, Demetri Psaltis, Xun Su, Yang Xia Cisco: Bob Aiken, Vijay Doraiswami, Chris McGugan, Steven Yip netlab.caltech.edu NSF

47 Key Personnel  Steven Low, CS/EE  Harvey Newman, Physics  John Doyle, EE/CDS  Demetri Psaltis, EE Cisco  Bob Aiken  Vijay Doraiswami  Chris McGugan  Steven Yip  Raj Jayaraman, CS  Xun Su, Physics  Yang Xia, Physics  George Lee, CS  2 grad students  3 summer students  Cisco engineers

48 Spectrum of tools [chart: log(abstraction) vs log(cost) across math, simulation, emulation, and live networks; WAN in Lab sits between emulation and live networks]
 Math: Mathis formula, optimization, control theory, nonlinear model, stochastic model
 Simulation: NS, SSFNet, QualNet, JavaSim
 Emulation: DummyNet, EmuLab, ModelNet, WAIL
 Live networks: PlanetLab, Abilene, NLR, DataTAG, CENIC, WAIL, etc.
…we use them all

49 Spectrum of tools [table comparing math, simulation, emulation, live networks, and WAN in Lab on distance, speed, realism, traffic, configurability, monitoring, and cost] Monitoring is critical in development, e.g. Web100

50 Goal: state-of-the-art hybrid WAN  High speed, large distance: 2.5G → 10G, 50 – 200ms  Wireless devices connected by optical core  Controlled & repeatable experiments  Reconfigurable & evolvable  Built-in monitoring capability

51 WAN in Lab  5-year plan  6 Cisco ONS15454  4 routers  10s servers  Wireless devices  800km fiber  ~100ms RTT V. Doraiswami (Cisco) R. Jayaraman (Caltech)

52 WAN in Lab  Year-1 plan  3 Cisco ONS 15454  2 routers  10s servers  Wireless devices V. Doraiswami (Cisco) R. Jayaraman (Caltech)

53 Hybrid Network Scenarios:  Ad hoc network  Cellular network  Sensor network How does the optical core support wireless edges? X. Su (Caltech)

54 Experiments  Transport & network layer TCP, AQM, TCP/IP interaction  Wireless hybrid networking Wireless media delivery Fixed wireless access Sensor networks  Optical control plane  Grid computing UltraLight

55 Unique capabilities
 WAN in Lab: capacity 2.5 – 10 Gbps; delay 0 – 100 ms / 0 – 400 ms round trip
 Configurable & evolvable: topology, rate, delays, routing; always at the cutting edge
 Flexible, active debugging: passive monitoring, AQM
 Integral part of R&A networks: transition from theory to implementation, demonstration, deployment; transition from lab to marketplace
 Global resource: part of global infrastructure; UltraLight led by Newman
[diagram: WAN in Lab and Caltech research & production networks connected via Calren2/Abilene and StarLight (Chicago) to SURFNet (Amsterdam) and CERN (Geneva); multi-Gbps, 50-200ms delay experiments]

56 Network debugging  Performance problems in real network Simulation will miss Emulation might miss Live network hard to debug  WAN in Lab Passive monitoring inside network Active debugging possible

57 Passive monitoring [diagram: fiber splitter feeding a GPS-timestamped DAG monitor that writes packet headers to RAID]  No overhead on the system  Can capture full info at OC48: U of Waikato's DAG card captures at OC48 speed; can filter if necessary; required disk speed = 2.5Gbps × 40/1500 ≈ 66Mbps (storing 40-byte headers of 1500-byte packets)  Monitors synchronized by GPS or cheaper alternatives  Data stored for offline analysis D. Wei (Caltech)
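A quick check of the disk-speed arithmetic on this slide, using the OC48 rate and the 40-byte-header / 1500-byte-packet figures stated above:

```c
/* Disk bandwidth needed when storing only 40-byte headers of 1500-byte
 * packets arriving at ~2.5 Gbps (OC48), as on the slide. */
#include <stdio.h>

int main(void) {
    double link_bps = 2.5e9;                    /* OC48, approximate        */
    double pkt_bytes = 1500.0, hdr_bytes = 40.0;
    double disk_bps = link_bps * hdr_bytes / pkt_bytes;
    printf("required disk bandwidth ~ %.1f Mbps\n", disk_bps / 1e6);
    return 0;   /* prints ~66.7 Mbps, matching the slide's 66 Mbps figure */
}
```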

58 Passive monitoring [diagram: fiber splitter / DAG / GPS-timestamped monitor with RAID storage, plus server, router, and monitor instrumentation via Web100 and MonALISA] D. Wei (Caltech)

59 UltraLight testbed UltraLight team (Newman)

60 Status  Hardware: optical transport design finalized; IP infrastructure design finalized (almost); wireless infrastructure design finalized; price negotiation/ordering/delivery: summer 04  Software: passive monitoring (summer student); management software: 2005 onward  Physical lab: renovation to be completed by summer 04

61 Status [timeline 2003 – 2007: NSF funds 10/03; ARO funds 5/04; fund raising; hardware design; physical building; usable testbed 12/04 (monitoring, traffic generation, connected to UltraLight); useful testbed 12/05; then expansion, support, management]

62 WAN in Lab location [floor plan: Net Lab / WAN in Lab within the Jorgensen Lab, CS Dept] G. Lee, R. Jayaraman, E. Nixon (Caltech)

63 Summary  Testbed driven by research agenda Rich and strong networking effort Integrated approach: theory + implementation + experiments “A network that can break”  Integral part of real testbeds Part of global infrastructure UltraLight led by Harvey Newman (Caltech)  Integrated monitoring & measurement facility Fiber splitter passive monitors MonALISA

