FAST TCP: Motivation, Architecture, Algorithms, Performance. Bartek Wydrowski, Steven Low


FAST TCP: motivation, architecture, algorithms, performance. Bartek Wydrowski, Steven Low, netlab.CALTECH.edu. Presented: July 18, 2003: IETF Meeting, Vienna (Allison Mankin); July 31, 2003: ISI/USC, CA (Aaron Falk); Aug 5, 2003: Internet2 Joint Tech Meeting, University of Kansas, KS; Aug 15, 2003: Disney Digital Network Roundtable (plenary), CA (Howard Liu); June 8, 2004: Google (Urs Hoelzle, VP, Operations); July 11, 2004: HP Labs, Palo Alto, CA (Xiaoyun Zhu); July 12, 2004: IBM Research, Almaden, CA (Moidin Mohiudin); Dec 7, 2004: Cisco visit (Fred Baker, Graham Holmes, Chris McGugan, Chas Smith); Feb 13, 2004: Internet2 Joint Tech Meeting, Salt Lake City, Utah (Paul Love, James Williams)

Acks & Collaborators. Caltech: Bunn, Choe, Doyle, Hegde, Jin, Li, Low, Newman, Papadoupoulous, Ravot, Singh, Tang, J. Wang, Wei, Wydrowski, Xia; UCLA: Paganini, Z. Wang; StarLight: deFanti, Winkler; CERN: Martin; SLAC: Cottrell; PSC: Mathis; Internet2: Almes, Shalunov; Abilene GigaPoP's: GATech, NCSU, PSC, Seattle, Washington; Cisco: Aiken, Doraiswami, McGugan, Smith, Yip; Level(3): Fernes; LANL: Wu

Outline: background, motivation; FAST TCP: architecture and algorithms, experimental evaluations, loss recovery; MaxNet, SUPA FAST.

Performance at large windows. WAN experiment: capacity = 1Gbps, 180 ms round trip latency, 1 flow, on the DataTAG Network, CERN (Geneva) - StarLight (Chicago) - SLAC/Level3 (Sunnyvale), 10Gbps; Linux TCP reaches 19% and 27% average utilization (txq=100 and txq=10000 respectively) versus 95% for FAST (C. Jin, D. Wei, S. Ravot, et al., Caltech, Nov 02). ns-2 simulation: capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps; 100 ms round trip latency; 100 flows (J. Wang, Caltech, June 02).

Average Queue vs Buffer Size. Dummynet: capacity = 800Mbps, delay = 200ms, 1 flow, buffer size: 50, …, 8000 pkts (S. Hedge, B. Wydrowski, et al., Caltech).

Is a large queue necessary for high throughput?

Congestion control. Source rates xi(t); example congestion measures pl(t): loss (Reno), queueing delay (Vegas).

TCP/AQM. Congestion control is a distributed asynchronous algorithm to share bandwidth. It has two components: TCP adapts the sending rate (window) to congestion (Reno, Vegas); AQM adjusts & feeds back the congestion information pl(t) (DropTail, RED, REM/PI, AVQ). Together they form a distributed feedback control system, whose equilibrium & stability depend on both TCP and AQM, and on delay, capacity, routing and the number of connections.

Packet & flow level (Reno TCP). Packet level: ACK: W ← W + 1/W; Loss: W ← W − 0.5W. Flow level: equilibrium window in pkts given by the Mathis formula, plus flow-level dynamics.
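The equilibrium referred to here is the standard Mathis relation; a reconstruction of the slide's formula, with x_i the rate in pkts/sec, T_i the round-trip time and q_i the loss probability, is:

```latex
% Mathis formula: Reno flow-level equilibrium rate (pkts/sec)
x_i \;=\; \frac{1}{T_i}\sqrt{\frac{3}{2\,q_i}} \;\approx\; \frac{1.22}{T_i\,\sqrt{q_i}}
```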

Reno TCP. Packet level: designed and implemented first. Flow level: understood afterwards. Flow level dynamics determines equilibrium (performance, fairness) and stability. Approach: design flow level equilibrium & stability, then implement the flow level goals at the packet level.

Reno TCP. Packet level: designed and implemented first. Flow level: understood afterwards. Flow level dynamics determines equilibrium (performance, fairness) and stability. The packet level design of FAST, HSTCP and STCP is guided by flow level properties.

Packet level. Reno, AIMD(1, 0.5): ACK: W ← W + 1/W; Loss: W ← W − 0.5W. HSTCP, AIMD(a(w), b(w)): ACK: W ← W + a(w)/W; Loss: W ← W − b(w)W. STCP, MIMD(a, b): ACK: W ← W + 0.01; Loss: W ← W − 0.125W. FAST: delay-based window update (described below).
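A minimal sketch of these per-ACK and per-loss window updates written out as code for comparison; a(w) and b(w) for HSTCP are placeholders (real HSTCP derives them from a response-function table), and this is illustrative rather than the presenters' implementation:

```python
def on_ack(variant, w, a_w=0.01, b_w=0.125):
    """Window increase applied for each ACK received."""
    if variant == "reno":        # AIMD(1, 0.5)
        return w + 1.0 / w
    if variant == "hstcp":       # AIMD(a(w), b(w))
        return w + a_w / w
    if variant == "stcp":        # MIMD(a, b) with a = 0.01
        return w + 0.01
    raise ValueError(variant)

def on_loss(variant, w, b_w=0.125):
    """Window decrease applied on a loss event."""
    if variant == "reno":
        return w - 0.5 * w
    if variant == "hstcp":
        return w - b_w * w
    if variant == "stcp":        # b = 0.125
        return w - 0.125 * w
    raise ValueError(variant)
```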

Flow level: Reno, HSTCP, STCP, FAST. Similar flow level equilibrium rate (pkts/sec), with a = 1.225 (Reno), 0.120 (HSTCP), 0.075 (STCP).

Flow level: Reno, HSTCP, STCP, FAST. Common flow level dynamics: window adjustment = (control gain) × (flow level goal). Different gain κ and utility Ui: they determine equilibrium and stability. Different congestion measure pi: loss probability (Reno, HSTCP, STCP) or queueing delay (Vegas, FAST).

Implementation strategy. Common flow level dynamics: window adjustment = (control gain) × (flow level goal). One strategy: small adjustment when close to the target, large when far away; needs to estimate how far the current state is from the target; scalable. The other: window adjustment independent of pi, depending only on the current window; difficult to scale.

Difficulties at large window. Equilibrium problem: packet level: AI too slow, MD too drastic; flow level: required loss probability too small. Dynamic problem: packet level: must oscillate on a binary signal; flow level: unstable at large window.

Problem: no target. Reno, AIMD(1, 0.5): ACK: W ← W + 1/W; Loss: W ← W − 0.5W. HSTCP, AIMD(a(w), b(w)): ACK: W ← W + a(w)/W; Loss: W ← W − b(w)W. STCP, MIMD(1/100, 1/8): ACK: W ← W + 0.01; Loss: W ← W − 0.125W.

Solution: estimate target. FAST moves through Slow Start, FAST Convergence, Equilibrium and Loss Recovery states, and its update is scalable to any target window w*.

Difficulties at large window. Equilibrium problem: packet level: AI too slow, MD too drastic; flow level: required loss probability too small. Dynamic problem: packet level: must oscillate on a binary signal; flow level: unstable at large window.

Problem: binary signal TCP oscillation

Solution: multibit signal FAST stabilized

Stable: 20ms delay (window traces). ns-2 simulations: 50 identical FTP sources, single link, 9 pkts/ms, RED marking.

Stable: 20ms delay (window and queue traces). ns-2 simulations: 50 identical FTP sources, single link, 9 pkts/ms, RED marking.

Unstable: 200ms delay (window traces). ns-2 simulations: 50 identical FTP sources, single link, 9 pkts/ms, RED marking.

Unstable: 200ms delay (window and queue traces). ns-2 simulations: 50 identical FTP sources, single link, 9 pkts/ms, RED marking.

Flow level (in)stability is robust: with 30% noise, the 20ms case (average delay 16ms) remains stable and the 200ms case (average delay 208ms) remains unstable.

Difficulties at large window. Equilibrium problem: packet level: AI too slow, MD too drastic; flow level: required loss probability too small. Dynamic problem: packet level: must oscillate on a binary signal (use a multi-bit signal!); flow level: unstable at large window (stabilize the flow dynamics!).

Stability: Reno/RED. Feedback loop: TCP sources F1…FN and AQM links G1…GL, coupled through forward and backward delays Rf(s) and Rb'(s); signals x, y, q, p. Theorem (Low et al., Infocom'02): Reno/RED is locally stable under a condition involving small τ, small c, large N and, for RED, small ρ and large delay.

Stability: scalable control. Same feedback structure: sources F1…FN, links G1…GL, delays Rf(s), Rb'(s). Theorem (Paganini, Doyle, Low, CDC'01): provided R is full rank, the feedback loop is locally stable for arbitrary delay, capacity, load and topology.

Stability: FAST (same feedback structure). Application: stabilized TCP with current routers; queueing delay as the congestion measure has the right scaling; incremental deployment with ECN.

Outline: background, motivation; FAST TCP: architecture and algorithms, experimental evaluations, loss recovery; MaxNet, SUPA FAST.

Architecture: components operating below the RTT timescale, at the RTT timescale, and for loss recovery.

Architecture: each component is designed independently and can be upgraded asynchronously.

Architecture, window control: each component is designed independently and can be upgraded asynchronously.

Window control algorithm. Full utilization: regardless of bandwidth-delay product. Globally stable: exponential convergence. Fairness: weighted proportional fairness, parameter α.

Window control algorithm

Window control algorithm: the update compares the target backlog (α) against the measured backlog (the flow's packets queued in the network).

Window control algorithm. Theorem (Infocom'04, CDC'04, Infocom'05): the mapping from w(t) to w(t+1) is a contraction, giving global exponential convergence and full utilization after finite time. Utility function: ai log xi (weighted proportional fairness).
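For reference, the FAST window update these slides summarize (as published for FAST TCP) can be sketched as follows; alpha is the target per-flow backlog in packets, gamma in (0, 1] is a smoothing gain, and the specific default values below are assumptions:

```python
# Sketch of the FAST TCP per-RTT window update:
#   w <- min{ 2w, (1 - gamma) * w + gamma * (baseRTT/RTT * w + alpha) }
def fast_window_update(w, base_rtt, rtt, alpha=200.0, gamma=0.5):
    target = (base_rtt / rtt) * w + alpha   # window that leaves about alpha pkts queued
    w_next = (1.0 - gamma) * w + gamma * target
    return min(2.0 * w, w_next)             # never more than double per update
```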

Outline: background, motivation; FAST TCP: architecture and algorithms, experimental evaluations, loss recovery; MaxNet, SUPA FAST.

Dynamic sharing: 3 flows, FAST vs Linux. Dynamic sharing on Dummynet: capacity = 800Mbps, delay = 120ms, 3 flows, iperf throughput, Linux 2.4.x (HSTCP: UCL).

Dynamic sharing: 3 flows. Panels: FAST, Linux, HSTCP, BIC; note the steady throughput of FAST.

Dynamic sharing on Dummynet: capacity = 800Mbps, delay = 120ms, 14 flows, iperf throughput, Linux 2.4.x (HSTCP: UCL). Panels show queue, loss and 30-min throughput for FAST, Linux, HSTCP and STCP.

Queue, loss and 30-min throughput for FAST, Linux, HSTCP and BIC. Annotation: room for mice!

Aggregate throughput. Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts; small window = 800 pkts, large window = 8000 pkts.

Fairness. Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts.

Stability: stable in diverse scenarios. Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts.

Responsiveness. Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts.

I2LSR, SC2004 Bandwidth Challenge. Harvey Newman's group, Caltech. http://dnae.home.cern.ch/dnae/lsr4-nov04 On November 8, 2004, Caltech and CERN transferred 2,881 GBytes in one hour (6.86Gbps) between Geneva - US - Geneva (25,280 km) through the LHCnet/DataTag, Abilene and CENIC backbones (OC48/OC192 links), using 18 FAST TCP streams on a Linux 2.6.9 kernel with a 9000-byte MTU, corresponding to 174 petabit-meters per second.

Internet2 Abilene Weather Map (OC48/OC192 links). 7.1G path: GENV-PITS-LOSA-SNVA-STTL-DNVR-KSCY-HSTON-ATLA-WASH-NYCM-CHIN-GENV. Newman's group, Caltech.

“Ultrascale” protocol development: FAST TCP. Based on TCP Vegas; uses end-to-end delay and loss to dynamically adjust the congestion window; defines an explicit equilibrium. Experiment: capacity = OC-192 (9.5Gbps), 264 ms round trip latency, 1 flow. Bandwidth use, in the order shown: Linux TCP 30%, Westwood+ 40%, BIC TCP 50%, FAST 79% (Yang Xia, Caltech).

Periodic losses every 10 minutes: FAST backs off to make room for Reno (Yang Xia, Harvey Newman, Caltech).

Experiment by Yusung Kim, KAIST, Korea, Oct 2004. Dummynet: capacity = 622Mbps, delay = 200ms, router buffer size = 1 BDP (11,000 pkts), 1 flow, application: iperf. Protocols: BIC, FAST, HSTCP, STCP, Reno (Linux), CUBIC. http://netsrv.csc.ncsu.edu/yskim/single_traffic/curves/

Throughput and RTT (Yusung Kim, KAIST, Korea, 10/2004). All can achieve high throughput except Reno. FAST adds negligible queueing delay; loss-based control (BIC, HSTCP) (almost) fills the buffer, driving the RTT to 400ms, double the baseRTT, adding delay and reducing the ability to absorb bursts.

Queue and cwnd (Yusung Kim, KAIST, Korea, 10/2004). FAST needs a smaller buffer at both routers and hosts; loss-based control (BIC, HSTCP) was limited at the host in these experiments.

Outline: background, motivation; FAST TCP: architecture and algorithms, experimental evaluations, loss recovery; MaxNet, SUPA FAST.

Loss Recovery Section Overview. Linux & TCP loss recovery has problems, especially in non-congestion-loss environments. New loss architecture: determining packet loss & PIF (packets in flight); decoupled window control; testing in a high-loss environment; receiver window issues; forward retransmission; SACK processing optimization; reorder detection; testing in a small-buffer environment.

New Loss Recovery Architecture. A new architecture for loss recovery, motivated by new environments: high-loss wireless, 802.11, satellite; low loss but large BDP. The measure of path 'difficulty' should be extended to BDLP: Bandwidth x Delay x (1/(1-Loss)).

Periodic losses every 10 minutes (Yang Xia, Harvey Newman, Caltech).

Haystack, 1 flow (Atlanta -> Japan). iperf used to generate traffic; the sender is a 2.6 GHz Xeon. The window was constant: burstiness in rate is due to host processing and ACK spacing.

Haystack – 2 Flows from 1 machine (Atlanta -> Japan)

Linux Loss Recovery Problem. 1. All outstanding packets are marked as lost; SACKs reduce the set of lost packets. 2. Lost packets are retransmitted slowly because cwnd is capped at 1 (bug), leading to a timeout.

New Loss Recovery Architecture. Decouple congestion control from loss recovery: no rate halving, cwnd resets, etc. upon loss; the window is primarily controlled by delay. Efficient retransmit mechanism: Linux TCP does not account for PIF well when there are retransmissions; cwnd should limit PIF, not the write queue length. Accurately discriminate loss from reordering. Construct an accurate way of determining packet loss and PIF.

Loss Recovery. Efficient loss recovery requires the ability to accurately determine when a packet is lost and to retransmit it immediately: accurate RTT measurement to determine the timeout; accurate reordering detection; an efficient forward retransmission strategy; efficient CPU utilization to keep up with the work, especially SACK processing.

Loss Recovery: PIF model. A packet i is deemed lost at time t if: t - SENDTIMEi > RTT(t) + REORDERLAG(t). To implement this timeout mechanism we construct a "Transmission-Order-Queue" (TOQ), a queue of in-flight packets (sent, but not lost or acked). Tail = oldest sent packet; head = most recently sent.
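A sketch of how such a Transmission-Order-Queue could drive the loss check; the structure and field names are illustrative assumptions, not the Linux implementation:

```python
from collections import deque

class TOQ:
    """Queue of in-flight packets: tail (left) = oldest sent, head (right) = newest."""
    def __init__(self):
        self.q = deque()

    def on_send(self, seq, now):
        self.q.append((seq, now))          # record the send time of each packet

    def on_ack(self, acked_seqs):
        self.q = deque(p for p in self.q if p[0] not in acked_seqs)

    def detect_losses(self, now, rtt, reorder_lag):
        # A packet is deemed lost if now - send_time > RTT(t) + REORDERLAG(t).
        lost = []
        while self.q and now - self.q[0][1] > rtt + reorder_lag:
            lost.append(self.q.popleft()[0])
        return lost
```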

Loss Recovery Architecture

Loss Recovery: Window control. Window control is performed when the ACK for packet j is received: CWND = min(PIFrecv, PIFsent) scaled by a factor >= 1. Example from the figure: packet j = 13 is sent and its ACK received; the sender counts PIFsent = 12 packets in flight, while the receiver's ACKs imply PIFrecv = 10.

Loss Recovery: Test scenario. Sender PC: dual Xeon 2.6GHz, 2GB, Intel GbE, Linux 2.4.22. DummyNet PC: dual Xeon 3.06GHz, 2GB, FreeBSD 5.1; 1-800 Mbps, 0.0-0.3 loss, 40ms one-way delay. Receiver PC: dual Xeon 2.6GHz, 2GB, Intel GbE, Linux 2.4.22.

Forward Retransmission. Receiver reorder queue (figure): the highest seq# packet received, successive cwnd windows, and packets that have been lost several times and are holding up the next acknowledgement and the freeing of reorder-queue resources. The current Linux receiver window limits transmission speed with high loss and high BDP. Forward retransmission is needed to reduce the required reorder and write queue sizes.

~11 BDPs, or 88 MB!! (800Mbps @ 80ms)

800Mbps 0.3 Loss

10Mbps, 0.3 loss. BDP = 0.08 s x 833 pkt/s = 66 pkts ≈ 100 kB, versus ~600 kB observed.

Forward Retransmission. The number of forward retransmissions of a packet depends at least on these factors; different algorithms can be devised to determine the best FR rate. Definitions: sk.snd_una = oldest unacked pkt; sk.snd_wnd = receiver advertised window; sk.snd_nxt = next pkt to send; L = sk.snd_nxt - skb.end_seq; S = sk.snd_wnd - w; W = sk.snd_nxt - sk.snd_una; F = 1/(pass rate).

Forward Retransmission. 1. If we are short on receiver window space, that is, S >> C is not true, we want to increase the FR rate (say S = C: a packet at L = C, if lost, will delay transmission by 1 RTT); FR is proportional to C/S. 2. If L > C, that is, the packet has already been retransmitted (L/C times), we want to increase the FR rate; FR is proportional to L/C. Therefore, by 1 & 2, FR is proportional to L/S. 3. We don't want the FR rate to exceed, say, 3·F, three times the expected number of retransmissions; in case S is small, we cap the FR rate by 3·F·L/W. FR = min(L/S, 3·F·L/W), where L = sk.snd_nxt - skb.end_seq, S = sk.snd_wnd - w, W = sk.snd_nxt - sk.snd_una, F = 1/(pass rate), C = cwnd.
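The FR rule above, written out under the slide's own definitions (the guards against zero denominators are added assumptions):

```python
def fr_rate(snd_nxt, snd_una, snd_wnd, skb_end_seq, pass_rate, w):
    """Forward-retransmission rate: FR = min(L/S, 3*F*L/W)."""
    L = snd_nxt - skb_end_seq          # how far this packet lags the send frontier
    S = max(snd_wnd - w, 1)            # remaining receiver-window space (guarded)
    W = max(snd_nxt - snd_una, 1)      # outstanding window (guarded)
    F = 1.0 / pass_rate                # expected retransmissions per packet
    return min(L / S, 3.0 * F * L / W)
```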

SACK Processing. Processing of SACK packets in Linux is CPU intensive, as the write queue needs to be traversed: all SKBs prior to the SACKed SKB are marked as LOST, and the traversal can invalidate a large amount of memory cache. The TOQ allows us to eliminate the LOST and RETRANSMITTED flags in SKBs in the write queue. This allows a number of optimizations and eliminates traversing the write queue at each SACK: it is possible to go directly to the SKB and mark it as SACKed. Finding the pointer to the SKB can then be quite fast with a SACK block pointer cache or a SEQ->*SKB lookup table.
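A hypothetical sketch of the SEQ->SKB lookup idea: jump straight to the SACKed SKB instead of walking the whole write queue per SACK block (class and field names are invented for illustration):

```python
class WriteQueueIndex:
    def __init__(self):
        self.by_seq = {}                    # start_seq -> skb

    def on_enqueue(self, start_seq, skb):
        self.by_seq[start_seq] = skb

    def mark_sacked(self, start_seq):
        skb = self.by_seq.get(start_seq)    # O(1) instead of a list traversal
        if skb is not None:
            skb.sacked = True
        return skb
```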

SACK Processing Architecture

SACK Processing CPU Utilization: comparison with and without the SACK optimization.

Packet Reordering. Studies have shown reordering is common on the Internet. Some causes: multi-path due to parallelism in routers/switches; load balancing across routes.

Detecting Reordering. Record the highest sequence number received so far (newest_seq); on each (S)ACK for sequence number seq, update reorder(t+1) = max(reorder(t), newest_seq - seq). This is easy if the packet was never retransmitted; retransmitted packets need to be identified carefully: if there were 2 retransmissions and the transmit time of the 2nd is << RTT ago, the (S)ACK is for the 1st transmission; with multiple retransmissions, a unique sender timestamp identifies which transmission caused the (S)ACK.
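A sketch of the reordering-detection update just described, keeping the running maximum reordering extent; this is illustrative and skips the retransmission-attribution details:

```python
class ReorderDetector:
    def __init__(self):
        self.newest_seq = 0     # highest sequence number (S)ACKed so far
        self.reorder = 0        # current reordering estimate

    def on_sack(self, seq, was_retransmitted=False):
        if not was_retransmitted:            # attribution needs timestamps otherwise
            self.reorder = max(self.reorder, self.newest_seq - seq)
            self.newest_seq = max(self.newest_seq, seq)
        return self.reorder
```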

Reordering Detection Experiment. A Linux Traffic Control (tc, iproute2) module was implemented to perform controlled packet reordering (in all experiments P = 2R).

Reordering Experiment, with reorder detection: throughput vs. reorder injected at different link capacities.

Reordering Experiment: reorder detected vs. reorder injected at different link capacities.

Conclusions. The decoupling of loss and congestion control that FAST facilitates has allowed the development of a new, highly efficient loss recovery mechanism. By using delay as the congestion measure, it is possible to achieve close to the maximum possible goodput in the presence of loss.

Outline: background, motivation; FAST TCP: architecture and algorithms, experimental evaluations, loss recovery; MaxNet, SUPA FAST.

MaxNet: Quick Overview. MaxNet is: a fully distributed flow control architecture for large networks (no per-flow state in routers); max-min fair in principle; stable for networks of arbitrary topology, number of users, capacity and delay; fast to converge; a design that addresses short-flow control; low in queuing delay, drastically reducing router buffer size requirements; based on similar analysis to FAST TCP; incrementally deployable and able to integrate with FAST TCP.

Why is an explicit signal useful? Queuing delay is necessary with delay-based protocols; explicit protocols reduce queuing delay and reduce the requirement for router buffer sizes. An explicit signal can also be used to set the right starting rate. Some challenges with delay-based protocols: alpha tuning; baseRTT estimation (sampling, route changes, etc.); BQD (backward queuing delay); delay noise (OS, etc.); interaction with link-layer retransmission; interaction with wireless coding, etc.

MaxNet: Packet Format. MaxNet requires N bits in the packet to carry an explicit signal of the path congestion level; the routers along the packet's path modify this congestion signal, and the congestion signal controls the source's rate. Figure: a data packet carrying an N-bit congestion signal field (IPv4, IPv6, TCP).

MaxNet: System. MaxNet requires the participation of the source, routers and receiver. The source rate is controlled by a feedback value in the ACK packet. This feedback value is obtained from the routers as data packets pass through MaxNet links on their way to the receiver; each router only remarks the packet if its congestion value is higher than the value in the packet, hence MaxNet. At the end of the path, the packet holds the highest congestion value of all routers along the path. Source: 1. transmits packets at a rate controlled by the feedback value Pj in the ACK. Router: 2. computes its congestion level; 3. remarks the packet if the router's congestion level is higher than the level in the packet. Receiver: 4. relays the P value back to the sender in an ACK packet.

MaxNet: Source & Link Algorithm. Source algorithm: the source receives ACK j with feedback value Pj (the feedback signal) and determines its transmission rate Xi by a demand function D: Xi = D(Pj). Link algorithm: the router monitors the aggregate input traffic rate Yl(t) destined for link l, which has capacity Cl(t). 1. Every 10 ms router l computes its congestion level pl: pl(t+1) = pl(t) + b(Yl(t) - aCl(t)), where a controls the target link utilization and b controls the convergence rate. 2. For every data packet k carrying signal Pk, the router conditionally remarks it: if (Pk < pl(t)) then Pk = pl(t).
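A sketch of the link-side price integrator and max-remarking described above, with placeholder parameter values (a, b and the demand function are assumptions, not values from the slides):

```python
class MaxNetLink:
    def __init__(self, capacity_bps, a=0.95, b=1e-8):
        self.capacity = capacity_bps   # Cl(t)
        self.a = a                     # target link utilization
        self.b = b                     # convergence-rate gain
        self.price = 0.0               # congestion level pl(t)

    def update_price(self, measured_rate_bps):
        # pl(t+1) = pl(t) + b * (Yl(t) - a * Cl(t)); run every 10 ms
        self.price += self.b * (measured_rate_bps - self.a * self.capacity)

    def remark(self, packet_signal):
        # carry the maximum congestion level seen along the path
        return max(packet_signal, self.price)

def source_rate(Pj, demand=lambda p: 1e9 * 2.0 ** (-p)):
    # Xi = D(Pj): a hypothetical decreasing demand function of the path price
    return demand(Pj)
```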

Computing the Explicit Signal in real routers. The congestion signal can be generated almost 'implicitly' by measuring the delay of a packet through the device: the backlog B(t+1) = B(t) + Y(t) - C(t) evolves like the price P(t+1) = P(t) + a(Y(t) - uC(t)). Figure: input timestamp, input queue, switch fabric, output queue, output timestamp. Advantage: simplest to implement. Disadvantage: sufficient queuing delay must be able to build up.

MaxNet & XCP Properties (Criteria / MaxNet / XCP):
Rate Allocation. MaxNet: MaxMin fair if all sources have the same demand function; weighted MaxMin if sources weight their demand function. XCP: constrained MaxMin (less than MaxMin).
Stability. MaxNet: provable stability for networks of arbitrary topology, RTTs, capacity and arbitrary number of flows (linear analysis). XCP: shown only for a single link and an aggregate of flows, all with the same RTT; no general proof exists.
Convergence Speed. MaxNet: linear analysis shows faster convergence than ECN, loss (Reno) and delay (FAST, Vegas) based schemes. XCP: no control analysis available; some simulation results show faster than TCP-Reno.
Router operations per packet. MaxNet: 2 (1 addition + 1 max). XCP: 12 (3 multiplications + 1 division + 6 additions + 2 comparisons).

MaxNet & XCP Properties, continued (Criteria / MaxNet / XCP):
Bits per Packet. MaxNet: 40 bits/pkt with naive linear encoding; with exponential encoding even 20 bits per packet would give a huge dynamic range. XCP: 96 bits/pkt in the BSD implementation.
Incremental Deployment. MaxNet: yes; MaxNet can be thought of as an explicit version of FAST TCP (where the congestion signal is implicit, namely delay); a combined protocol with FAST TCP is possible which uses explicit signal, delay and loss, allowing operation on paths with no explicit-signal ability. XCP: unknown.
Implementation progress. MaxNet: FAST TCP can be adopted; a Linux MaxNet module is in development. XCP: NS2, BSD.
Lossy environments. MaxNet: decouples loss from congestion measurement; recent improvements to loss recovery for FAST TCP apply equally to MaxNet; FAST TCP was recently shown to achieve around 6Mbps goodput at a 30% loss rate on a 70ms, 10Mbps link.
For more information: http://www.cs.caltech.edu/~bartek/

XCP Max-Min Fairness

Explicit Starting Rate. Having a multi-bit field in the packet also allows an explicit starting rate to be communicated from the network to the source, which would allow the source to start transmitting at a high rate after 1 RTT. A possible algorithm to determine the starting rate: 1. Extended-SYN arrives at the link: rate_i = alpha * ((C(t) - Y(t)) - aggregate_committed); aggregate_committed += rate_i; set the timeout for connection i: timeout = t + TO. 2. 1st data packet for connection i arrives at the link: set the timeout for connection i: timeout = t + RTT; clear the timeout (the state space can be reduced if we eliminate this step). 3. Timeout for connection i occurs: aggregate_committed -= rate_i.
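A minimal sketch of the starting-rate bookkeeping in steps 1-3, assuming a link-side allocator with hypothetical alpha and TO values, and simplifying step 2 to release the commitment as soon as the first data packet arrives:

```python
class StartingRateAllocator:
    def __init__(self, capacity_bps, alpha=0.5, to_seconds=1.0):
        self.capacity = capacity_bps
        self.alpha = alpha                 # fraction of spare capacity granted
        self.to = to_seconds               # TO: commitment timeout
        self.aggregate_committed = 0.0
        self.committed = {}                # connection id -> (rate_i, timeout)

    def on_syn(self, conn_id, measured_rate_bps, now):
        # 1. Extended-SYN arrives: rate_i = alpha * ((C - Y) - aggregate_committed)
        spare = (self.capacity - measured_rate_bps) - self.aggregate_committed
        rate_i = max(0.0, self.alpha * spare)
        self.aggregate_committed += rate_i
        self.committed[conn_id] = (rate_i, now + self.to)
        return rate_i

    def on_first_data(self, conn_id):
        # 2. First data packet arrives: the grant is in use, release the commitment
        self._release(conn_id)

    def on_tick(self, now):
        # 3. Timeout: release commitments of connections that never sent data
        for conn_id, (_, timeout) in list(self.committed.items()):
            if now >= timeout:
                self._release(conn_id)

    def _release(self, conn_id):
        rate_i, _ = self.committed.pop(conn_id, (0.0, 0.0))
        self.aggregate_committed -= rate_i
```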

Signaling Unified Protocol Architecture (SUPA FAST TCP). Existing protocols focus on one type of congestion signal; the future of FAST TCP is to combine congestion signals in a single congestion control: SUPA FAST TCP.

SUPA FAST TCP Network Components. Links may have one of 4 congestion signaling abilities, and a path may be a combination of any of these types of links. The challenge is how to detect the bottleneck's capability and how to react in all situations.

Conclusion. MaxNet provides a framework for explicit-signal congestion control. A practical approach would involve combining different congestion signals. Evolution of the Internet from loss-based protocols to explicit signaling is possible in an incremental way. Explicit protocols solve many of the challenges of using loss or delay as a congestion signal. Widespread deployment is not near-term; however, deployment may come sooner in specialized applications where there is pain.