Transport Protocols over Circuits/VCs
Master of Engineering Presentation by Helali Bhuiyan
Computer Engineering, University of Virginia
August 7, 07
Outline
- Motivation and Problem Statement
- Related Work
- Background
  - Types of Circuits/VCs
  - TCP over Circuits/VCs
- Solutions
- Conclusions
Motivation and Problem Statement
- High-bandwidth circuit-switched and virtual-circuit (VC) networks are being used to support eScience projects
- Problem statement: design transport protocols for different types of circuits/VCs
Related Work
- Several UDP-based transport protocols have been developed specifically for circuits:
  - Reliable Blast UDP (RBUDP)
  - Rate-Adaptive Protocol for Information Delivery (RAPID)
  - RBUDP+ and RAPID+
- To keep the circuit fully utilized, these protocols try to match their sending rate to the reserved bandwidth
- There is no congestion in a circuit, hence no packet loss in the network
- However, multitasking at the receiving host may cause receive-buffer overflow
- Solution: adjust the sending rate dynamically based on feedback from the receiver
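Matching the sending rate to the reserved bandwidth can be pictured as packet pacing: the sender spaces packets so that its average rate equals the circuit rate. The sketch below is a minimal illustration, assuming 1500-byte packets; `pacing_gap_s` and `send_schedule` are hypothetical names, not functions from RBUDP or RAPID.

```python
PACKET_BYTES = 1500  # assumed packet size


def pacing_gap_s(rate_bps: float, packet_bytes: int = PACKET_BYTES) -> float:
    """Spacing between packet departures so the average rate equals rate_bps."""
    return packet_bytes * 8 / rate_bps


def send_schedule(n_packets: int, rate_bps: float,
                  packet_bytes: int = PACKET_BYTES) -> list[float]:
    """Departure time (in seconds) of each packet when paced at rate_bps."""
    gap = pacing_gap_s(rate_bps, packet_bytes)
    return [k * gap for k in range(n_packets)]


if __name__ == "__main__":
    # Pacing to a 155 Mbps (OC3) reserved circuit: roughly 77 us between packets.
    times = send_schedule(4, 155e6)
    print([round(t * 1e6, 1) for t in times])
```

The feedback-driven rate adjustment described above is not modeled here; a real sender would recompute the gap whenever the receiver's reports change the target rate.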
Related Work
- TCP (i.e., Reno) is not suitable for high-bandwidth connectionless paths:
  - Packets can be lost due to congestion within the network
  - It takes a long time to recover from a packet-loss event
- Several high-speed variants of TCP have been developed, e.g., BIC TCP and FAST TCP
  - A higher growth rate of the congestion window leads to a shorter recovery time
- High-speed TCP variants aim to solve the congestion problem on connectionless-network paths
- Are they suitable for circuits?
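The long recovery time follows from Reno's additive increase: after a loss, cwnd is halved and then regrows by about one packet per RTT. A back-of-the-envelope sketch, assuming a single loss, 1500-byte packets, and a pre-loss window equal to the path's BDP (`reno_recovery_time_s` is a hypothetical helper):

```python
PACKET_BITS = 1500 * 8  # assumed packet size, in bits


def reno_recovery_time_s(rate_bps: float, rtt_s: float,
                         packet_bits: int = PACKET_BITS) -> float:
    """Approximate time for Reno to regrow cwnd from W/2 back to W after one loss.

    W is the window that just fills the path (its bandwidth-delay product).
    Reno halves cwnd on a loss, then adds about one packet per RTT in
    congestion avoidance, so recovery takes about W/2 RTTs.
    """
    w_packets = rate_bps * rtt_s / packet_bits
    return (w_packets / 2) * rtt_s


if __name__ == "__main__":
    # The 155 Mbps, 8.85 ms circuit used in the experiments.
    print(round(reno_recovery_time_s(155e6, 8.85e-3), 2), "s")
    # A hypothetical 10 Gbps path with a 100 ms RTT.
    print(round(reno_recovery_time_s(10e9, 0.1) / 60, 1), "min")
```

Under these assumptions recovery takes about 0.5 s on the experimental circuit, but about 69 minutes on the hypothetical 10 Gbps path, which is the gap the high-speed variants try to close.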
Related Work: User-Space vs. Kernel-Space
- UDP-based user-space implementations:
  - Receive-buffer overflows can occur due to multitasking
  - The receiving host needs to send loss reports
- A window-based kernel-level implementation, as in TCP, is a simpler solution:
  - The receiving host advertises its available receive-buffer space in each ACK packet, which reflects the exact load (multitasking) state of the host
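The kernel-level alternative relies on TCP's advertised receive window. The sketch below models it, assuming byte-granular buffers and instantaneous delivery (a simplification); `ReceiveBuffer` and `bytes_sender_may_send` are hypothetical names. When the application stops reading, the advertised window shrinks to zero and a window-respecting sender pauses instead of overflowing the buffer.

```python
from dataclasses import dataclass


@dataclass
class ReceiveBuffer:
    """Hypothetical model of a TCP receive buffer and its advertised window."""
    capacity: int     # total buffer space, in bytes
    unread: int = 0   # bytes received but not yet read by the application

    def advertised_window(self) -> int:
        # The free space is what the receiver advertises in each ACK.
        return self.capacity - self.unread

    def receive(self, nbytes: int) -> None:
        if nbytes > self.advertised_window():
            raise OverflowError("sender ignored the advertised window")
        self.unread += nbytes

    def app_read(self, nbytes: int) -> int:
        n = min(nbytes, self.unread)
        self.unread -= n
        return n


def bytes_sender_may_send(rbuf: ReceiveBuffer, in_flight: int) -> int:
    """A window-limited sender never exceeds the receiver's advertised window."""
    return max(0, rbuf.advertised_window() - in_flight)


if __name__ == "__main__":
    rbuf = ReceiveBuffer(capacity=6000)
    # The application is busy (multitasking) and reads nothing for a while.
    while bytes_sender_may_send(rbuf, in_flight=0) >= 1500:
        rbuf.receive(1500)  # one 1500-byte segment arrives
    print(rbuf.advertised_window())  # 0: the sender must pause; nothing overflows
    rbuf.app_read(3000)              # the application catches up
    print(rbuf.advertised_window())  # 3000: sending can resume
```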
Types of Circuits/VCs
[Figure: two switches, each with a GbE port and a SONET interface]
- Different types of circuits/VCs are possible in the data network:
  - Layer-1 circuit: a GbE (Gigabit Ethernet) port is mapped to an equivalent or lower-rate SONET circuit
  - Layer-2 circuit: a VLAN on a GbE port is mapped to a single SONET circuit
  - Multiplexed Layer-2 circuit: multiple VLANs are mapped to the same SONET circuit
TCP over Circuits/VCs
Because TCP was originally designed for connectionless networks, several of its features need special attention when TCP is used over circuits:
- Congestion-control algorithm
  - Slow start
  - The congestion window is increased for each ACK received
  - The number of outstanding packets keeps increasing unless it is constrained by the TCP buffers
- TCP send and receive buffers
  - TCP buffers smaller than the bandwidth-delay product (BDP) of the path result in lower throughput
- Congestion-window reset
  - The congestion window is reset if the connection is idle for more than one retransmission timeout (RTO)
- Receive-side autotuning
  - The receive-side TCP buffer grows only gradually
- Congestion-window-reduced (CWR) state
  - An overflowing IP transmission queue causes TCP to enter the CWR state
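The per-ACK window growth can be sketched as follows. This is a simplified model, not kernel code: it assumes whole-packet windows and keeps a counter of ACKs toward the next increment, similar in spirit to Linux's `snd_cwnd`/`snd_cwnd_cnt` pair; the class name `RenoWindow` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RenoWindow:
    """Simplified per-ACK congestion-window update (a sketch, not kernel code)."""
    cwnd: int           # congestion window, in whole packets
    ssthresh: int       # slow-start threshold, in packets
    acked_cnt: int = 0  # ACKs counted toward the next +1 in congestion avoidance

    def on_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            # Slow start: +1 packet per ACK, so cwnd roughly doubles every RTT.
            self.cwnd += 1
        else:
            # Congestion avoidance: +1 packet after a full window of ACKs,
            # i.e. about +1 packet per RTT (+1/cwnd per ACK on average).
            self.acked_cnt += 1
            if self.acked_cnt >= self.cwnd:
                self.cwnd += 1
                self.acked_cnt = 0

    @property
    def fractional(self) -> float:
        """cwnd shown as a fraction, e.g. 100.01 after one ACK at cwnd = 100."""
        return self.cwnd + self.acked_cnt / self.cwnd


if __name__ == "__main__":
    w = RenoWindow(cwnd=100, ssthresh=64)  # already in congestion avoidance
    for _ in range(100):
        w.on_ack()
    print(w.cwnd)  # 101: one packet more after one window's worth of ACKs
```

Starting from cwnd = 100 in congestion avoidance, one ACK gives 100.01, 99 ACKs give 100.99, and 100 ACKs give 101, the same sequence of cwnd values used in the example slides.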
TCP over Circuits: Example
[Figure: Sender (GbE) -> Switch A (SONET interface) -> 155 Mbps circuit -> Switch B (SONET interface) -> Receiver (GbE); RTT = 8 ms]
- The bandwidth-delay product (BDP) of the path is about 100 packets
- A GbE port emits a standard 1500-byte packet in 12 us
- At the OC3 rate, Switch A takes about 80 us to forward each packet
- Assume that at T = 0 the congestion window (cwnd) is 100 packets and TCP is in the congestion-avoidance state
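The figures on this slide follow from the link rates. A small sketch of the arithmetic, assuming 1500-byte packets and a nominal 155 Mbps circuit rate (the helper names are hypothetical):

```python
PACKET_BITS = 1500 * 8  # one standard 1500-byte packet


def serialization_time_us(rate_bps: float, packet_bits: int = PACKET_BITS) -> float:
    """Time for a port running at rate_bps to emit one packet, in microseconds."""
    return packet_bits / rate_bps * 1e6


def bdp_packets(rate_bps: float, rtt_s: float, packet_bits: int = PACKET_BITS) -> float:
    """Bandwidth-delay product of a path, in packets."""
    return rate_bps * rtt_s / packet_bits


if __name__ == "__main__":
    print(f"GbE: {serialization_time_us(1e9):.1f} us per packet")
    print(f"OC3: {serialization_time_us(155e6):.1f} us per packet")
    print(f"BDP: {bdp_packets(155e6, 8e-3):.1f} packets")
```

The exact values (about 77 us and 103 packets) are what the slide rounds to 80 us and 100 packets. Because the sender's GbE port emits packets about six times faster than the circuit drains them, a back-to-back burst from the sender builds a queue at Switch A.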
TCP over Circuits: Example (animation frames, condensed)
Each frame shows data packets waiting in Switch A's buffer or in flight on the circuit, and ACKs on the return path.
- T = 0: cwnd = 100; TCP is in congestion avoidance
- T = 80 us, 160 us: Switch A forwards packets 1 and 2 onto the circuit, one every 80 us
- T = 4 ms: packets 1-50 are in flight on the circuit
- T = 4 ms + 80 us, 4 ms + 160 us: packets 1 and 2 reach the receiver and their ACKs start back
- T = 8 ms: packets 51-100 are in flight toward the receiver and ACKs 1-50 toward the sender, so the path holds the full window of 100 packets
- T = 8 ms + 80 us: ACK 1 arrives; cwnd = 100.01 and packet 101 is sent
- T = 16 ms: cwnd = 100.99; packets up to 199 have been sent
- T = 16 ms + 80 us: cwnd = 101, so the arriving ACK releases two packets (200 and 201); one packet more than the path can carry now waits in Switch A's buffer
- T = 16 ms + 160 us: cwnd = 101.01; packet 202 is sent
- T = 24 ms: cwnd = 102; the window now exceeds what the path can carry by about two packets, which wait in Switch A's buffer
In congestion avoidance, cwnd grows by about one packet per RTT, but the circuit can carry only about one BDP (100 packets) in flight, so the excess packets wait in Switch A's buffer and the queue grows by about one packet per RTT.
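The mechanism in this example reduces to a per-RTT model, assuming cwnd grows by exactly one packet per RTT and the circuit holds exactly one BDP of packets in flight; everything beyond that waits in Switch A's buffer. The function names are hypothetical, and the 700-packet buffer used below is borrowed from the switches in the experimental setup.

```python
def standing_queue(cwnd: int, bdp_pkts: int) -> int:
    """Packets that cannot be in flight on the circuit and must wait in the switch."""
    return max(0, cwnd - bdp_pkts)


def queue_trace(cwnd0: int, bdp_pkts: int, n_rtts: int) -> list[int]:
    """Switch-buffer occupancy at each RTT while cwnd grows by 1 packet per RTT."""
    return [standing_queue(cwnd0 + k, bdp_pkts) for k in range(n_rtts)]


def first_overflow_rtt(cwnd0: int, bdp_pkts: int, buffer_pkts: int) -> int:
    """First RTT (counted from cwnd0) at which the standing queue exceeds the buffer."""
    k = 0
    while standing_queue(cwnd0 + k, bdp_pkts) <= buffer_pkts:
        k += 1
    return k


if __name__ == "__main__":
    print(queue_trace(cwnd0=100, bdp_pkts=100, n_rtts=3))
    print(first_overflow_rtt(cwnd0=100, bdp_pkts=100, buffer_pkts=700))
```

The model counts RTTs rather than seconds. Each queued packet adds about 80 us of queueing delay, so the RTT itself lengthens as the queue builds, but the queue still grows until the buffer overflows and packets are dropped.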
Experimental Results
[Figure: Zelda1 (GbE) -> SN16000 (SONET interface) -> 155 Mbps circuit -> SN16000 (SONET interface) -> Wuneng (GbE); RTT = 8.85 ms]
- Zelda1 is in Atlanta, GA, and Wuneng is in Raleigh, NC
- The GbE interfaces of the two hosts are connected to circuit-switched gateways (SN16000)
- An OC3 (155 Mbps) Layer-2 circuit is set up between the two switches
- No Ethernet PAUSE frames are used
- The bandwidth-delay product is 114 packets
- The per-port buffer size at each switch is 1 MB (about 700 packets)
- The TCP send and receive buffer sizes on both hosts are set to 4 MB
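The numbers on this slide can be checked against each other, assuming 1500-byte packets and binary megabytes (1 MB = 2^20 bytes, which matches the slide's 700-packet figure). The sketch suggests why loss is possible in this setup: a 4 MB window can hold far more packets than the circuit and the switch buffer together.

```python
MIB = 1 << 20          # assumption: binary megabytes
PACKET_BYTES = 1500    # assumed packet size

rate_bps = 155e6       # OC3 circuit
rtt_s = 8.85e-3        # measured RTT between Zelda1 and Wuneng

bdp_pkts = round(rate_bps * rtt_s / (PACKET_BYTES * 8))
switch_buffer_pkts = (1 * MIB) // PACKET_BYTES
tcp_buffer_pkts = (4 * MIB) // PACKET_BYTES

# Largest window the path can absorb: one BDP in flight plus a full switch buffer.
max_lossless_window = bdp_pkts + switch_buffer_pkts

print(f"BDP:                 {bdp_pkts} packets")
print(f"Switch buffer:       {switch_buffer_pkts} packets")
print(f"TCP buffer (4 MB):   {tcp_buffer_pkts} packets")
print(f"Lossless window max: {max_lossless_window} packets")
```

Ignoring kernel bookkeeping overhead, the 4 MB buffers allow a window of roughly 2,800 packets, more than three times the roughly 813 packets the path can absorb, so the TCP buffers do not stop cwnd from growing until the switch buffer overflows.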
Experimental Results
[Figure: congestion-window growth and instantaneous throughput for a 1 GB transfer with Reno TCP; a packet loss is marked on the plot]
Experimental Results
[Figure: congestion-window growth and instantaneous throughput for a 1 GB transfer with BIC TCP; a packet loss is marked on the plot]
Solutions: Tune TCP
- Tune the TCP buffers to avoid losses
  - Tune the TCP buffers to limit the growth of the number of outstanding packets
- User applications are not expected to do the tuning themselves
- Solution: use the process-tracing tool ptrace, which traps system calls made by a user application
- Slow start, congestion-window reset, and receive-window autotuning still cannot be avoided
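What the tuning ultimately amounts to is setting per-socket buffer sizes. The sketch below shows only that end effect with Python's standard `socket` module, assuming the buffers should equal the path's BDP; it does not show the ptrace-based interception, and `make_tuned_socket` is a hypothetical helper, not part of the tool described here.

```python
import socket


def bdp_bytes(rate_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes."""
    return int(rate_bps * rtt_s / 8)


def make_tuned_socket(rate_bps: float, rtt_s: float) -> socket.socket:
    """TCP socket whose send/receive buffers are sized to the path's BDP.

    On Linux, tcp(7) notes that buffer sizes must be set before connect()
    or listen() to take effect for the connection.
    """
    size = bdp_bytes(rate_bps, rtt_s)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return s


if __name__ == "__main__":
    sock = make_tuned_socket(155e6, 8.85e-3)
    # On Linux, getsockopt() reports double the requested value (room for
    # bookkeeping overhead), after the request is capped at net.core.rmem_max.
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    sock.close()
```

Because part of the buffer goes to bookkeeping, a buffer of exactly one BDP may yield a slightly smaller usable window; the right margin is system-dependent.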
Solutions: Circuit TCP (CTCP)
- CTCP is a modification of TCP in which the congestion-control software is disabled
- The sender maintains a constant congestion window size, matched to the bandwidth-delay product
- The receiver also advertises a fixed receive window
- A constant window size avoids slow start, receive-side autotuning, and congestion-window reset
- CTCP uses TCP's window-based flow control
- Packets cannot be lost due to buffer overflow
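The sender-side behavior described here can be sketched as a small state object. This is a hypothetical model, not the CTCP v1.0 code: it assumes a window counted in packets and keeps only window-based flow control.

```python
from dataclasses import dataclass


@dataclass
class CtcpSender:
    """Hypothetical sketch of CTCP-style sender state.

    Congestion control is disabled: the window is fixed at the path's BDP and
    never grows, shrinks, or resets. Only window-based flow control remains.
    """
    window: int         # fixed congestion window, in packets (matched to the BDP)
    in_flight: int = 0  # packets sent but not yet acknowledged

    def sendable(self, rwnd: int) -> int:
        # Respect both the fixed window and the receiver's advertised window.
        return max(0, min(self.window, rwnd) - self.in_flight)

    def on_send(self, n: int) -> None:
        self.in_flight += n

    def on_ack(self, n: int = 1) -> None:
        # An ACK only frees window space; the window itself stays constant.
        self.in_flight -= n

    def on_idle(self, idle_s: float, rto_s: float) -> None:
        # No congestion-window reset after idle periods, however long.
        pass


if __name__ == "__main__":
    sender = CtcpSender(window=114)  # BDP of the experimental circuit
    n = sender.sendable(rwnd=114)
    sender.on_send(n)
    print(n, sender.sendable(rwnd=114))  # full window sent, then nothing more
    sender.on_ack()
    print(sender.sendable(rwnd=114))     # each ACK releases exactly one packet
```

With the window fixed at one BDP, the sender never has more packets outstanding than the circuit can carry, so in this idealized model no standing queue builds at the switch.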
CTCP Results
[Figure: congestion-window growth and instantaneous throughput for CTCP]
Conclusions
- Transport protocols were selected to match the characteristics of different types of circuit-switched/VC networks
- TCP is a good base transport protocol for circuit-switched/VC networks
  - It provides a window-based flow-control solution
- Untuned TCP may lead to packet loss over some types of circuits
- The TCP buffers of unmodified user applications can be tuned to avoid loss with the help of a process-tracing tool
- Circuit TCP (CTCP), which keeps a fixed number of packets outstanding at all times, is a better choice
- Selecting the CTCP socket requires modifying the user application
  - Process-tracing tools can also be used here to select the CTCP socket
Thank You
Questions?
CTCP Results
- Throughput values over different burst sizes
- Various time gaps between bursts (0 s, 100 ms, 1 s, 2 s)
- The retransmission timeout (RTO) is 209 ms
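With a 209 ms RTO, the 1 s and 2 s gaps exceed one RTO while the 0 s and 100 ms gaps do not, so only the longer gaps should trigger a congestion-window reset in standard TCP. A simplified sketch of that idle-restart rule, assuming a window of 114 packets (the experimental BDP) and a hypothetical restart window of 3 packets; `cwnd_after_idle` is a hypothetical helper:

```python
RTO_S = 0.209  # retransmission timeout reported for these experiments


def cwnd_after_idle(cwnd: int, idle_s: float, rto_s: float,
                    restart_window: int, ctcp: bool = False) -> int:
    """Congestion window after an idle gap (simplified idle-restart rule).

    Standard TCP: if idle for more than one RTO, restart from
    min(restart_window, cwnd), as in RFC 5681's restart-window rule.
    CTCP: the fixed window is kept regardless of the gap.
    """
    if ctcp or idle_s <= rto_s:
        return cwnd
    return min(restart_window, cwnd)


if __name__ == "__main__":
    gaps_s = [0.0, 0.1, 1.0, 2.0]  # gaps between bursts used in the experiments
    print("TCP: ", [cwnd_after_idle(114, g, RTO_S, restart_window=3) for g in gaps_s])
    print("CTCP:", [cwnd_after_idle(114, g, RTO_S, 3, ctcp=True) for g in gaps_s])
```

Real kernels differ in detail; Linux, for example, exposes this behavior through the net.ipv4.tcp_slow_start_after_idle sysctl.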
Backup: Contributions
- CTCP code
  - Documented the CTCP v1.0 code
  - Developed an API for CTCP v1.0
- Iperf
  - Modified the Iperf code to use the CTCP API
- Amanda (Advanced Maryland Network Disk Archiver)
  - Installed Amanda and wrote a user-friendly installation guide
- Ptrace (Process Trace)
  - Developed software that uses ptrace to trap system calls and modify system-call behavior
- CTCP experiments