CSS432 UDP and TCP Textbook Ch5.1 – 5.2 Professor: Munehiro Fukuda (Augmented by Instructor Nash) CSS432: Internetworking
End-to-End Protocols Readings: Chap 5.1-5.2.3, 5.5 Instructor: Rob Nash Today: E2E New Assignment, due May 19th Tools for network analysis Midterms back next week
Transport Layer This layer’s responsibility is to transform host to host communications into process to process communications. IP connects various hosts The layer above provides a per-process abstraction As one host with multiple threads may have multiple connections. I.e., a demuxing of our processes that share one nic CSS432: Internetworking
Transport Further UDP TCP RPC Sun & DCE RTP CSS432: Internetworking
More “Layerist” Theories The Data Link Layer in X.25, ATM, Frame Relay The Transport Layer in TCP/IP Introduction to the End-to-End argument If you can’t offer the service correctly and completely, it’s at the wrong layer Unless a performance optimization CSS432: Internetworking
Applications Above Transport The Transport Layer is influenced from above and below Processes want: Reliable delivery In order delivery No duplicate packets No lost or corrupt packets Arbitrarily large messages Some type of sender/receiver synch CSS432: Internetworking
IP Protocols – Below Transport Underlying best-effort network drop messages re-orders messages delivers duplicate copies of a given message limits messages to some finite size delivers messages after an arbitrarily long delay Request a retransmission of M3 and M5 M5, and M3 M5, M5, M3 M5 M2, M1, M4 M5, M4, M3, M2, M1 M5, M2, M1 M4 M4, M3 M3 CSS432: Internetworking
Simple Demultiplexor (UDP) Unreliable and unordered datagram service Adds multiplexing No flow control Endpoints identified by ports servers have well-known ports see /etc/services on Unix Header format Optional checksum psuedo header + UDP header + data SrcPort DstPort Checksum Length Data 16 31 CSS432: Internetworking
UDP: Simple Demultiplexer socket() bind() recv() Port: 13579 13579 socket() bind() recv() Port:12345 socket() sendto() int sd; sd = socket(AF_INET, SOCK_DGRAM, 0); struct sockaddr_in server; server.sin_family =AF_INET; server.sin_addr.s_addr = htonl( INADDR_ANY) server.sin_port = htons( 12345 ); bind( sd, (struct sockaddr *)&server, sizeof( server ) ); recv( sd, buf, sizeof( buf ), 0 ); struct hostent *hp, *gethostbyname( ); Server.sin_family = AF_INET; hp = gethostbyname( ); bcopy( hp->h_addr, &( server.sin_addr.s_addr ), sizeof( hp->h_length) ); sendto( sd, buf, sizeof( buf ), 0, (struct sockaddr *)&server, Packets demultiplexed UDP M5, M4, M3, M2, M1 CSS432: Internetworking
End-to-End Protocols Common end-to-end services guarantee message delivery deliver messages in FIFO order deliver at most one copy of each message support arbitrarily large messages support synchronization allow the receiver to flow control the sender support multiple application processes on each host P1 P2 P3 P4 M5, M4, M3, M2, M1 Network m5, m4, m3, m2, m1 CSS432: Internetworking
TCP Overview Connection-oriented Byte-stream app writes bytes TCP sends segments app reads bytes Full duplex Flow control: keep sender from overrunning receiver Advertised window size Congestion control: keep sender from overrunning network AIMD, slow-start Application process W rite bytes TCP Send buffer Segment T ransmit segments Read Receive buffer … CSS432: Internetworking
Sockets (Code Example)* int sd = socket(AF_INET, SOCK_STREAM, 0); int sd, newSd; sd = socket(AF_INET, SOCK_STREAM, 0); socket() connect() write() socket() connect() write() socket() bind() listen() accept() sockaddr_in server; bzero( (char *)&server, sizeof( server ) ); server.sin_family =AF_INET; server.sin_addr.s_addr = htonl( INADDR_ANY ) server.sin_port = htons( 12345 ); bind( sd, (sockaddr *)&server, sizeof( server ) ); struct hostent *host = gethostbyname( arg[0] ); sockaddr_in server; bzero( (char *)&server, sizeof( server ) ); server.sin_family = AF_INET; server.s_addr = inet_addr( inet_ntoa( *(struct in_addr*)*host->h_addr_list ) ); server.sin_port = htons( 12345 ); read() read() listen( sd, 5 ); connect( sd, (sockaddr *)&server, sizeof( server ) ); sockaddr_in client; socklen_t len=sizeof(client); while( true ) { newSd = accept(sd, (sockaddr *)&client, &len); write( sd, buf1, sizeof( buf ) ); write( sd, buf2, sizeof( buf ) ); if ( fork( ) == 0 ) { close( sd ); read( newSd, buf1, sizeof( buf1 ) ); read( newSd, buf2, sizeof( buf2 ) ); } close( newSd ); buf2, buf1 buf2, buf1 close( newsd); exit( 0 ); CSS432: Internetworking
Data Link Versus Transport Potentially connects many different hosts need explicit connection establishment and termination Potentially different RTT (Round Trip Time) need adaptive timeout mechanism Potentially long delay in network need to be prepared for arrival of very old packets Need to set MSL (Maximum Segment Lifetime) Potentially different capacity at destination need to accommodate different node capacity Potentially different network capacity need to be prepared for network congestion CSS432: Internetworking
Segment Format Each connection identified with 4-tuple: (SrcPort, SrcIPAddr, DsrPort, DstIPAddr) Sliding window + flow control Acknowledgment, SequenceNum, AdvertisedWinow Flags SYN, FIN, RESET, PUSH, URG, ACK SYN: Establishing a conneciton FIN: terminating a connection RESET: Confused and Terminating PUSH: Section 5.2.7 URG: Sending urgent data ACK: Validating acknowledgment field SequenceNum is incremented in all cases other than ACK. Sender Data (SequenceNum) Acknowledgment + AdvertisedWindow Receiver CSS432: Internetworking
Connection Establishment and Termination** Tree-Way Handshake Client Initiate a connection to a server with seq=x Set a timer and retransmit the request upon an expiration Server Acknowledge the client request with ack=++x Initiate a reverse connection with seq=y Acknowledge the server request with ack=++y X and y are chosen at random Protect against two incarnations of the same connection using the same sequence number. Active participant (client) Passive participant (server) SYN, SequenceNum = x SYN + ACK, SequenceNum = y , ACK, Acknowledgment = + 1 Acknowledgment = CSS432: Internetworking
State Transition Diagram** CLOSED LISTEN SYN_RCVD SYN_SENT ESTABLISHED CLOSE_WAIT LAST_ACK CLOSING TIME_WAIT FIN_WAIT_2 FIN_WAIT_1 Passive open Close Send/ SYN SYN/SYN + ACK SYN + ACK/ACK ACK /FIN FIN/ACK ACK + FIN/ACK Timeout after two segment lifetimes (2 * MSL) Active open /SYN Open Active open A client connect( ) Passive open A server listen( ) Can change active Close Active close A client or a server First close( ) Both side can be active Passive close A client or server close( ) in response to the first close( ) CSS432: Internetworking
State Transition Diagram In what condition can the state transit from FIN_WAIT_1 to TIME_WAIT? What is the purpose of the TIME_WAIT state? CSS432: Internetworking
Timing Chart Client Server Establishment Transfer Data Termination ( connect( ) ) SYN_SENT LISTEN ( listen( ) ) SYN seq=x SYN_RCVD Establishment SYN seq=y, ACK=x + 1 ESTABLISHED ACK=y + 1 ESTABLISHED ( write( ) ) seq=x+1 ACK=y + 1 ( read( ) ) Transfer Data ACK x + 2 ( close( ) ) FIN_WAIT_1 FIN seq=x+2 ACK=y + 1 CLOSE_WAIT ACK x + 3 Termination FIN_WAIT_2 FIN seq = y + 1 LAST_ACK( close( ) ) TIME_WAIT ACK=y + 2 Peek such a flow with tcpdump in assignment 3. CSS432: Internetworking
Sliding Window Revisited- Sending application LastByteWritten TCP LastByteSent LastByteAcked Receiving application LastByteRead LastByteRcvd NextByteExpected Sending side LastByteAcked ≤ LastByteSent LastByteSent ≤ LastByteWritten buffer bytes between [LastByteAcked, LastByteWritten] Receiving side LastByteRead < NextByteExpected NextByteExpected ≤ LastByteRcvd+1 buffer bytes between [LastByteRead, LastByteRcvd] CSS432: Internetworking
Flow Control- y Send ACK with an advertise window Sending application Receiving application TCP TCP LastByteWritten LastByteRead y LastByteAcked LastByteSent NextByteExpected LastByteRcvd LastByteRcvd – LastByteRead ≤ MaxRcvbuffer LastByteSent – LastByteAcked ≤ AdvertisedWindow AdvertisedWindow = MaxRcvBuffer – (NextByteExpected – LastByteRead) EffectiveWindow = AdvertisedWindow – (LastByteSent – LastByteAcked) Send ACK with an advertise window in response to arriving data segments as long as all the preceding bytes have also arrived and until the advertised window reaches 0. (ACK returned at the first time when it reaches 0) LastByteWritten – LastByteAcked ≤ MaxsendBuffer block sender if (LastByteWritten - LastByteAcked) + y > MaxSenderBuffer CSS432: Internetworking
Flow Control with A Slower Receiver Sending application Receiving application y y Read slow. TCP TCP LastByteWritten LastByteRead LastByteAcked LastByteSent LastByteRcvd NextByteExpected LastByteRcvd – LastByteRead ≤ MaxRcvbuffer LastByteSent – LastByteAcked ≤ AdvertisedWindow < 0 AdvertisedWindow = MaxRcvBuffer – (NextByteExpected – LastByteRead) EffectiveWindow = AdvertisedWindow – (LastByteSent – LastByteAcked) No more send, no more ack, thus it stays In the same value LastByteWritten – LastByteAcked ≤ MaxsendBuffer block sender since (LastByteWritten - LastByteAcked) + y > MaxSenderBuffer CSS432: Internetworking
Flow Control The sender won’t send any more data. The receiver won’t initiate to send any advertised window. Then, how can the sender find out when the receiver can receive more data? CSS432: Internetworking
Protection Against Wrap Around 32-bit SequenceNum MSL (Maximum Segment Lifetime) = 120sec. SequenceNum should not be wrapped around within 120 seconds. Bandwidth Time Until Wrap Around T1 (1.5 Mbps) 6.4 hours Ethernet (10 Mbps) 57 minutes T3 (45 Mbps) 13 minutes FDDI (100 Mbps) 6 minutes STS-3 (155 Mbps) 4 minutes STS-12 (622 Mbps) 55 seconds STS-24 (1.2 Gbps) 28 seconds CSS432: Internetworking
Keeping the Pipe Full 16-bit AdvertisedWindow = 64KB Assuming that RTT = 100msec, the advertised window’s capacity is at least RTT x Network bandwidth Bandwidth RTT(100msec) x Bandwidth Product T1 (1.5 Mbps) 18KB Ethernet (10 Mbps) 122KB T3 (45 Mbps) 549KB FDDI (100 Mbps) 1.2MB STS-3 (155 Mbps) 1.8MB STS-12 (622 Mbps) 7.4MB STS-24 (1.2 Gbps) 14.8MB CSS432: Internetworking
Segment Transmission A segment is transmitted out: When a segment to send reaches Maximum segment size (MMS) = Maximum Transfer Unit (MTU) When a TCP receives a push operation that flushes the unsent data (Peek with tcpdump in programming assignment 3) When a timer expires A trigger by a timer may cause the silly window syndrome. CSS432: Internetworking
Silly Window Syndrome small MMS 2 Sender MMS 1 Receiver Ad Window Ad Window If a sender aggressively takes advantage of any available window, The receiver empties every window regardless of its size and thus small windows will never disappear. The problem occurs only when either the sender transmits a small segment or the receiver opens the window a small amount The receiver can delay ACKs to make a larger window How long does it wait? The sender should make a decision Nagle’s Algorithm (Programming assignment 3) CSS432: Internetworking
Nagle’s Algorithm (self-clocking) When the application produces data to send { if both the available data and the window ≥ MSS send a full segment // no problem, send it right now else if there is unAcked data in transit // a full empty segment returns soon buffer the new data until an ACK arrives else // the receiver won’t reply. Can’t avoid the silly window syndrome send all the new data now } Ack works as a timer to fire a new segment transmission. The algorithm may not respond to interactive or real-time applications TCP_NODELAY Option: Transmit data as soon as possible setsockopt(sockfd, SOL_TCP, TCP_NODELAY, &intFlag, sizeof(intFlag)) CSS432: Internetworking
How to Estimate RTT Timeout is necessary To retransmit a lost packet To detect congestions and invoke AIMD. RTT estimation algorithms (dynamic) Original Algorithm Karn/Partridge Algorithm Jacobson/Karels Algorithm CSS432: Internetworking
Original Algorithm Measure SampleRTT for each segment/ ACK pair Compute weighted average of RTT EstRTT = a x EstRTT + b x SampleRTT where a + b = 1 a between 0.8 and 0.9 b between 0.1 and 0.2 Set timeout based on EstRTT TimeOut = 2 x EstRTT Why double? EstRTT cannot respond to deviated SampleRTT quickly. CSS432: Internetworking
Karn/Partridge Algorithm Sender Receiver Sender Receiver Original transmission Original transmission TT TT ACK Retransmission SampleR SampleR Retransmission ACK Do not sample RTT when retransmitting Can’t figure out which transmission the latest ACK corresponds to. Double timeout after each retransmission (similar to Exponential Boff) Congestion causes this retransmission. Modestly retransmit segments. CSS432: Internetworking
Jacobson/ Karels Algorithm Original Algorithm EstRTT = a x EstRTT + b x SampleRTT 0.8 and 0.9 0.1 and 0.2 TimeOut = EstRTT * 2 New Algorithm that takes into a consideration if the variation among sampleRTTs is large. EstRTT = EstRTT + d(SampleRTT – EstRTT) 0.125 Diff Dev = Dev + d(|SampleRTT – EstRTT| – Dev) Diff TimeOut = m x EstRTT + f x Dev 1 4 CSS432: Internetworking
Jacobson/ Karels Algorithm** EstimatedRTT = 8 EstRTT Deviation = 8 Dev SampleRTT = (SampleRTT – 8 EstRTT) / 8 = SampleRTT – EstRTT) diff SampleRTT –= (EstimatedRTT >> 3) EstimatedRTT += SampleRTT; If (SampleRTT < 0) SampleRTT = – SampelRTT; SampleRTT –= (Deviation >> 3); Deviation += SampleRTT; TimeOut = (EstimatedRTT >> 3) + (Deviation >> 1) 8 EstRTT = 8EstRTT + SampleRTT = 8EstRTT + (SampleRTT – EstRTT) EstRTT = EstRTT + 1/8 (SampleRTT – EstRTT) diff |SampleRTT – EstRTT| – 8Dev/8 |diff| Notes Algorithm does not work accurately with a large granularity of clock (500ms on Unix) Accurate timeout mechanism important to congestion control 8 Dev = 8 Dev + |SampleRTT – EsRTT| - Dev Dev = Dev + 1/8( |SampleRTT – EstRTT| - Dev ) TimeOut = 8 EstRTT / 8 + 8 Dev / 2 TimeOut = EstRTT + 4 Dev CSS432: Internetworking
Guess the TCP Extensions!** How could we deal with the retransmission issue found in the original adaptive retransmission algorithm? i.e, how to differentiate ACKs? How could we increase the sequence space? How could we increase the advertised window space? CSS432: Internetworking
TCP Extensions RTT Measurement Store a 32-bit timestamp in outgoing segments’ header option Receive an ack with the original timestamp Sample RTT by subtracting the timestamp from the current timer Resolving the quick wrap-around of sequence number The 32-bit timestamp and the 32-bit sequence number gives a 64-bit sequence space Extending an advertised window Scale up the advertised window How many bytes to send → How many 16-byte units to send CSS432: Internetworking
Reviews Exercises in Chapter 5 UDP TCP: three-way handshake and state transition Sliding window and flow control Segment transmission: silly window syndrome and Nagle’s algorithm Adaptive retransmission: original, Karn/Partridge, and Jacobson/Karels Exercises in Chapter 5 Ex. 4, 5, 22, and 39 (TCP state transition) Ex. 9(a) (Sliding window) Ex. 20 (Nagle’s algorithm) CSS432: Internetworking
Overarching Goals for HW3 (0) To use the tools to analyze network traffic and timing relationships. Details and Discussion (1) Run the given HW3 and analyze the patterns using these tools (2) Build your own HW3 that creates the same set of network transactions (3) More analysis with ttcp options CSS432: Internetworking
Fun Quotes “Using tcpdump, ttcp, netstat, and strace, you will observe how TCP segments are actually transmitted and how OS interferes with the transmission.” “Designing the system otherwise would be silly” CSS432: Internetworking
HW3 - Analysis “Man” the following tools strace netstat ttcp tcpdump System call tracing netstat Various network statistics (connections & routing) ttcp Ballistics lab TCP testing tool tcpdump A network sniffer showing all network transactions CSS432: Internetworking
HW3 Advice Start Early! Take advantage of Lab Hours. Early, small and often >> late, large, one-time Take advantage of Lab Hours. Note that HW4 will be distributed before this HW is due Also, I’ll introduce the Final Project next week CSS432: Internetworking