CPE 401 / 601 Computer Network Systems, Lecture 9: SCTP Sockets. Slides are modified from Janardhan Iyengar, John Rumsey, Nimish Vartak.
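Where is SCTP in the stack? [Figure: protocol stacks in which SCTP sits at the transport layer alongside TCP and UDP, above IP and link layers such as Wi-Fi and Ethernet. Applications reach SCTP through the socket API, and SCTP may be implemented in the kernel or at user level.]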
SCTP – Motivation: New applications, such as the migration from the PSTN to packet-based Internet telephony and its signaling messages. Shortcomings of existing protocols: TCP suffers from “head-of-line blocking”, is byte-oriented rather than message-oriented, has no built-in multi-homing support, and is prone to DoS attacks. UDP has no reliability, no congestion control, and no flow control. SCTP
SCTP – Overview: “SCTP is a reliable transport protocol operating on top of a connectionless packet network such as IP. …” (RFC 2960). SCTP has built-in support for multi-homed hosts. It is message-based, so it preserves message boundaries. It supports sequenced delivery of user messages within multiple streams, with an option for unordered delivery of individual user messages. It also adds security mechanisms.
SCTP Feature Summary: Start with TCP: reliable (retransmissions), congestion controlled, connection oriented. Add: a 4-way handshake, to reduce vulnerability to DoS attacks; framing, to preserve message boundaries; multistreaming, i.e., instead of one ordered stream, up to 64K independent ordered streams; multihoming, i.e., instead of one IP address per endpoint, a set of IP addresses per endpoint. Note: will discuss
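TCP Connection Setup: [Figure: timeline between hosts A and B.] At t=0, A (closed) sends SYN and enters SYN-SENT. B (listening) receives the SYN, enters SYN-RECEIVED, creates a TCB, and replies with SYN-ACK. After 1 RTT, A receives the SYN-ACK, becomes established, and sends ACK followed by data. B becomes established when the ACK arrives.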
SYN Flooding Attack: [Figure: several attackers, using various source IP addresses, flood a victim with SYNs; the victim creates a TCB for each SYN.] Each TCB holds reserved resources that become unavailable. There is no ACK in response to the SYN-ACK, hence each connection remains half-open. Other, genuine clients cannot open connections to the victim, so the victim is unable to provide service.
SCTP – Features (contd.): Connection setup between End-Point A and End-Point Z is a four-way handshake: INIT (comparable to TCP's SYN), INIT-ACK (comparable to SYN-ACK), COOKIE-ECHO, COOKIE-ACK.
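What’s in a cookie? Information from the original INIT; information from the current INIT-ACK; a timestamp; the life span of the cookie (time to live); and a signature for authentication (e.g., a MAC computed with SHA-1 or MD5). Because this information travels inside the cookie, the server does not allocate any association state (TCB) until a valid COOKIE-ECHO comes back, so half-open handshakes cannot exhaust its memory the way SYN floods do against TCP.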
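SCTP Association Setup (V: Verification Tag; I: Initiate Tag): [Figure: timeline between hosts A and B.] At t=0, A (closed) sends INIT (V=0, I=TagA) and enters COOKIE-WAIT. B replies with INIT-ACK (V=TagA, I=TagB, State Cookie) and stays closed. After 1 RTT, A sends COOKIE-ECHO (V=TagB, State Cookie) and enters COOKIE-ECHOED. On receiving it, B becomes established and replies with COOKIE-ACK (V=TagA). After 2 RTTs, A is established and sends data (V=TagB).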
SCTP – Features (contd.): Connection close between End-Point A and End-Point Z uses three chunks: SHUTDOWN, SHUTDOWN-ACK, SHUTDOWN-COMPLETE. Unlike TCP, SCTP has no half-closed state.
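Graceful Shutdown: [Figure: timeline between hosts A and B.] A's application signals shutdown. A enters SHUTDOWN-PENDING and finishes sending its pending data, then sends SHUTDOWN and enters SHUTDOWN-SENT. B enters SHUTDOWN-RECEIVED, sends its own pending data, then replies with SHUTDOWN-ACK and enters SHUTDOWN-ACK-SENT. A answers with SHUTDOWN-COMPLETE, and both ends are closed.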
SCTP state diagram: [Figure: association states CLOSED, COOKIE-WAIT, COOKIE-ECHOED, ESTABLISHED, SHUTDOWN-PENDING, SHUTDOWN-SENT, SHUTDOWN-RECEIVED, SHUTDOWN-ACK-SENT, and back to CLOSED.]
Message Boundaries: UDP honors message boundaries: each application message becomes a datagram. TCP does not honor message boundaries: application messages become part of a byte stream. SCTP maintains message boundaries: each application message is carried in one or more DATA chunks.
Chunks in SCTP: An SCTP packet forms the payload of an IP packet. An SCTP packet consists of a 12-byte common header followed by one or more “chunks”. When chunks are bundled, control chunks are placed before DATA chunks. Layout (field sizes in bytes): the common header is Source Port (2) | Destination Port (2); Verification Tag (4); Checksum (4). It is followed by Chunk 1 through Chunk N, each laid out as Type (1) | Flags (1) | Length (2); Value (N). The chunks can contain control messages or user data.
SCTP Header: Source Port and Destination Port use the same port concept as TCP and UDP. Verification Tag: exchanged between the endpoints at startup and used to validate the sender. Checksum: the packet is protected by a 32-bit checksum using the CRC32c algorithm (RFC 3309 replaced the Adler-32 checksum specified in the original RFC 2960). Layout (field sizes in bytes): Source Port (2) | Destination Port (2); Verification Tag (4); Checksum (4). Verification Tag: Two Verification Tags are used, one in each direction. The initiator of the association specifies an Initiate Tag in the INIT chunk. The value of this tag is used as the Verification Tag on all SCTP packets transmitted by the receiver of the INIT chunk. The receiver of the INIT chunk responds with an INIT-ACK chunk, which contains its own Initiate Tag. The value of this tag is used as the Verification Tag on all SCTP packets transmitted by the receiver of the INIT-ACK chunk.
SCTP Chunks: Each chunk has the layout Type (1 byte) | Flags (1 byte) | Length (2 bytes) | Value (N bytes). Type: distinguishes DATA chunks from the different types of control chunks. Flags: usage depends on the chunk type. Length: required because chunks have a variable length. Value: the payload field.
INIT Chunk: Layout (field sizes in bytes): Type = 1 (1) | Chunk Flags (1) | Chunk Length (2); Initiate Tag (4); Advertised Receiver Window Credit, a_rwnd (4); Number of Outbound Streams (2) | Number of Inbound Streams (2); Initial Transmission Sequence Number, TSN (4); Optional/Variable-Length Parameters. Initiate Tag: The receiver of the INIT records the value of the Initiate Tag parameter. This value MUST be placed into the Verification Tag field of every SCTP packet that the receiver of the INIT transmits within this association. Advertised Receiver Window Credit (a_rwnd): This value represents the dedicated buffer space the sender has reserved for this association. During the life of the association this buffer space SHOULD NOT be lessened; however, an endpoint MAY change the value of a_rwnd it sends in SACK chunks. Number of Outbound Streams (OS): Defines the number of outbound streams the sender of this INIT chunk wishes to create in this association. The field is 16 bits wide, so at most 65,535 streams can be requested. Number of Inbound Streams (MIS): Defines the maximum number of streams the sender of this INIT chunk allows the peer end to create in this association. Note: There is no negotiation of the actual number of streams; instead, each direction uses min(requested, offered). Initial TSN (I-TSN): Defines the initial TSN that the sender will use. The valid range is from 0 to 4294967295. This field MAY be set to the value of the Initiate Tag field. Optional Parameters: IPv4 Address Parameter, IPv6 Address Parameter, Cookie Preservative Parameter, Host Name Address Parameter.
DATA Chunk: Layout (field sizes in bytes): Type = 0 (1) | Reserved (5 bits), U, B, E (1) | Length (2); Transmission Sequence Number, TSN (4); Stream Identifier S (2) | Stream Sequence Number n (2); Payload Protocol Identifier (4); User Data (seq. n of Stream S). U bit: The (U)nordered bit, if set to '1', indicates that this is an unordered DATA chunk, and there is no Stream Sequence Number assigned to this DATA chunk. Therefore, the receiver MUST ignore the Stream Sequence Number field. After re-assembly (if necessary), unordered DATA chunks MUST be dispatched to the upper layer by the receiver without any attempt to re-order. If an unordered user message is fragmented, each fragment of the message MUST have its U bit set to '1'. B bit: The (B)eginning fragment bit, if set, indicates the first fragment of a user message. E bit: The (E)nding fragment bit, if set, indicates the last fragment of a user message. A message carried in a single DATA chunk has both B and E set. Length: This field indicates the length of the DATA chunk in bytes from the beginning of the Type field to the end of the User Data field, excluding any padding. A DATA chunk with no user data field will have Length set to 16 (indicating 16 bytes). TSN: This value represents the TSN for this DATA chunk. The valid range of TSN is from 0 to 4294967295 (2**32 - 1). TSN wraps back to 0 after reaching 4294967295. Stream Identifier S: Identifies the stream to which the following user data belongs. Stream Sequence Number n: This value represents the stream sequence number of the following user data within the stream S. Valid range is 0 to 65535. When a user message is fragmented by SCTP for transport, the same stream sequence number MUST be carried in each of the fragments of the message. Payload Protocol Identifier: This value represents an application (or upper layer) specified protocol identifier. This value is passed to SCTP by its upper layer and sent to its peer. This identifier is not used by SCTP but can be used by certain network entities as well as the peer application to identify the type of information being carried in this DATA chunk.
This field must be sent even in fragmented DATA chunks (to make sure it is available for agents in the middle of the network). The value 0 indicates that no application identifier is specified by the upper layer for this payload data. User Data: This is the payload user data. The implementation MUST pad the end of the data to a 4-byte boundary with all-zero bytes. Any padding MUST NOT be included in the Length field. A sender should never add more than 3 bytes of padding.
Selective Acknowledgement: [Figure: the sender transmits DATA chunks with TSNs 109 through 126; TSNs 114, 115, and 123 are lost.] The receiver answers with a SACK chunk: Cumulative TSN Ack = 113; Gap Ack Block #1: Start = +3, End = +9 (TSNs 116 through 122 received); Gap Ack Block #2: Start = +11, End = +13 (TSNs 124 through 126 received).
SACK Chunk: Layout (field sizes in bytes): Type = 3 (1) | Chunk Flags (1) | Chunk Length (2); Cumulative TSN Ack (4); Advertised Receiver Window Credit, a_rwnd (4); Number of Gap Ack Blocks = N (2) | Number of Duplicate TSNs = X (2); Gap Ack Block #1 Start (2) | Gap Ack Block #1 End (2); ...; Gap Ack Block #N Start (2) | Gap Ack Block #N End (2); Duplicate TSN 1 (4); ...; Duplicate TSN X (4). Chunk Flags: Set to all zeros on transmit and ignored on receipt. Cumulative TSN Ack: This parameter contains the TSN of the last DATA chunk received in sequence before a gap. Advertised Receiver Window Credit (a_rwnd): This field indicates the updated receive buffer space in bytes of the sender of this SACK. Number of Gap Ack Blocks: Indicates the number of Gap Ack Blocks included in this SACK. Number of Duplicate TSNs: This field contains the number of duplicate TSNs the endpoint has received. Each duplicate TSN is listed following the Gap Ack Block list. Gap Ack Blocks: These fields contain the Gap Ack Blocks. They are repeated for each Gap Ack Block up to the number of Gap Ack Blocks defined in the Number of Gap Ack Blocks field. All DATA chunks with TSNs greater than or equal to (Cumulative TSN Ack + Gap Ack Block Start) and less than or equal to (Cumulative TSN Ack + Gap Ack Block End) of each Gap Ack Block are assumed to have been received correctly. Gap Ack Block Start: Indicates the Start offset TSN for this Gap Ack Block. To calculate the actual TSN number, the Cumulative TSN Ack is added to this offset number. This calculated TSN identifies the first TSN in this Gap Ack Block that has been received. Gap Ack Block End: Indicates the End offset TSN for this Gap Ack Block. To calculate the actual TSN number, the Cumulative TSN Ack is added to this offset number. This calculated TSN identifies the TSN of the last DATA chunk received in this Gap Ack Block. Duplicate TSN: Indicates the number of times a TSN was received in duplicate since the last SACK was sent.
Every time a receiver gets a duplicate TSN (before sending the SACK), it adds it to the list of duplicates. The duplicate count is re-initialized to zero after sending each SACK. For example, if a receiver were to get the TSN 19 three times, it would list 19 twice in the outbound SACK. After sending the SACK, if it received yet one more TSN 19, it would list 19 as a duplicate once in the next outgoing SACK.
Multi-streaming (a.k.a. partial ordering) eliminates head-of-line (HOL) blocking. In TCP, all data must be delivered in order, so a loss at the head of the line delays delivery of all subsequent data. In SCTP, you can send over up to 64K independent streams, each ordered independently. A loss on one stream does not delay delivery on the other streams; i.e., multi-streaming eliminates HOL blocking between streams.
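Head-of-Line Blocking in TCP: [Figure: sender S transmits PDUs 1 through 6 to receiver R, which passes data up to R’s application. PDU 3 is lost. R delivers PDUs 1 and 2 and acknowledges them, but it holds PDUs 4, 5, and 6 and keeps answering with ACK 3.] PDU 3 is blocking the head of the line.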
Head-of-line Blocking: TCP provides a single data stream. When a segment is lost, subsequent segments must wait to be processed. This is a problem for some applications (e.g., telephony). SCTP provides multiple independent streams per association.
SCTP Multistreaming: Provides logical separation of data within an association. It is designed to prevent head-of-line blocking. It can be used to deliver multiple objects belonging to the same association, e.g., objects on a web page, multimedia streams (audio/video/text), or the files in an FTP mget.
Multi-homing: [Figure: End-Point A and End-Point Z connected across the Internet.] In TCP, a connection is made between one <IP addr, port> pair and another <IP addr, port> pair. If a host is multi-homed, you have to choose ONE IP address only at each end, and if that interface goes down, so does the connection. With SCTP, you can list as many IP addresses per endpoint as you like. If the host is still reachable through ANY of those addresses, the connection stays up.
SCTP Multi-Homing: [Figure: an endpoint with addresses IP A1 and IP A2 and an endpoint with addresses IP B1, IP B2, and IP B3, connected across an IP network.] Each endpoint uses multiple source/destination IP addresses. Use of different physical paths is not guaranteed. Peer reachability and path status are monitored with heartbeats. There is one selectable default destination. Parameters are kept per path (cwnd, ssthresh, RTT).
What is SCTP Multihoming? [Figure: Host A with addresses A1 and A2 and Host B with addresses B1 and B2, connected through ISPs and the Internet.] With TCP, the hosts pick 1 of 4 possible connections: {(A1, B1), (A1, B2), (A2, B1), (A2, B2)}. With SCTP, the hosts use 1 association: ({A1, A2}, {B1, B2}). Each host selects a “primary” destination, e.g., Host A → B1 and Host B → A1. New data is sent only to the primary destination.
Multihoming Operation: [Figure: SCTP Endpoint A (IP addresses A1, A2) sends DATA to SCTP Endpoint B (IP addresses B1, B2), which replies with a SACK.]
SCTP – Summary: Well suited for multimedia. Like TCP, it provides connection establishment, ensures reliability, and provides congestion control; it also supports both ordered and unordered delivery of data. In addition to TCP's features, it provides multi-homing and multi-streaming, and it has extra security features.
SCTP Socket Types: The SCTP socket API comes in two forms: one-to-one and one-to-many. The one-to-many style was at one time known as the “UDP-style” socket, and the one-to-one style used to be called the “TCP-style” socket. So what is the purpose of each socket style, and how can it be used?
One-to-One style: The purpose of the one-to-one style socket is to provide a smooth transition path for applications that run on TCP and wish to move to SCTP. This style uses the same semantics as TCP. A server typically opens and binds the socket, calls listen() to accept associations, and calls accept(), which blocks until a new association arrives. The only notable difference between a TCP socket and an SCTP socket is that the socket() call uses IPPROTO_SCTP instead of IPPROTO_TCP (or 0).
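One-to-One Example Server
int sd, newfd;
socklen_t sosz;
struct sockaddr_in6 sin6;
sd = socket(AF_INET6, SOCK_STREAM, IPPROTO_SCTP);
memset(&sin6, 0, sizeof(sin6));
sin6.sin6_family = AF_INET6;
sin6.sin6_addr = in6addr_any;
sin6.sin6_port = htons(5000);             /* example port */
bind(sd, (struct sockaddr *)&sin6, sizeof(sin6));
listen(sd, 1);
while (1) {
    sosz = sizeof(sin6);                  /* reset before each accept() */
    newfd = accept(sd, (struct sockaddr *)&sin6, &sosz);
    do_child_stuff(newfd, &sin6, sosz);   /* application-defined handler, not shown */
}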
One-to-Many style: A typical server using a one-to-many style socket calls socket(), bind(), and listen(), then receives messages with recvfrom() (or sctp_recvmsg()). A typical client simply calls sendto() to the server of its choice, and the association is set up implicitly. Note that connect() and accept() are not needed. Either side (server or client) may call connect(), but it is not required. This style resembles a UDP client/server, hence its earlier name.
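One-to-many Example Server
/* sctp_recvmsg() and struct sctp_sndrcvinfo come from <netinet/sctp.h>; on Linux, link with -lsctp */
int sd, len, msg_flags;
socklen_t sosz;
struct sockaddr_in6 sin6;
struct sctp_sndrcvinfo snd_rcv;
char buf[8000];
sd = socket(AF_INET6, SOCK_SEQPACKET, IPPROTO_SCTP);
memset(&sin6, 0, sizeof(sin6));
sin6.sin6_family = AF_INET6;
sin6.sin6_addr = in6addr_any;
sin6.sin6_port = htons(5000);             /* example port */
bind(sd, (struct sockaddr *)&sin6, sizeof(sin6));
listen(sd, 1);
while (1) {
    sosz = sizeof(sin6);                  /* reset before each call */
    msg_flags = 0;
    len = sctp_recvmsg(sd, buf, sizeof(buf), (struct sockaddr *)&sin6, &sosz, &snd_rcv, &msg_flags);
    do_child_stuff(sd, buf, len, &sin6, &snd_rcv, msg_flags);   /* application-defined handler */
}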
SCTP Notifications: At times, the SCTP stack has information it may wish to share with its application (or Upper Layer Protocol, ULP). The ULP can turn specific notifications on and off with a socket option call (e.g., SCTP_EVENTS). By default, ALL notifications are off. The application recognizes a notification while reading data by examining msg_flags: if the message read is a notification, MSG_NOTIFICATION is set in msg_flags on return.
Deciphering Notifications: Every notification uses a TLV format, as illustrated below. Types of notifications: SCTP_ASSOC_CHANGE, SCTP_PEER_ADDR_CHANGE, SCTP_REMOTE_ERROR, SCTP_SEND_FAILED, SCTP_SHUTDOWN_EVENT, ....
struct sctp_tlv {
    u_int16_t sn_type;
    u_int16_t sn_flags;
    u_int32_t sn_length;
};
Socket Options: SCTP provides a host of socket options to perform a myriad of operations. Some have their own structures; others simply turn things on and off with booleans or integers: SCTP_NODELAY, SCTP_MAXSEG, SCTP_ASSOCINFO, SCTP_AUTOCLOSE, SCTP_ADAPTION_LAYER (named SCTP_ADAPTATION_LAYER in current implementations), SCTP_DEFAULT_SEND_PARAM, SCTP_DISABLE_FRAGMENTS, ...
Extended “system calls”: sctp_connectx allows a user to specify multiple addresses to attempt to connect to. sctp_bindx allows an application to bind a set of addresses instead of one or all addresses. sctp_opt_info: some implementations do not support a getsockopt() call that passes data in both directions; this call works with all implementations. sctp_peeloff converts a single association that is part of a one-to-many socket into a new, individual socket descriptor that is a one-to-one socket.
Extended “system calls”: sctp_getpaddrs returns a block of memory holding the peer's addresses that are currently part of the association. sctp_freepaddrs releases the memory that sctp_getpaddrs allocated. sctp_getladdrs returns a block of memory holding the local addresses bound to an association. sctp_freeladdrs releases the memory allocated by sctp_getladdrs back to the system.
Extended “system calls”: sctp_sendmsg lets the caller pass, as function arguments, the stream number and other SCTP-specific information to be sent with a message. sctp_send has a similar purpose, but instead of a long list of arguments it takes an sctp_sndrcvinfo structure carrying the relevant information. sctp_recvmsg (as seen previously) receives a message together with an sctp_sndrcvinfo structure describing it (e.g., the stream number and stream sequence number).
Summary: SCTP is a new transport protocol, available now in bleeding-edge Linux and BSD kernels, and it will make its way into the mainstream. It has some cool new features.