Presentation is loading. Please wait.

Presentation is loading. Please wait.

Andrey Shomer & Daniel Yudelevich

Similar presentations


Presentation on theme: "Andrey Shomer & Daniel Yudelevich"— Presentation transcript:

1 Andrey Shomer & Daniel Yudelevich
THE THIRD TRANSPORT

2 What is SCTP? SCTP = Stream Control Transport Protocol
SCTP is a new IETF standard transport protocol (RFC2960) An Alternative to TCP and UDP

3 What’s so bad about TCP & UDP?
UDP: bare minimum just port numbers, and an optional checksum no flow control, no congestion control, no reliability or ordering TCP: a package deal flow control, congestion control, byte-stream orientation total ordering and total reliability

4 What you can do with SCTP?
Almost everything you can do with TCP and UDP Plus the following features NOT available in UDP or TCP Multi-homing Multi-streaming Message boundaries Improved SYN-flood protection Tunable parameters (Timeout, Retrans, etc.) A range of reliability and order (full to partial to none) along with congestion control

5 More Details Multi-homing improved robustness to failures
In TCP, connections made between <IP addr,port> and <IP addr, port> If a host is multi-homed, you have to choose ONE IP Addr only, at each end If that interface goes down, so does the connection With SCTP, you can list as many IP addresses per endpoint as you like If host is still reachable through ANY of those addresses, connection stays up! Multi-streaming reduced delay A.k.a. partial ordering. Eliminates Head of Line (HOL) blocking In TCP, all data must be sent in order; loss at head of line delays delivery of subsequent data In SCTP, you can send over up to 64K independent streams, each ordered independently A loss on one stream does not delay the delivery on other streams i.e. multi-streaming eliminates HOL blocking

6 More Details Message boundaries preserved easier coding
TCP repacketizes data any old way it sees fit (message boundaries not preserved) SCTP preserves message boundaries Application protocols easier to write, and application code simpler. Improved SYN-flood protection more secure TCP vulnerable to SYN flood; (techniques to combat are "bags on the side") Protection against SYN floods is built-in with SCTP (Four way handshake (4WHS) vs 3WHS in TCP) Listening sockets don't set up state until a connection is validated

7 More Details Tunable parameters (Timeout, Retrans, etc.) more flexibility Tuning TCP parameters requires system admin privs, kernel changes, kernel hacking SCTP parameters can be tuned on a socket by socket basis Congestion controlled unreliable/unordered data more flexibility TCP has congestion control, but can't do unreliable/unordered delivery UDP can do unreliable/unordered delivery, but not congestion controlled SCTP is always congestion controlled, and offers a range of services from full reliability to none, and full ordering to none. With SCTP, reliable and unreliable data can be multiplexed over same connection.

8 What you can’t do? byte-stream oriented communication
avoid congestion control UDP lets you blast away, but SCTP won't let you true on-the-wire connectionless communication connection setup required

9 Bits and Bytes – SCTP on wire
Datalink Header (e.g. Ethernet, , PPP) IP Header SCTP Common Header Chunk 1 ... one or more "chunks" Chunk N Datalink Trailer (e.g. Ethernet, , PPP)

10 SCTP Common Header Source and Destination Port: 16-bit port values
Source Port Destination Port Verification Tag CRC-32 Checksum Source and Destination Port: 16-bit port values Verification Tag: 32-bit random value selected by each endpoint in an association during setup Discriminates between two successive associations Protection mechanism against blind attackers

11 Chunks Chunk Type: 8-bit value indicating the type of chunk
Chunk Flags Chunk Length Chunk Data Chunk Type: 8-bit value indicating the type of chunk Chunk Flags: 8-bit flags, defined on per chunk type basis Chunk Length: 16-bit length in bytes, including the chunk type, chunk flags, and chunk length fields

12 Chunk Types There are more than 20 chunk types currently defined in SCTP (1) DATA (0x00) (2) INITIATION [INIT] (0x01) (3) INITIATION-ACKNOWLEDGMENT [INIT-ACK] (0x02) (4) SELECTIVE-ACKNOWLEDGMENT [SACK] (0x03) (5) HEARTBEAT (0x04) ... etc...

13 Connection Establishment
In TCP, the communication relationship between two endpoints is called a connection In SCTP, we would call this an association Socket pair: { <Local IP addr, port>, <Remote IP addr, port> } An SCTP association can be represented as a pair of SCTP endpoints assoc = { [ : 2223], [ , : 80] }

14 Handshake Endpoint A Endpoint Z Association Is Up INIT INIT-ACK
COOKIE-ECHO COOKIE-ACK Association Is Up

15 Data Chunk Type=0x00 Flags=UBE Length=variable TSN Value Stream Identifier Stream Sequence Num Variable Length User Data Payload Protocol Identifier Flag Bits: U – Unordered Data B – Begin E-End (for fragmentation) TSN: transmission sequence num for ordering, reassembly, retransmission Stream Identifier: the stream number for this DATA Stream Sequence Number: orders this DATA chunk within the stream Payload Protocol Identifier: opaque value used by the endpoints User Data: the user message (or portion of)

16 Byte-stream vs. Messages
When data is transferred in TCP, the user gets a stream of bytes (not to be confused with SCTP streams) Users must “frame” their own messages if they are not transfering a stream of bytes (ftp might be considered an application that sends a stream of bytes) An SCTP user will send and receive messages. All message boundaries are preserved A user will always read either ALL of a message or in some cases part of a message

17 Using Streams Streams are a powerful mechanism that allows multiple ordered flows of messages within a single association. Messages are sent in their respective streams and if a message in one stream is lost, it will not hold up delivery of a message in the other streams The application specifies the stream number to send a message on using its API interface

18 IP Multi-homing IP Network
NI-1 NI-2 Endpoint-1 Endpoint-2 IP Network When a peer is multi-homed, a “primary destination address” is selected by the SCTP endpoint By default, all data is sent to this primary address. When the primary address fails, the sender selects an alternate primary address until it is restored or the user changes the primary address.

19 Failure Detection SCTP has two methods of detecting fault:
Heartbeats Data retransmission thresholds Two types of faults can be discovered: An unreachable address An unreachable peer

20 Unreachable Destination Address
NI-1 NI-2 Endpoint-1 Endpoint-2 IP Network X

21 Unreachable Peer: Network Failure
NI-1 NI-2 Endpoint-1 Endpoint-2 IP Network X

22 Unreachable Peer: Endpoint Failure
NI-1 NI-2 IP Network NI-1 NI-2 IP Network

23 Heartbeat Monitoring Mechanism
A HEARTBEAT is sent to any destination address that has been idle for longer than the heartbeat period A destination address is idle if no chunks that can be used for RTT updates have been sent to it e.g. usually DATA and HEARTBEAT The heartbeat period timer is reset any time a DATA or HEARTBEAT are sent The peer responds with a HEARTBEAT-ACK Just after association setup, Heartbeats will occur at a faster rate to “confirm” addresses

24 Implementations Many implementation of SCTP protocol exist, worth mentions are the Kernel implementations in FreeBSD (by Randall Stewart and Peter Lei of Cisco) and the Linux one (kernels 2.6+, sponsored by Motorola, Nokia, and IBM). Both of them support a standard TCP\UDP socket interface as well as linkable library to include more advanced SCTP features. An alternative also exists, SCTPLIB is a non-commercial project (released under GNU) working under FreeBSD, Linux, Mac OS X, Solaris and even Windows. However being a user-level implementation it requires the SCTP server to have special privileges and all the SCTP applications to register with it and communicate using the interprocess communication (IPC) which is implemented in the source code – which may affect performance in certain high load scenarios.

25 Load Sharing SCTP load sharing potentially provides significant increases in transport protocol performance and network efficiency. The article compares between 2 existing load sharing options as well as introduces a new one Note: Comparison is between transport level load sharing only

26 SCTP with loadsharing extensions (LS-SCTP)
Suggested by Ahemd Abd El Al,Tarek N. Saadawi, Myung J. Lee in ComCom (2004) New SCTP chunk types and additional association setup parameters for their load sharing extension to SCTP. Additional, path related sequence numbers and time stamps in new SCTP data chunks and acknowledgments Simplifies the handling of SCTP packets when load-sharing is used, however its not compatible with current SCTP implementations Overhead is bigger, while the same data could be extracted from sender information and correct interpretation of selective ACKs received by sender.

27 Concurrent multi-path transfer (CMT)
Suggested by Janardhan R. Iyengar, Armando L. Caro, Jr., Paul D. Amer, Gerard J. Heinz (University of Dalware) and Randall R. Stewart (Cisco, same guy who implemented the FreeBSD kernel) When the ccwnd of a current link does not allow any more data, the link is switched and data is being sent via the 2nd link (until all ccwnds are fully exploited). A problem could occur when the receiver notices the gaps in data and lats the sender know of that issue, this could cause immediate unnecessary multiple fast-retransmissions and needlessly reduces the ccwnd size (which significantly affects the throughput). CMT deals with this issue by increasing the gap counter only when the data chunk is reported missing by an incoming SACK or when higher TSN are already acknowledged for the path on which those chunks are sent. Allows faster congestion window growth using a new field : CTSNA (stores the highest TSN which was transmited on the path without discontinuity) Delaying selective acks appropriately so that unnecessary SACKs are not transmitted, flags are in use to ensure retransmissions are triggered in time

28 Path based selective acknowledgements
Published byAndreas Jungmaier and Erwin P. Rathgeb in 2006 (Telecommun Syst 31) Proposes Path Based SACKs (PB-SACK), meaning that instead of keeping a SACK counter per association (like in CMT) there is one for every link. The SCTP Receiver sends one SACK per every two PB-SACKs, however it could still send 2 SACKs for the problem described in CMT – yet a single SACK per two data chunks on average is still better. Could be described as an improvement to CMT protocol

29 Algorithm For all paths d(i), set the flag d(i).saw_new_path_sack = FALSE. For all paths d(i), set the flag d(i).new_pCTSNA = FALSE. For any path d(i), for which a data chunk has been newly acknowledged, set the flag d(i).saw_new_path_sack = TRUE. For any d(i) for which d(i).saw_new_path_sack = TRUE, find the highest TSN newly acknowledged. Store this value in d(i).highest_path_tsn_acked. For any d(i) for which d(i).saw_new_path_sack = TRUE, store the number of bytes newly acknowledged in d(i).newly acked bytes. For any d(i) for which d(i).saw_new_path_sack = TRUE, find the corresponding d(i).pCTNSA If d(i).pCTSNA was advanced by the SACK that is being processed, set the flag d(i).new_pCTSNA = TRUE. For any d(i) for which d(i).new_pCTSNA = TRUE, and for which the number of outstanding bytes is higher than the congestion window d(i).cwnd, increase the congestion window as required in sections and of RFC 2960, e.g., in slow start, if the number of outstanding bytes on d(i) exceeds d(i).cwnd, the congestion window is increased by d(i).cwnd+ = min(d(i).newly_acked_bytes, d(i).pMTU). If the SACK chunk contains gap reports, check for any data chunk t remaining in the retransmission queue that is reported missing and was sent to path dt , whether dt.saw_new_path sack= TRUE and t < dt.highest_path_tsn_acked. If so, increase the gap counter for t. If this counter reaches the threshold (e.g., 4), perform a fast retransmission as per Section of RFC 2960.

30 The Simulation Simulated in OPNET Two 34Mbit/S links (as in E3)
Path 1 – 10ms delay, Path 2 – variable delay between 10ms and 200ms

31 Simulation Results Note: ICMP is taken into account

32 Simulation Results

33 Simulation Results

34 Conclusions Consider a parameter rd as delay2/delay1.
Throughput: For rd<9 we benefit from multi-homing (notice that PB-SACK is still better than CTM throughout the graph) in two identical links, afterwards it is counter-productive to use multi-homing (both algorithms reach a state where they can't guarantee an affective load distribution anymore). At this point both algorithm rely almost exclusively on path 1 and the throughtput is limited almost exclusively by the receiver window causing an increased message delay.

35 Conclusions CWND: As expected the total congestion window is higher for PB-SACK than for CMT (since the throughput is higher at the same delay), nevertheless this is not the case for rd>10. At rd=15 the CWND is drastically lower,however the throughput is higher by almost 500KB/s. This shows that the CMT way was not explicitly correct and the aggregated CWND is not the biggest factor on throughput. The more different the links become it becomes clearer that per-path based ACKs are getting better results than just optimizing the aggregate CWND size To summarize, while in similar links CMT shows great results (and LS-SCTP should not be considered since it includes incompatible protocol extensions) PB-SACK is proven superior in inhomogeneous links.


Download ppt "Andrey Shomer & Daniel Yudelevich"

Similar presentations


Ads by Google