1
BREAKING THE DATA TRANSFER BOTTLENECK
UDT: A High Performance Data Transport Protocol
Yunhong Gu
Laboratory for Advanced Computing, National Center for Data Mining, University of Illinois at Chicago
October 10, 2005
udt.sourceforge.net
2
Outline
INTRODUCTION, PROTOCOL DESIGN & IMPLEMENTATION, CONGESTION CONTROL, PERFORMANCE EVALUATION, COMPOSABLE UDT, CONCLUSIONS
In this talk I will describe UDT's design, implementation, and evaluation. At the end I will conclude with a summary of the contributions and both the short-term and long-term visions.
3
>> INTRODUCTION
PROTOCOL DESIGN & IMPLEMENTATION, CONGESTION CONTROL, PERFORMANCE EVALUATION, COMPOSABLE UDT, CONCLUSIONS
But first, I will try to give you enough background information for this work.
4
Motivations
The widespread use of high-speed networks (1 Gb/s, 10 Gb/s, etc.) has enabled many new distributed data intensive applications: inexpensive fiber and advanced optical networking technologies (e.g., DWDM, Dense Wavelength Division Multiplexing); 10 Gb/s is common in high-speed network testbeds, and 40 Gb/s is emerging.
Large volumetric datasets: satellite weather data, astronomy observations, network monitoring.
The Internet transport protocol (TCP) does NOT scale well as the network bandwidth-delay product (BDP) increases. A new transport protocol is needed!
In the past few years, network bandwidth has increased rapidly. Five years ago a 1 Gb/s link could be regarded as extremely fast, but today 10 Gb/s links are common on high-speed testbeds, and 40 Gb/s links are emerging. The widespread use of high-speed networks has enabled many new distributed data intensive applications that were impossible in the past. Examples of the large volumetric datasets involved in these applications include satellite weather data, astronomy observation data, and network monitoring data. In the past these data were usually stored locally and then delivered or processed in batch mode. Today they can be transferred to a remote site in real time and processed there. Unfortunately, existing applications cannot automatically utilize high-speed networks when they move from the traditional, relatively slower Internet: TCP significantly underutilizes the bandwidth in high-speed, long-distance networks. With this background, we need a new transport protocol to support these distributed data intensive applications.
5
Data Transport Protocol
Functionalities: streaming vs. messaging, reliability, timeliness, unicast vs. multicast.
Congestion control: efficiency, fairness, convergence, distributedness.
[Figure: protocol stack, from Applications through the Transport Layer, Network Layer, Data Link Layer, and Physical Layer.]
Let's briefly review how applications use the physical network. Applications cannot make use of the physical network directly; they do so through several layers of network protocols. Transport protocols provide data transfer functionality directly to applications. One typical example of transport functionality is whether the data transfer is reliable, or how reliable it is. Meanwhile, some mechanics of transport protocols are transparent to applications. Because the Internet is such a large and loosely coupled system, it is impossible for a central node to manage all connections. Each connection has to tune its sending rate by itself, without knowing of the existence of other connections. This is the Internet congestion control problem, and a good congestion control algorithm should achieve relatively good efficiency and fairness throughout the whole system, no matter what the initial state is.
6
TCP: reliable, data streaming, unicast.
Congestion control (AIMD): increase the congestion window (cwnd) by one full-sized packet per RTT; halve cwnd per loss event (so the window oscillates around ½ · bandwidth · RTT). Consequences: poor efficiency in high bandwidth-delay product networks; bias against flows with larger RTT.
The most common transport protocol is TCP, the Transmission Control Protocol, developed in the early 70s. TCP provides a reliable data streaming service in a unicast way. Here we are more interested in how TCP performs congestion control, in order to understand why it is not efficient in high-bandwidth wide area networks. TCP's congestion control mechanism is called AIMD, additive increase multiplicative decrease, because it increases the congestion window by about one full-sized packet per RTT, and halves it once a packet loss is detected. A longer RTT means TCP needs a longer time to reach the maximum throughput, i.e., the available network bandwidth. If there is a packet loss during this period, TCP may never reach the optimal throughput. Because TCP's performance depends on RTT, concurrent flows with different RTTs will achieve different throughput. This is another problem of TCP, known as RTT bias.
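As a minimal sketch (my own illustration, not part of UDT or any TCP stack), the AIMD rule just described can be written as:

```python
def aimd_update(cwnd, loss_event):
    """One AIMD step on the congestion window (in segments):
    additive increase of one segment per RTT, multiplicative
    decrease (halving) on each loss event."""
    if loss_event:
        return max(cwnd / 2.0, 1.0)   # multiplicative decrease
    return cwnd + 1.0                  # additive increase, once per RTT

# Recovering from a single loss takes cwnd/2 RTTs, which is why
# long-RTT paths need a long time to regain full throughput.
cwnd = 100.0
cwnd = aimd_update(cwnd, loss_event=True)    # halved to 50.0
cwnd = aimd_update(cwnd, loss_event=False)   # grows to 51.0
```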
7
TCP performance example
[Figure: TCP throughput (Mb/s, 0-1000) over a 1 Gb/s link vs. round trip time (1, 10, 100, 200, 400 ms; LAN, US, US-EU, US-ASIA) at packet loss rates of 0.01%, 0.05%, 0.1%, 0.5%, and 1.0%; queue size = 1000.]
Let's use a real example to see TCP's performance, over a 1 Gb/s link with varying RTT and loss rate. On local area networks, where the RTT is about 1 ms and the loss rate is low (0.01%), TCP works well; as the loss rate increases, TCP's throughput decreases. Now let's see what happens as the RTT becomes larger: TCP's performance drops sharply, even at a low loss rate.
8
Related Work
TCP variants: HighSpeed, Scalable, BiC, FAST, H-TCP, L-TCP
Parallel TCP: PSockets, GridFTP
Rate-based reliable UDP: RBUDP, Tsunami, FOBS, FRTP (based on SABUL), Hurricane (based on UDT)
XCP
SABUL
People have developed several solutions to address the problem. The first kind of solution is quite straightforward: since the original AIMD algorithm does not work well, a direct fix is to modify TCP with larger increase parameters and smaller decrease factors. Almost all of the high-speed TCP variants fall into this category. Modifying TCP is not easy in practice. It is very hard in closed-source OSes such as Windows; even in open-source OSes such as Linux, it requires recompiling the kernel. In the past three decades only four versions of TCP have been widely deployed. Because of this deployment problem, user-space solutions have emerged as a favorite in real-world applications. Parallel TCP connections are often used to obtain high throughput. The other user-space solution is to add reliability and congestion control mechanisms on top of UDP. XCP is a radical change to Internet congestion control, as it requires changes in the routers. Due to the additional information from routers, XCP can work better than the other protocols, but it is also more difficult to deploy. SABUL is a reliable UDP-based protocol; it is the early prototype of UDT.
9
Problems of Existing Work
Hard to deploy: TCP variants and XCP need modifications in the OS kernel and/or routers.
Cannot be used in shared networks: most reliable UDP-based protocols have poor fairness (intra-protocol fairness, RTT fairness) and require manual parameter tuning.
Even though there are a number of new protocols, the basic problem has not been completely resolved. I have mentioned that TCP variants and XCP are hard to deploy, and therefore difficult to use in real-world applications. Meanwhile, most reliable UDP-based protocols were designed only for private links, so the sole purpose of their congestion control algorithms is to use as much bandwidth as possible. In addition, many of these protocols need their parameters tuned manually for each network situation, which is impractical in many cases.
10
A New Protocol
[Figure: the same throughput vs. RTT and loss-rate chart as on the earlier TCP slide, with the target for the new protocol: near-full utilization of the 1 Gb/s link across LAN, US, US-EU, and US-ASIA RTTs.]
So we want to design and develop a new protocol. This new protocol is expected to use the bandwidth much more efficiently than TCP, while sharing the network with fairness and friendliness.
11
UDT (UDP-based Data Transfer Protocol)
Application level, UDP-based. Similar functionality to TCP: connection-oriented, reliable, duplex, unicast data streaming. New protocol design and implementation; new congestion control algorithm; configurable congestion control framework.
That concludes the background information. Now I will describe the protocol itself, UDT. UDT is built above UDP at the application level, so it can be installed with ease. It provides functionality similar to TCP's, so existing applications can be moved to UDT with little effort and application developers do not need much time to learn UDT's semantics. To address efficiency and fairness, UDT provides a new protocol design and implementation, as well as a new congestion control algorithm. Furthermore, UDT itself has been designed as a configurable protocol framework.
12
Objective & Non-objective
For distributed data intensive applications in high-speed networks, where a small number of flows share the abundant bandwidth. Efficient, fair, and friendly. Configurable. Easily deployable and usable.
Non-objective: replace TCP on the Internet.
The new protocol is particularly designed for distributed data intensive applications where a small number of bulk flows share the abundant optical bandwidth. The protocol has to be efficient, fair, and friendly; it should be easily deployable and usable; and we want it to be configurable, to adapt to more specific situations. However, it is not meant to replace TCP, and it is not meant to be used where TCP has been working well.
13
UDT Project Open source (udt.sourceforge.net)
Design and implement the UDT protocol. Design the UDT congestion control algorithm. Evaluate experimentally the performance of UDT. Design and implement a configurable protocol framework based on UDT (Composable UDT). This is the task list of the UDT project. In the rest of the talk, I will give a detailed description of each part.
14
>> PROTOCOL DESIGN & IMPLEMENTATION
INTRODUCTION, >> PROTOCOL DESIGN & IMPLEMENTATION, CONGESTION CONTROL, PERFORMANCE EVALUATION, COMPOSABLE UDT, CONCLUSIONS
I will start with the UDT protocol design and implementation, in order to give you an overall view of the protocol.
15
UDT Overview
Two orthogonal elements: the UDT protocol, and the UDT congestion control algorithm.
Protocol design & implementation: functionality, and part of efficiency.
Congestion control algorithm: efficiency, fairness, friendliness, and stability.
In fact, the UDT project can be treated as two orthogonal elements: the UDT protocol and the UDT congestion control algorithm. The UDT protocol can work with many different congestion control algorithms, whereas our congestion control algorithm can be used in other protocols (for example, in TCP to form a new TCP variant). On the one hand, the protocol design and implementation determines the functionality of the protocol, and it also contributes partly to its efficiency. On the other hand, the congestion control algorithm determines fairness, friendliness, stability, and part of efficiency. Note that both elements affect efficiency: the former through processor-time usage, the latter through the tuning of the data sending rate.
16
UDT Overview
[Figure: layering. Applications call the UDT socket API (or the TCP socket API); UDT transfers data over UDP through the system UDP socket API.]
Applications make use of transport protocols such as TCP through a socket API. To use UDT, applications interact with the UDT socket API, while UDT uses the system socket API to transfer data over UDP.
17
Functionality
Reliability: packet-based sequencing; acknowledgments and loss reports from the receiver; ACK sub-sequencing; retransmission (based on loss reports and timeouts).
Streaming: buffer/memory management.
Connection maintenance: handshake, keep-alive messages, teardown message.
Duplex: each UDT instance contains both a sender and a receiver.
Let's review the functionality first. Reliability is provided by sequencing and acknowledgment: each UDT packet is assigned a unique, increasing sequence number; the receiver sends back acknowledgments and loss reports according to packet arrival; and lost packets are retransmitted. The streaming service is implemented in buffer management. UDT also has a series of mechanisms to set up, maintain, and tear down a UDT connection. Finally, each UDT instance has both a sender and a receiver, providing a duplex data transfer service.
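To illustrate the sequencing-based loss detection described above, here is a hypothetical sketch (the helper name `detect_losses` is mine; the real UDT logic is in C++ and considerably more involved):

```python
def detect_losses(expected_seq, arrived_seq):
    """When packet `arrived_seq` arrives and `expected_seq` was the next
    in-order sequence number, any gap marks lost packets; the receiver
    would report these back to the sender in a loss report (NAK)."""
    if arrived_seq > expected_seq:
        return list(range(expected_seq, arrived_seq))  # missing packets
    return []  # in order (or a retransmission): nothing to report

# A receiver expecting packet 5 that sees packet 8 arrive would report
# packets 5, 6, and 7 as lost, and the sender would retransmit them.
```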
18
Protocol Architecture
[Figure: two UDT instances, A and B, connected by a UDP channel. A's sender transmits data packets (sequence number, timestamp, payload); B's receiver returns ACK and NAK control packets and maintains a loss list.]
This is an abstract view of a UDT instance. A timer is used to trigger various timeout events. To send user data to another UDT instance, a single UDP channel is set up (this channel exists only logically, as UDP is not connection-oriented). Sender A sends user data in UDP packets to receiver B; each packet is assigned a unique sequence number and a timestamp. Receiver B periodically sends back acknowledgments (ACK) and loss reports (NAK).
19
Software Architecture
[Figure: modules inside one UDT instance: API, listener, UDP channel, sender and receiver, congestion control (CC), sender's and receiver's buffers, sender's and receiver's loss lists.]
The protocol architecture naturally evolves into the software architecture of the UDT implementation. In addition to the sender and the receiver, an API module interacts with applications, and a UDP channel module transfers packets over UDP. Furthermore, buffer management modules are necessary on both the sender and receiver sides. At this point a traffic flow can be set up between two UDT instances. However, to add reliability control and congestion control on top of UDP, two modules manage packet loss information on the two sides, and a congestion control module on the sender side tunes the packet sending rate. This is now an almost complete UDT instance. Control information is passed among these modules: applications can set and read UDT options through the API module, and such configuration commands are passed on to the other related modules. A listener is needed to support the listen/accept server mode. Finally, applications can control the congestion control module to configure specific control algorithms; we will describe this in the Composable UDT section.
20
Efficiency Considerations
Fewer packets: timer-based acknowledging.
Less CPU time: reduce per-packet processing time; reduce memory copying; reduce loss-list processing time; light ACKs vs. regular ACKs; parallel processing (threading architecture).
Less burst in processing: evenly distribute the processing time.
As I have mentioned, design and implementation affect the efficiency of a transport protocol, mainly through CPU usage. For example, to transfer data at 1 Gb/s with 1500-byte packets, about 83,000 packets have to be processed every second. If the per-packet processing time is too long, throughput will be limited by the CPU; if there is a CPU-time burst in packet processing, the incoming packets may overflow the UDP buffer. We have applied many techniques in the implementation based on these efficiency considerations, including memory copy avoidance, timer-based acknowledging, and so on.
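The packet-rate arithmetic behind this consideration can be checked with a one-liner (the 1500-byte packet size is the usual Ethernet MTU assumption):

```python
def packets_per_second(rate_bps, pkt_bytes=1500):
    """Packets that must be processed per second at a given line rate."""
    return rate_bps / (pkt_bytes * 8)

# At 1 Gb/s with 1500-byte packets this is roughly 83,000 packets/s,
# leaving only about 12 microseconds of CPU budget per packet.
```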
21
Application Programming Interface (API)
Socket API with new functionality: sendfile/recvfile, overlapped IO support. Transparent to existing applications (recompilation needed; certain limitations exist). XIO support (in Globus Toolkit 4.0). Wrappers for other programming languages: Java, Python.
In this last slide on UDT design and implementation, I would like to say a little about the UDT API: not only because almost half of the UDT code deals with the API, but because the API semantics are effectively part of a transport protocol. They affect the design and implementation, and they affect application development as well. We kept the traditional socket interface, with small updates, so that application developers do not need much time to learn UDT programming. In addition, we made an effort so that, under certain conditions, existing applications can switch from TCP to UDT without any modifications.
22
>> CONGESTION CONTROL
INTRODUCTION, PROTOCOL DESIGN & IMPLEMENTATION, >> CONGESTION CONTROL, PERFORMANCE EVALUATION, COMPOSABLE UDT, CONCLUSIONS
Hopefully by now you know basically how the UDT protocol works. Now let's review how UDT tunes its packet sending rate to achieve both efficiency and fairness at the same time.
23
Overview
Congestion control vs. flow control: congestion control effectively utilizes the network bandwidth; flow control prevents the receiver from being overwhelmed by incoming packets.
Window-based vs. rate-based: window-based control tunes the maximum number of in-flight packets (TCP); rate-based control tunes the inter-packet sending time (UDT).
AIMD: additive increase, multiplicative decrease.
Feedback: packet loss (most TCP variants, UDT); delay (Vegas, FAST).
Before we go into the details, let me clarify several issues about UDT's congestion control algorithm. By "congestion control" we mean the mechanism to effectively utilize the network bandwidth; the term "flow control" usually names the mechanism that prevents incoming traffic from overwhelming the receiver. Congestion control is the more difficult problem. Congestion control can tune the data sending rate in two ways: limiting the number of in-flight packets, or setting the inter-packet time. TCP uses the window-based approach, whereas UDT uses the rate-based approach. The most common congestion control algorithm is TCP's AIMD, which we have already discussed. Another important factor in congestion control is how to detect network congestion: most protocols, including both TCP and UDT, use packet loss information; an increase in round-trip delay can also indicate congestion somewhere along the path.
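The difference between the two approaches is easy to state in code. This sketch (my own illustration, not UDT source) shows how a rate-based sender derives the pacing interval it sends packets with, assuming 1500-byte packets:

```python
def inter_packet_interval(rate_bps, pkt_bytes=1500):
    """Seconds between packet departures for a target sending rate.
    A rate-based protocol (like UDT) tunes this interval directly;
    a window-based protocol (like TCP) instead caps the number of
    unacknowledged in-flight packets."""
    return (pkt_bytes * 8) / rate_bps

# At 1 Gb/s, a 1500-byte packet must leave every 12 microseconds.
```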
24
AIMD with Decreasing Increases
x ← x + α(x), for every constant interval (e.g., RTT)
x ← (1 - β) · x, when there is a packet loss event
where x is the packet sending rate. For TCP, α(x) ≡ 1 (one full-sized packet), the increase interval is the RTT, and β = 0.5.
AIMD with decreasing increases: α(x) is non-increasing, and α(x) → 0 as x → +∞.
AIMD has many good properties, including global stability, and its behavior is well understood. The AIMD algorithm can be modeled by an increase function α(x) and a decrease factor β. For example, in TCP, α(x) always equals 1 and β is 0.5. UDT uses a special case of the AIMD algorithm, in which the increase function α(x) is non-increasing and approaches zero as x increases. (x unit: packets/sec)
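A generic update step for this family of algorithms, as an illustrative sketch (plain TCP is recovered as the special case α(x) = 1, β = 0.5):

```python
def aimd_di_step(x, alpha, beta, loss):
    """One AIMD update of sending rate x: additive increase by alpha(x)
    per control interval, multiplicative decrease by factor beta on a
    loss event. 'Decreasing increases' means alpha is a non-increasing
    function of x that tends to 0 as x grows."""
    if loss:
        return (1.0 - beta) * x
    return x + alpha(x)

# Plain TCP as the special case: alpha(x) = 1 packet/RTT, beta = 0.5.
tcp_rate = aimd_di_step(100.0, lambda x: 1.0, 0.5, loss=False)  # 101.0
```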
25
AIMD with Decreasing Increases
[Figure: α(x) vs. x for standard AIMD (TCP NewReno: constant), HighSpeed TCP and Scalable TCP (increasing), and UDT (decreasing).]
This figure illustrates the difference between the AIMD variants used in several TCP variants and in UDT. The standard AIMD used in TCP is a horizontal line, because α(x) is a constant. In several high-speed TCP variants, the sending rate increases faster and faster; the intuition is that a higher sending rate may indicate higher network bandwidth, so the rate should increase faster. However, this scheme has two problems. First, the network bandwidth is fixed: as the sending rate grows, the available bandwidth actually shrinks, so a large increase parameter may cause large oscillations around the optimal bandwidth utilization. The second problem is related to fairness: if the flow with the higher sending rate increases faster, the system may never converge to a fairness equilibrium. This discussion leads us to a completely different scheme. On the one hand, at the beginning, when the sending rate of a flow is low, it should increase faster, to approach the network bandwidth and catch up with flows that already have a higher rate. On the other hand, when the sending rate of a flow is high, it should increase more slowly, to avoid large oscillations.
26
UDT Control Algorithm
Increase: α(x) = f(B - x) · c, where B is the link capacity (bandwidth) and c is a constant parameter. Constant rate-control interval (SYN), independent of RTT: SYN = 0.01 seconds.
Decrease: randomized decrease factor, β = 1 - (8/9)^n.
Here is the control algorithm used in UDT. The increase α(x) is related to the difference between the network bottleneck bandwidth and the current sending rate. As x increases, (B - x) decreases, and we want f to be a non-decreasing function. The UDT increase function actually uses a filter to project (B - x) onto powers of 10, which results in a stepwise function; this makes it robust to bandwidth estimation error, as I will explain shortly. UDT uses a constant rate-control interval in order to achieve RTT fairness and fast convergence on long-RTT links. We will describe how the network bandwidth B is estimated shortly; first, let's look at an example of how UDT increases its sending rate. (x unit: bits/sec)
27
The Increase Formula: an Example
Bandwidth (B) = 10 Gb/s, packet size = 1500 bytes.
x (Mb/s)       B - x (Mb/s)    Increment (pkts/SYN)
[0, 9000)      (1000, 10000]   10
[9000, 9900)   (100, 1000]     1
[9900, 9990)   (10, 100]       0.1
[9990, 9999)   (1, 10]         0.01
[9999, …)      (0.1, 1]        0.001
…              < 0.1           …
This is an example over a 10 Gb/s link with 1500-byte packets. A UDT flow starts at 0 and increases its sending rate by 10 packets per SYN until it reaches 9 Gb/s. After that, the increase parameter becomes 1 packet per SYN, then 0.1 packet per SYN, and so on. In fact, UDT scales up to 90% of the bandwidth in approximately constant time: 750 SYN intervals, or 7.5 seconds. This makes UDT scalable as network bandwidth and delay increase.
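The stepwise table can be reproduced by projecting the residual bandwidth onto powers of ten. This is my reconstruction of the rule implied by the table rows, not UDT source code:

```python
import math

def increment_per_syn(B_mbps, x_mbps):
    """Packets added per SYN interval (0.01 s), following the table:
    10^(ceil(log10(B - x)) - 3) for the residual bandwidth B - x in
    Mb/s (illustrative reconstruction; real UDT works in bytes and
    keeps a small minimum increment)."""
    residual = B_mbps - x_mbps
    if residual <= 0:
        return 0.0  # placeholder; the protocol keeps a tiny minimum here
    return 10.0 ** (math.ceil(math.log10(residual)) - 3)

# With B = 10 Gb/s, a flow below 9000 Mb/s gains 10 pkts/SYN, i.e.
# about 12 Mb/s per 0.01 s, reaching 9 Gb/s in 750 SYNs (7.5 s).
```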
28
Dealing with Packet Loss
Loss synchronization: randomization method.
Non-congestion loss: do not decrease the sending rate for the first packet loss.
Packet reordering: delay the loss report.
[Figure: loss events within a congestion event, e.g., M = 5, N = 2 and M = 8, N = 3.]
Now that we know how UDT increases its sending rate, let's check how it decreases it. Because UDT uses packet loss as the indication of network congestion, these losses must be handled carefully. There are three special situations. The first is loss synchronization: once congestion occurs, all flows may experience packet loss at the same time, so they decrease their sending rates at the same time and then start increasing at the same time; as a consequence, the aggregate throughput oscillates much more than any individual flow. We use a randomization method to remove this synchronization. In the figure, each dot is a loss event, and the red dot represents the first loss event in a congestion event (there can be multiple loss events during one congestion event). If the number of loss events in one congestion event is M, we choose a random number N between 1 and M; then on every N-th loss event, the sending rate is decreased by 1/9. The second situation is packet loss not caused by network congestion. To address it, the sending rate is not decreased for the first loss: a non-congestion loss then does not affect UDT, while a loss due to congestion will be followed by further losses if the rate is not decreased. The third situation is packet reordering. Here the loss report can be delayed to check more incoming packets, with the length of the delay determined from historical data.
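The first two rules can be sketched together. This is an illustrative reading of the scheme (class and method names are mine; the real logic lives in the UDT C++ sender), under the assumption that the 1/9 cut is applied on every N-th loss within a congestion event:

```python
import random

class RandomizedDecrease:
    """Sketch of the decrease rules above: the first loss in a
    congestion event does not cut the rate (it may be a non-congestion
    loss); afterwards the rate is cut by 1/9 on every N-th loss, with
    N drawn uniformly from [1, M] so that concurrent flows do not all
    decrease in lockstep (loss synchronization)."""

    def __init__(self, M=5, seed=None):
        self.rng = random.Random(seed)
        self.N = self.rng.randint(1, M)  # re-drawn per congestion event
        self.losses = 0

    def on_loss(self, rate):
        self.losses += 1
        if self.losses == 1:             # tolerate the first loss
            return rate
        if self.losses % self.N == 0:    # every N-th loss: rate *= 8/9
            return rate * 8.0 / 9.0
        return rate
```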
29
Bandwidth Estimation: Packet Pairs and Filters
Bottleneck bandwidth = packet size / inter-packet spacing.
Filters: against cross traffic, interrupt coalescence.
Robust to estimation errors; packet pairs sent at randomized intervals.
We have one last open issue in the UDT congestion control algorithm: how do we know the bandwidth B? UDT uses a packet-pair-based bandwidth estimation technique. UDT sends out a full-sized, back-to-back packet pair at randomized intervals. Once the pair passes through a narrow link segment, there will be a gap between the two packets; at the receiver side, the bottleneck bandwidth can be estimated as the packet size divided by the inter-packet spacing. That is the ideal situation. In the real world there is noise from cross traffic, host interrupt coalescence, context switches, and so on. On the one hand, we use median and mean filters to suppress these effects; on the other hand, the errors cannot be completely removed, so UDT has to be robust to estimation error. That is why we use a projection filter in the congestion control formula.
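An idealized sketch of the receiver-side computation (function name is mine; real UDT also applies a mean filter and is implemented in C++):

```python
from statistics import median

def estimate_bandwidth(pkt_bytes, gaps_sec):
    """Packet-pair estimate: each probe yields packet size divided by
    the measured inter-packet gap; taking the median over many probes
    suppresses outliers caused by cross traffic, interrupt coalescence,
    and context switches."""
    return median(pkt_bytes * 8 / g for g in gaps_sec)

# Two clean 12-microsecond gaps and one noisy 24-microsecond gap
# still yield an estimate of about 1 Gb/s for 1500-byte packets.
```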
30
>> PERFORMANCE EVALUATION
INTRODUCTION, PROTOCOL DESIGN & IMPLEMENTATION, CONGESTION CONTROL, >> PERFORMANCE EVALUATION, COMPOSABLE UDT, CONCLUSIONS
We have described the UDT protocol and its congestion control. Let's go to the next section: performance evaluation.
31
Performance Characteristics
Efficiency: higher bandwidth utilization, less CPU usage.
Intra-protocol fairness: max-min fairness; Jain's fairness index.
TCP friendliness: bulk TCP flows vs. bulk UDT flows; short-lived TCP flows (slow-start phase) vs. bulk UDT flows.
Stability (oscillations): stability index (standard deviation).
We investigate four major characteristics of UDT's performance. We expect UDT to reach high bandwidth utilization at reasonable CPU cost. All concurrent UDT flows should share the bandwidth fairly, which means sharing it equally when there is a single bottleneck in the system. We also want UDT to be friendly to co-existing TCP flows, which we consider in two separate situations: because most TCP flows on the Internet are short-lived, we check the influence of multiple background UDT flows on short-lived TCP flows, and we also check the competition between concurrent bulk TCP flows and bulk UDT flows. Finally, we consider the stability of each UDT flow, where better stability means smaller oscillations in throughput. We have either defined or borrowed a quantitative measure for each of these performance characteristics.
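The two indices named above are easy to state in code; the mean-normalization in the stability index is my assumption, since the talk defines it only as a standard-deviation-based index:

```python
def jains_index(throughputs):
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2).
    Equals 1.0 for perfectly equal shares and 1/n in the worst case."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(t * t for t in throughputs))

def stability_index(samples):
    """Standard deviation of a flow's throughput samples, normalized
    by the mean (the normalization is an assumption); smaller values
    mean smaller oscillations."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return var ** 0.5 / mean
```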
32
Evaluation Strategies
Simulations vs. experiments: NS-2 network simulator; NCDM Teraflow Testbed.
Setup: network topology, bandwidth, distance, queuing, link error rate, etc.; concurrency (number of parallel flows); comparison (against TCP).
Real-world applications: SDSS data transfer, high-performance mining of streaming data, etc.
Independent evaluation: SLAC, JGN2, UvA, Unipmn (Italy), etc.
We used both simulations and real-world experiments for the evaluation. With the NS-2 network simulator we can examine UDT's performance across a broader set of network settings; however, we believe real-world experiments carry more significance in evaluating a network protocol. We tested our UDT implementation on many 1 Gb/s and 10 Gb/s network testbeds, especially NCDM's Teraflow Testbed, over a large variety of network situations, including different topologies, bandwidths, RTTs, etc., and with varying numbers of parallel flows. Using the quantitative measures just described, we can compare UDT's performance against other protocols, especially TCP. In addition, we have used UDT in several real-world applications. Finally, the UDT protocol has also been tested independently by many third-party institutions, which have published papers and technical reports describing their results.
33
Efficiency, Fairness, & Stability
1 Gb/s bandwidth, 106 ms RTT, between StarLight (Chicago) and SARA (Amsterdam).
[Figure: throughput of flows 1-4 over time, 0-700 seconds.]
We don't have time to review every evaluation test today, so I have chosen two to describe. The first is an experiment on real networks to check efficiency, fairness, and stability. We start four UDT flows over a 1 Gb/s transatlantic link between four pairs of machines in Chicago and Amsterdam, starting one flow every 100 seconds and then stopping them in reverse order, one every 100 seconds.
34
Efficiency, Fairness, & Stability
Throughput per 100-second interval (Mb/s):
Flow 1: 902, 466, 313, 215, 301, 452, 885
Flow 2: 446, 308, 216, 310
Flow 3: 302, 202, 307
Flow 4: 197
Efficiency (aggregate, Mb/s): 912, 923, 830, 918, 904
Fairness index: 1, 0.999, 0.998
Stability index: 0.11, 0.08, 0.16, 0.04, 0.02
This is the throughput of the four UDT flows during the 700-second experiment. We can see that all flows share the bandwidth almost equally (see the figure and the fairness-index values in the table), the aggregate throughput uses about 90% of the bandwidth, and the oscillations are very small. In contrast, a single TCP flow reaches only 5 Mb/s on this link with the default Linux settings; even with thorough TCP performance tuning, we could reach only about 40 Mb/s.
35
TCP Friendliness
500 1 MB TCP flows vs. 0-10 bulk UDT flows; 1 Gb/s between Chicago and Amsterdam.
[Figure: aggregate TCP throughput (Mb/s) vs. number of concurrent UDT flows, 0-10.]
The other experiment examines TCP friendliness on the same link. We start 500 TCP flows of 1 MB each, together with varying numbers of bulk background UDT flows. The aggregate throughput of the 500 TCP flows drops slowly as the number of UDT flows increases, but even with 10 UDT flows, the short-lived TCP flows still obtain about 70% of the throughput they achieve when there are no UDT flows.
36
>> COMPOSABLE UDT
INTRODUCTION, PROTOCOL DESIGN & IMPLEMENTATION, CONGESTION CONTROL, PERFORMANCE EVALUATION, >> COMPOSABLE UDT, CONCLUSIONS
While UDT has been successful for bulk data transfer over high-speed networks, we felt it could benefit a much broader audience. We expanded UDT so that it can be easily configured to satisfy more requirements of both network research and application development. We call this Composable UDT.
37
Composable UDT - Objectives
Easy implementation and deployment of new control algorithms. Easy evaluation of new control algorithms. Application-awareness support and dynamic configuration.
In detail, we have three objectives for this expansion. The first is to provide a platform for easy implementation and deployment of new network control algorithms: as mentioned at the beginning of this talk, TCP variants suffer from deployment difficulties because of the changes required in the OS kernel; because UDT runs in user space, those algorithms can be implemented easily in Composable UDT. Second, even if some of the algorithms will eventually be deployed in the kernel, they can be evaluated with Composable UDT in their early stages. Third, different applications may have different requirements for data transfer, which can also depend on the network configuration and even the time of day; Composable UDT provides such application-awareness support.
38
Composable UDT - Methodologies
Packet sending control: window-based, rate-based, and hybrid.
Control event handling: onACK, onLoss, onTimeout, onPktSent, onPktRecved, etc.
Protocol parameter access: RTT, loss rate, RTO, etc.
Packet extension: user-defined control packets.
To achieve these objectives, we expand UDT in four directions. The first and most important is to allow users to directly tune the sending rate, through either the congestion window size or the inter-packet time. The second is callback functions that tune the sending rate when certain events occur; users can redefine event handlers such as the ones invoked when an ACK is received, when packet loss is detected, and so on. Third, users can read and set many protocol parameters, including RTT, RTO, loss rate, and many other performance statistics. Finally, when necessary, users can define their own control packets.
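In spirit, the framework looks like the following sketch. The real CCC is a C++ class inside UDT; this Python rendering is only illustrative, and the NewReno arithmetic shown is simplified:

```python
class CCC:
    """Base congestion control class: a derived protocol overrides the
    event handlers and sets either the window or the packet period."""
    def __init__(self):
        self.cwnd = 16.0            # window-based control knob
        self.pkt_snd_period = 0.0   # rate-based knob (inter-packet time)

    # event handlers a derived protocol can override
    def onACK(self, ack): pass
    def onLoss(self, losslist): pass
    def onTimeout(self): pass

class CTCP(CCC):
    """TCP-NewReno-style control expressed via the callbacks (simplified)."""
    def onACK(self, ack):
        self.cwnd += 1.0 / self.cwnd           # congestion avoidance
    def onLoss(self, losslist):
        self.cwnd = max(self.cwnd / 2.0, 2.0)  # halve on loss

# A UDT sender owning a CTCP instance would invoke these handlers as
# ACKs and loss reports arrive, then pace its packets accordingly.
```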
39
Composable UDT - Evaluation
Simplicity: can it be easily used?
Expressiveness: can it be used to implement most control protocols?
Similarity: can Composable UDT-based implementations reproduce the performance of their native implementations?
Overhead: will the overhead added by Composable UDT be too large?
We care about four characteristics of Composable UDT, corresponding to the four questions above.
40
Simplicity & Expressiveness
Eight event handlers, four protocol control functions, and one performance monitoring function. Support a large variety of protocols Reliable UDP blast TCP and its variants (both loss and delay based) Group transport protocols We argue that Composable UDT is simple to use: it has only eight event handlers, four protocol control functions, and one performance monitoring function. Yet with this small set of functions it can implement a large variety of protocols, including but not limited to reliable UDP blast, TCP and its variants, and even group transport protocols.
41
Simplicity & Expressiveness
[Figure: example protocols built on Composable UDT. CCC (Base Congestion Control Class) is the root; derived directly from it are CTCP (TCP NewReno), CGTP (Group Transport Protocol), CUDPBlast (Reliable UDP Blast), and CFAST (FAST TCP); derived from CTCP are CVegas (TCP Vegas), CScalable (Scalable TCP), CHS (HighSpeed TCP), CBiC (BiC TCP), and CWestwood (TCP Westwood), each annotated with its line count.] This figure shows the example protocols implemented using Composable UDT. Composable UDT uses an object oriented design to further simplify implementation. CCC is the base class from which new control protocols are derived. For example, CTCP is a standard TCP implementation created by overriding several event handlers in CCC, and further TCP variants are derived from CTCP. The green numbers in the figure show how many lines of code are needed to implement a Composable UDT based TCP protocol, while the red numbers show how many lines must be added to and removed from the Linux kernel TCP implementation.
42
Similarity and Overhead
How closely Composable UDT based implementations can reproduce their native implementations: CTCP vs. Linux TCP

Flows   Throughput (Mb/s)    Fairness         Stability
        TCP     CTCP         TCP     CTCP     TCP     CTCP
1       112     122          –       –        0.517   0.415
2       191     208          0.997   0.999    0.476   0.426
4       322     323          0.949   –        0.484   0.492
8       378     422          0.971   –        0.633   0.550
16      672     642          0.958   0.985    0.502   0.482
32      877     799          0.988   –        0.491   0.470
64      921     716          0.994   0.996    0.569   0.529

CPU usage: at the sender, CTCP uses about twice as much CPU as Linux TCP; at the receiver, about 20% more. We check the similarity of Composable UDT by comparing CTCP and Linux TCP. Over a 1Gb/s link, we start 1 to 64 flows of CTCP and of Linux TCP, and record the throughput, fairness index, and stability index, which together cover the majority of a transport protocol's performance properties. The table lists these values for both implementations, and their performance is similar. However, CTCP does incur CPU overhead: at 64 parallel flows, CTCP achieves lower throughput than Linux TCP. The overhead comes from additional memory copies, context switches, and user-thread synchronization, and it is especially significant at the sender side. This is one shortcoming of Composable UDT, but in many situations it is not a problem; for example, the overhead only appears in comparison with kernel implementations, since any user-space implementation pays the same costs.
43
PROTOCOL DESIGN & IMPLEMENTATION
INTRODUCTION PROTOCOL DESIGN & IMPLEMENTATION CONGESTION CONTROL PERFORMANCE EVALUATION COMPOSABLE UDT >> CONCLUSIONS I have omitted many details, but I hope I have conveyed the big picture of UDT and most of its critical elements. Let me now turn to the conclusions I have drawn from developing this protocol.
44
Contributions A high performance data transport protocol and associated implementation The UDT protocol Open source UDT library (udt.sourceforge.net) Users include ANL, ORNL, PNNL, etc. An efficient and fair congestion control algorithm DAIMD & the UDT control algorithm Packet loss handling techniques Using a bandwidth estimation technique in congestion control A configurable transport protocol framework Composable UDT Let me start with the contributions of this project. I designed and implemented a new user space data transport protocol for high performance data transfer over the emerging high speed wide area optical networks. The implementation is an open source library of production quality, used by many research institutes as well as anonymous users. I designed the UDT congestion control algorithm, which enables efficient and fair data transfer in distributed data intensive applications; I also proposed several techniques for handling packet loss and employed a bandwidth estimation technique in the congestion control. Finally, I expanded the UDT implementation into a configurable transport protocol framework, Composable UDT, which allows users to easily implement and evaluate new control protocols. I have done extensive evaluation work using simulations, experiments, and real world applications.
45
Publications: Papers on the UDT Protocol & Internet Draft
Supporting Configurable Congestion Control in Data Transport Services, Yunhong Gu and Robert L. Grossman, SC 2005, Seattle, WA, Nov. 2005. Optimizing UDP-based Protocol Implementation, Yunhong Gu and Robert L. Grossman, PFLDNet 2005, Lyon, France, Feb. 2005. Experiences in Design and Implementation of a High Performance Transport Protocol, Yunhong Gu, Xinwei Hong, and Robert L. Grossman, SC 2004, Pittsburgh, PA, Nov. 2004. An Analysis of AIMD Algorithms with Decreasing Increases, Yunhong Gu, Xinwei Hong, and Robert L. Grossman, First Workshop on Networks for Grid Applications (Gridnets 2004), San Jose, CA, Oct. 29, 2004. SABUL: A Transport Protocol for Grid Computing, Yunhong Gu and Robert L. Grossman, Journal of Grid Computing, Volume 1, Issue 4, 2003. Internet Draft: UDT: A Transport Protocol for Data Intensive Applications, Yunhong Gu and Robert L. Grossman, draft-gg-udt-01.txt. Here are some papers on UDT and Composable UDT. I have also written an Internet Draft as an effort to standardize UDT.
46
Publications: Papers on Data Transfer Services using UDT
Experimental Studies of Data Transport and Data Access of Earth Science Data over Networks with High Bandwidth Delay Products, Robert Grossman, Yunhong Gu, Dave Hanley, Xinwei Hong, and Parthasarathy Krishnaswamy, Computer Networks, Volume 46, Issue 3, Oct. 2004. Teraflows over Gigabit WANs with UDT, Robert Grossman, Yunhong Gu, Xinwei Hong, Antony Antony, Johan Blom, Freek Dijkstra, and Cees de Laat, Journal of Future Computer Systems, Volume 21, Issue 4, April 2005. The Photonic TeraStream: Enabling Next Generation Applications Through Intelligent Optical Networking at iGrid 2002, J. Mambretti, J. Weinberger, J. Chen, E. Bacon, F. Yeh, D. Lillethun, R. Grossman, Y. Gu, and M. Mazzuco, Journal of Future Computer Systems, Volume 19, Number 6, 2003. Experimental Studies Using Photonic Data Services at iGrid 2002, R. Grossman, Y. Gu, D. Hanley, X. Hong, D. Lillethun, J. Levera, J. Mambretti, M. Mazzucco, and J. Weinberger, Journal of Future Computer Systems, Volume 19, Number 6, 2003. Here are some papers describing data transfer services using UDT.
47
Publications: Papers on Applications using UDT
Open DMIX: High Performance Web Services for Distributed Data Mining, R. Grossman, Y. Gu, C. Gupta, D. Hanley, X. Hong, and P. Krishnaswamy, 7th International Workshop on High Performance and Distributed Mining. Open DMIX: Data Integration and Exploration Services for Data Grids, R. Grossman, Y. Gu, D. Hanley, X. Hong, and G. Rao, First International Workshop on Knowledge Grid and Grid Intelligence (KGGI 2003). Global Access to Large Distributed Data Sets using Photonic Data Services, R. Grossman, Y. Gu, D. Hanley, X. Hong, D. Lillethun, J. Levera, J. Mambretti, M. Mazzucco, and J. Weinberger, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), Los Alamitos, CA. Data Webs for Earth Science Data, Asvin Ananthanarayan, Rajiv Balachandran, Yunhong Gu, Robert Grossman, Xinwei Hong, Jorge Levera, and Marco Mazzucco, Parallel Computing, Volume 29, 2003. And here are some papers describing high performance applications that use UDT as their data transfer service.
48
Achievements SC 2002 Bandwidth Challenge “Best Use of Emerging Network Infrastructure” Award SC 2003 Bandwidth Challenge “Application Foundation” Award SC 2004 Bandwidth Challenge “Best Replacement for FedEx / UDP Fairness” Award SC 2005 ? Nov. 12 – 18, Seattle, WA High Performance Mining of Streaming Data using UDT iGrid 2005 Exploring and mining remote data at 10Gb/s We have demonstrated UDT and applications built on it at many conferences and exhibitions, including the Bandwidth Challenge at the annual Supercomputing conference, where UDT has helped us win an award in each of the last three years. Next month we head to Seattle to demonstrate real time high performance mining of streaming data: we will use UDT to transfer large volumetric datasets from Japan, Korea, Europe, and several locations in the US to the SC05 exhibition floor, where the data will be analyzed.
49
Vision: Short-term and Long-term
Short-term A practical solution for distributed data intensive applications in high BDP environments Long-term Evolve with new technologies (open source & open standard) More functionality and support for more use scenarios Network research platform (e.g., fast prototyping and evaluation of new control algorithms) I would like to end this talk with a vision of UDT's short-term and long-term future. Today, UDT is a practical solution for transferring bulk data in distributed data intensive applications. But we will not stop here: we will fold new technologies into UDT as they appear. Because of the high cost of upgrading network equipment, there will always be a long lag between up-to-date network technologies and the production Internet, and we expect UDT's user-space deployment to narrow that gap. For the same reason, we expect UDT to serve as a research platform for new transport protocol technologies. Finally, because UDT is open source and follows an open standard, it will also accept contributions from outside.
50
The End Thank You! Yunhong Gu, October 10, 2005