Optimizing UDP-based Protocol Implementations
Yunhong Gu and Robert L. Grossman
Presenter: Michal Sabala
National Center for Data Mining
Outline
UDP Performance Characteristics and Optimizations
Composable UDT: A Framework for UDP-based Protocol Implementations
Part I. UDP Performance Characteristics and Optimization Techniques
Introduction
UDP-based protocols are needed:
– As a short-term solution to the lack of effective kernel-space transport protocols for high bandwidth-delay product (BDP) networks
– As application-specific data transfer libraries, e.g., for multimedia data transfer
It is not an easy task to implement a new UDP-based protocol from scratch
– And it may not be necessary!
UDP Performance
Sending and receiving buffer size
Packet size
IO mode
– Scattering/gathering (writev/readv)
– Memory copy avoidance (e.g., overlapped IO in Windows Sockets 2)
To reach the same data transfer rate, UDP needs slightly less CPU time than TCP and causes slightly less end-system delay.
UDP Performance: Impact of Buffer Size
UDP Performance: Impact of Packet Size
[Figure panels: Throughput; CPU Utilization]
UDP-based Protocol Performance
Additional overhead
– Additional memory copy
– Additional packet processing
– Additional context switches
Optimization Guidelines
Avoid additional memory copies
Reduce the number of packets
– Control packets, especially acknowledgements
Reduce overall processing time
– Simpler mechanisms are better
Avoid bursts in processing time
– Otherwise the CPU may be too busy to process incoming packets
Optimization Guidelines
Memory copy avoidance
– UDP IO
– API semantics
Acknowledgements
– Timer-based acknowledging
– Light ACK
– Loss processing
Timing, rate control, and self-clocking
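A minimal sketch of timer-based acknowledging combined with light ACKs, assuming the receiver sends a full ACK whenever an ACK timer expires and, between timer expirations, a light ACK (carrying only the latest sequence number) after every lightAckPkts packets. The class name, the 10 ms period, and the 64-packet light-ACK spacing in the usage note are illustrative assumptions, not values taken from this paper.

```cpp
#include <cstdint>

// Which control packet, if any, the receiver should emit after a data packet.
enum class AckAction { None, LightAck, FullAck };

// Hypothetical receiver-side scheduler: full ACKs are driven by a timer,
// not by packet arrivals, so the ACK rate stays constant as the data rate
// grows. Between timer expirations a light ACK is sent every
// `lightAckPkts` packets so a fast sender still gets timely feedback.
class AckScheduler {
public:
    AckScheduler(int64_t ackPeriodUs, int lightAckPkts)
        : ackPeriodUs_(ackPeriodUs), lightAckPkts_(lightAckPkts) {}

    // Call once per received data packet with the current time in microseconds.
    AckAction onPacket(int64_t nowUs) {
        if (nextAckUs_ < 0) nextAckUs_ = nowUs + ackPeriodUs_;  // first packet arms the timer
        ++pktsSinceAck_;
        if (nowUs >= nextAckUs_) {          // timer expired: full ACK
            nextAckUs_ = nowUs + ackPeriodUs_;
            pktsSinceAck_ = 0;
            return AckAction::FullAck;
        }
        if (pktsSinceAck_ >= lightAckPkts_) {  // many packets since last ACK
            pktsSinceAck_ = 0;
            return AckAction::LightAck;
        }
        return AckAction::None;
    }

private:
    int64_t ackPeriodUs_;
    int lightAckPkts_;
    int64_t nextAckUs_ = -1;
    int pktsSinceAck_ = 0;
};
```

With AckScheduler s(10000, 64), a burst of 64 packets within 10 ms produces one light ACK and no full ACK; the next packet arriving at or after the 10 ms mark triggers a full ACK. A real receiver would also check the timer in its receiving loop so ACKs still go out when no packets arrive.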
Optimization Guidelines
Disk IO
– sendfile/recvfile
Threading
– Synchronization cost
Code optimization
– Sending/receiving loop
Profiling
Part II. Composable UDT: A Framework for UDP-based Protocol Implementations
Composable UDT
Based on the UDT (UDP-based Data Transfer) library implementation
Integrates the optimization techniques described in this paper
Objectives
Rapid development of UDP-based transport protocols and application-specific data transfer libraries
Easy evaluation of new congestion control algorithms
Non-objectives
– Replacing kernel-space protocol implementations
– User-level TCP implementation
Current Status
UDT/CCC: configurable congestion control
In the future
– Data reliability configuration
– Message boundary support
Configurable Congestion Control
Packet sending control
– Rate-based, window-based, or hybrid
Redefinition of control event handlers
– Loss, ACK, timeout, etc.
Access to internal protocol parameters
– RTT, RTO, loss rate, etc.
User-customized packet formats
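A sketch of how rate-based, window-based, and hybrid sending control can be combined in one check, assuming a packet sending period in microseconds (0 disables pacing, as the simplified TCP example's init() does with m_dPktSndPeriod) and a congestion window in packets. SendControl and its fields are hypothetical names chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical sender-side control state. pktSndPeriodUs is the minimum gap
// between packets (rate control; 0 disables pacing) and cwndPkts bounds the
// number of unacknowledged packets (window control). Using both is hybrid.
struct SendControl {
    double pktSndPeriodUs = 0.0;
    double cwndPkts = 2.0;
    int64_t nextSendUs = 0;

    // True if a new data packet may be sent at `nowUs` while `inFlight`
    // packets are still unacknowledged.
    bool canSend(int64_t nowUs, int inFlight) const {
        if (inFlight >= cwndPkts) return false;  // window-based limit
        return nowUs >= nextSendUs;              // rate-based limit
    }

    // Record a transmission and schedule the earliest next send time.
    void onSent(int64_t nowUs) {
        nextSendUs = nowUs + static_cast<int64_t>(pktSndPeriodUs);
    }
};
```

For a target of 1 Gb/s with 1,500-byte packets the period is 1500 * 8 / 1e9 s = 12 µs. A very large cwndPkts gives purely rate-based control; a period of 0 gives purely window-based control.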
Implementation
C++ class inheritance
– CCC: base class for control event handling
Callbacks
Performance monitoring
– Internal protocol parameters
– Performance statistics
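As one example of the internal parameters a control module might read, the sketch below keeps RTT and RTO with the smoothing rules of RFC 6298 (gains 1/8 and 1/4, RTO = SRTT + 4 * RTTVAR, 1 s before the first sample) plus a simple loss rate. It omits RFC 6298's clock-granularity term and minimum RTO, and it is not a claim about how UDT itself computes these values; ProtocolStats is a hypothetical name.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical container for parameters a congestion-control module reads.
struct ProtocolStats {
    double srttUs = 0.0;    // smoothed RTT, microseconds
    double rttVarUs = 0.0;  // RTT variation, microseconds
    bool hasSample = false;
    int64_t pktsSent = 0;
    int64_t pktsLost = 0;

    void onRttSample(double rttUs) {
        if (!hasSample) {
            srttUs = rttUs;
            rttVarUs = rttUs / 2.0;
            hasSample = true;
        } else {
            // RTTVAR uses the previous SRTT, so it is updated first.
            rttVarUs = 0.75 * rttVarUs + 0.25 * std::fabs(srttUs - rttUs);
            srttUs = 0.875 * srttUs + 0.125 * rttUs;
        }
    }

    // RTO = SRTT + 4 * RTTVAR; 1 second before any sample.
    double rtoUs() const { return hasSample ? srttUs + 4.0 * rttVarUs : 1e6; }

    // Fraction of sent packets reported lost.
    double lossRate() const {
        return pktsSent == 0 ? 0.0 : static_cast<double>(pktsLost) / pktsSent;
    }
};
```

Samples of 100 ms and then 120 ms give SRTT = 102.5 ms, RTTVAR = 42.5 ms, and RTO = 272.5 ms.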
Implementation
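Example: Simplified TCP
// Requires the CCC base class from the UDT/CCC library.
class CTCP: public CCC
{
public:
   // Pure window-based control: no inter-packet pacing, an initial window
   // of 2 packets, and an ACK every 2 packets.
   virtual void init()
   {
      m_dPktSndPeriod = 0.0;
      m_dCWndSize = 2.0;
      setACKInterval(2);
   }

   // Additive increase: grow the window by 1/cwnd for each ACK.
   virtual void onACK(const int&)
   {
      m_dCWndSize += 1.0/m_dCWndSize;
   }

   // Multiplicative decrease: halve the window on a loss report.
   virtual void onLoss(const int*, const int&)
   {
      m_dCWndSize *= 0.5;
   }
};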
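The example above needs the UDT/CCC library to build. To see the control logic in isolation, the sketch below swaps in MiniCCC, a hypothetical stand-in that defines only the members the example touches. It is not UDT's CCC class, which has more handlers and whose methods are invoked by the protocol core rather than by a hand-written driver like simulate().

```cpp
// Hypothetical stand-in for the CCC base class, reduced to the members the
// simplified TCP example uses. Not the UDT library's definition.
class MiniCCC {
public:
    virtual ~MiniCCC() = default;
    virtual void init() {}
    virtual void onACK(const int&) {}
    virtual void onLoss(const int*, const int&) {}

    double cwnd() const { return m_dCWndSize; }
    double period() const { return m_dPktSndPeriod; }
    int ackInterval() const { return m_iACKInterval; }

protected:
    void setACKInterval(int pkts) { m_iACKInterval = pkts; }
    double m_dPktSndPeriod = 0.0;
    double m_dCWndSize = 0.0;
    int m_iACKInterval = 0;
};

// The slide's simplified TCP, unchanged apart from the base class.
class CTCP : public MiniCCC {
public:
    void init() override {
        m_dPktSndPeriod = 0.0;
        m_dCWndSize = 2.0;
        setACKInterval(2);
    }
    void onACK(const int&) override { m_dCWndSize += 1.0 / m_dCWndSize; }
    void onLoss(const int*, const int&) override { m_dCWndSize *= 0.5; }
};

// Hypothetical driver playing the protocol core's role: it calls the
// handlers through a base-class reference, so the overrides run.
void simulate(MiniCCC& cc, int acks, bool loss) {
    cc.init();
    for (int i = 0; i < acks; ++i) cc.onACK(i);
    if (loss) {
        int lost[] = {acks};
        cc.onLoss(lost, 1);
    }
}
```

After init() the window is 2 packets; one ACK grows it to 2 + 1/2 = 2.5, and a loss report then halves it to 1.25, the additive-increase/multiplicative-decrease pattern of TCP congestion avoidance.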
Configurable Congestion Control
Future Work
Continue to improve the UDT/CCC library
More experimental evaluation of the UDT/CCC library
– Compare kernel TCP (k-TCP) and TCP implemented over UDT/CCC (u-TCP) in more network environments
– Implement more TCP variants
More pre-implemented congestion control algorithms
Conclusion
UDP-based protocols are one solution for bulk data transfer in high-BDP networks
Some optimization principles and techniques are discussed in this paper
We further propose a composable framework that makes UDP-based protocols much easier to implement
Thank you! For more information, please visit UDT Project: NCDM:
Backup Slides
UDP Performance: Experiment Setup
Name    | CPU                   | Memory | NIC    | OS
onno    | Dual Itanium2 1.5 GHz | 8 GB   | 10 GbE | Linux
sara77  | Dual Xeon 2.4 GHz     | 2 GB   | 1 GbE  | Linux
ncdm171 | Dual PowerPC G4 1 GHz | 2 GB   | 1 GbE  | Mac OS X
win91   | Dual Xeon 2.4 GHz     | 2 GB   | 1 GbE  | Windows XP Professional
ncdm87  | Dual Opteron 2.4 GHz  | 4 GB   | 1 GbE  | Linux 2.6.8
UDP Performance: CPU Utilization
[Table: sending and receiving CPU utilization of UDP and TCP per test machine: onno, sara, ncdm, win, ncdm]
UDP Performance: End System Delay
[Table: end-system delay (ms) of UDP and TCP per test machine: onno, sara, ncdm, win, ncdm]
UDT Profiling: Modules
UDT Profiling: Functionalities
CPU Utilization: K-TCP vs. U-TCP
[Table: sender and receiver CPU utilization of K-TCP and U-TCP per machine: onno, sara, ncdm, win, ncdm]