1
High Performance User-Level Sockets over Gigabit Ethernet
Pavan Balaji, Ohio State University, balaji@cis.ohio-state.edu
Piyush Shivam, Ohio State University, shivam@cis.ohio-state.edu
D.K. Panda, Ohio State University, panda@cis.ohio-state.edu
Pete Wyckoff, Ohio Supercomputer Center, pw@osc.edu
2
Presentation Overview
- Background and Motivation
- Design Challenges
- Performance Enhancement Techniques
- Performance Results
- Conclusions
3
Background and Motivation
- Sockets
  - Frequently used API
  - Traditional Kernel-Based Implementation
  - Unable to exploit High Performance Networks
- Earlier Solutions
  - Interrupt Coalescing
  - Checksum Offload
  - Insufficient
  - It gets worse with 10 Gigabit Networks
- Can we do better?
  - User-level support
4
Kernel Based Implementation of Sockets
[Diagram: protocol stack. Application or Library (User Space); Sockets, TCP, IP (Kernel); NIC (Hardware)]
- Pros: High Compatibility
- Cons: Kernel Context Switches, Multiple Copies, CPU Resources
5
Alternative Implementations of Sockets (GigaNet cLAN)
[Diagram: protocol stack across User Space, Kernel, and Hardware. Labels: Application or Library, Sockets, TCP, IP, IP-to-VI layer, "VI aware" NIC]
- Pros: High Compatibility
- Cons: Kernel Context Switches, Multiple Copies, CPU Resources
6
Sockets over User-Level Protocols
- Sockets is a generalized protocol
- Sockets over VIA
  - Developed by Intel Corporation [shah98] and ET Research Institute [sovia01]
  - GigaNet cLAN platform
- Most networks in the world are Ethernet
- Gigabit Ethernet
  - Backward compatible
  - Gigabit Network over the existing installation base
- MVIA: Version of VIA on Gigabit Ethernet
  - Kernel Based
- A need for a High Performance Sockets layer over Gigabit Ethernet
7
User-Level Protocol over Gigabit Ethernet
- Ethernet Message Passing (EMP) Protocol
  - Zero-Copy OS-Bypass NIC-driven User-Level protocol over Gigabit Ethernet
  - Developed over the Dual-processor Alteon NICs
  - Complete Offload of message passing functionality to the NIC
References:
- Piyush Shivam, Pete Wyckoff, D.K. Panda, "EMP: Zero-Copy OS-bypass NIC-driven Gigabit Ethernet Message Passing", Supercomputing, November '01
- Piyush Shivam, Pete Wyckoff, D.K. Panda, "Can User-Level Protocols take advantage of Multi-CPU NICs?", IPDPS, April '02
8
EMP: Latency
A base latency of 28 µs, compared to ~120 µs for TCP, for 4-byte messages
9
EMP: Bandwidth
Saturated the Gigabit Ethernet network with a peak bandwidth of 964 Mbps
10
Proposed Solution
[Diagram: protocol stack across User Space, Kernel, and Hardware. Labels: Application or Library, Sockets over EMP, EMP Library, OS Agent, Gigabit Ethernet NIC]
[Diagram annotations: Kernel Context Switches, Multiple Copies, CPU Resources, High Performance]
11
Presentation Overview
- Background and Motivation
- Design Challenges
- Performance Enhancement Techniques
- Performance Results
- Conclusions
12
Design Challenges
- Functionality Mismatches
- Connection Management
- Message Passing
- Resource Management
- UNIX Sockets
13
Functionality Mismatches and Connection Management
- Functionality Mismatches
  - No API for buffer advertising in TCP
- Connection Management
  - Data Message Exchange
  - Descriptors required for connection management
14
Message Passing
- Data Streaming
  - Parts of the same message can potentially be read into different buffers
- Unexpected Message Arrivals
  - Separate Communication Thread
    - Keeps track of used descriptors and re-posts them
    - Polling Threads have high Synchronization cost
    - Sleeping Threads involve OS scheduling granularity
  - Rendezvous Approach
  - Eager with Flow Control
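The data streaming requirement above can be illustrated with a minimal sketch. It assumes a hypothetical `StreamBuffer` class (not part of EMP or the authors' code) that holds whole messages in an intermediate byte buffer so the application can read one message in several pieces:

```python
class StreamBuffer:
    """Hypothetical intermediate buffer: byte-stream reads over whole messages."""

    def __init__(self):
        self._pending = bytearray()

    def deliver(self, message: bytes) -> None:
        # A whole message arrives from the message-passing layer.
        self._pending.extend(message)

    def read(self, max_bytes: int) -> bytes:
        # The application may ask for fewer bytes than one message holds.
        chunk = bytes(self._pending[:max_bytes])
        del self._pending[:max_bytes]
        return chunk


buf = StreamBuffer()
buf.deliver(b"hello world")
first = buf.read(5)     # first part of the message
second = buf.read(100)  # the remainder, read into a different buffer
```

The extra copy through `_pending` is the cost that stream semantics impose on a message-based transport.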
15
Rendezvous Approach
[Diagram: Sender and Receiver, each with an SQ and an RQ. Labels in order: send(), receive(), Request, ACK, Data]
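The Request, ACK, Data sequence in the rendezvous diagram can be traced with a small simulation. This is only a sketch under assumptions: `rendezvous_transfer` and the tuple-based "wire" are hypothetical names, and both sides run in one function for brevity.

```python
from collections import deque


def rendezvous_transfer(payload: bytes, receive_buffer: bytearray) -> list:
    """Trace one rendezvous exchange; returns the ordered message kinds."""
    wire = deque()
    trace = []

    # Sender: send() only announces the message size; no data moves yet.
    wire.append(("Request", len(payload)))

    # Receiver: receive() has posted a buffer, so it can answer the request.
    kind, size = wire.popleft()
    trace.append(kind)
    if size > len(receive_buffer):
        raise ValueError("posted buffer too small")
    wire.append(("ACK", size))

    # Sender: the ACK shows the destination is ready, so the data is sent.
    kind, _ = wire.popleft()
    trace.append(kind)
    wire.append(("Data", payload))

    # Receiver: the data lands directly in the posted user buffer.
    kind, data = wire.popleft()
    trace.append(kind)
    receive_buffer[:len(data)] = data
    return trace


user_buffer = bytearray(16)
trace = rendezvous_transfer(b"ping", user_buffer)
```

The sketch shows the trade-off: no intermediate copy is needed, but each message costs an extra round trip before any data moves.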
16
Eager with Flow Control
[Diagram: Sender and Receiver, each with an SQ and an RQ. Labels in order: send(), Data, ACK, Data, receive()]
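A minimal sketch of the eager idea, assuming a hypothetical `EagerReceiver` that pre-posts a fixed number of temporary buffers; the exact message ordering in the slide's diagram is not recoverable from the transcript, so this only shows the general behavior: data is accepted before `receive()` is called, and an acknowledgment frees the buffer afterwards.

```python
from collections import deque


class EagerReceiver:
    """Hypothetical receiver with a fixed pool of pre-posted temporary buffers."""

    def __init__(self, preposted: int):
        self.free_slots = preposted
        self.arrived = deque()  # data that arrived before receive() was called
        self.acks_sent = 0

    def on_data(self, payload: bytes) -> bool:
        # Data is accepted immediately if a pre-posted buffer is free.
        if self.free_slots == 0:
            return False  # flow control is meant to prevent this case
        self.free_slots -= 1
        self.arrived.append(payload)
        return True

    def receive(self) -> bytes:
        # Copy out of the temporary buffer, re-post it, and acknowledge.
        payload = self.arrived.popleft()
        self.free_slots += 1
        self.acks_sent += 1
        return payload


rx = EagerReceiver(preposted=2)
accepted = [rx.on_data(b"a"), rx.on_data(b"b"), rx.on_data(b"c")]
first = rx.receive()
retry = rx.on_data(b"c")  # succeeds once a buffer has been freed
```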
17
Resource Management and UNIX Sockets
- Resource Management
  - Clean up unused descriptors (connection management)
  - Free registered memory
- UNIX Sockets
  - Function Overriding
  - Application Changes
  - File Descriptor Tracking
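Function overriding combined with file descriptor tracking might look roughly like the sketch below. Everything here is an assumption for illustration: `user_level_fds`, `register_user_socket`, and `wrapped_read` are hypothetical names, and the slides do not say how the authors implemented the dispatch.

```python
import os

# Hypothetical registry: descriptor number -> pending bytes for user-level sockets.
user_level_fds = {}


def register_user_socket(fd: int, inbox: bytearray) -> None:
    # Called when the sockets layer creates a connection (descriptor tracking).
    user_level_fds[fd] = inbox


def wrapped_read(fd: int, n: int) -> bytes:
    """Stand-in for an overridden read(): dispatch on who owns the descriptor."""
    if fd in user_level_fds:
        inbox = user_level_fds[fd]
        data = bytes(inbox[:n])
        del inbox[:n]
        return data
    # Not tracked by the user-level layer: fall through to the operating system.
    return os.read(fd, n)


# A descriptor tracked by the user-level layer (the number 1000 is arbitrary).
register_user_socket(1000, bytearray(b"net data"))
from_network = wrapped_read(1000, 3)

# An ordinary descriptor, here one end of an OS pipe.
read_end, write_end = os.pipe()
os.write(write_end, b"file")
from_kernel = wrapped_read(read_end, 4)
os.close(read_end)
os.close(write_end)
```

The point of the tracking table is that a generic call such as `read()` works on both files and sockets, so the override has to know which descriptors belong to it.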
18
Presentation Overview
- Background and Motivation
- Design Challenges
- Performance Enhancement Techniques
- Performance Results
- Conclusions
19
Performance Enhancement Techniques
- Credit Based Flow Control
- Disabling Data Streaming
- Delayed Acknowledgments
- EMP Unexpected Queue
20
Credit Based Flow Control
[Diagram: Sender and Receiver, each with an SQ and an RQ. Sender's counter over time: Credits Left: 4, 3, 2, 1, 0, then back to 4]
Multiple Outstanding Credits
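The credit counter in the diagram (4, 3, 2, 1, 0, then 4 again) can be reproduced with a short sketch. `CreditSender` is a hypothetical name, and returning all four credits in a single acknowledgment is an assumption made to match the diagram's final value.

```python
class CreditSender:
    """Hypothetical sender that spends one credit per message."""

    def __init__(self, credits: int):
        self.max_credits = credits
        self.credits = credits
        self.sent = []

    def send(self, payload) -> bool:
        # With no credits left, the sender must wait for an acknowledgment.
        if self.credits == 0:
            return False
        self.credits -= 1
        self.sent.append(payload)
        return True

    def on_ack(self, returned: int) -> None:
        # The receiver reports how many buffers it has re-posted.
        self.credits = min(self.max_credits, self.credits + returned)


sender = CreditSender(credits=4)
history = [sender.credits]
for i in range(4):
    sender.send(i)
    history.append(sender.credits)
blocked = not sender.send(99)  # no credits left, so this send is refused
sender.on_ack(returned=4)
history.append(sender.credits)
```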
21
Non-Data Streaming and Delayed Acknowledgments
- Disabling Data Streaming
  - Intermediate copy required for Data Streaming
  - Place data directly into user buffer
- Delayed Acknowledgments
  - Increase in Bandwidth
    - Less network traffic
    - NIC has less work to do
  - Decrease in Latency
    - Fewer descriptors posted
    - Less tag matching at the NIC
    - 550 ns per descriptor
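To make the effect of delayed acknowledgments concrete, the sketch below counts acknowledgments and estimates tag-matching time from the 550 ns per descriptor figure on the slide. The function names are hypothetical, the message counts are illustrative rather than measured, and the cost model assumes the NIC walks every posted descriptor once.

```python
TAG_MATCH_NS_PER_DESCRIPTOR = 550  # figure quoted on the slide


def acks_needed(messages: int, ack_every: int) -> int:
    """Number of acknowledgments when one ACK covers `ack_every` messages."""
    return messages // ack_every


def worst_case_tag_match_ns(posted_descriptors: int) -> int:
    """Time to walk every posted descriptor once (assumes a linear walk)."""
    return posted_descriptors * TAG_MATCH_NS_PER_DESCRIPTOR


immediate = acks_needed(messages=1000, ack_every=1)
delayed = acks_needed(messages=1000, ack_every=8)
walk_ns = worst_case_tag_match_ns(posted_descriptors=10)
```

Fewer acknowledgments mean fewer descriptors posted to receive them, which shortens the list the NIC has to match against.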
22
EMP Unexpected Queue
- EMP features an unexpected message queue
  - Advantage: Last to be checked
  - Disadvantage: Data Copy
- Acknowledgments in the Unexpected Queue
  - No copy, since acknowledgments carry no data
  - Acknowledgments pushed out of the critical path
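A minimal sketch of why acknowledgments suit the unexpected queue, assuming a hypothetical `TagMatcher` that models the matching order described on the slide (posted descriptors first, unexpected queue last). It is not EMP's actual NIC firmware logic.

```python
class TagMatcher:
    """Hypothetical model: posted descriptors are checked before the unexpected queue."""

    def __init__(self):
        self.posted = []      # (tag, buffer) pairs in posting order
        self.unexpected = []  # messages that matched no posted descriptor
        self.bytes_copied = 0

    def post(self, tag: str, buffer: bytearray) -> None:
        self.posted.append((tag, buffer))

    def on_arrival(self, tag: str, payload: bytes) -> str:
        for i, (posted_tag, buffer) in enumerate(self.posted):
            if posted_tag == tag:
                buffer[:len(payload)] = payload  # placed directly, no extra copy
                del self.posted[i]
                return "matched"
        # No descriptor matched: the payload is copied into temporary storage.
        self.unexpected.append((tag, bytes(payload)))
        self.bytes_copied += len(payload)
        return "unexpected"


matcher = TagMatcher()
data_buffer = bytearray(8)
matcher.post("data", data_buffer)          # only data descriptors are posted
ack_result = matcher.on_arrival("ack", b"")   # empty ACK: nothing to copy
data_result = matcher.on_arrival("data", b"abcd")
```

Because no descriptors are posted for acknowledgments, the posted list stays short for data messages, and the copy penalty of the unexpected queue is zero for an empty payload.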
23
Presentation Overview
- Background and Motivation
- Design Challenges
- Performance Enhancement Techniques
- Performance Results
- Conclusions
24
Performance Results
- Micro-benchmarks
  - Latency (ping-pong)
  - Bandwidth
- FTP Application
- Web Server
  - HTTP/1.0 Specifications
  - HTTP/1.1 Specifications
25
Experimental Test-bed
- Four Pentium III 700 MHz Quads
- 1 GB Main Memory
- Alteon NICs
- Packet Engine Switch
- Linux version 2.4.18
26
Micro-benchmarks: Latency
- Up to 4 times improvement compared to TCP
- Overhead of 0.5 µs compared to EMP
27
Micro-benchmarks: Bandwidth
An improvement of 53% compared to enhanced TCP
28
FTP Application
Up to 2 times improvement compared to TCP
29
Web Server (HTTP/1.0)
Up to 6 times improvement compared to TCP
30
Web Server (HTTP/1.1)
Up to 3 times improvement compared to TCP
31
Conclusions
- Developed a High Performance User-Level Sockets implementation over Gigabit Ethernet
- Latency close to base EMP (28 µs)
  - 28.5 µs for Non-Data Streaming sockets
  - 37 µs for Data Streaming sockets
  - 4 times improvement in latency compared to TCP
- Peak Bandwidth of 840 Mbps
  - 550 Mbps obtained by TCP with increased Registered space for the kernel (up to 2 MB)
  - Default case is 340 Mbps with 32 KB
  - Improvement of 53%
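The headline ratios can be checked from the figures quoted in the slides. The sketch below uses only those numbers (840 and 550 Mbps; ~120 µs for TCP from the EMP latency slide; 28.5 µs for non-data-streaming sockets); the helper name `improvement_percent` is hypothetical.

```python
def improvement_percent(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Sockets over EMP (840 Mbps) versus enhanced TCP (550 Mbps).
bandwidth_gain = improvement_percent(840, 550)

# Approximate TCP latency (~120 µs) over non-data-streaming latency (28.5 µs).
latency_ratio = 120 / 28.5
```

Rounded, these give the 53% bandwidth improvement and the roughly 4 times latency improvement stated above.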
32
Conclusions (contd.)
- FTP Application shows an improvement of nearly 2 times
- Web Server shows tremendous performance improvement
  - HTTP/1.0 shows an improvement of up to 6 times
  - HTTP/1.1 shows an improvement of up to 3 times
33
Future Work
- Dynamic Credit Allocation
  - NIC: The trusted component
  - Integrated QoS
  - Currently on Myrinet Clusters
- Commercial applications in the Data Center environment
- Extend the idea to next generation interconnects
  - InfiniBand
  - 10 Gigabit Ethernet
34
Thank You
For more information, please visit the NBC Home Page: http://nowlab.cis.ohio-state.edu
Network Based Computing Laboratory, The Ohio State University