1
Mapping of scalable RDMA protocols to ASIC/FPGA platforms
Yosef Gavriel Tirat-Gefen, PhD, Senior Member IEEE
Chief Scientist, Castel Systems Inc. & Dept. of Physics and Astronomy, George Mason University, Fairfax, VA
2
Presentation Overview
Motivation
TCP off-loading
Zero-copying
RDMA protocol
RDMA protocol stack
Structure of an RDMA card
Results
Conclusion
3
Enabling high-bandwidth WAN applications
Motivation
[Diagram: supercomputers/server farms with terabyte storage at multiple sites, plus workstations, all interconnected over a WAN]
4
Applications
Distributed command and control.
Signal processing (e.g. RADAR).
Real-time sharing of intelligence data.
Distributed large-scale computation/simulation of aerospace problems.
Extension of storage area networks over a wide area network (WAN).
Enabling technology for modern supercomputing installations.
5
Traditional TCP/IP Networking
[Diagram: two end hosts, each running the full stack (Application/O.S., TCP, Layer 3 IP, Layer 2 MAC, Layer 1 PHY), connected through a router that operates only at Layers 1-3]
6
Standard Data Flow on TCP/IP
[Diagram: data flows from Application A's memory space into the TCP buffer/stack memory space, down through L3/L2/L1, across the WAN/LAN, then back up through L1/L2/L3 into the receiver's TCP buffer/stack memory space and finally into Application B's memory space]
7
Standard Data Flow on TCP/IP
Traditional TCP/IP copies data from the application's memory space to the TCP memory buffer.
These buffer copies cost the CPU lost cycles.
The CPU gets overwhelmed at rates above 2.5 Gbps.
TCP/IP off-loading helps, but it does not solve the problem on the receiver side (see the sockets sketch below).
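For reference, the receiver-side cost is visible in an ordinary sockets receive loop. A minimal sketch, assuming standard BSD sockets (function and buffer names are illustrative): every recv() call copies payload from the kernel's TCP buffer into the application buffer, and the host CPU performs that copy.

```c
/* Conventional sockets receive path: the NIC places packets into
 * kernel socket buffers, and each recv() then copies the payload
 * from kernel space into the application buffer using the CPU.
 * At multi-Gbps rates this per-byte copy dominates the CPU budget. */
#include <sys/types.h>
#include <sys/socket.h>

ssize_t receive_all(int sock, char *app_buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        /* Each call is a kernel-to-user memory copy of up to
           (len - got) bytes. */
        ssize_t n = recv(sock, app_buf + got, len - got, 0);
        if (n <= 0)
            return n;   /* error, or connection closed by peer */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```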
8
TCP/IP off-load processing
[Diagram: on the left, the full software stack (Application/O.S., TCP, Layer 3 IP, Layer 2 MAC, Layer 1 PHY); on the right, the Application/O.S. with the TCP/IP layers off-loaded to a TCP/IP off-load engine (TOE) mapped to hardware]
9
Zero-copying and TCP offloading processing
[Diagram: host CPUs with cache and host main memory holding the receive buffer, attached to a TOE/NIC card whose TCP off-load processor and network buffer face the WAN/LAN]
10
Zero-copying and TCP offloading processing
Zero-copying is still not achieved: the receive buffer is still copied back into the application's memory space.
TCP/IP off-loading is not scalable.
RDMA protocols provide a solution (see the verbs sketch below).
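The contrast with sockets is clearest in the verbs-style interface that RDMA exposes. A minimal sketch of a one-sided RDMA write using the OpenFabrics libibverbs API (an illustration of the programming model, not this card's firmware; queue-pair setup and the out-of-band exchange of remote_addr and rkey are omitted):

```c
/* One-sided RDMA write: the NIC reads the registered local buffer
 * and places it directly into the peer's registered application
 * memory. Neither host CPU copies the payload. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* 'qp' is an already-connected queue pair, 'pd' its protection domain. */
int rdma_write_example(struct ibv_qp *qp, struct ibv_pd *pd,
                       void *buf, size_t len,
                       uint64_t remote_addr, uint32_t rkey)
{
    /* Register (pin) the application buffer so the NIC can DMA
       from it directly, with no intermediate TCP buffer. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided operation */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;  /* request a completion */
    wr.wr.rdma.remote_addr = remote_addr;        /* advertised by the peer */
    wr.wr.rdma.rkey        = rkey;               /* peer's steering key */

    return ibv_post_send(qp, &wr, &bad_wr);
}
```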
11
RDMA data-flow for WAN applications
[Diagram: Host CPU A and Host CPU B, each with an application memory space in host memory; the RDMA NIC cards move data directly between the two application memory spaces across the WAN]
12
Scalable WAN-RDMA for bandwidths above 10 Gbps
[Diagram: RDMA NIC card for WAN. The host feeds the Tx and Rx buffers through a DMA channel; the RDMA engine connects through MAC and PHY to 10 Gbps links toward the WAN, for an aggregate bandwidth above 10 Gbps]
13
The RDMA protocol layers and our prototype
Running on host CPU: ULP (e.g. iSCSI, NFS)
FPGA implementation: RDMA, DDP, MPA/SCTP, TCP, Layer 3 (e.g. IP) (MPA framing sketched below)
FPGA and off-the-shelf MAC/PHY chips: Layer 2 (MAC), Layer 1 (PHY)
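MPA is the shim that adapts DDP's message boundaries to TCP's byte stream: each DDP segment is carried in an FPDU with a 2-byte length, zero padding to a 4-byte boundary, and a CRC32c. A hedged sketch of that framing (following what was later standardized as RFC 5044; marker insertion and the on-wire CRC byte order are simplified here):

```c
/* Frame one ULPDU (a DDP segment) into an MPA FPDU:
 * [ULPDU_Length (2 bytes, big-endian)] [ULPDU] [0-3 pad bytes] [CRC32c].
 * The caller must size 'out' for at least len + 2 + 3 + 4 bytes. */
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32c (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xffffffffu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82f63b78u & (uint32_t)-(int32_t)(crc & 1));
    }
    return ~crc;
}

size_t mpa_frame_fpdu(const uint8_t *ulpdu, uint16_t len, uint8_t *out)
{
    size_t off = 0;
    out[off++] = (uint8_t)(len >> 8);    /* ULPDU_Length, big-endian */
    out[off++] = (uint8_t)(len & 0xff);
    memcpy(out + off, ulpdu, len);       /* the DDP segment itself */
    off += len;
    while (off % 4 != 0)                 /* zero-pad to a 4-byte boundary */
        out[off++] = 0;
    uint32_t crc = crc32c(out, off);     /* CRC over length + ULPDU + pad */
    out[off++] = (uint8_t)(crc >> 24);
    out[off++] = (uint8_t)(crc >> 16);
    out[off++] = (uint8_t)(crc >> 8);
    out[off++] = (uint8_t)(crc);
    return off;                          /* total FPDU size on the wire */
}
```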
14
Overall Hardware/Firmware Organization of the WAN RDMA card
[Block diagram of the card:]
PCI-Express/HyperTransport interface (IP/firmware module)
RDMA protocol engine with Rx and Tx memory controllers
SCTP protocol engine with Rx memory banks
Layer 3 (IP) processor
Data stream split/join unit
Four SAR (segmentation and reassembly) units
Four 10GE/OC-192 framers
Four PHYs
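To make the host interface concrete, here is a hypothetical receive-descriptor layout for the DMA channel between the RDMA protocol engine and host memory. All field names and widths are illustrative assumptions, not the card's actual firmware interface:

```c
/* Hypothetical Rx descriptor: the RDMA protocol engine resolves an
 * incoming DDP segment's steering tag and offset to a host physical
 * address and hands the DMA engine one descriptor per placement.
 * Illustrative only -- not the actual firmware interface. */
#include <stdint.h>

struct rdma_rx_desc {
    uint64_t host_addr;   /* physical address inside the application's
                             registered (pinned) buffer */
    uint32_t length;      /* bytes to place for this segment */
    uint32_t stag;        /* steering tag naming the registered region */
    uint16_t flags;       /* e.g. last-segment, completion-requested */
    uint16_t lane;        /* which of the four 10GE/OC-192 lanes the
                             segment arrived on */
    uint32_t msg_offset;  /* segment offset within the message, so the
                             split/join unit can reassemble out-of-order
                             arrivals across lanes */
};
```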
15
Present Results
Currently using Virtex-II/Virtex-II Pro (Xilinx) as target devices for our cores.
Data indicate that most of the key cores will fit in one FPGA device (Virtex-II).
The aggregate of all cores spans several FPGAs.
Inter-device communication is an issue; the PCB design needs care.
We are currently trying to accommodate most of the cores in one FPGA.
Most of the cores will be made available free of charge to researchers in non-profit or government organizations.
16
Conclusion
The advent of HyperTransport/PCI-Express and VITA (embedded computing) standards will enable local I/O bandwidths above 10 Gbps.
Extending the RDMA protocol enables large bandwidths over wide area networks.
The proposed cores will meet the natural growth of bandwidth requirements in commercial/defense/aerospace applications.