Mapping of scalable RDMA protocols to ASIC/FPGA platforms


Mapping of scalable RDMA protocols to ASIC/FPGA platforms
Yosef Gavriel Tirat-Gefen, PhD, Senior Member IEEE
Chief Scientist, Castel Systems Inc. & Dept. Physics and Astronomy, George Mason University, Fairfax, VA
yosefgavriel@computer.org

Presentation Overview
- Motivation
- TCP off-loading
- Zero-copying
- RDMA protocol
- RDMA protocol stack
- Structure of an RDMA card
- Results
- Conclusion

Motivation: Enabling high-bandwidth WAN applications
[Diagram components: two supercomputers or server farms, two terabyte storage systems, and a workstation, all connected through a WAN.]

Applications
- Distributed command and control
- Signal processing (e.g. radar)
- Real-time sharing of intelligence data
- Distributed large-scale computation and simulation of aerospace problems
- Extension of storage area networks over a wide area network (WAN)
- Enabling technology for modern supercomputing installations

Traditional TCP/IP Networking
[Diagram: two end hosts, each with the stack Application/O.S., TCP, Layer 3 (IP), Layer 2 (MAC), Layer 1 (PHY), connected through a router that shows a Layer 3 / Layer 2 / Layer 1 stack on each of its two sides.]

Standard Data Flow on TCP/IP
[Diagram components: Application A memory space, TCP buffer/stack memory space, L3, L2, L1, WAN/LAN, L1, L2, L3, TCP buffer/stack memory space, Application B memory space.]

Standard Data Flow on TCP/IP
- Traditional TCP/IP copies data from application memory to the TCP memory buffer.
- Buffer copying costs CPU cycles.
- The CPU gets overwhelmed at rates above 2.5 Gbps.
- TCP/IP off-loading helps, but it does not solve the problem on the receiver side.
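The copy overhead described on this slide can be illustrated with a minimal Python sketch. It is not taken from the presentation: the function names are hypothetical, a `bytearray` stands in for the TCP buffer, and a `memoryview` stands in for handing the hardware a reference to application memory instead of a copy.

```python
def send_with_copy(app_data: bytes, tcp_buffer: bytearray) -> int:
    """Traditional path (simplified): copy the payload into the TCP buffer.

    Returns the number of bytes the CPU had to copy.
    """
    n = len(app_data)
    tcp_buffer[:n] = app_data  # CPU-driven copy from application memory
    return n


def send_zero_copy(app_data: bytes) -> memoryview:
    """Zero-copy idea (simplified): expose the application buffer by reference."""
    return memoryview(app_data)  # no bytes are duplicated here


def copy_bytes_per_second(line_rate_gbps: float, copies_per_byte: int) -> float:
    """Bytes per second the CPU must copy to sustain a given line rate."""
    return line_rate_gbps * 1e9 / 8 * copies_per_byte


if __name__ == "__main__":
    payload = bytes(1500)
    print("bytes copied, traditional path:", send_with_copy(payload, bytearray(2048)))
    print("copy load at 2.5 Gbps, one copy per byte:",
          copy_bytes_per_second(2.5, 1), "bytes/s")
```

The last function is plain arithmetic: a single copy per byte at 2.5 Gbps already means moving 312.5 million bytes per second through the CPU, before any protocol processing.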

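TCP/IP off-load processing
[Diagram: the conventional stack (Application/O.S., TCP, Layer 3 (IP), Layer 2 (MAC), Layer 1 (PHY)) next to the off-loaded arrangement, in which the Application/O.S. sits above a TCP/IP offload processor (TOE) that is mapped to hardware.]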

Zero-copying and TCP offloading processing
[Diagram components: host CPU, cache memory, host main memory, receive buffer, network buffer, TCP off-load processor on a TOE/NIC card, WAN/LAN.]

Zero-copying and TCP offloading processing
- Zero-copying is still not achieved, because the receive buffer is still copied back to the application memory space.
- TCP/IP off-loading is not scalable.
- RDMA protocols provide a solution.
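How an RDMA receiver avoids the final copy can be sketched as follows. This is a simplified software model, not the presentation's hardware design: the class and field names are hypothetical, and the "steering tag plus offset" addressing is a reduced version of the tagged-buffer idea used by direct data placement. Each incoming segment names a registered application buffer and an offset, so the NIC model writes the payload straight into application memory with no intermediate receive buffer.

```python
from dataclasses import dataclass


@dataclass
class TaggedSegment:
    stag: int       # steering tag: names a registered application buffer
    offset: int     # byte offset inside that buffer
    payload: bytes  # data carried by this segment


class RdmaNicModel:
    """Toy model of a receiver that places data directly in application memory."""

    def __init__(self):
        self.regions = {}  # stag -> writable view of an application buffer

    def register(self, stag: int, buffer: bytearray) -> None:
        # A memoryview references the application's own memory; nothing is copied.
        self.regions[stag] = memoryview(buffer)

    def place(self, seg: TaggedSegment) -> None:
        region = self.regions[seg.stag]
        end = seg.offset + len(seg.payload)
        if seg.offset < 0 or end > len(region):
            raise ValueError("segment falls outside the registered buffer")
        region[seg.offset:end] = seg.payload  # single write into its final location


if __name__ == "__main__":
    app_buffer = bytearray(8)
    nic = RdmaNicModel()
    nic.register(7, app_buffer)
    # Segments may arrive out of order; each one carries its own destination.
    nic.place(TaggedSegment(stag=7, offset=4, payload=b"WXYZ"))
    nic.place(TaggedSegment(stag=7, offset=0, payload=b"ABCD"))
    print(bytes(app_buffer))
```

Because every segment is self-describing, the receiver needs no reassembly buffer in this model, which is the property the slide contrasts with TCP off-loading alone.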

RDMA data-flow for WAN applications
[Diagram components: Host CPU A and Host CPU B, each with host memory containing an application memory space, and each with an RDMA NIC card; the two RDMA NIC cards are connected across the WAN.]

Scalable WAN-RDMA for bandwidths above 10 Gbps
[Diagram components: RDMA NIC card for WAN containing a DMA channel to the host (> 10 Gbps), an RDMA engine, a Tx buffer, an Rx buffer, and MAC and PHY blocks; 10 Gbps links connect the card to the WAN.]

The RDMA protocol layers and our prototype
[Diagram: protocol stack, top to bottom: ULP (e.g. iSCSI, NFS); RDMA; DDP; MPA; SCTP and TCP; Layer 3 (e.g. IP); Layer 2 (MAC); Layer 1 (PHY). Annotations beside the stack: "Running on Host CPU", "FPGA implementation", "FPGA and off-the-shelf MAC/PHY chips".]

Overall Hardware/Firmware Organization of the WAN RDMA card
[Diagram components: PCI-Express/HyperTransport interface IP/firmware module; RDMA protocol engine; Rx memory controller; Tx memory controller; SCTP protocol engine; Rx memory bank (label shown twice); Layer 3 (IP) processor; data stream split/join unit; four SAR units; four 10GE/OC-192 framers; four PHYs.]
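The role of the data stream split/join unit can be sketched in software. The slides do not say how the unit distributes traffic, so the round-robin striping with sequence numbers below is an assumption made purely for illustration, and the function names and chunk size are hypothetical. The sketch splits one byte stream over four links, matching the four framer/PHY paths in the diagram, and rebuilds it at the far end.

```python
def split_stream(data: bytes, num_links: int, chunk_size: int):
    """Stripe a byte stream over several links, round-robin (assumed scheme).

    Returns one list per link; each entry is a (sequence_number, chunk) pair.
    """
    links = [[] for _ in range(num_links)]
    for seq, start in enumerate(range(0, len(data), chunk_size)):
        links[seq % num_links].append((seq, data[start:start + chunk_size]))
    return links


def join_stream(links) -> bytes:
    """Rebuild the original stream by ordering chunks on their sequence numbers."""
    chunks = sorted(
        (chunk for link in links for chunk in link),
        key=lambda chunk: chunk[0],
    )
    return b"".join(payload for _, payload in chunks)


if __name__ == "__main__":
    stream = bytes(range(256)) * 4
    per_link = split_stream(stream, num_links=4, chunk_size=64)
    print("chunks per link:", [len(link) for link in per_link])
    print("stream restored:", join_stream(per_link) == stream)
```

Sequence numbers let the join side tolerate links that deliver at different speeds, since ordering is restored from the numbers and not from arrival order.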

Present Results
- We currently use Xilinx Virtex-II/Virtex-II Pro devices as targets for our cores.
- Data indicate that most of the key cores will fit in one FPGA device (Virtex-II).
- The aggregate of all cores spans several FPGAs.
- Inter-device communication is an issue; the PCB design needs care.
- We are currently trying to accommodate most of the cores in one FPGA.
- Most of the cores will be made available free of charge to researchers in non-profit or government organizations.

Conclusion
- The advent of HyperTransport/PCI-Express and VITA (embedded computing) standards will enable I/O bandwidths above 10 Gbps locally.
- Extension of the RDMA protocol enables large bandwidths over wide area networks.
- The proposed cores will meet the natural growth of bandwidth requirements in commercial, defense, and aerospace applications.