Middleware Support for RDMA-based Data Transfer in Cloud Computing
Yufei Ren, Tan Li, Dantong Yu, Shudong Jin, Thomas Robertazzi
Department of Electrical and Computer Engineering, Stony Brook University

Outline
- Introduction and Background
- Middleware Design and RFTP application
- Experimental Results
- Conclusion

Outline
- Introduction and Background
  - Overview
  - RDMA Semantics
- Middleware Design and RFTP application
- Experimental Results
- Conclusion

Today’s Data-intensive Applications
- Explosion of data and massive data processing
- Scalable storage systems
- Ultra-high-speed networks for data transfer: 40/100 Gbps
- Reliable transfer (error checking and recovery) at 40/100 Gbps places a heavy burden on processing power

ANI Ultra-high Speed Network

End-to-End 40/100G Networking
[Diagram: two end hosts, each running 100G applications and FTP over a 40/100G NIC, connected through a 40/100 Gbps backbone. End-to-end networking at 40/100 Gbit/s, and our project's role in it.]

Protocol Offload and Hardware Acceleration
- TCP/IP Offload Engine (TOE)
- Protocol Offload Engine (POE)
- Remote Direct Memory Access (RDMA)
  - Kernel bypass
  - Zero-copy
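The kernel-bypass and zero-copy bullets rest on user-space memory registration. Below is a minimal libibverbs sketch (not from the RFTP code) of registering an application buffer so the HCA can DMA into it directly, with no per-transfer system call or data copy; the helper name register_block and the chosen access flags are assumptions.

```c
/* Minimal sketch: register a user buffer with the HCA so it can be the
 * target of zero-copy, kernel-bypass transfers. Illustrative only. */
#include <infiniband/verbs.h>
#include <stdlib.h>

struct ibv_mr *register_block(struct ibv_pd *pd, size_t len)
{
    void *buf = malloc(len);
    if (!buf)
        return NULL;

    /* LOCAL_WRITE lets the HCA place received data here; REMOTE_WRITE
     * additionally allows a peer's RDMA WRITE to target this buffer. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr)
        free(buf);
    return mr;
}
```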

Applications over different RDMA implementations

RDMA Semantics
- Channel semantics: SEND/RECV
  - Two-sided operation
  - Both the data source and the data sink are involved; the sink pre-posts a list of buffers into its receive queue.
- Memory semantics: RDMA WRITE / RDMA READ
  - One-sided operation
  - Credit-based: the sink advertises its available registered memory to the source for RDMA WRITE operations.
- We use RDMA WRITE to deliver the user payload (128 KB to 4 MB per block), and SEND/RECV to exchange control messages (~2 KB).
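A minimal libibverbs sketch of the two semantics described above, assuming an already-connected queue pair, a registered local buffer, and a credit (remote address and rkey) advertised by the sink; the helper names are illustrative, not from the RFTP source.

```c
/* Channel vs. memory semantics with libibverbs. Illustrative only. */
#include <infiniband/verbs.h>
#include <stdint.h>

/* Channel semantics: the sink pre-posts a receive buffer for a
 * control message before the source SENDs it. */
static int post_ctrl_recv(struct ibv_qp *qp, struct ibv_mr *mr,
                          void *buf, size_t len)
{
    struct ibv_sge sge = {
        .addr = (uintptr_t)buf, .length = (uint32_t)len, .lkey = mr->lkey
    };
    struct ibv_recv_wr wr = { .wr_id = 1, .sg_list = &sge, .num_sge = 1 };
    struct ibv_recv_wr *bad;
    return ibv_post_recv(qp, &wr, &bad);
}

/* Memory semantics: the source pushes a data block directly into the
 * sink's advertised, registered memory with RDMA WRITE. */
static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *block, size_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr = (uintptr_t)block, .length = (uint32_t)len, .lkey = mr->lkey
    };
    struct ibv_send_wr wr = {
        .wr_id = 2,
        .sg_list = &sge,
        .num_sge = 1,
        .opcode = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,   /* request a completion */
    };
    wr.wr.rdma.remote_addr = remote_addr;  /* from the sink's credit */
    wr.wr.rdma.rkey = rkey;
    struct ibv_send_wr *bad;
    return ibv_post_send(qp, &wr, &bad);
}
```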

Outline
- Introduction and Background
- Middleware Design and RFTP application
  - Middleware Layer
  - Middleware Software Architecture
  - Asynchronous Communication Events Design
  - RFTP Modules
  - RDMA extension to the standard FTP protocol
- Experimental Results
- Conclusion

Middleware Layer
[Layer diagram, bottom to top: Hardware (InfiniBand, RoCE, iWARP); OFED (IB Verbs / libibverbs, RDMA CM / librdmacm); Middleware (buffer management, connection management, event dispatch/join, task scheduling); Application]
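For the connection-management box, here is a hedged sketch of how a middleware layer can establish a reliable connection through librdmacm, which works unchanged over InfiniBand, RoCE, and iWARP; connect_to_sink and the queue-depth numbers are assumptions, and event and error handling are trimmed.

```c
/* Active-side connection setup over librdmacm. Illustrative only. */
#include <rdma/rdma_cma.h>

static struct rdma_cm_id *connect_to_sink(const char *host, const char *port)
{
    struct rdma_addrinfo hints = { .ai_port_space = RDMA_PS_TCP }, *res;
    struct ibv_qp_init_attr attr = {
        .cap = { .max_send_wr = 64, .max_recv_wr = 64,
                 .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_RC,           /* reliable connection */
    };
    struct rdma_cm_id *id = NULL;

    if (rdma_getaddrinfo(host, port, &hints, &res))
        return NULL;
    /* rdma_create_ep allocates the cm_id and its queue pair in one step. */
    if (rdma_create_ep(&id, res, NULL, &attr)) {
        rdma_freeaddrinfo(res);
        return NULL;
    }
    rdma_freeaddrinfo(res);
    if (rdma_connect(id, NULL)) {        /* kicks off the CM handshake */
        rdma_destroy_ep(id);
        return NULL;
    }
    return id;                           /* ready for ibv_post_send/recv */
}
```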

Middleware – Multi-threaded Architecture
[Diagram: application-level threads (sender, CE dispatcher, CE slaves 1..n, logger) sharing data structures (queue pair list QP-1..QP-n, completion queue, data block list, receive and send control message lists, remote MR info list, memory), layered over the HCA hardware]
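A sketch of one plausible CE-dispatcher loop matching the diagram: a single thread blocks on the completion channel, drains the completion queue, and hands each work completion to a slave. hand_to_slave is a hypothetical hook, not part of the actual middleware.

```c
/* Completion-event dispatcher loop. Illustrative only. */
#include <infiniband/verbs.h>

extern void hand_to_slave(struct ibv_wc *wc);   /* hypothetical worker hand-off */

static void *ce_dispatcher(void *arg)
{
    struct ibv_comp_channel *chan = arg;
    struct ibv_cq *cq;
    void *ctx;
    struct ibv_wc wc[16];

    for (;;) {
        /* Block until the HCA signals that the CQ has new completions. */
        if (ibv_get_cq_event(chan, &cq, &ctx))
            break;
        ibv_ack_cq_events(cq, 1);
        ibv_req_notify_cq(cq, 0);        /* re-arm before draining */

        int n;
        while ((n = ibv_poll_cq(cq, 16, wc)) > 0)
            for (int i = 0; i < n; i++)
                hand_to_slave(&wc[i]);   /* e.g. enqueue for a CE slave */
        if (n < 0)
            break;                       /* polling error */
    }
    return NULL;
}
```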

Communication Events
- Session ID negotiation
  - Each data transfer task is assigned a unique session ID
- Negotiation of the number of data connections
  - Establish several parallel connections
- Memory region credit request and response
  - The source requests memory region information
  - The sink returns credits according to its buffer status
- Block completion notification
  - The source notifies the sink which block's data is ready
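One possible wire layout for these small (~2 KB) control messages carried over SEND/RECV. Every type, name, and field below is an assumption for illustration; the deck does not give the real RFTP message formats.

```c
/* Assumed control-message layout mirroring the four event types above. */
#include <stdint.h>

enum ctrl_type {
    CTRL_SESSION_ID  = 1,     /* session ID negotiation */
    CTRL_NUM_STREAMS = 2,     /* number of parallel data connections */
    CTRL_CREDIT_REQ  = 3,     /* source asks for memory-region credits */
    CTRL_CREDIT_RESP = 4,     /* sink advertises registered buffers */
    CTRL_BLOCK_DONE  = 5,     /* source: block block_seq has been written */
};

struct mr_credit {            /* one buffer the source may RDMA WRITE into */
    uint64_t addr;            /* remote virtual address */
    uint32_t rkey;            /* remote key for RDMA WRITE */
    uint32_t length;          /* buffer size, e.g. 128 KB - 4 MB */
};

struct ctrl_msg {
    uint16_t type;            /* enum ctrl_type */
    uint16_t count;           /* number of credits / streams, as applicable */
    uint32_t session_id;
    uint64_t block_seq;       /* which block is complete (CTRL_BLOCK_DONE) */
    struct mr_credit credits[8];
};
```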

Parallel and Pipelined Data Transfer
- Exploit the parallelism of RDMA operations
  - Multiple active data streams
  - Each stream uses a pipelined execution
- Out-of-order blocks
  - Reorder
  - Deliver in-order blocks to the application
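A sketch of the reordering step under the assumption of a fixed-size sequence window: blocks completing out of order across parallel streams are parked until the next expected sequence number is ready, then released in order. deliver_block and WINDOW are illustrative names, not from the RFTP source.

```c
/* In-order delivery of out-of-order block completions. Illustrative only. */
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define WINDOW 64                       /* assumed reorder window size */

extern void deliver_block(uint64_t seq, void *data, size_t len);

struct slot { bool ready; void *data; size_t len; };

static struct slot window[WINDOW];
static uint64_t next_expected;          /* lowest sequence not yet delivered */

/* Called when block `seq` has fully arrived on some data stream.
 * Assumes seq < next_expected + WINDOW (enforced by the credit window). */
void block_completed(uint64_t seq, void *data, size_t len)
{
    struct slot *s = &window[seq % WINDOW];
    s->ready = true; s->data = data; s->len = len;

    /* Flush every consecutive block starting at next_expected. */
    for (struct slot *h = &window[next_expected % WINDOW]; h->ready;
         h = &window[next_expected % WINDOW]) {
        deliver_block(next_expected, h->data, h->len);
        h->ready = false;
        next_expected++;
    }
}
```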

RDMA-enabled FTP - RFTP
[Layer diagram, bottom to top: Hardware (InfiniBand, iWARP, RoCE; SSD and magnetic disk); Operating System / driver API (Verbs, communication manager, direct I/O API); Middleware (buffer management, I/O scheduling, connection management, event dispatch, task scheduling); Application (FTP with the RDMA middleware and a disk I/O module)]
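A sketch of the direct-I/O path suggested by the diagram, assuming Linux O_DIRECT semantics and a 4 KB logical block size: page-aligned block buffers can be both RDMA-registered and passed to pwrite() without an intermediate copy. All helper names and the alignment constant are assumptions.

```c
/* Direct I/O for RDMA-registered block buffers. Illustrative only. */
#define _GNU_SOURCE                 /* exposes O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_ALIGN 4096            /* assumed logical block size */

/* Allocate one data block usable for both ibv_reg_mr() and O_DIRECT. */
static void *alloc_block(size_t len)
{
    void *buf = NULL;
    return posix_memalign(&buf, BLOCK_ALIGN, len) ? NULL : buf;
}

static int open_direct(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
}

/* Write a completed block straight from its registered buffer; with
 * O_DIRECT, the buffer address, length, and file offset must all be
 * multiples of the block size. */
static ssize_t flush_block(int fd, const void *block, size_t len, off_t off)
{
    return pwrite(fd, block, len, off);
}
```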

RDMA extension to standard FTP protocol

Outline
- Introduction and Background
- Middleware Design and RFTP application
- Experimental Results
  - Testbed Setup
  - LAN results
  - MAN results
- Conclusion

Testbed Setup - LAN
[Diagram: LAN testbed with 10 Gbps and 40 Gbps links]

Testbed Setup - MAN
[Diagram: MAN testbed with a 40 Gbps RoCE link, RTT = 3.6 ms]

LAN – Bandwidth and CPU Usage Comparison

MAN – RFTP evaluation

Outline
- Introduction and Background
- Middleware Design and RFTP application
- Experimental Results
- Conclusion

Conclusion
- Data-intensive applications in cloud computing require efficient data transfer protocols to fully utilize the capacity of advanced network infrastructure
- Designed and implemented an RDMA-based middleware layer
- Developed an FTP application on top of this middleware layer
- Evaluated the performance of our design and implementation on both LAN and long-haul MAN links

Thank you