Architecture and Usages of Accelio 2014 OFA Developer Workshop Sunday, March 30 - Wednesday, April 2, 2014 Monterey CA Eyal Salomon Mellanox Technologies.

Slides:



Advertisements
Similar presentations
Middleware Support for RDMA-based Data Transfer in Cloud Computing Yufei Ren, Tan Li, Dantong Yu, Shudong Jin, Thomas Robertazzi Department of Electrical.
Advertisements

System Area Network Abhiram Shandilya 12/06/01. Overview Introduction to System Area Networks SAN Design and Examples SAN Applications.
Threads Relation to processes Threads exist as subsets of processes Threads share memory and state information within a process Switching between threads.
The Development of Mellanox - NVIDIA GPUDirect over InfiniBand A New Model for GPU to GPU Communications Gilad Shainer.
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Fall 2003 URL: Distributed System Architectures.
Distributed Processing, Client/Server, and Clusters
Technical Architectures
VIA and Its Extension To TCP/IP Network Yingping Lu Based on Paper “Queue Pair IP, …” by Philip Buonadonna.
Software Engineering and Middleware: a Roadmap by Wolfgang Emmerich Ebru Dincel Sahitya Gupta.
Federated DAFS: Scalable Cluster-based Direct Access File Servers Murali Rangarajan, Suresh Gopalakrishnan Ashok Arumugam, Rabita Sarker Rutgers University.
1 School of Computing Science Simon Fraser University CMPT 300: Operating Systems I Ch 4: Threads Dr. Mohamed Hefeeda.
1 Chapter 4 Threads Threads: Resource ownership and execution.
Polaris Financial Technologies Welcomes the members of Hyderabad chapter for the 2nd event on 4 th July 14 held by PACE (The Testing Practice)
16: Distributed Systems1 DISTRIBUTED SYSTEM STRUCTURES NETWORK OPERATING SYSTEMS The users are aware of the physical structure of the network. Each site.
.NET Mobile Application Development Introduction to Mobile and Distributed Applications.
Virtual Network Servers. What is a Server? 1. A software application that provides a specific one or more services to other computers  Example: Apache.
Capacity Planning in SharePoint Capacity Planning Process of evaluating a technology … Deciding … Hardware … Variety of Ways Different Services.
New Direction Proposal: An OpenFabrics Framework for high-performance I/O apps OFA TAC, Key drivers: Sean Hefty, Paul Grun.
Windows RDMA File Storage
1 Chapter Client-Server Interaction. 2 Functionality  Transport layer and layers below  Basic communication  Reliability  Application layer.
Roland Dreier Technical Lead – Cisco Systems, Inc. OpenIB Maintainer Sean Hefty Software Engineer – Intel Corporation OpenIB Maintainer Yaron Haviv CTO.
M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.
High Performance User-Level Sockets over Gigabit Ethernet Pavan Balaji Ohio State University Piyush Shivam Ohio State University.
Socket Swapping for efficient distributed communication between migrating processes MS Final Defense Praveen Ramanan 12 th Dec 2002.
LWIP TCP/IP Stack 김백규.
Introduction to Threads CS240 Programming in C. Introduction to Threads A thread is a path execution By default, a C/C++ program has one thread called.
Chapter 4 Threads, SMP, and Microkernels Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design.
Boosting Event Building Performance Using Infiniband FDR for CMS Upgrade Andrew Forrest – CERN (PH/CMD) Technology and Instrumentation in Particle Physics.
MIDeA :A Multi-Parallel Instrusion Detection Architecture Author: Giorgos Vasiliadis, Michalis Polychronakis,Sotiris Ioannidis Publisher: CCS’11, October.
© 2012 MELLANOX TECHNOLOGIES 1 The Exascale Interconnect Technology Rich Graham – Sr. Solutions Architect.
The NE010 iWARP Adapter Gary Montry Senior Scientist
Processes and Threads CS550 Operating Systems. Processes and Threads These exist only at execution time They have fast state changes -> in memory and.
2006 Sonoma Workshop February 2006Page 1 Sockets Direct Protocol (SDP) for Windows - Motivation and Plans Gilad Shainer Mellanox Technologies Inc.
CGS 3763 Operating Systems Concepts Spring 2013 Dan C. Marinescu Office: HEC 304 Office hours: M-Wd 11: :30 AM.
1 Geospatial and Business Intelligence Jean-Sébastien Turcotte Executive VP San Francisco - April 2007 Streamlining web mapping applications.
Share Memory Program Example int array_size=1000 int global_array[array_size] main(argc, argv) { int nprocs=4; m_set_procs(nprocs); /* prepare to launch.
DYNES Storage Infrastructure Artur Barczyk California Institute of Technology LHCOPN Meeting Geneva, October 07, 2010.
1 Public DAFS Storage for High Performance Computing using MPI-I/O: Design and Experience Arkady Kanevsky & Peter Corbett Network Appliance Vijay Velusamy.
Source: Operating System Concepts by Silberschatz, Galvin and Gagne.
CS333 Intro to Operating Systems Jonathan Walpole.
GVis: Grid-enabled Interactive Visualization State Key Laboratory. of CAD&CG Zhejiang University, Hangzhou
Intel Research & Development ETA: Experience with an IA processor as a Packet Processing Engine HP Labs Computer Systems Colloquium August 2003 Greg Regnier.
Silberschatz, Galvin and Gagne  2002 Modified for CSCI 399, Royden, Operating System Concepts Operating Systems Lecture 14 Threads 2 Read Ch.
Networking Implementations (part 1) CPS210 Spring 2006.
Mellanox Connectivity Solutions for Scalable HPC Highest Performing, Most Efficient End-to-End Connectivity for Servers and Storage April 2010.
iSER update 2014 OFA Developer Workshop Eyal Salomon
OpenFabrics Interface WG A brief introduction Paul Grun – co chair OFI WG Cray, Inc.
AMQP, Message Broker Babu Ram Dawadi. overview Why MOM architecture? Messaging broker like RabbitMQ in brief RabbitMQ AMQP – What is it ?
CSC 360, Instructor: Kui Wu Thread & PThread. CSC 360, Instructor: Kui Wu Agenda 1.What is thread? 2.User vs kernel threads 3.Thread models 4.Thread issues.
Datacenter Fabric Workshop NFS over RDMA Boris Shpolyansky Mellanox Technologies Inc.
© 2008 by Wind River; made available under the EPL v1.0 | 19-Nov-2008 TCF The Target Communication Framework Michael Scharf, Wind River wiki.eclipse.org/DSDP/TM/TCF_FAQ.
Threads. Thread A basic unit of CPU utilization. An Abstract data type representing an independent flow of control within a process A traditional (or.
Background Computer System Architectures Computer System Software.
Stan Smith Intel SSG/DPD June, 2015 Kernel Fabric Interface Kfabric Framework.
1 Network Communications A Brief Introduction. 2 Network Communications.
CSCI/CMPE 4334 Operating Systems Review: Exam 1 1.
1.3 Operating system services An operating system provide services to programs and to the users of the program. It provides an environment for the execution.
Introduction to Operating Systems Concepts
Introduction to threads
Platform as a Service (PaaS)
Platform as a Service (PaaS)
Client-Server Communication
Chapter 4: Threads.
Current Generation Hypervisor Type 1 Type 2.
Operating System (013022) Dr. H. Iwidat
Integrating DPDK/SPDK with storage application
Multithreaded Programming
Chapter 4 Threads, SMP, and Microkernels
Outline Chapter 3: Processes Chapter 4: Threads So far - Next -
Presentation transcript:

Architecture and Usages of Accelio 2014 OFA Developer Workshop Sunday, March 30 - Wednesday, April 2, 2014 Monterey CA Eyal Salomon Mellanox Technologies

What is Accelio in a Nutshell High-performance, Transport independent, Simple to use Reliable Messaging and RPC Library for Accelerating applications Support User space, Kernel, C/C++, Java, Python (Future) bindings Optimal usage of CPU and Network hardware resources Built in fault-tolerance, transaction reliability, and load-balancing Integrated into OpenSource (e.g. HDFS, Ceph), and Commercial Storage/DB products in-order to accelerate its transport with minimal development/integration effort OpenSource Community project from the ground up: –Site: –Code in: –Project/Bug tracking: March 30 – April 2, OFA Developer workshop2

Accelio Goal Goal: Provide an easy to use, reliable, scalable, and high performance data/message delivery middleware that maximize efficiency of modern CPU and NIC hardware Key features: –Focus on high-performance asynchronous APIs –Reliable message delivery (end to end) –Request/Response (Transaction) or Send/Receive models –Provide connection and resource abstraction to max scalability and availability –Maximize multi-threaded application performance with dedicated HW resources per thread –Designed to maximize the benefits of RDMA, hardware offloads, and Multi-core CPUs –Will support multiple transport options (RDMA, TCP,..) –Native support for service and storage clustering/scale-out –Small message combining –Simple and abstract API March 30 – April 2, OFA Developer workshop3

Accelio Architecture March 30 – April 2, OFA Developer workshop4 Use multiple connections per session: -maximize CPU core usage/parallelism -High-availability & Migration -Scale network bandwidth Pluggable Transports: -Code once for multiple HW options -Seamlessly use RDMA Abstract, Easy to use API

High Level Transaction Flow March 30 – April 2, OFA Developer workshop5

Accelio Example - Hello Client March 30 – April 2, OFA Developer workshop6 int main(int argc, char *argv[]) { struct … /* open one thread context set the polling timeout */ ctx = xio_context_create(NULL, 0); /* create a session and connect to server */ session = xio_session_create(XIO_SESSION_CLIENT, &attr, url, 0, 0, &session_data); session_data.conn = xio_connect(session, ctx, 0, NULL, &session_data); … xio_send_request(session_data.conn, session_data.req); /* run the default xio main loop */ xio_context_run_loop(ctx, XIO_INFINITE); /* normal exit phase */ xio_context_destroy(ctx); return 0; } int main(int argc, char *argv[]) { struct … /* open one thread context set the polling timeout */ ctx = xio_context_create(NULL, 0); /* create a session and connect to server */ session = xio_session_create(XIO_SESSION_CLIENT, &attr, url, 0, 0, &session_data); session_data.conn = xio_connect(session, ctx, 0, NULL, &session_data); … xio_send_request(session_data.conn, session_data.req); /* run the default xio main loop */ xio_context_run_loop(ctx, XIO_INFINITE); /* normal exit phase */ xio_context_destroy(ctx); return 0; }

Accelio Example - Hello Client March 30 – April 2, 2014#OFADevWorkshop7 int on_session_event(struct xio_session *session, struct xio_session_event_data *event_data, void *cb_user_context) { switch (event_data->event) { case XIO_SESSION_CONNECTION_TEARDOWN_EVENT: xio_connection_destroy(event_data->conn); break; case XIO_SESSION_TEARDOWN_EVENT: xio_session_destroy(session); break; } return 0; } int on_session_event(struct xio_session *session, struct xio_session_event_data *event_data, void *cb_user_context) { switch (event_data->event) { case XIO_SESSION_CONNECTION_TEARDOWN_EVENT: xio_connection_destroy(event_data->conn); break; case XIO_SESSION_TEARDOWN_EVENT: xio_session_destroy(session); break; } return 0; } int on_response(struct xio_session *session, struct xio_msg *rsp, int more_in_batch, void *cb_prv_data) { struct … process_response(rsp); /* process the incoming message, send a new one */ xio_release_response(rsp); /* acknowledge xio that response resources can be recycled */ … xio_send_request(session_data.conn, session_data.req); return 0; } int on_response(struct xio_session *session, struct xio_msg *rsp, int more_in_batch, void *cb_prv_data) { struct … process_response(rsp); /* process the incoming message, send a new one */ xio_release_response(rsp); /* acknowledge xio that response resources can be recycled */ … xio_send_request(session_data.conn, session_data.req); return 0; }

Accelio Example - Hello Server March 30 – April 2, OFA Developer workshop8 int main(int argc, char *argv[]) { struct … /* create thread context for the server */ ctx= xio_context_create(NULL, 0); /* bind a listener server to a portal/url */ server = xio_bind(ctx, &server_ops, url, NULL, 0, &server_data); xio_context_run_loop(ctx, XIO_INFINITE); /* normal exit phase */ xio_unbind(server); xio_context_destroy(ctx); return 0; } int main(int argc, char *argv[]) { struct … /* create thread context for the server */ ctx= xio_context_create(NULL, 0); /* bind a listener server to a portal/url */ server = xio_bind(ctx, &server_ops, url, NULL, 0, &server_data); xio_context_run_loop(ctx, XIO_INFINITE); /* normal exit phase */ xio_unbind(server); xio_context_destroy(ctx); return 0; }

Accelio Example - Hello Server March 30 – April 2, 2014#OFADevWorkshop9 static int on_new_session(struct xio_session *session, struct xio_new_session_req *req, void *cb_prv_data) { /* accept new connection request */ xio_accept(session, NULL, 0, NULL, 0); return 0; } static int on_new_session(struct xio_session *session, struct xio_new_session_req *req, void *cb_prv_data) { /* accept new connection request */ xio_accept(session, NULL, 0, NULL, 0); return 0; } static int on_new_request(struct xio_session *session, struct xio_msg *req, int more_in_batch, void *cb_prv_data) { struct … /* process request and send a response */ process_request(req); /* attach the original request to response and send it */ response->request = req; xio_send_response(response); return 0; } static int on_new_request(struct xio_session *session, struct xio_msg *req, int more_in_batch, void *cb_prv_data) { struct … /* process request and send a response */ process_request(req); /* attach the original request to response and send it */ response->request = req; xio_send_response(response); return 0; }

Accelio Integration With Other Applications/Projects Accelio is adopted as the high-performance, low-latency, Reliable Messaging/RPC library for variety Open-Source and Commercial products, customer projects Support multiple bindings (Kernel C, User Space C/C++, Java, Python (future)) March 30 – April 2, OFA Developer workshop10

Use case 1: XNBD Accelio based network block device Multi-queue implementation in the block layer for high performance Utilizes Accelio’s facilities and features: –Hardware acceleration for RDMA –Zero data copy –Lockless design –Optimal CPU usage –Reliable message delivery IO operation translation to libaio submit operations to remote device. OpenSource Community project from the ground up: –Code in: Prerequisites: –Accelio 1.1 version and above. –Kernel version and above. March 30 – April 2, OFA Developer workshop11

Use case 2: R-AIO Remote File Access Application Example March 30 – April 2, OFA Developer workshop12 Provide access to a remote file system by redirecting libaio (async file IO) commands to a remote server (which will issue the IO and return the results to the client) Deliver extraordinary performance to remote ram file (/dev/ram) Using 4 CPUs & HW QPs for parallelism Similar performance to local ram file access (i.e. minimal degradation due to communication)

Use case 3: JXIO March 30 – April 2, OFA Developer workshop13 JXIO provides the first RDMA API in JAVA  JXIO is a Java wrapper of Accelio library  Open source project:  Preserves Accelio’s zero copy and performance all the way  Every C struct in Accelio is represented by a matching Java class  Provides 1.5M transactions per second (in Java)  Reliable message delivery  Low memory footprint  Essential component in Mellanox’s HDFS RDMA acceleration solution

Test Configuration Server –HP ProLiant DL380p Gen8 –2 x Intel(R) Xeon(R) CPU E GHz –64 GB Memory Adapters –ConnectX3-Pro VPI (IB FDR or 40GbE) –ConnectIB 16x PCIe –OFED 2.1 OS –RedHat EL 6.4 –Kernel: el6.x86_64 Test –Accelio I/O test utility in C, User space –Request/Responce transactions (RPC) –Over 1 or 2 ports, using auto load balancing based on threads March 30 – April 2, 2014#OFADevWorkshop14

Bandwidth Results March 30 – April 2, 2014#OFADevWorkshop15 ~ 12GB/s with Connect-IB

Transaction Per Second (IOP/s) Results March 30 – April 2, 2014#OFADevWorkshop16 1 I/O Thread (use single port) Hit Max Bandwidth 8 I/O Threads > 9M TP/s with Connect-IB

Round Trip Latency (Request & Response) Results March 30 – April 2, 2014#OFADevWorkshop17 Less than 3us in 1KB Messages Hit Max Bandwidth 8 I/O Threads 1 I/O Thread

Latency Under Maximum Load (Millions of Messages/Sec) March 30 – April 2, 2014#OFADevWorkshop18 1.8M Messages/Sec Hit Max Bandwidth (Link become congested) 1 I/O Thread (use single port) ~9M Messages/Sec With Connect-IB ~9M Messages/Sec With Connect-IB 8 I/O Threads

Open source project  Initiated by Mellanox  Partnership  Companies and Individuals are welcome to join the project and contribute  Web site:  Code in:  Project/Bug tracking:   License: Dual BSD/GPLv2 March 30 – April 2, 2014#OFADevWorkshop19

#OFADevWorkshop Thank You