Presentation transcript:

Accelerating NVMe™ over Fabrics with Hardware Offloads at 100Gb/s and Beyond
Rob Davis, Mellanox; Ilker Cebeli, Samsung

Disclaimer This presentation and/or accompanying oral statements by Samsung representatives (collectively, the “Presentation”) is intended to provide information concerning the SSD and memory industry and Samsung Electronics Co., Ltd. and certain affiliates (collectively, “Samsung”). While Samsung strives to provide information that is accurate and up-to-date, this Presentation may nonetheless contain inaccuracies or omissions. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of the information provided in this Presentation. This Presentation may include forward-looking statements, including, but not limited to, statements about any matter that is not a historical fact; statements regarding Samsung’s intentions, beliefs or current expectations concerning, among other things, market prospects, technological developments, growth, strategies, and the industry in which Samsung operates; and statements regarding products or features that are still in development. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward-looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or the industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements in this Presentation. In addition, even if such forward-looking statements are shown to be accurate, those developments may not be indicative of developments in future periods.

NVMe™ over Fabrics Market Success

NVMe™ over Fabrics Maturity UNH-IOL has provided a neutral environment for multi-vendor interoperability and conformance-to-standards testing since 1988. In May 2017, and again in October, it hosted the first and second interoperability tests for NVMe-oF™. Test plans called for participating vendors to mix and match their NICs in both target and initiator positions. Testing was successful, with near line-rate performance at 25Gb/s achieved at the first test.

Time for the Next Level of Performance Current performance: 6M IOPS at 512B block size, 2M IOPS at 4K block size, 50% CPU utilization, and a ~15usec latency difference from local. How do we lower the latency difference and CPU utilization?
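To put these figures in context, here is a quick back-of-the-envelope sketch (not from the original slides) that converts the quoted IOPS numbers into wire bandwidth. The block sizes and IOPS come from the slide above; the comparison against a 100GbE port is an added observation.

    # Rough bandwidth implied by the quoted IOPS figures (sketch, not from the deck).
    def gbps(iops, block_bytes):
        """Convert an IOPS figure at a given block size into Gb/s on the wire."""
        return iops * block_bytes * 8 / 1e9

    print(f"6M IOPS @ 512B  = {gbps(6_000_000, 512):.1f} Gb/s")   # ~24.6 Gb/s
    print(f"2M IOPS @ 4KiB  = {gbps(2_000_000, 4096):.1f} Gb/s")  # ~65.5 Gb/s of payload
    # Both fit comfortably within the 100GbE target ports used in the test setups
    # below, so the remaining gap is latency and CPU overhead, not link bandwidth.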

Some of the Use Cases for NVMe™ over Fabrics

Performance Test Configuration – 2016: 1x NVMe-oF™ target with 24x NVMe 2.5” SSDs, 2x 100GbE NICs, and dual x86 CPUs; 4x initiator hosts with 2x 25GbE NICs each; open-source NVMe-oF kernel drivers.
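The slide refers to the open-source, in-kernel NVMe-oF drivers. For readers unfamiliar with how such a target is brought up, the following is a minimal sketch of exporting one namespace over RDMA through the Linux nvmet configfs interface; the subsystem NQN, backing device, IP address, and port number are illustrative assumptions, not values from the test setup.

    import os

    # Minimal sketch: export one NVMe namespace over RDMA with the in-kernel nvmet
    # target (requires the nvmet and nvmet-rdma modules and root privileges).
    # The NQN, backing device, address, and port below are made-up example values.
    NVMET = "/sys/kernel/config/nvmet"
    NQN = "nqn.2016-06.io.example:testsubsys"      # hypothetical subsystem NQN
    subsys = f"{NVMET}/subsystems/{NQN}"
    ns = f"{subsys}/namespaces/1"
    port = f"{NVMET}/ports/1"

    def write(path, value):
        with open(path, "w") as f:
            f.write(value)

    os.mkdir(subsys)                               # configfs creates the subsystem groups
    write(f"{subsys}/attr_allow_any_host", "1")    # skip host whitelisting for the demo
    os.mkdir(ns)
    write(f"{ns}/device_path", "/dev/nvme0n1")     # backing NVMe namespace (example)
    write(f"{ns}/enable", "1")

    os.mkdir(port)
    write(f"{port}/addr_trtype", "rdma")           # RoCE/InfiniBand transport
    write(f"{port}/addr_adrfam", "ipv4")
    write(f"{port}/addr_traddr", "192.168.1.10")   # target IP (example)
    write(f"{port}/addr_trsvcid", "4420")          # default NVMe-oF port
    os.symlink(subsys, f"{port}/subsystems/{NQN}") # expose the subsystem on the port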

Local vs. Remote Latency Comparison – 2016: read gap ~17 us, write gap ~9 us.

Performance Test Configuration – 2017: 1x NVMe-oF™ target with 36x NF1 SSDs (two banks of 18x NF1 drives), 2x 100GbE NICs plus 2x 50GbE NICs, and dual x86 CPUs; 6x initiator clients with 2x 25Gb/s NICs each; open-source NVMe-oF kernel drivers; Ubuntu Linux 16.04 with kernel 4.9 on the target.
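On the initiator side, the same open-source kernel drivers connect to a target by passing an options string to /dev/nvme-fabrics, which is what the nvme-cli connect command does under the hood. Below is a minimal sketch; the transport address, port, and NQN are the same illustrative values assumed in the target sketch above, not the ones used in these tests.

    # Minimal sketch of an NVMe-oF initiator connect via the kernel fabrics interface
    # (roughly equivalent to `nvme connect -t rdma -a 192.168.1.10 -s 4420 -n <nqn>`).
    # Requires the nvme-rdma module and root; the values below are examples.
    options = ",".join([
        "transport=rdma",
        "traddr=192.168.1.10",                     # target IP (example)
        "trsvcid=4420",                            # NVMe-oF service id / port
        "nqn=nqn.2016-06.io.example:testsubsys",   # hypothetical subsystem NQN
    ])

    with open("/dev/nvme-fabrics", "w") as f:
        f.write(options)
    # On success the kernel creates a new controller (e.g. /dev/nvme1) whose
    # namespaces appear as ordinary block devices on the initiator.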

Local vs. Remote Latency Comparison – 2017: read gap ~14 us and write gap ~10 us, compared with a read gap of ~17 us and a write gap of ~9 us in the 2016 tests.

SSDs Will Continue to Get Faster: the measured fabric gaps (2017 tests: read gap ~14 us, write gap ~10 us; 2016 tests: read gap ~17 us, write gap ~9 us) will account for a growing share of total I/O latency as SSDs themselves get faster.

Closing the Local vs. Remote Performance Gap: without offload, ~15usec latency (not including the SSD) at 50% CPU utilization; with offload, ~5usec latency (not including the SSD) at 0.01% CPU utilization.

No Offload: Initiator Requests and Responses to the Target Go Through Software

How Offload Works: only the control path, management, and exceptions go through target CPU software; the data path and NVMe™ commands are handled by the network adapter.
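On targets whose adapters support NVMe-oF protocol offload (for example Mellanox ConnectX-5 with MLNX_OFED's patched nvmet), the offload is typically switched on per subsystem through an extra configfs attribute before the subsystem is linked to an RDMA port. The sketch below extends the earlier target sketch; the attribute name attr_offload follows Mellanox's out-of-tree nvmet patches and should be treated as an assumption that may vary by driver release.

    # Sketch: enable adapter-side NVMe-oF target offload for the subsystem created
    # in the earlier target sketch. Assumes MLNX_OFED's patched nvmet/nvmet-rdma;
    # the attribute name is taken from Mellanox documentation and may differ by release.
    NQN = "nqn.2016-06.io.example:testsubsys"      # same hypothetical NQN as before
    subsys = f"/sys/kernel/config/nvmet/subsystems/{NQN}"

    # Typically set before namespaces are enabled and before the subsystem is linked
    # to an RDMA port; afterwards the NIC executes the NVMe data path, and the
    # target CPU only sees control-path, management, and exception traffic.
    with open(f"{subsys}/attr_offload", "w") as f:
        f.write("1")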

Results – CPU Utilization and Latency: with offload, the local vs. remote latency gap drops to ~5usec.

NVMe-oF™ Offload Magnifies CMB Value: using an SSD with a Controller Memory Buffer (CMB) together with a network adapter that offloads the NVMe-oF™ protocol bypasses the CPU memory controller completely, since data moves peer-to-peer between the NIC and the SSD.
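Whether a given SSD exposes a CMB, and how large it is, can be read from the controller's CMBLOC and CMBSZ registers defined in the NVMe specification. The sketch below reads them from the controller's BAR0 via sysfs; the PCI address is a made-up example, and root access is assumed.

    import mmap, struct

    # Sketch: check for a Controller Memory Buffer by reading CMBLOC (offset 0x38)
    # and CMBSZ (offset 0x3C) from the NVMe controller's BAR0 (NVMe spec 1.2+).
    # The PCI address below is an example; run as root.
    bar0 = "/sys/bus/pci/devices/0000:03:00.0/resource0"   # hypothetical NVMe SSD

    with open(bar0, "rb") as f:
        regs = mmap.mmap(f.fileno(), 4096, prot=mmap.PROT_READ)
        cmbloc, cmbsz = struct.unpack_from("<II", regs, 0x38)
        regs.close()

    if cmbsz == 0:
        print("No CMB supported")
    else:
        szu = (cmbsz >> 8) & 0xF                  # size units: 4KiB * 16^SZU
        size_bytes = (cmbsz >> 12) * (4096 << (4 * szu))
        print(f"CMB in BAR {cmbloc & 0x7}, size {size_bytes / 2**20:.1f} MiB,"
              f" WDS={cmbsz >> 4 & 1} RDS={cmbsz >> 3 & 1}")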

Summary: NVMe™ over Fabrics is taking off, with large and small vendors in production and multi-vendor interoperability demonstrated. NVMe over Fabrics protocol offload moves performance even closer to a local SSD: 8M IOPS at 512B block size, 5M IOPS at 4K block size, 0.01% CPU utilization, and ~5usec latency (not including the SSD). Control Memory Buffer (CMB) value is dramatically enhanced with NVMe over Fabrics protocol offload on the network adapter.