Leveraging NVMe-oF™ for Existing and New Applications
Sponsored by the NVM Express™ organization, the owner of the NVMe™, NVMe-oF™ and NVMe-MI™ standards
Speakers: Marcus Thordal, Bryan Cowger, Erez Scop, Nishant Lodha
Agenda: In this panel session we will discuss application and use case examples leveraging NVMe-oF™. Which improvements should you expect with NVMe-oF™, and how do they apply to different types of applications? Learn from early adopters' implementations of NVMe-oF™ what you need to consider when planning to migrate existing applications from SCSI to NVMe™ or when deploying new applications with NVMe-oF™. We will complete the session with a peek into the future of how NVMe-oF™ can enable new application use cases.
Application Requirements & NVMe™ Fabric Selection
Enterprise Applications
- Rigid application storage architecture; the application architecture cannot be changed
- Fabric requirements: high reliability, low latency, deterministic performance, scale
Hyper-Scale Applications
- Full control of the application storage architecture; the application can be architected and composed to be less dependent on fabric properties
- Fabric requirements: reasonable reliability, low latency, deterministic performance (?), scale needs can be confined to rack locations
Analytics, ML & AI
- Architecture traditionally DAS, changing to centralized shared storage; science project or enterprise-dependent application?
- Fabric requirements: reasonable/high reliability, low latency, deterministic performance, scale
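As a compact restatement of this taxonomy, the sketch below captures the three application classes as a data structure. It is illustrative only: the field names and the "datacenter" scale value are assumptions added for readability, not part of the slide.

```python
# Illustrative restatement of the slide's application/fabric taxonomy.
# Field names and the "datacenter" scale value are assumptions for clarity.

FABRIC_REQUIREMENTS = {
    "enterprise": {
        "reliability": "high",
        "latency": "low",
        "deterministic_performance": True,
        "scale": "datacenter",
        "notes": "rigid storage architecture; application cannot be changed",
    },
    "hyper-scale": {
        "reliability": "reasonable",
        "latency": "low",
        "deterministic_performance": None,   # left as an open question on the slide
        "scale": "can be confined to rack locations",
        "notes": "application can be composed around fabric properties",
    },
    "analytics/ML/AI": {
        "reliability": "reasonable to high",
        "latency": "low",
        "deterministic_performance": True,
        "scale": "datacenter",
        "notes": "traditionally DAS, moving to centralized shared storage",
    },
}

for app_class, reqs in FABRIC_REQUIREMENTS.items():
    print(f"{app_class}: {reqs['reliability']} reliability, scale = {reqs['scale']}")
```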
Areas of Performance Improvement with NVMe™ over Fabrics
[Diagram: end-to-end I/O path from application, OS/hypervisor and NIC/HBA across the fabric to the storage array front end, storage services & acceleration, cache/controller, and back-end drives, contrasting SCSI with NVMe™/FC-NVMe]
- Server: performance improvement from a shorter path through the OS storage stack with NVMe™ & NVMe-oF™, and lower CPU utilization
- Front side of the storage array: performance improvement from a shorter path through the target's host port stack
- Storage array architecture: performance improvement from an optimized path through the controller
- Back side of the storage array: performance improvement from moving from SAS/SATA drives to NVMe SSDs
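To make the end-to-end view concrete, here is a minimal latency-budget sketch. All per-hop numbers are placeholder assumptions for demonstration only, not measurements from this presentation; the point is simply that the total is the sum of the stages listed above.

```python
# Illustrative latency-budget model for an end-to-end I/O path.
# All numbers are placeholder assumptions for demonstration only,
# not measurements from this presentation.

SCSI_PATH_US = {
    "host_stack": 25.0,   # OS SCSI stack + HBA driver
    "fabric": 10.0,       # transport / switch hops
    "target_port": 15.0,  # target host-port stack
    "controller": 30.0,   # array controller / services
    "media": 90.0,        # SAS/SATA SSD access
}

NVME_OF_PATH_US = {
    "host_stack": 8.0,    # leaner NVMe host stack
    "fabric": 10.0,       # same fabric hops
    "target_port": 5.0,   # streamlined NVMe-oF target port
    "controller": 20.0,   # optimized controller path
    "media": 80.0,        # NVMe SSD access
}

def total_latency_us(path: dict) -> float:
    """Sum the per-hop contributions of an I/O path."""
    return sum(path.values())

if __name__ == "__main__":
    scsi = total_latency_us(SCSI_PATH_US)
    nvmeof = total_latency_us(NVME_OF_PATH_US)
    print(f"SCSI path:    {scsi:.0f} us")
    print(f"NVMe-oF path: {nvmeof:.0f} us")
    print(f"Improvement:  {100 * (scsi - nvmeof) / scsi:.0f}%")
```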
Real World Performance Advantages with NVMe over Fibre Channel (slide reproduced from an earlier webinar)
The Future with NVMe-oF™
NVMe-oF™ will change how storage is consumed:
- Storage fabric impact
- Storage architecture changes
- Application architecture changes
Agenda: Composable Infrastructure – enabled by NVMe-oF™
- Various composable storage options
- Hardware-centric architecture
- TCP vs. RDMA transport options
- Performance results using HW-centric designs
Today's "Shared Nothing" Model, a.k.a. DAS
[Diagram: each server (CPU, memory, etc.) with its own dedicated storage (HDDs -> SSDs)]
Challenges:
- Forces the up-front decision of how much storage to devote to each server
- Locks in the compute:storage ratio
The Composable Datacenter
[Diagram: a pool of CPUs composed against a pool of storage; App A needs 1 SSD, App B needs 2 SSDs, App C needs 3 SSDs, with utilized SSDs plus a spares/expansion pool]
- Minimize dark flash
- Buy storage only as needed
- Power SSDs only as needed
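As a conceptual illustration of the composition model above, the following sketch shows a hypothetical allocator (not a product API) that assigns SSDs from a shared pool to applications on demand and reports how much of the pool remains dark.

```python
# Conceptual sketch of composable storage allocation (hypothetical, not a product API).
# SSDs are drawn from a shared pool only when an application asks for them,
# so unallocated ("dark") flash stays powered off and can be purchased later.

class StoragePool:
    def __init__(self, total_ssds: int):
        self.total_ssds = total_ssds
        self.allocations: dict[str, int] = {}

    def compose(self, app: str, ssds_needed: int) -> None:
        """Attach SSDs from the pool to an application."""
        if ssds_needed > self.free_ssds():
            raise RuntimeError(f"Pool exhausted: {app} wants {ssds_needed}, "
                               f"only {self.free_ssds()} free")
        self.allocations[app] = self.allocations.get(app, 0) + ssds_needed

    def free_ssds(self) -> int:
        return self.total_ssds - sum(self.allocations.values())

pool = StoragePool(total_ssds=8)
pool.compose("App A", 1)
pool.compose("App B", 2)
pool.compose("App C", 3)
print(pool.allocations)                 # {'App A': 1, 'App B': 2, 'App C': 3}
print("spare SSDs:", pool.free_ssds())  # 2 SSDs stay dark until needed
```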
A Spectrum of Options
Myriad choices for NVMe-oF™ / Composable Infrastructure target deployments: storage servers, network processors, and dedicated bridges
[Chart: the three options rated from low to high on versatility, cost, power, and footprint]
Typical NVMe-oF™ Deployment
[Diagram: server(s) connect over the fabric (NVMe™ over Fabrics, e.g. Ethernet) to a JBOF/EBOF/FBOF, where a fabric-to-PCIe bridge fans out to the SSDs over NVMe (over PCIe®)]
NVMe-oF™ Bridge Architecture
[Block diagram: Ethernet MAC with Rx/Tx FIFOs, Rx parser/router, Rx packet buffer and DMA feeding the Transport Engine, RDMA Engine and NVMe Engine; a PCIe® root complex and PCIe abstraction layer; main memory holding data pages, SQs/CQs, SQEs/CQEs and doorbells; an embedded µP running firmware]
- Approximately 150 FSMs running in parallel in hardware
- Some simple, e.g. buffer management; some quite complex: RoCE v1/v2, iWARP/TCP, TCP, NVMe™, NVMe-oF™
- Slow path implemented in firmware
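As a rough conceptual model of the fast-path/slow-path split described above (not the actual bridge design; the command categories and handler descriptions are illustrative assumptions), hardware state machines handle the per-I/O data path while firmware handles infrequent administrative work:

```python
# Conceptual model of a fast-path / slow-path split in an NVMe-oF bridge.
# Illustrative only: command categories and handlers are assumptions,
# not the actual hardware/firmware partitioning of any product.

FAST_PATH_OPCODES = {"read", "write", "flush"}          # per-I/O, handled by HW FSMs
SLOW_PATH_OPCODES = {"identify", "create_queue",        # infrequent, handled by FW
                     "delete_queue", "get_log_page"}

def dispatch(opcode: str) -> str:
    """Route a command capsule to hardware or firmware handling."""
    if opcode in FAST_PATH_OPCODES:
        return "hardware fast path: parse capsule, DMA data, post SQE to SSD"
    if opcode in SLOW_PATH_OPCODES:
        return "firmware slow path: interrupt embedded uP, build response in FW"
    return "firmware slow path: unknown/rare opcode handled conservatively"

for op in ("read", "identify", "write", "vendor_specific"):
    print(f"{op:16s} -> {dispatch(op)}")
```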
Performance Across Various Transport Protocols
Configuration: Ethernet 2x50Gb; PCIe 2x8 Gen3; enclosure: Kazan "K8", 4 SSDs per PCIe port; SSDs: 8x WDC 1.6TB "Gallant Fox"
Transport protocol | 4kB read (M IOPS) | 4kB write (M IOPS) | 128kB read (GB/s) | 128kB write (GB/s)
RoCE v1 | 2.76 | 1.08 | 11.8 | 10.3
RoCE v2 | 2.79 | 1.10 | 11.5 | 9.9
iWARP | 2.77 | 1.17 | – | 9.7
TCP | 2.65 | 1.0 | 11.2 | 9.6
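As a quick sanity check on numbers of this kind (generic arithmetic for illustration, not a re-derivation of the measured results above), large-block throughput can be compared against the raw line rate of the 2x50Gb Ethernet links:

```python
# Back-of-the-envelope check relating IOPS, block size, and link rate.
# Generic arithmetic for illustration; not a substitute for the measured data.

def payload_gbps(iops: float, block_bytes: int) -> float:
    """Payload throughput in GB/s for a given IOPS rate and block size."""
    return iops * block_bytes / 1e9

LINE_RATE_GBPS = 2 * 50e9 / 8 / 1e9   # 2x50Gb Ethernet ~= 12.5 GB/s raw

# 128kB reads at ~90k IOPS correspond to ~11.8 GB/s of payload...
print(f"{payload_gbps(90_000, 128 * 1024):.1f} GB/s of 128kB reads")
# ...and 4kB reads at 2.76M IOPS correspond to ~11.3 GB/s of payload,
# so both ends of the block-size range sit close to the 12.5 GB/s raw link
# rate once protocol overhead is accounted for.
print(f"{payload_gbps(2.76e6, 4 * 1024):.1f} GB/s of 4kB reads")
print(f"{LINE_RATE_GBPS:.1f} GB/s raw line rate")
```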
Single SSD Performance
[Charts: 4kB read, 4kB write, 128kB read and 128kB write performance of a single Optane™ SSD]
Loaded Latency – Optane™
[Chart: latency (µsec) vs. outstanding commands (1 to 256), DAS compared with NVMe-oF™]
Incremental Latency – Optane™ SSD Target
[Chart: incremental latency in the benchmark case vs. the real-world case]
No latency penalty!
Summary / Takeaways
- Lots of options now hitting the market to enable Composable Infrastructure
- Very low-cost, low-power options for JBOFs
- No loss in performance compared to DAS: IOPS, bandwidth, and latency all equivalent
Mellanox NVMe™ SNAP Use Case
Erez Scop, Mellanox Technologies
Storage Disaggregation
Motivation:
- Grow storage and/or compute independently
- No local disks needed (disk-less compute nodes); move local NVMe drives to a centralized location
- Higher performance per node
- Immediate CAPEX saving
- Lower MTBF
Problem:
- Requires software changes
- RDMA software stack
- NVMe-oF™ drivers – limited OS support
- Different management
Solution: NVMe™ SNAP
- Compute nodes see local NVMe drives (emulated in hardware)
- Zero software changes; supported on all OSs
- Latency as a local NVMe drive
- Bandwidth up to the available network (100Gb/s and above)
NVMe™ SNAP
[Diagram: a host server with a local disk (standard NVMe host driver -> PCIe bus -> NVMe controller -> physical local NVMe SSD) compared with a host server with NVMe SNAP (standard NVMe host driver -> PCIe bus -> NVMe SNAP emulated NVMe SSD, with the data served from remote storage, e.g. an NVMe-oF™ JBOF)]
- Emulated NVMe™ PCIe drives
- OS agnostic
- Software defined, hardware accelerated
- Bootable
- NVMe SR-IOV support
Solution – Disaggregation with NVMe™ SNAP
Utilizing NVMe-oF™ latency and throughput for a 'local' feel:
- Scale storage independently
- Scale compute independently
- Save $$ on data center storage
- Improved MTBF
NVMe™ SNAP Internals
[Diagram: host I/O queues and the NVMe admin queue terminate in the NVMe emulation controller on the SmartNIC; the NVMe SNAP SDK sits on top of an SPDK bundle (vBDEV layers such as RAID/LVM; BDEV drivers such as NVMe-oF™ initiator, NVMe over TCP and Linux POSIX AIO), with optional customer-proprietary vBDEV/BDEV modules for proprietary storage protocols]
- NVMe SNAP emulation SDK handles NVMe™ registers and admin
- SPDK advantages: efficient memory management, zero-copy all the way, full polling, multi-queue/multi-thread and lockless, well-defined APIs (bdev, vbdev drivers)
- Customer's proprietary code: BDEV for proprietary storage network protocols; vBDEV for per-IO routing decisions, RAID, etc.
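To illustrate the role of the vBDEV layer described above, here is a small conceptual model. It is illustrative only: real SNAP/SPDK modules are written in C against the SPDK bdev API, and the backend names and striping policy below are assumptions for demonstration.

```python
# Conceptual model of per-IO routing in a vBDEV-style layer.
# Illustrative only: real modules are C code against the SPDK bdev API;
# the backends and routing policy below are assumptions for demonstration.

from dataclasses import dataclass

@dataclass
class IoRequest:
    op: str          # "read" or "write"
    lba: int         # starting logical block address
    num_blocks: int

class VirtualBdev:
    """Routes each I/O to one of several backend bdevs (e.g. striping)."""

    def __init__(self, backends: list[str], stripe_blocks: int = 256):
        self.backends = backends
        self.stripe_blocks = stripe_blocks

    def route(self, io: IoRequest) -> str:
        # Simple striping policy: choose a backend by stripe index.
        stripe = io.lba // self.stripe_blocks
        return self.backends[stripe % len(self.backends)]

vbdev = VirtualBdev(backends=["nvmf_remote_0", "nvmf_remote_1"])
for req in (IoRequest("read", 0, 8), IoRequest("write", 300, 8)):
    print(f"{req.op} @ LBA {req.lba} -> {vbdev.route(req)}")
```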
Use Case driven Fabric Choices – Lessons Learnt Nishant Lodha, Marvell
Making the right "fabric" choice!
It's not "just" about fabric performance:
- Culture and install base
- Use cases
Use Cases by Fabric – no one size fits all!
- NVMe™/RDMA (Ethernet): DAS, HPC, AI/ML – performance at the cost of complexity
- FC-NVMe (Fibre Channel): enterprise applications – leverage existing infrastructure; reliability is key
- NVMe/TCP (Ethernet): all applications – simplicity is key; balance of performance and cost
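As a compact distillation of the positioning above, a selection helper might look like the sketch below. The rules simply restate the slide's guidance and are not a formal recommendation engine; the parameter names are assumptions.

```python
# Minimal sketch of a fabric-selection helper that restates the slide's
# positioning. The rule set and parameter names are illustrative, not an
# official guideline.

def choose_fabric(workload: str, has_fc_san: bool, needs_lowest_latency: bool) -> str:
    if needs_lowest_latency and workload in {"DAS replacement", "HPC", "AI/ML"}:
        return "NVMe/RDMA (Ethernet): performance at the cost of complexity"
    if has_fc_san and workload == "enterprise application":
        return "FC-NVMe (Fibre Channel): leverage existing infrastructure, reliability"
    return "NVMe/TCP (Ethernet): simplicity, balance of performance and cost"

print(choose_fabric("AI/ML", has_fc_san=False, needs_lowest_latency=True))
print(choose_fabric("enterprise application", has_fc_san=True, needs_lowest_latency=False))
print(choose_fabric("general purpose", has_fc_san=False, needs_lowest_latency=False))
```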
New: Hyper-Converged meets Disaggregated
Challenge: Scale storage independent of compute for HCI
Solution: Pool local and remote NVMe™ by deploying an NVMe-oF™-connected EBOF
Benefits:
- Retain the simplicity of HCI management and provisioning
- Deliver storage services in software
- Reduce captive storage in a server
When: Next-generation HCI fabrics are being enabled to consume external EBOFs
New: External Storage for Deep Learning Clusters
- Deep learning architectures require access to low-latency, high-capacity storage: storage pools vs. captive per-node storage
- A 25/100GbE NVMe-oF™ fabric provides excellent scalability
- Delivers a high-performance data platform for deep learning, with performance on par with locally resident datasets
[Diagram: deep learning nodes connected over 100GbE NVMe-oF™ to a shared pool of NVMe storage]
Web 2.0 Use Cases – Average does not cut it!
- Web 2.0 traffic requires "consistent" response times from the underlying storage and networking infrastructure
- Fabric decisions based on "average" NVMe-oF™ latencies are just not "OK"
- Tail latency measurements indicate how well the outliers perform
- I/O cost to the CPU also helps predict latency variance under heavy load
Source: Marvell internal test labs.
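For readers who want to quantify "tail" behavior, a minimal sketch of the usual approach is shown below: generic percentile math on synthetic samples, not Marvell's test methodology or data.

```python
# Minimal sketch of tail-latency reporting: compare the mean with high
# percentiles. Samples here are synthetic; this is generic percentile math,
# not the methodology or data behind the slide.

import random
import statistics

random.seed(0)
# Mostly fast I/Os with occasional slow outliers (microseconds).
samples = [random.gauss(20, 3) for _ in range(9900)] + \
          [random.uniform(200, 500) for _ in range(100)]

def percentile(data, pct):
    ordered = sorted(data)
    index = min(len(ordered) - 1, int(round(pct / 100 * (len(ordered) - 1))))
    return ordered[index]

print(f"mean:  {statistics.mean(samples):6.1f} us")
print(f"p50:   {percentile(samples, 50):6.1f} us")
print(f"p99:   {percentile(samples, 99):6.1f} us")
print(f"p99.9: {percentile(samples, 99.9):6.1f} us")
# The mean and median look healthy, while the high percentiles expose the
# outliers that actually determine user-visible response times.
```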
Questions?