© 2005 Cisco Systems, Inc. All rights reserved.

Slide 1: InfiniBand: Today and Tomorrow
Jamie Riotto, Sr. Director of Engineering, Cisco Systems (formerly Topspin Communications)

Slide 2: Agenda
InfiniBand Today
– State of the market
– Cisco and InfiniBand
– InfiniBand products available now
– Open source initiatives
InfiniBand Tomorrow
– Scaling InfiniBand
– Future issues
Q&A

Slide 3: InfiniBand Maturity Milestones
High adoption rates
– Currently shipping > 10,000 IB ports per quarter
Cisco acquisition will drive broader market adoption
End-to-end price points under $1,000
New cluster scalability proof points
– 1,000 to 4,000 nodes

Slide 4: Cisco Adopts InfiniBand
Cisco acquired Topspin on May 16, 2005
Adds InfiniBand to the switching portfolio
– Network switches, storage switches, now server switches
– Creates an independent business unit to promote InfiniBand and server virtualization
New product line of Server Fabric Switches (SFS)
– SFS 7000 Series InfiniBand server switches
– SFS 3000 Series multifabric server switches

Slide 5: Cisco and InfiniBand: The Server Fabric Switch
[Diagram: a network switch connects clients to network resources (Internet, printers, servers); a storage switch connects servers to SAN storage; a server switch connects servers to both storage and network.]

Slide 6: Cisco HPC Case Studies

Slide 7: Real Deployments Today: Wall Street Bank with 512-Node Grid
[Diagram: core fabric of two 96-port TS switches; edge fabric of TS switches feeding the 512 server nodes; two TS-360s with Ethernet and Fibre Channel gateways bridge the grid I/O to the existing SAN and LAN.]
Fibre Channel and GigE connectivity built seamlessly into the cluster

Slide 8: NCSA (National Center for Supercomputing Applications) Tungsten 2: 520-Node Supercomputer
520 dual-CPU nodes (1,040 CPUs)
[Diagram: core fabric of six 72-port TS switches; edge fabric of TS switches with 18 compute nodes each; uplink cables plus 512 1 m node cables.]
– Parallel MPI codes for commercial clients
– Point-to-point 5.2 µs MPI latency
Deployed: November 2004
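
A point-to-point latency figure like the 5.2 µs above is the kind of number a standard ping-pong microbenchmark reports. A minimal sketch (generic MPI, nothing vendor-specific; run with two ranks, one per node):

    /* ping-pong: one-way latency = round-trip time / 2 */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, i, iters = 10000;
        char buf = 0;                           /* 1-byte message */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("one-way latency: %.2f us\n", (t1 - t0) / iters / 2 * 1e6);
        MPI_Finalize();
        return 0;
    }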

Slide 9: D.E. Shaw Bioinformatics: 1,066-Node Supercomputer
[Diagram: fault-tolerant core fabric of TS switches; edge fabric of TS-120 switches with 12 compute nodes each; 1,068 5 m/7 m/10 m/15 m uplink cables plus 1,066 1 m node cables.]
A 1,066-node, fully non-blocking, fault-tolerant IB cluster

Slide 10: Large Government Lab: World's Largest Commodity Server Cluster (4,096 Nodes)
Application: high-performance supercomputing cluster
Environment:
– 4,096 Dell servers
– 50% blocking ratio
– 8 TS-740s
– 256 TS-120s
Benefits:
– Compelling price/performance
– Largest cluster ever built (by approx. 2x)
– Expected to be the 2nd-largest supercomputer in the world by node count
[Diagram: core fabric of 8 SFS TS-740s; edge fabric of 256 TS-120s; 2,048 uplinks (7 m/10 m/15 m/20 m); an 8,192-processor, 60 TFLOP SuperCluster.]

Slide 11: InfiniBand Products Available Today

Slide 12: InfiniBand Switches and HCAs
Fully non-blocking switch building blocks available in sizes from 24 up to 288 ports
Blade servers offer integrated switches and pass-through modules
HCAs available in PCI-X and PCI-Express
IP and Fibre Channel gateway modules

Slide 13: Integrated InfiniBand for Blade Servers
Integrated 10 Gb/s InfiniBand switches create a unified “wire-once” fabric
Optimize density, cooling, space, and cable management
Option of an integrated InfiniBand switch (e.g., IBM BladeCenter) or a pass-through module (e.g., Dell 1855)
Virtual I/O provides shared Ethernet and Fibre Channel ports across blades and racks
[Diagram: blade chassis with integrated InfiniBand switches; HCAs on the blades; 10 Gb/s links to blades and 30 Gb/s uplinks.]

Slide 14: Ethernet and Fibre Channel Gateways
Unified “wire-once” fabric
Fibre Channel to InfiniBand gateway for storage access
Ethernet to InfiniBand gateway for LAN access
A single InfiniBand link carries both storage and network traffic
[Diagram: server cluster on a server fabric, with gateways out to the SAN and the LAN/WAN.]

Slide 15: InfiniBand Price / Performance

                                  IB (PCI-Express)   10GigE      GigE        Myrinet D   Myrinet E
Data bandwidth (large messages)   950 MB/s           900 MB/s    100 MB/s    245 MB/s    495 MB/s
MPI latency (small messages)      5 µs               50 µs       50 µs       6.5 µs      5.7 µs
HCA cost (street price)           $550               $2K-$5K     Free        $535        $880
Switch port cost                  $250               $2K-$6K     $100-$300   $400        $400
Cable cost (3 m, street price)    $100               $100        $25         $175        $175

Myrinet pricing data from the Myricom web site (Dec 2004); InfiniBand pricing based on Topspin average sales price (Dec 2004); Myrinet, GigE, and IB performance data from the June 2004 OSU study.
Note: latencies are MPI processor-to-processor; switch latency is less.

Slide 16: InfiniBand Cabling
CX4 copper (up to 15 m)
Flexible 30-gauge copper (up to 3 m)
Fiber optics (up to 150 m)

Slide 17: Host Drivers for Standard Protocols
Open-source strategy = reliability at low cost
IPoIB: legacy TCP/IP applications
SDP: reliable socket connections (optional RDMA)
MPI: leading-edge HPC cluster applications (RDMA)
SRP: block storage access (RDMA)
uDAPL: user-level RDMA
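
The point of these drivers is that applications keep their existing interfaces. As an illustration for SDP: an ordinary sockets client only swaps its address family (the AF_INET_SDP constant below follows the Linux/OFED convention of the era and should be treated as an assumption; with a preloaded libsdp, even that change is unnecessary):

    /* Sketch: TCP-style client that runs over SDP instead of TCP/IP */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    #ifndef AF_INET_SDP
    #define AF_INET_SDP 27          /* assumed OFED-era value */
    #endif

    int main(void)
    {
        struct sockaddr_in srv;
        int fd = socket(AF_INET_SDP, SOCK_STREAM, 0); /* only change vs. TCP */
        if (fd < 0) { perror("socket"); return 1; }

        memset(&srv, 0, sizeof srv);
        srv.sin_family = AF_INET;   /* some stacks expect AF_INET_SDP here too */
        srv.sin_port = htons(5000); /* hypothetical service */
        inet_pton(AF_INET, "192.168.0.10", &srv.sin_addr); /* hypothetical peer */

        if (connect(fd, (struct sockaddr *)&srv, sizeof srv) == 0)
            write(fd, "hello", 5);  /* data now moves over IB, not TCP/IP */
        close(fd);
        return 0;
    }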

Slide 18: OS Support
Operating systems available:
– Linux (Red Hat, SuSE, Fedora, Debian, etc.)
– Windows 2000 and 2003
– HP-UX (via HP)
– Solaris (via Sun)

Slide 19: The InfiniBand Driver Architecture
[Diagram: applications use standard interfaces (BSD sockets, file-system and SAN APIs) in user space; in the kernel, TCP/IP runs over IPoIB, SDP bypasses TCP, NFS-RDMA and SRP provide file and SCSI block storage, and uDAPL/DAT sit above the verbs layer; everything converges on the InfiniBand HCA, with Ethernet and Fibre Channel gateways bridging the IB switch to the LAN/WAN and SAN.]

Slide 20: Open Software Initiatives
OpenIB.org
– Topspin is the primary author of major portions, including IPoIB, SDP, SRP, and the TS-API; Cisco will continue to invest
– Current protocol development is nearing production-quality code; release expected by end of year
– The charter has been expanded to include Windows and iWARP
– MPI will be available in the near future (MVAPICH 0.96)
OpenSM
OpenMPI

Slide 21: InfiniBand Tomorrow

Slide 22: Looking into the Future
Cost
Speed
Distance limitations
Cable management
Scalability
IB and Ethernet

Slide 23: Speed: InfiniBand DDR/QDR, 4X/12X
DDR
– Available end of 2005
– Doubles wire speeds to ? (OK, still working on this one)
– PCI-Express DDR
– Distances of 5-10 m using copper
– Distances of 100 m using fiber
QDR
– Available when?
12X (30 Gb/s) available for over one year!
– Not interesting until there is a 12X HCA
– Not interesting until > 16X PCIe
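
For reference, the arithmetic behind these rates, assuming 4X/12X links built from 2.5 Gb/s lanes with the usual 8b/10b encoding:

    \begin{align*}
    \text{signaling rate} &= \text{lanes} \times 2.5~\text{Gb/s} \times m,
        \qquad m = 1~(\text{SDR}),\ 2~(\text{DDR}),\ 4~(\text{QDR}) \\
    \text{data rate} &= \tfrac{8}{10} \times \text{signaling rate}
        \qquad (\text{8b/10b encoding}) \\
    4\text{X DDR}:&\ 4 \times 2.5 \times 2 = 20~\text{Gb/s signaling}
        \;\Rightarrow\; 16~\text{Gb/s data} \\
    12\text{X SDR}:&\ 12 \times 2.5 \times 1 = 30~\text{Gb/s signaling}
        \;\Rightarrow\; 24~\text{Gb/s data}
    \end{align*}

So "doubles wire speeds" works out to 20 Gb/s of signaling (16 Gb/s of data) for a 4X DDR link.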

Slide 24: Future InfiniBand Cables
InfiniBand over CAT5 / CAT6 / CAT7
– Shielded cable distances up to ???
– Leverages existing 10-GigE cabling
– Or is 10-GigE cabling too expensive?

Slide 25: IB Distance Scaling
IB short haul
– New copper drivers
– 25-50 meters (KeyEye)
– ? meters (IEEE 10GE)
IB WAN
– Same subnet over distance (300 km target)
– Buffer / credit / timeout issues
– Applications: disaster recovery, data mirroring
IB long haul
– IB over IP (over SONET?)
– Utilizes existing public plant (WDM, debugging, etc.)
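
The buffer/credit issue is essentially the bandwidth-delay product: IB link-level flow control only transmits against advertised receive credits, so a long link needs enough buffer credit to cover a full round trip. At the 300 km target, for a 10 Gb/s link and roughly 2x10^8 m/s propagation in fiber:

    \text{in-flight data} = B \times \text{RTT}
      = 10~\text{Gb/s} \times \frac{2 \times 3\times10^{5}~\text{m}}{2\times10^{8}~\text{m/s}}
      = 10^{10}~\text{b/s} \times 3\times10^{-3}~\text{s}
      \approx 3.75~\text{MB}

That is megabytes of receive buffering per link, versus the kilobytes typical of switch chips designed for machine-room distances.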

Slide 26: Scaling InfiniBand
Subnet management
Host-side drivers: MPI, IPoIB, SRP
Memory utilization

Slide 27: IB Subnet Manager
Subnets are getting bigger: 4,000 -> 10,000 nodes
– Topology convergence times
– Topology disturbance times
– Topology disturbance minimization

Slide 28: Subnet Management Challenges
Cluster cold-start times
– Template routing
– Persistent routing
Cluster topology change management
– Intentional change: maintenance
– Unintentional change: dealing with faults
– How to impact the minimum number of connections
– Predetermine a fault-reaction strategy?
Topology diagnostic tools
– Link/route verification
– Built-in BERT testing
Partition management

Slide 29: Multiple Routing Models
Minimum-latency routing:
– Load-balanced shortest-path routing
Minimum-contention routing:
– Lowest-interference divergent-path routing
Template-driven routing:
– Supports predetermined routing topologies
– For example: Clos routing, matrix row/column, etc.
– Automatic cabling verification for large installations
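
A sketch of what the first model amounts to, on a hypothetical graph representation (illustrative only, not the actual SFS algorithm): BFS from each destination, then at every switch pick the least-loaded port among those that lie on a shortest path:

    #include <string.h>

    #define MAX_SW   64
    #define MAX_PORT 24

    int nsw;                        /* number of switches               */
    int peer[MAX_SW][MAX_PORT];     /* peer[s][p] = neighbor, -1 = none */
    int dist[MAX_SW];               /* hops to the current destination  */
    int load[MAX_SW][MAX_PORT];     /* routes already using each port   */
    int lft[MAX_SW][MAX_SW];        /* lft[s][dst] = output port        */

    void route_to(int dst)
    {
        int queue[MAX_SW], head = 0, tail = 0, s, p;

        memset(dist, -1, sizeof dist);      /* -1 = unvisited */
        dist[dst] = 0;
        queue[tail++] = dst;
        while (head < tail) {               /* BFS outward from dst */
            s = queue[head++];
            for (p = 0; p < MAX_PORT; p++)
                if (peer[s][p] >= 0 && dist[peer[s][p]] < 0) {
                    dist[peer[s][p]] = dist[s] + 1;
                    queue[tail++] = peer[s][p];
                }
        }
        for (s = 0; s < nsw; s++) {         /* choose output ports */
            int best = -1;
            if (s == dst) continue;
            for (p = 0; p < MAX_PORT; p++)  /* shortest path, then least load */
                if (peer[s][p] >= 0 && dist[peer[s][p]] == dist[s] - 1 &&
                    (best < 0 || load[s][p] < load[s][best]))
                    best = p;
            if (best >= 0) {
                lft[s][dst] = best;
                load[s][best]++;
            }
        }
    }

Minimum-contention routing would differ mainly in the tie-breaker, preferring ports whose downstream paths diverge from routes already placed rather than simply balancing raw port load.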

Slide 30: IB Routing Challenges
Static vs. dynamic routing
– IB implements static routing through linear forwarding tables (LFTs) in each switch chip
– Multi-LID routing enables dynamic routing
Credit loops
Cost-based routing
– Speed mismatches cause store-and-forward (vs. cut-through) switching
– SDR <> DDR <> QDR
– 4X <> 12X
– Short haul <> long haul

Slide 31: Multi-LID Source-Based Routing Support
Applications can implement “dynamic” routing for contention avoidance, failover, and parallel data transfer
[Diagram: leaf switches connected through spine switches, with four alternate paths labeled 1, 2, 3, 4 between a pair of leaves.]
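
A sketch of how an application-side path selector can exploit this (names are illustrative, not a real IB API; real code would plug the chosen LID into its verbs-level address handle): with LMC = 2 the subnet manager assigns each port 2^2 = 4 consecutive LIDs and can route each one through a different spine, so choosing a path is just choosing a LID:

    #include <stdint.h>

    enum { LMC = 2 };               /* 2^LMC = 4 paths, as in the diagram */

    /* LID selecting the nth path to a destination */
    static uint16_t path_lid(uint16_t base_lid, unsigned path)
    {
        return (uint16_t)(base_lid + (path & ((1u << LMC) - 1)));
    }

    /* contention avoidance: spread flows round-robin across the spines */
    static uint16_t lid_for_flow(uint16_t base_lid, unsigned flow_id)
    {
        return path_lid(base_lid, flow_id);
    }

    /* failover: step to the next LID when the current path fails */
    static uint16_t lid_after_fault(uint16_t base_lid, unsigned failed_path)
    {
        return path_lid(base_lid, failed_path + 1);
    }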

Slide 32: New IB Peripherals
CPUs?
Storage
– SAN
– NFS-RDMA
Memory (coherent / non-coherent)
Purpose-built processors?
– Floating-point processors
– Graphics processors
– Pattern-matching hardware
– XML processors

Slide 33: Thank You! Questions & Answers