High Performance Communication for Oracle using InfiniBand
Session id: #36568
Ross Schibler, CTO, Topspin Communications, Inc.
Peter Ogilvie, Principal Member of Technical Staff, Oracle Corporation

Session Topics
• Why the Interest in InfiniBand Clusters
• InfiniBand Technical Primer
• Performance
• Oracle 10g InfiniBand Support
• Implementation details

Why the Interest in InfiniBand
• InfiniBand is a key new feature in Oracle 10g
  – Enhances price/performance and scalability; simplifies systems
• InfiniBand fits the broad movement towards lower costs
  – Horizontal scalability; converged networks; system virtualization... grid
• Initial DB performance & scalability data is superb
  – Network tests done; application-level benchmarks now in progress
• InfiniBand is a widely supported standard, available today
  – Oracle... Dell, HP, IBM, Network Appliance, Sun and ~100 others involved
• Tight alliance between Oracle and Topspin enables IB for 10g
  – Integrated & tested; delivers the complete Oracle "wish list" for high speed interconnects

System Transition Presents Opportunity
[Chart: Server Revenue Mix - share of revenues by price band ($0-2.9K through $3M+), divided into Entry, Mid and High-End segments. Source: IDC Server Tracker, 12/2002]
• Major shift to standard systems - blade impact not even factored in yet
• Customer benefits from scaling horizontally across standard systems
  – Lower up-front costs, granular scalability, high availability

The Near Future
[Chart: Server Revenue Mix by price band, split between Scale-Out (database clusters & grids, enterprise apps, web services) and Scale-Up (legacy & big iron apps)]
• Market splits around Scale-Up vs. Scale-Out
• Database grids provide the foundation for scale-out
• InfiniBand switched computing interconnects are a critical enabler

Traditional RAC Cluster
[Diagram: Application Servers connect to the Oracle RAC database tier over Gigabit Ethernet; the RAC nodes connect to Shared Storage over Fibre Channel]

Three Pain Points
[Diagram: the traditional RAC cluster above, with each pain point marked "OUCH!"]
• Throughput between the application tier and database tier limited by interconnect bandwidth and overhead
• Scalability within the database tier limited by interconnect latency, bandwidth, and overhead
• I/O requirements driven by number of servers instead of application performance requirements

Clustering with Topspin InfiniBand
[Diagram: Application Servers, Oracle RAC nodes and Shared Storage all connected through a Topspin InfiniBand fabric]

Removes all Three Bottlenecks
• InfiniBand provides a 10 Gigabit, low latency interconnect for the cluster
• Central server-to-storage I/O scalability through the InfiniBand switch removes I/O bottlenecks to storage and provides smoother scalability
• The application tier can run over InfiniBand, benefiting from the same high throughput and low latency as the cluster

Example Cluster with Converged I/O
• Ethernet to InfiniBand gateway for LAN access
  – Four Gigabit Ethernet ports per gateway
  – Creates a virtual Ethernet pipe to each server
• Fibre Channel to InfiniBand gateway for storage access
  – Two 2Gbps Fibre Channel ports per gateway
  – Creates a 10Gbps virtual storage pipe to each server
• InfiniBand switches for cluster interconnect
  – Twelve 10Gbps InfiniBand ports per switch card
  – Up to 72 total ports with optional modules
  – Single fat pipe to each server for all network traffic
[Diagram: industry standard servers, network and storage joined through the converged fabric]

Topspin InfiniBand Cluster Solution
• Cluster interconnect with gateways for I/O virtualization
  – Ethernet or Fibre Channel gateway modules
  – Integrated system and subnet management
  – Family of switches
  – Host Channel Adapter with upper layer protocols
• Protocols
  – uDAPL
  – SDP
  – SRP
  – IPoIB
• Platform support
  – Linux: Redhat, Redhat AS, SuSE
  – Solaris: S10
  – Windows: Win2k & 2003
  – Processors: Xeon, Itanium, Opteron

InfiniBand Primer
• InfiniBand is a new technology used to interconnect servers, storage and networks together within the datacenter
• Runs over copper cables (<17m) or fiber optics (<10km)
• Scalable interconnect:
  – 1X = 2.5Gb/s
  – 4X = 10Gb/s
  – 12X = 30Gb/s

InfiniBand Nomenclature
[Diagram: each server's CPU, memory controller and system memory attach to the fabric through an HCA over an IB link; an InfiniBand switch (Topspin 360/90) with a subnet manager (SM) connects many hosts, and TCAs bridge IB links to Ethernet and Fibre Channel links]

InfiniBand Nomenclature
• HCA – Host Channel Adapter
• SM – Subnet Manager
• TCA – Target Channel Adapter

Kernel Bypass Model
[Diagram: in the traditional path, an application's sockets calls pass through the kernel sockets layer, TCP/IP stack and transport driver to reach the hardware; with kernel bypass, the application reaches the hardware directly from user space via uDAPL or async sockets/SDP]
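To make the sockets-direct path concrete, here is a minimal client sketch for SDP on Linux. It assumes the historical OpenFabrics convention of address family 27 (AF_INET_SDP); in practice, unmodified sockets applications were more often pointed at SDP by preloading libsdp. The address and port below are placeholders.

```c
/* Minimal SDP client sketch (assumption: a Linux SDP module exposing
 * address family 27, commonly defined as AF_INET_SDP).  Apart from the
 * address family, this is ordinary BSD sockets code -- which is exactly
 * why SDP is attractive for sockets-based software like Oracle Net. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27              /* conventional value; an assumption here */
#endif

int main(void)
{
    int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket(AF_INET_SDP)"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;      /* some SDP stacks also accept AF_INET_SDP here */
    addr.sin_port   = htons(1521);  /* placeholder listener port */
    inet_pton(AF_INET, "192.168.0.10", &addr.sin_addr);   /* placeholder IP */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    const char msg[] = "hello over SDP";
    write(fd, msg, sizeof(msg));    /* payload bypasses the kernel TCP/IP stack */
    close(fd);
    return 0;
}
```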

NIC Copy on Receive
[Diagram: incoming data crosses the host interconnect into an OS buffer in system memory and is then copied into the application buffer - the data traverses the bus three times]

With RDMA and OS Bypass
[Diagram: the HCA places incoming data directly into the application buffer in system memory - the data traverses the bus once, saving CPU and memory cycles]
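As a sketch of what "the data traverses the bus once" looks like at the API level, the fragment below posts a zero-copy RDMA write with the OpenFabrics verbs library (libibverbs). The helper name and parameters are illustrative: it assumes a queue pair that has already been connected and a remote address/rkey exchanged out of band, and error handling is minimal.

```c
/* Sketch: post a zero-copy RDMA write with libibverbs.
 * Assumes `qp` is already connected, `mr` covers `local_buf`, and the
 * peer's virtual address and rkey were exchanged out of band. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <infiniband/verbs.h>

int rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
               void *local_buf, size_t len,
               uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,     /* registered local buffer */
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* HCA writes peer memory directly */
    wr.send_flags          = IBV_SEND_SIGNALED;   /* generate a completion */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    /* The request is queued to the HCA from user space -- no system call on
     * the data path, and no intermediate OS buffer on either side. */
    int rc = ibv_post_send(qp, &wr, &bad_wr);
    if (rc) {
        fprintf(stderr, "ibv_post_send failed: %d\n", rc);
        return -1;
    }
    return 0;
}
```

The matching completion is reaped from a completion queue; a polling sketch appears with the protocol offload slide below.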

APIs and Performance
[Chart: application throughput for BSD sockets and the async I/O extension over TCP/IP on 1GE, IPoIB, SDP, and uDAPL/RDMA on 10G IB, with rates of 0.8Gb/s, 1.2Gb/s, 3.2Gb/s and 6.4Gb/s shown across the transports]

Why SDP for Oracle Net & uDAPL for RAC?
• RAC IPC
  – Message based
  – Latency sensitive
  – Mixture of previous APIs → use of uDAPL
• Oracle Net
  – Streams based
  – Bandwidth intensive
  – Previously written to sockets → use of the Sockets Direct Protocol API

InfiniBand Cluster Performance Benefits
[Chart: network-level cluster performance for Oracle RAC - block transfers/sec (16KB). Source: Oracle Corporation and Topspin, dual Xeon processor nodes]
• InfiniBand delivers 2-3X higher block transfers/sec as compared to GigE

InfiniBand Application to Database Performance Benefits
[Chart: throughput and CPU utilization (percent). Source: Oracle Corporation and Topspin]
• InfiniBand delivers 30-40% lower CPU utilization and 100% higher throughput as compared to Gigabit Ethernet

Broad Scope of InfiniBand Benefits
[Diagram: benefits span the application servers, network (Ethernet gateway), Oracle RAC and shared storage tiers (FC gateway with host/LUN mapping, DAFS over IB, SAN, NAS)]
• Oracle Net over SDP over IB: 2x improvement in throughput and 45% less CPU
• Intra-RAC IPC over uDAPL over IB: 3-4x improvement in block updates/sec
• Storage and LAN paths: 20% improvement in throughput; 30% improvement in DB performance

Database uDAPL Optimization Timeline
[Timeline spanning the stack: IB HW/FW, uDAPL, skgxp, Cache Fusion, LM, CM, workload]
• Sept 2002: uDAPL functional with 6Gb/s throughput
• Dec 2002: Oracle interconnect performance released, showing improvements in bandwidth (3x), latency (10x) and CPU reduction (3x)
• Jan 2003: added Topspin CM for improved scaling of number of connections and reduced setup times
• Feb 2003: Cache Block Updates show fourfold performance improvement in 4-node RAC
• April-August 2003: gathering OAST and industry standard workload performance metrics; fine tuning and optimization at the skgxp, uDAPL and IB layers

RAC Cluster Communication
• High speed communication is key
  – It must be faster to fetch a block from a remote cache than to read the block from disk
  – Scalability is a function of communication CPU overhead
• Two primary Oracle consumers
  – Lock manager / Oracle buffer cache
  – Inter-instance parallel query communication
• SKGXP: Oracle's IPC driver interface
  – Oracle is coded to skgxp
  – skgxp is coded to vendor high performance interfaces
  – IB support delivered as a shared library, libskgxp10.so (see the loading sketch below)
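Because Oracle picks up the vendor IPC driver as a shared library, the plug-in mechanism itself is ordinary dynamic linking. The sketch below is hypothetical: `skgxp_init` is an invented symbol used purely to illustrate how a library such as libskgxp10.so could be located and bound at runtime; the real skgxp entry points are Oracle-internal and not documented here.

```c
/* Hypothetical illustration of loading a vendor IPC library at runtime.
 * "skgxp_init" is an invented symbol; the real libskgxp10.so interface is
 * Oracle-internal.  The point is only the shared-library plug-in pattern. */
#include <stdio.h>
#include <dlfcn.h>

typedef int (*skgxp_init_fn)(void);   /* hypothetical entry point type */

int main(void)
{
    void *lib = dlopen("libskgxp10.so", RTLD_NOW | RTLD_LOCAL);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Resolve the (hypothetical) initialization entry point. */
    skgxp_init_fn init = (skgxp_init_fn)dlsym(lib, "skgxp_init");
    if (!init) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(lib);
        return 1;
    }

    int rc = init();   /* vendor library sets up its high performance transport */
    printf("skgxp_init returned %d\n", rc);

    dlclose(lib);
    return 0;
}
```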

Cache Fusion Communication
[Diagram: a shadow process serving a client sends a lock request to the LMS process on the remote instance; the requested cache block is returned directly via RDMA]

Parallel Query Communication
[Diagram: PX servers exchange messages and stream data between instances and back to the client]

Cluster Interconnect Wish List
• OS bypass (user mode communication)
• Protocol offload
• Efficient asynchronous communication model
• RDMA with high bandwidth and low latency
• Huge memory registrations for Oracle buffer caches
• Support for a large number of processes in an instance
• Commodity hardware
• Software interfaces based on open standards
• Cross platform availability
InfiniBand is the first interconnect to meet all of these requirements

Asynchronous Communication
• Benefits
  – Reduces the impact of latency
  – Improves robustness by avoiding communication deadlock
  – Increases bandwidth utilization
• Drawback
  – Historically costly, as synchronous operations are broken into separate submit and reap operations

Protocol Offload & OS Bypass
• Bypass makes submit cheap
  – Requests are queued directly to hardware from Oracle
• Offload
  – Completions move from the hardware to Oracle's memory
  – Oracle can overlap communication and computation without a trap to the OS or a context switch (see the polling sketch below)
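As an illustration of the submit/reap split with no OS involvement on the reap side, the fragment below polls a verbs completion queue directly from user space. It is a sketch under the assumption that `cq` was created earlier with ibv_create_cq and that signaled work requests were posted on queue pairs bound to it; the helper name is illustrative.

```c
/* Sketch: reap completions by polling a CQ from user space (libibverbs).
 * Assumes `cq` was created with ibv_create_cq and that signaled work
 * requests were previously posted on QPs bound to this CQ. */
#include <stdio.h>
#include <infiniband/verbs.h>

int reap_completions(struct ibv_cq *cq, int expected)
{
    struct ibv_wc wc[16];
    int done = 0;

    while (done < expected) {
        /* ibv_poll_cq reads completion entries the HCA has already written
         * into host memory -- no system call, no context switch. */
        int n = ibv_poll_cq(cq, 16, wc);
        if (n < 0) {
            fprintf(stderr, "ibv_poll_cq failed\n");
            return -1;
        }
        for (int i = 0; i < n; i++) {
            if (wc[i].status != IBV_WC_SUCCESS) {
                fprintf(stderr, "work request %llu failed: %s\n",
                        (unsigned long long)wc[i].wr_id,
                        ibv_wc_status_str(wc[i].status));
                return -1;
            }
        }
        done += n;
        /* A real server would overlap useful work here instead of spinning,
         * or arm the CQ for events when idle. */
    }
    return done;
}
```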

InfiniBand Benefits by Stress Area
• Cluster Network: extremely low latency; 10 Gig throughput
• Compute: CPU & kernel offload removes TCP overhead; frees CPU cycles
• Server I/O: single converged 10 Gig network for cluster, storage and LAN; central I/O scalability
Stress level varies over time with each query - InfiniBand provides substantial benefits in all three areas

Benefits for Different Workloads
• High bandwidth and low latency benefits for Decision Support (DSS)
  – Should enable serious DSS workloads on RAC clusters
• Low latency benefits for scaling Online Transaction Processing (OLTP)
• Our estimate: one IB link replaces 6-8 Gigabit Ethernet links

Commodity Hardware
• Higher capabilities and lower cost than proprietary interconnects
• InfiniBand's large bandwidth capability means that a single link can replace multiple GigE and FC interconnects

Memory Requirements
• The Oracle buffer cache can consume 80% of a host's physical memory
• 64-bit addressing and decreasing memory prices mean ever larger buffer caches
• InfiniBand provides...
  – Zero copy RDMA between very large buffer caches
  – Large shared registrations move memory registration out of the performance path (see the registration sketch below)
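A minimal sketch of registering one large buffer up front with libibverbs, so the data path never pays registration cost again. The device choice and buffer size are placeholders; a real buffer cache would come from Oracle's shared memory rather than malloc.

```c
/* Sketch: register a large buffer once, at startup, with libibverbs.
 * Device selection and size are placeholders for illustration only. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no IB devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    size_t cache_size = 1UL << 30;      /* 1 GiB placeholder "buffer cache" */
    void *cache = malloc(cache_size);

    /* One registration covers the whole region; RDMA reads and writes can
     * then target any block in it with no per-I/O registration cost. */
    struct ibv_mr *mr = ibv_reg_mr(pd, cache, cache_size,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { perror("ibv_reg_mr"); return 1; }

    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n",
           cache_size, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(cache);
    return 0;
}
```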

Two Efforts Coming Together: RAC/Cache Fusion and Oracle Net
• Two Oracle engineering teams working at the cluster and application tiers
  – 10g incorporates both efforts
• Oracle Net benefits from many of the same capabilities as Cache Fusion
  – OS kernel bypass
  – CPU offload
  – New transport protocol (SDP) support
  – Efficient asynchronous communication model
  – RDMA with high bandwidth and low latency
  – Commodity hardware
• Working on external and internal deployments

Open Standard Software APIs: uDAPL and Async Sockets/SDP
• Each new communication driver is a large investment for Oracle
• One stack which works across multiple platforms means improved robustness
• Oracle grows closer to the interfaces over time
• Ready today for emerging technologies
• Ubiquity and robustness of IP for high speed communication

Summary
• Oracle and major system & storage vendors are supporting InfiniBand
• InfiniBand presents a superb opportunity for enhanced horizontal scalability and lower cost
• Oracle Net's InfiniBand support significantly improves performance for both the app server and the database in Oracle 10g
• InfiniBand provides the performance to move applications to low cost Linux RAC databases

QUESTIONS & ANSWERS

Next Steps...
• See InfiniBand demos first hand on the show floor
  – Dell, Intel, Netapp, Sun, Topspin (booth #620)
  – Includes clustering, app tier and storage over InfiniBand
• InfiniBand whitepapers on both Oracle and Topspin websites