Reliable Datagram Sockets and InfiniBand
Hanan Hit, NoCOUG Staff, 2010

Agenda
– InfiniBand basics
– What is RDS (Reliable Datagram Sockets)?
– Advantages of RDS over InfiniBand
– Architecture overview
– TPC-H over 11g benchmark
– InfiniBand vs. 10GE

Value Proposition: Oracle Database RAC
Oracle Database Real Application Clusters (RAC) provides the ability to build an application platform from multiple systems clustered together.
Benefits
– Performance: increase the performance of a RAC database by adding additional servers to the cluster
– Fault tolerance: a RAC database is constructed from multiple instances; loss of an instance does not bring down the entire database
– Scalability: scale a RAC database by adding instances to the cluster database

Some Facts
– High-end database applications in the OLTP category are in a size range of … TB with 2-10k IOPS. High-end DW applications fall into a range of … TB, with an I/O bandwidth requirement of around 4-8 GB per second.
– Two-socket x86_64 servers seem to offer the best price point at present. Their major limitations are the limited number of slots available for external I/O cards and the CPU cost of processing I/O through conventional kernel-based I/O mechanisms.
– The main challenge in building a clustered database that runs on multiple servers is providing low-cost, balanced I/O bandwidth. Conventional Fibre Channel storage arrays, with their expensive plumbing, do not scale well enough to create the balance under which these database servers could be optimally utilized.

IBA / Reliable Datagram Sockets (RDS) Protocol
What is IBA?
InfiniBand Architecture (IBA) is an industry-standard, channel-based, switched-fabric, high-speed interconnect architecture with low latency and high throughput. The InfiniBand architecture specification defines a connection between processor nodes and high-performance I/O nodes such as storage devices.
What is RDS?
– A low-overhead, low-latency, high-bandwidth, ultra-reliable, supportable Inter-Process Communication (IPC) protocol and transport system
– Matches Oracle's existing IPC models for RAC communication
– Optimized for transfers from 200 bytes to 8 MB
– Based on the sockets API
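As a minimal sketch of what "based on the sockets API" means in practice, the snippet below opens a Linux RDS socket and sends one reliable datagram. It assumes a kernel with the RDS modules loaded (e.g. via OFED); the IP addresses and port number are placeholders, and the AF_RDS fallback define is only needed when the C library headers predate RDS.

/* rds_send.c - minimal RDS send sketch (placeholder addresses/port). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#ifndef AF_RDS
#define AF_RDS 21   /* value from the kernel's include/linux/socket.h */
#endif

int main(void)
{
    /* RDS is datagram-oriented but reliable, hence SOCK_SEQPACKET. */
    int fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
    if (fd < 0) { perror("socket(AF_RDS)"); return 1; }

    /* Bind to a local IP/port; the IP selects the interface (and hence
     * the HCA, when the address belongs to an IPoIB interface). */
    struct sockaddr_in local = { 0 };
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = inet_addr("192.168.10.1");  /* placeholder */
    local.sin_port = htons(18634);                      /* placeholder */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        perror("bind"); close(fd); return 1;
    }

    /* The destination is named per datagram; no connect() is required,
     * yet delivery between the two endpoints is reliable and in order. */
    struct sockaddr_in peer = { 0 };
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = inet_addr("192.168.10.2");   /* placeholder */
    peer.sin_port = htons(18634);

    const char msg[] = "hello over RDS";
    ssize_t n = sendto(fd, msg, sizeof(msg), 0,
                       (struct sockaddr *)&peer, sizeof(peer));
    if (n < 0) perror("sendto");
    close(fd);
    return n < 0;
}

A single RDS socket can address many peers this way, which is part of why it maps well onto Oracle's existing RAC IPC model.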

Reliable Datagram Sockets (RDS) Protocol
RDS leverages InfiniBand's built-in high-availability and load-balancing features:
– Port failover on the same HCA
– HCA failover on the same system
– Automatic load balancing
Open source on OpenFabrics / OFED

Advantages of RDS over InfiniBand
Lowering data center TCO requires efficient fabrics. Oracle RAC 11g will scale for database-intensive applications only with the proper high-speed protocol and an efficient interconnect.
RDS over 10GE
– 10 Gbps is not enough to feed multi-core server I/O needs; each core may require > 3 Gbps
– Packets can be lost and require retransmission
– Statistics are not an accurate indication of throughput; efficiency is much lower than reported
RDS over InfiniBand
– Network efficiency is always 100%
– 40 Gbps today
– Uses InfiniBand delivery capabilities that offload end-to-end checking to the InfiniBand fabric
– Integrated in the Linux kernel
– More tools will be ported to support RDS (e.g. netstat)
– Shows a significant real-world application performance boost: decision support systems, mixed batch/OLTP workloads

InfiniBand Considerations
Why does Oracle use InfiniBand?
– High bandwidth: 1x SDR = 2.5 Gbps, 1x DDR = 5.0 Gbps, 1x QDR = 10.0 Gbps; the V2 DB machine uses 4x QDR links (40 Gbps in each direction, simultaneously)
– Low latency: a few µs end-to-end, about 160 ns per switch hop
– RDMA capable: Exadata cells receive/send large transfers using RDMA, saving CPU for other operations
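The 40 Gbps figure follows from the per-lane signaling rate: 4 lanes × 10 Gbps (QDR) = 40 Gbps per direction. As an aside not on the slide, QDR links use 8b/10b encoding, so the usable data rate of a 4x QDR link is roughly 40 Gbps × 8/10 = 32 Gbps per direction.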

Architecture Overview
[Architecture diagram]

#1 Price/Performance: TPC-H over 11g Benchmark
11g over DDR
– Servers: 64 x ProLiant BL460c, each with 2 x Intel Xeon X5450 (quad-core) CPUs
– Fabric: Mellanox DDR InfiniBand
– Storage: native InfiniBand storage, 6 x HP Oracle Exadata
World-record clustered TPC-H performance and price/performance
[Chart: Price / DB, 11g over 1GE vs. 11g over DDR, % TCO saving]

POC Hardware Configuration
Application servers
– 2x HP BL480c, 2 processors / 8 cores, X GHz, 64 GB RAM, 4x 72 GB 15K drives, NIC: HP NC373i 1Gb
Concurrent Manager servers
– 6x HP BL480c, 2 processors / 8 cores, X GHz, 64 GB RAM, 4x 72 GB 15K drives, NIC: HP NC373i 1Gb
Database servers
– 6x HP DL580 G5, 4 processors / 24 cores, X GHz, 256 GB RAM, 8x 72 GB 15K drives, NIC: Intel 10GbE XF SR 2-port PCIe, interconnect: Mellanox 4x PCIe InfiniBand
Storage array
– HP XP, … GB cache / 20 GB shared memory, 60 array groups of 4 spindles (240 spindles total), 146 GB 15K Fibre Channel disk drives
Networks: 1GbE, 10GbE, InfiniBand, 4Gb Fibre Channel

CPU Utilization
InfiniBand maximizes CPU efficiency, enabling more than 20% higher efficiency than 10GE.
[Chart: CPU utilization, InfiniBand interconnect vs. 10GigE interconnect]

Disk IO Rate
InfiniBand maximizes disk utilization, delivering 46% higher IO traffic than 10GE.
[Chart: disk IO rate, InfiniBand interconnect vs. 10GigE interconnect]

InfiniBand delivers 63% more TPS vs. 10GE
TPS rates for the invoice-load use case:

InfiniBand interconnect
  #  Activity                     Start          End            Duration  Records    TPS
  1  Invoice Load - Load File     6/17/09 7:48   6/17/09 7:54   0:06:01   9,899,635  27,…
  2  Invoice Load - Auto Invoice  6/17/09 8:00   6/17/09 9:54   1:54:21   9,899,635  1,…
  3  Invoice Load - Total         N/A            N/A            2:00:22   9,899,635  1,…

10GigE interconnect
  #  Activity                     Start          End            Duration  Records    TPS
  1  Invoice Load - Load File     6/25/09 17:15  6/25/09 17:20  0:05:21   7,196,171  22,…
  2  Invoice Load - Auto Invoice  6/25/09 18:22  6/25/09 20:39  2:17:05   7,196,…    …
  3  Invoice Load - Total         N/A            N/A            2:22:26   7,196,…    …

Workload
– Nodes 1 through 4: batch processing
– Node 5: extra node, not used
– Node 6: EBS other activity
Database size: 2 TB (ASM: … GB)
InfiniBand needs only 6 servers vs. the 10 servers needed by 10GE

Sun Oracle Database Machine
Clustering is the architecture of the future: highest performance, lowest cost, redundant, incrementally scalable.
The Sun Oracle Database Machine, based on 40Gb/s InfiniBand, delivers a complete clustering architecture for all data management needs.

Sun Oracle Database Server Hardware
– 8 Sun Fire X4170 database servers per rack
– 8 CPU cores and 72 GB memory per server
– Dual-port 40Gb/s InfiniBand card
– Fully redundant power and cooling

Exadata Storage Server Hardware
Building block of the massively parallel Exadata Storage Grid
– Up to 1.5 GB/sec raw data bandwidth per cell
– Up to 75,000 IOPS with Flash
Sun Fire X4275 Server
– 2 quad-core Intel Xeon E5540 processors
– 24 GB RAM
– Dual-port 4X QDR (40Gb/s) InfiniBand card
– Disk options: 12 x 600 GB SAS disks (7.2 TB total) or 12 x 2 TB SATA disks (24 TB total)
– 4 x 96 GB Sun Flash PCIe cards (384 GB total)
Software pre-installed
– Oracle Exadata Storage Server Software
– Oracle Enterprise Linux
– Drivers, utilities
Single point of support from Oracle
– 3-year, 24 x 7, 4-hour on-site response

Mellanox 40Gbps InfiniBand Networking
– Sun Datacenter InfiniBand Switch: 36 QSFP ports
– Fully redundant, non-blocking IO paths from servers to storage
– 2.88 Tb/sec bisectional bandwidth per switch
– 40Gb/s QDR, dual ports per server
– Highest bandwidth and lowest latency
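As a sanity check on the 2.88 Tb/sec figure (assuming it counts full-duplex traffic on all ports): 36 ports × 40 Gb/s per direction × 2 directions = 2,880 Gb/s ≈ 2.88 Tb/s.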

DB machine protocol stack
[Stack diagram: InfiniBand HCA at the bottom; IPoIB carries TCP/UDP traffic such as SQL*Net and CSS; RDS carries Oracle IPC traffic (iDB, RAC)]
RDS provides
– Zero loss
– Zero copy (ZDP)

What's new in V2
V1 DB machine
– 2 managed, 2 unmanaged switches
– 24-port DDR switches
– 15-second min. SM failover timeout
– CX4 connectors
– SNMP monitoring available
– Cell HCA in x4 PCIe slot
V2 DB machine
– 3 managed switches
– 36-port QDR switches
– 5-second min. SM failover timeout
– QSFP connectors
– SNMP monitoring coming soon
– Cell HCA in x8 PCIe slot

InfiniBand Monitoring
– SNMP alerts on Sun IB switches are coming
– EM support for the IB fabric is coming; a Voltaire EM plugin is available (at an extra cost)
In the meantime, customers can and should monitor using
– IB commands from the host
– The switch CLI, to monitor various switch components
Self-monitoring exists
– Exadata cell software monitors its own IB ports
– The bonding driver monitors local port failures
– The SM monitors all port failures on the fabric

Scale Performance and Capacity
Scalable
– Scales to an 8-rack database machine just by adding wires; more with external InfiniBand switches
– Scales to hundreds of storage servers (multi-petabyte databases)
Redundant and fault tolerant
– Failure of any component is tolerated
– Data is mirrored across storage servers

Competitive Advantage
"…everybody is using Ethernet, we are using InfiniBand, 40Gb/s InfiniBand"
Larry Ellison, keynote at Oracle OpenWorld introducing Exadata-2 (the Sun Oracle DB machine), October 14, 2009, San Francisco