Experiences with VI Communication for Database Storage. Yuanyuan Zhou, Angelos Bilas, Suresh Jagannathan, Cezary Dubnicki, James F. Philbin, Kai Li.


Presentation Outline
- Motivation: use VI-based interconnects to improve I/O path performance between a database server and the storage subsystem.
- Approach:
  - Design a block-level storage architecture: V3 (VI-attached Volume Vault).
  - Implement a software layer between the application and VI: DSA (Direct Storage Access).
  - Measure performance using Microsoft SQL Server 2000 on a 4-CPU and a 32-CPU database server.

V3 Architecture
- V3 client:
  - Small-scale uniprocessor
  - SMP system
  - Large-scale multiprocessor server
- V3 server:
  - V3 volume
  - A volume provides more than 2 TB of storage
  - Volumes span multiple nodes through combinations of RAID
  - More than 250 TB of storage in the V3 back-end
- VI interconnect
- Customer software

DSA Implementations
- Block-level I/O module specification layer between the application and VI.
- Takes advantage of VI-specific features:
  - RDMA
  - Minimal kernel involvement
  - Minimal data copying
  - A large number of overlapping I/O requests

DSA Implementation
- Provides new features:
  - Flow control, retransmission, reconnection
  - Optimizations: memory registration and deregistration, interrupt handling, lock synchronization
- One kernel-level implementation
- Two user-level implementations

Kernel-level DSA Implementation
- Implemented on top of a kernel-level version of the VI specification.
- Provides the standard I/O interface of a storage driver.
- Supports user-level and kernel-level applications without modification.
- Leverages the benefits of VI in kernel-level storage APIs.

User-level DSA Implementation
- Replaces all I/O calls to V3 storage.
- Supports the standard Windows I/O interface.
- Issues I/O requests without kernel involvement.
- Needs kernel involvement for I/O completion: an interrupt notifies wDSA to complete the corresponding I/O requests.

User-level DSA Implementation
- A new I/O API.
- Minimizes kernel involvement, context switches, and interrupts.
- Trades off transparency for performance; the application may need to be modified.
- Application-controlled I/O completion mode: either polling or interrupt.

System Optimization for DSA: VI registration and deregistration
- Pre-registering all I/O buffers at startup is not feasible in database systems.
- Registered memory is managed dynamically:
  - kDSA: Windows I/O manager
  - cDSA: Address Windowing Extensions
- Batched deregistration
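The slide does not say how registered memory is tracked, so the following is only a minimal sketch of one way dynamic registration with batched deregistration could work. It assumes an LRU policy, and `register_fn` / `deregister_fn` are hypothetical stand-ins for the interconnect's registration calls, not a real VI API.

```python
from collections import OrderedDict


class RegistrationCache:
    """Toy model of dynamically managed registered memory.

    register_fn and deregister_fn are hypothetical callbacks that stand in
    for the NIC's memory registration calls; they are not a real VI API.
    """

    def __init__(self, register_fn, deregister_fn, capacity, batch_size):
        self.register_fn = register_fn
        self.deregister_fn = deregister_fn
        self.capacity = capacity          # max buffers kept registered
        self.batch_size = batch_size      # handles released per batch
        self.registered = OrderedDict()   # buffer id -> handle, in LRU order
        self.pending = []                 # evicted handles awaiting release

    def acquire(self, buf_id):
        """Return a handle for buf_id, registering it only on a miss."""
        if buf_id in self.registered:
            self.registered.move_to_end(buf_id)  # mark as recently used
            return self.registered[buf_id]
        handle = self.register_fn(buf_id)
        self.registered[buf_id] = handle
        if len(self.registered) > self.capacity:
            # Evict the least recently used buffer, but defer the costly
            # deregistration until a full batch has accumulated.
            _, victim = self.registered.popitem(last=False)
            self.pending.append(victim)
            if len(self.pending) >= self.batch_size:
                self.flush()
        return handle

    def flush(self):
        """Deregister every pending handle in a single batched call."""
        batch, self.pending = self.pending, []
        self.deregister_fn(batch)
```

Deferring the release amortizes the per-call cost of deregistration over several buffers, at the price of holding some registrations slightly longer than strictly needed.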

System Optimization: Interrupts
- An interrupt notifies the database of a completed asynchronous I/O request.
- Interrupts are batched to reduce their high cost:
  - kDSA: when the number of I/O requests exceeds a threshold, use polling.
  - cDSA: a completion flag is set by RDMA. The flag is polled for a time interval, after which cDSA switches to waiting for an interrupt.
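The cDSA completion strategy (poll a flag for a while, then fall back to an interrupt) can be illustrated with a small sketch. This is not the authors' code: a one-element list stands in for the RDMA-written completion flag, a `threading.Event` stands in for the interrupt, and the function name and return values are invented for illustration.

```python
import threading
import time


def wait_for_completion(flag, interrupt, poll_interval_s):
    """Poll flag for poll_interval_s seconds, then block on interrupt.

    flag      : one-element list standing in for the RDMA-written flag
    interrupt : threading.Event standing in for the completion interrupt
    Returns "polled" or "interrupt" to show which path saw the completion.
    """
    deadline = time.monotonic() + poll_interval_s
    while time.monotonic() < deadline:
        if flag[0]:
            return "polled"       # completion seen without any interrupt
    interrupt.wait()              # polling window expired: sleep until woken
    return "interrupt"
```

A short polling window catches fast completions cheaply, while the fallback keeps the CPU from spinning on slow ones.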

System Optimization: Lock Synchronization
- Lock synchronization accounts for a significant percentage of the cost on the I/O path.
- Reduce the number of lock/unlock operations; cDSA has more control over the lock synchronization pairs.
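The slide does not describe how lock/unlock pairs are reduced. One common way, shown here purely as an assumed illustration, is to submit several requests under a single lock acquisition instead of one acquisition per request. `CountingLock`, `enqueue_each`, and `enqueue_batch` are hypothetical names.

```python
import threading


class CountingLock:
    """threading.Lock wrapper that counts how often it is acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


def enqueue_each(queue, lock, requests):
    """One lock/unlock pair per request."""
    for request in requests:
        with lock:
            queue.append(request)


def enqueue_batch(queue, lock, requests):
    """One lock/unlock pair for the whole batch."""
    with lock:
        queue.extend(requests)
```

Both functions leave the queue in the same state; only the number of lock/unlock pairs differs.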

Experiment Platform
- A mid-size, 4-CPU SMP
- A large, 32-CPU SMP
- Giganet network:
  - Maximum end-to-end user-level bandwidth is about 110 MB/s.
  - One-way latency for a 64-byte message is about 7 us.
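As a rough back-of-the-envelope use of these two figures, a first-order model adds the fixed one-way latency to the serialization time at peak bandwidth. The model itself, the function name, and the 8 KB block size in the usage note are assumptions, not numbers from the slides.

```python
def estimated_transfer_us(size_bytes, bandwidth_mb_s=110.0, latency_us=7.0):
    """First-order estimate: fixed latency plus size / bandwidth.

    Defaults are the Giganet figures quoted above. With MB = 10**6 bytes,
    1 MB/s equals 1 byte per microsecond, so no unit conversion is needed.
    """
    bytes_per_us = bandwidth_mb_s
    return latency_us + size_bytes / bytes_per_us
```

Under this model, `estimated_transfer_us(8192)` comes to roughly 81 us for an assumed 8 KB block, most of it serialization time rather than latency.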

Micro-benchmark Results: DSA Overhead
- V3 configuration:
  - A single application client runs the micro-benchmark.
  - A single storage node presents the virtual disk.
- Raw VI:
  - A locally attached disk, without any V3 software.
- Reading a data block from the storage server.

DSA Overhead
- Round-trip delay for a read I/O request.
- cDSA has the lowest CPU overhead.

V3 Cached-block Performance

V3 vs. Local Case

On-line Transaction Processing Benchmark: TPC-C
- Commercial databases issue multiple concurrent I/Os and tolerate high I/O response times and low-throughput disks.
- Large database configuration
- Mid-size database configuration

Large Database Configuration

Mid-size Database Configuration

Conclusion
- cDSA provides an 18% transaction rate improvement over FC for the large database configuration.
- Effective use of VI in I/O-intensive environments requires substantial enhancements to flow control, reconnection, interrupt handling, memory registration, and lock synchronization.
- New storage APIs that help minimize kernel involvement in the I/O path are needed to fully exploit the benefits of VI-based communication.