CUDA Interoperability with Graphical Environments

Martin Kruliš (v1.1), 05.01.2017

Graphics Interoperability
- Allows CUDA code to read and write graphics buffers
- Works with the OpenGL and Direct3D libraries
Motivation
- Direct visualization of complex simulations
- Augmenting 3D rendering with visualization routines that are difficult to implement in shaders
How it works
- The graphics resource is registered and represented by struct cudaGraphicsResource
- The resource may then be mapped into the CUDA memory space: cudaGraphicsMapResources(), …
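The register, map, get-pointer, unmap cycle can be sketched as follows, with OpenGL as the graphics API. This is a minimal sketch that rests on several assumptions:
- The platform is Linux, and GLFW 3.2+ provides a hidden window and GL context.
- The GL 1.5 buffer functions are exposed through GL_GLEXT_PROTOTYPES rather than a loader library.
- The default CUDA device is the one driving that context.

The names interopBufferDemo and fillRamp are hypothetical.

```cuda
// Assumed build (Linux): nvcc interop_buffer.cu -lglfw -lGL
#define GL_GLEXT_PROTOTYPES  // declare GL 1.5 buffer functions without a loader
#define GLFW_INCLUDE_GLEXT   // let GLFW include <GL/glext.h> after <GL/gl.h>
#include <GLFW/glfw3.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <cassert>
#include <vector>

// Hypothetical kernel: writes i into element i of the mapped buffer.
__global__ void fillRamp(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = static_cast<float>(i);
}

// Hypothetical helper: CUDA fills a GL buffer object, then GL reads it back.
bool interopBufferDemo(int n, std::vector<float>& result) {
    assert(n > 0);
    if (!glfwInit()) return false;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);  // hidden window: only the context is needed
    GLFWwindow* win = glfwCreateWindow(64, 64, "interop", nullptr, nullptr);
    if (!win) { glfwTerminate(); return false; }
    glfwMakeContextCurrent(win);

    const GLsizeiptr bytes = n * static_cast<GLsizeiptr>(sizeof(float));
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, bytes, nullptr, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);

    bool ok = false;
    cudaGraphicsResource* res = nullptr;
    // 1. Register the GL buffer (WriteDiscard: CUDA overwrites the whole buffer).
    if (cudaGraphicsGLRegisterBuffer(&res, vbo, cudaGraphicsRegisterFlagsWriteDiscard) == cudaSuccess) {
        // 2. Map it; GL must not use the buffer while CUDA has it mapped.
        if (cudaGraphicsMapResources(1, &res, 0) == cudaSuccess) {
            float* dptr = nullptr;
            size_t mappedBytes = 0;
            // 3. Get a device pointer, usable like memory from cudaMalloc().
            if (cudaGraphicsResourceGetMappedPointer(reinterpret_cast<void**>(&dptr),
                                                     &mappedBytes, res) == cudaSuccess) {
                fillRamp<<<(n + 255) / 256, 256>>>(dptr, n);
                ok = cudaGetLastError() == cudaSuccess;
            }
            // 4. Unmap: CUDA work issued before this completes before later GL work starts.
            ok = cudaGraphicsUnmapResources(1, &res, 0) == cudaSuccess && ok;
        }
        cudaGraphicsUnregisterResource(res);
    }

    if (ok) {  // read the buffer back through GL to confirm CUDA's writes landed
        result.resize(n);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glGetBufferSubData(GL_ARRAY_BUFFER, 0, bytes, result.data());
        glBindBuffer(GL_ARRAY_BUFFER, 0);
    }
    glDeleteBuffers(1, &vbo);
    glfwDestroyWindow(win);
    glfwTerminate();
    return ok;
}
```

Registration is comparatively expensive. A real renderer would therefore register the buffer once and repeat only steps 2 to 4 every frame.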

OpenGL
Initialization
- The device used to be selected by cudaGLSetGLDevice(). This call has been deprecated since CUDA 5.0 and is no longer necessary.
Resources
- cudaGraphicsGLRegisterBuffer() for buffers
  - The mapped buffers can be accessed in the same way as memory allocated by CUDA
- cudaGraphicsGLRegisterImage() for images and render buffers
  - The image buffers can also be accessed through the texture and surface mechanisms
Examples
- Code example in the CUDA Programming Guide (pages 54-56)
- CUDA Samples:
  - 2_Graphics\simpleGL: the same example as in the Guide (vertex buffer filling)
  - 2_Graphics\Mandelbrot: fractal rendering
  - 3_Imaging\postProcessGL: render buffer post-processing (i.e., reading and writing)
  - 5_Simulations\fluidsGL
  - 5_Simulations\nbody
  - 5_Simulations\particles
  - 5_Simulations\smokeParticles
  - 5_Simulations\oceanFFT
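A registered image maps to a CUDA array rather than to a linear pointer. A kernel therefore reaches it through a surface (or texture) object. A minimal sketch follows, assuming:
- Linux, with GLFW 3.2+ for a hidden window and context.
- A GL_RGBA8 texture.
- The default CUDA device drives that context.

The names interopTextureDemo and writeGradient are hypothetical.

```cuda
// Assumed build (Linux): nvcc interop_texture.cu -lglfw -lGL
#include <GLFW/glfw3.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <cassert>
#include <vector>

// Hypothetical kernel: texel (x, y) becomes RGBA (x, y, 0, 255).
__global__ void writeGradient(cudaSurfaceObject_t surf, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)  // surface x coordinates are given in bytes, not elements
        surf2Dwrite(make_uchar4(x, y, 0, 255), surf, x * sizeof(uchar4), y);
}

// Hypothetical helper: CUDA writes into a GL texture, then GL reads it back.
bool interopTextureDemo(int w, int h, std::vector<unsigned char>& rgba) {
    assert(w > 0 && h > 0 && w <= 256 && h <= 256);  // coordinates must fit in a byte
    if (!glfwInit()) return false;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    GLFWwindow* win = glfwCreateWindow(64, 64, "interop", nullptr, nullptr);
    if (!win) { glfwTerminate(); return false; }
    glfwMakeContextCurrent(win);

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);  // no mipmaps
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glBindTexture(GL_TEXTURE_2D, 0);

    bool ok = false;
    cudaGraphicsResource* res = nullptr;
    // SurfaceLoadStore is needed to bind the mapped array to a surface object.
    if (cudaGraphicsGLRegisterImage(&res, tex, GL_TEXTURE_2D,
                                    cudaGraphicsRegisterFlagsSurfaceLoadStore) == cudaSuccess) {
        if (cudaGraphicsMapResources(1, &res, 0) == cudaSuccess) {
            cudaArray_t arr = nullptr;  // array index 0, mip level 0
            if (cudaGraphicsSubResourceGetMappedArray(&arr, res, 0, 0) == cudaSuccess) {
                cudaResourceDesc desc = {};
                desc.resType = cudaResourceTypeArray;
                desc.res.array.array = arr;
                cudaSurfaceObject_t surf = 0;
                if (cudaCreateSurfaceObject(&surf, &desc) == cudaSuccess) {
                    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
                    writeGradient<<<grid, block>>>(surf, w, h);
                    ok = cudaDeviceSynchronize() == cudaSuccess;  // finish before releasing surf
                    cudaDestroySurfaceObject(surf);
                }
            }
            ok = cudaGraphicsUnmapResources(1, &res, 0) == cudaSuccess && ok;
        }
        cudaGraphicsUnregisterResource(res);
    }

    if (ok) {
        rgba.resize(static_cast<size_t>(w) * h * 4);
        glBindTexture(GL_TEXTURE_2D, tex);
        glPixelStorei(GL_PACK_ALIGNMENT, 1);  // tightly packed rows
        glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, rgba.data());
        glBindTexture(GL_TEXTURE_2D, 0);
    }
    glDeleteTextures(1, &tex);
    glfwDestroyWindow(win);
    glfwTerminate();
    return ok;
}
```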

Direct3D
Direct3D Support
- Versions 9, 10, and 11 are supported
- Each version has its own API
- A CUDA context may interoperate with only one Direct3D device at a time
- The Direct3D device must be created in a hardware mode (e.g., driver type D3D_DRIVER_TYPE_HARDWARE for Direct3D 10 and 11)
Initialization
- Similar to OpenGL: cudaD3D[9|10|11]SetDirect3DDevice()
Available Direct3D resources
- Buffers, textures, and surfaces
- All are registered using cudaGraphicsD3D[9|10|11]RegisterResource()
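A minimal Direct3D 11 sketch follows, assuming Windows, the MSVC host compiler, and an NVIDIA GPU that supports D3D11. The sketch proceeds in three steps:
1. It creates a hardware D3D11 device.
2. It picks the matching CUDA device with cudaD3D11GetDevice(), which returns the CUDA device on a given DXGI adapter.
3. It registers a buffer with cudaGraphicsD3D11RegisterResource() and maps it through the same cudaGraphics* calls used for OpenGL.

The names d3d11InteropDemo and fillOnes are hypothetical.

```cuda
// Assumed build (Windows): nvcc interop_d3d11.cu d3d11.lib
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>
#include <cassert>

// Hypothetical kernel: sets every float of the mapped buffer to 1.
__global__ void fillOnes(float* p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = 1.0f;
}

// Hypothetical helper: registers a D3D11 buffer, maps it, and runs a kernel on it.
// Returns the size CUDA reports for the mapped buffer, or 0 on failure.
size_t d3d11InteropDemo(int n) {
    assert(n > 0);
    ID3D11Device* dev = nullptr;
    ID3D11DeviceContext* ctx = nullptr;
    IDXGIDevice* dxgiDev = nullptr;
    IDXGIAdapter* adapter = nullptr;
    ID3D11Buffer* buf = nullptr;
    cudaGraphicsResource* res = nullptr;
    int cudaDev = -1;
    size_t mappedBytes = 0;

    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = static_cast<UINT>(n * sizeof(float));
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

    bool ok =
        // Interop expects a hardware (not WARP or reference) device.
        SUCCEEDED(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0, nullptr, 0,
                                    D3D11_SDK_VERSION, &dev, nullptr, &ctx)) &&
        // Find the CUDA device on the same adapter as the D3D11 device.
        SUCCEEDED(dev->QueryInterface(__uuidof(IDXGIDevice), reinterpret_cast<void**>(&dxgiDev))) &&
        SUCCEEDED(dxgiDev->GetAdapter(&adapter)) &&
        cudaD3D11GetDevice(&cudaDev, adapter) == cudaSuccess &&
        cudaSetDevice(cudaDev) == cudaSuccess &&
        SUCCEEDED(dev->CreateBuffer(&desc, nullptr, &buf)) &&
        cudaGraphicsD3D11RegisterResource(&res, buf, cudaGraphicsRegisterFlagsNone) == cudaSuccess;

    if (ok && cudaGraphicsMapResources(1, &res, 0) == cudaSuccess) {
        float* dptr = nullptr;
        if (cudaGraphicsResourceGetMappedPointer(reinterpret_cast<void**>(&dptr), &mappedBytes,
                                                 res) == cudaSuccess) {
            fillOnes<<<(n + 255) / 256, 256>>>(dptr, n);
        } else {
            mappedBytes = 0;
        }
        cudaGraphicsUnmapResources(1, &res, 0);
    }
    // Release in reverse order of creation.
    if (res) cudaGraphicsUnregisterResource(res);
    if (buf) buf->Release();
    if (adapter) adapter->Release();
    if (dxgiDev) dxgiDev->Release();
    if (ctx) ctx->Release();
    if (dev) dev->Release();
    return mappedBytes;
}
```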

SLI Interoperability
GPU SLI Mode
- Multiple GPUs are (physically) interconnected and cooperate in rendering the scene
- AFR (alternate frame rendering) mode: different GPUs render subsequent frames
CUDA interoperability issues
- A CUDA allocation on one GPU also consumes memory on all other SLI-connected GPUs, so allocations may fail earlier than expected
- CUDA should use a separate context for each GPU (not strictly required, but it avoids unnecessary transfers between devices)
- cudaGLGetDevices() identifies which devices take part in SLI; the device list is selected by:
  - cudaGLDeviceListAll
  - cudaGLDeviceListCurrentFrame
  - cudaGLDeviceListNextFrame
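cudaGLGetDevices() inspects the OpenGL context that is current on the calling thread. The following minimal sketch queries all three device lists, assuming Linux and GLFW 3.2+ for creating a hidden context. The names glDevices and querySliDevices are hypothetical. On a machine without SLI, all three lists would normally contain the same single device.

```cuda
// Assumed build (Linux): nvcc sli_devices.cu -lglfw -lGL
#include <GLFW/glfw3.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the CUDA device IDs backing the current GL context for one list kind.
std::vector<int> glDevices(cudaGLDeviceList which) {
    int total = 0;
    if (cudaGetDeviceCount(&total) != cudaSuccess || total == 0) return {};
    std::vector<int> ids(total);
    unsigned int found = 0;
    if (cudaGLGetDevices(&found, ids.data(), static_cast<unsigned int>(total), which) != cudaSuccess)
        return {};
    ids.resize(std::min<size_t>(found, ids.size()));
    return ids;
}

// Hypothetical helper: creates a hidden GL context and queries all three device lists.
bool querySliDevices(std::vector<int>& all, std::vector<int>& current, std::vector<int>& next) {
    if (!glfwInit()) return false;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    GLFWwindow* win = glfwCreateWindow(64, 64, "sli", nullptr, nullptr);
    if (!win) { glfwTerminate(); return false; }
    glfwMakeContextCurrent(win);  // cudaGLGetDevices() looks at the current context

    all = glDevices(cudaGLDeviceListAll);               // every GPU used by the context
    current = glDevices(cudaGLDeviceListCurrentFrame);  // GPU(s) rendering this frame
    next = glDevices(cudaGLDeviceListNextFrame);        // GPU(s) rendering the next frame

    glfwDestroyWindow(win);
    glfwTerminate();
    assert(current.size() <= all.size());
    return !all.empty();
}
```

In AFR mode, an application could plausibly use the CurrentFrame list to decide which of its per-GPU CUDA contexts performs the work for the frame being rendered.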

Discussion

Libraries Using CUDA

CUBLAS
CUDA Basic Linear Algebra Subroutines
- CUDA implementation of the standard BLAS library
- Complete support of all 152 standard BLAS functions on vectors and matrices:
  - copy, move, rotate, swap
  - maximum, minimum, multiplication by a scalar
  - sums, dot products, Euclidean norms
  - matrix multiplications, inverses, linear combinations
- Some operations have batched versions
- Supports floats, doubles, and complex numbers
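cuBLAS follows the Fortran BLAS convention of column-major storage, which is a common source of mistakes when calling it from C++. The following minimal sketch multiplies two matrices with cublasSgemm from the cuBLAS v2 API. gemmCublas is a hypothetical helper name, and error checking is omitted for brevity.

```cuda
// Assumed build: nvcc gemm.cu -lcublas
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical helper: C = A * B, where A is m x k and B is k x n, all stored
// column-major (element (i, j) of A is A[i + j * m]).
std::vector<float> gemmCublas(const std::vector<float>& A, const std::vector<float>& B,
                              int m, int k, int n) {
    assert(A.size() == static_cast<size_t>(m) * k && B.size() == static_cast<size_t>(k) * n);
    std::vector<float> C(static_cast<size_t>(m) * n);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;  // computes C = alpha * A * B + beta * C
    // No transposition; leading dimensions are the row counts (column-major storage).
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                &alpha, dA, m, dB, k, &beta, dC, m);

    // A synchronous copy on the default stream waits for the GEMM to finish.
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

For example, the row-major matrices [[1, 2], [3, 4]] and [[5, 6], [7, 8]] are passed as {1, 3, 2, 4} and {5, 7, 6, 8}. The product [[19, 22], [43, 50]] comes back as {19, 43, 22, 50}.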

CUSP
CUDA Sparse Linear Algebra
- Open-source C++ library for sparse linear structures (matrices, linear systems, …)
Key features
- Sparse matrix operations (addition, subtraction, maximal independent set, polynomial relaxation, …)
- Supports various matrix formats: COO, CSR, DIA, ELL, and HYB
- Requires CUDA compute capability (CC) 2.0 or higher
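The following minimal sketch computes a sparse matrix-vector product with CUSP. It makes two assumptions:
- The CUSP headers are on the include path.
- The input triplets are already sorted by row and then by column, since the COO-to-CSR conversion expects sorted entries.

The matrix is assembled in COO format on the host and converted to CSR there. It is then copied to the device and multiplied with cusp::multiply. spmvCsr is a hypothetical helper name.

```cuda
// Assumed build: nvcc -I/path/to/cusp spmv.cu
#include <cusp/array1d.h>
#include <cusp/coo_matrix.h>
#include <cusp/csr_matrix.h>
#include <cusp/multiply.h>
#include <cassert>
#include <vector>

// Hypothetical helper: y = A * x, with A given as COO triplets (row, col, value).
std::vector<float> spmvCsr(int rows, int cols,
                           const std::vector<int>& r, const std::vector<int>& c,
                           const std::vector<float>& v, const std::vector<float>& x) {
    assert(r.size() == c.size() && c.size() == v.size() && x.size() == static_cast<size_t>(cols));
    // Assemble in COO on the host (entries assumed sorted by row, then column).
    cusp::coo_matrix<int, float, cusp::host_memory> coo(rows, cols, v.size());
    for (size_t i = 0; i < v.size(); ++i) {
        coo.row_indices[i] = r[i];
        coo.column_indices[i] = c[i];
        coo.values[i] = v[i];
    }
    cusp::csr_matrix<int, float, cusp::host_memory> hostCsr(coo);  // format conversion
    cusp::csr_matrix<int, float, cusp::device_memory> A(hostCsr);  // copy to the GPU

    cusp::array1d<float, cusp::device_memory> dx(x.begin(), x.end());
    cusp::array1d<float, cusp::device_memory> dy(rows, 0.0f);
    cusp::multiply(A, dx, dy);                       // sparse matrix-vector product
    cusp::array1d<float, cusp::host_memory> hy(dy);  // copy the result back
    return std::vector<float>(hy.begin(), hy.end());
}
```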

CUFFT
CUDA Fast Fourier Transform
- Decomposes a signal into its frequency spectrum
- 1D, 2D, and 3D transforms (up to 128M elements)
- Many variations (precision, complex/real types, …)
- API similar to the FFTW library:
  - Create a plan (cufftHandle) that holds the configuration
  - Associate or allocate the work space (buffers)
  - cufftExecC2C() (or R2C, C2R) starts the execution
- An FFT plan can be associated with a CUDA stream, for synchronization and overlapping
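The plan-based workflow can be sketched as a single in-place 1D complex-to-complex forward transform that runs on a dedicated stream. The transform size and batch count of 1 are arbitrary illustration choices, and forwardFft is a hypothetical helper name.

```cuda
// Assumed build: nvcc fft.cu -lcufft
#include <cufft.h>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical helper: forward 1D C2C FFT of `signal`, executed on its own stream.
std::vector<cufftComplex> forwardFft(const std::vector<cufftComplex>& signal) {
    const int n = static_cast<int>(signal.size());
    assert(n > 0);
    cufftComplex* d = nullptr;
    cudaMalloc(&d, n * sizeof(cufftComplex));

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(d, signal.data(), n * sizeof(cufftComplex), cudaMemcpyHostToDevice, stream);

    cufftHandle plan;                         // the plan holds the configuration
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);      // n points, complex-to-complex, batch of 1
    cufftSetStream(plan, stream);             // associate the plan with our stream
    cufftExecC2C(plan, d, d, CUFFT_FORWARD);  // in place; the result is not normalized

    std::vector<cufftComplex> out(n);
    cudaMemcpyAsync(out.data(), d, n * sizeof(cufftComplex), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);            // wait for copy, FFT, and copy back

    cufftDestroy(plan);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return out;
}
```

cuFFT transforms are unnormalized. A forward transform followed by an inverse transform therefore scales the data by n, and a constant signal of four ones yields 4 in the first bin and 0 elsewhere.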

Thrust
CUDA Thrust
- C++ template library with an API based on the STL
- The basic idea is to develop parallel C++ applications with minimal overhead
- STL-like vectors (for devices) and vector operations: copy, fill, create sequences, reordering, sorting, …
- Algorithms: transformations, reductions, prefix sums
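The STL-like style can be sketched with a device vector, a sort, a prefix sum, and a transformation fused into a reduction. The helper names sortedPrefixSums and sumOfSquares and the Square functor are hypothetical.

```cuda
// Assumed build: nvcc thrust_demo.cu (Thrust ships with the CUDA Toolkit)
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/scan.h>
#include <thrust/sort.h>
#include <thrust/transform_reduce.h>
#include <cassert>
#include <vector>

// Functor callable on host and device; Thrust inlines it into its kernels.
struct Square {
    __host__ __device__ int operator()(int x) const { return x * x; }
};

// Sorts the values on the GPU and replaces them with their inclusive prefix sums.
std::vector<int> sortedPrefixSums(const std::vector<int>& in) {
    thrust::device_vector<int> d(in.begin(), in.end());     // host -> device copy
    thrust::sort(d.begin(), d.end());
    assert(thrust::is_sorted(d.begin(), d.end()));          // debug check, removed by NDEBUG
    thrust::inclusive_scan(d.begin(), d.end(), d.begin());  // in-place prefix sum
    std::vector<int> out(d.size());
    thrust::copy(d.begin(), d.end(), out.begin());          // device -> host copy
    return out;
}

// Transformation fused with a reduction: sum of x * x without a temporary vector.
int sumOfSquares(const std::vector<int>& in) {
    thrust::device_vector<int> d(in.begin(), in.end());
    return thrust::transform_reduce(d.begin(), d.end(), Square(), 0, thrust::plus<int>());
}
```

For example, {3, 1, 2} is sorted to {1, 2, 3}, and its prefix sums are {1, 3, 6}. The sum of squares of {1, 2, 3} is 14.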

GPU AI
GPU AI for Board Games
- A specialized AI library designed for games with a large but well-defined configuration space
- Requires CUDA CC 2.0
- Currently supports:
  - Game Tree Split: alpha/beta pruning
  - Single and multiple recursion (with large depths)
  - Zero-sum games (3D Tic-Tac-Toe, Reversi, …)
  - Sudoku backtracking generator and solver
  - Statistical simulations (Monte Carlo for Go)
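The library's own interface is not described in the slides. The following is therefore only a plain, sequential host sketch of the alpha/beta pruning technique it builds on, not the GPU AI API. The sketch assumes a hypothetical GameTree layout: a complete tree with a fixed branching factor and leaf scores stored left to right. All names are illustrative. A GPU game-tree search has to parallelize exactly this pruning logic, which is inherently order-dependent.

```cuda
// Plain host C++ (compiles with nvcc or any C++11 compiler); not the GPU AI API.
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical complete game tree: every inner node has `branching` children,
// and the branching^depth leaf scores are stored left to right in `leaves`.
struct GameTree {
    int branching;
    int depth;
    std::vector<int> leaves;
};

// Minimax with alpha/beta pruning. `index` is the node's position within its level,
// so its children are index * branching + c. `visited` counts evaluated leaves.
int alphaBeta(const GameTree& t, int depth, int index, int alpha, int beta,
              bool maximizing, int& visited) {
    if (depth == 0) {
        ++visited;
        return t.leaves[index];
    }
    int best = maximizing ? INT_MIN : INT_MAX;
    for (int c = 0; c < t.branching; ++c) {
        int v = alphaBeta(t, depth - 1, index * t.branching + c, alpha, beta,
                          !maximizing, visited);
        if (maximizing) {
            best = std::max(best, v);
            alpha = std::max(alpha, best);
        } else {
            best = std::min(best, v);
            beta = std::min(beta, best);
        }
        if (alpha >= beta) break;  // remaining siblings cannot change the result
    }
    return best;
}

// Value of the root for the maximizing player.
int bestValue(const GameTree& t, int& visited) {
    size_t expected = 1;
    for (int d = 0; d < t.depth; ++d) expected *= t.branching;
    assert(t.leaves.size() == expected);
    visited = 0;
    return alphaBeta(t, t.depth, 0, INT_MIN, INT_MAX, true, visited);
}
```

Consider a binary tree of depth 3 with leaves {3, 5, 6, 9, 1, 2, 0, -1}. Plain minimax evaluates all 8 leaves. This search reaches the same root value, 5, after evaluating only 5 leaves.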

PhysX
PhysX, APEX
- Real-time physics engine
- Originally developed by Ageia for its PPU (physics processing unit) card
- NVIDIA acquired it and re-implemented it for CUDA
Most important features
- Simulation of rigid bodies (collisions, destruction)
- Cloth and fluid particle systems
APEX
- Framework built on top of PhysX
- Designed for easy use (artists, games, …)

Discussion