CUDA Interoperability with Graphical Environments

1 CUDA Interoperability with Graphical Environments
Martin Kruliš (v1.1)

2 Graphics Interoperability
Allows CUDA code to read/write graphical buffers
  - Works with OpenGL and Direct3D libraries
Motivation
  - Direct visualization of complex simulations
  - Augmenting 3D rendering with visualization routines that are difficult to implement in shaders
How it works
  - The graphics resource is registered and represented by struct cudaGraphicsResource
  - The resource may be mapped to CUDA memory space: cudaGraphicsMapResources(), …
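
The map / access / unmap cycle can be sketched as follows. This is a minimal illustration, not the author's code: the names fillRamp and updateMappedBuffer are hypothetical, the resource is assumed to be already registered with the graphics API, and the buffer is assumed to hold floats.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: writes a simple ramp (0, 1, 2, ...) into the buffer.
__global__ void fillRamp(float* data, size_t count)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < count) data[i] = (float)i;
}

// Maps an already registered graphics resource, runs a kernel on it,
// and unmaps it so the graphics API can use the buffer again.
cudaError_t updateMappedBuffer(cudaGraphicsResource* resource, cudaStream_t stream)
{
    cudaError_t err = cudaGraphicsMapResources(1, &resource, stream);
    if (err != cudaSuccess) return err;

    float* devPtr = nullptr;
    size_t bytes = 0;
    err = cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &bytes, resource);
    if (err == cudaSuccess) {
        size_t count = bytes / sizeof(float);   // assumption: buffer of floats
        if (count > 0) {
            unsigned int threads = 256;
            unsigned int blocks = (unsigned int)((count + threads - 1) / threads);
            fillRamp<<<blocks, threads, 0, stream>>>(devPtr, count);
            err = cudaGetLastError();
        }
    }

    // Unmap even if the kernel launch failed; the pointer is invalid afterwards.
    cudaError_t unmapErr = cudaGraphicsUnmapResources(1, &resource, stream);
    return (err != cudaSuccess) ? err : unmapErr;
}
```

A renderer would typically call updateMappedBuffer() once per frame, before issuing the draw call that reads the buffer.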

3 OpenGL
Initialization
  - Device must be selected by cudaGLSetGLDevice()
Resources
  - cudaGraphicsGLRegisterBuffer() for buffers
      The mapped buffers can be accessed in the same way as CUDA allocated memory
  - cudaGraphicsGLRegisterImage() for images and render buffers
      The image buffers can also be accessed through texture and surface mechanisms
Examples
  - Code example in the CUDA Programming Guide (pages 54-56)
  - CUDA Samples:
      2_Graphics\simpleGL – the same example as in the Guide (vertex buffer filling)
      2_Graphics\Mandelbrot – fractal rendering
      3_Imaging\postProcessGL – render buffer post processing (i.e., reading and writing)
      5_Simulations\fluidsGL
      5_Simulations\nbody
      5_Simulations\particles
      5_Simulations\smokeParticles
      5_Simulations\oceanFFT
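
Registering OpenGL objects might look like the sketch below. The helper names (registerVertexBuffer, registerTexture2D, unregisterResource) are hypothetical, the OpenGL context that owns the objects is assumed to be current on the calling thread, and the flag choices are only one plausible configuration. To my knowledge, recent CUDA releases mark cudaGLSetGLDevice() as deprecated and no longer require it, so the sketch leaves device selection out.

```cuda
#include <GL/gl.h>              // on Windows, include <windows.h> first
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Registers an existing OpenGL buffer object (e.g., a VBO) for CUDA access.
// Returns nullptr on failure.
cudaGraphicsResource* registerVertexBuffer(GLuint vbo)
{
    cudaGraphicsResource* resource = nullptr;
    // WriteDiscard: CUDA overwrites the whole buffer and never reads old data.
    cudaError_t err = cudaGraphicsGLRegisterBuffer(
        &resource, vbo, cudaGraphicsRegisterFlagsWriteDiscard);
    return (err == cudaSuccess) ? resource : nullptr;
}

// Registers an existing 2D texture so that CUDA can access it through
// the surface mechanism after mapping. Returns nullptr on failure.
cudaGraphicsResource* registerTexture2D(GLuint texture)
{
    cudaGraphicsResource* resource = nullptr;
    cudaError_t err = cudaGraphicsGLRegisterImage(
        &resource, texture, GL_TEXTURE_2D,
        cudaGraphicsRegisterFlagsSurfaceLoadStore);
    return (err == cudaSuccess) ? resource : nullptr;
}

// Releases the registration; call before deleting the OpenGL object.
void unregisterResource(cudaGraphicsResource* resource)
{
    if (resource != nullptr) cudaGraphicsUnregisterResource(resource);
}
```

Registration is comparatively expensive, so it is usually done once at startup, while mapping and unmapping happen every frame.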

4 Direct3D
Direct3D Support
  - Versions 9, 10, and 11 are supported
  - Each version has its own API
  - A CUDA context may operate with only one Direct3D device at a time
      A special HW mode must be set on the device
Initialization is similar to OpenGL
  - cudaD3D[9|10|11]SetDirect3DDevice()
Available Direct3D resources
  - Buffers, textures, and surfaces
  - All registered using cudaGraphicsD3DXXRegisterResource()
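
For Direct3D 11, the registration step could be sketched as below (Windows only). The helper names selectCudaDeviceForAdapter and registerD3D11Buffer are hypothetical. The sketch selects the device with cudaD3D11GetDevice() and cudaSetDevice() rather than the SetDirect3DDevice call named above, on the assumption that a recent CUDA toolkit is used, where the latter is, as far as I know, deprecated.

```cuda
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

// Selects the CUDA device that corresponds to the given DXGI adapter.
bool selectCudaDeviceForAdapter(IDXGIAdapter* adapter)
{
    int device = -1;
    if (cudaD3D11GetDevice(&device, adapter) != cudaSuccess) return false;
    return cudaSetDevice(device) == cudaSuccess;
}

// Registers a Direct3D 11 buffer for CUDA access. Returns nullptr on failure.
// The result is mapped and unmapped like any other graphics resource.
cudaGraphicsResource* registerD3D11Buffer(ID3D11Buffer* buffer)
{
    cudaGraphicsResource* resource = nullptr;
    cudaError_t err = cudaGraphicsD3D11RegisterResource(
        &resource, buffer, cudaGraphicsRegisterFlagsNone);
    return (err == cudaSuccess) ? resource : nullptr;
}
```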

5 SLI Interoperability
GPU SLI Mode
  - Multiple GPUs are physically interconnected and cooperate in rendering the scene
  - AFR (alternate frame rendering) mode – different GPUs render subsequent frames
CUDA interoperability issues
  - Any CUDA allocation on one GPU is automatically performed on all SLI-connected GPUs
  - CUDA has to use separate contexts for each GPU
  - cudaGLGetDevices() identifies which devices are in SLI
      cudaGLDeviceListAll
      cudaGLDeviceListCurrentFrame
      cudaGLDeviceListNextFrame
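
A query for the devices that render the current frame might be sketched like this. The function name listCurrentFrameDevices is hypothetical, and an OpenGL context is assumed to be current on the calling thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Fills `devices` with the CUDA devices that render the current frame in
// the current OpenGL context. Returns the number of entries written.
unsigned int listCurrentFrameDevices(int* devices, unsigned int capacity)
{
    unsigned int count = 0;
    cudaError_t err = cudaGLGetDevices(&count, devices, capacity,
                                       cudaGLDeviceListCurrentFrame);
    if (err != cudaSuccess) return 0;
    if (count > capacity) count = capacity;   // defensive clamp

    for (unsigned int i = 0; i < count; ++i)
        std::printf("current frame is rendered by CUDA device %d\n", devices[i]);
    return count;
}
```

Replacing the last argument with cudaGLDeviceListAll or cudaGLDeviceListNextFrame yields the other two lists.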

6 Discussion

7 Libraries Using CUDA
Martin Kruliš

8 CUBLAS
CUDA Basic Linear Algebra Subroutines
  - CUDA implementation of standard BLAS library
  - Complete support of all 152 functions on vectors/matrices
      copy, move, rotate, swap
      maximum, minimum, multiply by scalar
      sum, dot products, Euclidean norms
      matrix multiplications, inverses, linear combinations
  - Some operations have batch versions
  - Supports floats, doubles, and complex numbers
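
As an illustration of the cuBLAS calling convention, the sketch below computes a single-precision dot product. The wrapper name dotOnGpu is hypothetical; it uses the cublas_v2.h API with the default host pointer mode, and it creates and destroys the handle on every call only to stay self-contained (link with -lcublas).

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Computes dot(x, y) on the GPU and stores it in *result.
// Returns false on size mismatch or on any CUDA/cuBLAS failure.
bool dotOnGpu(const std::vector<float>& x, const std::vector<float>& y, float* result)
{
    if (x.size() != y.size() || x.empty()) return false;
    const int n = static_cast<int>(x.size());
    const size_t bytes = n * sizeof(float);

    float* dx = nullptr;
    float* dy = nullptr;
    cublasHandle_t handle = nullptr;

    // Short-circuit evaluation stops at the first failing step.
    bool ok = cudaMalloc(&dx, bytes) == cudaSuccess
           && cudaMalloc(&dy, bytes) == cudaSuccess
           && cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS
           && cublasSdot(handle, n, dx, 1, dy, 1, result) == CUBLAS_STATUS_SUCCESS;

    if (handle != nullptr) cublasDestroy(handle);
    cudaFree(dx);   // cudaFree(nullptr) is a no-op
    cudaFree(dy);
    return ok;
}
```

For example, x = (1, 2, 3) and y = (4, 5, 6) give 1·4 + 2·5 + 3·6 = 32.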

9 CUSP
CUDA Sparse Linear Algebra
  - Open source C++ library for sparse linear structures (matrices, linear systems, …)
Key features
  - Sparse matrix operations (addition, subtraction, max independent set, polynomial relaxation, …)
  - Supports various matrix formats: COO, CSR, DIA, ELL, and HYB
  - Requires CUDA CC (compute capability) 2.0 or higher
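
A format conversion followed by a sparse matrix-vector product might look like the sketch below. It is written from memory of the CUSP quick-start examples, so treat the details as assumptions to verify against the installed version; the function name spmvExample and the 2×2 matrix are made up for illustration.

```cuda
#include <cusp/array1d.h>
#include <cusp/coo_matrix.h>
#include <cusp/csr_matrix.h>
#include <cusp/multiply.h>

// Builds A = [[1, 2], [0, 3]] in COO format on the host, converts it to
// CSR on the device, and returns y = A * (1, 1) copied back to the host.
cusp::array1d<float, cusp::host_memory> spmvExample()
{
    // 2 rows, 2 columns, 3 nonzero entries (sorted by row, then column)
    cusp::coo_matrix<int, float, cusp::host_memory> coo(2, 2, 3);
    coo.row_indices[0] = 0; coo.column_indices[0] = 0; coo.values[0] = 1.0f;
    coo.row_indices[1] = 0; coo.column_indices[1] = 1; coo.values[1] = 2.0f;
    coo.row_indices[2] = 1; coo.column_indices[2] = 1; coo.values[2] = 3.0f;

    // Constructing from another matrix converts the format and
    // transfers the data to device memory.
    cusp::csr_matrix<int, float, cusp::device_memory> A(coo);

    cusp::array1d<float, cusp::device_memory> x(2, 1.0f);
    cusp::array1d<float, cusp::device_memory> y(2, 0.0f);
    cusp::multiply(A, x, y);   // y = A * x

    return cusp::array1d<float, cusp::host_memory>(y);
}
```

With this matrix the result is (1 + 2, 3) = (3, 3).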

10 CUFFT
CUDA Fast Fourier Transform
  - Decomposes a signal into its frequency spectrum
  - 1-3D transforms (up to 128M elements)
  - Many variations (precision, complex/real types, …)
API similar to FFTW library
  - Create plan (cufftHandle) which holds the configuration
  - Associate/allocate work space (buffers)
  - cufftExecC2C() (or R2C, C2R) starts execution
  - FFT plan can be associated with CUDA stream
      For synchronization and overlapping
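
The plan / execute workflow could be sketched as below for a 1D single-precision complex transform. The function name forwardFft is hypothetical, cufftPlan1d() is assumed to allocate the work space itself, and the plan is created per call only to keep the example short (link with -lcufft).

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Runs an in-place forward C2C FFT of `signal` on the GPU in its own stream.
// Returns false on any CUDA/cuFFT failure.
bool forwardFft(std::vector<cufftComplex>& signal)
{
    if (signal.empty()) return false;
    const int n = static_cast<int>(signal.size());
    const size_t bytes = n * sizeof(cufftComplex);

    cufftComplex* d = nullptr;
    cudaStream_t stream = nullptr;
    cufftHandle plan = 0;
    bool planCreated = false;

    bool ok = cudaStreamCreate(&stream) == cudaSuccess
           && cudaMalloc(&d, bytes) == cudaSuccess;
    if (ok) {
        // The plan holds the configuration: size, type, and batch count.
        planCreated = cufftPlan1d(&plan, n, CUFFT_C2C, 1) == CUFFT_SUCCESS;
        ok = planCreated;
    }
    // All work is enqueued in `stream`, so it could overlap with other streams.
    ok = ok
      && cufftSetStream(plan, stream) == CUFFT_SUCCESS
      && cudaMemcpyAsync(d, signal.data(), bytes, cudaMemcpyHostToDevice, stream) == cudaSuccess
      && cufftExecC2C(plan, d, d, CUFFT_FORWARD) == CUFFT_SUCCESS
      && cudaMemcpyAsync(signal.data(), d, bytes, cudaMemcpyDeviceToHost, stream) == cudaSuccess
      && cudaStreamSynchronize(stream) == cudaSuccess;

    if (planCreated) cufftDestroy(plan);
    cudaFree(d);
    if (stream != nullptr) cudaStreamDestroy(stream);
    return ok;
}
```

For a constant signal of eight ones, bin 0 of the output holds the sum (8) and the remaining bins are approximately zero; cuFFT does not normalize the result.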

11 Thrust
CUDA Thrust
  - C++ template library based on STL API
  - Basic idea is to develop C++ parallel applications with minimal overhead
STL-like vectors (for devices) and vector operations
  - copy, fill, create sequences, reordering, sorting, …
Algorithms
  - Transformations
  - Reductions
  - Prefix-sums
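
The three algorithm families can be illustrated with a short sketch. The functor Square and the functions sumOfSquares and prefixSumsOfSquares are hypothetical names chosen for this example; n is assumed to be non-negative.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/functional.h>

// Unary functor usable both on the host and on the device.
struct Square
{
    __host__ __device__ int operator()(int v) const { return v * v; }
};

// Transformation + reduction: returns 1^2 + 2^2 + ... + n^2.
int sumOfSquares(int n)
{
    thrust::device_vector<int> v(n);
    thrust::sequence(v.begin(), v.end(), 1);                     // 1, 2, ..., n
    thrust::transform(v.begin(), v.end(), v.begin(), Square());  // 1, 4, ..., n^2
    return thrust::reduce(v.begin(), v.end(), 0, thrust::plus<int>());
}

// Transformation + prefix sum: element i holds 1^2 + ... + (i+1)^2.
thrust::host_vector<int> prefixSumsOfSquares(int n)
{
    thrust::device_vector<int> v(n);
    thrust::sequence(v.begin(), v.end(), 1);
    thrust::transform(v.begin(), v.end(), v.begin(), Square());
    thrust::inclusive_scan(v.begin(), v.end(), v.begin());
    return thrust::host_vector<int>(v);   // copies device -> host
}
```

For n = 4 the squares are 1, 4, 9, 16, so sumOfSquares(4) yields 30 and the prefix sums are 1, 5, 14, 30.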

12 GPU AI
GPU AI for Board Games
  - Specific AI library designed for games with large, but well-defined configuration space
  - Requires CUDA CC 2.0
Currently supports
  - Game Tree Split – alpha/beta pruning
  - Single and multiple recursion (with large depths)
  - Zero-sum games (3D Tic-Tac-Toe, Reversi, …)
  - Sudoku backtracking generator and solver
  - Statistical simulations (Monte Carlo for Go)

13 PhysX
PhysX
  - Realtime physics engine
  - Originally developed by Ageia for PPU card
  - NVIDIA bought it and re-implemented it for CUDA
Most important features
  - Simulation of rigid bodies (collisions, destruction)
  - Cloths and fluid particle systems
APEX
  - Framework built on top of PhysX
  - Designed for easy usage (artists, games, …)

14 Discussion

