CUDA Interoperability with Graphical Environments

by Martin Kruliš (v1.1) 05.01.2017

Graphics Interoperability
  Allows CUDA code to read and write graphical buffers
  Works with the OpenGL and Direct3D libraries
Motivation
  Direct visualization of complex simulations
  Augmenting 3D rendering with visualization routines that are difficult to implement in shaders
How it works
  The graphics resource is registered and represented by struct cudaGraphicsResource
  The resource may then be mapped into the CUDA memory space
    cudaGraphicsMapResources(), …

OpenGL
Initialization
  The device must be selected by cudaGLSetGLDevice()
Resources
  cudaGraphicsGLRegisterBuffer() for buffers
    The mapped buffers can be accessed in the same way as CUDA-allocated memory
  cudaGraphicsGLRegisterImage() for images and render buffers
    The image buffers can also be accessed through the texture and surface mechanisms
Examples
  Code example in the CUDA Programming Guide (pages 54-56)
  CUDA Samples:
    2_Graphics\simpleGL – the same example as in the Guide (vertex buffer filling)
    2_Graphics\Mandelbrot – fractal rendering
    3_Imaging\postProcessGL – render buffer post-processing (i.e., reading and writing)
    5_Simulations\fluidsGL
    5_Simulations\nbody
    5_Simulations\particles
    5_Simulations\smokeParticles
    5_Simulations\oceanFFT

Direct3D
Direct3D Support
  Versions 9, 10, and 11 are supported
    Each version has its own API
  A CUDA context may operate with only one Direct3D device at a time
    A special HW mode must be set on the device
  Initialization is similar to OpenGL
    cudaD3D[9|10|11]SetDirect3DDevice()
Available Direct3D resources
  Buffers, textures, and surfaces
  All are registered using cudaGraphicsD3DXXRegisterResource()

SLI Interoperability
GPU SLI Mode
  Multiple GPUs are physically interconnected and cooperate in rendering the scene
  AFR (alternate frame rendering) mode – different GPUs render subsequent frames
CUDA interoperability issues
  Any CUDA allocation on one GPU is automatically performed on all SLI-connected GPUs
  CUDA has to use a separate context for each GPU
  cudaGLGetDevices() – identifies which devices are in SLI
    cudaGLDeviceListAll
    cudaGLDeviceListCurrentFrame
    cudaGLDeviceListNextFrame

Discussion

Libraries Using CUDA
Martin Kruliš

CUBLAS
CUDA Basic Linear Algebra Subroutines
  CUDA implementation of the standard BLAS library
  Complete support of all 152 functions on vectors/matrices
    copy, move, rotate, swap
    maximum, minimum, multiply by scalar
    sum, dot products, Euclidean norms
    matrix multiplications, inverses, linear combinations
  Some operations have batch versions
  Supports floats, doubles, and complex numbers
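To make a few of the listed vector operations concrete, here is a minimal CPU reference sketch of what "multiply by scalar", "dot product", and "Euclidean norm" compute. The helper names ref_scal, ref_dot, and ref_nrm2 are hypothetical and are not part of cuBLAS; the corresponding single-precision cuBLAS routines would be cublasSscal, cublasSdot, and cublasSnrm2, which perform the same arithmetic on device memory.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference helpers for illustration; not the cuBLAS API.

// x <- alpha * x  (the "multiply by scalar" operation)
void ref_scal(float alpha, std::vector<float>& x) {
    for (float& v : x) v *= alpha;
}

// Returns sum_i x[i] * y[i]  (the dot product)
float ref_dot(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
}

// Returns sqrt(sum_i x[i]^2)  (the Euclidean norm)
float ref_nrm2(const std::vector<float>& x) {
    return std::sqrt(ref_dot(x, x));
}
```

For example, ref_nrm2 applied to the vector (3, 4) returns 5.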

CUSP
CUDA Sparse Linear Algebra
  Open source C++ library for sparse linear structures (matrices, linear systems, …)
Key features
  Sparse matrix operations (addition, subtraction, max independent set, polynomial relaxation, …)
  Supports various matrix formats
    COO, CSR, DIA, ELL, and HYB
  Requires CUDA CC 2.0 or higher
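As an illustration of one of the listed formats, the following is a minimal CPU sketch of CSR (compressed sparse row) storage and a matrix-vector product over it. The CsrMatrix struct and csr_spmv function are hypothetical names chosen for this example; CUSP's own containers and algorithms differ and run on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration; not a CUSP type.
struct CsrMatrix {
    std::size_t num_rows;
    std::size_t num_cols;
    std::vector<std::size_t> row_offsets;     // num_rows + 1 entries
    std::vector<std::size_t> column_indices;  // one entry per nonzero
    std::vector<double> values;               // one entry per nonzero
};

// Computes y = A * x on the CPU. Row r owns the nonzeros in the
// half-open range [row_offsets[r], row_offsets[r + 1]).
std::vector<double> csr_spmv(const CsrMatrix& A, const std::vector<double>& x) {
    assert(A.row_offsets.size() == A.num_rows + 1);
    assert(A.column_indices.size() == A.values.size());
    assert(x.size() == A.num_cols);
    std::vector<double> y(A.num_rows, 0.0);
    for (std::size_t row = 0; row < A.num_rows; ++row) {
        for (std::size_t k = A.row_offsets[row]; k < A.row_offsets[row + 1]; ++k) {
            y[row] += A.values[k] * x[A.column_indices[k]];
        }
    }
    return y;
}
```

The 2x3 matrix with rows (1, 0, 2) and (0, 3, 0) would be stored as row_offsets {0, 2, 3}, column_indices {0, 2, 1}, and values {1, 2, 3}. Only the nonzeros are kept, which is the point of all the formats listed above; they differ in how the index information is laid out.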

CUFFT
CUDA Fast Fourier Transform
  Decomposes a signal into its frequency spectrum
  1-3D transforms (up to 128M elements)
  Many variations (precision, complex/real types, …)
API similar to the FFTW library
  Create a plan (cufftHandle) which holds the configuration
  Associate/allocate work space (buffers)
  cufftExecC2C() (or R2C, C2R) starts execution
  An FFT plan can be associated with a CUDA stream
    For synchronization and overlapping
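To show what "decomposing a signal into its frequency spectrum" means numerically, here is a naive O(n^2) discrete Fourier transform on the CPU. It is only a reference sketch: naive_dft is a hypothetical helper, not a cuFFT call, and an FFT library computes the same values with a much faster algorithm. The sign convention and the absence of normalization are chosen to match a forward transform.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive forward DFT (hypothetical reference helper, not cuFFT):
//   X[k] = sum_j x[j] * exp(-2*pi*i*j*k / n), with no normalization.
std::vector<std::complex<double>> naive_dft(
        const std::vector<std::complex<double>>& x) {
    const std::size_t n = x.size();
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce j*k modulo n so the angle stays small and accurate.
            const double angle =
                -2.0 * pi * static_cast<double>((j * k) % n) / static_cast<double>(n);
            sum += x[j] * std::polar(1.0, angle);
        }
        X[k] = sum;
    }
    return X;
}
```

For an 8-sample signal containing exactly one cosine period, the energy ends up in bins 1 and 7 and every other bin is (numerically) zero.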

Thrust
CUDA Thrust
  C++ template library based on the STL API
  The basic idea is to develop C++ parallel applications with minimal overhead
  STL-like vectors (for devices) and vector operations
    copy, fill, create sequences, reordering, sorting, …
Algorithms
  Transformations
  Reductions
  Prefix-sums
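Because Thrust is modeled on the STL, the three algorithm families can be sketched with standard library calls on the host. This is an analogy under that assumption, not Thrust code: with Thrust the containers would be thrust::device_vector and the calls would be thrust::sequence, thrust::transform, thrust::reduce, and thrust::inclusive_scan. The PipelineResult struct and run_stl_pipeline function are hypothetical names for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result holder for this sketch.
struct PipelineResult {
    std::vector<int> squares;      // output of the transformation
    int total;                     // output of the reduction
    std::vector<int> prefix_sums;  // output of the inclusive prefix sum
};

// Runs sequence -> transform -> reduce -> scan on the host with the STL.
PipelineResult run_stl_pipeline(int n) {
    assert(n >= 0);
    std::vector<int> data(static_cast<std::size_t>(n));
    std::iota(data.begin(), data.end(), 1);  // 1, 2, ..., n

    PipelineResult out;
    out.squares.resize(data.size());
    std::transform(data.begin(), data.end(), out.squares.begin(),
                   [](int v) { return v * v; });  // transformation

    out.total = std::accumulate(out.squares.begin(), out.squares.end(), 0);  // reduction

    out.prefix_sums.resize(out.squares.size());
    std::partial_sum(out.squares.begin(), out.squares.end(),
                     out.prefix_sums.begin());  // inclusive prefix sum
    return out;
}
```

For n = 4 the squares are 1, 4, 9, 16, the reduction gives 30, and the prefix sums are 1, 5, 14, 30.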

GPU AI
GPU AI for Board Games
  Specific AI library designed for games with a large but well-defined configuration space
  Requires CUDA CC 2.0
Currently supports
  Game Tree Split – alpha/beta pruning
  Single and multiple recursion (with large depths)
  Zero-sum games (3D Tic-Tac-Toe, Reversi, …)
  Sudoku backtracking generator and solver
  Statistical simulations (Monte Carlo for Go)
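For readers unfamiliar with alpha/beta pruning, the following is a minimal sequential sketch on an explicit game tree. It is not the library's implementation and says nothing about how the search is split across GPU threads; the Node struct, the alpha_beta function, and the visited_leaves counter are hypothetical and exist only to make the pruning visible.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical explicit game tree: inner nodes have children,
// leaves carry the evaluation score in `value`.
struct Node {
    int value;                   // used only at leaves
    std::vector<Node> children;  // empty for leaves
};

// Minimax with alpha/beta pruning (sequential CPU sketch).
// visited_leaves counts evaluated leaves so the pruning can be observed.
int alpha_beta(const Node& node, bool maximizing, int alpha, int beta,
               int& visited_leaves) {
    assert(alpha < beta);  // the search window must be non-empty
    if (node.children.empty()) {
        ++visited_leaves;
        return node.value;
    }
    int best = maximizing ? std::numeric_limits<int>::min()
                          : std::numeric_limits<int>::max();
    for (const Node& child : node.children) {
        const int score = alpha_beta(child, !maximizing, alpha, beta, visited_leaves);
        if (maximizing) {
            best = std::max(best, score);
            alpha = std::max(alpha, best);
        } else {
            best = std::min(best, score);
            beta = std::min(beta, best);
        }
        if (alpha >= beta) break;  // remaining siblings cannot change the result
    }
    return best;
}
```

On a maximizing root with two minimizing children holding leaves (3, 5) and (2, 9), the first subtree yields 3. In the second subtree the leaf 2 already proves the subtree cannot beat 3, so the leaf 9 is never evaluated.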

PhysX
PhysX
  Real-time physics engine
    Originally developed by Ageia for the PPU card
    NVIDIA bought it and re-implemented it for CUDA
  Most important features
    Simulation of rigid bodies (collisions, destruction)
    Cloth and fluid particle systems
APEX
  Framework built on top of PhysX
  Designed for easy usage (artists, games, …)

Discussion