Martin Kruliš, 26. 11. 2015 (v1.0)

• Interoperability
  ◦ Allows CUDA code to read/write graphical buffers
    ▪ Works with OpenGL and Direct3D libraries
  ◦ Motivation
    ▪ Direct visualization of complex simulations
    ▪ Augmenting 3D rendering with visualization routines that are difficult to implement in shaders
  ◦ How it works
    ▪ The graphics resource is registered and represented by struct cudaGraphicsResource
    ▪ The resource may be mapped to CUDA memory space with cudaGraphicsMapResources(), …

• Initialization
  ◦ Device must be selected by cudaGLSetGLDevice() (deprecated since CUDA 5.0; associating a device with the OpenGL context is no longer required)
• Resources
  ◦ cudaGraphicsGLRegisterBuffer() for buffers
    ▪ The mapped buffers can be accessed in the same way as CUDA-allocated memory
  ◦ cudaGraphicsGLRegisterImage() for images and render buffers
    ▪ The image buffers can also be accessed through texture and surface mechanisms

• Direct3D Support
  ◦ Versions 9, 10, and 11 are supported
    ▪ Each version has its own API
  ◦ A CUDA context may operate with only one Direct3D device at a time
    ▪ A special HW mode must also be set on the device
  ◦ Initialization is similar to OpenGL: cudaD3D[9|10|11]SetDirect3DDevice()
  ◦ Available Direct3D resources
    ▪ Buffers, textures, and surfaces
    ▪ All are registered using cudaGraphicsD3DXXRegisterResource()

• GPU SLI Mode
  ◦ Multiple GPUs are physically interconnected and cooperate in rendering the scene
    ▪ AFR mode: different GPUs render subsequent frames
  ◦ CUDA interoperability issues
    ▪ Any CUDA allocation on one GPU is automatically performed on all SLI-connected GPUs
    ▪ CUDA has to use a separate context for each GPU
    ▪ cudaGLGetDevices() identifies which devices are in SLI, selected by one of cudaGLDeviceListAll, cudaGLDeviceListCurrentFrame, or cudaGLDeviceListNextFrame

• CUDA Basic Linear Algebra Subroutines (cuBLAS)
  ◦ CUDA implementation of the standard BLAS library
  ◦ Complete support of all 152 functions on vectors/matrices
    ▪ copy, move, rotate, swap
    ▪ maximum, minimum, multiply by scalar
    ▪ sum, dot products, Euclidean norms
    ▪ matrix multiplications, inverses, linear combinations
  ◦ Some operations have batch versions
  ◦ Supports floats, doubles, and complex numbers
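To make the "matrix multiplications" entry concrete, the following is a plain C++ reference of what a BLAS-style GEMM computes, C = alpha·A·B + beta·C, using the column-major storage that BLAS and cuBLAS expect. It is a CPU sketch only, not cuBLAS code: the name gemm_ref and its parameter layout are hypothetical, and a real application would call a routine such as cublasSgemm() instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference GEMM: C = alpha * A * B + beta * C (no transposes).
// A is m x k, B is k x n, C is m x n, all stored column-major, i.e.
// element (row, col) of a matrix with m rows lives at index row + col * m.
// gemm_ref is a hypothetical helper name, not part of any BLAS library.
void gemm_ref(std::size_t m, std::size_t n, std::size_t k,
              float alpha, const std::vector<float>& A,
              const std::vector<float>& B,
              float beta, std::vector<float>& C)
{
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < m; ++row) {
            float acc = 0.0f;
            for (std::size_t i = 0; i < k; ++i) {
                acc += A[row + i * m] * B[i + col * k];
            }
            C[row + col * m] = alpha * acc + beta * C[row + col * m];
        }
    }
}
```

The column-major indexing is the detail that most often surprises C and C++ programmers, whose arrays are usually row-major.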

• CUDA Sparse Linear Algebra
  ◦ Open source C++ library for sparse linear structures (matrices, linear systems, …)
  ◦ Key features
    ▪ Sparse matrix operations (addition, subtraction, max independent set, polynomial relaxation, …)
    ▪ Supports various matrix formats
    ▪ COO, CSR, DIA, ELL, and HYB
  ◦ Requires CUDA compute capability (CC) 2.0 or higher
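Of the formats listed, CSR (compressed sparse row) is perhaps the easiest to illustrate. The sketch below stores a matrix in CSR and computes y = A·x on the CPU. The struct CsrMatrix, its field names, and the function csr_spmv are hypothetical names chosen for this example; they are not the library's types, and a GPU version would typically assign rows to threads rather than loop over them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CSR storage: the nonzeros of row r occupy positions
// row_offsets[r] .. row_offsets[r + 1] - 1 of col_indices and values.
// Struct and field names are assumptions made for this sketch.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_offsets;  // size rows + 1
    std::vector<std::size_t> col_indices;  // one entry per nonzero
    std::vector<double> values;            // one entry per nonzero
};

// Sparse matrix-vector product y = A * x, processed row by row.
std::vector<double> csr_spmv(const CsrMatrix& A, const std::vector<double>& x)
{
    assert(A.row_offsets.size() == A.rows + 1);
    assert(A.col_indices.size() == A.values.size());
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t r = 0; r < A.rows; ++r) {
        for (std::size_t j = A.row_offsets[r]; j < A.row_offsets[r + 1]; ++j) {
            y[r] += A.values[j] * x[A.col_indices[j]];
        }
    }
    return y;
}
```

Only the nonzeros are stored and visited, which is why the choice of format matters for matrices that are mostly zero.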

• CUDA Fast Fourier Transform (cuFFT)
  ◦ Decomposes a signal into its frequency spectrum
  ◦ 1-3D transforms (up to 128M elements)
  ◦ Many variations (precision, complex/real types, …)
  ◦ API similar to the FFTW library
    ▪ Create a plan ( cufftHandle ) which holds the configuration
    ▪ Associate/allocate work space (buffers)
    ▪ cufftExecC2C() (or R2C, C2R ) starts execution
  ◦ An FFT plan can be associated with a CUDA stream
    ▪ For synchronization and overlapping
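As a reminder of what a forward complex-to-complex transform computes, here is a naive O(n²) discrete Fourier transform in plain C++. It is a sketch for checking small results by hand, not how an FFT library works internally: dft_naive is a hypothetical name, and the output is left unnormalized, which matches the convention of FFTW and cuFFT.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive forward DFT: out[k] = sum_j in[j] * exp(-2*pi*i*j*k / n).
// The result is not divided by n (unnormalized).
// dft_naive is a hypothetical helper, not a library function.
std::vector<std::complex<double>>
dft_naive(const std::vector<std::complex<double>>& in)
{
    const std::size_t n = in.size();
    assert(n > 0 && "transform length must be positive");
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle =
                -two_pi * static_cast<double>(j * k) / static_cast<double>(n);
            sum += in[j] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}
```

For a constant input, all energy lands in the first (zero-frequency) bin and the remaining bins are zero up to rounding error.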

• CUDA Thrust
  ◦ C++ template library based on the STL API
  ◦ The basic idea is to develop C++ parallel applications with minimal overhead
  ◦ STL-like vectors (for devices) and vector operations
    ▪ copy, fill, create sequences, reordering, sorting, …
  ◦ Algorithms
    ▪ Transformations
    ▪ Reductions
    ▪ Prefix-sums
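Because Thrust follows the STL, the three algorithm families can be sketched with their standard-library counterparts on the host. The wrapper names square_all, sum_all, and prefix_sums are hypothetical. On the device, the analogous calls would be thrust::transform, thrust::reduce, and thrust::inclusive_scan applied to a thrust::device_vector; that mapping is stated here as orientation, not as tested code.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Transformation: apply a function to every element.
std::vector<int> square_all(const std::vector<int>& v)
{
    std::vector<int> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(),
                   [](int x) { return x * x; });
    return out;
}

// Reduction: combine all elements into a single value.
int sum_all(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), 0);
}

// Prefix sum (inclusive scan): out[i] = v[0] + ... + v[i].
std::vector<int> prefix_sums(const std::vector<int>& v)
{
    std::vector<int> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}
```

For the input {1, 2, 3, 4}, squaring gives {1, 4, 9, 16}, the sum of those squares is 30, and their prefix sums are {1, 5, 14, 30}.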

• GPU AI for Board Games
  ◦ Specific AI library designed for games with a large but well-defined configuration space
  ◦ Requires CUDA compute capability (CC) 2.0
  ◦ Currently supports
    ▪ Game Tree Split: alpha/beta pruning
    ▪ Single and multiple recursion (with large depths)
    ▪ Zero-sum games (3D Tic-Tac-Toe, Reversi, …)
    ▪ Sudoku backtracking generator and solver
    ▪ Statistical simulations (Monte Carlo for Go)
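Alpha/beta pruning itself can be shown with a short sequential sketch. The code below searches an explicit game tree and counts visited nodes so the effect of pruning is visible. It is a textbook CPU version under assumed names (Node, alphabeta); the slides do not describe how the library splits the tree across GPU threads, so nothing here should be read as its implementation.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// A node without children is a leaf; its score is given from the
// maximizing player's point of view. Names are assumptions for this sketch.
struct Node {
    int score;
    std::vector<Node> children;
};

// Minimax search with alpha/beta pruning. 'visited' counts evaluated nodes.
int alphabeta(const Node& node, int alpha, int beta, bool maximizing,
              int& visited)
{
    ++visited;
    if (node.children.empty()) {
        return node.score;
    }
    if (maximizing) {
        int best = std::numeric_limits<int>::min();
        for (const Node& child : node.children) {
            best = std::max(best, alphabeta(child, alpha, beta, false, visited));
            alpha = std::max(alpha, best);
            if (alpha >= beta) break;  // opponent will never allow this branch
        }
        return best;
    }
    int best = std::numeric_limits<int>::max();
    for (const Node& child : node.children) {
        best = std::min(best, alphabeta(child, alpha, beta, true, visited));
        beta = std::min(beta, best);
        if (alpha >= beta) break;      // maximizer already has a better option
    }
    return best;
}
```

On a two-level tree with leaf pairs (3, 5) and (2, 9), the search returns 3 and skips the leaf 9: once the second subtree yields 2, it cannot beat the 3 already secured.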

• PhysX
  ◦ Realtime physics engine
  ◦ Originally developed by Ageia for its PPU card
    ▪ NVIDIA bought it and re-implemented it for CUDA
  ◦ Most important features
    ▪ Simulation of rigid bodies (collisions, destruction)
    ▪ Cloth and fluid particle systems
• APEX
  ◦ Framework built on top of PhysX
  ◦ Designed for easy usage (artists, games, …)
