GPU Programming “Languages”


The Language Zoo
RenderMan, Cg, HLSL, Sh, GLSL, SlabOps, OpenVidia, BrookGPU, RenderTexture

Some History
Cook and Perlin were the first to develop languages for performing shading calculations. Perlin computed noise functions procedurally and introduced control constructs. Cook developed the idea of shade trees at Lucasfilm. These ideas led to the development of RenderMan at Pixar (Hanrahan et al.). RenderMan is still the shader language of choice for high-quality rendering! These languages were intended for offline rendering: no interactivity, but high quality.

Some History
After RenderMan, there were independent efforts to develop high-level shading languages at SGI (ISL) and Stanford (RTSL). ISL targeted the fixed-function pipeline and SGI cards (remember the compiler from the previous lecture): the goal was to map a RenderMan-like language to OpenGL. RTSL took a similar approach with the programmable pipeline and PC cards (recall the compiler from the previous lecture). RTSL morphed into Cg.

Some History
Cg was pushed by NVIDIA as a platform-neutral, card-neutral programming environment. In practice, Cg tends to work better on NVIDIA cards (better demos, special features, etc.). ATI made a brief attempt at competition with Ashli/RenderMonkey. HLSL was pushed by Microsoft as a DirectX-specific alternative. In general, HLSL integrates better with the DirectX framework than Cg does with either OpenGL or DirectX.

Newer languages
Writing programs on the GPU is a pain! You need to load shaders, link variables, enable textures, manage buffers…
Do I need to understand graphics to program the GPU? Sh says ‘maybe’; Brook says ‘no’.
Other packages also attempt to wrap GPU aspects inside classes/templates so that the user can program at a higher level.

Level 1: Better Than Assembly!

C-like vertex and fragment code
Languages are specified in a C-like syntax. The user writes explicit vertex and fragment programs. Code is compiled down into pseudo-assembly; this is a source-to-source compilation, and no machine code is generated.
Knowledge of the pipeline is essential:
– Passing an array = binding a texture
– Starting a program = rendering a quad
– Transformation parameters need to be set
– Buffer management is a pain…
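
The correspondence between ordinary programming and the pipeline (array = texture, start program = render a quad) can be illustrated with a CPU-side emulation. The sketch below is plain Python, not shader code; the names `render_quad` and `add_fragment` are invented for illustration, and a real GPU would run the fragment program for all pixels in parallel rather than in nested loops.

```python
def render_quad(width, height, fragment_program, textures):
    """Emulate drawing a quad that covers the output: one fragment per pixel."""
    framebuffer = []
    for y in range(height):
        row = []
        for x in range(width):
            # Each fragment sees only its own coordinates and the bound textures.
            row.append(fragment_program(x, y, textures))
        framebuffer.append(row)
    return framebuffer


def add_fragment(x, y, textures):
    """Hypothetical fragment program: element-wise sum of two 'textures'."""
    a, b = textures
    return a[y][x] + b[y][x]


# "Passing an array" corresponds to binding it as a texture.
tex_a = [[1.0, 2.0], [3.0, 4.0]]
tex_b = [[10.0, 20.0], [30.0, 40.0]]

# "Starting the program" corresponds to rendering the quad.
result = render_quad(2, 2, add_fragment, (tex_a, tex_b))
```

The output array plays the role of the framebuffer (or an offscreen buffer), which is why buffer management shows up in every GPGPU program written at this level.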

Cg
Platform-neutral, architecture-“neutral” shading language developed by NVIDIA. One of the first GPGPU languages to be widely used. Because Cg is platform-neutral, many of the other GPGPU issues are not addressed:
– managing pbuffers
– rendering to textures
– handling vertex buffers
“As we started out with Cg it was a great boost to getting programmers used to working with programmable GPUs. Now Microsoft has made a major commitment and in the long term we don’t really want to be in the programming language business.” (David Kirk, NVIDIA)

HLSL
Developed by Microsoft; tight coupling with DirectX. Because of this tight coupling, many things are easier (no RenderTexture needed!). Xbox programming is done with DirectX/HLSL (XNA).
But…
– The Cell processor will use OpenGL/Cg

GLSL
GLSL is the latest shader language, developed by 3DLabs in conjunction with the OpenGL ARB, and specific to OpenGL. It requires OpenGL 2.0. NVIDIA doesn’t yet have drivers for OpenGL 2.0! Demos (appear to be) emulated in software. ATI appears to have native GL 2.0 support and thus support for GLSL. The multiplicity of languages is likely to continue.

Data Types
Scalars: float/integer/boolean. Scalars can have 32- or 16-bit precision (ATI supports 24 bit, GLSL has 16-bit integers).
Vectors: 3 or 4 scalar components.
Arrays (but only of fixed size).
Limited floating-point support; no underflow/overflow for integer arithmetic.
No bit operations.
Matrix data types.
Texture data type:
– power-of-two issues appear to be resolved in GLSL
– different types for 1D, 2D, 3D, cubemaps.
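
To get a feel for what 16-bit floating-point precision costs, the sketch below rounds a value through 16-bit and 32-bit storage on the CPU. It assumes the 16-bit format behaves like IEEE 754 binary16, which the slides do not state, and the helper names `to_half` and `to_single` are made up for this example.

```python
import struct


def to_half(x):
    """Round a Python float to a 16-bit float (IEEE 754 binary16) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x):
    """Round a Python float to a 32-bit float (IEEE 754 binary32) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# An array index such as 4097 survives 32-bit storage but not 16-bit storage,
# because binary16 has only 11 significant bits.
index = 4097.0
half_value = to_half(index)
single_value = to_single(index)
```

Under that assumption, indices or texture coordinates stored at 16-bit precision stop being exact well before typical array sizes are reached, which is one reason precision matters when textures are used as general-purpose arrays.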

Data Binding
Data binding modes:
– uniform: the parameter is fixed over a glBegin()-glEnd() call.
– varying: interpolated data sent to the fragment program (pixel color, texture coordinates, etc.).
– attribute: per-vertex data sent to the GPU from the CPU (vertex coordinates, texture coordinates, normals, etc.).
Data direction:
– in: data sent into the program (vertex coordinates)
– out: data sent out of the program (depth)
– inout: both of the above (color)
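
The three binding modes can be emulated on the CPU for a one-dimensional "primitive" with two vertices. This is only a sketch in plain Python: `draw_span`, `vertex_program` and the `brightness` uniform are hypothetical names, and real hardware interpolates varyings during rasterization rather than in a loop.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def draw_span(attr_left, attr_right, num_fragments, uniforms):
    """Emulate one primitive: two vertices, num_fragments fragments between them."""

    def vertex_program(attribute_color):
        # attribute: per-vertex input; uniform: constant for the whole draw call.
        return attribute_color * uniforms["brightness"]

    # The vertex program runs once per vertex and writes a varying.
    varying_left = vertex_program(attr_left)
    varying_right = vertex_program(attr_right)

    fragments = []
    for i in range(num_fragments):
        t = i / (num_fragments - 1)
        # varying: interpolated across the primitive before the fragment stage.
        fragments.append(lerp(varying_left, varying_right, t))
    return fragments


colors = draw_span(0.0, 1.0, 5, {"brightness": 0.5})
```

Here the two endpoint colors are attributes, `brightness` is a uniform, and each entry of `colors` is the interpolated varying a fragment program would receive.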

Operations And Control Flow
Usual arithmetic and special-purpose algebraic ops (trigonometry, interpolation, discrete derivatives, etc.). No integer mod…
for-loops, while-do loops, if-then-else statements.
discard allows you to kill a fragment and end processing.
Recursive function calls are unsupported, but simple function calls are allowed.
There is always one “main” function that starts the program, as in C.

Writing Shaders: The Mechanics
This is the most painful part of working with shaders. All three languages provide a “runtime” to load shaders, link data with shader variables, and enable and disable programs. Cg and HLSL compile shader code down to assembly (“source-to-source”). GLSL relies on the graphics vendor to provide a compiler that goes directly to GPU machine code, so no intermediate step takes place.

Step 1: Load the shader
Shader source: create shader object; load shader from file; compile shader.

Step 2: Bind Variables
Shader source: float3 main( uniform float v, sampler2D t){ … }
Main C code: get handles (a handle for v, a handle for t); set values for the variables.

Step 3: Run the Shaders
Enable shader; enable parameters; render something.
In GLSL: load shader(s) into program; enable program.
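
The three steps can be walked through with a toy runtime. Everything below is hypothetical: `ToyRuntime` and its methods are not part of Cg, HLSL, GLSL or any real graphics API, and the "shader" is a Python function compiled from a string. The sketch only mirrors the order of operations (load and compile, get handles and set values, enable and render).

```python
class ToyRuntime:
    """Stand-in for a shader runtime; NOT a real graphics API."""

    def __init__(self):
        self.programs = {}
        self.active = None

    def load(self, name, source):
        # Step 1: create a shader object and compile its source.
        namespace = {}
        exec(source, namespace)
        self.programs[name] = {"main": namespace["main"], "params": {}}
        return name

    def get_handle(self, program, variable):
        # Step 2a: obtain a handle that names one shader variable.
        return (program, variable)

    def set_value(self, handle, value):
        # Step 2b: set the value behind a handle.
        program, variable = handle
        self.programs[program]["params"][variable] = value

    def enable(self, program):
        # Step 3a: make the program current.
        self.active = program

    def render(self, width, height):
        # Step 3b: "render something" so the program runs once per pixel.
        prog = self.programs[self.active]
        return [[prog["main"](x, y, **prog["params"]) for x in range(width)]
                for y in range(height)]


SOURCE = """
def main(x, y, v, t):
    return v * t[y][x]
"""

rt = ToyRuntime()
scale = rt.load("scale", SOURCE)
rt.set_value(rt.get_handle(scale, "v"), 2.0)                       # uniform float v
rt.set_value(rt.get_handle(scale, "t"), [[1.0, 2.0], [3.0, 4.0]])  # sampler t
rt.enable(scale)
image = rt.render(2, 2)
```

The handle indirection is the part that tends to cause trouble in practice: the host program and the shader only agree on variable names, so a misspelled name is typically detected at run time rather than at compile time.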

Direct compilation
Cg code can be compiled to fragment code for different platforms (directx, nvidia, arbfp). HLSL compiles directly to DirectX. GLSL compiles natively.
Inspecting the Cg compiler output often reveals bugs and inefficiencies that can be fixed by writing assembly code (like writing asm routines in C). In GLSL you can’t do this because the code is compiled natively: you have to trust the vendor compiler!

Overview
Shading languages like Cg, HLSL and GLSL are ways of approaching RenderMan, but using the GPU. They will never be the most convenient approach for general-purpose GPU programming, but they will probably yield the most efficient code:
– you either need an HLL and great compilers
– or you suffer and program in these.

Level 2: We know what you want

Wrapper libraries
Writing code that works cross-platform, with all extensions, is hard. Wrappers take care of the low-level issues, use the right commands for the right platform, etc.
RenderTexture:
– Handles offscreen buffers and render-to-texture cleanly
– Works in both Windows and Linux (only for OpenGL though)
– De facto class of choice for all Cg programming (use Cg for the code, and RenderTexture for texture management).

OpenVidia
Video and image processing library developed at the University of Toronto. Contains a collection of fragment programs for basic vision tasks (edge detection, corner tracking, object tracking, video compositing, etc.). Provides a high-level API for invoking these functions. Works with Cg and OpenGL, only on Linux (for now). The level of transparency is low: you still need to set up GLUT and allocate buffers, but the details are somewhat masked.

OpenVidia: Example
Create the processing object:
    d = new FragPipeDisplay( );
Create an image filter:
    filter1 = new GenericFilter(…, );
Make some buffers for temporary results:
    d->init_texture(0, 320, 240, foo);
    d->init_texture4f(1, 320, 240, foo);
Apply the filter to a buffer and store the result in the output buffer:
    d->applyFilter(filter1, 0, 1);

Level 3: I can’t believe it’s not C!

High Level C-like languages
The main goal is to hide the details of the runtime and distill the essence of the computation. These languages exploit the stream aspect of GPUs explicitly. They differ from libraries by being general purpose, and they can target different backends (including the CPU). They are either embedded as C++ code (Sh) or come with an associated compiler (Brook) that compiles a C-like language.

Sh
Open-source code developed by a group led by Michael McCool at Waterloo. The technical term is ‘metaprogramming’: code is embedded inside C++, and no extra compile tools are necessary. Sh uses a staged compiler: parts of the code are compiled when the C++ code is compiled, and the rest (with certain optimizations) is compiled at runtime. It has a flavor very similar to functional programming. Parameter passing into streams is seamless, and resource constraints are managed by virtualization.
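
The staged, embedded approach can be sketched in a few lines of Python. This is an illustration of the general metaprogramming idea only, not a description of how Sh is implemented: the `Expr` class, `emit` and `compile_program` are invented names. Running ordinary host code records an expression graph through operator overloading (first stage), and the graph is turned into executable code at run time (second stage).

```python
class Expr:
    """Node in an expression graph recorded by operator overloading."""

    def __init__(self, op, args):
        self.op = op
        self.args = args

    def __add__(self, other):
        return Expr("add", (self, wrap(other)))

    def __mul__(self, other):
        return Expr("mul", (self, wrap(other)))


def wrap(value):
    """Turn plain numbers into constant nodes; leave Expr nodes untouched."""
    return value if isinstance(value, Expr) else Expr("const", (value,))


def emit(expr):
    """Second stage: generate source text for the recorded graph."""
    if expr.op == "const":
        return repr(expr.args[0])
    if expr.op == "input":
        return expr.args[0]
    left, right = (emit(arg) for arg in expr.args)
    symbol = {"add": "+", "mul": "*"}[expr.op]
    return f"({left} {symbol} {right})"


def compile_program(expr, input_names):
    """Build a callable from the generated source (stand-in for a GPU backend)."""
    source = f"lambda {', '.join(input_names)}: {emit(expr)}"
    return source, eval(source)


# First stage: ordinary host code that only records operations.
b = Expr("input", ("b",))
s = Expr("input", ("s",))
graph = b * s + 1.0

# Second stage: code generation at run time.
source, displace = compile_program(graph, ["b", "s"])
value = displace(2.0, 3.0)
```

Because the graph exists as data before any code is generated, a backend is free to optimize it or to target a different device, which is the property that lets an embedded language offer several backends.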

Sh Example
    ShPoint3f a(1,2,3);                                      // definition of a point
    ShMatrix4f M;                                            // definition of a matrix
    ShProgram displace = SH_BEGIN_PROGRAM("gpu:stream") {    // specify target architecture
        ShInputPoint3f b;
        ShInputAttrib1f s;
        ShOutputPoint3f c = M | (a + s * normalize(b));
    } SH_END;
    ShChannel p;                                             // construct channels and streams
    ShChannel q;
    ShStream data = p & q;
    p = displace << data;                                    // run the code!

Sh GPU Example
    ShProgram vsh = SH_BEGIN_VERTEX_PROGRAM {
        ShOutputPosition4f opos;
        ShOutputNormal3f onrm;
        ShOutputVector3f olightv;
    } SH_END;
    ShProgram fsh = SH_BEGIN_FRAGMENT_PROGRAM {
        ShInputPosition4f ipos;
        ShInputNormal3f inrm;
        ShInputVector3f ilightv;
    } SH_END;
    shBind(vsh);
    shBind(fsh);

And more…
There are all kinds of other functions to extract data from streams and textures, and lots of useful ‘primitive’ streams like passthru programs and generic vertex/fragment programs, as well as specialized lighting shaders. Sh is closely bound to OpenGL; you can specify all the usual OpenGL calls, and Sh is invoked as usual via a display() routine. The plan is to have a DirectX binding ready shortly (this may already be in). Because of the multiple backends, you can debug a shader on the CPU backend first, and then test it on the GPU.

BrookGPU
Open-source code developed by Ian Buck and others at Stanford. Intended as a pure stream programming language with multiple backends. It is not embedded in C code; it uses its own compiler (brcc) that generates C code from a .br file.
Workflow:
– Write Brook program (.br)
– Compile Brook program to C (brcc)
– Compile C code (gcc/VC)

BrookGPU
Designed for general-purpose computing (this is the primary difference in focus from Sh). You will almost never use any graphics commands in Brook. The basic data type is the stream.
Types of functions:
– Kernel: takes one or more input streams and produces an output stream.
– Reduce: takes input streams and reduces them to scalars (or smaller output streams).
– Scatter: a[o_i] = s_i. Sends stream data to an array, putting values in different locations.
– Gather: the inverse of the scatter operation, s_i = a[o_i].
The last two operations are not fully supported yet.
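
The four kinds of stream function can be emulated with Python lists. This is a CPU-side sketch of the semantics only, not Brook code; the function names `kernel_map`, `reduce_stream`, `scatter` and `gather` are chosen for this example.

```python
def kernel_map(fn, *streams):
    """Kernel: apply fn independently to corresponding stream elements."""
    return [fn(*elements) for elements in zip(*streams)]


def reduce_stream(fn, stream, initial):
    """Reduce: fold a stream down to a single value."""
    accumulator = initial
    for element in stream:
        accumulator = fn(accumulator, element)
    return accumulator


def scatter(values, offsets, size, fill=0.0):
    """Scatter: a[o_i] = s_i, writing each value to an arbitrary location."""
    array = [fill] * size
    for value, offset in zip(values, offsets):
        array[offset] = value
    return array


def gather(array, offsets):
    """Gather: s_i = a[o_i], reading each value from an arbitrary location."""
    return [array[offset] for offset in offsets]


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

c = kernel_map(lambda x, y: x * y, a, b)               # element-wise product
ip = reduce_stream(lambda acc, x: acc + x, c, 0.0)     # inner product of a and b

offsets = [2, 0, 1]
scattered = scatter(c, offsets, size=3)
gathered = gather(scattered, offsets)                  # undoes the scatter
```

Kernels and reductions map naturally onto fragment programs because every output element is computed independently or combined associatively. Scatter is the awkward one, since a fragment program writes to a fixed output location, which is consistent with the note that it is not fully supported.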

Brook Example
    kernel void prod(float a<>, float b<>, out float c<>) { c = a * b; }   // multiply components
    reduce void SUM(float a<>, reduce float b<>) { b = b + a; }            // compute final sum

    void main() {
        float a, b, c;     // input streams
        float ip;
        prod(a, b, c);
        SUM(c, ip);
    }

Sh vs Brook
Brook is more general: you don’t need to know graphics to run it.
– Very good for prototyping.
– You need to rely on the compiler being good.
– Many special GPU features cannot be expressed cleanly.
Sh allows better control over the mapping to hardware.
– Embeds in C++; no extra compilation phase is necessary.
– Lots of behind-the-scenes work to get virtualization: is there a performance hit?
– Still requires some understanding of graphics.

The Big Picture
The advent of Cg, and then Brook/Sh, signified a huge increase in the number of GPU apps. Having good programming tools is worth a lot!
The tools are still somewhat immature: debuggers and optimizers are almost non-existent, and there is only one GPU simulator (Sm).
I shouldn’t have to worry about the correct parameters to pass when setting up a texture for use as a buffer: we need better wrappers.
Low-level shaders are not going away soon; you need them to extract the best performance from a card.
Compiler efforts are lagging behind application development: more work is needed to allow for high-level language development without compromising performance. In order to do this, we need to study stream programming. Maybe draw ideas from the functional programming world?
Libraries are probably the way forward for now.

Questions?