GPU Programming Yanci Zhang Game Programming Practice.


Outline
- Parallel computing
- GPU overview
- OpenGL shading language overview
- Vertex / Geometry / Fragment shader
- Using GLSL in OpenGL
- Application: Per-pixel shading

Why Parallel Computing?
- CPU performance increased about 50% per year from 1986 to 2002
  - To get more performance, one could simply wait for the next generation of CPUs
- Since 2002, single-processor performance improvement has slowed to about 20% per year
- The road to rapidly increasing performance lies in parallelism
  - Put multiple processors on a single chip rather than developing ever-faster monolithic processors

What Is a GPU?
GPU: Graphics Processing Unit
- Evolved rapidly from a primitive drawing device into a major computing resource
- Extremely powerful and flexible processor
- Tremendous memory bandwidth and computational power
- High-level languages have emerged
- Capable of general-purpose computation beyond graphics applications

The GPU has evolved into an extremely powerful and flexible processor. The latest graphics architectures provide tremendous memory bandwidth and computational power, with fully programmable vertex and pixel processing units that support vector operations at up to 32-bit floating-point precision. High-level languages have emerged for graphics hardware, making this computational power accessible. Architecturally, GPUs are highly parallel streaming processors optimized for vector operations, with both MIMD (vertex) and SIMD (pixel) pipelines. Not surprisingly, these processors are capable of general-purpose computation beyond the graphics applications for which they were designed.

Motivation
In many respects, the GPU is more powerful than the CPU:
- Computational power: FLOPS (floating-point operations per second)
- Parallelism
- Bandwidth
- Performance growth rate

Floating Point Calculation
- FLOPS: a common benchmark measurement for rating the speed of an FPU
  - CPU: Intel Core i7 980 XE (six-core): GFLOPS
  - GPU: NVIDIA GeForce GTX 480: 2.02 TFLOPS
- Modern GPUs support high-precision 32-bit floating point throughout the pipeline
- Double precision was unsupported on earlier programmable GPUs; Fermi-class GPUs such as the GTX 480 support it, but at a fraction of single-precision throughput

FLOPS is a common benchmark measurement for rating the speed of microprocessors. Floating-point operations include any operations that involve fractional numbers.
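Peak FLOPS figures like these are theoretical numbers: the number of processing units, times the clock rate, times the floating-point operations each unit can complete per cycle. A minimal C++ sketch of that estimate follows; the function name and the inputs in the usage note are hypothetical and are not the specifications of any processor named above.

```cpp
#include <cassert>

// Theoretical peak throughput in GFLOPS:
// units (cores or SIMD lanes) x clock in GHz x floating-point ops per unit per cycle.
// A fused multiply-add (a * b + c) is conventionally counted as 2 operations.
double peak_gflops(int units, double clock_ghz, int flops_per_cycle)
{
    assert(units > 0 && clock_ghz > 0.0 && flops_per_cycle > 0);
    return units * clock_ghz * flops_per_cycle;
}
```

For example, a hypothetical chip with 256 lanes at 1.0 GHz, each issuing one multiply-add per cycle, gives `peak_gflops(256, 1.0, 2)` = 512 GFLOPS. Real applications rarely sustain this peak, largely because of the memory bandwidth limits discussed under Bandwidth.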

Parallelism
Parallelism: performing multiple operations simultaneously
- CPU
  - Does not adequately exploit parallelism
  - Dual-core, quad-core
- GPU
  - GeForce GTX 480: 480 CUDA cores

CPU programming models are generally serial ones that do not adequately expose the data parallelism in applications. CPUs do an admirable job of exploiting instruction-level parallelism (IP) and allow some data-parallel (DP) execution, but the degree of parallelism a CPU exploits is much lower than that of a GPU.

Bandwidth
- Peak performance of computer systems is often far in excess of actual application performance
- The bandwidth between key components ultimately dictates system performance
  - CPU: 64-bit DDR, dual-channel: 17 GB/s
  - GPU: GeForce GTX 480: 384-bit bus, 177.4 GB/s

Peak performance of computer systems is often far in excess of actual application performance because of the memory gap problem: the mismatch between memory and processor performance. In data-intensive applications, the processing elements (PEs) often spend most of their time waiting for data. GPUs have traditionally been optimized for high data throughput, with wide data buses (256 bits) and the latest memory technology (GDDR3).
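The bandwidth figures above follow from a simple formula: bytes moved per transfer (bus width in bits divided by 8) times the number of transfers per second. A small C++ sketch, with a hypothetical function name; the transfer rates in the usage note are assumptions chosen to reproduce the slide's numbers, not quoted specifications.

```cpp
#include <cassert>

// Theoretical peak memory bandwidth in GB/s:
// (bus width in bits / 8) bytes per transfer x transfer rate in GT/s
// (billions of transfers per second).
double peak_bandwidth_gb_per_s(int bus_bits, double gigatransfers_per_s)
{
    assert(bus_bits > 0 && bus_bits % 8 == 0);
    return (bus_bits / 8) * gigatransfers_per_s;
}
```

Assuming an effective data rate of 3.696 GT/s on a 384-bit bus, `peak_bandwidth_gb_per_s(384, 3.696)` is about 177.4 GB/s; two 64-bit channels (128 bits in total) at an assumed 1.066 GT/s give about 17.1 GB/s. The wide bus is what gives the GPU its roughly tenfold advantage.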

Getting Faster and Faster
- CPU
  - Annual growth ~1.5x -> decade growth ~60x
  - Moore's law
- GPU
  - Annual growth ~2.0x -> decade growth > 1000x
  - Faster than Moore's law
- The multi-billion-dollar video game market is a pressure cooker that drives innovation
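The decade figures are compound growth: an annual factor raised to the tenth power. A tiny C++ check of the arithmetic (the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// Total speedup after `years` years at a constant annual growth factor,
// e.g. 1.5 means "50% faster every year".
double compound_growth(double annual_factor, int years)
{
    assert(annual_factor > 0.0 && years >= 0);
    return std::pow(annual_factor, years);
}
```

`compound_growth(1.5, 10)` is about 57.7, which the slide rounds to ~60x, and `compound_growth(2.0, 10)` is exactly 1024, hence "> 1000x".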

Keys to High-Performance Computing
- Efficient computation
  - Maximize the hardware devoted to computation
  - Allow parallelism
    - Task parallelism
    - Data parallelism
    - Instruction parallelism
  - Ensure each computation unit operates at maximum efficiency

We can envision several ways to exploit parallelism and permit simultaneous execution:
- Task parallelism (TP): run different tasks (for example, different pipeline stages) at the same time
- Data parallelism (DP): within a stage, if we are running a task on several data elements, we may be able to evaluate them at the same time
- Instruction parallelism (IP): within the complex evaluation of a single data element, we may be able to evaluate several simple operations at the same time
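Data parallelism can be illustrated even on a CPU. The C++ sketch below applies the same operation to every element of an array, splitting the array into contiguous chunks handled by separate threads. The function name and the choice of squaring are illustrative only; task parallelism would instead run different functions concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Data parallelism: the same operation (squaring) is applied to every element.
// Each result depends only on its own input element, so the chunks can be
// processed by separate threads with no synchronization beyond join().
std::vector<float> square_all_parallel(const std::vector<float>& in, unsigned num_threads)
{
    assert(num_threads > 0);
    std::vector<float> out(in.size());
    std::vector<std::thread> workers;
    const std::size_t chunk = (in.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin >= end) break;   // fewer elements than threads
        workers.emplace_back([&in, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = in[i] * in[i];
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

The result does not depend on the number of threads, which is exactly the property that lets a GPU spread such work over hundreds of cores.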

Keys to High-Performance Computing
- Efficient communication
  - Simply providing large amounts of computation is not sufficient
  - PEs often spend most of their time waiting for data
  - Minimize off-chip communication

As both clock speeds and chip sizes increase, the time it takes for a signal to travel across an entire chip, measured in clock cycles, also increases. On today's fastest processors, sending a signal from one side of a chip to the other typically requires multiple clock cycles, and this time increases with each new process generation.

Stream Programming Model
- A programming model that allows high efficiency in both computation and communication
- Two basic components
  - Streams
    - All data is represented as a stream
    - A stream is an ordered set of data of the same data type
  - Kernels: operations on streams
- Applications are constructed by chaining multiple kernels together

Part of the reason that CPUs are poorly suited to many of these high-performance applications is their serial programming model, which does not expose the parallelism and communication patterns in the application. In the stream programming model, all data is represented as a stream, which we define as an ordered set of data of the same data type. That data type can be simple (a stream of integers or floating-point numbers) or complex (a stream of points, triangles, or transformation matrices). While a stream can be any length, operations on streams are most efficient when streams are long.

Kernel
- Operates on entire streams of elements and produces new streams
- Within a kernel, computations on one stream element never depend on computations on another element
- Input elements and intermediate computed data are stored locally
- Fits perfectly onto data-parallel hardware

A kernel operates on entire streams, taking one or more streams as inputs and producing one or more streams as outputs. The defining characteristic of a kernel is that it operates on entire streams of elements as opposed to individual elements. The most typical use of a kernel is to evaluate a function on each element of an input stream (a "map" operation); for example, a transformation kernel may project each element of a stream of points into a different coordinate system. Kernel outputs are functions only of their kernel inputs, and within a kernel, computations on one stream element never depend on computations on another element. These restrictions have two major advantages. First, the data required for kernel execution is completely known when the kernel is written (or compiled), so kernels can be highly efficient when their input elements and intermediate computed data are stored locally or are carefully controlled global references. Second, requiring independence of computation on separate stream elements within a single kernel allows what appears to be a serial kernel calculation to be mapped onto data-parallel hardware.
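To make the stream and kernel abstractions concrete, here is a minimal C++ sketch in which a stream is a `std::vector` and a kernel is a map over it. The names `map_kernel`, `translate`, and `flatten` are hypothetical, and this sequential loop only models the semantics; a GPU would evaluate the independent iterations in parallel.

```cpp
#include <cassert>
#include <vector>

struct Point { float x, y, z; };

// A kernel in the stream sense: it consumes a whole input stream and produces a
// whole output stream, and the result for element i depends only on element i.
template <typename In, typename Out, typename F>
std::vector<Out> map_kernel(const std::vector<In>& stream, F f)
{
    std::vector<Out> result;
    result.reserve(stream.size());
    for (const In& e : stream)   // iterations are independent, so they could run in parallel
        result.push_back(f(e));
    return result;
}

// Two example kernels: shift points along x, then project them onto the plane z = 0.
std::vector<Point> translate(const std::vector<Point>& s, float dx)
{
    return map_kernel<Point, Point>(s, [dx](const Point& p) { return Point{p.x + dx, p.y, p.z}; });
}

std::vector<Point> flatten(const std::vector<Point>& s)
{
    return map_kernel<Point, Point>(s, [](const Point& p) { return Point{p.x, p.y, 0.0f}; });
}
```

An application is then a chain of kernels, for example `flatten(translate(points, 1.0f))`.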

Efficient Computation (1)
The use of transistors can be divided into three categories:
- Control: directs the computation
- Datapath: performs the computation
- Storage: stores data

Efficient Computation (2)
- Only simple control flow in kernel execution
  - Devote most transistors to datapath hardware rather than control hardware
- Streams expose parallelism in the application
- Kernels allow a hardware implementation to specialize hardware

Because kernels operate on entire streams, stream elements can be processed in parallel using data-parallel hardware. Long streams with many elements allow this data-level parallelism to be highly efficient. Within the processing of a single element, we can exploit instruction-level parallelism. And because applications are constructed from multiple kernels, multiple kernels can be deeply pipelined and processed in parallel, using task-level parallelism. Dividing the application of interest into kernels allows a hardware implementation to specialize hardware for one or more kernels' execution. Special-purpose hardware, with its superior efficiency over programmable hardware, can thus be used appropriately in this programming model. Finally, allowing only simple control flow in kernel execution (such as the data-parallel evaluation of a function on each input element) permits hardware implementations to devote most of their transistors to datapath hardware rather than control hardware.

Efficient Communication
- Off-chip communication is efficient when entire streams are transferred
- Intermediate results between kernels are kept on-chip to minimize off-chip communication
- High degree of latency tolerance

First, off-chip (global) communication is more efficient when entire streams, rather than individual elements, are transferred to or from memory, because the fixed cost of initiating a transfer can be amortized over an entire stream rather than a single element. Next, structuring applications as chains of kernels allows the intermediate results between kernels to be kept on-chip and not transferred to and from memory. Efficient kernels attempt to keep their inputs and their intermediate computed data local within kernel execution units; therefore, data references within kernel execution do not go off-chip or across a chip to a data cache, as would typically happen in a CPU. Finally, deep pipelining of execution allows hardware implementations to continue to do useful work while waiting for data to return from global memories. This high degree of latency tolerance allows hardware implementations to optimize for throughput rather than latency.
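A CPU-side analogy for keeping intermediate results between kernels local is kernel fusion. In the C++ sketch below (function names hypothetical), the unfused version writes a whole intermediate stream to memory and reads it back, while the fused version keeps each intermediate value in a local variable. On a GPU the saving is off-chip memory traffic; on a CPU it is traffic through the cache hierarchy, so this is an illustration of the idea rather than a model of GPU hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two chained kernels written separately: the intermediate stream `scaled`
// is stored in full and then read back by the second kernel.
std::vector<float> scale_then_offset(const std::vector<float>& in, float s, float o)
{
    std::vector<float> scaled(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) scaled[i] = in[i] * s;
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = scaled[i] + o;
    return out;
}

// The same chain fused: each intermediate value lives only in a local variable
// (typically a register), so no intermediate stream is ever stored.
std::vector<float> scale_then_offset_fused(const std::vector<float>& in, float s, float o)
{
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float scaled = in[i] * s;   // stays local, like data kept on-chip
        out[i] = scaled + o;
    }
    return out;
}
```

Both versions produce identical results; only the amount of memory traffic differs.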

Instruction-Stream-Based (CPU)
- Each instruction prescribes both the operation to be executed and the required data
- Only a limited prefetch of the input data can occur
- Jumps are expected in the instruction stream
- The L2 cache consumes a large share of a CPU's transistors

One reason parallel hardware is less prevalent in CPUs is the designers' decision to devote more transistors to control hardware. CPU programs have more complex control requirements than GPU programs, so a large fraction of a CPU's transistors and wires implements complex control functionality such as branch prediction and out-of-order execution.

Data-Stream-Based (GPU)
- Separates two tasks:
  - Configuring PEs
  - Controlling the data flow to and from PEs
- Data elements can be assembled from memory before processing
- Uses only small caches and devotes the majority of its transistors to computation

Mapping Pipeline to Stream Model
- The stream formulation of the graphics pipeline
  - All data as streams
  - All computation as kernels
- Both user-programmable and nonprogrammable stages can be expressed as kernels

The graphics pipeline is a good match for the stream model for several reasons. It is traditionally structured as stages of computation connected by data flow between the stages, which is analogous to the stream and kernel abstractions of the stream programming model. Data flow between stages in the graphics pipeline is highly localized, with data produced by a stage immediately consumed by the next stage; in the stream programming model, streams passed between kernels exhibit similar behavior. And the computation involved in each stage of the pipeline is typically uniform across different primitives, allowing these stages to be easily mapped to kernels.

Fixed vs. Programmable
- Fixed
  - Very fast
  - Cannot modify the pipeline; can only turn some functions on or off
  - Hard to implement advanced techniques on the GPU
- Programmable
  - Allows programmers to write shaders that change the pipeline

Implementing the pipeline in hardware made processing polygons much faster, but the developer could not modify the pipeline.

Basic Programmable Graphics Hardware
- Three programmable kernels in the pipeline
  - Vertex shader
  - Geometry shader
  - Pixel shader
- Shaders are loaded through the graphics API
- The fixed-function pipeline stages are replaced by shaders

OpenGL 4.3 Pipelines
[Figure: the OpenGL 4.3 graphics rendering pipeline (including the tessellation evaluation shader stage) and the GPGPU programming pipeline]

Vertex Processor
- MIMD: Multiple Instruction stream, Multiple Data stream
- A number of processors that function asynchronously and independently

Vertex Shader: Basic Function
- Operates on a single input vertex and produces a single output vertex
- Replaces the fixed-function transformation & lighting unit
  - Now you have to do everything yourself: transformation, lighting, texture coordinate generation
- At a minimum, a vertex shader must output the vertex position in homogeneous clip space

The vertex shader (VS) stage processes vertices from the input assembler, performing per-vertex operations such as transformations, skinning, morphing, and per-vertex lighting. Vertex shaders always operate on a single input vertex and produce a single output vertex. Instead of setting parameters to control the pipeline, you write a vertex shader program that executes on the graphics hardware. A vertex shader adds special effects to objects in a 3D environment by performing mathematical operations on the objects' vertex data. Each vertex can be defined by many different variables. For instance, a vertex is always defined by its location in a 3D environment using x-, y-, and z-coordinates. Vertices may also be defined by colors, textures, and lighting characteristics. Vertex shaders don't actually change the type of data; they simply change its values, so that a vertex emerges with a different color, different textures, or a different position in space. At a minimum, a vertex shader must output the vertex position in homogeneous clip space. Optionally, it can also output texture coordinates, vertex color, vertex lighting, fog factors, and so on.
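The one mandatory job of a vertex shader, writing a homogeneous clip-space position, reduces to a 4x4 matrix times a 4-component vector. The C++ sketch below reproduces that math on the CPU, assuming OpenGL's column-major matrix layout; the names `Vec4`, `Mat4`, `to_clip_space`, and `perspective_divide` are illustrative, not part of any API. The perspective divide is shown only to explain what happens to the shader's output afterwards: it is done by fixed-function hardware, not by the vertex shader.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
// Column-major 4x4 matrix, as OpenGL stores it: m[col][row].
using Mat4 = std::array<Vec4, 4>;

// What a minimal vertex shader computes for position: clip = MVP * objectPosition.
Vec4 to_clip_space(const Mat4& mvp, const Vec4& v)
{
    Vec4 r{0.0f, 0.0f, 0.0f, 0.0f};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            r[row] += mvp[col][row] * v[col];
    return r;
}

// After the vertex shader, the hardware divides by w to obtain
// normalized device coordinates (NDC).
Vec4 perspective_divide(const Vec4& clip)
{
    assert(clip[3] != 0.0f);
    return {clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3], 1.0f};
}
```

With a pure translation by (1, 2, 3) stored in the last column, the object-space vertex (1, 1, 1, 1) maps to the clip-space position (2, 3, 4, 1).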

Vertex Shader: Advanced Function
What else can we do?
- Displacement mapping
- Object deformation
- Vertex blending

Vertex Shader: Limitations
We cannot:
- Add or delete any vertices
- Change the primitive type
- Change the order of the vertices that form the primitives
The vertex shader also has no knowledge of the primitive type or of neighboring vertices.

Fragment Processor
- SIMD: Single Instruction, Multiple Data
- Achieves data-level parallelism
- "Get this pixel, get the next one" -> "get lots of pixels"

Fragment Shader: Basic Function
- Invoked once for each fragment covered by the primitive
- Computes the final pixel color and depth
- Can output up to 8 four-component values, each component 32 bits, for the current pixel location (multiple render targets)

Fragment Shader: Advanced Function
- Enables rich shading techniques
  - Per-pixel lighting, bump mapping, normal mapping
  - Fluid simulation
  - …

Fragment Shader: Limitations
- Dynamic branching is less efficient than in the vertex processor
- Cannot change the screen coordinates of a fragment
- No arbitrary memory writes

Geometry Shader
- New for 2007
- Executed after vertex shaders
- Input: a whole primitive, possibly with adjacency information
- Invoked once for every primitive
- Output: multiple vertices forming a single selected topology (triangle strip, line strip, point list)
- Output may be fed to the rasterizer and/or to a vertex buffer in memory

The geometry shader (GS) stage runs application-specified shader code with vertices as input and the ability to generate vertices on output. Unlike vertex shaders, which operate on a single vertex, the geometry shader's inputs are the vertices of a full primitive (two vertices for a line, three for a triangle, or one for a point). Geometry shaders can also take the vertex data of edge-adjacent primitives as input (two additional vertices for a line, three additional vertices for a triangle). The geometry shader stage can output multiple vertices forming a single selected topology (the available output topologies are triangle strip, line strip, and point list). The number of primitives emitted can vary freely within any invocation of the geometry shader, though the maximum number of vertices that could be emitted must be declared statically. When a geometry shader is active, it is invoked once for every primitive passed down or generated earlier in the pipeline. Each invocation sees as input the data for the invoking primitive, whether that is a single point, a single line, or a single triangle. A triangle strip from earlier in the pipeline results in one invocation for each individual triangle in the strip (as if the strip were expanded into a triangle list). All the input data for each vertex in the primitive is available (i.e., three vertices for a triangle), plus adjacent vertex data if applicable. A geometry shader outputs data one vertex at a time by appending vertices to an output stream object. The topology of the output stream is determined by a fixed declaration.

Geometry Shader: Applications
- Point sprite expansion
- Single-pass render-to-cubemap
- Dynamic particle systems
- Fur/fin generation
- Shadow volume generation
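Point sprite expansion shows the geometry shader's ability to emit more vertices than it receives: one point comes in, a four-vertex triangle strip forming a quad goes out. Since a GLSL geometry shader needs a full OpenGL context to run, the C++ sketch below reproduces only the per-primitive logic; in an actual geometry shader each vertex would be written with `EmitVertex()` and the strip closed with `EndPrimitive()`. The `Vec2` type and function name are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Vec2 { float x, y; };

// One input primitive (a point) -> one output triangle strip (a screen-aligned quad).
// Strip order: triangles (v0, v1, v2) and (v1, v2, v3) together cover the quad.
std::vector<Vec2> expand_point_to_quad(Vec2 center, float half_size)
{
    assert(half_size > 0.0f);
    std::vector<Vec2> strip;
    strip.push_back({center.x - half_size, center.y - half_size});  // bottom-left
    strip.push_back({center.x + half_size, center.y - half_size});  // bottom-right
    strip.push_back({center.x - half_size, center.y + half_size});  // top-left
    strip.push_back({center.x + half_size, center.y + half_size});  // top-right
    return strip;
}
```

A real point-sprite shader would typically also emit texture coordinates for each corner so the quad can be textured.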

Programmable GPUs: Applications
- Graphics applications
  - Per-pixel lighting
  - Ray tracing
  - Deformation
- GPGPU
  - Computer vision
  - Physically-based simulation
  - Image processing
  - Database queries

GPGPU: General-Purpose Computation on GPUs
- Capable of performing more than the specific graphics computations
- Goal: make the inexpensive power of the GPU available to developers as a sort of computational coprocessor
- Example applications range from in-game physics simulation to conventional computational science

With the increasing programmability of commodity graphics processing units (GPUs), these chips are capable of performing more than the specific graphics computations for which they were designed. They are now capable coprocessors, and their high speed makes them useful for a variety of applications.

Shading Language
- Production rendering
  - Geared towards maximum image quality
  - Example: RenderMan
- Real-time rendering
  - GLSL: OpenGL Shading Language
  - HLSL: DirectX High-Level Shading Language
  - Cg: C for Graphics, NVIDIA

OpenGL Shading Language
- High-level shading language based on C
- Not a hardware-specific language
- Cross-platform compatibility on multiple operating systems
- Each hardware vendor includes a GLSL compiler in its driver

Before Using GLSL
Check whether your GPU supports GLSL:
- GLSL is part of OpenGL 2.0
- If OpenGL 2.0 is not available, use the OpenGL extensions instead

Extensions Required
- GL_ARB_shader_objects
  - Adds the API calls necessary to manage shader objects and program objects
- GL_ARB_fragment_shader
  - Adds functionality to define fragment shader objects
- GL_ARB_vertex_shader
  - Adds functionality to define vertex shader objects

GLEW 1/2
GLEW: The OpenGL Extension Wrangler Library
Initialize GLEW:
```c
#include <GL/glew.h>   /* include GLEW before the other OpenGL headers */
#include <GL/glut.h>
...
glutInit(&argc, argv);
glutCreateWindow("GLEW Test");   /* glewInit() requires a current OpenGL context */
GLenum err = glewInit();
if (GLEW_OK != err)
{
    /* Problem: glewInit failed, something is seriously wrong. */
    fprintf(stderr, "Error: %s\n", glewGetErrorString(err));
    ...
}
```

GLEW 2/2
Check extensions and core OpenGL functionality:
```c
if (GLEW_ARB_vertex_shader)
{
    /* It is safe to use the GL_ARB_vertex_shader extension here. */
}
if (GLEW_VERSION_2_0)
{
    /* Yay! OpenGL 2.0 is supported! */
}
```

Data Types
- Scalar: bool, int, float
- Vector: 2D, 3D, and 4D vectors: vec{2,3,4}, ivec{2,3,4}, bvec{2,3,4}
- Matrix
  - Square matrices: mat2, mat3, mat4
  - Non-square matrices: mat2x3, mat2x4, mat3x2, mat3x4, mat4x2, mat4x3
- Texture
  - sampler1D, sampler2D, sampler3D
  - samplerCube
  - sampler1DShadow, sampler2DShadow

Variables 1/3
- Pretty much the same as in C
- Flexible when initializing variables from other variables
```glsl
float a, b;                    // two float variables (comments are like in C)
int c = 2;                     // initialize a variable when declaring it
vec3 g = vec3(1.0, 2.0, 3.0);  // declare and initialize a vector

vec2 p = vec2(1.0, 2.0);
vec2 q = vec2(3.0, 4.0);
vec4 r = vec4(p, q);           // r = vec4(1.0, 2.0, 3.0, 4.0)
```

Variables 2/3
Flexible when accessing a vector:
- {x, y, z, w}: accessing vectors that represent points or normals
- {r, g, b, a}: accessing vectors that represent colors
- {s, t, p, q}: accessing vectors that represent texture coordinates

Variables 3/3
Accessing components beyond those declared for the vector type is an error:
```glsl
vec4 a = vec4(1.0, 2.0, 3.0, 4.0);
float posX = a.x;    // posX = 1.0
float posY = a[1];   // posY = 2.0
float depth = a.w;   // depth = 4.0
vec3 b = a.xxy;      // b = vec3(1.0, 1.0, 2.0)
vec3 c = a.bra;      // c = vec3(3.0, 1.0, 4.0)

vec2 t = vec2(1.0, 2.0);
float tt = t.z;      // error: a vec2 has no third component
```

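Vector and Matrix Operations
Operations between vectors and scalars, or between vectors, are component-wise; vector-matrix and matrix-matrix products follow the usual linear algebra rules:
```glsl
vec3 u, v, w;
float f;
mat3 a1, a2, a3;

u = v + f;     // u.x = v.x + f;   u.y = v.y + f;   u.z = v.z + f;
u = v + w;     // u.x = v.x + w.x; u.y = v.y + w.y; u.z = v.z + w.z;
u = v * a1;    // u.x = dot(v, a1[0]); u.y = dot(v, a1[1]); u.z = dot(v, a1[2]);
               // (a1[i] is the i-th column of a1)
a1 = a2 * a3;  // linear-algebra matrix product, not component-wise
```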
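Because GLSL matrices are column-major, the operand order matters: `v * m` treats `v` as a row vector, while `m * v` treats it as a column vector. The C++ sketch below mirrors both rules; the type aliases and function names are illustrative, not part of GLSL or any library.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;
// Column-major 3x3 matrix as in GLSL: m[i] is the i-th column.
using Mat3 = std::array<Vec3, 3>;

float dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

// GLSL's  u = v * m  (row vector on the left): u[i] = dot(v, m[i]).
Vec3 vec_times_mat(const Vec3& v, const Mat3& m)
{
    return {dot(v, m[0]), dot(v, m[1]), dot(v, m[2])};
}

// GLSL's  u = m * v  (column vector on the right): u = v[0]*m[0] + v[1]*m[1] + v[2]*m[2].
Vec3 mat_times_vec(const Mat3& m, const Vec3& v)
{
    Vec3 u{0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 3; ++i)
        for (int row = 0; row < 3; ++row)
            u[row] += v[i] * m[i][row];
    return u;
}
```

With columns (1, 2, 3), (4, 5, 6), (7, 8, 9) and v = (1, 0, 0), `v * m` picks out the first row, (1, 4, 7), while `m * v` picks out the first column, (1, 2, 3). In other words, `v * m` equals `transpose(m) * v`.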

Control Flow Statements
- Selection: if-else
- Iteration: for, while, and do-while
- Jumps: discard, return, break, and continue
  - discard is only allowed within fragment shaders
  - discard causes the fragment to be discarded, and no updates to any buffers will occur
```glsl
if (depth > 0.5)
    discard;
```

Function Definition
The function main() is used as the entry point to a shader executable.
```glsl
returnType functionName(type0 arg0, type1 arg1, ..., typen argn)
{
    // do some computation
    return returnValue;
}
```

Important Built-in Variables 1/2
- gl_Position (vec4)
  - Output of the vertex shader
  - Homogeneous vertex position
  - The vertex shader must write a value into this variable
- gl_FragCoord (vec4)
  - Holds the window-relative coordinates x, y, z, and 1/w for the fragment
  - Read-only variable in the fragment shader

Important Built-in Variables 2/2
- gl_FragColor (vec4)
  - Output of the fragment shader
  - Writing to gl_FragColor specifies the fragment color
- gl_FragDepth (float)
  - Default value: gl_FragCoord.z
  - If a shader writes gl_FragDepth on any path, it is responsible for writing it on every path; otherwise the depth is undefined on the paths that skip it

Built-in Functions
- Angle and trigonometry functions: sin, cos, asin, acos …
- Exponential functions: pow, exp, sqrt …
- Common functions: abs, clamp, smoothstep …
- Geometric functions: length, dot, cross …

Built-in Functions
- Matrix functions: outerProduct, transpose …
- Vector relational functions: lessThan, equal …
- Texture lookup functions: texture2D, texture2DLod …
- Fragment processing functions
- Noise functions

Important Built-in Functions
- ftransform()
  - For vertex shaders only
  - Produces exactly the same result as OpenGL's fixed-function vertex transform
- reflect(vec3 I, vec3 N)
  - Computes the reflection vector from the incident vector I and the normal vector N (N should be normalized)
```glsl
gl_Position = ftransform();
```
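`reflect` is defined by the formula R = I - 2 * dot(N, I) * N, which only yields a true mirror reflection when N has unit length. A C++ sketch of the same formula (the `Vec3` alias is illustrative):

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

// Same formula GLSL's reflect() uses: R = I - 2 * dot(N, I) * N.
// N is expected to be normalized.
Vec3 reflect(const Vec3& I, const Vec3& N)
{
    const float len2 = N[0] * N[0] + N[1] * N[1] + N[2] * N[2];
    assert(std::fabs(len2 - 1.0f) < 1e-4f);   // expects a unit normal
    const float d = N[0] * I[0] + N[1] * I[1] + N[2] * I[2];
    return {I[0] - 2 * d * N[0], I[1] - 2 * d * N[1], I[2] - 2 * d * N[2]};
}
```

For example, a ray travelling down and to the right, I = (1, -1, 0), hitting a floor with normal N = (0, 1, 0), reflects to (1, 1, 0): the component along the normal flips sign and the tangential component is unchanged.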

First Example
Vertex shader:
```glsl
void main()
{
    gl_Position = ftransform();
}
```
Fragment shader:
```glsl
void main()
{
    gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);   // every fragment is white
}
```

Have Fun with the Fragment Shader
```glsl
void main()
{
    vec4 t = vec4(1.0, 0.6, 0.3, 0.0);
    gl_FragColor = t.xxxx;   // flexible vector access: (1.0, 1.0, 1.0, 1.0)
}
```
```glsl
void main()
{
    gl_FragColor = vec4(gl_FragCoord.zzz, 1.0);   // let's view the depth map
}
```
```glsl
void main()
{
    if (gl_FragCoord.x > 320.0)   // float literal: GLSL 1.10 has no implicit int-to-float conversion
        discard;                  // try discard
    gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
}
```

More Built-in Variables
- Vertex shader built-in attributes: gl_Vertex, gl_Normal, gl_Color, gl_MultiTexCoord0 to gl_MultiTexCoord7 …
- Vertex shader built-in output variables: gl_FrontColor, gl_TexCoord[] …
- Fragment shader built-in input variables: gl_Color, gl_TexCoord[] …
- Built-in uniform state: gl_ModelViewMatrix, gl_ProjectionMatrix …

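Example: Using Built-in Matrices
Three mathematically equivalent vertex shaders:
```glsl
void main()
{
    gl_Position = ftransform();
}
```
```glsl
void main()
{
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
```
```glsl
void main()
{
    gl_Position = gl_ModelViewMatrix * gl_Vertex;
    gl_Position = gl_ProjectionMatrix * gl_Position;
}
```
Only ftransform() is guaranteed to match the fixed-function result exactly; the other two may differ slightly because of rounding.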

Example: Using Colors
Vertex shader:
```glsl
void main()
{
    gl_Position = ftransform();
    gl_FrontColor = gl_Color;   // pass the per-vertex color on to rasterization
}
```
Fragment shader:
```glsl
void main()
{
    gl_FragColor = gl_Color;    // color interpolated across the primitive
}
```

Example: Using Texture Coordinates
Vertex shader:
```glsl
void main()
{
    gl_Position = ftransform();
    gl_TexCoord[0] = vec4(gl_MultiTexCoord0.xy, 1.0, 0.0);
}
```
Fragment shader:
```glsl
void main()
{
    gl_FragColor = gl_TexCoord[0];   // visualize the texture coordinates as a color
}
```

gl_NormalMatrix
- Important for per-vertex and per-pixel lighting
- The transpose of the inverse of the upper-left 3x3 of gl_ModelViewMatrix
- Converts normal vectors from object space to eye space
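Why the inverse-transpose instead of the modelview matrix itself? Under non-uniform scaling, transforming a normal with the modelview matrix tilts it off the surface. The C++ sketch below builds the inverse-transpose of a 3x3 matrix (via the identity inverse(M)^T = cofactor(M) / det(M)) and checks perpendicularity. It stores matrices row-major for readability, unlike GLSL; all names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<Vec3, 3>;   // row-major here: m[row][col]

// Inverse-transpose of a 3x3 matrix, i.e. what gl_NormalMatrix holds for the
// upper-left 3x3 of the modelview matrix. Cyclic indices give signed cofactors.
Mat3 normal_matrix(const Mat3& m)
{
    Mat3 c;   // cofactor matrix
    for (int r = 0; r < 3; ++r)
        for (int k = 0; k < 3; ++k) {
            const int r1 = (r + 1) % 3, r2 = (r + 2) % 3;
            const int k1 = (k + 1) % 3, k2 = (k + 2) % 3;
            c[r][k] = m[r1][k1] * m[r2][k2] - m[r1][k2] * m[r2][k1];
        }
    const float det = m[0][0] * c[0][0] + m[0][1] * c[0][1] + m[0][2] * c[0][2];
    assert(std::fabs(det) > 1e-8f);   // the matrix must be invertible
    for (auto& row : c)
        for (float& x : row) x /= det;
    return c;
}

Vec3 mul(const Mat3& m, const Vec3& v)
{
    return {m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
            m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
            m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2]};
}
```

Take M = diag(2, 1, 1), a surface tangent t = (1, -1, 0), and its normal n = (1, 1, 0). After transformation the tangent is (2, -1, 0). Transforming n with M gives (2, 1, 0), whose dot product with the new tangent is 3, so it is no longer perpendicular. Transforming n with `normal_matrix(M)` = diag(0.5, 1, 1) gives (0.5, 1, 0), whose dot product with the tangent is 0.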

View Normal Vectors
Vertex shader (object-space normals):
```glsl
void main()
{
    gl_Position = ftransform();
    gl_FrontColor = vec4(gl_Normal, 1.0);   // negative components display as black
}
```
Vertex shader (eye-space normals):
```glsl
void main()
{
    gl_Position = ftransform();
    gl_FrontColor = vec4(gl_NormalMatrix * gl_Normal, 1.0);
}
```
Fragment shader:
```glsl
void main()
{
    gl_FragColor = gl_Color;
}
```

Communications
- Communication between OpenGL and a shader
  - One-way communication
  - Use the uniform qualifier when declaring variables
- Communication between the vertex and fragment shaders
  - Use the varying qualifier when declaring variables

Uniform
- Used to declare global variables
- Values are the same across the entire primitive being processed
- Read-only
- Initialized externally, either at link time or through the API
```glsl
uniform vec4 lightPosition;
uniform vec3 color = vec3(0.7, 0.7, 0.2);   // value assigned at link time (GLSL 1.20 or later)
```

OpenGL Setup

Creating Shader Object
```cpp
_ShaderID = glCreateShader(GL_VERTEX_SHADER);
if (_ShaderID == 0)   // glCreateShader() returns 0 if it fails to create a shader object
{
    printf("Failed to create shader object!\n");
    exit(-1);
}

// Load the shader source file into the string _pShaderSource (length fileLen, a GLint)
glShaderSource(_ShaderID, 1, (const GLchar **)&_pShaderSource, &fileLen);
CheckGLError(__FILE__, __LINE__);   // error-checking helper, not shown

glCompileShader(_ShaderID);
glGetShaderiv(_ShaderID, GL_COMPILE_STATUS, &ShaderStatus);
if (ShaderStatus == GL_FALSE)
{
    GLchar log[1024];
    glGetShaderInfoLog(_ShaderID, sizeof(log), NULL, log);   // compiler diagnostics
    printf("Failed to compile the shader %s:\n%s\n", vFileName, log);
}
```

Creating Program Object
```cpp
_ProgramID = glCreateProgram();
if (_ProgramID == 0)
{
    printf("Failed to create shader program object!\n");
    exit(-1);
}

glAttachShader(_ProgramID, VertexShaderID);   // attach vertex shader
CheckGLError(__FILE__, __LINE__);
glAttachShader(_ProgramID, FragShaderID);     // attach fragment shader

glLinkProgram(_ProgramID);
glGetProgramiv(_ProgramID, GL_LINK_STATUS, &ProgramStatus);
if (ProgramStatus == GL_FALSE)
{
    GLchar log[1024];
    glGetProgramInfoLog(_ProgramID, sizeof(log), NULL, log);   // linker diagnostics
    printf("Failed to link the program:\n%s\n", log);
}

glUseProgram(_ProgramID);   // make the program part of the current rendering state
```

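Initialize Uniform Variables
Suppose a uniform variable is declared in a shader:
```glsl
uniform vec3 u_Color;
```
Initialize the uniform variable from OpenGL:
```cpp
GLint loc = glGetUniformLocation(_ProgramID, "u_Color");
if (loc == -1)
{
    cout << "Error: can't find uniform variable!\n";
}
glUseProgram(_ProgramID);       // glUniform*() affects the program currently in use
glUniform3f(loc, v0, v1, v2);
```
glGetUniformLocation also returns -1 for a uniform that is declared but never used, because the compiler may optimize it away.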

Application: Per-Pixel Shading
- Three types of light in OpenGL
  - Ambient light
  - Diffuse light
  - Specular light
- The fixed pipeline performs vertex-based shading
  - Fast but poor quality
- Per-pixel shading is possible by exploiting the programmability of modern GPUs
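The quality gap comes from what is interpolated across a triangle. Per-vertex shading evaluates lighting at the vertices and interpolates the resulting intensities; per-pixel shading interpolates the normal and evaluates lighting for each fragment. The C++ sketch below compares the two for the diffuse term only, along a single edge; helper names are illustrative (`mix` borrows GLSL's name for linear interpolation), and specular light is left for the assignment.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

float dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

Vec3 normalize(const Vec3& v)
{
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return {v[0] / len, v[1] / len, v[2] / len};
}

// Linear interpolation between a and b, like GLSL's mix().
Vec3 mix(const Vec3& a, const Vec3& b, float t)
{
    return {a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]), a[2] + t * (b[2] - a[2])};
}

// Lambert diffuse term: max(dot(N, L), 0) with N and L normalized.
float diffuse(const Vec3& n, const Vec3& l)
{
    return std::fmax(dot(normalize(n), normalize(l)), 0.0f);
}

// Per-vertex (Gouraud): light each vertex, then interpolate the intensity.
float per_vertex(const Vec3& n0, const Vec3& n1, const Vec3& l, float t)
{
    const float d0 = diffuse(n0, l), d1 = diffuse(n1, l);
    return d0 + t * (d1 - d0);
}

// Per-pixel: interpolate the normal, renormalize it, then light the fragment.
float per_pixel(const Vec3& n0, const Vec3& n1, const Vec3& l, float t)
{
    return diffuse(mix(n0, n1, t), l);
}
```

With the light along +z and the two vertex normals tilted 60 degrees to either side of it, both vertices receive a diffuse intensity of 0.5. At the midpoint, per-vertex shading also gives 0.5, while per-pixel shading recovers a normal pointing straight at the light and gives 1.0: the bright spot in the middle of the edge is lost by per-vertex shading.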

Assignment
- Add specular light
