1
GPUGI: Global Illumination Effects on the GPU
László Szirmay-Kalos, TU Budapest
László Szécsi, TU Budapest
Mateu Sbert, U of Girona
2
Tutorial outline
Basics: LI/GI, GPU (20 mins)
Simple improvements to the LI (20 mins): shadows and IBL
Specular effects with rasterization (30 mins): reflection, refraction, caustics
Ray tracing (20 mins)
Stochastic radiosity (10 mins)
Final gathering of indirect illumination (10 mins)
Precomputation aided global illumination (20 mins)
Illumination networks (10 mins)
Fake GI: obscurances (20 mins)
The content of this tutorial is shown here. First the basic concepts and terms are discussed, including the meaning of local and global illumination and the architecture of Shader Model 3.0 compatible GPUs. Since these GPUs are built for local illumination graphics, we address the fundamental problem of how to use them for a different task. Then, to warm up, two simple improvements of the pure local illumination solution are examined that still stay within the local illumination context. These are shadow generation and image based lighting. The third chapter of this talk deals with specular effects, such as mirror reflections, refractions, and caustics generation, transforming the problems to make them appropriate for the rasterization solution of current graphics hardware. Then we address these specular effects from the point of view of classical ray tracing instead of rasterization, and discuss how the ray tracing algorithm can be implemented on the GPU.
After the break, further chapters of global illumination are examined from the point of view of GPU implementation, including diffuse radiosity and final gathering. Precomputation aided methods assume that certain properties of the scene do not change, and precompute the related data before starting the real-time simulation. After discussing this for surface models, it is also shown that the idea works for participating media as well, where the classical illumination network concept is ported to the GPU. Finally, it is shown that in many cases it is worth simplifying the problem to be solved in order to speed it up. This is the area of fake global illumination approaches, which do not solve the rendering equation, but provide results that are similar to those of costly global illumination simulations.
3
Basics: LI, GI, GPU László Szirmay-Kalos
Let us start with the basic terms of local illumination and global illumination, and let me provide a quick introduction to current GPU architecture.
4
Global Illumination rendering
Visibility determination
Illumination computation
New visibility and illumination tasks
Rendering consists of determining the points that are visible through the pixels of the virtual camera, and computing their radiance in the direction of the eye. In global illumination rendering, when the radiance of a point is needed, the indirect illumination arriving from all scene points that are visible from this point is taken into account. This means that the radiance or color computation of a point generates new visibility problems to be solved, and new color computation tasks to determine the intensity of the indirect illumination.
5
Pure Local Illumination
1. Color computation uses just local information
2. Visibility only from the camera
To simplify this, pure local illumination approaches ignore indirect illumination and take into account just the illumination of the direct light sources when the color of a point is determined. Assuming that the direct light sources are always visible, this simplification restricts the required visibility calculations to those from the camera. This means that color computation does not require any visibility information, and the reflected radiance can be determined using only local properties, including the reflectance and the normal vector, in addition to the global light source data. The consequence of the simplification is that shadows, mirror-like reflections, and diffuse/glossy interreflections disappear from the image.
The advantages of requiring only local properties are very important. The color of all points can be computed independently and in parallel, using just a small amount of data. This opens up many possibilities for parallel execution. In fact, the only task for which we need the scene geometry as a whole is the calculation of the visibility from the eye. So if we determine the color of points independently, we should composite the results taking geometric occlusions into account.
6
Visibility solution: depth or alpha compositing
Ray tracing versus rasterization
The solution of compositing based on geometric occlusion is well known in graphics. An image centric approach is ray tracing, which processes pixels one by one and obtains the points that are visible in a pixel. If the objects are opaque, then the color of the closest object is written into the pixel. An object centric approach, on the other hand, rasterizes objects one by one, and for each object it determines the pixels that are inside the object's projection. If the objects are opaque, we should guarantee that the color finally stored in each pixel is that of the object closest to the camera. This is provided by the famous depth compositing or depth buffer algorithm. If the objects are transparent, then they should be processed in the order of their distance from the camera, and the weighted sum of the object's color and the already accumulated color is computed. This is called alpha compositing.
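The two compositing rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the hardware algorithm: the function names, the fragment tuples, and the use of a single gray value as the color are assumptions made for the example, and the transparent case assumes back-to-front processing.

```python
def depth_composite(fragments):
    """Depth buffering: keep, per pixel, the color of the closest fragment.

    fragments: iterable of (pixel, depth, color), in arbitrary order.
    """
    depth_buffer = {}  # pixel -> smallest depth seen so far
    color_buffer = {}  # pixel -> color of that closest fragment
    for pixel, depth, color in fragments:
        if depth < depth_buffer.get(pixel, float("inf")):
            depth_buffer[pixel] = depth
            color_buffer[pixel] = color
    return color_buffer


def alpha_composite(fragments, background=0.0):
    """Alpha compositing of the transparent fragments covering one pixel.

    fragments: list of (depth, color, alpha); processed from far to near,
    each step forming a weighted sum of the new and the accumulated color.
    """
    result = background
    for _depth, color, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        result = alpha * color + (1.0 - alpha) * result
    return result
```

Note that depth compositing accepts fragments in any order, while alpha compositing needs the sort; this is why transparent objects must be processed in the order of their distance.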
7
Local illumination processing pipeline
Figure: objects -> transformation and lighting -> clipping -> rasterization -> compositing -> color + depth image (screen axes x, y, z; eye)
Independent processing:
1. Pipeline operation
2. Parallel branches
Current graphics hardware is designed to solve the pure local illumination problem using rasterization and either depth or alpha compositing. Objects are rasterized in screen coordinates, where the visibility rays are parallel to the z axis. The rasterization of different objects is independent, and the depth compositing hardware eventually keeps, in each pixel, the color of the point that is closest to the camera. Objects are usually defined in other coordinate systems, therefore they should be transformed to screen coordinates, which again can be done completely independently for each object.
8
Why is it fast?
Pipelining (figure: Proc 1, Proc 2)
Parallelism (figure: Proc 1, Proc 21, Proc 22)
Elements are processed INDEPENDENTLY
No internal storage
Parallel execution without synchronization
This kind of independent processing is very good for parallelization. Pipelining means that while an object is rasterized, for example, the next object is already being transformed, which keeps all stages busy and, in the ideal case, multiplies the processing speed by the number of stages. On the other hand, single stages may be replaced by parallel branches, for example, rasterizing several objects at the same time, which again multiplies the speed by the number of parallel branches.
9
GPU hardware architecture
Figure: Interface -> vertices -> Transform + Illumination (vertex shader) -> Clipping + Hom. division + Viewport transform -> triangles -> Projection + Rasterization + Linear interpolation -> fragments -> Texturing (fragment shader, texture memory) -> Compositing (Z-buffer, transparency)
The architecture of the non-programmable graphics hardware reflects the tasks of the classical incremental rendering pipeline, and exploits parallel processing extensively. The vertices of the meshes are transformed and their illumination is computed. Objects represented by homogeneous coordinates are clipped, then homogeneous coordinates are converted to Cartesian coordinates, and scaled according to the current viewport resolution. In screen space the pixels or fragments inside the 2D projections of the triangles are filled. When a single fragment is processed, the fragment properties are derived from the vertex properties of the triangle using linear interpolation. These properties include the depth, the color, and the texture coordinates. Taking the interpolated texture coordinates, the texture memory is looked up and the stored color determines the fragment color. Finally, the fragment color is composited with the pixel in the frame buffer, taking into account the depth and the transparency.
Programmable GPUs allow the modification of this pipeline at two stages: the operation of the transform and illumination stage can be defined by a vertex shader program, while the operation of the texturing stage is defined by a fragment shader program. All other stages and the interfaces remain fixed, so the vertex shader is still responsible for determining the homogeneous clipping space position, the color, and the texture coordinates of each vertex. The fragment shader, on the other hand, is responsible for the calculation of the color of each fragment.
10
Vertex shader
CPU: glBegin(GL_TRIANGLES), glVertex, glNormal, glColor, glTexCoord, glEnd()
GPU input registers: POSITION, NORMAL, COLOR0, TEXCOORD0, …
State: transforms, light sources, materials
Vertex shader (illumination) output: POSITION, COLOR0, TEXCOORD0, … for triangle vertices
Clipping: -w < X < w, -w < Y < w, -w < Z < w, 0 < color < 1
Homogeneous division: x = X/w, y = Y/w, z = Z/w
Viewport transform: xv = center.x + viewsize.x * x / 2
Output: POSITION, COLOR0, TEXCOORD0, … for triangle vertices
Let us see the details, which must be understood to take advantage of the programmability of the pipeline. When we start a rendering pass using the graphics API, the calls put their parameters into the respective input registers of the vertex shader processor. For example, assuming the OpenGL API, glNormal loads the NORMAL register, while glColor loads the COLOR0 register. glVertex loads the POSITION register and triggers the operation of the vertex shader, which is responsible for calculating the content of the output registers, including the POSITION, COLOR, and output TEXTURE COORDINATE registers. Assuming standard OpenGL operation, the output POSITION is computed by multiplying the input POSITION with the model-view-projection matrix. Texture coordinates are simply copied, and the color is either copied or calculated according to the Phong-Blinn reflection formula, depending on whether or not the GL_LIGHTING switch is enabled.
Clipping waits for three vertices defining a triangle, and keeps only the part that is inside an origin-centered, axis-aligned cube with sides of length two. In homogeneous coordinates, the X, Y, Z, w coordinates of the preserved points satisfy the inequalities listed above. Converting homogeneous coordinates to Cartesian ones is done by dividing the first three homogeneous coordinates by the fourth one, then the triangle is scaled according to the viewport resolution.
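The fixed steps between the vertex shader and rasterization can be written out as a small Python sketch. It follows the formulas on the slide for a single point; the function names are hypothetical, the y component of the viewport transform is assumed to be analogous to the x component given on the slide, and real clipping cuts triangles against the volume rather than only testing points.

```python
def transform(matrix, point):
    """Multiply a 4x4 row-major matrix with a homogeneous point [X, Y, Z, w]."""
    return [sum(matrix[r][c] * point[c] for c in range(4)) for r in range(4)]


def inside_clip_volume(p):
    """Clipping test: -w < X < w, -w < Y < w, -w < Z < w."""
    X, Y, Z, w = p
    return -w < X < w and -w < Y < w and -w < Z < w


def homogeneous_divide(p):
    """Homogeneous to Cartesian: x = X/w, y = Y/w, z = Z/w."""
    X, Y, Z, w = p
    return (X / w, Y / w, Z / w)


def viewport_transform(ndc, center, viewsize):
    """xv = center.x + viewsize.x * x / 2 (y assumed analogous)."""
    x, y, z = ndc
    return (center[0] + viewsize[0] * x / 2,
            center[1] + viewsize[1] * y / 2,
            z)


# Usage: an identity matrix stands in for the model-view-projection matrix.
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
clip_pos = transform(IDENTITY, [1.0, -1.0, 0.0, 2.0])
if inside_clip_volume(clip_pos):
    screen = viewport_transform(homogeneous_divide(clip_pos), (320, 240), (640, 480))
```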
11
Fragment shader
Input: POSITION, COLOR0, TEXCOORD0, … for triangle vertices
Projection, rasterization and linear interpolation: POSITION, COLOR0, TEXCOORD0 for fragments
Z-cull
State: texture id, texturing environment
Fragment shader, texturing: tex2D(u,v)*color0
Output: POSITION, COLOR
Texture memory
Compositing: blending, z-buffering
Frame buffer
The triangle is rasterized, which means that the hardware visits all the pixels, also called fragments, that are inside the projection of the triangle. For every pixel, the vertex properties, including the position, the color, and the texture coordinates, are interpolated to obtain the set of properties of the fragment. The final fragment color is usually computed with texturing. For example, with modulative texturing, the texture memory is looked up using the interpolated texture coordinates and the texture color is multiplied by the color interpolated from the vertex colors. The color and the interpolated depth value go through compositing, where the z-buffer decides whether this fragment is visible, and the alpha blending hardware computes a weighted sum of the new and the old colors. Finally, the result of visible fragments is written to the frame buffer.
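Modulative texturing as described above can be sketched as follows. The sketch assumes barycentric weights for the interpolation inside the triangle, a nearest-texel lookup, and a texture stored as a list of rows of RGB tuples; the function names are illustrative only, and the perspective correction that real hardware applies during interpolation is ignored.

```python
def interpolate(a, b, c, weights):
    """Linearly interpolate a per-vertex property with barycentric weights."""
    wa, wb, wc = weights
    return tuple(wa * x + wb * y + wc * z for x, y, z in zip(a, b, c))


def tex2d(texture, u, v):
    """Nearest-texel lookup; u and v are in the unit interval."""
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def fragment_shader(texture, texcoord, color0):
    """Modulative texturing: tex2d(u, v) * color0, per color channel."""
    texel = tex2d(texture, texcoord[0], texcoord[1])
    return tuple(t * c for t, c in zip(texel, color0))


# Usage: a 2x2 texture and one fragment inside a triangle.
TEXTURE = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]
uv = interpolate((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.25, 0.5, 0.25))
color = fragment_shader(TEXTURE, uv, (0.5, 0.5, 0.5))
```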
12
GPU stream programming
CPU: vertices + properties (input stream of vertices x 4 floats)
Vertex shader: change of vertex properties
Clipping: conditional reduction
Triangle setup + rasterization + linear interpolation
Pixel shader: color computation from properties and textures (texture memory)
Compositing
Framebuffer
We can conclude that the architecture of current GPUs still reflects the pure local illumination rendering pipeline, with two programmable stages. At these stages, the vertex and the fragment shaders have access to the input registers and to the texture memory, and vertices and fragments are processed completely independently. Recall that in global illumination rendering, however, points cannot be processed independently, since the illumination of a point depends on the geometry and the illumination of all other points, that is, on the complete description of the scene. This scene description does not fit into the input registers of the vertex and fragment shaders due to obvious size limitations. The number of input registers is on the order of ten, each holding 4 float variables, which is far less than required by the representation of a non-trivial scene. So the only practical option is to put all geometric and indirect illumination information needed to shade a point into the texture memory.
To fully exploit the power of the GPU, the indirect illumination stored in the texture memory should also be computed by the GPU and not uploaded by the CPU. Vertex and pixel shaders are not allowed to write to the texture memory, but the final image can be directed to the texture memory using the render-to-texture feature. This feature requires the decomposition of the rendering process into several passes, since the result of a pass is available only in the following passes.
13
Keys to GPUGI
Multipass rendering: results of a previous pass can only be used in later passes
Render-to-texture: results of one pass can be used by another pass
Floating point textures: HDR radiance and geometric data in texture memory
Storing global geometric and illumination information in textures: how?
So multipass rendering and the render-to-texture feature are the keys to global illumination computations on the GPU, since these two techniques allow later computations to depend on the results of earlier ones. These features are loopholes in the independent processing concept of the GPU, and they are required to implement global illumination calculations. Additionally, since geometry and radiance values require higher dynamic range and precision than provided by the classical 8 bit per color channel format, the possibility of storing texture data in floating point format is also essential. To summarize, GI calls for dependent computations while the GPU offers independent computations. To implement dependencies, we can store geometric and illumination information in textures and take advantage of the multipass rendering, render-to-texture and floating point texture features of current GPUs.
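The dependence pattern that multipass rendering and render-to-texture make possible can be imitated on the CPU. The Python sketch below is only an analogy: a "texture" is a list of floats, a "pass" computes every output texel independently while reading the texture produced by the previous pass, and the gathering shader (emission plus half of the average of the other texels) is a made-up example, not a method from the tutorial.

```python
def render_pass(source, shader):
    """One pass: each output texel is computed independently of the others.

    The shader may read any texel of `source` but cannot write to it,
    mirroring the read-only texture memory seen by GPU shaders.
    """
    return [shader(source, i) for i in range(len(source))]


def make_gather_shader(emission, reflectance=0.5):
    """Hypothetical shader: own emission plus reflected average of the rest."""
    def shader(source, i):
        others = sum(source) - source[i]
        return emission[i] + reflectance * others / (len(source) - 1)
    return shader


def multipass(initial, shader, passes):
    """Render-to-texture loop: the result of a pass feeds the next pass."""
    texture = initial
    for _ in range(passes):
        texture = render_pass(texture, shader)
    return texture


# Usage: one emitting texel illuminates two dark ones over two passes.
EMISSION = [1.0, 0.0, 0.0]
after_one = multipass(EMISSION, make_gather_shader(EMISSION), 1)
after_two = multipass(EMISSION, make_gather_shader(EMISSION), 2)
```

Within a pass no texel sees the new value of another texel; the dependence appears only between passes, which is exactly the restriction stated in the notes.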
14
Texture atlas = unfolding
Figure: a surface in x, y, z space unfolded into the u, v texture domain
In local illumination computations we need the points that are visible from a pin-hole camera. To select these points, the model-view-projection transformation maps the points inside the view frustum into an origin-centered cube in screen coordinates, where clipping takes place. In global illumination rendering, however, we usually cannot restrict the shading to the visible points, since points not visible from the camera may also produce indirect illumination effects. So instead of the classical pin-hole camera, we should apply a transformation that guarantees that the final pixels represent all surface points with sufficient sampling accuracy. One option for this transformation is well known in computer graphics: the parameterization used in texture mapping. So if some property, such as radiance, position, or normal vector, is needed for all surface points, then we have to find a texturing parameterization and store the property in this texture. If this property needs to be processed, that is, we have to guarantee that each sample point is visited once in the fragment shader, then the appropriate camera transformation should map the unit square of the texture space onto the viewport. This requires the rendering of a single full screen quadrilateral. So rendering full screen quadrilaterals is a common operation in global illumination computations, since it lets us process all elements of an array stored in the texture memory.
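The mapping of the unit texture square onto the viewport can be illustrated with a short Python sketch. It assumes the clipping volume stated earlier (coordinates between -w and w), so texture coordinates in the unit interval are scaled to the range from -1 to 1 with w = 1; the function names and the use of pixel centers as sample positions are assumptions of the example.

```python
def quad_vertex_shader(u, v):
    """Place a vertex at its own texture coordinates.

    Maps the unit square of texture space to the clip-space square
    [-1, 1] x [-1, 1], so the quad covers the whole viewport.
    """
    return (2.0 * u - 1.0, 2.0 * v - 1.0, 0.0, 1.0)


def texel_coordinates(width, height):
    """Texture coordinates of the pixel centers covered by the quad.

    With a viewport of the same resolution as the texture, each texel
    is visited exactly once by the fragment shader.
    """
    return [((i + 0.5) / width, (j + 0.5) / height)
            for j in range(height) for i in range(width)]


# Usage: the four corners of the full screen quadrilateral.
corners = [quad_vertex_shader(u, v) for u, v in [(0, 0), (1, 0), (1, 1), (0, 1)]]
samples = texel_coordinates(4, 2)
```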
15
How to store geometry in textures?
It is up to you!
The texture memory is a RAM addressed by u,v pairs.
E.g. n × m NURBS surface defined by control points: Cij = T[i/n][j/m].xyz, wij = T[i/n][j/m].w
Depth image, Z-buffered environment maps (distance impostors).
Geometry image.
Normal texture mapping is a straightforward approach to storing the reflected radiance for a single direction per sample point. The representation of geometry in textures, however, is a little more tricky. Before talking about techniques that have proven to be successful, I have to emphasize that there are many different alternatives. The texture memory is in fact an array or RAM that is addressed by two scalars in the unit interval. For example, if we wish to store a NURBS surface of n × m control points, then a texture of n × m or larger resolution is defined, and at texture coordinates i/n and j/m the position of control point (i, j) is stored in the r, g, b channels, and its weight is stored in the alpha channel. So it is basically up to you how the texture memory is organized. However, there are popular choices as well.
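The control point example can be sketched in Python with a nested list standing in for an RGBA texture. The helper names are hypothetical. One deliberate deviation from the slide is flagged here: the sketch addresses texel centers at ((i + 0.5)/n, (j + 0.5)/m) instead of exactly i/n and j/m, which is an assumption made to avoid rounding problems at texel borders.

```python
def store_control_points(points, weights):
    """Pack an n x m control net into an n x m RGBA texture.

    Position goes to the r, g, b channels and the weight to alpha.
    """
    n, m = len(points), len(points[0])
    return [[(*points[i][j], weights[i][j]) for j in range(m)] for i in range(n)]


def fetch(texture, u, v):
    """Address the texture by two scalars in the unit interval."""
    n, m = len(texture), len(texture[0])
    i = min(int(u * n), n - 1)
    j = min(int(v * m), m - 1)
    return texture[i][j]


def control_point(texture, i, j):
    """Read back control point (i, j): returns (C_ij, w_ij)."""
    n, m = len(texture), len(texture[0])
    x, y, z, w = fetch(texture, (i + 0.5) / n, (j + 0.5) / m)
    return (x, y, z), w


# Usage: a 2 x 3 control net with made-up positions and weights.
POINTS = [[(float(i), float(j), 0.0) for j in range(3)] for i in range(2)]
WEIGHTS = [[1.0, 0.5, 1.0], [1.0, 2.0, 1.0]]
T = store_control_points(POINTS, WEIGHTS)
```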
16
Z-buffered env.maps (Patow95), distance impostor
If we are interested in the geometry of the scene that is visible from a point, we can use the depth image, which is also computed for depth mapped shadows, as a sampled geometric representation of the scene. If the viewing direction is not restricted, several depth maps are used, which are conveniently organized in a z-buffered environment map. It is often more convenient to store the distance between the reference point and the visible point rather than its depth coordinate. In this case we can call the environment map a distance impostor. Interestingly, the idea of replacing the geometry by a distance impostor was proposed by Gustavo Patow long before the emergence of programmable GPUs.
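One reason why storing the distance is convenient can be shown with a small Python sketch: given the reference point and the lookup direction, the visible point is recovered with a single multiply-add. The function names and the tuple representation of points are assumptions for the example, and the sketch handles one texel only rather than a full environment map.

```python
import math


def store_distance(reference, visible_point):
    """Return (unit direction, distance) for one impostor texel.

    The direction selects the texel of the environment map; only the
    distance needs to be stored in it.
    """
    offset = [p - r for p, r in zip(visible_point, reference)]
    distance = math.sqrt(sum(c * c for c in offset))
    direction = tuple(c / distance for c in offset)
    return direction, distance


def reconstruct_point(reference, direction, distance):
    """Recover the visible point from the stored distance."""
    return tuple(r + distance * d for r, d in zip(reference, direction))


# Usage: one sample seen from the reference point (1, 1, 1).
direction, distance = store_distance((1.0, 1.0, 1.0), (4.0, 1.0, 5.0))
point = reconstruct_point((1.0, 1.0, 1.0), direction, distance)
```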
17
Geometry image (Gu02)
Texture maps storing the 3D points and normal vectors
[r,g,b] = [x,y,z]
The other popular choice for encoding geometry in textures is the so-called geometry image. The idea is very simple. Given a parameterization, the r, g, b channels of each texel store the x, y, z coordinates of the corresponding point in local modeling space. The normal vector of the surface can be stored in another texture. After this we can forget about the original geometry and topology and work only with these images.
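As a minimal sketch of the idea, the Python code below builds a geometry image and a normal texture for a unit sphere. The spherical parameterization is a stand-in chosen for brevity and is an assumption of the example; it is not the parameterization used in the cited work, and the function names are illustrative.

```python
import math


def sphere_point(u, v):
    """Hypothetical parameterization: unit square -> unit sphere."""
    phi = 2.0 * math.pi * u
    theta = math.pi * v
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def build_geometry_image(resolution):
    """Return (positions, normals) as two resolution x resolution textures.

    Each position texel stores [r, g, b] = [x, y, z] in modeling space.
    """
    positions, normals = [], []
    for j in range(resolution):
        pos_row, nrm_row = [], []
        for i in range(resolution):
            u = (i + 0.5) / resolution  # sample at texel centers
            v = (j + 0.5) / resolution
            p = sphere_point(u, v)
            pos_row.append(p)
            nrm_row.append(p)  # on a unit sphere the normal equals the position
        positions.append(pos_row)
        normals.append(nrm_row)
    return positions, normals


# Usage: after this, shading can work from the two images alone.
positions, normals = build_geometry_image(4)
```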