GLSL Applications: 1 of 2
  Joseph Kider
  Source: Patrick Cozzi – Spring 2011
  University of Pennsylvania CIS Fall 2011
Agenda
  GLSL Applications
    Per-Fragment Lighting
    Image Processing
  Finish last week’s slides
    OpenGL Drawing
    OpenGL Multithreading
Per-Fragment Lighting Slide from
Per-Fragment Lighting
  [Images: Per-Vertex Lighting, Per-Fragment Lighting]
Per-Fragment Lighting: Diffuse Slide from
Per-Fragment Lighting: Diffuse
    uniform vec3 u_Color;
    in vec3 fs_Incident;
    in vec3 fs_Normal;
    in vec3 fs_Texcoord;
    out vec4 out_Color;   // vec4, to match the vec4 written in main()

    void main(void)
    {
        // Interpolated in vectors are generally not unit length,
        // so normalize local copies rather than the in variables.
        vec3 incident = normalize(fs_Incident);
        vec3 normal = normalize(fs_Normal);
        // Clamp at zero: a surface facing away from the light gets no diffuse term.
        float diffuse = max(0.0f, dot(-incident, normal));
        out_Color = vec4(diffuse * u_Color, 1.0f);
    }
  in vectors are not normalized. Why?
  Good practice: don’t write to in variables
  Know the graph: Why max?
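The diffuse term can be checked on the CPU. The following is a minimal C++ sketch, not the course's code: `Vec3`, `dot`, `normalize`, and `diffuseTerm` are hypothetical names that mirror the GLSL built-ins used in the shader, and `incident` is assumed to point from the light toward the surface, as in the shader.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for GLSL's vec3.
struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Assumes v is not the zero vector.
inline Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Same arithmetic as the shader: max(0, dot(-incident, normal)).
// incident points from the light toward the surface.
inline float diffuseTerm(const Vec3& incident, const Vec3& normal) {
    Vec3 i = normalize(incident);
    Vec3 n = normalize(normal);
    Vec3 toLight = { -i.x, -i.y, -i.z };
    // The cosine is negative when the light is behind the surface;
    // max clamps that case to zero instead of subtracting light.
    return std::max(0.0f, dot(toLight, n));
}
```

Passing unnormalized inputs, as the interpolated in vectors would be, gives the same result because the function normalizes its own copies.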
Per-Fragment Lighting: Specular Slide from
Per-Fragment Lighting: Specular
    uniform vec3 u_Color;
    uniform vec3 u_SpecColor;
    uniform float u_SpecHardness;
    in vec3 fs_Incident;
    in vec3 fs_Viewer;
    in vec3 fs_Normal;
    in vec3 fs_Texcoord;
    out vec4 out_Color;   // vec4, to match the vec4 written in main()

    void main(void)
    {
        vec3 incident = normalize(fs_Incident);
        vec3 viewer = normalize(fs_Viewer);   // interpolated, so not unit length
        vec3 normal = normalize(fs_Normal);
        // "half" vector: halfway between the direction to the light and to the viewer
        vec3 H = normalize(-incident + viewer);
        float specular = pow(max(0.0f, dot(H, normal)), u_SpecHardness);
        float diffuse = max(0.0f, dot(-incident, normal));
        out_Color = vec4(diffuse*u_Color + specular*u_SpecColor, 1.0f);
    }
  H is the “half” vector
  This is Blinn-Phong shading
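To see how the half vector and the hardness exponent behave, here is a small CPU-side C++ sketch of the specular term. It is an illustration under assumptions, not the course's code: `Vec3`, `dot`, `normalize`, and `blinnPhongSpecular` are hypothetical names, `incident` points from the light toward the surface, and `viewer` points from the surface toward the eye.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for GLSL's vec3.
struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Assumes v is not the zero vector.
inline Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Blinn-Phong specular term: pow(max(0, dot(H, N)), hardness).
// Not defined when the viewer looks exactly along the incident direction,
// because the half vector would then be zero.
inline float blinnPhongSpecular(const Vec3& incident, const Vec3& viewer,
                                const Vec3& normal, float hardness) {
    Vec3 i = normalize(incident);
    Vec3 v = normalize(viewer);
    Vec3 n = normalize(normal);
    // Half vector between the direction to the light (-i) and the viewer.
    Vec3 h = normalize({ -i.x + v.x, -i.y + v.y, -i.z + v.z });
    return std::pow(std::max(0.0f, dot(h, n)), hardness);
}
```

When the viewer sits in the mirror direction of the light, H lines up with the normal and the term peaks at 1. A larger hardness makes the highlight fall off faster away from that peak.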
Per-Fragment Lighting: Specular Slide from
Image Processing
  Our first look at GPGPU
    General-Purpose computation on Graphics Processing Units
  Input: Image
  Output: Processed image
  A kernel runs on each pixel
Image Processing Examples
  [Images: Image Negative]
  [Images: Edge Detection]
  [Images: Toon Rendering]
Image Processing Questions
  Is the GPU a good fit for image processing?
  Is image processing data-parallel?
  What about bus traffic?
  What type of shader should implement an image processing kernel?
Image Processing: GPU Setup
  Input: Texture
  Viewport-aligned quad, a.k.a. full-screen quad
  Output: framebuffer …for now
  Kernel: fragment shader
Image Processing: GPU Setup
  1. Render viewport-aligned quad
  2. The fragment shader is invoked for each screen pixel
  3. Each fragment shader invocation can read any texel of the image stored in the texture: gather
  4. Each fragment shader invocation executes the kernel and writes the color to the framebuffer
Image Processing: GPU Setup
  How do we model the viewport-aligned quad?
    Two triangles?
    One big triangle?
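Both options can be written down as clip-space vertex data. The C++ sketch below is illustrative only: the array names, the `edge` helper, and `insideOrOnTriangle` are hypothetical, and the big-triangle coordinates (-1,-1), (3,-1), (-1,3) are one common choice rather than something the slides specify. The point test shows that the single triangle covers every corner of the [-1, 1] x [-1, 1] viewport, so the part outside it is simply clipped.

```cpp
// Hypothetical stand-in for GLSL's vec2.
struct Vec2 { float x, y; };

// Option 1: two triangles that exactly tile clip space [-1, 1] x [-1, 1].
const Vec2 kQuadTwoTriangles[6] = {
    { -1.0f, -1.0f }, { 1.0f, -1.0f }, {  1.0f, 1.0f },
    { -1.0f, -1.0f }, { 1.0f,  1.0f }, { -1.0f, 1.0f },
};

// Option 2: one oversized triangle; the parts outside clip space are clipped.
const Vec2 kOneBigTriangle[3] = {
    { -1.0f, -1.0f }, { 3.0f, -1.0f }, { -1.0f, 3.0f },
};

// Twice the signed area of triangle (a, b, p); >= 0 when p is on or to the
// left of the directed edge a -> b.
inline float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True when p lies inside or on a counter-clockwise triangle.
inline bool insideOrOnTriangle(const Vec2 tri[3], Vec2 p) {
    return edge(tri[0], tri[1], p) >= 0.0f &&
           edge(tri[1], tri[2], p) >= 0.0f &&
           edge(tri[2], tri[0], p) >= 0.0f;
}
```

The two-triangle version has an interior diagonal edge across the screen, while the single triangle has none.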
Image Processing: GPU Setup
  Which has more vertex shader overhead?
    Does it matter?
  Which is simpler to implement?
  Which has less fragment shader overhead?
Image Processing: GPU Setup
  Triangle edges are redundantly shaded
  Fragments are processed in 2x2 blocks. Why?
Image Processing: GPU Setup
  Triangle edges are redundantly shaded
  [Chart: FPS vs. number of vertices for Fan, Strip, and Max area]
Image Processing: GPU Setup
  The viewport has the same width and height as the image to be processed
  The texture also has the same dimensions
  How does the fragment shader access texels?
Image Processing: GPU Setup
  Store texture coordinates per vertex
  Vertex Shader
    in vec3 Position;
    in vec2 Texcoords;
    out vec2 fs_Texcoords;

    void main(void)
    {
        fs_Texcoords = Texcoords;
        gl_Position = vec4(Position, 1.0);
    }
  Fragment Shader
    uniform sampler2D u_Image;
    in vec2 fs_Texcoords;
    out vec4 out_Color;

    void main(void)
    {
        out_Color = texture(u_Image, fs_Texcoords);
    }
Image Processing: GPU Setup
  Store texture coordinates per vertex
    What memory costs does this incur? Does it matter?
    What bandwidth costs does this incur?
    What non-obvious optimization does it allow?
Image Processing: GPU Setup
  Compute texture coordinate in fragment shader
  Vertex Shader
    in vec3 Position;

    void main(void)
    {
        gl_Position = vec4(Position, 1.0);
    }
  Fragment Shader
    uniform sampler2D u_Image;
    uniform vec2 u_inverseViewportDimensions;
    out vec4 out_Color;

    void main(void)
    {
        vec2 txCoord = u_inverseViewportDimensions * gl_FragCoord.xy;
        out_Color = texture(u_Image, txCoord);
    }
  What is u_inverseViewportDimensions?
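The multiplication in the fragment shader can be traced by hand. The C++ sketch below assumes that u_inverseViewportDimensions holds (1/width, 1/height) of the viewport in pixels, which the name suggests but the slides do not state, and that gl_FragCoord.xy sits at pixel centers (column + 0.5, row + 0.5), which is the OpenGL default. The helper names are hypothetical.

```cpp
// Hypothetical stand-in for GLSL's vec2.
struct Vec2 { float x, y; };

// Assumed content of u_inverseViewportDimensions: (1 / width, 1 / height).
inline Vec2 inverseViewportDimensions(int width, int height) {
    return { 1.0f / width, 1.0f / height };
}

// Mirrors: txCoord = u_inverseViewportDimensions * gl_FragCoord.xy;
// With default settings gl_FragCoord.xy is the pixel center, i.e.
// (col + 0.5, row + 0.5) for the pixel in column col and row row.
inline Vec2 fragCoordToTexcoord(int col, int row, Vec2 inverseDims) {
    return { (col + 0.5f) * inverseDims.x, (row + 0.5f) * inverseDims.y };
}
```

For a 4x2 viewport the first pixel maps to (0.125, 0.25) and the last to (0.875, 0.75). When the texture has the same dimensions as the viewport, these are exactly the texel centers.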
Image Processing: GPU Setup
  How do you access adjacent texels?
    uniform sampler2D u_Image;
    in vec2 fs_Texcoords;
    out vec4 out_Color;

    void main(void)
    {
        vec4 c0 = texture(u_Image, fs_Texcoords);
        vec4 c1 = textureOffset(u_Image, fs_Texcoords, ivec2(-1,  0));
        vec4 c2 = textureOffset(u_Image, fs_Texcoords, ivec2( 1,  0));
        vec4 c3 = textureOffset(u_Image, fs_Texcoords, ivec2( 0, -1));
        vec4 c4 = textureOffset(u_Image, fs_Texcoords, ivec2( 0,  1));
        out_Color = (c0 + c1 + c2 + c3 + c4) * 0.2;
    }
  Offsets: (-1, 0), (1, 0), (0, -1), (0, 1)
Image Processing: GPU Setup
  textureOffset requires the offset to be a constant expression
    E.g. ivec2(x, y) with non-constant x and y is not allowed
  How else do you access adjacent texels?
Image Processing: GPU Setup
  How else do you access adjacent texels?
    uniform sampler2D u_Image;
    uniform vec2 u_inverseViewportDimensions;
    out vec4 out_Color;

    void main(void)
    {
        vec2 txCoord = u_inverseViewportDimensions * gl_FragCoord.xy;
        // textureSize takes a level of detail and returns an ivec2;
        // delta is the size of one texel in texture coordinates.
        vec2 delta = 1.0 / vec2(textureSize(u_Image, 0));
        vec4 c0 = texture(u_Image, txCoord);
        vec4 c1 = texture(u_Image, txCoord + (delta * vec2(-1.0,  0.0)));
        vec4 c2 = texture(u_Image, txCoord + (delta * vec2( 1.0,  0.0)));
        vec4 c3 = texture(u_Image, txCoord + (delta * vec2( 0.0, -1.0)));
        vec4 c4 = texture(u_Image, txCoord + (delta * vec2( 0.0,  1.0)));
        out_Color = (c0 + c1 + c2 + c3 + c4) * 0.2;
    }
  Offsets: (-1, 0), (1, 0), (0, -1), (0, 1)
Image Processing: Kernel Examples
  Image Negative
    uniform sampler2D u_Image;
    in vec2 fs_Texcoords;
    out vec4 out_Color;

    void main(void)
    {
        out_Color = vec4(1.0) - texture(u_Image, fs_Texcoords);
    }
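The negative kernel is a per-channel subtraction, which the C++ sketch below reproduces on the CPU. `Rgba`, `negative`, and `negativeKeepAlpha` are hypothetical names. Note that `vec4(1.0) - texel` also inverts alpha, so an opaque input (alpha 1) comes out with alpha 0. Whether that matters depends on how the result is blended or stored, so the second function is offered only as a possible variant.

```cpp
#include <array>

// Hypothetical stand-in for a vec4 color: red, green, blue, alpha.
using Rgba = std::array<float, 4>;

// Same arithmetic as vec4(1.0) - texel: every channel, including alpha,
// is subtracted from one.
inline Rgba negative(const Rgba& c) {
    return { 1.0f - c[0], 1.0f - c[1], 1.0f - c[2], 1.0f - c[3] };
}

// Variant that inverts only the color channels and passes alpha through.
inline Rgba negativeKeepAlpha(const Rgba& c) {
    return { 1.0f - c[0], 1.0f - c[1], 1.0f - c[2], c[3] };
}
```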
Image Processing: Kernel Examples
  Gaussian Blur
  [Image: 2D Gaussian]
Image Processing: Kernel Examples
  Filter for 3x3 Gaussian Blur:
             [1 2 1]
    1/16  *  [2 4 2]
             [1 2 1]
  The elements add to one
  Other filters are also used for
    Edge detection
    Sharpen
    Emboss
    …
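Applying a 3x3 filter means taking a weighted sum over each pixel's neighborhood. The C++ sketch below shows this on the CPU with the Gaussian weights from the slide. It is a sketch under assumptions: `kGaussian3x3` and `applyFilter3x3` are hypothetical names, the image is single-channel and row-major, and edges are handled by clamping, which is one of several possible choices.

```cpp
#include <algorithm>
#include <vector>

// The 3x3 Gaussian weights from the slide; they sum to 16/16 = 1,
// so the filter preserves overall brightness.
const float kGaussian3x3[3][3] = {
    { 1.0f / 16, 2.0f / 16, 1.0f / 16 },
    { 2.0f / 16, 4.0f / 16, 2.0f / 16 },
    { 1.0f / 16, 2.0f / 16, 1.0f / 16 },
};

// Weighted sum of the 3x3 neighborhood around (x, y) in a row-major,
// single-channel image. Out-of-range neighbors clamp to the nearest edge
// texel (an assumption). The weights are applied without flipping, which
// equals a true convolution only for symmetric filters like this one.
inline float applyFilter3x3(const std::vector<float>& image, int width, int height,
                            int x, int y, const float weights[3][3]) {
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int sx = std::max(0, std::min(x + dx, width - 1));
            int sy = std::max(0, std::min(y + dy, height - 1));
            sum += weights[dy + 1][dx + 1] * image[sy * width + sx];
        }
    }
    return sum;
}
```

Swapping in a different weight table gives the other filters the slide lists. In a fragment shader, the two loops become nine texture reads per pixel.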
Gaussian Blur
  How would you implement the fragment shader?
  How is the memory coherence?
    3x3, 5x5, etc.
Image Processing: Kernel Examples Image from
Image Processing: Kernel Examples
  What does this filter do?
            [1 1 1]
    1/9  *  [1 1 1]
            [1 1 1]
Image Processing: Read backs
  How do we get the contents of the framebuffer into system memory?
    Print Screen?
  It doesn’t matter if we are using:
  Efficient read backs are important
Image Processing: Read backs
  glReadPixels
    // Use GPU for image processing
    glUseProgram(/*... */);
    glDraw*(/*... */);

    // Allocate buffer for processed image
    unsigned char *rgb = new unsigned char[width * height * 3];

    // Rows in rgb are tightly packed, so turn off the default 4-byte row padding
    glPixelStorei(GL_PACK_ALIGNMENT, 1);

    // Ask for framebuffer’s color buffer
    glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, rgb);

    //...
    delete [] rgb;
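The buffer in this example is sized as width * height * 3, which matches what glReadPixels writes only when rows are tightly packed. By default GL_PACK_ALIGNMENT is 4, so each row of GL_RGB / GL_UNSIGNED_BYTE data is padded up to a multiple of 4 bytes. The C++ sketch below computes the size actually written; `packedRowStride` and `readbackBufferSize` are hypothetical helper names, and the formula assumes one-byte components.

```cpp
#include <cstddef>

// Bytes occupied by one row when glReadPixels writes byte-sized components
// with the given GL_PACK_ALIGNMENT (1, 2, 4, or 8): the raw row size rounded
// up to the next multiple of the alignment.
inline std::size_t packedRowStride(std::size_t width, std::size_t bytesPerPixel,
                                   std::size_t packAlignment) {
    std::size_t raw = width * bytesPerPixel;
    return (raw + packAlignment - 1) / packAlignment * packAlignment;
}

// Total bytes needed for a width x height read back.
inline std::size_t readbackBufferSize(std::size_t width, std::size_t height,
                                      std::size_t bytesPerPixel,
                                      std::size_t packAlignment) {
    return packedRowStride(width, bytesPerPixel, packAlignment) * height;
}
```

For example, a 5-pixel-wide RGB row is 15 bytes raw but 16 bytes with the default alignment of 4. Either size the buffer with the padded stride or set GL_PACK_ALIGNMENT to 1 before reading.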
Image Processing: Read backs What is the major problem with glReadPixels ?
Image Processing: Use Cases
  Photoshop-type applications
  Post-processing in games
  On the fly video manipulation
  Augmented reality
  These last three don’t even need read backs