Status – Week 242 Victor Moya

Summary
- Current status.
- Tests.
- XBox documentation.
- Post Vertex Shader geometry.
- Rasterization.

Current Status
- Basic Command Processor.
  - Read/Write GPU registers.
  - Read/Write GPU memory.
  - GPU commands.
  - No DMA/AGP data access.
- Basic Memory Controller.
  - 1 transaction per cycle served.
  - Memory module access latency accounted.
  - Transmission latency accounted.
  - 3 buses (req/write + data): CP, StreamerFetch, StreamerLoader.

Current Status
- Shader (Vertex Shader).
  - Multithreaded.
  - F/D/E/W pipeline.
  - Variable execution latency.
  - Dependency checking is currently done per full register; it should be done per component.
  - Problems with the ‘ending’ instruction (it requires something to fetch after it and takes many cycles).
  - No branches (the support code exists, but the instructions are not implemented).
  - No texture access (memory).
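The difference between full-register and component-based dependency checking can be sketched as below. This is a minimal illustration in Python, not the simulator's code: the `Scoreboard` class, its method names, and the 4-bit mask encoding are all assumptions made for the example.

```python
# Hypothetical sketch: one bit per register component (x=1, y=2, z=4, w=8).
COMPONENT_BITS = {"x": 1, "y": 2, "z": 4, "w": 8}


def mask(components):
    """Convert a component string such as "xy" into a 4-bit mask."""
    bits = 0
    for c in components:
        bits |= COMPONENT_BITS[c]
    return bits


class Scoreboard:
    """Tracks which register components still have a write in flight."""

    def __init__(self):
        self.pending = {}  # register name -> mask of components being written

    def issue_write(self, reg, components):
        self.pending[reg] = self.pending.get(reg, 0) | mask(components)

    def complete_write(self, reg, components):
        remaining = self.pending.get(reg, 0) & ~mask(components)
        if remaining:
            self.pending[reg] = remaining
        else:
            self.pending.pop(reg, None)

    def blocks_full_register(self, reg):
        """Full-register check: any pending write to the register blocks."""
        return reg in self.pending

    def blocks_component(self, reg, components):
        """Component check: only overlapping components block."""
        return bool(self.pending.get(reg, 0) & mask(components))
```

With a pending write to r0.x, an instruction that only touches r0.y is blocked by the full-register check but not by the component check, while an instruction reading r0.xyw is blocked by both.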

Current Status
- Streamer.
  - Pipelined:
    - Hit: Fetch/OCache/Insert/Commit.
    - Miss: Fetch/OCache/IRQInsert/IRQRead/AttrLoad/Sh/Store/Commit.
  - Stream and index based modes implemented.
  - No pre T&L cache (should it be added to the Streamer Loader?).
  - Supports out of order vertices (shader or memory).
  - Doesn’t support data from the AGP.

Current Status
- Streamer:
  - Streamer Loader pipeline should be (in hardware):
    - Insert in the IRQ.
    - Load from IRQ.
    - Setup Input: start address + address increment for each active attribute.
    - Attribute Load: request attribute to MC, increment address generators.
    - Issue to Shader.
  - IRQ should be implemented with a pre T&L cache.

Current Status
- Comments:
  - Currently the signal latency/bandwidth is specified with raw numbers. Alternatives:
    - Use constants. Store them in a single ‘signal definition’ file for all units or in separate per-unit files (each definition must be shared between the two boxes connected by the signal).
    - Use some kind of Architecture Description for signal delays, bandwidth and data bus width (to be used in memory transmission calculations and similar).
  - Currently most units only support single issue/fetch/process. They should be generalized to multiple issue/fetch/process and parametrized.
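The ‘signal definition’ alternative could look roughly like the sketch below: one shared table that both boxes on a signal consult, instead of raw numbers repeated at each end. The signal names and the latency/bandwidth values are hypothetical placeholders, not values from the simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalDef:
    name: str
    bandwidth: int  # objects that can be written per cycle
    latency: int    # cycles between write and read


# Hypothetical names and values, for illustration only.
SIGNALS = {
    s.name: s
    for s in (
        SignalDef("CommandProcessor->MemoryController", bandwidth=1, latency=1),
        SignalDef("StreamerFetch->MemoryController", bandwidth=1, latency=2),
        SignalDef("StreamerLoader->MemoryController", bandwidth=1, latency=2),
    )
}


def signal(name):
    """Look up a signal definition; both connected boxes call this."""
    return SIGNALS[name]


# The producer and the consumer read the same definition, so they cannot
# disagree on latency or bandwidth.
producer_view = signal("StreamerFetch->MemoryController")
consumer_view = signal("StreamerFetch->MemoryController")
```

An unknown name raises `KeyError`, which surfaces a mistyped signal at box construction time rather than as a silent timing mismatch.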

Current Status
- Signal Trace Analyzer -> Carlos.

Tests
- OpenGL test trace:
  - Used glutSolidSphere with (1, 100, 100) as parameters:
    - 100 batches.
      – 2 triangle strips (200 triangles).
      – 98 quad strips (9800 quads).
    - vertices.
  - Added a lighting shader replacing the normal model view + projection matrix transformation: one green light at infinity with diffuse and specular components.
    - 10 shader instructions.
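The batch figures above can be checked with a short calculation. The sketch assumes that a sphere with `slices` x `stacks` divisions is emitted as two polar triangle strips plus `stacks - 2` quad strips, that a strip of n triangles carries n + 2 vertices, and that a strip of n quads carries 2n + 2 vertices. The function name and the vertex-count rule are assumptions of this example, so the vertex total is an estimate rather than a figure from the trace.

```python
def sphere_batches(slices, stacks):
    """Estimate batch, primitive and vertex counts for a tessellated sphere."""
    triangle_strips = 2            # one cap at each pole
    quad_strips = stacks - 2       # the bands between the caps
    triangles = triangle_strips * slices
    quads = quad_strips * slices
    # Assumed strip sizes: n triangles -> n + 2 vertices,
    # n quads -> 2n + 2 vertices, with no sharing between batches.
    vertices = triangle_strips * (slices + 2) + quad_strips * (2 * slices + 2)
    return {
        "batches": triangle_strips + quad_strips,
        "triangles": triangles,
        "quads": quads,
        "vertices": vertices,
    }


counts = sphere_batches(100, 100)
```

For (100, 100) this reproduces the 100 batches, 200 triangles and 9800 quads listed above.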

Tests
- Light shader:

    //
    // i0 Vertex Position
    // i2 Vertex Normal
    //
    // c0 - c3 Model View-Project Matrix.
    // c4 Light Direction
    // c5 Light Half Vector
    // c6.x Material shininess
    // c7 Light ambient color
    // c8 Light diffuse color
    // c9 Light specular color
    //
    // o0 Vertex position (transformed)
    // o1 Vertex color.
    //

    // Vertex Model View-Project transformation
    dp4 o0.x, c0, i0
    dp4 o0.y, c1, i0
    dp4 o0.z, c2, i0
    dp4 o0.w, c3, i0

    // Compute diffuse and specular dot products and
    // use LIT to compute lighting coefficients
    dp3 r0.x, i2, c4
    dp3 r0.y, i2, c5
    mov r0.w, c6.x
    lit r0, r0

    // Accumulate color contributions
    mad r1, r0.y, c8, c7
    mad o1, r0.z, c9, r1

    // Finish shader.
    end
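To make the data flow of the ten instructions easier to follow, here is a rough Python emulation of the same shader. It assumes the usual LIT semantics (result = (1, max(N·L, 0), (N·H)^shininess if N·L > 0 else 0, 1)) and a positive shininess; the helper names are made up for this sketch and the simulator's own instruction emulation may differ.

```python
def dp4(a, b):
    """4-component dot product."""
    return sum(x * y for x, y in zip(a, b))


def dp3(a, b):
    """3-component dot product (ignores w)."""
    return sum(x * y for x, y in zip(a[:3], b[:3]))


def lit(src):
    """Assumed LIT semantics: x = N.L, y = N.H, w = specular power."""
    n_dot_l, n_dot_h, power = src[0], src[1], src[3]
    diffuse = max(n_dot_l, 0.0)
    specular = max(n_dot_h, 0.0) ** power if n_dot_l > 0.0 else 0.0
    return [1.0, diffuse, specular, 1.0]


def mad(scalar, a, b):
    """Multiply-add with a replicated scalar first operand: scalar * a + b."""
    return [scalar * x + y for x, y in zip(a, b)]


def light_shader(i0, i2, c):
    """i0: position, i2: normal, c: list of 10 constant registers (c0..c9)."""
    o0 = [dp4(c[k], i0) for k in range(4)]        # dp4 o0.{x,y,z,w}
    # r0.z is never written before LIT in the shader; 0.0 is a placeholder.
    r0 = [dp3(i2, c[4]), dp3(i2, c[5]), 0.0, c[6][0]]
    r0 = lit(r0)                                   # lit r0, r0
    r1 = mad(r0[1], c[8], c[7])                    # diffuse * c8 + ambient
    o1 = mad(r0[2], c[9], r1)                      # specular * c9 + r1
    return o0, o1


# Example constants: identity transform, light and half vector along +z,
# green ambient/diffuse/specular colors (all values are illustrative).
constants = [
    [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],      # c4 light direction
    [0.0, 0.0, 1.0, 0.0],      # c5 half vector
    [8.0, 0.0, 0.0, 0.0],      # c6.x shininess
    [0.0, 0.1, 0.0, 1.0],      # c7 ambient
    [0.0, 0.5, 0.0, 0.0],      # c8 diffuse
    [0.0, 0.25, 0.0, 0.0],     # c9 specular
]
lit_pos, lit_color = light_shader([1.0, 2.0, 3.0, 1.0], [0.0, 0.0, 1.0, 0.0], constants)
dark_pos, dark_color = light_shader([1.0, 2.0, 3.0, 1.0], [0.0, 0.0, -1.0, 0.0], constants)
```

A normal facing the light receives ambient + diffuse + specular; a normal facing away receives only the ambient term.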

Tests
- Results:
  - Simulated cycles: ~350K.
  - Simulation time: ~30s.
  - Signal trace size: ~150MB.

Tests
- Bugs:
  - TraceReader::parseFP() failed to correctly read a negative number with a 0 before the decimal point.
  - GPU_CLAMP was using ‘ ’ when it should be using ‘ =’.
  - ShaderDecodeExecute was allowing the execution of the instruction in the same thread after a blocked instruction (data dependency).
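The parseFP() symptom is typical of parsers that take the sign from the parsed integer part: for "-0.5" the integer part is -0, which compares equal to 0, so the sign is lost. The sketch below is a hypothetical reconstruction of that class of bug and one way to avoid it; it is not the actual TraceReader code, and both function names are invented for the example.

```python
def parse_fp_buggy(text):
    """Takes the sign from the integer part, so "-0.5" loses its sign."""
    int_text, _, frac_text = text.partition(".")
    int_part = int(int_text)
    frac = int(frac_text) / 10 ** len(frac_text) if frac_text else 0.0
    # int("-0") == 0, so this test is false for "-0.5".
    return int_part - frac if int_part < 0 else int_part + frac


def parse_fp(text):
    """Reads the sign first, then the magnitude."""
    negative = text.startswith("-")
    digits = text.lstrip("+-")
    int_text, _, frac_text = digits.partition(".")
    value = float(int(int_text or "0"))
    if frac_text:
        value += int(frac_text) / 10 ** len(frac_text)
    return -value if negative else value
```

Both versions agree on inputs such as "-1.25" and "3"; they differ only when the integer part is zero and the number is negative.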

Tests
- Changes:
  - ShaderDecodeExecute now ignores any instruction received after an end instruction.
  - Added QUAD and QUADSTRIP support to the simulator (GPU.h, Rasterizer, Drawer).
  - Vertex color is clamped to 0.0 – 1.0 before being sent to OpenGL (Drawer). The correct behaviour would be to clamp color attributes when they exit the shader.
  - Added the glNormal3f and glFrustum OpenGL functions to the TraceReader and OGLtoAGPTransaction.

Tests
- Changes:
  - OGLtoAGPTransaction now supports a third vertex attribute: normal.
  - OGLtoAGPTransaction now supports a ‘special’ shader mode (the one used for the light test). No support for OpenGL lighting is implemented.

Tests
- Further tests:
  - Try to implement a sphere using icosahedron subdivision to create a triangle strip mesh to test the index stream mode.
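A possible starting point for that test is sketched below: it subdivides an icosahedron and returns an indexed triangle list in which vertices on shared edges are emitted once, which is the property an index stream test needs. Converting the list into triangle strips is left out, and the function names are assumptions of this sketch.

```python
import math


def normalize(v):
    """Project a point onto the unit sphere."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def midpoint(a, b, verts, cache):
    """Return the index of the midpoint of edge (a, b), creating it once."""
    key = (min(a, b), max(a, b))
    if key not in cache:
        mid = [(p + q) / 2.0 for p, q in zip(verts[a], verts[b])]
        verts.append(normalize(mid))
        cache[key] = len(verts) - 1
    return cache[key]


def icosphere(subdivisions):
    """Return (vertices, triangles) with triangles as index triples."""
    t = (1.0 + math.sqrt(5.0)) / 2.0
    verts = [normalize(v) for v in (
        (-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
        (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
        (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1),
    )]
    tris = [
        (0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
        (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
        (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
        (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1),
    ]
    for _ in range(subdivisions):
        cache = {}  # edge -> index of its midpoint vertex
        refined = []
        for a, b, c in tris:
            ab = midpoint(a, b, verts, cache)
            bc = midpoint(b, c, verts, cache)
            ca = midpoint(c, a, verts, cache)
            # Each triangle is split into four smaller ones.
            refined += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        tris = refined
    return verts, tris


sphere_vertices, sphere_triangles = icosphere(1)
```

Each subdivision level multiplies the triangle count by four, while the edge cache keeps the vertex count well below three vertices per triangle, so most indices are reused.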

XBox Documentation
- Interesting information about the Vertex Shader architecture and the T&L pipeline down to the Primitive Assembly Cache and the Triangle Setup.
- Includes estimated sizes and clock latencies for most of the operations.

[Figure: XBox geometry pipeline. Stages: Memory, Pre T&L Cache, Vertex Shader, Post T&L Cache, Primitive Assembly, Triangle Setup, Rasterization. Data passed between stages: cache line (raw vertex data), raw vertex, transformed and lit vertex, 3 transformed and lit vertices, 3 vertices. Annotations: 4 KB 4-way set associative, B cache lines, 16 – 24 entry FIFO, 200 MHz.]

XBOX
- Differences:
  - No Pre T&L cache.
  - The Post T&L cache seems to be accessed by the Primitive Assembly Cache. However, we push the vertex to the Rasterizer (or whatever lies after the shader).
  - Sending the shaded vertex to the primitive assembly takes multiple cycles (2+) depending on the number of attributes used by the vertex.

XBOX Vertex Shader
- Registers:
  - 16 input registers.
  - 12 temporary registers.
  - 192 constant registers.
  - 1 address register.
  - 11 output registers.

XBOX Vertex Shader
- Instructions:
  - Shader Operations:
    - 13 MAC opcodes.
    - 7 ILU (inverse logic unit) opcodes.
  - 136 microcode instructions. Each instruction can:
    - Read three registers with swizzle and negation.
    - Compute one MAC op and one ILU op.
    - Write up to one output register and two temporary registers with masking.
- Shader types:
  - Normal vertex shaders.
  - Read/write vertex shaders.
  - Vertex state shaders.

XBOX Vertex Shaders
- Timing:
  - The clock speed is 250 MHz.
  - For normal shaders, instructions take between one-half cycle and one cycle to complete.
  - For read/write and vertex state shaders, instructions take between one cycle and six cycles to complete.

XBOX Vertex Shaders
- Multithreaded:
  - Two copies of the vertex shader pipeline (2 VS).
  - Each copy can run up to three threads (3 active threads per shader).
  - Read/write vertex shaders and vertex state shaders run single threaded, on a single pipeline.
- Stalling:
  - Instructions take six cycles to compute their outputs.
  - Bypasses: ALU, ILU and MLU bypasses.
  - Three cycles of latency with bypasses.
  - A bypass allows swizzling and negation of the result.

Post Vertex Shader
- (based on the 3DLabs OpenGL2 overview).
- Primitive assembly.
- User clipping.
- Frustum clipping.
- Perspective projection.
- Viewport Mapping.
- Polygon offset.
- Polygon mode.
- Shade mode.
- Culling.

Post Vertex Shader
- Primitive Assembly:
  - Get the three vertices of a triangle.
  - Triangles: keep the last three vertices, generate a primitive with each three new vertices.
  - Triangle strip: keep the last three vertices, generate a primitive with each new vertex (after the second).
  - Triangle fan: keep the first vertex and the last two vertices, generate a primitive with each new vertex (after the second).
  - Similar with other primitives.
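The three assembly rules can be sketched as follows. The function works on a list of vertex identifiers rather than on a vertex stream, and the swap of the first two vertices on every second strip triangle follows the OpenGL winding convention, which the slide does not mention; both are choices made for this example.

```python
def assemble(mode, vertices):
    """Group a vertex sequence into triangles for the given primitive mode."""
    triangles = []
    if mode == "triangles":
        # One primitive for each three new vertices; leftovers are dropped.
        for i in range(0, len(vertices) - 2, 3):
            triangles.append((vertices[i], vertices[i + 1], vertices[i + 2]))
    elif mode == "strip":
        # One primitive per new vertex after the second.
        for i in range(2, len(vertices)):
            a, b, c = vertices[i - 2], vertices[i - 1], vertices[i]
            # Swap on odd positions so every triangle keeps the same winding.
            triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    elif mode == "fan":
        # Keep the first vertex and the last two.
        for i in range(2, len(vertices)):
            triangles.append((vertices[0], vertices[i - 1], vertices[i]))
    else:
        raise ValueError("unknown primitive mode: " + mode)
    return triangles


strip = assemble("strip", [0, 1, 2, 3, 4])
fan = assemble("fan", [0, 1, 2, 3])
separate = assemble("triangles", [0, 1, 2, 3, 4, 5, 6])
```

A hardware unit would hold only the two or three most recent vertices in registers; the list indexing here stands in for that small window.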

Post Vertex Shader
- User clipping:
  - At least 6 user clip planes.
  - Define a clip volume.
  - glClipPlane(enum p, double eqn[4]).
  - (p1 p2 p3 p4) · (x y z w) >= 0
- Frustum clipping:
  - View volume.
  - -w <= x <= w
  - -w <= y <= w
  - -w <= z <= w
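Both inside tests above reduce to a few comparisons per vertex. A minimal sketch, with function names chosen for this example and vertices given as (x, y, z, w) tuples:

```python
def inside_user_plane(plane, vertex):
    """True when (p1 p2 p3 p4) . (x y z w) >= 0."""
    return sum(p * c for p, c in zip(plane, vertex)) >= 0


def inside_frustum(vertex):
    """True when -w <= x, y, z <= w."""
    x, y, z, w = vertex
    return -w <= x <= w and -w <= y <= w and -w <= z <= w


def inside_clip_volume(vertex, user_planes):
    """A vertex is kept only if it passes the frustum and every user plane."""
    return inside_frustum(vertex) and all(
        inside_user_plane(plane, vertex) for plane in user_planes
    )


# Example user plane x >= 0, written as (1, 0, 0, 0).
half_space = [(1.0, 0.0, 0.0, 0.0)]
```

These tests classify single vertices only; a triangle with vertices on both sides of a plane still has to be clipped.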

Post Vertex Shader
- Clipping:
  - Clip polygon => add new vertices => tessellate.
  - Clip triangle => add new vertices => retessellate.
  - Use rasterization in homogeneous coordinates: just add more clipping edges.
  - Guard Band Clipping (scissor).

Post Vertex Shader
- Divide by w.
- Viewport transformation.
  - Scale to screen/window coordinate system.
  - glViewport(x, y, w, h)
  - glDepthRange(clampd n, clampd f)
  - xw = (px/2)*xd + ox
  - yw = (py/2)*yd + oy
  - zw = [(f-n)/2]*zd + (n + f)/2
  - ox = x + w/2
  - oy = y + h/2
  - px = w
  - py = h
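The divide by w and the viewport formulas above translate directly into code. In this sketch the function name and argument order are assumptions; the arithmetic follows the slide term by term, with (xd, yd, zd) being the normalized device coordinates after the divide.

```python
def viewport_transform(clip, x, y, w, h, n=0.0, f=1.0):
    """Map a clip-space vertex to window coordinates.

    clip is (xc, yc, zc, wc); x, y, w, h are the glViewport arguments and
    n, f the glDepthRange arguments.
    """
    xc, yc, zc, wc = clip
    # Divide by w: clip coordinates -> normalized device coordinates.
    xd, yd, zd = xc / wc, yc / wc, zc / wc
    px, py = w, h
    ox, oy = x + w / 2, y + h / 2
    xw = (px / 2) * xd + ox
    yw = (py / 2) * yd + oy
    zw = ((f - n) / 2) * zd + (n + f) / 2
    return xw, yw, zw


center = viewport_transform((0.0, 0.0, 0.0, 1.0), 0, 0, 640, 480)
corner = viewport_transform((2.0, -2.0, 2.0, 2.0), 0, 0, 640, 480)
```

The center of the view volume lands in the middle of the viewport at mid depth, and the (1, -1, 1) corner of the volume lands on the right edge, bottom row, far depth.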

Post Vertex Shader
- Back face culling:
  - Can be calculated using the area of the triangle (determinant of the three vertices in homogeneous coordinates).
  - Negative or positive area.
  - Can also be used to cull zero area triangles.
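A sketch of the determinant test follows. It builds the 3x3 matrix from the (x, y, w) components of the three vertices and uses the sign of its determinant. The example assumes counter-clockwise front faces and positive w for all three vertices; with other conventions the sign interpretation flips. The function names are invented for the example.

```python
def homogeneous_area_sign(v0, v1, v2):
    """Determinant of the rows (x, y, w) of the three vertices."""
    x0, y0, w0 = v0
    x1, y1, w1 = v1
    x2, y2, w2 = v2
    return (x0 * (y1 * w2 - y2 * w1)
            - y0 * (x1 * w2 - x2 * w1)
            + w0 * (x1 * y2 - x2 * y1))


def classify(v0, v1, v2):
    """Return "front", "back" or "zero-area" (assumes CCW front, w > 0)."""
    det = homogeneous_area_sign(v0, v1, v2)
    if det > 0:
        return "front"
    if det < 0:
        return "back"
    return "zero-area"


ccw = classify((0, 0, 1), (1, 0, 1), (0, 1, 1))
cw = classify((0, 0, 1), (0, 1, 1), (1, 0, 1))
flat = classify((0, 0, 1), (1, 1, 1), (2, 2, 1))
```

Because the test needs no divide by w, it can run before the perspective projection step.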

Post Vertex Shader
- Discard degenerate triangles:
  - If two or more vertices are the same (checked either by index or by full vertex comparison) the triangle can be discarded.

Rasterization
- Alternatives:
  - Scanline incremental interpolation (DDA).
  - Rasterization in homogeneous coordinates.
- Two phases:
  - Triangle setup.
    - Set interpolation registers.
  - Fragment generation.
    - Incrementally update the interpolants.
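The two phases can be illustrated with a small edge-function rasterizer. This is a plain 2D screen-space variant, chosen for brevity; it is neither the scanline DDA nor the homogeneous-coordinate method listed above. It assumes integer vertex positions, counter-clockwise triangles, samples at integer pixel coordinates, and no fill rule for shared edges. Setup computes the coefficients of E(x, y) = A*x + B*y + C for each edge; fragment generation then only adds A per step in x and B per step in y.

```python
def triangle_setup(v0, v1, v2):
    """Return the (A, B, C) coefficients of the three edge functions."""
    edges = []
    for (xa, ya), (xb, yb) in ((v0, v1), (v1, v2), (v2, v0)):
        # E(x, y) = A*x + B*y + C is >= 0 on the inner side of a CCW edge.
        edges.append((ya - yb, xb - xa, xa * yb - xb * ya))
    return edges


def rasterize(v0, v1, v2):
    """Generate the covered pixels by incrementally updating the edges."""
    edges = triangle_setup(v0, v1, v2)
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    # Interpolation registers: edge values at the bounding box origin.
    row = [a * min_x + b * min_y + c for a, b, c in edges]
    fragments = []
    for y in range(min_y, max_y + 1):
        values = list(row)
        for x in range(min_x, max_x + 1):
            if all(v >= 0 for v in values):
                fragments.append((x, y))
            # Step right: add A to each edge value.
            values = [v + a for v, (a, _, _) in zip(values, edges)]
        # Step to the next row: add B to each row start value.
        row = [v + b for v, (_, b, _) in zip(row, edges)]
    return fragments


covered = rasterize((0, 0), (4, 0), (0, 4))
```

Vertex attributes such as color or depth can be interpolated the same way, with one (A, B, C) triple per attribute computed during setup.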