1
Status – Week 260 Victor Moya
2
Summary
- shSim.
- GPU design.
- Future Work.
- Rumors and News.
- Imagine.
3
shSim
Currently working:
- Command Processor: reads a text-based trace file (programs, parameters, vertices, commands to the rasterizer).
- Shader: simulates an N-thread multithreaded, VS1-capable ‘vertex’ shader with variable-latency support.
- Rasterizer: OpenGL ‘emulator’; accepts resolution and clip plane changes, receives ‘shaded’ vertices from the shader (only 2 QuadFloats: vertex position + color), and displays the triangles in a GL window.
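To illustrate why a multithreaded shader with variable instruction latency is worth simulating, here is a minimal sketch of a single-issue shader shared by several threads. It is not shSim code: the function name `run_shader`, the round-robin policy, and the rule that a thread waits for its previous instruction to finish are all assumptions made for the example.

```python
from collections import deque


def run_shader(thread_programs):
    """Hypothetical model of a single-issue, in-order multithreaded shader.

    thread_programs holds one list of instruction latencies (in cycles) per
    thread. A thread may not issue its next instruction until the previous
    one has finished. Returns the cycle at which the last instruction ends.
    """
    pc = [0] * len(thread_programs)              # next instruction per thread
    ready_at = [0] * len(thread_programs)        # cycle a thread may issue again
    order = deque(range(len(thread_programs)))   # round-robin order (assumption)
    cycle = 0
    finish = 0
    while any(pc[t] < len(p) for t, p in enumerate(thread_programs)):
        for _ in range(len(order)):
            t = order[0]
            order.rotate(-1)
            if pc[t] < len(thread_programs[t]) and ready_at[t] <= cycle:
                ready_at[t] = cycle + thread_programs[t][pc[t]]
                finish = max(finish, ready_at[t])
                pc[t] += 1
                break                            # at most one issue per cycle
        cycle += 1
    return finish


one_thread = run_shader([[3, 3, 3]])
two_threads = run_shader([[3, 3, 3], [3, 3, 3]])
```

Under these assumptions the second thread fills most of the cycles the first one spends waiting, so twice the work finishes only slightly later than the single-thread run.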
4
shSim
Tests:
- 2/4 multithread (with another 2/4 input buffers) single shader.
- Fixed latency of 3 cycles. Shader to Rasterizer latency of 4. CommandProcessor to Rasterizer latency of 6.
- Simple coordinate change traces (shader.input, shader.input.2).
- Ripple vertex shader example from DX8 & DX9 SDK (ripple.input):
  - Around 300 triangles (1100 vertices).
  - Color is calculated from vertex position.
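The fixed latencies between boxes used in these tests can be modelled as delay queues. The sketch below is one possible way to do it; the `Signal` class and its methods are hypothetical names, not the shSim interface. Only the latency values 4 and 6 come from the test configuration above.

```python
from collections import deque


class Signal:
    """Hypothetical fixed-latency link between two simulator boxes.

    Data written at cycle t becomes readable at cycle t + latency.
    """

    def __init__(self, latency):
        self.latency = latency
        self._in_flight = deque()   # (arrival_cycle, data), in write order

    def write(self, cycle, data):
        self._in_flight.append((cycle + self.latency, data))

    def read(self, cycle):
        """Return every item whose arrival cycle has been reached."""
        arrived = []
        while self._in_flight and self._in_flight[0][0] <= cycle:
            arrived.append(self._in_flight.popleft()[1])
        return arrived


# Latencies taken from the test setup described above.
shader_to_rasterizer = Signal(latency=4)
command_to_rasterizer = Signal(latency=6)
```

Each box would call `read` once per simulated cycle on its inputs and `write` on its outputs, so changing a latency only means changing one constructor argument.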
5
shSim
Ripple.vsh.
6
shSim
Screenshots from frames rendered by shSim:
16
GPU Architecture
Based on current GPUs:
- NV30
- R300
Based on other graphics processors:
- PS3
- Imagine
17
GPU Architecture
Based on an API:
- DX8
- DX9
- DX10
- OpenGL 1.4 and extensions.
- OpenGL 2.0
Based on an architecture model:
- Vector
- Scalar
- Multithreaded
18
GPU Specification
Shader Model:
- Language:
  - DX9:
    - VS2.0/PS2.0.
    - VS3.0/PS3.0.
  - OpenGL:
    - NV_vertex_program_2/NV_fragment_program.
    - ARB_vertex_program/ARB_fragment_program.
  - Our own language.
19
GPU Specification
Shader Architecture:
- Architectural model:
  - Scalar.
  - SIMD.
  - Multithreaded.
  - Vector.
  - Out-of-order.
20
GPU Specification
Configuration:
- Integer Unit:
  - Number.
  - Precision.
  - SIMD or scalar?
- Floating Point Unit:
  - Number.
  - Precision.
  - SIMD or scalar?
21
GPU Specification
- Memory Unit:
  - Number.
  - Texture modes.
  - Filtering modes.
- Register Banks:
  - Number.
  - Ports.
  - Size.
  - Scalar or SIMD?
22
XBOX (NV2A) Vertex Shader
23
Future Work
Shader:
- Add branch/call/ret instructions.
- Add texture instructions (Pixel Shader).
Command Processor:
- Define a trace specification: binary, gzipped?
- Define an interface with OpenGL (Mesa?) or DX8/DX9 (driver?).
Primitive Assembly:
- Implement vertex cache and primitive assembly (only triangles?).
- Implement culling and clipping?
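As a rough idea of what the planned vertex cache and triangle-only primitive assembly could look like, here is a small sketch. The FIFO replacement policy, the class name `VertexCache`, and the `shade` callback standing in for the vertex shader are assumptions for illustration; the slides do not fix any of them.

```python
from collections import OrderedDict


class VertexCache:
    """Hypothetical post-shading vertex cache with FIFO replacement."""

    def __init__(self, size, shade):
        self.size = size
        self.shade = shade              # callback standing in for the shader
        self.entries = OrderedDict()    # vertex index -> shaded vertex
        self.hits = 0
        self.misses = 0

    def fetch(self, index):
        if index in self.entries:
            self.hits += 1
            return self.entries[index]
        self.misses += 1
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)    # evict the oldest entry
        self.entries[index] = self.shade(index)
        return self.entries[index]


def assemble_triangles(indices, cache):
    """Group an indexed triangle list into tuples of three shaded vertices.

    Trailing indices that do not complete a triangle are ignored.
    """
    usable = len(indices) - len(indices) % 3
    return [tuple(cache.fetch(i) for i in indices[k:k + 3])
            for k in range(0, usable, 3)]


cache = VertexCache(size=4, shade=lambda i: ("shaded", i))
triangles = assemble_triangles([0, 1, 2, 2, 1, 3], cache)
```

In this example the two triangles share an edge, so two of the six vertex fetches hit the cache and the shader runs only four times.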
24
Future Work
Deferred rendering?
- Transformed geometry must be stored in video memory.
- Geometry must be sorted:
  - Tiles.
  - Front to back.
Rasterization:
- Triangle Setup and Fragment Generation.
- Any suitable method: Olano & Greer, DDA?
- MSAA support?
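The tile sorting step for deferred rendering could be sketched as a bounding-box binning pass over the transformed triangles. This is only one simple option, shown under assumptions: screen-space 2D vertices, a 16-pixel tile size, and the hypothetical function name `bin_triangles`. A bounding box is conservative, so a triangle may be listed in a tile it does not actually cover.

```python
import math


def bin_triangles(triangles, width, height, tile=16):
    """Map each tile (tx, ty) to the indices of the triangles whose
    screen-space bounding box overlaps it (hypothetical sketch)."""
    tiles_x = math.ceil(width / tile)
    tiles_y = math.ceil(height / tile)
    bins = {}
    for idx, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles whose bounding box lies entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        tx0 = max(math.floor(min(xs)) // tile, 0)
        tx1 = min(math.floor(max(xs)) // tile, tiles_x - 1)
        ty0 = max(math.floor(min(ys)) // tile, 0)
        ty1 = min(math.floor(max(ys)) // tile, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(idx)
    return bins


scene = [((1, 1), (10, 1), (1, 10)), ((10, 10), (40, 10), (10, 20))]
bins = bin_triangles(scene, width=64, height=32)
```

Front-to-back ordering inside each tile would need a depth value per triangle, which this 2D sketch leaves out.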
25
Future Work
Early Z and Hierarchical Z?
Pixel Shader:
- Implement unified with vertex shaders?
- Queue/buffering mechanism? (memory/texture latency very large).
Pixel Shader:
- Unified shader architecture?
- Pixels need a lot of buffering (memory/texture operations).
- Implement a TMU simulator (filter algorithms, memory access, texture compression, cache).
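The benefit of an early Z test can be shown with a small sketch that runs the depth test before the pixel shader and counts how many fragments are actually shaded. It assumes a LESS depth function, a depth buffer cleared to 1.0, and a shader that neither changes depth nor discards fragments; the function name `shade_with_early_z` is hypothetical.

```python
def shade_with_early_z(fragments, width, height, shade):
    """Depth-test each (x, y, z) fragment before shading it.

    Returns the color buffer and the number of shader invocations.
    Hypothetical sketch: assumes a LESS depth test and a shader that
    does not modify depth or kill fragments.
    """
    depth = [[1.0] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, z in fragments:
        if z >= depth[y][x]:
            continue                     # hidden fragment: shader is skipped
        depth[y][x] = z
        color[y][x] = shade(x, y, z)
        shaded += 1
    return color, shaded


layers = [(0, 0, 0.2), (0, 0, 0.5), (0, 0, 0.8)]
_, front_to_back = shade_with_early_z(layers, 1, 1, lambda x, y, z: z)
_, back_to_front = shade_with_early_z(layers[::-1], 1, 1, lambda x, y, z: z)
```

With the three overlapping fragments submitted front to back only the nearest one is shaded, while the reversed order shades all three, which is why front-to-back sorting and early Z are usually considered together.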
26
Future Work
Fixed fragment operations:
- Implement using the shader?
- Fog: remove?
- Pixel Ownership: remove?
- Scissor Test: implement (needed if clipping is not implemented).
- Alpha test: same as Z Test.
- Z Test and Stencil Test: must be implemented, but could be added to a generic shader unit?
- Blending: add to shader?
- Dithering: remove.
- Logical Op: remove or add to shader.
- MSAA Operations: ?
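For orientation, the stages that would remain after the removals proposed above can be chained as in the following sketch, which follows the OpenGL per-fragment order (scissor, alpha test, depth test, blending). It is a simplification under stated assumptions: the stencil test is omitted, the alpha function is GREATER, the depth function is LESS, blending is fixed to source-alpha over, and `process_fragment` is a hypothetical name.

```python
def process_fragment(frag, framebuffer, depthbuffer, scissor, alpha_ref=0.0):
    """Apply scissor, alpha, depth and blend stages to one fragment.

    frag is (x, y, z, (r, g, b, a)); scissor is (x0, y0, x1, y1) with an
    exclusive upper bound. Returns the name of the stage that rejected the
    fragment, or "written". Hypothetical sketch, stencil test omitted.
    """
    x, y, z, (r, g, b, a) = frag
    x0, y0, x1, y1 = scissor
    if not (x0 <= x < x1 and y0 <= y < y1):
        return "scissor"
    if a <= alpha_ref:                  # alpha function GREATER (assumption)
        return "alpha"
    if z >= depthbuffer[y][x]:          # depth function LESS (assumption)
        return "depth"
    depthbuffer[y][x] = z
    dr, dg, db = framebuffer[y][x]
    # Blend: source alpha over destination (assumption).
    framebuffer[y][x] = (a * r + (1 - a) * dr,
                         a * g + (1 - a) * dg,
                         a * b + (1 - a) * db)
    return "written"


fb = [[(0.0, 0.0, 0.0)] * 2 for _ in range(2)]
zb = [[1.0] * 2 for _ in range(2)]
result = process_fragment((0, 0, 0.5, (1.0, 0.0, 0.0, 0.5)), fb, zb, (0, 0, 2, 2))
```

Every stage here is a short comparison or arithmetic expression, which is the reason the slide asks whether these operations could run on a generic shader unit.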
27
Future Work
Framebuffer:
- Z compression.
- Color compression.
- SSAA or MSAA support?
28
News and Rumors
NV30 architecture:
- 4x2 pixel pipes?
- 8x zixel pipes (Z Test & Stencil only).
ATI ready to release R350 and RV350 in a couple of weeks.
- R350: updated R300 core with additional features (?) and increased clock frequency (375 – 400 MHz).
- RV350: value chip based on the R300 core. Maybe 8x1 core, 128-bit bus. Clock frequency 300 – 400 MHz. 75 million transistors.
29
Imagine
- ‘Computer Graphics on a Stream Architecture’, John Douglas Owens, PhD dissertation.
- Not read yet either.