1
Status – Week 230
Victor Moya
2
Summary
- Simulator parameters.
- Occlusion culling (Z-Buffer).
- To be done.
3
Simulator Parameters
- Parametrized boxes:
  - Size of data buses.
  - Start latencies.
  - Buffer and memory sizes.
  - Signal bandwidths and latencies.
- Alternatives:
  - Signals are parametrized globally.
  - Box features are parametrized through the box itself.
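As a sketch of the second alternative, each box could receive its features in one parameter struct passed to its constructor. `BoxParams`, `ExampleBox` and the timing model in `cyclesFor` are hypothetical illustrations, not the simulator's classes; `u32bit` is assumed to be the 32-bit unsigned typedef used in the simulator's signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Assumed equivalent of the simulator's u32bit typedef.
using u32bit = std::uint32_t;

// Hypothetical per-box parameters, one field per feature listed on the slide.
struct BoxParams {
    u32bit dataBusWidth;  // items moved per cycle on the box's data bus
    u32bit startLatency;  // cycles before the first item leaves the box
    u32bit bufferSize;    // entries in the box's input buffer
};

// Hypothetical box: all tunable features arrive through the constructor,
// so two instances of the same box can be configured differently.
class ExampleBox {
public:
    ExampleBox(std::string name, const BoxParams& params)
        : name_(std::move(name)), params_(params) {
        assert(params_.dataBusWidth > 0 && params_.bufferSize > 0);
    }

    // Simple assumed timing model: start latency plus one cycle per
    // bus-width chunk of items.
    std::uint64_t cyclesFor(std::uint64_t items) const {
        const std::uint64_t chunks = (items + params_.dataBusWidth - 1) / params_.dataBusWidth;
        return params_.startLatency + chunks;
    }

    bool fitsInBuffer(u32bit entries) const { return entries <= params_.bufferSize; }
    const std::string& name() const { return name_; }

private:
    std::string name_;
    BoxParams params_;
};
```

With a 4-wide bus and a start latency of 2, moving 8 items takes 4 cycles under this model and moving 9 items takes 5.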
4
Simulator Parameters
- Example (current constructor):
    ShaderFetch(
        ShaderEmulator *shEmu,
        u32bit nThreads,
        u32bit nInputBuffers,
        u32bit issue,
        char *name,
        char *prefix,
        Box *parent
    )
5
Simulator Parameters
- Example:
  - Promote MAXSHADERTHREADINSTRUCTIONS to a parameter?
  - OUTPUT_TRANSMISSION_LATENCY 0.5 => promote to a parameter.
  - OUTPUT_DELAY_LATENCY => promote to a parameter (related to a signal).
- Signals:
  - CommShaderCommand 1, 1
  - ShaderCommand 1, 1
  - ShaderNewPC 2, 1
  - ShaderDecodeState 1, 1
  - ShaderState 1, 1
  - ShaderInstruction 1, 1
  - ShaderOutput 1, 11
  - ConsumerState 1, 1
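If signals are parametrized globally, the values listed above could live in a single table that every box consults when it creates its signals. A minimal sketch, assuming the two numbers per signal are (bandwidth, latency) in that order, which the slide does not state; `SignalParams`, `signalTable` and `signalParams` are hypothetical names. ShaderOutput is left out because its values ("1, 11") are ambiguous in this transcript.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Assumed meaning of the two numbers listed per signal on the slide.
struct SignalParams {
    std::uint32_t bandwidth;  // objects per cycle (assumption)
    std::uint32_t latency;    // cycles (assumption)
};

// Global table: every signal's parameters are defined in one place, so
// changing a latency does not require touching the boxes that use it.
// Values copied from the slide; ShaderOutput is omitted (ambiguous values).
const std::map<std::string, SignalParams>& signalTable() {
    static const std::map<std::string, SignalParams> table = {
        {"CommShaderCommand", {1, 1}},
        {"ShaderCommand", {1, 1}},
        {"ShaderNewPC", {2, 1}},
        {"ShaderDecodeState", {1, 1}},
        {"ShaderState", {1, 1}},
        {"ShaderInstruction", {1, 1}},
        {"ConsumerState", {1, 1}},
    };
    return table;
}

// Boxes look their signals up by name; std::map::at throws
// std::out_of_range for unknown names, so a misspelled signal fails loudly.
SignalParams signalParams(const std::string& name) {
    return signalTable().at(name);
}
```

Under this scheme a box's constructor would only name its signals, and the table, rather than the box, would decide their bandwidth and latency.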
6
Simulator Parameters
- Configuration file: bgpu.ini
    [Shader]
    fetchRate = 1
    numThreads = 4
    ...
- Command line options:
    -shaderFetchRate 1 -shaderThreads 1 …
- Configuration at compile time?
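A minimal sketch of reading bgpu.ini and then letting command line options override its values. The parser handles only `[Section]` headers and `key = value` lines; the `Section.key` naming, the comment characters and the option-to-key mapping are assumptions, not the simulator's actual format.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parameters keyed as "Section.key", e.g. "Shader.fetchRate".
using Config = std::map<std::string, std::string>;

static std::string trim(const std::string& s) {
    const std::size_t b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Parse INI-style text: "[Section]" headers and "key = value" lines.
// Lines starting with ';' or '#' are treated as comments (assumed convention).
Config parseIni(const std::string& text) {
    Config cfg;
    std::istringstream in(text);
    std::string line, section;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == ';' || line[0] == '#') continue;
        if (line.front() == '[' && line.back() == ']') {
            section = trim(line.substr(1, line.size() - 2));
            continue;
        }
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // ignore malformed lines
        cfg[section + "." + trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return cfg;
}

// Apply "-option value" pairs on top of the file values. The mapping from
// option name to config key is a hypothetical table; unknown options are skipped.
void applyCommandLine(Config& cfg, const std::vector<std::string>& args,
                      const std::map<std::string, std::string>& optionToKey) {
    assert(args.size() % 2 == 0);  // options come as "-name value" pairs
    for (std::size_t i = 0; i + 1 < args.size(); i += 2) {
        auto it = optionToKey.find(args[i]);
        if (it != optionToKey.end()) cfg[it->second] = args[i + 1];
    }
}
```

With this layering the file supplies the defaults and `-shaderThreads 1` replaces `Shader.numThreads`; compile-time configuration would instead fix the same values as constants.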
7
Occlusion Culling
- Try not to draw what does not need to be drawn.
- Main technique: Z-Buffer.
- But it uses a lot of bandwidth:
  - 1 read and 1 write if the fragment must be drawn.
  - 1 read if the fragment is not drawn.
- And it is done after all the 'hard work' has already been performed.
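The bandwidth cost described above can be made concrete with a Z buffer that counts its accesses. This sketch assumes a LESS depth test and a buffer cleared to 1.0 (the far plane); `CountingZBuffer` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Z buffer that counts its memory accesses, following the cost model on the
// slide: every tested fragment reads Z once; a passing fragment also writes Z.
class CountingZBuffer {
public:
    CountingZBuffer(int width, int height)
        : width_(width), depth_(static_cast<std::size_t>(width) * height, 1.0f) {
        assert(width > 0 && height > 0);
    }

    // LESS depth test; returns true if the fragment must be drawn.
    bool test(int x, int y, float z) {
        float& stored = depth_[static_cast<std::size_t>(y) * width_ + x];
        ++reads;          // the stored depth is always read
        if (z < stored) {
            stored = z;
            ++writes;     // written only when the fragment is visible
            return true;
        }
        return false;
    }

    std::uint64_t reads = 0;
    std::uint64_t writes = 0;

private:
    int width_;
    std::vector<float> depth_;
};
```

For one pixel receiving fragments at depths 0.5, 0.7 and 0.2 in that order, this performs 3 reads and 2 writes; drawing strictly back to front turns every read into a read plus a write.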
8
Occlusion Culling
- Alternatives:
  - Early Z:
    - Test before fragment processing.
    - Reduces the fill rate/fragment processing spent on occluded fragments.
    - The normal Z test still remains.
      - But its read is very likely to hit the cache.
    - Performance depends on the draw order (front to back).
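A small model of early Z for the fragments covering one pixel, comparing how many fragments reach the fragment shader with and without the early test. It assumes the shader does not modify depth (otherwise the early test cannot be used) and ignores caches; `ShadingCost` and `run` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ShadingCost {
    std::uint64_t shaded = 0;   // fragments sent to the fragment shader
    std::uint64_t written = 0;  // fragments that passed the Z test
};

// Processes the fragments of one pixel in submission order.
ShadingCost run(const std::vector<float>& fragmentDepths, bool earlyZ) {
    ShadingCost cost;
    float stored = 1.0f;  // depth buffer cleared to the far plane
    for (float z : fragmentDepths) {
        // In this sequential model the stored depth cannot change between the
        // early test and the normal test, so one comparison serves both.
        const bool visible = z < stored;
        if (earlyZ && !visible) continue;  // culled before shading
        ++cost.shaded;                     // without early Z every fragment is shaded
        if (visible) {                     // normal Z test still applied
            stored = z;
            ++cost.written;
        }
    }
    return cost;
}
```

For depths {0.1, 0.2, 0.3, 0.4} (front to back), early Z shades 1 fragment instead of 4; for the same depths in reverse order it still shades all 4, which is why performance depends on the draw order.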
9
Occlusion Culling
- Alternatives:
  - Hierarchical Z:
    - Store a less detailed Z map on chip.
    - ATI HyperZ: 8x8 blocks.
    - Reduces bandwidth.
    - Used together with early Z.
    - Performance depends on the draw order (front to back).
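A sketch of a hierarchical Z buffer with 8x8 blocks, assuming each block keeps the farthest depth stored in it, so that a whole block can be rejected when the nearest incoming depth is not closer. `HierarchicalZ` is hypothetical, and recomputing the block maximum on every write is a simplification chosen for clarity, not a description of how HyperZ updates its map.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Two-level Z: a full-resolution buffer plus, per 8x8 block, the farthest
// (maximum) depth stored in that block. Only the coarse map would sit on chip.
class HierarchicalZ {
public:
    static constexpr int kBlock = 8;

    HierarchicalZ(int width, int height)
        : w_(width), h_(height), bw_((width + kBlock - 1) / kBlock),
          depth_(static_cast<std::size_t>(width) * height, 1.0f),
          blockMax_(static_cast<std::size_t>(bw_) * ((height + kBlock - 1) / kBlock), 1.0f) {
        assert(width > 0 && height > 0);
    }

    // Conservative test on the coarse map only: if the nearest incoming depth
    // in this block (zMin) is not closer than the farthest stored depth,
    // every incoming fragment in the block is hidden.
    bool blockOccluded(int bx, int by, float zMin) const {
        return zMin >= blockMax_[static_cast<std::size_t>(by) * bw_ + bx];
    }

    // Per-fragment LESS test on the full buffer; keeps the coarse map exact.
    bool test(int x, int y, float z) {
        float& stored = depth_[static_cast<std::size_t>(y) * w_ + x];
        if (z >= stored) return false;
        stored = z;
        updateBlock(x / kBlock, y / kBlock);
        return true;
    }

private:
    // Rescans the whole block: simple and exact, but costly.
    void updateBlock(int bx, int by) {
        float m = 0.0f;
        for (int y = by * kBlock; y < std::min(h_, (by + 1) * kBlock); ++y)
            for (int x = bx * kBlock; x < std::min(w_, (bx + 1) * kBlock); ++x)
                m = std::max(m, depth_[static_cast<std::size_t>(y) * w_ + x]);
        blockMax_[static_cast<std::size_t>(by) * bw_ + bx] = m;
    }

    int w_, h_, bw_;
    std::vector<float> depth_;
    std::vector<float> blockMax_;
};
```

Once every pixel of a block holds depth 0.3, a later primitive whose nearest depth in that block is 0.5 is rejected without reading the full-resolution buffer at all.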
10
Occlusion Culling
- Alternatives:
  - Z compression:
    - Block based (ATI: 8x8).
    - Differential based (fragments of the same primitive tend to have similar Z).
    - Reduces bandwidth.
  - Fast Z Buffer Clear:
    - Reduces bandwidth.
    - Reduces scene initialization time.
  - Blocks: compressed or cleared.
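A sketch of per-block Z state with fast clear and a simple base-plus-delta form of differential compression. Assumptions: 8x8 blocks of 32-bit fixed-point depths, block state kept on chip, and a fixed delta width; real differential schemes may encode a plane or higher-order differences instead. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-block state for an 8x8 Z block.
enum class BlockState { Cleared, Compressed, Uncompressed };

constexpr std::uint32_t kBlockPixels = 64;

// Differential check: compressible if every depth differs from the block's
// first value by an amount that fits in `deltaBits` signed bits.
bool compressible(const std::vector<std::int32_t>& block, int deltaBits) {
    assert(block.size() == kBlockPixels && deltaBits > 0 && deltaBits < 32);
    const std::int64_t limit = std::int64_t{1} << (deltaBits - 1);
    for (std::int32_t z : block) {
        const std::int64_t d = static_cast<std::int64_t>(z) - block[0];
        if (d < -limit || d >= limit) return false;
    }
    return true;
}

// Bytes read from memory to fetch one block with 32-bit depths: a cleared
// block needs no access (the clear value is on chip), a compressed block
// needs the base value plus the packed deltas.
std::uint32_t bytesToRead(BlockState s, std::uint32_t deltaBits) {
    switch (s) {
        case BlockState::Cleared:      return 0;
        case BlockState::Compressed:   return 4 + (kBlockPixels * deltaBits + 7) / 8;
        case BlockState::Uncompressed: return 4 * kBlockPixels;
    }
    return 4 * kBlockPixels;
}

// Fast clear: mark every block as cleared instead of writing the clear value.
void fastClear(std::vector<BlockState>& states) {
    for (auto& s : states) s = BlockState::Cleared;
}
```

Under these assumptions a 64-pixel ramp of depths 1000 to 1063 compresses with 8-bit deltas into 68 bytes instead of 256, and a fast-cleared block is fetched with no memory traffic.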
11
Occlusion Culling
- Other alternatives:
  - Deferred rendering:
    - Order primitives based on their Z.
    - Ray trace the deferred primitives.
    - Tile primitives into an internal Z buffer (0 external bandwidth required).
  - Delay Streams.
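A sketch of the tiling approach: primitives are binned per screen tile, and visibility for each tile is resolved in a small internal Z buffer, so depth values never travel to external memory. To stay short it assumes axis-aligned rectangles at constant depth instead of triangles, and 16x16 tiles; `Rect` and `renderTiled` are hypothetical. The output is the visible primitive id per pixel, so shading can be deferred until visibility is known.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Simplified primitive: rectangle [x0,x1) x [y0,y1) at constant depth z.
struct Rect { int x0, y0, x1, y1; float z; int id; };

constexpr int kTile = 16;  // tile size in pixels (assumed)

std::vector<int> renderTiled(const std::vector<Rect>& prims, int width, int height) {
    assert(width > 0 && height > 0);
    const int tilesX = (width + kTile - 1) / kTile, tilesY = (height + kTile - 1) / kTile;

    // Binning pass: record which primitives touch each tile.
    std::vector<std::vector<const Rect*>> bins(static_cast<std::size_t>(tilesX) * tilesY);
    for (const Rect& r : prims)
        for (int ty = std::max(0, r.y0 / kTile); ty <= std::min(tilesY - 1, (r.y1 - 1) / kTile); ++ty)
            for (int tx = std::max(0, r.x0 / kTile); tx <= std::min(tilesX - 1, (r.x1 - 1) / kTile); ++tx)
                bins[static_cast<std::size_t>(ty) * tilesX + tx].push_back(&r);

    std::vector<int> frame(static_cast<std::size_t>(width) * height, -1);
    for (int ty = 0; ty < tilesY; ++ty)
        for (int tx = 0; tx < tilesX; ++tx) {
            std::array<float, kTile * kTile> zTile;  // internal (on-chip) Z buffer
            std::array<int, kTile * kTile> idTile;
            zTile.fill(1.0f);
            idTile.fill(-1);
            for (const Rect* r : bins[static_cast<std::size_t>(ty) * tilesX + tx])
                for (int y = std::max(r->y0, ty * kTile); y < std::min(r->y1, (ty + 1) * kTile); ++y)
                    for (int x = std::max(r->x0, tx * kTile); x < std::min(r->x1, (tx + 1) * kTile); ++x) {
                        const int i = (y - ty * kTile) * kTile + (x - tx * kTile);
                        if (r->z < zTile[i]) { zTile[i] = r->z; idTile[i] = r->id; }
                    }
            // Only resolved ids leave the chip: one external write per pixel.
            for (int y = ty * kTile; y < std::min(height, (ty + 1) * kTile); ++y)
                for (int x = tx * kTile; x < std::min(width, (tx + 1) * kTile); ++x)
                    frame[static_cast<std::size_t>(y) * width + x] =
                        idTile[(y - ty * kTile) * kTile + (x - tx * kTile)];
        }
    return frame;
}
```

The design trades external Z bandwidth for a binning pass and storage for the per-tile primitive lists.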
12
Occlusion Culling
- Related topics:
  - Coverage:
    - Supersampling.
    - Multisampling.
  - Occlusion queries.
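A sketch tying the related topics together: a multisampled depth buffer that runs the Z test per covered sample, with an occlusion query that counts the samples passing it (as OpenGL's GL_SAMPLES_PASSED queries do). `MsaaDepth` and its methods are hypothetical; the coverage mask is assumed to come from rasterization.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Multisampled depth buffer: each pixel stores `samples` depth values.
class MsaaDepth {
public:
    MsaaDepth(int pixels, int samples)
        : samples_(samples), depth_(static_cast<std::size_t>(pixels) * samples, 1.0f) {
        assert(pixels > 0 && samples > 0 && samples <= 32);
    }

    // Occlusion query: counts samples passing the depth test between
    // beginQuery() and endQuery().
    void beginQuery() { queryActive_ = true; samplesPassed_ = 0; }
    std::uint64_t endQuery() { queryActive_ = false; return samplesPassed_; }

    // Tests one fragment: bit s of `coverage` marks sample s as covered and
    // z[s] is the depth at that sample. Returns true if any sample passed,
    // i.e. the pixel needs one shader invocation under multisampling.
    bool testFragment(int pixel, std::uint32_t coverage, const std::vector<float>& z) {
        assert(z.size() == static_cast<std::size_t>(samples_));
        bool anyPassed = false;
        for (int s = 0; s < samples_; ++s) {
            if (!(coverage & (1u << s))) continue;
            float& stored = depth_[static_cast<std::size_t>(pixel) * samples_ + s];
            if (z[s] < stored) {
                stored = z[s];
                anyPassed = true;
                if (queryActive_) ++samplesPassed_;
            }
        }
        return anyPassed;
    }

private:
    int samples_;
    std::vector<float> depth_;
    bool queryActive_ = false;
    std::uint64_t samplesPassed_ = 0;
};
```

Under multisampling a pixel costs one shader invocation whenever `testFragment` returns true; under supersampling each covered sample would be shaded separately.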
13
To be done
- Change the batch synchronization mechanism to the one discussed the other week:
  - A last 'special' triangle marks the end of the batch.
  - Boxes wait for that triangle.
- Decide which synchronization mechanism will be used between consumer/producer boxes:
  - A unified mechanism?
  - Or whatever works.
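A sketch of the proposed batch synchronization, assuming a hypothetical `lastInBatch` flag on the special triangle: each box forwards triangles, including the marker, treats the batch as finished only once the marker has passed through it, and then waits until the next batch is started. The queues stand in for whatever producer/consumer mechanism is finally chosen; `Triangle` and `PipelineBox` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// A triangle as it flows between boxes; `lastInBatch` marks the special
// triangle that closes a batch.
struct Triangle {
    std::uint32_t id;
    bool lastInBatch;
};

// Hypothetical box: no separate end-of-batch signal is needed, because the
// marker triangle travels through the pipeline with the batch.
class PipelineBox {
public:
    // Simulates one cycle: processes at most one triangle.
    void clock(std::deque<Triangle>& in, std::deque<Triangle>& out) {
        if (batchDone_ || in.empty()) return;  // after the marker, wait
        const Triangle t = in.front();
        in.pop_front();
        ++processed_;
        out.push_back(t);                      // the marker is forwarded too
        if (t.lastInBatch) batchDone_ = true;
    }

    bool batchDone() const { return batchDone_; }
    void startBatch() { batchDone_ = false; processed_ = 0; }
    std::uint32_t processed() const { return processed_; }

private:
    bool batchDone_ = false;
    std::uint32_t processed_ = 0;
};
```

Chaining two such boxes and clocking them from the end of the pipeline backwards, the downstream box sees the marker of a three-triangle batch after 4 simulated cycles.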