Status – Week 230
Victor Moya

Summary
- Simulator parameters.
- Occlusion culling (Z-Buffer).
- To be done.

Simulator Parameters
- Parametrized boxes:
  - Size of data buses.
  - Start latencies.
  - Buffer and memory sizes.
  - Signal bandwidths and latencies.
- Alternatives:
  - Signals are parametrized globally.
  - Box features are parametrized through the box itself.
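
To make "signal bandwidth and latency" concrete, here is a minimal sketch of a parametrized signal, assuming bandwidth means "objects that may be written per cycle" and latency means "cycles until a written object can be read". The `Signal` class and its methods are hypothetical illustrations, not the simulator's actual interface.

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Hypothetical parametrized signal (wire) between two boxes.
// bandwidth: max objects written per cycle; latency: cycles until readable.
template <typename T>
class Signal {
public:
    Signal(unsigned bandwidth, unsigned latency)
        : bandwidth_(bandwidth), latency_(latency) {
        assert(bandwidth > 0 && latency > 0);
    }

    // Writes must arrive in non-decreasing cycle order.
    // Returns false if this cycle's bandwidth is already used up.
    bool write(unsigned long cycle, const T& obj) {
        if (cycle != lastWriteCycle_) { lastWriteCycle_ = cycle; writesThisCycle_ = 0; }
        if (writesThisCycle_ == bandwidth_) return false;
        ++writesThisCycle_;
        inFlight_.push_back({cycle + latency_, obj});
        return true;
    }

    // Reads the oldest object whose latency has elapsed by 'cycle'.
    bool read(unsigned long cycle, T& out) {
        if (inFlight_.empty() || inFlight_.front().first > cycle) return false;
        out = inFlight_.front().second;
        inFlight_.pop_front();
        return true;
    }

private:
    unsigned bandwidth_, latency_;
    unsigned long lastWriteCycle_ = ~0UL;   // "no write yet"
    unsigned writesThisCycle_ = 0;
    std::deque<std::pair<unsigned long, T>> inFlight_;  // (ready cycle, object)
};
```

Whether these two numbers live in one global table or are passed to each box is exactly the choice between the two alternatives listed on the slide; the class itself works the same either way.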

Simulator Parameters
- Example (current):

    ShaderFetch(
        ShaderEmulator *shEmu,
        u32bit nThreads,
        u32bit nInputBuffers,
        u32bit issue,
        char *name,
        char *prefix,
        Box *parent
    )

Simulator Parameters
- Examples:
  - Promote MAXSHADERTHREADINSTRUCTIONS to a parameter?
  - OUTPUT_TRANSMISSION_LATENCY 0.5 => promote to a parameter.
  - OUTPUT_DELAY_LATENCY => promote to a parameter (it is related to a signal).
- Signals:
  - CommShaderCommand 1, 1
  - ShaderCommand 1, 1
  - ShaderNewPC 2, 1
  - ShaderDecodeState 1, 1
  - ShaderState 1, 1
  - ShaderInstruction 1, 1
  - ShaderOutput 1, 11
  - ConsumerState 1, 1
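
One way to promote these constants is sketched below: the old `#define` value becomes a field default, and the signal list becomes a default table that configuration can override. Reading each signal's two numbers as (bandwidth, latency) is an assumption, and `ShaderOutputParams`, `SignalParams`, `defaultSignalTable` and `overrideSignal` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef unsigned int u32bit;  // assumed to match the simulator's typedef

// Before: a compile-time constant, fixed for every run:
//   #define OUTPUT_TRANSMISSION_LATENCY 0.5
// After: a field whose default is the old constant, so a configuration
// file or command-line option can change it without recompiling.
struct ShaderOutputParams {
    double transmissionLatency = 0.5;  // was OUTPUT_TRANSMISSION_LATENCY
};

// Assumed meaning of the two numbers per signal on the slide.
struct SignalParams {
    u32bit bandwidth;
    u32bit latency;
};

// Default signal table copied from the slide.
std::map<std::string, SignalParams> defaultSignalTable() {
    return {
        {"CommShaderCommand", {1, 1}},
        {"ShaderCommand", {1, 1}},
        {"ShaderNewPC", {2, 1}},
        {"ShaderDecodeState", {1, 1}},
        {"ShaderState", {1, 1}},
        {"ShaderInstruction", {1, 1}},
        {"ShaderOutput", {1, 11}},
        {"ConsumerState", {1, 1}},
    };
}

// Replaces one default; an unknown name is treated as a configuration error.
void overrideSignal(std::map<std::string, SignalParams>& table,
                    const std::string& name, SignalParams p) {
    assert(table.count(name) == 1 && "unknown signal name");
    table[name] = p;
}
```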

Simulator Parameters
- Configuration file: bgpu.ini

    [Shader]
    fetchRate = 1
    numThreads = 4

- Command line options:

    -shaderFetchRate 1 -shaderThreads 1 …

- Configuration at compile time?
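
The two mechanisms can coexist: read bgpu.ini first, then let command-line options override individual keys. This is a minimal sketch under that assumption; the `Section.key` naming and the mapping from `-shaderThreads` to `Shader.numThreads` are hypothetical choices, not the simulator's defined behavior.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical configuration store: "Section.key" -> value.
typedef std::map<std::string, std::string> Config;

static std::string trim(const std::string& s) {
    size_t b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    size_t e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Parses [Section] headers and key = value lines; ';' starts a comment line.
Config parseIni(std::istream& in) {
    Config cfg;
    std::string line, section;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == ';') continue;
        if (line.front() == '[' && line.back() == ']') {
            section = line.substr(1, line.size() - 2);
            continue;
        }
        size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // ignore malformed lines
        cfg[section + "." + trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return cfg;
}

// Command-line options take precedence over values read from the file.
void applyArgs(Config& cfg, const std::vector<std::string>& args) {
    static const std::map<std::string, std::string> optionToKey = {
        {"-shaderFetchRate", "Shader.fetchRate"},   // assumed mapping
        {"-shaderThreads", "Shader.numThreads"},    // assumed mapping
    };
    for (size_t i = 0; i + 1 < args.size(); ++i) {
        auto it = optionToKey.find(args[i]);
        if (it != optionToKey.end()) { cfg[it->second] = args[i + 1]; ++i; }
    }
}
```

Compile-time configuration (the third option) would instead bake these values into constants, trading flexibility for a rebuild per experiment.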

Occlusion Culling
- Try not to draw what does not need to be drawn.
- Main technique: Z-Buffer.
- But it uses a lot of bandwidth:
  - 1 read and 1 write if the fragment must be drawn.
  - 1 read if the fragment is not drawn.
- And it is done after all the 'hard work' has already been performed.
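
The bandwidth cost above can be counted directly. This sketch assumes a GL_LESS-style test (smaller depth is closer) and one float per pixel. `ZBuffer` is a hypothetical name, and caches are ignored so every access counts as memory traffic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Z buffer that counts memory accesses per fragment:
// every fragment costs one read; a visible fragment costs one extra write.
struct ZBuffer {
    int width, height;
    std::vector<float> depth;            // cleared to the far plane (1.0)
    unsigned long reads = 0, writes = 0;

    ZBuffer(int w, int h) : width(w), height(h), depth(std::size_t(w) * h, 1.0f) {}

    // Returns true if the fragment is visible (and updates the stored depth).
    bool test(int x, int y, float z) {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        float& stored = depth[std::size_t(y) * width + x];
        ++reads;                         // read the stored depth
        if (z >= stored) return false;   // occluded: 1 read only
        stored = z;
        ++writes;                        // visible: 1 read + 1 write
        return true;
    }
};
```

If this test runs after fragment shading, a rejected fragment has already been shaded, which is the wasted "hard work" the slide refers to.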

Occlusion Culling
- Alternatives:
  - Early Z:
    - Test before fragment processing.
    - Reduces the fill rate/fragment processing wasted on occluded fragments.
    - The normal Z test still remains.
    - But its read will very likely hit the cache.
    - Performance depends on the draw order (front to back).
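
The dependence on draw order shows up even for a single pixel covered by several opaque layers. This sketch assumes smaller depth is closer and counts how many fragments reach the shader when the Z test runs before shading; `shadedWithEarlyZ` and `frontToBack` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One pixel covered by several opaque layers, given by their depths in
// submission order. With early Z a fragment is shaded only if it passes
// the depth test first. Returns the number of fragment shader invocations.
int shadedWithEarlyZ(const std::vector<float>& layerDepths) {
    float stored = 1.0f;   // cleared depth; smaller = closer
    int shaded = 0;
    for (float z : layerDepths) {
        if (z < stored) {  // early Z test passes
            ++shaded;      // fragment shader runs
            stored = z;    // depth update
        }
    }
    return shaded;
}

// Sorting by depth before submission approximates front-to-back order.
std::vector<float> frontToBack(std::vector<float> depths) {
    std::sort(depths.begin(), depths.end());
    return depths;
}
```

Front to back, only the nearest layer is shaded; back to front, every layer passes the early test and is shaded, so early Z saves nothing in that case.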

Occlusion Culling
- Alternatives:
  - Hierarchical Z:
    - Store a less detailed Z map on chip.
    - ATI HyperZ: 8x8 blocks.
    - Reduces bandwidth.
    - Used together with early Z.
    - Performance depends on the draw order (front to back).
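
A one-level hierarchical Z can be sketched as follows, assuming the on-chip map stores the farthest depth of each 8x8 block and smaller depth is closer. A fragment whose depth is behind that value is rejected without reading the full-resolution buffer. `HierZ` is hypothetical and is not ATI's HyperZ design. The block maximum is recomputed from the full buffer for simplicity, which a simulator can afford but hardware would avoid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-level hierarchical Z: per 8x8 block, keep the maximum
// (farthest) depth stored in that block. Smaller depth = closer.
class HierZ {
public:
    static constexpr int kBlock = 8;

    HierZ(int w, int h)
        : w_(w), h_(h), bw_((w + kBlock - 1) / kBlock), bh_((h + kBlock - 1) / kBlock),
          z_(std::size_t(w) * h, 1.0f), blockMax_(std::size_t(bw_) * bh_, 1.0f) {}

    // Coarse test: false means depth zMin is certainly hidden in this block.
    bool mayPass(int x, int y, float zMin) const {
        return zMin < blockMax_[blockIndex(x, y)];
    }

    // Coarse test first, then the full-resolution test and update.
    bool testAndWrite(int x, int y, float z) {
        assert(x >= 0 && x < w_ && y >= 0 && y < h_);
        if (!mayPass(x, y, z)) return false;     // rejected using on-chip data only
        float& stored = z_[std::size_t(y) * w_ + x];
        if (z >= stored) return false;           // fine-grained Z test
        stored = z;
        updateBlockMax(x / kBlock, y / kBlock);
        return true;
    }

private:
    int blockIndex(int x, int y) const { return (y / kBlock) * bw_ + x / kBlock; }

    // Recomputes the block's farthest depth from the full-resolution buffer.
    void updateBlockMax(int bx, int by) {
        float m = 0.0f;
        for (int y = by * kBlock; y < std::min(h_, (by + 1) * kBlock); ++y)
            for (int x = bx * kBlock; x < std::min(w_, (bx + 1) * kBlock); ++x)
                m = std::max(m, z_[std::size_t(y) * w_ + x]);
        blockMax_[std::size_t(by) * bw_ + bx] = m;
    }

    int w_, h_, bw_, bh_;
    std::vector<float> z_, blockMax_;
};
```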

Occlusion Culling
- Alternatives:
  - Z compression:
    - Block based (ATI: 8x8).
    - Differential (fragments of the same primitive tend to have similar Z values).
    - Reduces bandwidth.
  - Fast Z buffer clear:
    - Reduces bandwidth.
    - Reduces scene initialization time.
  - Blocks are marked as compressed or cleared.
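
The per-block bookkeeping behind compression and fast clear might look like the sketch below. The compression rule is a deliberately simplified stand-in, not ATI's differential format: a block of 24-bit depths compresses if every value fits as a small offset from the block minimum. A fast clear only resets block states, so no depth data is written. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-block state kept on chip for each 8x8 block.
enum class BlockState { Cleared, Compressed, Uncompressed };

constexpr int kPixels = 64;  // one 8x8 block of 24-bit integer depths

// Bytes transferred to read or write one block in each state.
int blockBytes(BlockState s, int offsetBits) {
    switch (s) {
        case BlockState::Cleared:    return 0;  // clear value is implied by the state
        case BlockState::Compressed: return 3 + (kPixels * offsetBits + 7) / 8;  // base + offsets
        default:                     return kPixels * 3;  // raw 24-bit depths
    }
}

// Simplified rule: compress if all offsets from the minimum fit in offsetBits.
BlockState classify(const std::vector<uint32_t>& block, int offsetBits) {
    assert(static_cast<int>(block.size()) == kPixels && offsetBits > 0 && offsetBits < 24);
    uint32_t lo = *std::min_element(block.begin(), block.end());
    uint32_t hi = *std::max_element(block.begin(), block.end());
    return (hi - lo) < (1u << offsetBits) ? BlockState::Compressed
                                          : BlockState::Uncompressed;
}

// Fast Z clear: mark every block cleared instead of writing the clear value.
void fastClear(std::vector<BlockState>& states) {
    std::fill(states.begin(), states.end(), BlockState::Cleared);
}
```

A block covered by one planar primitive has a small depth spread and compresses (67 bytes instead of 192 with 8-bit offsets); a block split between distant surfaces does not.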

Occlusion Culling
- Other alternatives:
  - Deferred rendering:
    - Sort primitives by their Z.
    - Ray trace the deferred primitives.
    - Tile primitives into an internal Z buffer (0 external bandwidth required).
  - Delay streams.
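
The tiling option relies on a binning pass. Each triangle is recorded in every screen tile its bounding box overlaps, and each tile can then be rendered against a small on-chip Z buffer. The sketch below shows only the binning step. Bounding-box binning is conservative, and the tile size and the `Tri`/`binTriangles` names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical screen-space triangle (vertex x and y coordinates in pixels).
struct Tri { float x[3], y[3]; };

// Returns, for each tile (row-major), the indices of triangles that may touch it.
std::vector<std::vector<int>> binTriangles(const std::vector<Tri>& tris,
                                           int width, int height, int tile) {
    assert(width > 0 && height > 0 && tile > 0);
    int tx = (width + tile - 1) / tile, ty = (height + tile - 1) / tile;
    std::vector<std::vector<int>> bins(tx * ty);
    for (int i = 0; i < (int)tris.size(); ++i) {
        const Tri& t = tris[i];
        float minX = std::min({t.x[0], t.x[1], t.x[2]});
        float maxX = std::max({t.x[0], t.x[1], t.x[2]});
        float minY = std::min({t.y[0], t.y[1], t.y[2]});
        float maxY = std::max({t.y[0], t.y[1], t.y[2]});
        if (maxX < 0 || maxY < 0 || minX >= width || minY >= height) continue;  // off screen
        // Clamp the bounding box to the screen and convert it to tile indices.
        int x0 = std::max(0, (int)minX / tile), x1 = std::min(tx - 1, (int)maxX / tile);
        int y0 = std::max(0, (int)minY / tile), y1 = std::min(ty - 1, (int)maxY / tile);
        for (int by = y0; by <= y1; ++by)
            for (int bx = x0; bx <= x1; ++bx)
                bins[by * tx + bx].push_back(i);
    }
    return bins;
}
```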

Occlusion Culling
- Related topics:
  - Coverage:
    - Supersampling.
    - Multisampling.
  - Occlusion queries.
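
An occlusion query can be pictured as rendering a cheap proxy (for example a bounding rectangle) with depth writes disabled and counting the samples that pass the Z test. This sketch assumes that with multisampling each pixel keeps one depth per sample and that the proxy has a constant depth. `DepthTarget` and `occlusionQuery` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical multisampled depth target: one depth per sample, smaller = closer.
struct DepthTarget {
    int width, height, samples;
    std::vector<float> z;
    DepthTarget(int w, int h, int s)
        : width(w), height(h), samples(s), z(std::size_t(w) * h * s, 1.0f) {}
    float& at(int x, int y, int s) { return z[(std::size_t(y) * width + x) * samples + s]; }
};

// Counts samples of the rectangle [x0,x1) x [y0,y1) at 'depth' that pass
// the Z test. Depth is not written. Zero means the object is hidden.
unsigned long occlusionQuery(DepthTarget& t, int x0, int y0, int x1, int y1, float depth) {
    assert(0 <= x0 && x0 <= x1 && x1 <= t.width && 0 <= y0 && y0 <= y1 && y1 <= t.height);
    unsigned long passed = 0;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            for (int s = 0; s < t.samples; ++s)
                if (depth < t.at(x, y, s)) ++passed;   // Z test only
    return passed;
}
```

The application can skip drawing an object whose query returns zero. Because the count is per sample, a partially covered pixel contributes only its passing samples.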

To be done
- Change the batch synchronization mechanism to the one discussed the other week:
  - A last 'special' triangle marks the end of the batch.
  - Boxes wait for that triangle.
- Decide which synchronization mechanism will be used between consumer/producer boxes:
  - A unified mechanism?
  - Or whatever works.
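
The proposed end-of-batch mechanism can be sketched as follows. The producer appends a marker triangle after the batch, and a consumer box that consumes at most one triangle per simulated cycle reports completion only when the marker arrives. `Triangle`, `makeBatch` and `ConsumerBox` are hypothetical names, not the simulator's classes.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical triangle record; endOfBatch is set only on the marker.
struct Triangle {
    int id;
    bool endOfBatch;
};

// Producer side: n real triangles followed by the special last triangle.
std::deque<Triangle> makeBatch(int n) {
    assert(n >= 0);
    std::deque<Triangle> q;
    for (int i = 0; i < n; ++i) q.push_back({i, false});
    q.push_back({-1, true});
    return q;
}

class ConsumerBox {
public:
    // One simulated cycle: consume at most one triangle.
    // Returns true once the end-of-batch marker has been received.
    bool clock(std::deque<Triangle>& in) {
        if (!batchDone_ && !in.empty()) {
            Triangle t = in.front();
            in.pop_front();
            if (t.endOfBatch) batchDone_ = true;
            else processed_.push_back(t.id);   // the box's real work would go here
        }
        return batchDone_;
    }

    // Prepare for the next batch.
    void startBatch() { batchDone_ = false; processed_.clear(); }

    const std::vector<int>& processed() const { return processed_; }

private:
    std::vector<int> processed_;
    bool batchDone_ = false;
};
```

A downstream box can forward the marker after its own last output, so the end-of-batch signal travels down the pipeline with the data and needs no separate synchronization channel.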