Status – Week 228 Victor Moya. Summary: Hierarchical Z-Buffer.

Presentation transcript:

Status – Week 228 Victor Moya

Summary
Hierarchical Z-Buffer.

Hierarchical Z-Buffer
Two Level Hierarchical Z-Buffer for 3D Graphics Hardware. Cheng-Hsien Chen, Chen-Yi Lee.
System, Method and Apparatus for Multi-Level Hierarchical Z Buffering. US Patent Application 2003/

Hierarchical Z-Buffer
[Diagram: pipeline boxes PA, CLIP, RAST, INT, PS, FTEST, HZ, ZTEST]

Hierarchical Z-Buffer
[Diagram: HZ MEM, Z CACHE, DECODE, ENCODE, COMBINING CACHE, MEMORY (Z-BUFFER), MERGE LOGIC]

Hierarchical Z-Buffer
Model from ATI patent application.
See also two other ATI patents about early implementations of the HZ.
Two-level HZ.
–but only one used?
Early Z access is performed in two phases:
–access to the HZ L2.
–if it passes, access the Z cache.
–if miss, access memory (fetch new Z cache line).
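The two-phase early Z access can be sketched as follows. This is a minimal illustration, not the model from the patent application: the class `EarlyZ`, the 8-pixel tile edge, the clear depth of 1.0, the "larger z is farther" convention and the unbounded Z cache are all assumptions made for the example.

```python
from dataclasses import dataclass, field

TILE = 8         # assumed HZ tile edge in pixels (not stated on the slide)
LINE_PIXELS = 4  # pixels per Z cache line, the value the slides float
FAR = 1.0        # assumed clear depth; larger z means farther away


@dataclass
class EarlyZ:
    hz: dict = field(default_factory=dict)       # tile -> farthest z in the tile
    zbuffer: dict = field(default_factory=dict)  # (x, y) -> z, stands in for memory
    zcache: dict = field(default_factory=dict)   # line id -> z values (never evicted here)
    memory_fetches: int = 0

    def test(self, x, y, z):
        """Return 'culled', 'fail' or 'pass' for one fragment (no Z write)."""
        # Phase 1: HZ lookup. A fragment farther than the farthest z stored
        # for its tile cannot be visible, so it is culled without touching
        # the Z cache.
        tile = (x // TILE, y // TILE)
        if z > self.hz.get(tile, FAR):
            return "culled"
        # Phase 2: Z cache lookup. A miss fetches the whole line from memory.
        line = (x // LINE_PIXELS, y)
        if line not in self.zcache:
            self.memory_fetches += 1
            base = line[0] * LINE_PIXELS
            self.zcache[line] = [
                self.zbuffer.get((base + i, y), FAR) for i in range(LINE_PIXELS)
            ]
        stored = self.zcache[line][x % LINE_PIXELS]
        return "pass" if z < stored else "fail"
```

Culled fragments never reach the Z cache, which is where the memory bandwidth saving comes from.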

Hierarchical Z-Buffer
Z Cache
–small lines? 32 bits per pixel, 4 pixels per line?
–data is compressed in the Z-Buffer.
–decode/decompress at line fetch.
–encode/compress at line evict.
–the compression mechanism is also used to calculate the farthest Z value in the line.
–size?
–replacement policy?

Hierarchical Z-Buffer
HZ Memory
–each position stores the farthest Z value in an NxM tile of the original Z-Buffer.
–data precision? 8 bits? 16 bits? 32 bits?
–combining cache to build the tile farthest value.
–number of HZ levels?
–L1 on die, L2 on cache/memory.
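As an illustration of what each HZ position holds, the sketch below derives the HZ contents from a full Z-Buffer and reduces a value to a smaller precision. It assumes depth normalized to [0, 1] with larger values farther away; the function names `build_hz` and `quantize_far` are hypothetical. Rounding up when quantizing keeps the stored value at least as far as the real one, so a lower-precision HZ can only cull less, never wrongly.

```python
import math


def build_hz(zbuffer, tile_w, tile_h):
    """Return a 2D list holding the farthest (largest) z of each tile.

    zbuffer is a list of rows; tiles at the right and bottom edges may be
    partial when the resolution is not a multiple of the tile size.
    """
    height = len(zbuffer)
    width = len(zbuffer[0])
    hz = []
    for ty in range(0, height, tile_h):
        row = []
        for tx in range(0, width, tile_w):
            row.append(max(
                zbuffer[y][x]
                for y in range(ty, min(ty + tile_h, height))
                for x in range(tx, min(tx + tile_w, width))
            ))
        hz.append(row)
    return hz


def quantize_far(z, bits):
    """Quantize a normalized z to `bits` bits, rounding toward far."""
    return math.ceil(z * ((1 << bits) - 1))
```

In hardware the value would be built incrementally by the combining cache rather than by rescanning the Z-Buffer; the full scan here only defines the target contents.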

Hierarchical Z-Buffer
HZ Memory
–combining cache size?
–replacement policy? FIFO?
–update mechanism

Hierarchical Z-Buffer
HZ Memory
–size?
–Example:
  –8x8 tiles
  –8 bits per value
  –2048x2048 resolution
  –64KB
  –a second level can be implemented using pointers.
  –LARGE!
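The 64KB figure follows from one 8-bit value per 8x8 tile: 2048/8 = 256 tiles per axis, and 256 x 256 tiles x 1 byte = 65536 bytes. A small helper (the name `hz_size_bytes` is hypothetical) makes it easy to try other tile sizes and precisions:

```python
def hz_size_bytes(width, height, tile_w, tile_h, bits_per_value):
    """Bytes needed for a single-level HZ memory with one value per tile."""
    tiles_x = -(-width // tile_w)   # ceiling division covers partial tiles
    tiles_y = -(-height // tile_h)
    total_bits = tiles_x * tiles_y * bits_per_value
    return (total_bits + 7) // 8


if __name__ == "__main__":
    size = hz_size_bytes(2048, 2048, 8, 8, 8)
    print(size, "bytes =", size // 1024, "KB")
```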

Hierarchical Z-Buffer
[Diagram: Bit-mask Cache, HZ Buffer, Z write]

Hierarchical Z-Buffer
[Diagram: TMPZ, COVERAGE MASK, COVERAGE?]

Hierarchical Z-Buffer
Lightweight Z-Buffer?
HZ Buffer:
–2-bit pointer (L2 HZ).
–4 x 8-bit Z values (L1 HZ).
–about 49 KB for 1024x768.
HZ test per primitive.
HZ test per fragment.

Hierarchical Z-Buffer
Bit-mask cache:
–builds HZ blocks.
–holds the current farthest Z for the block.
–1 bit per block pixel: covered.
–FIFO replacement policy.
–if the full block is covered, update the HZ buffer.
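A minimal sketch of the bit-mask cache behaviour listed above. The class name `BitMaskCache`, the per-block pixel index and the choice to drop an evicted partial entry without touching the HZ buffer are assumptions; dropping is safe because the HZ buffer then simply keeps its older, farther value. Larger z is assumed to be farther.

```python
from collections import OrderedDict


class BitMaskCache:
    """Builds HZ block values from individual Z writes."""

    def __init__(self, capacity, block_pixels, hz_buffer):
        self.capacity = capacity
        self.full_mask = (1 << block_pixels) - 1
        self.hz_buffer = hz_buffer    # dict: block id -> farthest z
        self.entries = OrderedDict()  # block id -> [coverage mask, farthest z]

    def write(self, block, pixel, z):
        """Record a Z write; return True if the HZ buffer was updated."""
        entry = self.entries.get(block)
        if entry is None:
            if len(self.entries) == self.capacity:
                # FIFO: evict the oldest allocated entry; its partial
                # result is discarded.
                self.entries.popitem(last=False)
            entry = self.entries[block] = [0, z]
        entry[0] |= 1 << pixel         # 1 bit per block pixel: covered
        entry[1] = max(entry[1], z)    # current farthest z for the block
        if entry[0] == self.full_mask:
            # Every pixel was written, so the tracked maximum bounds the
            # whole block and can replace the HZ value.
            self.hz_buffer[block] = entry[1]
            del self.entries[block]
            return True
        return False
```

If a pixel is written more than once, the tracked maximum may be farther than the final contents of the block, which keeps the HZ value conservative.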

Hierarchical Z-Buffer
[Flow diagram: access HZ, then cull (discard) or pass; access Z Cache, hit or miss; on miss, access memory and replace; test, then pass or discard]

Hierarchical Z-Buffer
[Diagram, HZ test boxes: Triangle Traversal, Hierarchical Z, Interpolator, fragment test, Z test?, Z Cache, Memory Controller]

Hierarchical Z-Buffer
[Diagram, Z test and Z and HZ update boxes: fragment, Z test, Z cache, Memory Controller, Hierarchical Z]

Cache simulation
[Timing diagram, cycles c1 to c5: box1 sends accesses acc1, acc2, acc3 to box2, which returns res1, res2, res3. Box2 = cache. Latency 2, throughput 1.]

Cache simulation
[Timing diagram, cycles c1 to c5: box1 with included cache; accesses acc1, acc2, acc3 and results res1, res2, res3. Latency 1 cycle, throughput 1.]
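The difference between the two options can be sketched with a fixed-latency, one-item-per-cycle link. This is only one reading of the timing diagrams, assuming latency is the number of cycles between issuing an access and seeing its result; `Signal` and `simulate` are hypothetical names, not the simulator's own classes.

```python
from collections import deque


class Signal:
    """Fixed-latency link that accepts and delivers one item per cycle."""

    def __init__(self, latency):
        self.slots = deque([None] * latency)

    def tick(self, item=None):
        """Advance one cycle: push this cycle's item, pop the one now due."""
        self.slots.append(item)
        return self.slots.popleft()


def simulate(accesses, latency):
    """Issue one access per cycle from cycle 1; return result cycle per access."""
    link = Signal(latency)
    pending = list(accesses)
    results = {}
    cycle = 1
    while len(results) < len(accesses):
        item = pending.pop(0) if pending else None
        done = link.tick(item)
        if done is not None:
            results[done] = cycle
        cycle += 1
    return results


if __name__ == "__main__":
    print(simulate(["acc1", "acc2", "acc3"], latency=2))  # cache as its own box
    print(simulate(["acc1", "acc2", "acc3"], latency=1))  # cache inside the box
```

In both cases one result is produced per cycle once the pipeline is full; only the delay before the first result changes.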

Cache simulation
Always use caches with 2+ cycle access?
Always use in-box caches?
Shared cache?