Status – Week 207. Victor Moya.


1 Status – Week 207 Victor Moya

2 Summary
  Z Test box.
  Z Compression.
  Z Cache.
  Stencil.
  HZ Box.
  HZ Test.
  Traces.

3 Z Test box
  Z Test box includes:
    Z cache.
    Z encoder (compress and reference value).
    Z decoder (decompress).
    Z test.
    Z update.
    Stencil test.
    Stencil update.

4 [Diagram: Z Test box. Stage labels: Fetch, Read, Z Test, Stencil Test, Stencil Update, Write, Z Cache, Enc, Dec. Data labels: Fragments/Stamps, Reference Z value, Compressed Z Line/Block.]

5 Z Compression
  ATI HOT 3D in Eurographics 2000.
    8x8 pixel block (Z cache line).
    DDPCM: differential differential pulse code modulation.
    Two modes:
      ½ of original size.
      ¼ of original size.
    Entropy encoder.
  Entropy encoders?
    Huffman.
    Arithmetic encoder.
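As a rough illustration of the DDPCM idea on this slide, the Python sketch below takes second-order differences of one row of 8 Z values. The function names, and the choice to store the first value and the first slope verbatim, are assumptions made for illustration and are not taken from the ATI presentation; the entropy coding stage is left out. Z is interpolated linearly across a triangle in screen space, so a row covered by one triangle has a constant slope and its second differences are zero.

```python
def ddpcm_encode(z):
    """Return (first value, first slope, second-order differences) for a row.

    Assumes the row holds at least two values.
    """
    # First-order differences: the slope between neighbouring Z values.
    d1 = [z[i] - z[i - 1] for i in range(1, len(z))]
    # Second-order differences: how much the slope changes from step to step.
    d2 = [d1[i] - d1[i - 1] for i in range(1, len(d1))]
    return z[0], d1[0], d2


def ddpcm_decode(z0, slope0, d2):
    """Rebuild the row by integrating the second-order differences twice."""
    z = [z0, z0 + slope0]
    slope = slope0
    for dd in d2:
        slope += dd
        z.append(z[-1] + slope)
    return z


# A row lying on a single plane: constant slope of 10 per pixel.
row = [1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070]
z0, slope0, d2 = ddpcm_encode(row)
restored = ddpcm_decode(z0, slope0, d2)
```

Here `d2` is all zeros, which is the kind of input an entropy encoder can store in very few bits.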

6 1D Z Compression [Diagram: 8 input z values, Entropy Encoder.]

7 2D Z Compression [Diagram: 64 pixels, 2D DDPCM, Entropy Encoder, Packer.]

8 Z Compression
  ATI patent application 20030038803.
    Two reference values MAX and MIN.
    Offset values.
    Windows.
    Other method I don’t understand yet …
  S3 patent 6,411,295.
    Similar approach.
  Others.

9 Z Compression
  Method 1:
    MIN and MAX per cache line/block.
    1-bit flag per pixel/Z value telling which reference value to use.
    The offsets from the MIN or MAX reference value are stored in the compressed output.
    The offsets must be inside a window of T values (log2 T = bits per offset) from MIN and MAX.
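Method 1 can be sketched in Python as follows. This is a minimal reading of the slide, not the scheme from the patent application: the function names are hypothetical, values are plain integers, and a block whose values fall outside both windows is simply reported as incompressible by returning `None`.

```python
def compress_minmax(block, t):
    """Encode each Z as a 1-bit flag plus an offset smaller than t.

    Flag 0 means "offset up from MIN", flag 1 means "offset down from MAX".
    Returns None when some value lies outside both windows.
    """
    zmin, zmax = min(block), max(block)
    flags, offsets = [], []
    for z in block:
        if z - zmin < t:          # inside [zmin, zmin + t - 1]
            flags.append(0)
            offsets.append(z - zmin)
        elif zmax - z < t:        # inside [zmax - t + 1, zmax]
            flags.append(1)
            offsets.append(zmax - z)
        else:
            return None           # block is not compressible with this window
    return zmin, zmax, flags, offsets


def decompress_minmax(zmin, zmax, flags, offsets):
    """Rebuild the Z values from the two references, flags and offsets."""
    return [zmin + o if f == 0 else zmax - o for f, o in zip(flags, offsets)]


# Two surfaces in one block: values cluster near MIN and near MAX.
block = [100, 103, 107, 900, 898, 101, 899, 100]
packed = compress_minmax(block, t=8)   # 3 bits per offset
restored = decompress_minmax(*packed)
```

With `t = 8` each value costs 1 flag bit plus 3 offset bits, in addition to the two reference values stored once per block.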

10 [Diagram: Z range from Z = 0 to Z = 1 with Zmin and Zmax marked; window limits z = Zmin + T - 1 and z = Zmax - T + 1.]

11 [Diagram: labels MAX, MIN.]

12 Z Compression
  Method 2:
    Z values are divided into upper and lower bits.
    Keep UMAX and UMIN.
    Calculate A = UMIN - 1, B = UMAX + 1.
    2-bit flag per pixel/Z value references the upper bits from {UMAX, UMIN, A, B}.
    Lower bits per pixel/Z value are stored in the compressed output.
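A possible reading of Method 2, sketched in Python. The slide does not say how UMIN and UMAX are chosen, so this sketch takes them as parameters rather than deriving them from the block; the function names, the parameter `a` (number of lower bits, borrowed from the `<< a` label in the diagram) and the `None` return for incompressible blocks are all assumptions.

```python
def compress_split(block, a, umin, umax):
    """Encode each Z as a 2-bit index into {UMAX, UMIN, A, B} plus a lower bits.

    Returns None when the upper bits of some value match none of the four
    reference values.
    """
    refs = [umax, umin, umin - 1, umax + 1]   # UMAX, UMIN, A, B
    mask = (1 << a) - 1
    flags, lows = [], []
    for z in block:
        upper = z >> a
        if upper not in refs:
            return None
        flags.append(refs.index(upper))       # 2-bit flag: 0..3
        lows.append(z & mask)                 # lower bits stored verbatim
    return refs, flags, lows


def decompress_split(refs, flags, lows, a):
    """Rebuild each Z from the selected upper bits and the stored lower bits."""
    return [(refs[f] << a) | low for f, low in zip(flags, lows)]


block = [0x1205, 0x13FF, 0x1100, 0x1401]
packed = compress_split(block, a=8, umin=0x12, umax=0x13)
restored = decompress_split(*packed, a=8)
```

In this example each value is stored as 2 flag bits plus 8 lower bits instead of its full width.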

13 [Diagram: Z range from Z = 0 to Z = 1 with labels Umin << a, Umax << a, Zmin, Zmax, A, B.]

14 [Diagram: label Umin.]

15 Z Compression
  Reference values in the compressed output.
  Compression flags on die.
    Useful for fast clear too.

16 Z Cache
  Normal cache?
  Or ‘fetch’ cache?
    Normal cache that supports a large number of active misses (miss on miss, miss on hit).
  Or prefetching?

17 Z Cache
  Fetch vs Prefetch.
    Fetch needs additional state (bits) per cache line.
    Fetch needs additional port to the cache tag file.
    Fetch implies a large queue or stalls somewhere.
    Prefetch requires a predictor.
    Prefetch may request data that won’t be used (failed predictions).

18 Z Cache
  Prefetching.
    Very easy to predict next data inside a triangle (large).
      Quite common (medium-small triangles).
    Easy to predict next data inside a tristrip or triangle list batch.
      Very common.
    Hard to predict next data between batches (or meshes)?
      But will happen rarely.

19 Z Cache
  “Fetch cache”
    In fact prefetching.
  Texture Prefetching Architecture.
    Akeley course.
    Igehy, Eldridge, Proudfoot, Prefetching in a texture cache architecture.
      Not read yet.
  Slightly different concept:
    Our fetch cache accesses the tag file twice.
    But in simulation it is the same, as we are not charging for the tag file access!!
    Change mechanism so that fetch returns pointer to the cache line.
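The last point, a fetch that returns a pointer to the cache line so the later read skips the tag file, could look roughly like the Python sketch below. Everything here is a hypothetical simplification: the class and method names are invented, the cache is fully associative, the replacement policy just takes the first unreserved line, and memory traffic is not modelled.

```python
class FetchCache:
    """Toy fetch cache: tags are checked once, at fetch time."""

    def __init__(self, num_lines):
        self.tags = [None] * num_lines    # block address held by each line
        self.reserved = [0] * num_lines   # outstanding fetches per line
        self.tag_lookups = 0              # counts tag file accesses

    def fetch(self, block_addr):
        """Look up the tags once and return a line index, or None to stall."""
        self.tag_lookups += 1
        if block_addr in self.tags:
            line = self.tags.index(block_addr)       # hit
        else:
            free = [i for i, r in enumerate(self.reserved) if r == 0]
            if not free:
                return None                          # every line is reserved
            line = free[0]
            self.tags[line] = block_addr             # memory request goes here
        self.reserved[line] += 1
        return line

    def read(self, line):
        """Access the line by index (no tag lookup) and release one reservation."""
        self.reserved[line] -= 1
        return self.tags[line]


cache = FetchCache(num_lines=2)
first = cache.fetch(10)     # miss, takes line 0
second = cache.fetch(20)    # miss, takes line 1
again = cache.fetch(10)     # hit on line 0, now reserved twice
stalled = cache.fetch(30)   # no unreserved line: the caller must stall
data = cache.read(second)   # releases line 1 without touching the tags
retry = cache.fetch(30)     # line 1 is free again
```

In this sketch the tag file is accessed once per fetch and never on read, which is the saving the slide is after.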

20 [Diagram: texture prefetching architecture. Labels: Rasterizer, Cache Tags, FIFO, Stall, Request FIFO, Reorder Buffer, Texture Memory, Cache Data, Texture Filter, Texture Apply.]

21 Stencil
  Stencil and Z share a 32-bit word per pixel:
    8/24.
    0/32.
    2x16 (Z only!!).
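The 8/24 and 2x16 layouts can be illustrated with a few bit operations in Python. The slide only gives the field widths; placing the stencil in the top 8 bits and the first 16-bit Z in the low half are assumptions made for this sketch, as are the function names.

```python
def pack_s8z24(stencil, z):
    """8/24 layout: stencil in bits 31..24 (assumed), Z in bits 23..0."""
    return ((stencil & 0xFF) << 24) | (z & 0xFFFFFF)


def unpack_s8z24(word):
    """Split a 32-bit word back into (stencil, z)."""
    return (word >> 24) & 0xFF, word & 0xFFFFFF


def pack_2x16(z0, z1):
    """2x16 layout: two 16-bit Z values per word, no stencil."""
    return ((z1 & 0xFFFF) << 16) | (z0 & 0xFFFF)


def unpack_2x16(word):
    """Split a 32-bit word back into (z0, z1)."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


word = pack_s8z24(stencil=0x7F, z=0xABCDEF)
pair = pack_2x16(0x1234, 0xBEEF)
```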

22 Stencil
  Stencil compression:
    If stencil is not active and is cleared:
      Remove stencil field from compressed data.
    If stencil is active or not cleared:
      Compress stencil?
        Independent of Z compression.
        Needs more compression flag bits.
        What is the average stencil value? Or log2 of the value?
        How much can be saved? 8b to 1b, 2b, 4b? Worth it?

23 HZ Box
  Hierarchical Z buffer.
    Number of levels?
    Size?
    On die?
  Includes:
    Memory for storing the different levels.
    Update mechanism.
    Process requests and updates.

24 HZ Box
  ATI model (from patents XXX, and XXX).
    2 levels.
      1st level is from original 8x8 blocks (Z cache line).
      2nd level is 2x2 (?) values from level 1.
    Update mechanism:
      Z Max (or Z Min) from the Z encoder (compressor) for an 8x8 block (cache line).
      Combining cache for level 2 (?).
      Write and update on eviction from combining cache (?).
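The two-level structure can be sketched in Python by building both levels from a full depth buffer. This is only an illustration of what the levels hold: it stores Z Max, uses the 2x2 grouping that the slide itself marks as uncertain, assumes buffer dimensions that are multiples of the block size, and ignores the incremental update path through the encoder and combining cache. The function name and parameters are hypothetical.

```python
def build_hz(depth, block=8, group=2):
    """Return (level1, level2) Z Max tables for a 2D depth buffer.

    level1 holds one maximum per block x block pixels; level2 holds one
    maximum per group x group level-1 entries.
    """
    h, w = len(depth), len(depth[0])
    level1 = [
        [
            max(depth[y][x]
                for y in range(by, by + block)
                for x in range(bx, bx + block))
            for bx in range(0, w, block)
        ]
        for by in range(0, h, block)
    ]
    rows, cols = len(level1), len(level1[0])
    level2 = [
        [
            max(level1[y][x]
                for y in range(gy, min(gy + group, rows))
                for x in range(gx, min(gx + group, cols)))
            for gx in range(0, cols, group)
        ]
        for gy in range(0, rows, group)
    ]
    return level1, level2


# 16x16 buffer at depth 0.5 with one far pixel in the top-right block.
depth = [[0.5] * 16 for _ in range(16)]
depth[3][12] = 0.9
level1, level2 = build_hz(depth)
```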

25 HZ Test
  Compares the incoming Z value from a graphic object to the reference Z value stored in one or more of the Hierarchical Z levels.
  What can be tested:
    Triangle Z (or 3 vertex Z).
      Cull a whole triangle.
    Blocks of fragments:
      Good for recursive descent or tiled!!
      Large blocks to level 2.
      8x8 (or less) blocks to level 1.
    Stamps (2x2) or fragments:
      Against level 1 (slow access? fast update?).
      Against level 2 (fast access? slow update?).
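A conservative HZ test against one level might look like the Python sketch below. It assumes a "less than" depth test and a level that stores Z Max per cell, neither of which the slide fixes; the function name, the inclusive pixel bounding box and the `cell_size` parameter are hypothetical. For a triangle, `z_min` would be the smallest of the three vertex Z values, since every interpolated Z inside the triangle lies between the vertex values.

```python
def hz_culled(hz_level, cell_size, bbox, z_min):
    """Return True when the object is hidden in every HZ cell it touches.

    hz_level  -- 2D list of Z Max values, one per cell_size x cell_size pixels
    bbox      -- (x0, y0, x1, y1) inclusive pixel bounds of the object
    z_min     -- nearest Z of the object (e.g. min of the 3 vertex Z values)
    """
    x0, y0, x1, y1 = bbox
    for cy in range(y0 // cell_size, y1 // cell_size + 1):
        for cx in range(x0 // cell_size, x1 // cell_size + 1):
            # With a less-than test, a fragment could still pass in this cell.
            if z_min < hz_level[cy][cx]:
                return False
    return True


hz_l1 = [[0.5, 0.9], [0.5, 0.5]]   # Z Max per 8x8 block of a 16x16 buffer
tri_z = (0.6, 0.7, 0.8)            # vertex Z values of a triangle
hidden = hz_culled(hz_l1, 8, (0, 0, 7, 7), min(tri_z))
visible = hz_culled(hz_l1, 8, (0, 0, 15, 7), min(tri_z))
```

The first call culls the triangle because its nearest Z is behind the stored maximum of the only block it covers; the second does not, because the right-hand block still holds a farther value.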

26 [Diagram: labels HZ L2, HZ L1.]

27 Traces
  I stalled Carlos’s work, so this is delayed until next week.

28 Web
  I’m writing my web page.
  GPU3D page?
    Public/private.

