
1 A Dynamic Adaptive Multi-resolution GPU Data Structure
Adaptive Shadow Maps, Octree 3D Paint, Adaptive PDE Solver
Aaron Lefohn, University of California, Davis

2 Problem Statement
Goal
  Dynamic, adaptive, multi-resolution GPU data structure
  Efficient read, write, structure change
  Adaptive shadow maps, octree 3D paint, adaptive PDE solver
Challenges
  All operations must be data-parallel
  Trees are difficult to update and cause incoherent accesses
Solution
  Leverage virtual memory research from architecture
  Page-table based structure
  Decouple levels of indirection from resolution levels
  Easy implementation with the Glift template library

3 Collaborators
  Joe Kniss, University of Utah
  Robert Strzodka, CAESAR Research Institute
  Shubhabrata Sengupta, University of California, Davis
  John Owens, University of California, Davis

4 Assumptions
  This talk relies heavily on the contents of the “Glift” generic data structure talk

5 Is This GPGPU Programming?
Yes
  Inseparable mix of GPGPU stream programming and traditional graphics
  High-quality interactive rendering
  Updating complex GPU data structures

6 Previous Work
  Binotto et al.
  Carr et al.
  Coombe et al.
  Ertl et al.
  Lefebvre et al.
  Purcell et al.

7 Why A New Structure?
What’s Missing?
  Fully GPU-based adaptive multi-resolution structure
  GPU-based address translator
  GPU-based updates of address translator
  Trilinear/quadrilinear mipmap filtering support
  Uniform, coherent memory accesses

8 Applications
  Adaptive shadow maps
  Octree 3D paint
  Adaptive partial differential equation solver
  ...

9 Adaptive Shadow Maps
  Fernando et al., ACM SIGGRAPH 2001
  Elegant solution to shadow map aliasing
  Quadtree of small shadow maps
  Many recent (2004) shadow papers cite ASMs as a high-quality solution that is not possible on graphics hardware

10 ASM Data Structure Requirements
  Adaptive
  Multiresolution
  Fast, parallel random-access read
    2x2 native Percentage Closer Filtering (PCF)
    Trilinear interpolated mipmapped PCF
  Fast, parallel write
  Fast, parallel insert and erase

11 Octree 3D Paint
Problem
  Apply paint to non-parameterized surface
  Complex topology
  Implicit surface
Solution
  Octree textures, brick maps, etc.
  Benson & Davis and DeBry et al., SIGGRAPH 2002

12 Octree 3D Paint Requirements
  Adaptive
  Multiresolution
  Fast, parallel random-access read
    2x2x2 native trilinear filtering
    Quadrilinear interpolated mipmapping
  Fast, parallel write
  Fast, parallel insert and erase

13 Adaptive PDE Solver
WARNING: Work in progress…
Problem
  Large 3D partial differential equation solvers are slow
Solution
  Adaptive solver that focuses computation on regions of interest
  Octree simulation domain
  Losasso et al., SIGGRAPH 2004

14 Adaptive PDE Solver Requirements
  Adaptive
  Multiresolution?
  Fast, parallel neighborhood read
  Fast, parallel write
    Efficient stream processing of octree nodes
  Fast, parallel insert and erase

15 GPU Dynamic, Adaptive Data Structure
  The three applications have nearly identical requirements
  The structure is described here in 2D, for ASM

16 ASM Virtual Domain
  Shadow map coordinates
[Figure: virtual domain with corners (0,0), (1,0), (1,1), (0,1)]

17 ASM Physical Domain
  Paged 2D texture memory
  All physical pages identical size (very important!)
[Figure: Virtual Domain, Physical Domain]

18 ASM Address Translator
  Mipmapped page table
[Figure: Virtual Domain, Physical Domain]

19 ASM Address Translator
Start with page table
  Coarse, uniform discretization of virtual domain
  Very common in GPU structures
  LOTS of architecture literature
  O(N) memory, O(1) insert, O(1) computation, O(1) erase
  Uniform consistency, partial mapping (sparse)

20 ASM Address Translator
Page table example
  vpn = va / pageSize
  ppa = pageTable(vpn)
  off = va % pageSize
  pa  = ppa + off
[Figure: Virtual Domain, Page Table, Physical Memory]
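The four translation steps on this slide can be sketched on the CPU as follows. This is a minimal 1D illustration, not Glift code: the `PageTable1D` type, the `kUnmapped` sentinel, and the use of integer texel addresses are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for a virtual page with no physical page behind it
// (the "partial mapping (sparse)" case).
constexpr std::int64_t kUnmapped = -1;

// Hypothetical 1D page table: entry i holds the physical base address
// of virtual page i, or kUnmapped.
struct PageTable1D {
    std::int64_t pageSize;               // texels per page
    std::vector<std::int64_t> ppaOfVpn;  // physical page address per virtual page

    // Translate a virtual address (va) into a physical address (pa).
    std::int64_t translate(std::int64_t va) const {
        assert(va >= 0 && pageSize > 0);
        const std::int64_t vpn = va / pageSize;  // virtual page number
        assert(static_cast<std::size_t>(vpn) < ppaOfVpn.size());
        const std::int64_t ppa = ppaOfVpn[static_cast<std::size_t>(vpn)];
        if (ppa == kUnmapped) return kUnmapped;  // sparse: nothing allocated
        const std::int64_t off = va % pageSize;  // offset within the page
        return ppa + off;                        // physical address
    }
};
```

With `pageSize = 4` and a table of `{8, kUnmapped, 0}`, virtual address 1 lands at physical address 9, while virtual address 5 falls in an unmapped page. The lookup is a single table read plus constant arithmetic, which is where the O(1) access cost comes from.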

21 ASM Data Structure Requirements
  Adaptive
  Multiresolution
  Fast, parallel random-access read
    2x2 native Percentage Closer Filtering (PCF)
    Trilinear interpolated mipmapped PCF
  Fast, parallel write
  Fast, parallel insert and erase

22 ASM Address Translator
Adaptive Page Table
  Map multiple virtual pages to single physical page
  vpn = va / pageSize
  ppa = pageTable(vpn).ppa()
  s   = pageTable(vpn).s()
  off = (va * s) % pageSize
  pa  = ppa + off
[Figure: Virtual Domain, Page Table, Physical Memory]
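A minimal 1D sketch of the adaptive lookup, under stated assumptions: addresses are real-valued, the scale `s` is at most 1, and a group of virtual pages sharing one physical page is aligned to the size of the group. The names `AdaptivePte` and `translateAdaptive` are hypothetical and not part of Glift.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical page table entry: physical page address plus the scale that
// maps virtual texels onto the (possibly coarser) physical page.
struct AdaptivePte {
    double ppa;  // physical page base address
    double s;    // scale factor, e.g. 0.5 when two virtual pages share one physical page
};

// Translate a virtual address using the per-entry scale factor.
double translateAdaptive(const std::vector<AdaptivePte>& pageTable,
                         double pageSize, double va) {
    assert(va >= 0.0 && pageSize > 0.0);
    const std::size_t vpn = static_cast<std::size_t>(va / pageSize);
    assert(vpn < pageTable.size());
    const AdaptivePte& pte = pageTable[vpn];
    // Scale first, then wrap into the page: off = (va * s) % pageSize.
    const double off = std::fmod(va * pte.s, pageSize);
    return pte.ppa + off;
}
```

For example, with `pageSize = 4`, virtual pages 0 and 1 can both point at the physical page at address 8 with `s = 0.5`, so virtual addresses 0 through 8 spread across the four texels of that one page.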

23 ASM Data Structure Requirements
  Adaptive
  Multiresolution
  Fast, parallel random-access read
    2x2 native Percentage Closer Filtering (PCF)
    Trilinear interpolated mipmapped PCF
  Fast, parallel write
  Fast, parallel insert and erase

24 ASM Address Translator
  Multiresolution Page Table
[Figure: Virtual Domain, Mipmap Page Table, Physical Memory]

25 ASM Data Structure Requirements
  Adaptive
  Multiresolution
  Fast, parallel random-access read
    2x2 native Percentage Closer Filtering (PCF)
    Trilinear interpolated mipmapped PCF
  Fast, parallel write
  Fast, parallel insert and erase

26 ASM Data Structure Requirements
How to support bilinear filtering?
  Duplicate 1 column and 1 row of texels in each page
Mipmapped trilinear?
  “By-hand” interpolation between mipmap levels
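The “by-hand” interpolation between mipmap levels can be sketched as two per-level lookups followed by a linear blend. This is an illustrative CPU version: the function name, the `sampleLevel` callable (standing in for a filtered lookup into one mipmap level), and the clamping policy are assumptions, not the shader code from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Blend between the two mipmap levels that bracket `lod`.
// `sampleLevel(level, u, v)` is a caller-supplied stand-in for a filtered
// lookup (e.g. bilinear PCF) into a single mipmap level.
template <typename SampleLevel>
double sampleBetweenLevels(SampleLevel sampleLevel, double u, double v,
                           double lod, int maxLevel) {
    assert(maxLevel >= 0);
    // Clamp the level of detail to the levels that exist.
    const double clamped = std::max(0.0, std::min(lod, static_cast<double>(maxLevel)));
    const int fine = static_cast<int>(std::floor(clamped));
    const int coarse = std::min(fine + 1, maxLevel);
    const double t = clamped - fine;            // blend weight toward the coarser level
    const double a = sampleLevel(fine, u, v);   // lookup in the finer level
    const double b = sampleLevel(coarse, u, v); // lookup in the coarser level
    return a + t * (b - a);                     // linear interpolation
}
```

Each call performs two complete address translations, one per level, which is consistent with the mipmapped lookup being more expensive than the bilinear one.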

27 ASM Data Structure Requirements
  Adaptive
  Multiresolution
  Fast, parallel random-access read
    2x2 native Percentage Closer Filtering (PCF)
    Trilinear interpolated mipmapped PCF
  Fast, parallel write
  Fast, parallel insert and erase

28 How to Define the ASM Structure in Glift?
  Start with generic page table AddrTrans
  Use mipmapped PhysMem for page table
  Change template parameter to add adaptivity
  Write page allocator
    alloc_pages, free_pages
  Finally…
    typedef PageTableAddrTrans PageTable;
    typedef PhysMemGPU PMem2D;
    typedef VirtMemGPU VPageTable;
    typedef AdaptiveMem ASM;
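The slide names a page allocator with `alloc_pages` and `free_pages` but does not show it. One simple way to get O(1) allocation over identically sized physical pages is a free list. The sketch below is an assumption about how such an allocator could look. The class name, signatures, and the policy of granting fewer pages when memory runs out are all hypothetical and not Glift's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free-list allocator over identically sized physical pages.
class PageAllocator {
public:
    explicit PageAllocator(std::size_t pageCount) : pageCount_(pageCount) {
        // Push in reverse so that page 0 is handed out first.
        for (std::size_t i = pageCount; i > 0; --i) freeList_.push_back(i - 1);
    }

    // Grant up to `count` pages; fewer are returned if memory runs out.
    std::vector<std::size_t> alloc_pages(std::size_t count) {
        std::vector<std::size_t> granted;
        for (std::size_t i = 0; i < count && !freeList_.empty(); ++i) {
            granted.push_back(freeList_.back());
            freeList_.pop_back();
        }
        return granted;
    }

    // Return pages to the free list so they can be reused.
    void free_pages(const std::vector<std::size_t>& pages) {
        for (std::size_t p : pages) {
            assert(p < pageCount_);
            freeList_.push_back(p);
        }
    }

    std::size_t free_count() const { return freeList_.size(); }

private:
    std::size_t pageCount_;
    std::vector<std::size_t> freeList_;
};
```

Because every physical page has the same size, any free page can satisfy any request, so the allocator never has to search or defragment.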

29 ASM Data Structure Usage
  float4 main(uniform VMem2D asm,
              float3 shadowCoord,
              float4 litColor) : COLOR
  {
      float isInLight = asm.vTex2Ds( shadowCoord );
      return lerp( black, litColor, isInLight );
  }

  asm.bind_for_read( … );
  asm.bind_for_write( … );
  asm.alloc_pages( … );
  asm.free_page( … );
  …

30 Adaptive Shadow Map Algorithm
  Faithful to Fernando et al. 2001
Refinement algorithm
  Identify shadow pixels w/ resolution mismatch (GPU)
  Compact pixels into small stream (GPU)
  CPU reads back compacted stream (GPU → CPU)
  Allocate pages
  Draw new PTEs into mipmap page tables (CPU → GPU)
  Draw depth into ASM for each new page (GPU)
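The CPU-side part of the refinement loop (consume the compacted requests, allocate pages, record new page table entries, and collect the pages that still need depth rendered) might be sketched as below. All types and names here are hypothetical, the page table is a plain CPU array rather than a mipmapped texture, and stopping when physical memory runs out is an assumed policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical request read back from the GPU: a mipmap level plus page coordinates.
struct PageRequest { int level; int x; int y; };

// Hypothetical CPU mirror of a mipmapped page table; -1 marks an unmapped entry.
class MipPageTable {
public:
    explicit MipPageTable(int baseDim) {
        assert(baseDim > 0);
        for (int dim = baseDim; dim >= 1; dim /= 2) {
            dims_.push_back(dim);
            levels_.emplace_back(static_cast<std::size_t>(dim) * dim, -1);
        }
    }
    int& entry(const PageRequest& r) {
        assert(r.level >= 0 && static_cast<std::size_t>(r.level) < levels_.size());
        const int dim = dims_[static_cast<std::size_t>(r.level)];
        assert(r.x >= 0 && r.x < dim && r.y >= 0 && r.y < dim);
        return levels_[static_cast<std::size_t>(r.level)]
                      [static_cast<std::size_t>(r.y) * dim + r.x];
    }
private:
    std::vector<int> dims_;
    std::vector<std::vector<int>> levels_;
};

// Allocate a physical page for each unmapped request and write its PTE.
// Returns the newly mapped pages, i.e. those that still need depth drawn into them.
std::vector<int> refine(MipPageTable& table, std::vector<int>& freePages,
                        const std::vector<PageRequest>& requests) {
    std::vector<int> newPages;
    for (const PageRequest& r : requests) {
        int& pte = table.entry(r);
        if (pte != -1) continue;       // already mapped by an earlier request
        if (freePages.empty()) break;  // out of physical pages
        pte = freePages.back();        // "draw" the new PTE
        freePages.pop_back();
        newPages.push_back(pte);
    }
    return newPages;
}
```

In the talk's pipeline the PTE writes and the depth rendering happen on the GPU; this sketch only mirrors the bookkeeping between the readback and those draws.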

31 Stream Compaction
  Daniel Horn, GPU Gems II, ch. 36
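As a reminder of what stream compaction computes, here is a sequential reference sketch based on an exclusive prefix sum over keep flags. It is not the GPU formulation from the cited chapter; the function name and the use of `int` payloads are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keep only the flagged elements, preserving their order.
std::vector<int> compact(const std::vector<int>& values,
                         const std::vector<bool>& keep) {
    assert(values.size() == keep.size());
    // Exclusive prefix sum of the flags: slot[i] is the output index of
    // element i if it is kept.
    std::vector<std::size_t> slot(values.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        slot[i] = running;
        running += keep[i] ? 1 : 0;
    }
    // Move each kept element to its slot.
    std::vector<int> out(running);
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (keep[i]) out[slot[i]] = values[i];
    }
    return out;
}
```

The compacted stream is what makes the readback cheap: only the pixels with a resolution mismatch cross the bus, not the whole image.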

32 [Figure: rendered comparison. Thanks to Yong Kil for the tree model]
  ASM: Effective resolution 131,072² (37 MB); SM: 2048²

33 “Octree” 3D Paint
  3D version of ASM data structure
  Differs from previous work:
    Quadrilinear filtering
    O(1), uniform access
  Interactive with effective resolutions between 64³ and 2048³

34 Adaptive PDE Solver
Work in progress…
  Key feature is defining GPU iterators
Iterator
  Vertex buffer object of quads (one per page)
  Create iterators with RTVA
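The "one quad per page" idea can be illustrated by building the vertex data on the CPU. This is a hedged sketch: the `Vertex` layout, the function name, and the counter-clockwise corner order are assumptions, and the upload into an actual vertex buffer object is omitted.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical 2D vertex in physical texel coordinates.
struct Vertex { float x; float y; };

// Emit four corner vertices per allocated page; drawing these quads visits
// every texel of every allocated page exactly once.
std::vector<Vertex> buildPageQuads(const std::vector<std::pair<int, int>>& pageOrigins,
                                   int pageSize) {
    assert(pageSize > 0);
    std::vector<Vertex> vertices;
    vertices.reserve(pageOrigins.size() * 4);
    for (const auto& origin : pageOrigins) {
        const float x0 = static_cast<float>(origin.first);
        const float y0 = static_cast<float>(origin.second);
        const float x1 = x0 + static_cast<float>(pageSize);
        const float y1 = y0 + static_cast<float>(pageSize);
        vertices.push_back({x0, y0});  // lower left
        vertices.push_back({x1, y0});  // lower right
        vertices.push_back({x1, y1});  // upper right
        vertices.push_back({x0, y1});  // upper left
    }
    return vertices;
}
```

Iterating this way touches only allocated pages, so the cost of a solver pass scales with the amount of refined data rather than with the full virtual domain.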

35 Demo

36 ASM Performance Results
Fernando Results
  5 fps (asynchronous, incremental refinement)
  Fixed light
  31K polys, 512² image, 65K² to 524K² ASMs
Our results
  15-20 fps while moving camera including refinement
  7-12 fps while moving light
  45K polys, 512² image, 131K² ASM
Lookup time compared to 2048² shadow map:
  Bilinear filtered: 90%
  Trilinear filtered mipmapped: 73%

37 Page Table Memory Coherency
  1- and 2-level page tables are bandwidth bound for pages smaller than 8 x 8
  Test setup: RGBA8 textures, NVIDIA GeForce 6800 GT, NVIDIA driver 75.22, Cg 1.4a

38 Data Structure Limitations
  Assume page-level coherency
  Page table memory consumption
    Trade more levels of indirection for memory
  Depth-limited tree
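To make the page table memory cost concrete, the sketch below counts entries for a one-level 2D page table and for a full mipmap chain of page tables. The function names are hypothetical, and the example page size of 32 x 32 texels and a mipmap chain running down to a single entry are assumptions, since the slides do not state either.

```cpp
#include <cassert>
#include <cstdint>

// Entries in a one-level 2D page table covering effectiveRes x effectiveRes
// texels with pageSize x pageSize pages.
std::uint64_t pageTableEntries(std::uint64_t effectiveRes, std::uint64_t pageSize) {
    assert(pageSize > 0 && effectiveRes % pageSize == 0);
    const std::uint64_t side = effectiveRes / pageSize;
    return side * side;
}

// Entries summed over a mipmap chain of page tables, halving the side length
// at each level until a single entry remains.
std::uint64_t mipChainEntries(std::uint64_t effectiveRes, std::uint64_t pageSize) {
    assert(pageSize > 0 && effectiveRes % pageSize == 0);
    std::uint64_t total = 0;
    for (std::uint64_t side = effectiveRes / pageSize; side >= 1; side /= 2) {
        total += side * side;
    }
    return total;
}
```

Under these assumptions, an effective resolution of 131,072² gives a 4096 x 4096 base page table, about 16.8 million entries, and roughly a third more for the rest of the mipmap chain. Adding a level of indirection would shrink that table at the cost of an extra dependent read per lookup.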

39 Conclusions
Dynamic adaptive multires data structure
  Coherent accesses if pages are larger than 8 x 8
Decouple levels of indirection from levels of resolution
  Page table literature
  Continuum all the way from 1-level to full tree
  Based on assumption that accesses are coherent within page

40 Conclusions
Adaptive Shadow Maps
  Interactive adaptive refinement
  Effective shadow map resolution up to 131,072²
Octree 3D paint
  Interactive GPU-based octree 3D painting
  Effective paint resolution up to 2048³
Adaptive PDE solver
  Work in progress…

41 Acknowledgements
  Craig Kolb, Nick Triantos (NVIDIA)
  Fabio Pellacini (Cornell/Pixar)
  Adam Moerschell, Yong Kil (UC Davis)
  Serban Porumbescu, Chris Co, ….
  National Science Foundation Graduate Fellowship
  Department of Energy
  Pixar Animation Studios

42 More Information
ACM SIGGRAPH Sketches 2005
  “Dynamic Adaptive Shadow Maps”
  “Octree Textures on Graphics Hardware”
  “GPU Programming,” Thursday, 1:45pm
Upcoming ACM Transactions on Graphics paper
  “Glift: An Abstraction for Generic, Efficient GPU Data Structures”
Google “Lefohn GPU”

