02/22/ Manocha Interactive Modeling and Simulation using Graphics Processors Dinesh Manocha University of North Carolina at Chapel Hill
UNC Collaborators
- Co-PI: Ming C. Lin
- Research staff: Naga Govindaraju, Dave Tuft
- Graduate students: Russ Gayle, Brandon Lloyd, Brian Salomon, Avneesh Sud, Sungeui Yoon, Talha Zaman
Collaborative Effort
- RDECOM: Maria Bauer, Angel Rodriguez
- SAIC: Eric Root, Marlo Verdesca, Jaeson Munro
Current Desktop System
- CPU (3 GHz), 2 x 1 MB cache
- System memory (2 GB)
- AGP memory (512 MB)
- PCI-E bus (4 GB/s)
- Video memory (512 MB) and GPU (500 MHz), shown twice
- Bandwidth labels: 6.4 GB/s and 35.2 GB/s
GeForce 7800: 302M transistors
CPU vs. GPU (Henry Moreton, NVIDIA, Aug. 2005)
Table columns: PEE, GTX, GPU/CPU. Rows: Graphics GFLOPs, Shader GFLOPs, Die Area (mm2), Die Area normalized, Transistors (M), Power (W), GFLOPS/mm, GFLOPS/tr, GFLOPS/W.
GPUs: Growing Faster than Moore’s Law
The graph shows the relative growth rates of GPUs and CPUs. GPUs have been growing faster than Moore’s law, and this trend is expected to continue for at least 5 more years.
Goal: Exploit GPUs for CGF computations
Quad SLI: 1.3 billion transistors (Jan. 2006)
GPGP: General-Purpose Computation using GPUs
- Scientific applications
- Geometric computations
- Scientific visualization
- Physical simulation
- Robotics & navigation
- Database computation
- Financial applications
- Cryptography
- Modeling and simulation
Graphics Pipeline
- vertex: programmable vertex processing (fp32)
- polygon (setup, rasterizer): polygon setup, culling, rasterization
- pixel: programmable per-pixel math (fp32)
- texture: per-pixel texture, fp16 blending
- image: Z-buf, fp16 blending, anti-alias (MRT)
- memory
NON-Graphics Pipeline (courtesy: David Kirk, Chief Scientist, NVIDIA)
- data: programmable MIMD processing (fp32)
- lists (setup, rasterizer): SIMD “rasterization”
- data: programmable SIMD processing (fp32)
- data fetch, fp16 blending
- predicated write, fp16 blend, multiple output
- memory
Issues in using GPUs
- Programmability
- Precision
- Handling large data
GPU-based Computations
- Accelerating OneSAF using GPUs
- Interactive collision detection
- Simulations
- Database and data streaming
- Sorting and scientific computations
Real-time Computational Challenges for Computer Generated Forces (CGF)
- Atmospheric transport models
- Vehicle dynamics
- Wide area sensors
- Petabyte urban terrain databases
Real-time Terrain Reasoning for Computer Generated Forces
- Best algorithms are O(N^2), where N = objects/entities in the CGF database (e.g., sensors, platforms, buildings, people)
- Currently over 40% of CGF CPU time for battalion-level scenarios is spent in collision detection, line-of-sight computation, and terrain placement
- Current system can barely handle 300 entities on a 300K-polygon terrain model at 10m x 10m resolution
- Need times improvement to handle sub-meter resolution terrain model
- CPUs progressing at Moore’s law (1.7x per year) need more than 7-8 years to catch up
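The "7-8 years" estimate above can be related to a required improvement factor with compound-growth arithmetic. The sketch below assumes only the 1.7x-per-year rate quoted on the slide; the function names are hypothetical, and the factor of 50 in the example is an illustrative input, not a figure from the slides.

```python
import math

GROWTH_PER_YEAR = 1.7  # CPU growth rate quoted on the slide


def speedup_after(years, rate=GROWTH_PER_YEAR):
    """Cumulative speedup after `years` of compound growth."""
    return rate ** years


def years_to_reach(factor, rate=GROWTH_PER_YEAR):
    """Years of compound growth needed to reach a given speedup factor."""
    return math.log(factor) / math.log(rate)


# Seven years of 1.7x growth gives roughly 41x; eight years gives roughly 70x.
seven_years = speedup_after(7)
eight_years = speedup_after(8)

# Illustrative only: a hypothetical 50x requirement falls between 7 and 8 years.
years_for_50x = years_to_reach(50)
```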
Project Accomplishments
- GPU-based LOS algorithm
  - x improvement in LOS query
  - Integration into OneSAF: 15-20x simulation speed improvement (5000 entities)
- Region-based visibility algorithms to accelerate LOS (supported by ATO)
  - 4-10x further improvement in LOS query
  - Integration into OneSAF: 10x simulation speed improvement in urban environments (3000 entities)
- GPU-based route planning
  - 10-30x improvement in route computation
  - 10x simulation speed improvement (3000 entities)
- GPU-based collision detection
  - 10x estimated improvement in collision query
  - 10x simulation speed improvement (150 entities)
11/28/ Manocha LOS Integration Process
- OneSAF/GPU Requirements (SAIC/UNC)
- OneSAF Technical Report (SAIC)
- GPU Algorithm Creation (UNC)
- Execute Unit Test (SAIC/UNC)
- OneSAF Scenario Creation (SAIC)
- OneSAF Benchmark Results (SAIC)
- Integration into OOS (SAIC)
  - Add several OpenGL DLLs to the ERC libraries
  - Place C++ header files for OpenGL among the ERC code
  - Create a new directory among the ERC code, and set up a new makefile/buildfile to allow GPU to build as its own library
  - Add calls to ERC initialization to: gather all the triangles in the entire database; gather all features in the database; pass all triangles and features into the initialization for the GPU
  - Replace all original LOS calls with the GPU counterpart
OneSAF with GPU-based LOS Algorithm: Demonstration
- LOS computation on 5K entities
- Route planning on 5K entities
- Average time for a standard LOS service call (without the GPU-based algorithm): 1-2 milliseconds
- Average time for a GPU LOS service call: 8-12 microseconds
- Almost 200x speedup for a single LOS query
- 15-20x improvement in OneSAF simulation speed in JRTC terrain with 5000 entities
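The per-query timings above bound the single-query speedup, and Amdahl's law gives one standard way to see why a large per-query gain translates into a smaller whole-simulation gain. This is a sketch under stated assumptions: the function names are hypothetical, and the 0.95 fraction in the example is purely illustrative, not a measurement from the slides.

```python
def speedup_range(baseline_us, accelerated_us):
    """Smallest and largest speedup implied by two (min, max) timing ranges."""
    lowest = min(baseline_us) / max(accelerated_us)
    highest = max(baseline_us) / min(accelerated_us)
    return lowest, highest


def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the run time is sped up by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# 1-2 ms per standard call versus 8-12 microseconds per GPU call.
low, high = speedup_range((1000.0, 2000.0), (8.0, 12.0))

# Illustrative assumption: if 95% of run time were LOS queries sped up 200x,
# the rest of the simulation would limit the overall gain to roughly 18x.
overall = amdahl_speedup(0.95, 200.0)
```

The quoted "almost 200x" lies inside the range computed from the slide's timings (about 83x to 250x).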
Project Accomplishments: successful demonstrations at DARPATech 2005, I/ITSEC ’04, and I/ITSEC ’05
Proximity Queries
Geometric reasoning of spatial relationships among objects (in a dynamic environment)
- Closest points & separation distance (d)
- Penetration depth (d)
- Collision detection
- Contact points & normals
Collision Detection Systems: I-COLLIDE (1995), RAPID (1996), V-COLLIDE (1997), H-COLLIDE (1998), PQP (1999), SWIFT (2000), PIVOT (2001), SWIFT++ (2001), DEEP (2002), CULLIDE (2003)
Distance Fields
- Voronoi diagram computation using GPUs
- Render polygonal mesh approximations of primitive distance fields
- Color buffer and depth buffer: result after compositing distance fields using minimum depth test
[Hoff, et al; SIGGRAPH 1999]
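The compositing step above can be illustrated on the CPU: each site writes its distance into a depth buffer, and a minimum test keeps the nearest site per pixel, so the color buffer ends up holding a discrete Voronoi diagram. This is a minimal sketch assuming point sites and exact Euclidean distance; the GPU approach on the slide renders mesh approximations of the distance fields instead, and all names here are hypothetical.

```python
import math


def composite_distance_fields(sites, width, height):
    """Composite per-site distance fields with a minimum depth test.

    Returns (color, depth): color[y][x] is the index of the nearest site,
    depth[y][x] is the distance to it.
    """
    depth = [[math.inf] * width for _ in range(height)]
    color = [[-1] * width for _ in range(height)]
    for site_id, (sx, sy) in enumerate(sites):
        for y in range(height):
            for x in range(width):
                d = math.hypot(x - sx, y - sy)
                if d < depth[y][x]:  # minimum depth test
                    depth[y][x] = d
                    color[y][x] = site_id
    return color, depth


# Two point sites at opposite ends of an 8x1 image split it down the middle.
color, depth = composite_distance_fields([(0, 0), (7, 0)], 8, 1)
```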
Our Hybrid Approach (CPU and GPU)
- Image-space proximity queries
- Coarse object-space geometric localization
- Balance load by varying localization coarseness and error bound
[Hoff, Zaferakis, Lin & Manocha; I3D 2001]
Gears
- Non-convex, rigid objects
- Frequent interlocking contacts
- Unconstrained, penalty-based
Algorithm
- Object-level pruning and sub-object-level pruning: GPU-based PCS computation
- Exact tests: using CPU
[Govindaraju, et al; GH 2003]
Reliable Collision Culling using GPUs
Interactive Self-Collision Detection: order-of-magnitude improvement
Interactive Proximity Query: breaking objects & changing topologies
Interactive Smoke Simulation using GPUs: interactive fluid simulation (Demonstrations 1, 2, and 3)
Interactive Ice Simulation using GPUs: interactive phase field method simulation
Interactive Fluid Simulation using GPUs: interactive paint mixing with a human in the loop
Interactive Lightning using GPUs (Demonstrations 1 and 2)
GPU-based Computations
- Accelerating OneSAF using GPUs
- Interactive collision detection
- Interactive simulations
- Interactive shadows
- Database and data streaming
- Sorting and scientific computations
Shadows
Shadows occur on surfaces seen by the eye, but not seen by the light.
Diagram labels: Light, Object, Shadow, Eye
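The definition above ("seen by the eye, but not seen by the light") can be written directly as two visibility tests. The sketch below assumes a single spherical occluder and uses segment tests on the CPU; it is not the GPU shadow algorithm from the talk, and every name in it is hypothetical.

```python
import math


def segment_hits_sphere(p, q, center, radius):
    """True if the segment from p to q passes within `radius` of `center`."""
    d = [q[i] - p[i] for i in range(3)]
    f = [center[i] - p[i] for i in range(3)]
    dd = sum(c * c for c in d)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0 if dd == 0 else max(0.0, min(1.0, sum(f[i] * d[i] for i in range(3)) / dd))
    closest = [p[i] + t * d[i] for i in range(3)]
    return math.dist(closest, center) < radius


def in_shadow(point, eye, light, occluder_center, occluder_radius):
    """A point is shadowed if the eye sees it but the light does not."""
    seen_by_eye = not segment_hits_sphere(eye, point, occluder_center, occluder_radius)
    seen_by_light = not segment_hits_sphere(light, point, occluder_center, occluder_radius)
    return seen_by_eye and not seen_by_light


# Light above a unit sphere; the ground point directly below it is shadowed.
light, eye = (0.0, 10.0, 0.0), (10.0, 5.0, 0.0)
below = in_shadow((0.0, 0.0, 0.0), eye, light, (0.0, 5.0, 0.0), 1.0)
aside = in_shadow((5.0, 0.0, 0.0), eye, light, (0.0, 5.0, 0.0), 1.0)
```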
Shadow Generation
- Shadows improve depth perception
- Shadows provide additional information about an object’s shape
- Aesthetics: shadows are more visually interesting & realistic
Interactive Shadow Generation using GPUs
Databases: Predicate Evaluation
- CPU implementation: Intel compiler 7.1 with SSE optimizations
- The combined CPU + GPU implementation is ~20 times faster than the CPU alone
[SIGMOD 2004]
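Predicate evaluation suits data-parallel hardware because each record is tested independently of every other record. The sketch below shows that structure on the CPU: one comparison per record, producing a mask that can then be counted or used for selection. It is an illustration only, not the SIGMOD 2004 implementation; the record layout and function names are hypothetical.

```python
def evaluate_predicate(records, attribute, threshold):
    """Return a per-record mask for `record[attribute] >= threshold`."""
    return [record[attribute] >= threshold for record in records]


def count_matches(mask):
    """Number of records that satisfy the predicate."""
    return sum(mask)


def select(records, mask):
    """Records whose mask entry is True, in their original order."""
    return [record for record, keep in zip(records, mask) if keep]


records = [{"size": 10}, {"size": 25}, {"size": 40}]
mask = evaluate_predicate(records, "size", 20)
matches = count_matches(mask)
```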
Comparison on Different GPUs: Super-Moore’s Law
GPUSort: 32-bit floating-point inputs
GPUSort coverage: slashdot.org and Tom’s Hardware Guide (750 downloads in 6 weeks)
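The slide does not say which algorithm GPUSort uses. As an example of the kind of sort that maps well to data-parallel hardware, the sketch below implements a bitonic sorting network, whose compare-exchange pattern is fixed in advance and independent of the data. It runs sequentially on the CPU, assumes a power-of-two input length, and the function name is hypothetical.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length sequence in ascending order with a bitonic network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:  # distance between compared elements
            # Every compare-exchange in this pass is independent of the others.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


data = [3.5, -1.0, 2.0, 0.0, 7.25, 7.0, -3.0, 1.0]
result = bitonic_sort(data)
```

Because the inner loop over `i` has no dependencies within a pass, each pass could in principle be executed as one parallel step.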
LU Decomposition with Partial Pivoting (32-bit inputs)
[IEEE/ACM SuperComputing 2005]
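For reference, LU decomposition with partial pivoting can be sketched in a few lines of sequential code. This is the textbook algorithm in double precision on the CPU, not the GPU implementation from the talk; the function names and the packed return format are assumptions made for the example.

```python
def lu_partial_pivot(matrix):
    """Factor a square matrix so that row perm[i] of the input equals row i of L*U.

    Returns (perm, lu): lu holds U on and above the diagonal and the
    multipliers of unit-lower-triangular L below it.
    """
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]  # multiplier stored in place of the eliminated entry
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return perm, a


def multiply_lu(lu):
    """Rebuild L*U from the packed factorization (used here to check the result)."""
    n = len(lu)
    return [[sum((lu[i][k] if k < i else 1.0) * lu[k][j]
                 for k in range(min(i, j) + 1))
             for j in range(n)] for i in range(n)]


A = [[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 1.0, 3.0]]
perm, lu = lu_partial_pivot(A)
```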
GPU-based Algorithms
- 1-2 orders of magnitude improvement
- The performance gap is expected to widen in the future
- OneSAF scalability (using GPU clusters)
Future Work
- Develop other GPU-based algorithms for OOS
- Other LOS computations: attenuation, handling smoke
- Force and atmospheric simulations
- Combine with multi-resolution representations
- Handle very large and complex terrains
- GPU clusters for modeling and simulation
- Extension to multiple simulation environments: WARSIM, JMTK, GIG
- Use GPUs with various RDEC models