11/28/ Manocha Interactive CGF Computations using COTS Graphics Processors Dinesh Manocha University of North Carolina at Chapel Hill
UNC Collaborators
- Co-PI: Ming C. Lin
- Research Staff: Naga Govindaraju, Dave Tuft
- Graduate Students: Russ Gayle, Brandon Lloyd, Brian Salomon, Avneesh Sud, Sungeui Yoon, Talha Zaman
Collaborative Effort
- RDECOM: Maria Bauer, Angel Rodriguez
- SAIC: Eric Root, Marlo Verdesca, Jaeson Munro
- Stanford: Pat Hanrahan, Ian Buck
Acknowledgements
- BCSEO
- DARPA
- RDECOM
- PEO STRI
Real-time Computational Challenges for Computer Generated Forces (CGF)
- Atmospheric transport models
- Vehicle dynamics
- Wide area sensors
- Petabyte Urban Terrain Databases
Real-time Terrain Reasoning for Computer Generated Forces
- Best algorithms are O(N^2), where N = number of objects/entities in the CGF database (e.g., sensors, platforms, buildings, people)
- Currently over 40% of CGF CPU time for battalion-level scenarios is spent in:
  - Collision detection
  - Line of sight computation
  - Terrain placement
- Current system can barely handle 300 entities on a 300K-polygon terrain model at 10m x 10m resolution
- Need times improvement to handle sub-meter resolution terrain model
- CPUs progressing at Moore’s law (1.7x per year) would need more than 7-8 years to catch up
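The catch-up estimate on this slide is compound-growth arithmetic: at a growth factor g per year, reaching a speedup S takes log(S)/log(g) years. A minimal sketch, assuming the slide's 1.7x per year; the function names and the sample target of 50x are illustrative and are not figures from the talk.

```python
import math


def years_to_reach(speedup, growth_per_year=1.7):
    """Years of compound growth needed to reach a target speedup."""
    if speedup <= 0 or growth_per_year <= 1:
        raise ValueError("speedup must be > 0 and growth_per_year must be > 1")
    return math.log(speedup) / math.log(growth_per_year)


def speedup_after(years, growth_per_year=1.7):
    """Cumulative speedup after a number of years of compound growth."""
    return growth_per_year ** years


if __name__ == "__main__":
    # Hypothetical target, chosen only to illustrate the calculation.
    print(f"50x takes {years_to_reach(50):.1f} years at 1.7x/year")
    for years in (7, 8):
        print(f"{years} years at 1.7x/year gives {speedup_after(years):.1f}x")
```

Under the 1.7x-per-year assumption, 7 to 8 years of CPU growth corresponds to a cumulative factor of roughly 41x to 70x.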
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Current Desktop System
- CPU (3 GHz)
- System Memory (2 GB)
- AGP Memory (512 MB)
- 6.4 GB/s bandwidth
- PCI-E Bus (4 GB/s)
- 35.2 GB/s bandwidth
- Video Memory (512 MB)
- GPU (500 MHz)
- Video Memory (512 MB)
- GPU (500 MHz)
- 2 x 1 MB Cache
GeForce 7800 – 302M Transistors
CPU vs. GPU
CPU vs. GPU (Henry Moreton, NVIDIA, Aug. 2005)
[Table: columns PEE, GTX, GPU/CPU; rows Graphics GFLOPs, Shader GFLOPs, Die Area (mm2), Die Area normalized, Transistors (M), Power (W), GFLOPS/mm, GFLOPS/tr, GFLOPS/W]
GPUs: Growing Faster than Moore’s Law
This graph highlights the relative growth rates of GPUs and CPUs. GPUs have been growing faster than Moore’s law, and this trend is expected to continue for at least 5 more years.
Goal: Exploit GPUs for CGF computations
Issues in using GPUs
- Programmability
- Precision
- Handling large data
Project Accomplishments
- GPU-based LOS algorithm
  - x improvement in LOS query
  - Integration into OneSAF: 15-20x simulation speed improvement (5000 entities)
- Region-based visibility algorithms to accelerate LOS (supported by ATO)
  - 4-10x further improvement in LOS query
  - Integration into OneSAF: 10x simulation speed improvement in urban environments (3000 entities)
- GPU-based route planning
  - 10-30x improvement in route computation
  - 10x simulation speed improvement (3000 entities)
- GPU-based collision detection
  - 10x estimated improvement in collision query
  - 10x simulation speed improvement (150 entities)
Project Accomplishments
- Successful demonstrations at DARPATech’2005, I/ITSEC’04, and I/ITSEC’05 (RDECOM Booth #2266)
- Other GPU-based algorithms & applications: database, data streaming, numerical computation, fluid dynamics, sorting, motion planning
LOS Integration Process
- OneSAF/GPU Requirements (SAIC/UNC)
- OneSAF Technical Report (SAIC)
- GPU Algorithm Creation (UNC)
- Execute Unit Test (SAIC/UNC)
- OneSAF Scenario Creation (SAIC)
- OneSAF Benchmark Results (SAIC)
Integration into OOS (SAIC)
- Add several OpenGL DLLs to the ERC libraries
- Place C++ header files for OpenGL among the ERC code
- Create a new directory among the ERC code
  - Set up a new makefile/buildfile to allow the GPU code to build as its own library
- Add calls to ERC initialization to:
  - Gather all the triangles in the entire database
  - Gather all features in the database
  - Pass all triangles and features into the initialization for the GPU
- Replace all original LOS calls with their GPU counterparts
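The integration steps above follow a common pattern: collect every triangle once at initialization, then answer each LOS query against that set. For reference, here is a minimal CPU sketch of that pattern using the standard Möller-Trumbore segment/triangle test. It is not the UNC GPU algorithm and not a OneSAF or ERC interface; `LosService`, `is_visible`, and `segment_hits_triangle` are hypothetical names introduced for illustration.

```python
from typing import Iterable, Tuple

Vec = Tuple[float, float, float]
Triangle = Tuple[Vec, Vec, Vec]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_hits_triangle(p: Vec, q: Vec, tri: Triangle, eps: float = 1e-9) -> bool:
    """Moller-Trumbore test, restricted to the interior of segment p -> q."""
    v0, v1, v2 = tri
    d = _sub(q, p)                 # segment direction (not normalized)
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    h = _cross(d, e2)
    a = _dot(e1, h)
    if abs(a) < eps:               # segment is parallel to the triangle plane
        return False
    f = 1.0 / a
    s = _sub(p, v0)
    u = f * _dot(s, h)             # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return False
    qv = _cross(s, e1)
    v = f * _dot(d, qv)            # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return False
    t = f * _dot(e2, qv)           # hit position along the segment, in [0, 1]
    return eps < t < 1.0 - eps


class LosService:
    """Hypothetical stand-in: gather triangles once, then answer queries."""

    def __init__(self, triangles: Iterable[Triangle]):
        self.triangles = list(triangles)

    def is_visible(self, eye: Vec, target: Vec) -> bool:
        # Brute force: the sight line is blocked if any triangle crosses it.
        return not any(segment_hits_triangle(eye, target, tri)
                       for tri in self.triangles)
```

Each query in this sketch costs time linear in the number of triangles, which is the kind of per-call cost a GPU-based replacement aims to reduce. Keeping the query behind one call mirrors the last integration step, where the original LOS calls are swapped for their GPU counterparts.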
OneSAF with GPU-based LOS Algorithm: Demonstration
- Average time for standard LOS service call (without GPU-based algorithm): 1-2 milliseconds
- Average time for GPU LOS service call: 8-12 microseconds
- Almost 200x speedup for a single LOS query
- 15-20x improvement in OneSAF simulation speed on JRTC terrain with 5000 entities
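The single-query figure can be checked from the two timing ranges, and Amdahl's law relates a per-query speedup to an overall simulation speedup. A minimal sketch: the timing ranges come from the slide, while the LOS fractions passed to `amdahl_speedup` are illustrative assumptions, not OneSAF measurements.

```python
def speedup_range(base_lo, base_hi, new_lo, new_hi):
    """Smallest and largest ratio of baseline time to new time."""
    return base_lo / new_hi, base_hi / new_lo


def amdahl_speedup(fraction, component_speedup):
    """Overall speedup when `fraction` of run time is sped up by `component_speedup`."""
    if not 0.0 <= fraction <= 1.0 or component_speedup <= 0:
        raise ValueError("fraction must be in [0, 1] and component_speedup > 0")
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


if __name__ == "__main__":
    # Slide figures: 1-2 ms per standard call, 8-12 microseconds per GPU call.
    lo, hi = speedup_range(1e-3, 2e-3, 8e-6, 12e-6)
    print(f"single-query speedup between {lo:.0f}x and {hi:.0f}x")

    # Assumed LOS shares of total run time, for illustration only.
    for fraction in (0.40, 0.90, 0.95):
        overall = amdahl_speedup(fraction, 200.0)
        print(f"LOS share {fraction:.0%}: overall speedup {overall:.1f}x")
```

The timing ranges give a per-query speedup between about 83x and 250x, which brackets the slide's figure. Under Amdahl's law, a 200x query speedup on its own yields a 15-20x overall gain only if LOS accounts for roughly 94-95% of the baseline run time.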
Databases: Predicate Evaluation
- CPU implementation: Intel compiler 7.1 with SSE optimizations
- CPU + GPU is ~20 times faster than CPU alone
- SIGMOD 2004
Comparison on Different GPUs: Super-Moore’s Law
GPUSort: 32-bit floating point inputs
GPUSORT: slashdot.org & Tom’s Hardware guide (750 downloads in 6 weeks)
LU-Decomposition with Partial Pivoting (32-bit inputs)
IEEE/ACM SuperComputing 2005
Project Status
- Integration of GPU-based algorithms in OOS
  - Line-of-sight
  - Route planning
  - Collision detection
- 35 publications in last 18 months
- 2 best paper awards (Pacific Graphics’04; IEEE VR’05)
- Paper presentations on GPU technology in OOS
  - Poster presentation at Army Science Conference’04
  - Best paper in Research & Development Track at I/ITSEC’05
    - Nominated for best overall paper award at I/ITSEC’05
- Other applications: sorting, stream data mining, surgical simulation, physical simulation, computer animation, high-performance computing
- Other collaborators: NVIDIA, Intel, ATI, AGEIA, Disney
Future Goals
- Develop novel GPU-based algorithms
- Other LOS computations: attenuation, handling smoke
- Force and atmospheric simulations
- Combine with multi-resolution representations
- Handle very large and complex terrains
- GPU clusters for modeling and simulation
- Extension to multiple simulation environments: WARSIM, JMTK, GIG