Interactive CGF Computations using COTS Graphics Processors
Dinesh Manocha, University of North Carolina at Chapel Hill, 11/28/2005

UNC Collaborators
- Co-PI: Ming C. Lin
- Research staff: Naga Govindaraju, Dave Tuft
- Graduate students: Russ Gayle, Brandon Lloyd, Brian Salomon, Avneesh Sud, Sungeui Yoon, Talha Zaman

Collaborative Effort
- RDECOM: Maria Bauer, Angel Rodriguez
- SAIC: Eric Root, Marlo Verdesca, Jaeson Munro
- Stanford: Pat Hanrahan, Ian Buck

Acknowledgements
- BCSEO
- DARPA
- RDECOM
- PEO STRI

Real-time Computational Challenges for Computer Generated Forces (CGF)
- Atmospheric transport models
- Vehicle dynamics
- Wide-area sensors
- Petabyte urban terrain databases

Real-time Terrain Reasoning for Computer Generated Forces
- The best algorithms are O(N^2), where N is the number of objects/entities in the CGF database (e.g., sensors, platforms, buildings, people)
- Over 40% of CGF CPU time in battalion-level scenarios is currently spent in:
  - Collision detection
  - Line-of-sight computation
  - Terrain placement
- The current system can barely handle 300 entities on a 300K-polygon terrain model at 10 m x 10 m resolution
- A …x improvement is needed to handle sub-meter resolution terrain models
- CPUs are progressing at Moore's law (1.7x per year), so they would need more than 7-8 years to catch up
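To make the quadratic growth and the Moore's-law argument concrete, here is a small sketch. It assumes an all-pairs query touches each unordered pair of entities once, giving N(N-1)/2 pairs, and it uses the slide's 1.7x-per-year CPU growth rate. The function names are illustrative only, not OneSAF code.

```python
import math


def lines_of_sight_pairs(n: int) -> int:
    """Number of unordered entity pairs an all-pairs LOS pass must test."""
    return n * (n - 1) // 2


def years_to_reach(speedup: float, annual_growth: float = 1.7) -> float:
    """Years of compound growth at `annual_growth` per year needed to gain `speedup`."""
    return math.log(speedup) / math.log(annual_growth)
```

For example, going from 300 to 5000 entities raises the pair count from 44,850 to 12,497,500. At 1.7x per year, a hypothetical 50x performance gap would take about 7.4 years to close.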

Current Desktop System
- CPU (3 GHz) with 2 x 1 MB cache
- System memory (2 GB), 6.4 GB/s bandwidth
- AGP memory (512 MB)
- PCI-E bus (4 GB/s)
- GPU (500 MHz) with 512 MB video memory, 35.2 GB/s bandwidth (two such GPU/video-memory pairs are shown)
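A rough way to see why bandwidth matters for large terrain databases is to compute how long a transfer takes at each quoted rate. This sketch makes several assumptions. It takes 1 GB/s to mean 10^9 bytes/s and treats the slide's figures as peak rates. It also assigns the 6.4 GB/s figure to system memory and the 35.2 GB/s figure to video memory, which is our reading of the flattened diagram. The helper names are hypothetical.

```python
# Peak rates from the slide, in GB/s (1 GB = 10**9 bytes assumed).
BANDWIDTH_GB_S = {
    "system memory": 6.4,
    "PCI-E bus": 4.0,
    "video memory": 35.2,
}


def transfer_seconds(num_bytes: float, bandwidth_gb_s: float) -> float:
    """Time to move num_bytes at a sustained bandwidth_gb_s (ignores latency/overhead)."""
    return num_bytes / (bandwidth_gb_s * 1e9)


def bytes_per_frame(bandwidth_gb_s: float, frames_per_second: float) -> float:
    """Upper bound on bytes that can cross a link within one frame's time budget."""
    return bandwidth_gb_s * 1e9 / frames_per_second
```

Under these assumptions, copying 512 MB (taken as 512 x 10^6 bytes) across the 4 GB/s PCI-E bus takes 0.128 s. Reading the same amount from 35.2 GB/s video memory takes about 0.0145 s. This gap suggests keeping terrain data resident on the GPU rather than re-sending it for each query.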

GeForce 7800: 302M transistors

CPU vs. GPU

CPU vs. GPU (Henry Moreton, NVIDIA, Aug. 2005)
Table columns: PEE, GTX, GPU/CPU ratio. Rows: graphics GFLOPs; shader GFLOPs; die area (mm²); normalized die area; transistors (M); power (W); GFLOPS/mm²; GFLOPS/transistor; GFLOPS/W

GPUs: Growing Faster than Moore's Law
This graph highlights the relative growth rates of GPUs and CPUs. GPUs have been growing faster than Moore's law, and this trend is expected to continue for at least 5 more years.
Goal: exploit GPUs for CGF computations

Issues in Using GPUs
- Programmability
- Precision
- Handling large data
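The precision issue can be illustrated with standard IEEE single precision (32-bit float), the input format named in the later GPUSort and LU slides. The sketch assumes terrain coordinates are stored as absolute positions in meters. It uses Python's `struct` module to round values to float32 and to measure the gap to the next representable value. The helper names are hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between x (rounded to float32) and the next larger float32 value.

    Assumes x is positive and finite: for such values, adding 1 to the raw
    bit pattern yields the next representable float32.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_float32(x)
```

At 100 km from the origin, adjacent float32 values are 0.0078125 m apart, and at 10,000 km they are 1 m apart. One common way to preserve sub-meter detail on large terrains is to store coordinates relative to a nearby local origin.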

Project Accomplishments
- GPU-based LOS algorithm
  - …x improvement in LOS query
  - Integration into OneSAF: 15-20x simulation speed improvement (5000 entities)
- Region-based visibility algorithms to accelerate LOS (supported by ATO)
  - 4-10x further improvement in LOS query
  - Integration into OneSAF: 10x simulation speed improvement in urban environments (3000 entities)
- GPU-based route planning
  - 10-30x improvement in route computation
  - 10x simulation speed improvement (3000 entities)
- GPU-based collision detection
  - 10x estimated improvement in collision query
  - 10x simulation speed improvement (150 entities)
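The slides do not describe the GPU route planner itself. As a point of reference for what a route computation does, here is a plain CPU Dijkstra search over a 4-connected cost grid. The grid representation (each cell holds the cost of entering it, with `None` for impassable cells) and the function name are assumptions for illustration.

```python
import heapq


def shortest_path_cost(grid, start, goal):
    """Dijkstra search over a 4-connected grid.

    grid[r][c] is the cost of entering cell (r, c), or None if impassable.
    Returns the minimal total cost from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), float("inf")):
            continue  # stale entry: a cheaper path to this cell was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                nd = d + grid[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None
```

Per-cell costs can encode terrain difficulty, so the same search also favors roads over rough ground.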

Project Accomplishments
- Successful demonstrations at DARPATech 2005, I/ITSEC'04, and I/ITSEC'05 (RDECOM Booth #2266)
- Other GPU-based algorithms and applications: databases, data streaming, numerical computation, fluid dynamics, sorting, motion planning

LOS Integration Process
- OneSAF/GPU requirements (SAIC/UNC)
- OneSAF technical report (SAIC)
- GPU algorithm creation (UNC)
- Execute unit test (SAIC/UNC)
- OneSAF scenario creation (SAIC)
- OneSAF benchmark results (SAIC)
- Integration into OOS (SAIC):
  - Add several OpenGL DLLs to the ERC libraries
  - Place the C++ header files for OpenGL among the ERC code
  - Create a new directory among the ERC code and set up a new makefile/buildfile so the GPU code builds as its own library
  - Add calls to ERC initialization to gather all the triangles in the entire database, gather all features in the database, and pass all triangles and features into the GPU initialization
  - Replace all original LOS calls with their GPU counterparts
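The last two integration steps (pass all triangles and features into the GPU initialization, then replace every LOS call) amount to putting a single LOS entry point in front of a swappable backend. Below is a minimal sketch of that pattern. `init_los`, `has_los`, and the backend factory signature are hypothetical names, not ERC or OneSAF APIs.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[Point, Point, Point]
LosQuery = Callable[[Point, Point], bool]
# A backend factory receives the whole terrain once and returns an LOS query function.
BackendFactory = Callable[[Sequence[Triangle], Sequence[object]], LosQuery]

_los_query: Optional[LosQuery] = None


def init_los(triangles: Sequence[Triangle], features: Sequence[object],
             factory: BackendFactory) -> None:
    """Hypothetical initialization hook: hand all terrain triangles and
    features to the chosen backend exactly once."""
    global _los_query
    _los_query = factory(list(triangles), list(features))


def has_los(a: Point, b: Point) -> bool:
    """The single LOS entry point the rest of the simulation calls."""
    if _los_query is None:
        raise RuntimeError("init_los() must be called before has_los()")
    return _los_query(a, b)
```

Callers only ever see `has_los`, so switching from a CPU backend to a GPU backend changes only the factory passed at initialization.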

OneSAF with GPU-based LOS Algorithm: Demonstration
- Average time for a standard LOS service call (without the GPU-based algorithm): 1-2 milliseconds
- Average time for a GPU LOS service call: 8-12 microseconds
- Almost 200x speedup for a single LOS query
- 15-20x improvement in OneSAF simulation speed on JRTC terrain with 5000 entities
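For readers unfamiliar with what an LOS query computes, the sketch below is a simple CPU ray-march over a regular height grid. It samples the segment between observer and target and reports the line as blocked if any sample falls below the terrain. It is not the GPU-based algorithm benchmarked on this slide. The grid layout (`heights[y][x]`, one sample per grid unit), the nearest-neighbor lookup, and the sampling density are all assumptions.

```python
def line_of_sight(heights, a, b, samples_per_cell=4):
    """Approximate LOS between observer a = (x, y, z) and target b = (x, y, z)
    over a regular height grid heights[y][x] (one sample per grid unit).

    Returns False if the straight segment dips below the terrain at any
    interior sample point, True otherwise. Endpoints are not tested.
    """
    (ax, ay, az), (bx, by, bz) = a, b
    span = max(abs(bx - ax), abs(by - ay))
    steps = max(1, int(span * samples_per_cell))
    for i in range(1, steps):
        t = i / steps
        x = ax + t * (bx - ax)
        y = ay + t * (by - ay)
        z = az + t * (bz - az)
        # Nearest-neighbor terrain lookup at the sample's (x, y) position.
        ground = heights[int(round(y))][int(round(x))]
        if z < ground:
            return False
    return True
```

Finer sampling or bilinear interpolation would reduce missed thin obstacles at the cost of more work per query.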

Databases: Predicate Evaluation
- CPU implementation: Intel compiler 7.1 with SSE optimizations
- CPU + GPU is ~20 times faster than CPU only (SIGMOD 2004)
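Predicate evaluation tests a comparison such as `attribute > constant` on every record independently, which is why it maps well to data-parallel hardware. As we understand the cited SIGMOD 2004 work, the GPU version used the depth test for the comparison and occlusion queries to count the passing records. The sketch below only mirrors that logic on the CPU, and its function names are hypothetical.

```python
from typing import List, Sequence


def evaluate_predicate(column: Sequence[float], op: str, constant: float) -> List[bool]:
    """Evaluate `value op constant` for every record independently."""
    ops = {
        "<": lambda v: v < constant,
        "<=": lambda v: v <= constant,
        ">": lambda v: v > constant,
        ">=": lambda v: v >= constant,
        "==": lambda v: v == constant,
        "!=": lambda v: v != constant,
    }
    test = ops[op]
    return [test(v) for v in column]


def conjunction(*masks: Sequence[bool]) -> List[bool]:
    """Combine per-record results of several predicates with logical AND."""
    return [all(bits) for bits in zip(*masks)]


def count_passing(mask: Sequence[bool]) -> int:
    """Number of records that satisfied the predicate."""
    return sum(mask)
```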

Comparison on Different GPUs: Super-Moore's Law

GPUSort: 32-bit Floating-Point Inputs
- GPUSort on slashdot.org and Tom's Hardware Guide (750 downloads in 6 weeks)
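The slide does not say which sorting network GPUSort uses. As a representative example, here is a bitonic sorting network: a fixed sequence of compare-exchange stages in which every comparison within a stage is independent of the others. That independence lets a data-parallel processor apply one whole stage at once. This sketch assumes a power-of-two input length and runs sequentially.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic sorting network.

    The (k, j) loops enumerate the network's stages; within one stage every
    compare-exchange touches a disjoint pair (i, i ^ j), so a stage could be
    executed for all i in parallel.
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:
        j = k // 2
        while j > 0:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and a[i] > a[partner]) or (
                        not ascending and a[i] < a[partner]
                    ):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The comparison sequence does not depend on the data, which is what lets the whole network be laid out in advance as fixed parallel passes.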

LU Decomposition with Partial Pivoting (32-bit inputs)
- IEEE/ACM SuperComputing 2005
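For reference, this is the factorization the slide benchmarks, written as a plain sequential Doolittle LU with partial pivoting. It is not the GPU implementation. At each step it picks the row with the largest magnitude in the current column as the pivot. This keeps every multiplier at most 1 in magnitude and improves numerical stability in limited precision such as 32-bit floats. The function name is illustrative.

```python
def lu_partial_pivot(a):
    """Doolittle LU factorization with partial pivoting.

    Returns (perm, L, U) with L unit lower triangular and U upper triangular,
    such that row perm[i] of A equals row i of L @ U (i.e. P A = L U).
    """
    n = len(a)
    u = [[float(x) for x in row] for row in a]
    l = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Pivot: the row (at or below k) with the largest |entry| in column k.
        p = max(range(k, n), key=lambda r: abs(u[r][k]))
        if u[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            u[k], u[p] = u[p], u[k]
            l[k], l[p] = l[p], l[k]  # swaps the multipliers already stored
            perm[k], perm[p] = perm[p], perm[k]
        l[k][k] = 1.0
        for r in range(k + 1, n):
            factor = u[r][k] / u[k][k]
            l[r][k] = factor
            u[r][k] = 0.0  # eliminated exactly by construction
            for c in range(k + 1, n):
                u[r][c] -= factor * u[k][c]
    return perm, l, u
```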

Project Status
- Integration of GPU-based algorithms in OOS:
  - Line of sight
  - Route planning
  - Collision detection
- 35 publications in the last 18 months
  - 2 best paper awards (Pacific Graphics'04; IEEE VR'05)
- Paper presentations on GPU technology in OOS
  - Poster presentation at Army Science Conference'04
  - Best paper in the Research & Development Track at I/ITSEC'05; nominated for the best overall paper award at I/ITSEC'05
- Other applications: sorting, stream data mining, surgical simulation, physical simulation, computer animation, high-performance computing
- Other collaborators: NVIDIA, Intel, ATI, AGEIA, Disney

Future Goals
- Develop novel GPU-based algorithms:
  - Other LOS computations: attenuation, handling smoke
  - Force and atmospheric simulations
  - Combine with multi-resolution representations
- Handle very large and complex terrains
- GPU clusters for modeling and simulation
- Extension to multiple simulation environments: WARSIM, JMTK, GIG