Computer-Generated Force Acceleration using GPUs: Next Steps

Presentation transcript:

Computer-Generated Force Acceleration using GPUs: Next Steps Dinesh Manocha UNC-Chapel Hill http://gamma.cs.unc.edu/

GPU Performance
- Rasterization performance will continue to grow at a rate faster than Moore's law
- Improved precision and programmability
- Better GPU architecture: improved performance of texture cache, blending operations, occlusion queries, etc.
- PCs with multiple GPUs

OOS: Performance issues
- Complex urban environments
- Large number of entities: 50K to 2M
- Environment complexity: 27 GB database; 300K buildings
- Need 2 orders of magnitude improvement to handle LOS, collision handling and route planning
- Higher-resolution physical models
- Shadows in urban environments
- Acoustics in urban environments
- Simulation of building damage and collapsed structures
- Atmospheric and force model simulations
- GPUs can be used to accelerate all these computations

Next Goal: Larger Terrain Environments
- Current: GPU-based algorithms have limited capability to handle large terrain environments
- Proposed: GPU functionality able to handle:
  - Databases with millions of buildings
  - Distance between buildings in meters
  - Buildings with unique full interiors and furniture

Line of Sight Computations
- Application to complex urban environments
- Region-based visibility for complex 3D environments
- Handle UHRBs (ultra-high-resolution buildings)
- Dynamic terrains
- Integrate into OneSAF
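As a point of reference for what a single line-of-sight (LOS) query involves, here is a minimal CPU sketch on a height-field terrain. It is an illustration only, not the GPU algorithm from the talk: the function name `line_of_sight`, the grid representation, the `eye` height, and the sampling rate are all assumptions made for this example.

```python
from typing import List, Tuple

def line_of_sight(height: List[List[float]],
                  a: Tuple[int, int], b: Tuple[int, int],
                  eye: float = 2.0, samples_per_cell: int = 4) -> bool:
    """Return True if the straight sight line from cell a to cell b clears the terrain.

    height is a 2D grid of ground elevations; a and b are (row, col) cells;
    eye is the assumed height of observer and target above the ground.
    """
    (r0, c0), (r1, c1) = a, b
    z0 = height[r0][c0] + eye
    z1 = height[r1][c1] + eye
    steps = max(abs(r1 - r0), abs(c1 - c0)) * samples_per_cell
    for i in range(1, steps):
        t = i / steps
        r = r0 + t * (r1 - r0)
        c = c0 + t * (c1 - c0)
        # Nearest-cell lookup keeps the sketch short; bilinear
        # interpolation of the height field would be smoother.
        ground = height[int(round(r))][int(round(c))]
        if ground > z0 + t * (z1 - z0):
            return False  # terrain rises above the sight line here
    return True
```

Each query is independent of the others, which is what makes large batches of LOS tests a natural fit for data-parallel hardware.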

Path/Route Planning
- Route planning in urban environments
- Route planning in ultra-high-resolution buildings
- Generalization to 3D routes
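The slides do not say which planner is used. As a generic baseline for the kind of computation involved, the following sketch runs A* over a 4-connected occupancy grid. The name `plan_route`, the grid encoding (1 = blocked), and the unit step cost are assumptions chosen for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]

def plan_route(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """A* over a 4-connected grid; grid[r][c] == 1 marks a blocked cell."""
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> int:
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    cost: Dict[Cell, int] = {start: 0}
    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if g > cost[cur]:
            continue  # stale heap entry; a cheaper route was found already
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < cost.get(nxt, float("inf")):
                    cost[nxt] = ng
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None  # goal unreachable
```

Extending the cell tuple with a floor index and adding stair or elevator neighbors is one way such a planner could generalize toward 3D routes inside buildings.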

GPU Clusters
Why multiple GPUs?
- Scale computation rate: interactive performance
- Scale problem size: complex terrain; high entity count
- Retain price/performance advantages for complex applications

GPU cluster: Target system
Node:
- CPU: single- or dual-core AMD or P4
- GPU: single or dual PCI-e 16x GPU cards
- Host channel adapter (HA): PCI-e 8x, single- or dual-port IB
System:
- 8-32 nodes with InfiniBand switch
[Diagram: GPU, HA, and CPU on PCI-e within a node; IB link to an InfiniBand switch]
Narrative: We have not started deploying a GPU cluster yet because (a) components are rapidly improving and (b) there is no funding. However, we are tracking the technology carefully and plan to construct a system … (when). Here is our current plan for the GPU cluster. These are currently or soon to be commodity components.
Possible things worth waiting for:
- new nVidia GPU (whatever is coming up)
- dual-core CPUs (probably not the key driver of performance)
- 30 Gb/s InfiniBand links (currently 10 Gb/s, or 20 Gb/s using a dual-port HCA)

GPU Cluster Architecture
- Commodity CPU/GPU cluster
[Diagram: nodes with GPU, IB adapter, and CPU on PCI-e, connected through an InfiniBand router]

GPU-GPU communication: How?
- Cutting out the middleman
- Remote direct memory access (RDMA)
- MPI for GPU cluster built on GPU RDMA
- All steps use cut-through routing. No GPU involvement on either end.
[Diagram: three nodes, each with GPUs, an HA, a CPU, and memory on PCI-e, linked by IB to an InfiniBand router]

Programming clusters
Cluster programming models:
- C/C++/Fortran + MPI
- UPC + Co-Array Fortran
Programming GPU clusters:
- GPU programming language: OpenGL, Cg, Brook
- Mated to cluster programming model: MPI, UPC
Programmability:
- many ways to lose performance
- very limited tool support

Parallel Algorithms: Issues
- Distribute the complex environment on multiple systems
- Perform LOS queries in parallel on different systems
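One simple way to picture these two points together is to split the terrain into tiles, assign tiles to systems, and route each LOS query to the system that owns the observer's tile. The sketch below does that with threads standing in for cluster nodes. The tiling scheme, the round-robin ownership rule, and the names `owner_of`, `partition_queries`, and `run_parallel` are assumptions for illustration; the sketch also ignores the harder case where a sight line crosses tiles held by different systems.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]
Query = Tuple[Cell, Cell]  # (observer cell, target cell)

def owner_of(cell: Cell, tile_size: int, tiles_per_row: int, num_workers: int) -> int:
    """Map a terrain cell to the worker holding its tile (round-robin over tiles)."""
    tile_id = (cell[0] // tile_size) * tiles_per_row + (cell[1] // tile_size)
    return tile_id % num_workers

def partition_queries(queries: List[Query], tile_size: int,
                      tiles_per_row: int, num_workers: int) -> Dict[int, List[Query]]:
    """Group queries by the worker that owns the observer's tile."""
    buckets: Dict[int, List[Query]] = defaultdict(list)
    for q in queries:
        buckets[owner_of(q[0], tile_size, tiles_per_row, num_workers)].append(q)
    return dict(buckets)

def run_parallel(buckets: Dict[int, List[Query]],
                 answer: Callable[[Query], bool]) -> Dict[int, List[bool]]:
    """Evaluate each worker's bucket concurrently; threads stand in for nodes."""
    def work(item):
        worker, qs = item
        return worker, [answer(q) for q in qs]
    with ThreadPoolExecutor(max_workers=max(1, len(buckets))) as pool:
        return dict(pool.map(work, buckets.items()))
```

On a real cluster the `answer` callback would be the per-node LOS kernel and the buckets would be sent over the interconnect, for example with MPI.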

Handling Complex Datasets
- Out of memory management
  - Disk to main memory
  - Main memory to GPU memory
- Interactive display
- Collision detection
- Path planning and navigation
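The disk to main memory to GPU memory path can be modeled as two chained caches, each pulling from the slower level on a miss. The following sketch uses least-recently-used eviction. The class name `TileCache`, the tile keys, the capacities, and the LRU policy are assumptions for illustration, and `read_from_disk` is a placeholder rather than real I/O.

```python
from collections import OrderedDict
from typing import Callable, Hashable

class TileCache:
    """Fixed-capacity LRU cache standing in for one level of the memory hierarchy."""

    def __init__(self, capacity: int, load: Callable[[Hashable], object]):
        self.capacity = capacity
        self.load = load              # fetches a tile from the next slower level
        self.tiles: "OrderedDict[Hashable, object]" = OrderedDict()
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self.tiles:
            self.tiles.move_to_end(key)      # mark as most recently used
            return self.tiles[key]
        self.misses += 1
        tile = self.load(key)
        self.tiles[key] = tile
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)   # evict the least recently used tile
        return tile

def read_from_disk(key: Hashable) -> str:
    """Placeholder for real disk I/O (hypothetical)."""
    return f"tile-{key}"

# Main memory caches disk tiles; GPU memory caches main-memory tiles.
main_memory = TileCache(capacity=4, load=read_from_disk)
gpu_memory = TileCache(capacity=2, load=main_memory.get)
```

A request such as `gpu_memory.get(7)` falls through to main memory and then to disk only when the faster levels do not already hold the tile.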

Conclusions
GPU-based algorithms and computations can have fundamental impact in:
- Simulations
- Computer generated forces
- Mission planning
- Databases and data streaming
- Scientific computation

Conclusions
GPU-based algorithms and computations are having fundamental impact in:
- Simulations
- Computer generated forces
- Mission planning
- Databases and data streaming
- Scientific computation

The End