Dealing with Computational Load in Multi-user Scalable City with OpenCL Assets and Dynamics Computation for Virtual Worlds.


Focus on OpenCL
OpenCL lets us target and test various parallel compute resources for each workload: Core i7, Cell, Tesla, GPU.
Using a combination of compute accelerators lets us alter the configuration and the mapping between software and hardware more easily.
Vendors continue to optimize their OpenCL drivers while we optimize our use of OpenCL: free development!

Multi-User Load Challenges
Communications
Graphics Rendering
  – Geometry Processing
  – Shaders
  – Rendering Techniques
Dynamics Computation
  – Physics
  – AI or other application-specific behaviors
  – Animation

Particle Systems
Each player is represented by a cloud of particles in the shape of a cyclone.
Particle systems are client-side only: they have no effect on the environment and need no synchronization.
With many players per city, each player's client must process many times more particles.
This causes scalability and performance problems on the client when multiple players are visible.

Particle Systems: Now
The CPU computes new positions and texture coordinates.
The new data must be sent to the graphics card each frame.
More CPU time and bandwidth are used as the number of players increases.

Particle Systems: Solution
Use OpenCL to compute particle updates on the GPU.
A particle system is an ideal GPU workload.
Use OpenGL/DirectX interoperability to keep all data on the card: this is the primary reason the GPU wins at this task!
Expect an order of magnitude or more performance improvement on this system.
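
A particle system suits the GPU because each particle can be updated without looking at any other particle. The sketch below is a CPU stand-in in Python for what a per-particle OpenCL kernel might do. The swirl-and-rise motion rule and the names `update_particle`, `update_all`, `angle_step` and `rise` are assumptions for illustration only; the slides do not describe the actual cyclone forces.

```python
import math

def update_particle(pos, angle_step, rise):
    """Per-particle 'kernel': rotate about the vertical (y) axis and move up."""
    x, y, z = pos
    c, s = math.cos(angle_step), math.sin(angle_step)
    return (c * x - s * z, y + rise, s * x + c * z)

def update_all(positions, angle_step=0.1, rise=0.01):
    # Every element is independent of the others, so on a GPU each
    # particle could map to one work-item with no synchronization.
    return [update_particle(p, angle_step, rise) for p in positions]
```

Because the update reads and writes only its own particle, the results could stay in a device buffer shared with the renderer instead of being sent back to the host each frame.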

Server Physics

Effect of Multi-User on Physics
Multi-user Scalable City will:
  – Scale the total interactivity occurring at once
  – Shift focus more towards parallel computation
  – Impose greater demands on the state of the art in parallel computation
  – Consequently expand upon the state of the art in parallel computation
  – Utilize both incremental and parallel methods
      – Reduce work as much as possible
      – Parallelize all work that must be done

Utilizing Parallel Hardware
Next step: offloading physics from z10
  – Xeon blades running Scalable Engine
  – Cell BE blades running Bullet for Cell
  – Distribute heavy computational stages
      – Collision detection on broad-phase pair output
      – Constraint solving/integration on contact groups
Then: OpenCL plan
  – Develop physics system using algorithms well suited to OpenCL parallelization

Server Physics: Parallelization
Physics is difficult to parallelize well:
  – Each stage has vastly different properties
      – If stages map to different devices, large data buffer transmissions must synchronize compute accelerators
  – Computationally heavy stages
      – Collision detection is coarse-grained
      – Independent contact groups can be large and unbalanced
      – Constraint solving is difficult to break down
  – Traditional systems that solve all constraints simultaneously (e.g. using Gauss-Seidel) parallelize in limited ways

Server Physics: Goals
Produce a new physics processing pipeline based as much as possible on OpenCL.
Minimize buffer transmission to/from OpenCL devices by keeping nearly all stages in OpenCL.
Specialize physics algorithms to highly parallel hardware.
Scale to as much activity as possible in real time as we scale hardware resources.

Server Physics: Approach
Present efforts for OpenCL physics focus on traditional algorithms and pipelines
  – These methods “won out” in the single-core era
  – Physics programmers are most familiar with these methods and their trade-offs
  – Ex: AMD-funded OpenCL port of Bullet
We choose techniques better suited to massively parallel computation!
  – Greater potential, more exploration to perform

“Advanced Character Physics” – Jakobsen, GDC 2001
History
  – Developed for speed and simplicity
  – Not yet implemented in a parallel system
Features
  – All physics operates solely on particles
  – There is no large, global set of constraints to solve, ever

Jakobsen: Key Features
Objects represented as a set of particles and stick constraints rather than geometric shapes
All constraints solved individually, without reference to other constraints
Collision response by simple projection
Velocity-less Verlet integration method (often used in molecular dynamics)

Rigid Bodies from Particles
Cube as a set of particles and stick constraints
  – Corners are particles
  – Stick constraints placed for edges
  – Stick constraints placed to prevent collapse
A stick constraint requires that the distance between two points be a constant value.
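
The cube construction can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the slide does not say which bracing sticks were used to prevent collapse, so this sketch assumes a stick between every pair of corners that does not already share an edge. The function name `make_cube` and the `(i, j, rest_length)` tuple layout are hypothetical.

```python
import math
from itertools import combinations, product

def make_cube(size=1.0):
    """Return (corners, edges, braces); sticks are (i, j, rest_length) tuples."""
    corners = [tuple(size * c for c in corner)
               for corner in product((0.0, 1.0), repeat=3)]
    edges, braces = [], []
    for i, j in combinations(range(len(corners)), 2):
        rest = math.dist(corners[i], corners[j])
        # Corners that differ along exactly one axis share a cube edge.
        differing = sum(1 for a, b in zip(corners[i], corners[j]) if a != b)
        # Assumption: every other pair gets a bracing stick (face and
        # space diagonals), which is more than strictly necessary.
        (edges if differing == 1 else braces).append((i, j, rest))
    return corners, edges, braces
```

With eight corner particles this yields 12 edge sticks and 16 bracing sticks. A real system would likely use fewer braces to cut the per-pass constraint count.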

Constraint Solving
Normally, all constraints upon a body are solved simultaneously
  – Solution to a large set of equations and unknowns
  – Produces a transform that does not violate any constraints
Jakobsen method solves each constraint individually
  – Solving one constraint may violate another
  – Iteratively solving constraints approaches a solution
  – Fewer iterations can be used when warranted
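
The individual-constraint approach can be illustrated with a small relaxation loop. This is a sketch in Python under stated assumptions: both particles have equal mass (each moves half the correction), positions are mutable lists, and the helper names `satisfy_stick`, `relax` and `max_error` are invented for this example.

```python
import math

def satisfy_stick(p, i, j, rest):
    """Move particles i and j along their connecting line until they are `rest` apart."""
    delta = [b - a for a, b in zip(p[i], p[j])]
    length = math.sqrt(sum(d * d for d in delta))
    if length == 0.0:
        return  # coincident particles: direction undefined, skip this pass
    diff = (length - rest) / length
    for k in range(len(delta)):
        p[i][k] += 0.5 * diff * delta[k]
        p[j][k] -= 0.5 * diff * delta[k]

def relax(p, sticks, passes=10):
    """Solve each stick on its own, repeatedly; later sticks may disturb earlier ones."""
    for _ in range(passes):
        for i, j, rest in sticks:
            satisfy_stick(p, i, j, rest)

def max_error(p, sticks):
    """Largest remaining deviation from any stick's rest length."""
    return max(abs(math.dist(p[i], p[j]) - rest) for i, j, rest in sticks)
```

Calling `relax` with more passes drives `max_error` toward zero, while fewer passes leave the body softer, which is the trade-off the slide refers to.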

Jakobsen: Key Features
Verlet integration
  – Replace use of ‘velocity’ with ‘previous position’
Projection
  – Simply move the particle out of collision with the face!
Combination of features results in a very stable simulation
  – Velocity and acceleration never get out of control
Retaining OpenCL buffers on device
  – Last cycle’s result array binds to ‘previous position’
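
A minimal Python sketch of these three ideas follows, assuming a flat floor at a fixed height as the only collision face and constant gravity as the only force. The update rule is the standard velocity-less Verlet step, x_next = 2x - x_prev + a*dt^2. The function names and the swap in `simulate` are illustrative. The swap mirrors the idea of rebinding last cycle's result as 'previous position' rather than copying data.

```python
def verlet_step(current, previous, accel, dt):
    """Next positions; velocity is implicit in (current - previous)."""
    return [
        tuple(2.0 * c - p + a * dt * dt for c, p, a in zip(cur, prev, accel))
        for cur, prev in zip(current, previous)
    ]

def project_above_floor(positions, floor_y=0.0):
    """Collision response by projection: push any particle below the floor back onto it."""
    return [(x, max(y, floor_y), z) for x, y, z in positions]

def simulate(current, previous, steps, dt=0.01, accel=(0.0, -9.81, 0.0)):
    for _ in range(steps):
        nxt = project_above_floor(verlet_step(current, previous, accel, dt))
        # Rebind rather than copy: this cycle's input becomes 'previous position'.
        previous, current = current, nxt
    return current, previous
```

Because projection changes the position directly, the implied velocity is corrected automatically on the next step, which is one reason the scheme stays stable.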

Jakobsen: The Good
Physics processing tends toward a large number of simpler, evenly divided computations!
  – Collision detection and constraints operate on particles
  – All constraints are solved independently
Can reduce buffer transfers to half!
  – No contact graph generation stage
  – Allows parallel computation across collision detection, constraint generation, constraint solving, integration
[Diagram of pipeline stages; labels: Contact Graph, Coll. Det., Integration, Coll. Det., Integration]

Jakobsen: The Bad
Constraints can be violated temporarily
  – Solving one constraint can re-violate another
  – ~10 constraint-solving passes produce rigid-body-like behavior; fewer passes produce behavior ranging from cloth-like to plant-like
Body transforms must be computed from the particle configuration in post-processing for display purposes
Additional behaviors (friction model, bouncing, inverse kinematics, etc.) require re-engineering in light of the Verlet and particle scheme (see paper)
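
One way the post-processing step could recover a display transform is sketched below in Python. It assumes the body is a cube whose particle indices are known, takes the centroid as the translation, and builds an orthonormal basis from two edge directions using Gram-Schmidt. The slides do not say how the presenters compute the transform, so `body_transform` and its parameters are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _scale(a, s):
    return tuple(x * s for x in a)

def _normalize(a):
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def body_transform(particles, origin, x_tip, y_tip):
    """Return (center, (x_axis, y_axis, z_axis)) estimated from particle positions."""
    n = len(particles)
    center = tuple(sum(p[k] for p in particles) / n for k in range(3))
    x_axis = _normalize(_sub(particles[x_tip], particles[origin]))
    y_raw = _sub(particles[y_tip], particles[origin])
    # Remove the component along x so the axes stay orthogonal even if
    # the sticks are slightly violated this frame.
    y_axis = _normalize(_sub(y_raw, _scale(x_axis, _dot(y_raw, x_axis))))
    z_axis = _cross(x_axis, y_axis)
    return center, (x_axis, y_axis, z_axis)
```

The re-orthogonalization matters because constraints may be temporarily violated, so raw edge vectors are not guaranteed to be perpendicular.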

Progress
Work is just beginning.
OpenCL particle system with Verlet integration
Complex forces, texture animation for cyclones
Particle/height-map collision detection
Collision response by projection
Rigid bodies as sets of particles + stick constraints
Multi-pass relaxation solver
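
Particle/height-map collision with projection response might look like the following Python sketch. The assumptions are labeled here and in the comments: the height map is a regular grid of at least 2x2 samples at integer (x, z) coordinates, heights between samples are bilinearly interpolated, and a colliding particle is projected straight up rather than along the surface normal. The function names are illustrative.

```python
def height_at(heights, x, z):
    """Bilinearly interpolate a grid of heights indexed as heights[x][z]."""
    max_x, max_z = len(heights) - 1, len(heights[0]) - 1
    # Clamp queries to the grid so particles off the edge use border heights.
    x = min(max(x, 0.0), float(max_x))
    z = min(max(z, 0.0), float(max_z))
    x0, z0 = min(int(x), max_x - 1), min(int(z), max_z - 1)
    fx, fz = x - x0, z - z0
    near = heights[x0][z0] * (1 - fx) + heights[x0 + 1][z0] * fx
    far = heights[x0][z0 + 1] * (1 - fx) + heights[x0 + 1][z0 + 1] * fx
    return near * (1 - fz) + far * fz

def project_onto_terrain(particle, heights):
    """Detect penetration and respond by projection (vertical only, an assumption)."""
    x, y, z = particle
    ground = height_at(heights, x, z)
    return (x, max(y, ground), z)
```

Each particle tests only its own grid cell, so this stage has the same one-particle-per-work-item shape as the integration step.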

Further Out
Distribute responsibility for dynamics among qualified clients
  – Increases world asynchronicity
  – Develop operational semantics to indicate synchronous status
  – Server gathers, correlates, and distributes updates to world

Conclusion
OpenCL is front and center in our efforts to alleviate performance problems on both server and client in our multi-user systems.
Focus is on algorithms to
  – Reduce buffer communication to/from OpenCL
  – Increase parallelism by breaking up problems into smaller pieces