Source Multicore 1 November 2006.

New Source Features
- Dynamic shadow mapping
- New foliage system
- Cinematic Physics
- Version 2 of the facial animation system
- Dynamic scripted sequences
- Particle system
- 64-bit Source engine
- Reworked character lighting model
- Companion AI (Alyx in Episode One)
- Real-time statistics gathering
- Automatic game cache defragmentation
- Multicore

New Features in Source
- Multiple approaches to multicore games
- Source is implementing hybrid threading
- Close to linear improvements
- The "pretty but dumb" era is ending; the scalability we've seen in graphics will now apply to the rest of the game
- Hybrid-threaded Source will ship before Episode Two
- You'll get a version of this engine to take home to test with

Multicore
- Most significant development since 3D cards
- Huge potential
- Huge challenge
- The decisions faced with multiple cores
- How we are using multiple cores
- Four cores is more than twice as interesting as two cores

Challenges
- Games always want 100% CPU utilization
- Games are inherently serial
- Decades of experience in single-threaded optimization
- Millions of lines of code written for single threading

Strategies
- Threading model
- Threading framework
- Application of cores

Threading Models
- Single threading
- Coarse threading
- Fine-grained threading
- Hybrid threading

Single threading
- Easy
- Obsolete

Coarse threading
- Put whole systems on cores
- Pretty easy: "multiple single threads"
- Stay partially serialized, or double buffer

Coarse threading: Early experimentation
- Client: user input, rendering, graphics simulation
- Server: AI, physics, game logic

Coarse threading: Early experimentation
- Experiment: run client and server each on its own core
- Benefit: forced us to confront systems that are not thread-safe or not thread-efficient
- Outcome: can approach 2x in contrived maps
  - More like 1.2x in a real single-player game
  - Added latency to the single-player game
- Opened the door to improved listen servers

Coarse threading
- Put whole systems on cores
- Pretty easy: "multiple single threads"
- Stay partially serialized, or double buffer
- Scales poorly:
  - Partially idle cores
  - Synchronization, or lag
  - Entirely idle cores
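
The "double buffer" option for coarse threading can be sketched in C++ as follows. This is a minimal illustration, not Valve's implementation: it assumes a single writer (a "server" thread standing in for AI, physics, and game logic) and a single reader (the "client" standing in for rendering), and the names WorldSnapshot, DoubleBuffer, and runClientServerDemo are hypothetical. The writer fills a private back buffer for a whole tick and publishes it with one short locked swap, so the client always reads a complete tick, at the cost of running up to one tick behind (the "lag" in the trade-off).

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical server-side state that the client needs each frame.
struct WorldSnapshot {
    int tick = 0;
    float playerX = 0.0f;
};

// Single-writer double buffer: the writer mutates a private back buffer for a
// whole tick, then publishes it with one short locked swap. Readers copy the
// front buffer, so they never see a half-written tick.
template <typename T>
class DoubleBuffer {
public:
    T& writeBuffer() { return back_; }  // writer thread only; readers never touch back_

    void publish() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::swap(front_, back_);
        back_ = front_;  // next tick starts from the latest published state
    }

    T read() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return front_;
    }

private:
    mutable std::mutex mutex_;
    T front_{};
    T back_{};
};

struct DemoResult {
    int finalTick;
    bool allConsistent;
};

// A "server" thread simulates `ticks` updates; the calling thread plays the
// "client" and keeps reading snapshots until the server finishes.
inline DemoResult runClientServerDemo(int ticks) {
    DoubleBuffer<WorldSnapshot> buffer;
    std::atomic<bool> done{false};
    std::thread server([&] {
        for (int i = 1; i <= ticks; ++i) {
            WorldSnapshot& next = buffer.writeBuffer();
            next.tick = i;
            next.playerX += 1.0f;  // stand-in for AI, physics, game logic
            buffer.publish();
        }
        done = true;
    });
    bool consistent = true;
    while (!done) {
        WorldSnapshot snap = buffer.read();  // stand-in for rendering
        // A torn read would break the invariant playerX == tick.
        consistent = consistent && snap.playerX == static_cast<float>(snap.tick);
    }
    server.join();
    return {buffer.read().tick, consistent};
}
```

Only the swap is serialized; everything the server does during a tick runs in parallel with the client, which is what makes this cheaper than staying fully serialized.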

Fine-grained threading
- Divide many small, identical tasks across cores
- E.g., take a loop that updates the state of 1000 objects and perform 1000/N of the updates on each core, for N cores
- Moderate difficulty
- Scales well, but:
  - Tricky if the cost of each unit is variable
  - Memory bandwidth
  - Limited problem domains
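
A minimal C++ sketch of the 1000/N split, assuming the objects are independent and equally expensive to update; Object, updateRange, and parallelUpdate are hypothetical names. Each thread gets one contiguous slice, so no two threads ever write the same object and no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical game object with a cheap, uniform update cost.
struct Object {
    float position = 0.0f;
    float velocity = 1.0f;
};

// Update one contiguous slice [begin, end) of the objects.
inline void updateRange(std::vector<Object>& objects, std::size_t begin,
                        std::size_t end, float dt) {
    for (std::size_t i = begin; i < end; ++i) {
        objects[i].position += objects[i].velocity * dt;
    }
}

// Split the loop into numThreads contiguous chunks of ceil(n / numThreads)
// objects each; every thread writes a disjoint slice.
inline void parallelUpdate(std::vector<Object>& objects, unsigned numThreads, float dt) {
    assert(numThreads > 0);
    const std::size_t n = objects.size();
    const std::size_t chunk = (n + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // fewer objects than threads
        workers.emplace_back([&objects, begin, end, dt] {
            updateRange(objects, begin, end, dt);
        });
    }
    for (std::thread& w : workers) w.join();
}
```

Because the slices are fixed up front, one slow slice leaves the other cores idle, which is the "variable cost" caveat on the slide; the loop also streams every object through memory, which is where the bandwidth limit shows up.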

Fine-grained threading
- Leverage multicore in production tools: VMPI

Fine-grained threading
- VVIS: visibility calculations
- VRAD: lighting calculations

Hybrid threading
- Performance tuned for mid-level work sharing
  - Not splitting sets of very small operations over cores
  - Not putting whole systems onto cores

Hybrid threading
- Use the appropriate tool for the job:
  - Some systems on their own cores (e.g., sound)
  - Some systems split internally, similar to coarse threading
  - Expensive iterations split across cores, as in fine-grained threading
  - Some work queued to run when a core goes idle
- Most difficult
- Scales well
- Maximum core utilization
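
One simple way to let idle cores pick up more work is to have threads pull items from a shared counter instead of receiving fixed slices. The sketch below assumes independent iterations; parallelForDynamic is a hypothetical name, and an atomic fetch_add stands in for a full work queue.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Each worker repeatedly claims the next unprocessed index with an atomic
// fetch_add, so a thread that finishes cheap items early keeps pulling work
// instead of idling while another thread grinds through expensive ones.
inline void parallelForDynamic(std::size_t count, unsigned numThreads,
                               const std::function<void(std::size_t)>& body) {
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (std::size_t i = next.fetch_add(1); i < count; i = next.fetch_add(1)) {
            body(i);
        }
    };
    std::vector<std::thread> helpers;
    for (unsigned t = 1; t < numThreads; ++t) helpers.emplace_back(worker);
    worker();  // the calling thread participates instead of waiting idle
    for (std::thread& h : helpers) h.join();
}
```

Each index is claimed exactly once, so `body` may write per-index data without locks; the price is one atomic operation per item, which is why this suits mid-sized work items rather than very small ones.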

Hybrid threading: Rendering
- Rough pipeline:
  - Build world lists
  - Build object lists
  - Graphical simulation (particles, ropes, sprites)
  - Update animations
  - Compute shadows
  - Draw
- Once for every "view":
  - Player's POV
  - TV monitors
  - Water reflections
- Many times CPU bound

Hybrid threading: Rendering
- Revised pipeline:
  - Construct scene rendering lists for multiple scenes in parallel (e.g., the world and its reflection in water)
  - Overlap graphics simulation
  - Compute character bone transformations for all characters in all scenes in parallel
  - Compute shadows for all characters
  - Allow multiple threads to draw in parallel
  - Serialize drawing operations on another core
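
Building lists for several views in parallel and then serializing drawing on one core might look roughly like this in C++. It is an illustrative sketch only: View, buildRenderList, and renderFrame are hypothetical, strings stand in for draw commands, and it assumes the graphics API must be driven from a single thread in a fixed order.

```cpp
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for one "view" (player POV, TV monitor, water reflection).
struct View {
    std::string name;
    int objectCount;
};

// Stand-in for building a view's scene rendering list: one command per visible object.
inline std::vector<std::string> buildRenderList(const View& view) {
    std::vector<std::string> commands;
    for (int i = 0; i < view.objectCount; ++i) {
        commands.push_back(view.name + ":draw" + std::to_string(i));
    }
    return commands;
}

// Build every view's list in parallel, then have one dedicated thread consume
// the lists in view order, as if submitting them to a single-threaded API.
inline std::vector<std::string> renderFrame(const std::vector<View>& views) {
    std::vector<std::future<std::vector<std::string>>> lists;
    for (const View& v : views) {
        lists.push_back(std::async(std::launch::async, [&v] { return buildRenderList(v); }));
    }
    std::future<std::vector<std::string>> submitted =
        std::async(std::launch::async, [&lists] {
            std::vector<std::string> commandStream;  // stand-in for the GPU command buffer
            for (auto& f : lists) {
                std::vector<std::string> cmds = f.get();  // waits only for this view
                commandStream.insert(commandStream.end(), cmds.begin(), cmds.end());
            }
            return commandStream;
        });
    return submitted.get();
}
```

The submission thread starts draining the first view as soon as it is ready, so list building for later views overlaps with submission of earlier ones while the final command order stays deterministic.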

Threading Tools
- Implementing hybrid threading:
  - Operating system: pools, synchronization
  - Compiler extensions: OpenMP, fine threading
  - Tailored tools
- Programmers solve game development problems, not threading problems

Operating system
- Too low level
- Prone to error
- Lots of stalling
- Unpredictable scheduling
- Unpredictable cost

Compiler extensions: OpenMP
- Focused on fine threading
- Lack of control
- Implementation interferes

Tailored tools: Game Threading Infrastructure
- Custom work management system
  - Aimed at gaming problems, intuitive for game programmers
  - Focus on keeping cores busy
- Thread pool: N-1 threads for N cores
- Support for hybrid threading:
  - Function threading
  - Array parallelism
- Multiple work modes:
  - Opportunistic core utilization
  - Queued core utilization
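
A minimal sketch of an N-1 thread pool with "function threading" (submit any callable, get a std::future back), written with standard C++ threads. It is not the Game Threading Infrastructure's actual API, which the slides do not show; ThreadPool and submit are hypothetical names, and array parallelism and the opportunistic/queued work modes are left out.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    // Default: N-1 workers for N hardware threads, leaving one core for the
    // main thread; always at least one worker, even if the count is unknown (0).
    explicit ThreadPool(unsigned workers = std::max(2u, std::thread::hardware_concurrency()) - 1) {
        for (unsigned i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { workerLoop(); });
        }
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : threads_) t.join();  // queued jobs finish first
    }

    // "Function threading": run any callable on a pool thread and get a future.
    template <typename F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

    std::size_t size() const { return threads_.size(); }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (stopping_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::vector<std::thread> threads_;
    std::queue<std::function<void()>> jobs_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

Usage: `auto answer = pool.submit([] { return 6 * 7; });` then `answer.get()` on the main thread. This sketch uses a mutex-protected queue for clarity; the slides describe the real system as lock-free under the hood.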

Tailored tools: Game Threading Infrastructure
- The simple thing is the worst thing
- "Lock-free" algorithms
  - Never leave cores idle waiting on other cores
  - Leverage the atomic write primitives of the CPU
  - Under the hood of all services and data structures
  - See: http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms
  - Example: the spatial partition
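
To illustrate how a lock-free structure leverages the CPU's atomic write primitives, here is a minimal C++ list where producers push with compare-and-swap (std::atomic::compare_exchange_weak) and never take a lock. This is a generic Treiber-style push, not Valve's spatial partition; LockFreeList is a hypothetical name, and it assumes a single consumer that drains everything at once with exchange(), which avoids the ABA problem that popping single nodes would raise.

```cpp
#include <atomic>
#include <utility>

// Multi-producer, single-consumer lock-free list. Producers never block; the
// consumer detaches the whole list in one atomic exchange.
template <typename T>
class LockFreeList {
public:
    struct Node {
        T value;
        Node* next;
    };

    ~LockFreeList() { freeAll(takeAll()); }

    void push(T value) {
        Node* node = new Node{std::move(value), head_.load(std::memory_order_relaxed)};
        // If another thread changed head_ in the meantime, compare_exchange_weak
        // writes the current head into node->next and we simply retry.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Detach every node pushed so far (newest first); the caller owns the nodes.
    Node* takeAll() { return head_.exchange(nullptr, std::memory_order_acquire); }

    static void freeAll(Node* node) {
        while (node != nullptr) {
            Node* next = node->next;
            delete node;
            node = next;
        }
    }

private:
    std::atomic<Node*> head_{nullptr};
};
```

A producer that loses the race just retries with the fresh head instead of sleeping on a lock, which is the "never leave cores idle waiting on other cores" property the slide describes.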

Application of cores
- Dual-core CPUs: framerate
- Quad-core CPUs: new experiences
  - Richer visuals
  - Improved simulation
  - Richer AI

Application of cores: Particle Simulation
- Not simply a GPU issue
- Interactivity
- Presence

Application of cores: Particle Simulation
- Use cores to run multiple particle systems in parallel
- Individual particle systems using multiple cores

Application of cores: Particle Simulation
- More complicated systems
- Interactive particle systems
- Particles with gameplay implications
- Like rigid body physics, reinforce the consistency of the world

Particle Simulation Benchmark

AI
- Traditionally strict CPU limits
- Interesting combinations of minimalist algorithms
- "What could we do if extra CPU were given to AI?"

AI
- Better framerate
  - Run AI in parallel with other systems
  - Parallel agent execution

AI
- Example: Parallel Animation

AI
- Better framerate
- Increased sophistication
  - Without hitching, by running asynchronously on secondary cores
  - Richer path finding: more analysis for better world awareness
  - Deeper tactical analysis
  - Increased world examination
  - "Out of band" AI
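
Running AI asynchronously on a secondary core without hitching can be sketched with std::async plus a non-blocking poll each frame. The sketch assumes a toy one-dimensional findPath standing in for real path finding; Agent, findPath, requestPath, and think are hypothetical names. The agent keeps its old path until the background search reports ready, so a slow search delays the new plan rather than the frame.

```cpp
#include <chrono>
#include <future>
#include <vector>

// Hypothetical expensive query standing in for richer path finding.
inline std::vector<int> findPath(int start, int goal) {
    std::vector<int> path;
    for (int node = start; node != goal; node += (goal > start ? 1 : -1)) {
        path.push_back(node);
    }
    path.push_back(goal);
    return path;
}

class Agent {
public:
    // Start a background search; ignored while one is already in flight,
    // because replacing a std::async future would block until it finished.
    bool requestPath(int start, int goal) {
        if (pending_.valid()) return false;
        pending_ = std::async(std::launch::async, findPath, start, goal);
        return true;
    }

    // Called once per frame; never blocks. Returns true on the frame the
    // new path is adopted.
    bool think() {
        if (pending_.valid() &&
            pending_.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            path_ = pending_.get();  // get() leaves pending_ invalid again
            return true;
        }
        return false;
    }

    const std::vector<int>& path() const { return path_; }

private:
    std::future<std::vector<int>> pending_;
    std::vector<int> path_;
};
```

The same poll-and-adopt pattern fits other "out of band" work such as tactical analysis: the frame loop only ever checks for a result and never waits for one.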

AI
- Better framerate
- Increased sophistication
- New kinds of AIs
  - Experimental creatures built with multicore in mind

Broader Goals
- Integrate multicore across Valve's business
  - Boost developer effectiveness
  - Expose to game programmers, licensees, and MOD authors
  - Transparently scale to cores without recompiling
  - Leverage on Xbox 360
  - Provide value to our customers beyond framerate
  - Deliver via Steam, before Episode Two, to customers, MOD authors, and licensees

Closing thoughts
- Multicore is exciting and scary
- Accessible solutions
- Hybrid threading
  - Scalable
  - More broadly applicable
- Game Threading Infrastructure
  - Closely controlled core usage
  - Tools framed in terms of game problems
- Apply cores to new visual gameplay opportunities

Experiments and Benchmarks
- Particle system
- Multicore AI
- Provided to take home: VRAD benchmark, particle simulation benchmark