1 Game Developers Conference 2008 Comparative Analysis of Game Parallelization Dmitry Eremin Senior Software Engineer, Intel Software and Solutions Group.

Presentation transcript:

1 Game Developers Conference 2008 Comparative Analysis of Game Parallelization Dmitry Eremin Senior Software Engineer, Intel Software and Solutions Group Thursday, February 21, 12:00pm – 1:00pm

2 Legal Disclaimer  INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.  Intel may make changes to specifications and product descriptions at any time, without notice.  All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.  Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.  Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.  Intel, Intel Inside, and the Intel logo are trademarks of Intel Corporation in the United States and other countries.  *Other names and brands may be claimed as the property of others.  Copyright © 2007 Intel Corporation.

3 Agenda
- Usual Game Structure
- Parallelization with Windows* and POSIX Threads
- What is Intel® Threading Building Blocks?
- Parallelization with Intel® Threading Building Blocks
- Summary
* Intel and the Intel logo are registered trademarks of Intel Corporation. Other brands and names are the property of their respective owners.

4 Usual Game Structure
[Diagram labels: Render, OnFrameMove, Physics, AI, Particles]

5 Usual Game Structure
[Diagram labels: OnFrameMove, Render, Physics, AI, Particles]

6 Thread Timeline (Usual Game Structure)
- Horizontal bands represent threads (Thread 1, Thread 2, Thread 3)
- Dark green: threads are active (running or runnable)
- Light green: threads are waiting
- Hatched light green: threads are busy waiting
- Yellow transition lines: signals that wake up other threads, such as transferring a lock or sending a message

7 Concurrency Profile (Usual Game Structure)
- Measures core utilization so users can see how parallel their program really is, relative to the system executing the application
  - Idle: no active threads
  - Serial: a single active thread
  - Under-subscribed: # threads > 1 && # threads < # cores
  - Fully-subscribed*: # threads == # cores
  - Oversubscribed: # threads > # cores
- Concurrency level is the number of threads that are active (not waiting, sleeping, blocked, etc.) at a given time
* Example reflects a 4-core machine
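The categories on this slide can be written down as a small classifier. The following is only a sketch of the slide's definitions; the names `Concurrency` and `classify` are made up for illustration and do not come from any Intel tool.

```cpp
#include <cassert>

// Concurrency categories as named on the slide.
enum class Concurrency { Idle, Serial, UnderSubscribed, FullySubscribed, OverSubscribed };

// Classify one sample from the number of active threads (not waiting,
// sleeping, or blocked) and the number of cores in the machine.
// Checks run in the slide's order, so one active thread is always "Serial".
Concurrency classify(unsigned activeThreads, unsigned cores) {
    assert(cores > 0);
    if (activeThreads == 0) return Concurrency::Idle;
    if (activeThreads == 1) return Concurrency::Serial;
    if (activeThreads < cores) return Concurrency::UnderSubscribed;
    if (activeThreads == cores) return Concurrency::FullySubscribed;
    return Concurrency::OverSubscribed;
}
```

On the 4-core machine used in the talk, `classify(3, 4)` is under-subscribed and `classify(5, 4)` is oversubscribed.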

8 Performance Profile: Serial Case (Usual Game Structure)
- Sequential execution
- 25% system utilization
- Benchmark: 20.95 sec
Measured on a 4-core test machine
[Timeline labels: Render, Physics, AI, Particles]

9 Limitation of Serial Games for Multi-Core Systems (Usual Game Structure)
- With clock rates already in the multiple-GHz range, further increases are becoming harder
- Parallel hardware has gone mainstream on the desktop
- To exploit the performance potential of multi-core processors, applications must be threaded
Serial games get no benefit from multi-core

10 Agenda
- Usual Game Structure
- Parallelization with Windows* and POSIX Threads
- What is Intel® Threading Building Blocks?
- Parallelization with Intel® Threading Building Blocks
- Summary

11 Parallelization with Windows* and POSIX Threads
- Updating with double-buffered data structures
- Decoupling rendering from frame processing
- Asynchronous update of parts of the scene
[Diagram: Functional Decomposition; labels: Render, Physics, Particles, AI]
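A minimal sketch of how functional decomposition and double buffering can fit together, assuming a toy scene. It uses C++ `std::thread` (a portable wrapper over the native Windows and POSIX thread APIs) rather than those APIs directly, and `SceneState`, the `update*` functions, `render`, and `stepFrame` are all hypothetical names, not code from the talk.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical per-frame state; the fields are placeholders.
struct SceneState {
    int physicsTick = 0;
    int aiTick = 0;
    int particleTick = 0;
};

// Each subsystem reads the previous frame (front) and writes only its own
// field of the next frame (back), so the three updates need no locks.
void updatePhysics(const SceneState& front, SceneState& back)   { back.physicsTick  = front.physicsTick + 1; }
void updateAI(const SceneState& front, SceneState& back)        { back.aiTick       = front.aiTick + 1; }
void updateParticles(const SceneState& front, SceneState& back) { back.particleTick = front.particleTick + 1; }

// Rendering only reads the front buffer, so it can overlap with the updates.
int render(const SceneState& front) {
    return front.physicsTick + front.aiTick + front.particleTick;
}

// One frame: one thread per subsystem, render on the calling thread,
// then swap the buffers once every update has finished.
int stepFrame(SceneState& front, SceneState& back) {
    std::thread physics(updatePhysics, std::cref(front), std::ref(back));
    std::thread ai(updateAI, std::cref(front), std::ref(back));
    std::thread particles(updateParticles, std::cref(front), std::ref(back));
    const int drawn = render(front);
    physics.join();
    ai.join();
    particles.join();
    std::swap(front, back);
    return drawn;
}
```

Because each subsystem gets exactly one thread, the frame time is bounded by the slowest subsystem, which is one source of load imbalance in this kind of design.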

12 Performance Profile: Functional Decomposition (Parallelization with Windows* and POSIX Threads)
- Thread pool of 3 threads
- Load imbalance
- Benchmark: 10.15 sec
Measured on a 4-core test machine
[Timeline labels: Render, Physics, AI, Particles; Render, Thread Pool]
Low utilization of 4 cores

13 Data Level Parallelism (Parallelization with Windows* and POSIX Threads)
- Nested parallelism
  - Top level: functional decomposition
  - Next level: data decomposition
[Diagram labels: Render, Physics, Particles, "update several AI units" ... "update several AI units"]
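The second level, data decomposition, can be sketched as a static split of the AI units across a fixed number of threads. This is an illustrative assumption about how such a split might look, written with `std::thread`; `AIUnit`, `updateUnit`, and `updateAIUnits` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical AI unit; a real one would hold game state.
struct AIUnit { int state = 0; };

// Placeholder for the per-unit AI work.
void updateUnit(AIUnit& unit) { ++unit.state; }

// Split the units into contiguous chunks, one thread per chunk.
// The split is static: chunk sizes are fixed before any work starts.
void updateAIUnits(std::vector<AIUnit>& units, std::size_t numThreads) {
    assert(numThreads > 0);
    const std::size_t chunk = (units.size() + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < units.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, units.size());
        workers.emplace_back([&units, begin, end] {
            for (std::size_t i = begin; i < end; ++i) updateUnit(units[i]);
        });
    }
    for (auto& worker : workers) worker.join();
}
```

A static split like this balances well only when every unit costs about the same. When this function is called from inside another thread pool, the caller also has to pick `numThreads` by hand.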

14 Performance Profile: AI Decomposition with 2 Threads (Parallelization with Windows* and POSIX Threads)
- Thread pool of 3 threads
- AI split across 2 threads
- Load imbalance
- Benchmark: 9.44 sec
Measured on a 4-core test machine
[Timeline labels: Render, Thread Pool, AI Pool]

15 Performance Profile: AI Decomposition with 4 Threads (Parallelization with Windows* and POSIX Threads)
- Thread pool of 3 threads
- AI split across 4 threads
- Load imbalance
- Oversubscription
- Benchmark: 15.47 sec
Measured on a 4-core test machine
[Timeline labels: Render, Thread Pool, AI Pool]

16 One More Problem: Nested Parallelism (Parallelization with Windows* and POSIX Threads)
- Software components are built from smaller components
- If each turtle specifies threads...

17 Disadvantages of Using Windows* and POSIX Threads for Games (Parallelization with Windows* and POSIX Threads)
- Low-level details (not intuitive)
  - Hard to come up with a good design
  - Code often becomes very dependent on a particular OS's threading facilities
- Load imbalance
  - Has to be managed manually
- Oversubscription
  - Multiple components create threads that compete for CPU resources
- Hard to manage nested parallelism
Hard to achieve scalability

18 Agenda
- Usual Game Structure
- Parallelization with Windows* and POSIX Threads
- What is Intel® Threading Building Blocks?
- Parallelization with Intel® Threading Building Blocks
- Summary

19 What is Intel® Threading Building Blocks?
- It is Open Source now!
- Threading abstraction library
  - Relies on generic programming
  - Provides high-level generic implementations of parallel design patterns and concurrent data structures
- You specify task patterns instead of threads
  - Library maps your logical tasks onto physical threads, efficiently using cache and balancing load
  - Full support for nested parallelism
- Targets threading for robust performance
  - Designed to provide scalable performance for computationally intense portions of shrink-wrapped applications
  - Portable across Linux*, Mac OS*, and Windows*
- Emphasizes scalable data parallel programming
  - Solutions based on functional decomposition usually do not scale

20 Components of Intel® Threading Building Blocks (What is Intel® Threading Building Blocks?)
- Parallel algorithms
- Concurrent containers
- Synchronization primitives
- Memory allocation
- Task scheduler

Problem            | Intel® TBB Approach
Low-level details  | Operate with task patterns instead of threads
Load imbalance     | Work-stealing balances load
Oversubscription   | One scheduled thread per hardware thread
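To make two of these ideas concrete without depending on the library, here is a standard-C++ sketch of a parallel loop that creates one worker per hardware thread and balances load dynamically. It is not Intel® TBB and does not use work-stealing: it swaps in a simpler technique, where workers pull chunks of iterations from a shared atomic counter. The name `parallelFor` and its `grain` parameter are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Run body(i) for every i in [0, count). The caller describes the work,
// not the threads: the worker count comes from the hardware.
void parallelFor(std::size_t count, std::size_t grain,
                 const std::function<void(std::size_t)>& body) {
    assert(grain > 0);
    const unsigned hw = std::thread::hardware_concurrency();  // 0 if unknown
    const unsigned workers = (hw == 0) ? 1 : hw;
    std::atomic<std::size_t> next{0};

    // Each worker repeatedly claims the next chunk of `grain` iterations,
    // so a worker that finishes early simply takes more chunks.
    auto work = [&] {
        for (;;) {
            const std::size_t begin = next.fetch_add(grain);
            if (begin >= count) return;
            const std::size_t end = std::min(begin + grain, count);
            for (std::size_t i = begin; i < end; ++i) body(i);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 1; t < workers; ++t) pool.emplace_back(work);
    work();  // the calling thread is one of the workers
    for (auto& thread : pool) thread.join();
}
```

Unlike a real task scheduler, this sketch starts fresh threads on every call and would oversubscribe if called from inside another `parallelFor`; a shared scheduler is what lets a library support nested parallelism safely.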

21 Agenda
- Usual Game Structure
- Parallelization with Windows* and POSIX Threads
- What is Intel® Threading Building Blocks?
- Parallelization with Intel® Threading Building Blocks
- Summary

22 Parallelization with Intel® Threading Building Blocks
[Diagram: Scheme of parallelization with Windows* and POSIX threads; labels: Render, Physics, Particles, "update several AI units" ... "update several AI units"]
[Diagram: Scheme of parallelization with Intel® TBB; labels: Render, "update several blocks", "update several AI units", "update several particles"]

23 Task Graph (Parallelization with Intel® TBB)
[Diagram nodes: MainTask, AITask, AIBodyTask ... AIBodyTask, AIFinalTask, SyncTask, PhysicsTask, ParticlesTask]
[Diagram legend: task creation order; task completion signals; not expanded]
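The shape of such a task graph can be sketched with `std::async` from the C++ standard library. This is an assumption-heavy illustration: the transcript keeps only the node names, so the edges below (MainTask creating the AI, physics, and particle tasks; AITask fanning out into body tasks and finishing with AIFinalTask; a final synchronization point) are a guess at the diagram, and every function body is a placeholder. It is not the Intel® TBB task API.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Placeholder work for one block of AI units.
int aiBodyTask(int block) { return block * 10; }

// Runs once, after every AI body task has completed.
int aiFinalTask(const std::vector<int>& bodyResults) {
    return std::accumulate(bodyResults.begin(), bodyResults.end(), 0);
}

// Nested level: fan out into body tasks, wait, then finalize.
int aiTask(int numBlocks) {
    std::vector<std::future<int>> bodies;
    for (int block = 0; block < numBlocks; ++block)
        bodies.push_back(std::async(std::launch::async, aiBodyTask, block));
    std::vector<int> results;
    for (auto& body : bodies) results.push_back(body.get());
    return aiFinalTask(results);
}

int physicsTask()   { return 1; }  // placeholder, not expanded
int particlesTask() { return 2; }  // placeholder, not expanded

struct FrameResult { int ai; int physics; int particles; };

// Top level: create the three subsystem tasks, then wait for all of them
// (the waiting plays the role of the sync node in the diagram).
FrameResult mainTask(int numAIBlocks) {
    auto ai = std::async(std::launch::async, aiTask, numAIBlocks);
    auto physics = std::async(std::launch::async, physicsTask);
    auto particles = std::async(std::launch::async, particlesTask);
    return FrameResult{ai.get(), physics.get(), particles.get()};
}
```

Note that `std::launch::async` starts a thread per task, which is exactly the cost a task scheduler avoids by mapping many logical tasks onto a fixed set of worker threads.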

24 Performance Profile (Parallelization with Intel® TBB)
- Intel® TBB task pool of 3 threads
- Automatic load balancing with work-stealing
- Benchmark: 8.66 sec
Measured on a 4-core test machine
[Timeline label: Render]
Good utilization of 4 cores
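The benchmark figures quoted across the performance-profile slides can be compared as speedups over the 20.95 sec serial run. The helpers below are a sketch for that arithmetic only; `speedup` and `efficiency` are illustrative names.

```cpp
#include <cassert>

// Speedup of a parallel run relative to the serial baseline.
double speedup(double serialSeconds, double parallelSeconds) {
    assert(serialSeconds > 0.0 && parallelSeconds > 0.0);
    return serialSeconds / parallelSeconds;
}

// Parallel efficiency: speedup divided by the core count (1.0 is ideal).
double efficiency(double serialSeconds, double parallelSeconds, unsigned cores) {
    assert(cores > 0);
    return speedup(serialSeconds, parallelSeconds) / cores;
}
```

With the slide numbers, functional decomposition (10.15 sec) gives about 2.06x, AI split across 2 threads (9.44 sec) about 2.22x, AI split across 4 threads (15.47 sec) only about 1.35x, and the Intel® TBB version (8.66 sec) about 2.42x, or roughly 0.60 efficiency on 4 cores.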

25 Limitation of Using Intel® Threading Building Blocks for Games (Parallelization with Intel® TBB)
Intel® TBB is not intended for
- I/O bound processing
- Hard real-time processing
- Excessive usage of explicit synchronization
However, it is compatible with other threading packages
- It can be used in concert with Windows* and POSIX threads, etc.

26 Advantages of Using Intel® Threading Building Blocks for Games (Parallelization with Intel® TBB)
- Generic parallel algorithms
  - You specify task patterns instead of threads
  - Cross-platform implementation
- Load balancing
  - Adaptive tuning to variable computation
  - Full support for nested parallelism
- Efficient use of resources
  - One scheduled thread per hardware thread
  - Effective cache reuse
Easy to achieve scalability

27 Agenda
- Usual Game Structure
- Parallelization with Windows* and POSIX Threads
- What is Intel® Threading Building Blocks?
- Parallelization with Intel® Threading Building Blocks
- Summary

28 Summary
- Serial games get no benefit from multi-core
- It is hard to achieve scalability with Windows* and POSIX threads
- Intel® Threading Building Blocks can easily give game developers a serious boost and scalability
Multi-Thread Your Game with Intel® TBB

29 Call To Action
- Think about multi-threading at the beginning of your project
- Think about scalable performance (for N cores, not just 2 or 4) for years to come
- Please fill out the evaluation form

30 ?

31 Session schedule
Wednesday
- 10:30am: Gaming on the Go
- 12:00pm: COLLADA in the Game
- 02:30pm: Interactive Ray Tracing in Games
- 04:00pm: Speed Up Synchronization Locks
Thursday
- 09:00am: The Future of Programming for Multi-Core with the Intel Compilers
- 10:30am: Getting the Most Out of Intel Graphics
- 12:00pm: Comparative Analysis of Game Parallelization
- 02:30pm: Threading Quake 4 and Quake Wars

32 For More Information
- Commercial Intel® TBB Product Web Page:
- Open Source Intel® TBB Web Portal:
- See Intel at GDC:
  - Booth number
  - Intel Interactive Lounge – Moscone West 3rd floor

33 Risk Factors This presentation contains forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially. Please refer to our most recent Earnings Release and our most recent Form 10-Q or 10-K filing available on our website for more information on the risk factors that could cause actual results to differ. Rev. 4/17/07