Published by Jared Hall. Modified over 6 years ago.
1
Multi-threading the Oxman Game Engine
Sean Oxley
CS 523, Fall 2012
2
Project Overview
- An engine is middleware for game developers that handles low-level details
- This project adapts a single-threaded video game engine to run multi-threaded
3
What's being multi-threaded?
- Physics
  - Spatial queries, synchronization
- Animation
  - Updating of time
  - Keyframe retrieval, PRS (position/rotation/scale) generation
  - Skeleton matrix palette generation
- Visibility
  - Visibility AABB generation
  - Spatial tree classification
  - Camera culling pass
4
Challenges of multi-threaded game engines
- Response time
  - At 60 frames per second (the ideal frame rate), one frame is ~16.7 ms
  - Threads must compute short bursts of data in as little time as possible
  - Synchronization and start-up costs must be minimized
- Transparency to developer
  - Developer should not be expected to maintain strict access rules (preferably, none at all)
- Unpredictability of object behavior and data access
5
The approach
- Utilize data parallelism
  - Pool of threads that all simultaneously work on a task
  - Batch physics, animation, and visibility calculations into arrays
  - High independence of work
- Synchronization code is minimal, mostly contained in one class
- Theoretically scalable to larger numbers of threads
- Non-parallel code runs just as fast as before
- Little impact on developer
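A minimal sketch of the data-parallel pattern above, assuming each subsystem batches its per-object data into one contiguous array and each thread is told its own index and the total thread count. `AnimationState`, `RangeForThread`, `UpdateAnimationJob`, and `RunAnimationUpdate` are hypothetical names, and plain `std::thread` fork-and-join stands in for the engine's own pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-object animation data, batched into one contiguous array.
struct AnimationState {
    float time = 0.0f;      // current playback time, in seconds
    float duration = 1.0f;  // clip length, in seconds
};

// Half-open range [begin, end) of array elements owned by one thread.
// The remainder is spread over the first (count % threadCount) threads.
void RangeForThread(std::size_t count, unsigned threadIndex, unsigned threadCount,
                    std::size_t& begin, std::size_t& end) {
    assert(threadCount > 0 && threadIndex < threadCount);
    const std::size_t base = count / threadCount;
    const std::size_t extra = count % threadCount;
    begin = threadIndex * base + std::min<std::size_t>(threadIndex, extra);
    end = begin + base + (threadIndex < extra ? 1 : 0);
}

// One data-parallel job: every thread runs the same code on its own slice,
// so no locking is needed while updating.
void UpdateAnimationJob(std::vector<AnimationState>& states, float dt,
                        unsigned threadIndex, unsigned threadCount) {
    std::size_t begin = 0, end = 0;
    RangeForThread(states.size(), threadIndex, threadCount, begin, end);
    for (std::size_t i = begin; i < end; ++i) {
        states[i].time += dt;
        // Loop the clip once playback passes its end (skip zero-length clips).
        while (states[i].duration > 0.0f && states[i].time >= states[i].duration)
            states[i].time -= states[i].duration;
    }
}

// Fork-and-join launch: threadCount - 1 helper threads plus the calling thread.
void RunAnimationUpdate(std::vector<AnimationState>& states, float dt, unsigned threadCount) {
    std::vector<std::thread> helpers;
    for (unsigned t = 1; t < threadCount; ++t)
        helpers.emplace_back([&states, dt, t, threadCount] {
            UpdateAnimationJob(states, dt, t, threadCount);
        });
    UpdateAnimationJob(states, dt, 0, threadCount);  // the caller takes slice 0
    for (std::thread& h : helpers) h.join();
}
```

Because every slice is disjoint, the only synchronization is the final join, which is what keeps this style cheap when the work really is independent.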
6
Caveats of the approach
- Third-party libraries
  - Bullet Physics, which the Oxman Engine currently uses for physics, is not thread-safe
  - The intention was to run queries simultaneously, but the engine was forced to resort to background queries
  - Large implications for CharacterController performance
- Slight developer discipline requirements
  - Physics
    - Waiting for queries to complete
    - Maintaining data lifetime for the query's duration
  - Availability
    - Results of animation/physics are not reflected until the post-physics/animation update phase
7
The details
- A “job” (a static function designed to be run in parallel) is launched by the application through a class called ThreadPool
- Jobs can be launched either blocking or non-blocking
  - If blocking, the calling thread also participates in the job
  - After the job is launched, the caller sleeps until the job is complete
- A job can utilize any number of threads, but only one job can run at a time
8
The details, cont.
- Threads are created a priori at engine init
  - Avoids the overhead of constant thread creation/destruction
- Threads are woken up, and busy-wait until all threads have woken up (avoids race conditions)
- Each thread runs the job function, then goes back to sleep
- The threads know their index and how many other threads are performing the job
- If the job was blocking, the last thread to finish wakes the calling thread up
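The launch protocol described on the last two slides can be sketched as a small persistent pool. This is an illustration under stated assumptions, not the Oxman Engine's ThreadPool. The `JobFn` signature, member names, and the generation counter used to announce a job are all hypothetical. `std::this_thread::yield()` stands in for Win32 `Sleep(0)`, and only the blocking launch is shown.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical job signature: a static function that receives user data, its own
// thread index, and the number of threads running the job.
using JobFn = void (*)(void* userData, unsigned threadIndex, unsigned threadCount);

class ThreadPool {
public:
    // Worker threads are created once, up front; index 0 is reserved for the caller.
    explicit ThreadPool(unsigned workerCount) : threadCount_(workerCount + 1) {
        for (unsigned i = 1; i <= workerCount; ++i)
            workers_.emplace_back([this, i] { WorkerLoop(i); });
    }

    ~ThreadPool() {
        quit_.store(true, std::memory_order_release);
        for (auto& w : workers_) w.join();
    }

    // Blocking launch: the caller also runs the job (as thread 0), then sleeps until
    // the last thread to finish wakes it. Only one job may run at a time.
    void RunBlocking(JobFn fn, void* userData) {
        assert(fn != nullptr);
        job_ = fn;
        userData_ = userData;
        arrived_.store(0, std::memory_order_relaxed);
        remaining_.store(threadCount_, std::memory_order_relaxed);
        generation_.fetch_add(1, std::memory_order_release);  // publish the job
        Participate(0);
        std::unique_lock<std::mutex> lock(doneMutex_);
        done_.wait(lock, [this] { return remaining_.load(std::memory_order_acquire) == 0; });
    }

private:
    void WorkerLoop(unsigned index) {
        unsigned seen = 0;
        while (!quit_.load(std::memory_order_acquire)) {
            const unsigned gen = generation_.load(std::memory_order_acquire);
            if (gen == seen) {
                std::this_thread::yield();  // no new job yet: give up the time slice
                continue;
            }
            seen = gen;
            Participate(index);
        }
    }

    void Participate(unsigned index) {
        // Rendezvous: busy-wait until every thread in the pool has woken up.
        arrived_.fetch_add(1, std::memory_order_acq_rel);
        while (arrived_.load(std::memory_order_acquire) < threadCount_)
            std::this_thread::yield();
        job_(userData_, index, threadCount_);
        // The last thread to finish wakes the blocked caller.
        if (remaining_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            std::lock_guard<std::mutex> lock(doneMutex_);
            done_.notify_one();
        }
    }

    const unsigned threadCount_;
    JobFn job_ = nullptr;
    void* userData_ = nullptr;
    std::atomic<unsigned> generation_{0};
    std::atomic<unsigned> arrived_{0};
    std::atomic<unsigned> remaining_{0};
    std::atomic<bool> quit_{false};
    std::mutex doneMutex_;
    std::condition_variable done_;
    std::vector<std::thread> workers_;  // declared last: started after all state exists
};
```

The job and its data are written before the release increment of the generation counter, so any worker that observes the new generation with an acquire load also sees the job pointer. Locking `doneMutex_` before notifying prevents the caller from missing the wake-up.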
9
Issues from compiler optimization and out-of-order execution
- Busy-waiting on a variable until it holds a desired value
  - The compiler can optimize out the repeated load instruction
  - Causes deadlock; must force a reload
- Out-of-order execution
  - Can be introduced by either the CPU or the compiler
  - Cannot rely on execution order without a special instruction; potential for race conditions
  - This instruction is technically not supported on all architectures, but will exist on the engine's target machines
- Both problems are solved by the MemoryBarrier() macro on Win32 (a macro for inline assembly)
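The engine solves both problems with Win32's MemoryBarrier(). A portable alternative is C++11 `std::atomic` with release/acquire ordering. This swapped-in technique is shown here only to illustrate the two failure modes and is not the engine's code. `Mailbox`, `Publish`, and `WaitAndRead` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// A single-shot hand-off between two threads: 'payload' is ordinary data and
// 'ready' is the flag the consumer busy-waits on.
struct Mailbox {
    int payload = 0;
    std::atomic<bool> ready{false};
};

void Publish(Mailbox& box, int value) {
    assert(!box.ready.load(std::memory_order_relaxed) && "mailbox is single-use");
    box.payload = value;  // plain write
    // Release store: neither the compiler nor the CPU may move the payload
    // write after this store (fixes the out-of-order problem).
    box.ready.store(true, std::memory_order_release);
}

int WaitAndRead(const Mailbox& box) {
    // Unlike a plain bool, an atomic load may not be cached in a register
    // across iterations, so the other thread's store is eventually observed
    // (fixes the hoisted-load deadlock). Acquire ordering guarantees that once
    // 'true' is seen, the payload written before the release store is visible.
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}
```

On x86, the acquire load and release store compile to ordinary moves plus compiler-ordering constraints, which is part of why this form is usually cheaper than a full hardware barrier.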
10
Addressing performance concerns
- In the first implementation, threads slept on a condition variable
  - Worked fine with two threads, awful with any more
  - Problems caused by both high kernel and high mutex contention (at least on Windows)
- Busy-waiting didn't work either
  - The OS was more likely to preempt a spinning thread, holding up the whole pool
- Compromise made
  - Threads continually check for a job; if no job is there, they call Sleep(0), which gives up the rest of their time quantum
  - Makes OS preemption much less likely
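The compromise can be sketched as an idle loop that polls for a new job and yields the time slice between checks. The sketch assumes jobs are announced by bumping a generation counter. That counter, `GiveUpTimeSlice`, and `WaitForNewJob` are hypothetical. On Windows it calls `Sleep(0)` as the slide describes, and elsewhere it falls back to `std::this_thread::yield()`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

// Give up the rest of this thread's time slice without blocking on a kernel object.
inline void GiveUpTimeSlice() {
#ifdef _WIN32
    Sleep(0);                   // Win32: yield to another ready thread, if any
#else
    std::this_thread::yield();  // portable hint to the scheduler
#endif
}

// Idle loop for a pool worker: keep checking for a new job, and give up the time
// slice whenever there is none. Returns the new job generation, or the current
// one if shutdown was requested.
unsigned WaitForNewJob(const std::atomic<unsigned>& generation, unsigned lastSeen,
                       const std::atomic<bool>& quit) {
    for (;;) {
        const unsigned gen = generation.load(std::memory_order_acquire);
        if (gen != lastSeen || quit.load(std::memory_order_acquire)) return gen;
        GiveUpTimeSlice();
    }
}
```

When no other thread is ready to run, `Sleep(0)` returns immediately. The loop then behaves like a busy-wait that still lets the scheduler run other work, which matches the trade-off the slide describes.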
11
The Performance Test
- Test machines:
  - Intel Core i7 (… GHz, 8 logical cores), 16 GB RAM, nVidia GeForce GTX 570
  - Pentium Dual T… (… GHz, 2 logical cores), 3 GB RAM, Intel 4 Express (integrated)
- Testing procedure:
  - First test: 10 characters, using CharacterController components
    - More of a realistic scenario
  - Second test: 200 characters, not using the CharacterController
    - More of a stress test for the parallelized engine portions
  - Both tests done once with rendering on, once with rendering off
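Results like those on the following slides are usually reported as frame times. The sketch below shows one way to collect an average frame time in milliseconds. It assumes each frame can be driven through a callable and is not the author's actual test harness. `AverageFrameTimeMs` is a hypothetical helper.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Run `frame` (one engine update, plus rendering if enabled) `frameCount` times
// and return the average wall-clock frame time in milliseconds.
double AverageFrameTimeMs(const std::function<void()>& frame, int frameCount) {
    assert(frameCount > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < frameCount; ++i) frame();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / frameCount;
}
```

Averaging over many frames, and repeating each configuration (thread count, rendering on/off), makes differences of a few milliseconds easier to separate from measurement noise against the ~16.7 ms frame budget.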
12
Core i7 Results, Part 1
13
Core i7 Results, Part 2
14
Core i7 Performance Analysis
- First test results were rather inconclusive
  - Differences in frame times could be attributed to measurement error
  - 8 threads actually slowed performance down
- Second test results were much clearer
  - 1, 2, and 4 threads showed a 5 ms improvement each time
  - 8 threads didn't provide any benefit
15
Pentium Dual Results, Part 1
16
Pentium Dual Results, Part 2
17
Pentium Dual Performance Analysis
- No improvement for the first test
  - Benefit of two threads offset by the costs of preemption
- Second test again fared much better
  - A whopping … ms improvement
18
Conclusion
- As it currently stands, the engine's multi-threading has limited use
  - The second test is a rather unlikely scenario
  - Scalability concerns
    - Fork-and-join might be better suited to consoles
  - Not being able to effectively parallelize physics was a huge issue
- Other, more complex approaches will need to be tried
  - Given more time, I'd try the job queue approach next
19
The roads not traveled
- Use of a thread-safe physics library
  - Spatial queries could then be batched and executed in parallel
  - CPU-intensive CharacterControllers could be parallelized
- Different multi-threading approaches
  - Threads dedicated to particular subsystems
    - “Start-up time” issues reduced or eliminated
    - Harder to synchronize; scalability issues
  - Smaller-granularity job queue
    - Work is put in a queue and picked up by individual threads, instead of by all threads at once
    - Scales better as the number of threads increases
    - More flexible
    - Difficulty of scalable job creation
- More game-like performance testing scenarios
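A smaller-granularity job queue can be sketched as follows. Small units of work are pushed into a shared queue, and each idle worker takes one item at a time instead of the whole pool running a single job together. `JobQueue` is a hypothetical class. It uses the simplest correct design, a mutex plus condition variables. That is the same primitive the earlier slides found contentious on Windows, so lock-free or work-stealing queues are common alternatives in production job systems.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class JobQueue {
public:
    explicit JobQueue(unsigned workerCount) {
        assert(workerCount > 0);
        for (unsigned i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { WorkerLoop(); });
    }

    ~JobQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        workAvailable_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Enqueue one small unit of work; any idle worker may pick it up.
    void Push(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
            ++pending_;
        }
        workAvailable_.notify_one();
    }

    // Block until every pushed job has finished running.
    void WaitIdle() {
        std::unique_lock<std::mutex> lock(mutex_);
        allDone_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                workAvailable_.wait(lock, [this] { return quit_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // quit requested and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // run outside the lock so other workers can dequeue concurrently
            std::lock_guard<std::mutex> lock(mutex_);
            if (--pending_ == 0) allDone_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable workAvailable_;
    std::condition_variable allDone_;
    std::deque<std::function<void()>> jobs_;
    std::size_t pending_ = 0;
    bool quit_ = false;
    std::vector<std::thread> workers_;  // declared last: started after all state exists
};
```

Because workers pull items independently, a slow or preempted thread delays only its current item rather than the whole pool.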