Multi-threading Streaming

Computer Games Engineering - CO4302

OVERVIEW
- We continue with topics that will help build the engine in the labs.
- Multi-threading is important in many areas of games development.
- We introduce the new C++11 features that make threading easy(ish).
- We will use threads to increase the performance of foreground tasks as well as to perform background tasks (e.g. streaming in data).
- We also make a brief aside into an optimisation topic: line sweeps.

1. Threading in C++11: The CPU is bored

MISSED OPPORTUNITIES
- Single-threaded programs under-utilise the CPU.
- It is easy to focus on GPU performance and forget the CPU.
- But ever more dynamic game environments need the flexibility of the CPU (GPGPU programming is hard).
- Many CPU-side tasks are trivially parallel:
  - Long loops, outputs independent, input data read-only.
  - So there are few data synchronisation issues.
  - DOD (data-oriented design) often uses loops that fit this description.
- Single-threaded: the main thread loops 1->1000.
- Multi-threaded: thread 1 loops 1->100, thread 2 loops 101->200, etc.

C++11 THREADING
This kind of parallel loop is simple in C++11:

    #include <thread>

    void LoopSection(int s, int e)
    {
        for (int i = s; i < e; ++i)
        {
            DoThing(i);
        }
    }

    // In the main thread:
    std::thread threads[10];
    for (int t = 0; t < 10; ++t)   // Start 10 threads
        threads[t] = std::thread(LoopSection, t * 100, (t + 1) * 100);
    for (int t = 0; t < 10; ++t)
        threads[t].join();         // Wait for each thread to end

C++11 THREADING
- std::thread is declared in <thread>.
- The constructor takes the name of a function and the parameters to pass to that function.
- The new thread starts running that function.
- The main thread continues normally after the new thread starts.
- The new thread ends when it exits from the function.
- The method thread.join waits for that thread to end.
- The related thread.detach disassociates that thread from the std::thread object, leaving it to run on its own.
- Every running thread must be joined or detached before its std::thread object is destroyed (otherwise std::terminate is called), and so before the main thread ends.
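The points above can be sketched as a small, self-contained function. This is only an illustration: SquareSection and ParallelSquare are hypothetical names, and squaring stands in for whatever independent per-element work the loop really does. It shows the constructor taking a function plus its arguments, std::ref / std::cref for arguments that must be passed by reference, and joining every thread before its object is destroyed.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for real per-element work: squares each input.
// Each thread writes only to its own section of `out`, so no locking is needed.
void SquareSection(const std::vector<int>& in, std::vector<int>& out, int start, int end)
{
    for (int i = start; i < end; ++i)
        out[i] = in[i] * in[i];
}

// Splits the loop over `in` into numThreads sections and runs one thread per section.
std::vector<int> ParallelSquare(const std::vector<int>& in, int numThreads)
{
    std::vector<int> out(in.size());
    std::vector<std::thread> threads;
    const int count = static_cast<int>(in.size());
    for (int t = 0; t < numThreads; ++t)
    {
        const int start = count * t / numThreads;
        const int end   = count * (t + 1) / numThreads;
        // Thread arguments are copied by default; std::cref / std::ref pass references instead
        threads.emplace_back(SquareSection, std::cref(in), std::ref(out), start, end);
    }
    for (auto& thread : threads)
        thread.join(); // Each thread is joined before its std::thread object is destroyed
    return out;
}
```

Calling ParallelSquare(values, 4) returns the squared values in the original order, because every index is written by exactly one thread.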

TOO SLOW…
- This approach is often recommended for C++11 threading.
  - Clever templated parallel for-loop replacements are possible.
- However, it is ineffective for games.
  - Creating threads is too slow when frame times are measured in milliseconds.
  - Time spent creating threads will exceed any benefits.
  - Only appropriate for tasks that will take seconds or more.
- Solution: create a collection of threads (a thread pool) at setup time.
  - Each thread waits until woken up and handed some work to do.
  - When the work is complete it waits again.
  - Requires some inter-thread communication…

MANAGING WORKER THREADS WITH CONDITION VARIABLES

    // Note: this code is illustrative only, it's not right yet
    const int NumWorkers = 10;
    std::condition_variable workReady[NumWorkers];

    // Main thread
    for (int i = 0; i < NumWorkers; ++i)
    {
        PrepareWork(i);
        workReady[i].notify_one();   // Tell worker
    }
    ...
    for (int i = 0; i < NumWorkers; ++i)
    {
        workReady[i].wait();         // Wait for all work to finish
    }

    // Thread t (already created)
    while (workerRunning)
    {
        workReady[t].wait();         // Wait for work
        DoWork(t);
        workReady[t].notify_one();   // Tell main thread
    }

CONDITION VARIABLES
- std::condition_variable is declared in <condition_variable>.
- It has wait and notify methods, as shown on the last slide.
  - Another variant is notify_all, which wakes every thread waiting on the same condition variable.
- However, the code on the last slide has problems:
  - While the main thread is waiting for one worker, another worker could finish and its signal would be missed.
  - Condition variables can receive spurious wakeups.
    - A false signal, which should be ignored.
    - These are allowed so that the standard library can be implemented more efficiently.
- So as well as sending a signal we must maintain our own record that it was sent.
  - That way we can avoid the above two issues.

HANDLING SPURIOUS WAKEUPS

    // Worker thread
    while (!haveWork[t])        // 1. If work has already been handed over, don't wait for a signal
    {
        workReady[t].wait();    // 2. Spurious signals are ignored: haveWork won't be set
    }
    …
    // Main thread
    haveWork[i] = true;
    workReady[i].notify_one();

- Wake-ups are verified against our own boolean variable.
- Why not use a boolean only? Because sleeping in wait is better than continuously looping.
- But this is still not correct. In this form the code has a race condition: if the main thread code runs after the worker has tested the while condition but before the worker calls wait, the signal is sent while nobody is waiting, so it is lost and the worker sleeps even though work is pending.

C++11 MUTEXES
- std::mutex, std::lock_guard and std::unique_lock are in <mutex>.
- A mutex allows us to lock a section of code to one thread.
- We don't usually lock and unlock a std::mutex directly; we use the lock types:

    std::mutex mutex;

    {   // Thread 1
        std::lock_guard<std::mutex> lock(mutex); // Named guard: locks mutex until out of scope
        bankBalance -= 100;
    }   // Note the curly brackets are used only to define scope
    …
    {   // Thread 2
        std::lock_guard<std::mutex> lock(mutex); // Same mutex, so this code can't run at the same time
        if (bankBalance > 50) AllowWithdrawal();
    }
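A minimal sketch of the same idea as a complete function, assuming a shared counter in place of the bank balance (CountWithLock and its parameters are hypothetical names). Several threads increment the same integer, and the lock_guard makes each increment exclusive.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Runs numThreads threads that each add 1 to a shared counter perThread times.
int CountWithLock(int numThreads, int perThread)
{
    int counter = 0;
    std::mutex counterMutex;
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
    {
        threads.emplace_back([&counter, &counterMutex, perThread]()
        {
            for (int i = 0; i < perThread; ++i)
            {
                std::lock_guard<std::mutex> lock(counterMutex); // Held until the end of this scope
                ++counter;
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter;
}
```

With the guard in place the result is always numThreads * perThread. Without it the increments from different threads could interleave and some would be lost.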

C++11 MUTEXES
- std::unique_lock works the same way as lock_guard except:
  - It can be locked and unlocked at any time (lock_guard only unlocks on destruction).
  - It doesn't have to be locked at first.
  - It can transfer ownership (it is moveable).
  - etc.
- When using condition variables you must use unique_lock.
  - In fact the wait function requires you to pass a unique_lock as a parameter, and the lock must already be held.
- We now have enough to write a working thread pool:
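The differences between unique_lock and lock_guard listed above can be sketched as follows. The names scoreMutex, score, LockScore and AddScore are hypothetical and exist only for this illustration.

```cpp
#include <mutex>
#include <utility>

std::mutex scoreMutex; // Hypothetical shared state for illustration
int score = 0;

// Returns a lock that is already held; ownership moves to the caller.
std::unique_lock<std::mutex> LockScore()
{
    std::unique_lock<std::mutex> lock(scoreMutex);
    return lock; // unique_lock is moveable (lock_guard is not)
}

// Adds points plus a 10% bonus to the shared score and returns the new total.
int AddScore(int points)
{
    std::unique_lock<std::mutex> lock(scoreMutex, std::defer_lock); // Not locked at first
    int bonus = points / 10; // Work that needs no lock
    lock.lock();             // Lock only around the shared data
    score += points + bonus;
    int result = score;
    lock.unlock();           // Unlock early, before the end of the scope
    return result;
}
```

std::defer_lock, lock, unlock and owns_lock are all part of the standard std::unique_lock interface. The early unlock is the reason to prefer unique_lock when the locked region is shorter than the enclosing scope.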

    const int NumWorkers = 10;
    std::condition_variable workReady[NumWorkers];
    std::mutex              mutex[NumWorkers];
    bool                    haveWork[NumWorkers] = {};

    // Main thread
    for (int i = 0; i < NumWorkers; ++i)
    {
        PrepareWork(i);
        {   // Only use haveWork while the other thread is not
            std::unique_lock<std::mutex> lock(mutex[i]);
            haveWork[i] = true;
        }
        workReady[i].notify_one();      // Tell worker
    }
    ... // Do something ?
    for (int i = 0; i < NumWorkers; ++i)
    {
        std::unique_lock<std::mutex> lock(mutex[i]);
        while (haveWork[i])             // Wait until work is done
        {
            workReady[i].wait(lock);
        }
    }

    // Thread t (already created)
    while (workerRunning)
    {
        {   // Guard use of haveWork from other thread
            std::unique_lock<std::mutex> lock(mutex[t]);
            while (!haveWork[t])        // Wait for some work
            {
                workReady[t].wait(lock);
            }
        }
        DoWork(t);
        {   // Clear the flag under the same lock
            std::unique_lock<std::mutex> lock(mutex[t]);
            haveWork[t] = false;
        }
        workReady[t].notify_one();      // Tell main thread
    }

DETAILS
- That is fully functional thread pool code.
  - Threads are not created per task but sleep until work arrives: efficient enough for threading tasks within a single frame of a game.
- There is an alternative wait method that takes a function/lambda (a predicate) and returns only once it is true. In the worker thread: workReady[t].wait(lock, [&]() { return haveWork[t]; });
  - This replaces the worker's whole while loop on the last slide. The main thread's loop becomes the same call with the predicate !haveWork[i].
- Also note that wait unlocks the mutex while it sleeps, and locks it again to test the condition (the predicate) and before it returns.
  - Whenever we hold the lock, other threads are blocked from progressing past certain points, so release it whenever possible.
- Note the main thread could do something while the workers are busy.
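Putting the pieces together, here is one possible complete version of the thread pool pattern, using the predicate form of wait. It is a sketch under assumptions rather than the lab's code: the Worker struct, the quit flag for shutting the workers down, and the "double the input" work are all invented for the example.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

const int NumWorkers = 4;

struct Worker // Hypothetical grouping of the per-worker arrays into one struct
{
    std::thread             thread;
    std::condition_variable workReady;
    std::mutex              mutex;
    bool                    haveWork = false;
    bool                    quit     = false; // Assumed shutdown flag
    int                     input    = 0;     // Stand-in for a work description
    int                     output   = 0;     // Stand-in for a result
};

void WorkerLoop(Worker& w)
{
    for (;;)
    {
        {
            std::unique_lock<std::mutex> lock(w.mutex);
            // Sleeps until there is work (or a quit request); spurious wakeups are ignored
            w.workReady.wait(lock, [&]() { return w.haveWork || w.quit; });
            if (w.quit) return;
        }
        w.output = w.input * 2; // "DoWork": runs without holding the lock
        {
            std::unique_lock<std::mutex> lock(w.mutex);
            w.haveWork = false;
        }
        w.workReady.notify_one(); // Tell main thread
    }
}

// Runs `frames` rounds of work on a pool that is created once; returns the sum of all outputs.
int RunFrames(int frames)
{
    Worker workers[NumWorkers];
    for (auto& w : workers)
        w.thread = std::thread(WorkerLoop, std::ref(w)); // Threads created at setup time only

    int total = 0;
    for (int f = 0; f < frames; ++f)
    {
        for (int i = 0; i < NumWorkers; ++i)
        {
            Worker& w = workers[i];
            w.input = f * NumWorkers + i; // "PrepareWork"
            {
                std::unique_lock<std::mutex> lock(w.mutex);
                w.haveWork = true;
            }
            w.workReady.notify_one(); // Tell worker
        }
        // The main thread could do its own work here while the workers are busy
        for (auto& w : workers)
        {
            std::unique_lock<std::mutex> lock(w.mutex);
            w.workReady.wait(lock, [&]() { return !w.haveWork; }); // Wait until work is done
            total += w.output;
        }
    }
    for (auto& w : workers) // Shut the pool down
    {
        {
            std::unique_lock<std::mutex> lock(w.mutex);
            w.quit = true;
        }
        w.workReady.notify_one();
        w.thread.join();
    }
    return total;
}
```

Sharing one condition variable per worker for both directions is safe here because the two waits have opposite predicates, so the main thread and the worker are never both asleep on it at the same time.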

2. An Aside: Line Sweep Algorithms

LINE SWEEP ALGORITHMS
- In the lab you use a variation of a line sweep algorithm.
- This is a method of sorting data along one axis, then sweeping along that axis when performing a search or other algorithm.
- Because neighbouring elements in the sweep are near each other on that axis, we can often optimise the algorithm.
- The example shown creates a Voronoi diagram using a line sweep in O(n log n).
- Voronoi diagrams can be used as a basis for fracturing geometry.
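As a much simpler illustration of the sweep idea than the Voronoi construction (and not the variation used in the lab), the sketch below counts pairs of 2D points that lie within a given radius of each other. Point and CountClosePairs are hypothetical names. The points are sorted along x, and the inner loop stops as soon as the gap in x exceeds the radius, since every later point is further away still.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point { float x, y; };

// Counts pairs of points no further than `radius` apart, sweeping along the x axis.
int CountClosePairs(std::vector<Point> points, float radius)
{
    // 1. Sort along one axis
    std::sort(points.begin(), points.end(),
              [](const Point& a, const Point& b) { return a.x < b.x; });

    // 2. Sweep along that axis, only comparing against nearby elements
    int count = 0;
    for (std::size_t i = 0; i < points.size(); ++i)
    {
        for (std::size_t j = i + 1; j < points.size(); ++j)
        {
            const float dx = points[j].x - points[i].x;
            if (dx > radius) break; // All later points are even further away in x
            const float dy = points[j].y - points[i].y;
            if (dx * dx + dy * dy <= radius * radius) ++count;
        }
    }
    return count;
}
```

When the points are spread out along x, the early break keeps the inner loop short, so the cost is dominated by the O(n log n) sort rather than the O(n^2) comparison of every pair.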

3. More C++11 Threading Features

std::async
- std::async (declared in <future>) runs a function (or lambda) asynchronously. Passing std::launch::async guarantees a new thread; the default policy also allows the call to be deferred until the result is requested.
- This function can return a result (unlike a thread).
- std::async gives you a std::future object that can be used to collect the result of the function.

    int do_stuff(float a, int b) // Runs on another thread
    {
        ...
        return result;
    }

    std::future<int> future = std::async(std::launch::async, do_stuff, 2.5f, 10);
    // Do other things
    int result = future.get(); // Waits until result is ready
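A small complete sketch of this pattern, assuming a made-up task (summing half of a vector) in place of do_stuff. SumRange and SumInTwoHalves are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical long-running task: sums the values in [begin, end).
long long SumRange(const std::vector<int>& values, std::size_t begin, std::size_t end)
{
    return std::accumulate(values.begin() + begin, values.begin() + end, 0LL);
}

// Sums the upper half on another thread while this thread sums the lower half.
long long SumInTwoHalves(const std::vector<int>& values)
{
    const std::size_t mid = values.size() / 2;
    // std::launch::async forces a new thread; std::cref avoids copying the vector
    std::future<long long> upper =
        std::async(std::launch::async, SumRange, std::cref(values), mid, values.size());
    const long long lower = SumRange(values, 0, mid); // Do other things meanwhile
    return lower + upper.get();                       // Waits until the result is ready
}
```

The result travels back through the future, so no mutex or condition variable is needed in the calling code.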

std::async
- You can also test whether a result is ready or not:
      if (future.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
  - wait_for is meant for setting timeouts on the result, but with a zero timeout it makes a very useful non-blocking check.
- async is easier to work with than threads:
  - In particular, it makes communication of the result much simpler.
  - However, it still has the penalty of thread creation.
  - So it is not useful for threading per-frame game tasks.
  - But it is the most convenient method for set-up tasks or long-running tasks.
- Side note: the main thread uses a std::future to collect the result. The running thread uses a std::promise to communicate the result. We don't see std::promise in typical usage, but it is mentioned for completeness.
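To make the side note concrete, this sketch uses std::promise and std::future directly and polls the future with a zero timeout, roughly as a game loop might once per frame. ComputeAnswer, PollUntilReady, the sleeps and the value 42 are all invented for the example.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Producer side: the running thread communicates its result through the promise.
void ComputeAnswer(std::promise<int> result)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(20)); // Pretend to work
    result.set_value(42);
}

// Consumer side: polls the future without blocking, counting the "frames" spent waiting.
int PollUntilReady(int& framesWaited)
{
    std::promise<int> promise;
    std::future<int>  future = promise.get_future();
    std::thread worker(ComputeAnswer, std::move(promise)); // Promises are move-only

    framesWaited = 0;
    while (future.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
    {
        ++framesWaited; // A game would update and render a frame here
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    worker.join();
    return future.get(); // Ready, so this returns immediately
}
```

std::async does the promise half of this for you, which is why std::promise rarely appears in typical code.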

ADVANCED THREAD POOLS: std::packaged_task
- std::packaged_task is an object that encapsulates a function with arbitrary parameters and a return value.
  - A little like async, but it is not called straight away.
  - It is an object that can be moved (not copied) around.
  - Like async it returns results using std::future, making it easy to synchronise.
- This makes it a good choice for a generic work object to pass to worker threads.
- The details are quite complex, here is a good example:
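As a minimal sketch of the idea (not the example the slide refers to), the code below wraps two calls in std::packaged_task objects, moves them into a queue, and lets a single worker thread run them. Add and RunQueuedTasks are hypothetical names. A real pool would also protect the queue with a mutex and condition variable; that is left out here because the queue is filled before the worker starts and only the worker touches it afterwards.

```cpp
#include <future>
#include <queue>
#include <thread>
#include <utility>

int Add(int a, int b) { return a + b; }

// Queues two tasks, runs them on a worker thread, and returns the sum of their results.
int RunQueuedTasks()
{
    std::queue<std::packaged_task<int()>> work; // Generic work items with an int result

    // The lambdas capture the parameters, so every task has the same int() signature
    std::packaged_task<int()> first([]() { return Add(2, 3); });
    std::packaged_task<int()> second([]() { return Add(10, 20); });
    std::future<int> firstResult  = first.get_future(); // Collect futures before moving the tasks
    std::future<int> secondResult = second.get_future();
    work.push(std::move(first)); // Tasks are moved, not copied
    work.push(std::move(second));

    std::thread worker([&work]()
    {
        while (!work.empty())
        {
            std::packaged_task<int()> task = std::move(work.front());
            work.pop();
            task(); // Runs the function and makes the result available to its future
        }
    });

    const int total = firstResult.get() + secondResult.get(); // Waits for both tasks
    worker.join();
    return total;
}
```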

4. Threading Case Study: Streaming

ASYNCHRONOUS I/O
- We often need to load game data while the game is running.
- Normal file I/O will block and stall the game.
- C++ does not have a standard asynchronous file I/O API.
- We have a couple of options:
  - Platform-specific APIs:
    - E.g. Windows: CreateFile and ReadFile with the FILE_FLAG_OVERLAPPED flag set.
    - E.g. PS4: an API called fios.
  - Write ordinary synchronous I/O code and run it in a separate thread.
- Platform-specific APIs may be more efficient with the hardware.
- Our own threaded I/O can be more tightly integrated with the game.
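The second option can be sketched with nothing more than standard C++: an ordinary blocking read wrapped in std::async. LoadFile and LoadFileAsync are hypothetical names. Because std::async creates a thread per call, this suits occasional large loads; frequent small loads would be better handed to a long-lived streaming thread.

```cpp
#include <fstream>
#include <future>
#include <iterator>
#include <string>
#include <vector>

// Ordinary blocking read of a whole file; returns an empty vector if it cannot be opened.
std::vector<char> LoadFile(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}

// Starts the blocking read on another thread so that the caller is not stalled.
std::future<std::vector<char>> LoadFileAsync(const std::string& path)
{
    return std::async(std::launch::async, LoadFile, path); // path is copied for the new thread
}
```

The game can keep the returned future, poll it each frame with wait_for and a zero timeout, and call get once the data has arrived.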

LAB CASE STUDY: DYNAMICALLY LOADING A 2D MAP
- Window moving through a large grid-based map.
  - Blue squares loaded, yellow squares not loaded.
- Window approaches edge of loaded area, so new squares must be loaded.
  - Use async I/O to load the green squares.
  - Reuse memory occupied by squares with X's, which are discarded.
- Loading must complete before the screen reaches the edge of the loaded area.
  - Must make the blue grid large enough that there is time to load new sections.
- Also, when the window reverses direction, we do not want to immediately load the squares with X's again.
  - Increasing the grid size will help with this too.
- Details in lab.
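One possible way to decide when to load, sketched under assumptions that are not from the lab: the loaded area is a rectangle of grid squares, it is only replaced when the window plus a small trigger margin is about to leave it, and the replacement is the window plus a larger load margin. Region, Contains, Expand, UpdateLoaded and both margins are hypothetical.

```cpp
struct Region { int minX, minY, maxX, maxY; }; // Inclusive range of grid squares

// True if `inner` lies entirely within `outer`.
bool Contains(const Region& outer, const Region& inner)
{
    return inner.minX >= outer.minX && inner.maxX <= outer.maxX &&
           inner.minY >= outer.minY && inner.maxY <= outer.maxY;
}

// Grows a region by `margin` squares on every side.
Region Expand(const Region& r, int margin)
{
    return { r.minX - margin, r.minY - margin, r.maxX + margin, r.maxY + margin };
}

// Returns the region that should be resident after the window has moved.
// Squares in the result but not in `loaded` need loading; squares in `loaded`
// but not in the result can be discarded and their memory reused.
Region UpdateLoaded(const Region& loaded, const Region& window,
                    int triggerMargin, int loadMargin)
{
    if (Contains(loaded, Expand(window, triggerMargin)))
        return loaded;                 // Still enough slack: nothing to load
    return Expand(window, loadMargin); // Re-centre the loaded area on the window
}
```

Keeping loadMargin larger than triggerMargin gives the loader time to finish before the window arrives, and means a window that reverses direction stays inside the loaded area rather than reloading squares it has just discarded.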
