Multithreading GSP-420
What is multithreading? A thread is a dispatchable unit of executable code. The name comes from the concept of a “thread of execution.” In a thread-based multitasking environment, all processes have at least one thread, but they can have more. This means that a single program can perform two or more tasks concurrently. For instance, a text editor can be formatting text at the same time that it is printing, as long as these two actions are being performed by two separate threads.
Parallelism Multithreading changes the fundamental architecture of a program. Unlike a single-threaded program that executes in a strictly linear fashion, a multithreaded program executes portions of itself concurrently. Thus, all multithreaded programs include an element of parallelism. Consequently, a major issue in multithreaded programs is managing the interaction of the threads.
Threads As explained earlier, all processes have at least one thread of execution, which is called the main thread. The main thread is created when your program begins. In a multithreaded program, the main thread creates one or more child threads. Thus, each multithreaded process starts with one thread of execution and then creates one or more additional threads. In a properly designed program, each thread represents a single logical unit of activity.
Advantage of Multithreading The principal advantage of multithreading is that it enables you to write very efficient programs because it lets you utilize the idle time that is present in most programs. Most I/O devices, whether they are network ports, disk drives, or the keyboard, are much slower than the CPU. Often, a program will spend a majority of its execution time waiting to send or receive data. With the careful use of multithreading, your program can execute another task during this idle time. For example, while one part of your program is sending a file over the Internet, another part can be reading keyboard input, and still another can be buffering the next block of data to send.
Windows Allows Us To Multithread At the time of these notes, C++ does not contain any built-in support for multithreaded applications. Instead, it relies entirely upon the operating system to provide this feature. Java and C# provide built-in support for multithreading; C++ does not, because of efficiency, control, and the wide range of applications to which C++ is applied. (The later C++11 standard added std::thread, but the Win32 approach described here predates it.)
C++ Is All About Efficiency By not building in support for multithreading, C++ does not attempt to define a “one size fits all” solution. Instead, C++ allows you to directly utilize the multithreading features provided by the operating system. This approach means that your programs can be multithreaded in the most efficient means supported by the execution environment. Because many multitasking environments offer rich support for multithreading, being able to access that support is crucial to the creation of high-performance, multithreaded programs.
Windows Multithreading Windows offers a wide array of API functions that support multithreading. Keep in mind that Windows provides many other multithreading-based functions that you might want to explore on your own. To use Windows’ multithreading functions, you must include <windows.h> in your program.
CreateThread() HANDLE CreateThread(LPSECURITY_ATTRIBUTES secAttr, SIZE_T stackSize, LPTHREAD_START_ROUTINE threadFunc, LPVOID param, DWORD flags, LPDWORD threadID); See the MSDN documentation for what each of these parameters represents.
CreateThread() – threadfunc() Each thread of execution begins with a call to a function, called the thread function, within the creating process. Execution of the thread continues until the thread function returns. The address of this function (that is, the entry point to the thread) is specified in threadFunc. All thread functions must have this prototype: DWORD WINAPI threadfunc(LPVOID param); Any argument that you need to pass to the new thread is specified in CreateThread( )'s param. This pointer-sized value is received by the thread function in its parameter and may be used for any purpose. The thread function returns its exit status.
Terminating Threads A thread of execution terminates when its thread function returns. The process may also terminate a thread explicitly, using either TerminateThread( ) or ExitThread( ): BOOL TerminateThread(HANDLE thread, DWORD status); VOID ExitThread(DWORD status); Note that TerminateThread( ) stops the thread immediately, without giving it a chance to clean up, so it should be used only as a last resort.
ATTN: WE USE VISUAL C++ The Visual C++ alternatives to CreateThread( ) and ExitThread( ) are _beginthreadex( ) and _endthreadex( ). Both require the header file <process.h>.
_beginthreadex() uintptr_t _beginthreadex(void *secAttr, unsigned stackSize, unsigned (__stdcall *threadFunc)(void *), void *param, unsigned flags, unsigned *threadID); Notice how the parameters parallel those of CreateThread( ). The function returns a handle to the thread if successful or zero if a failure occurs. The type uintptr_t is a Visual C++ type capable of holding a pointer or handle.
_beginthreadex() - threadfunc() The address of the thread function (that is, the entry point to the thread) is specified in threadFunc. For _beginthreadex( ), a thread function must have this prototype: unsigned __stdcall threadfunc(void * param);
_endthreadex() The prototype for _endthreadex( ): void _endthreadex(unsigned status); It functions just like ExitThread( ) by stopping the thread and returning the exit code specified in status.
Multithreading Compiler Settings To use the multithreaded library from the Visual C++ 6 IDE, first activate the Project | Settings property sheet. Then, select the C/C++ tab. Next, select Code Generation from the Category list box and then choose Multithreaded in the Use Runtime Library list box. For Visual C++ 7 .NET IDE, select Project | Properties. Next, select the C/C++ entry and highlight Code Generation. Finally, choose Multi-threaded as the runtime library.
Suspending and Resuming Threads A thread of execution can be suspended by calling SuspendThread( ). It can be resumed by calling ResumeThread( ). The prototypes for these functions are shown here: DWORD SuspendThread(HANDLE hThread); DWORD ResumeThread(HANDLE hThread); For both functions, the handle to the thread is passed in hThread.
Synchronization - Mutex A mutex synchronizes a resource such that one and only one thread or process can access it at any one time. The CreateMutex( ) function returns a handle to the mutex if successful or NULL on failure. A mutex handle is automatically closed when the main process ends. You can explicitly close a mutex handle when it is no longer needed by calling CloseHandle( ).
CreateMutex() HANDLE CreateMutex(LPSECURITY_ATTRIBUTES secAttr, BOOL acquire, LPCSTR name); Here, secAttr is a pointer to the security attributes. If secAttr is NULL, the default security descriptor is used. If the creating thread desires control of the mutex, then acquire must be TRUE. Otherwise, pass FALSE. The name parameter points to a string that becomes the name of the mutex object. Mutexes are global objects, which may be used by other processes.
Using a Mutex Once you have created a mutex, you use it by calling two related functions: WaitForSingleObject( ) and ReleaseMutex( ): DWORD WaitForSingleObject(HANDLE hObject, DWORD howLong); BOOL ReleaseMutex(HANDLE hMutex);
WaitForSingleObject() WaitForSingleObject( ) waits on a synchronization object. It does not return until the object becomes available or a time-out occurs. For use with mutexes, hObject will be the handle of a mutex. The howLong parameter specifies, in milliseconds, how long the calling routine will wait. Once that time has elapsed, a time-out error will be returned. To wait indefinitely, use the value INFINITE. The function returns WAIT_OBJECT_0 when successful (that is, when access is granted). It returns WAIT_TIMEOUT when time-out is reached.
ReleaseMutex() ReleaseMutex( ) releases the mutex and allows another thread to acquire it. Here, hMutex is the handle to the mutex. The function returns nonzero if successful and zero on failure.
Example of Using A Mutex To use a mutex to control access to a shared resource, wrap the code that accesses that resource between a call to WaitForSingleObject() and ReleaseMutex(), as shown in this example. (The time-out period will differ from application to application.)

if(WaitForSingleObject(hMutex, 10000) == WAIT_TIMEOUT) {
  // handle time-out error
}
// access the resource
ReleaseMutex(hMutex);
Warning: Deadlock Generally, you will want to choose a time-out period that will be more than enough to accommodate the actions of your program. If you get repeated time-out errors when developing a multithreaded application, it usually means that you have created a deadlock condition. Deadlock occurs when one thread is waiting on a mutex that another thread never releases.
Designing Multithreading There are two main ways to break a program down into concurrent parts: Function Parallelism - divides the program into concurrent tasks. Data Parallelism - finds some set of data for which to perform the same tasks in parallel. Of the three models compared below, two are function parallel and one is data parallel.
Synchronous Function Parallel One way to include parallelism to a game loop is to find parallel tasks from an existing loop. To reduce the need for communication between parallel tasks, the tasks should preferably be truly independent of each other. An example of this could be doing a physics task while calculating an animation.
Synchronous Function Parallel A game loop parallelized using the synchronous function parallel model. The animation and the physics tasks can be executed in parallel.
Limitations One major concern with both function parallel models is that they have an upper limit on how many cores they can exploit: the number of parallel tasks that can be found in the engine. The number of meaningful tasks is further reduced by the fact that threading very small tasks yields negligible gains. The synchronous function parallel model imposes an additional limit – the parallel tasks should have few dependencies on each other. For example, it is not sensible to run a physics task in parallel with a rendering task if the rendering needs the object coordinates produced by the physics task.
Asynchronous function parallel The key difference from the synchronous model is that this model doesn't contain a game loop. Instead, the tasks that drive the game forward update at their own pace, using the latest information available. For example, the rendering task might not wait for the physics task to finish, but would simply use the latest completed physics update. This method makes it possible to efficiently parallelize tasks that are interdependent.
Asynchronous function parallel The asynchronous function parallel model enables interdependent tasks to run in parallel. The rendering task does not wait for the physics task to finish, but uses the latest complete physics update.
Limitations As with the synchronous model, the scalability of the asynchronous function parallel model is limited by how many tasks can be found in the engine. Fortunately, having threads communicate only through the latest available information effectively reduces the need for them to be truly independent. Thus we can easily have a physics task run concurrently with a rendering task – the rendering task would use a previous physics update to get the coordinates for each object. As a result, the asynchronous model can support a larger number of tasks, and therefore a larger number of processor cores, than the synchronous model.
Data parallel model In addition to finding parallel tasks, it is possible to find some set of similar data for which to perform the same tasks in parallel. With game engines, such parallel data would be the objects in the game. For example, in a flight simulation, one might divide all of the planes between two threads, each thread handling the simulation of half of the planes. Optimally, the engine would use as many threads as there are logical processor cores.
Data parallel model A game loop using the data parallel model. Each object thread simulates a part of the game objects.
Important Data Parallel Issue An important issue is how to divide the objects among threads. One thing to consider is that the threads should be properly balanced, so that each processor core is used to full capacity. A second is what happens when two objects in different threads need to interact. Communication through synchronization primitives could reduce the amount of parallelism, so a recommended approach is message passing combined with the latest-known-update scheme of the asynchronous model. Communication between threads can also be reduced by grouping objects that are most likely to interact with each other. Objects are most likely to come into contact with their neighbors, so one strategy is to group objects by area.
Data Parallel or Function Parallel? The data parallel model has excellent scalability. The number of object threads can be set automatically to the number of cores in the system, and the only non-parallelizable parts of the game loop would be the ones that don't directly deal with game objects. While the function parallel models can still get the most out of a few cores, data parallelism is needed to fully utilize future processors with dozens of cores.
Synchronous Conclusion Because the synchronous function parallel model does not require special changes to engine components, and is really just an enhancement of a regular game loop, it is well suited for adding some amount of parallelism to an existing game engine. The model is not suited for future use because of its weak scaling and limited amount of parallelism.
Asynchronous Conclusion The asynchronous function parallel model can be recommended for new game engines because of the high amount of possible parallelism and the fact that existing components need only a few changes. This model is a good choice for game engines aimed at the current generation of multicore processors, which have a relatively small number of cores. The only drawback is the need to tune thread running times to minimize the impact of worst-case thread timings. More research is needed to know how this timing fluctuation actually affects game play.
Data Parallel Conclusion The data parallel model is definitely something to think about for the future. It will be needed when the amount of processor cores increases beyond the number of tasks available for a function parallel model. For current use the increased scalability doesn't offer enough benefits compared to the trouble of coding custom components to support this type of data parallelism.
So Do We Have To Do A Lab On All This? No… But you should at least attempt to get performance gains in your completed engine project with synchronous parallelism. If you successfully implement this into your engine and see significant performance gains, you will earn an extra 5% overall credit in this class. Note: Requires 2 separate builds
Resources Multithreaded Game Engine Architecture: http://www.gamasutra.com/features/20060906/monkkonen_01.shtml Multithreading in C++: http://www.devarticles.com/c/a/Cplusplus/Multithreading-in-C/ GDC 2004 Multithreading Notes: http://www.toymaker.info/Games/html/gdce_-_cpu_issues.html