Windows Threading
Colin Roby, Jaewook Kim
[CMSC 621] Advanced Operating Systems
OS, Process, and Thread for Windows OS
[Figure: layered view of a multi-processor computing system. Applications and programming paradigms sit on a threads interface, above a microkernel operating system and the hardware. Legend: P = process, processor, thread.]
Legacy Windows Threading Model (Co-operative Threading - Windows 3.1 and 95)
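Co-operative Threading
- Used by the old 16-bit Windows platform
- Invented to overcome the lack of a hardware timer
- A thread continues execution until one of the following happens:
  - The thread terminates
  - The thread executes an instruction that causes a wait (e.g., I/O)
  - The thread volunteers to stop (by invoking yield or sleep)
[Figure: thread state diagram. States: Ready, Running, Blocked, Exited. Transitions: Create (to Ready); Scheduler dispatch (Ready to Running); Yield (Running to Ready); Block for resource (Running to Blocked); Resource becomes available (Blocked to Ready, move to ready queue); Terminate (Running to Exited, call scheduler).]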
Architecture for Cooperative Threading Model
- Uses a serialized message queue
  - All user input from the keyboard and mouse is queued
  - The next message is not sent to the program until the current message is fully processed
- Message-based program interaction
  - Before it receives a message, the program stays dormant in memory
  - The message queue sends a message to the program
  - The program starts processing the message
  - The program returns control back to Windows
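The serialized delivery described on this slide can be sketched in portable C. This is a toy model, not the actual Windows 3.x implementation: the names `message`, `msg_queue`, `post_message`, `run_message_loop`, and `record_message` are hypothetical, and the queue capacity is arbitrary. It only illustrates that the next message is dispatched after the handler for the current one has returned.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16                           /* arbitrary capacity for the sketch */

typedef struct { int id; } message;            /* hypothetical message type */
typedef void (*handler_fn)(const message *);   /* the program's message handler */

typedef struct {
    message items[QUEUE_CAP];
    size_t head;    /* index of the oldest queued message */
    size_t count;   /* number of queued messages */
} msg_queue;

/* Queue one input event; returns 0 on success, -1 if the queue is full. */
int post_message(msg_queue *q, message m)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = m;
    q->count++;
    return 0;
}

/* Deliver queued messages strictly one at a time: the next message is taken
 * only after the handler for the current one has returned control.
 * Returns the number of messages delivered. */
size_t run_message_loop(msg_queue *q, handler_fn handle)
{
    size_t delivered = 0;
    assert(q != NULL && handle != NULL);
    while (q->count > 0) {
        message m = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        handle(&m);          /* program processes the message, then returns */
        delivered++;
    }
    return delivered;
}

/* Example handler: records the order in which messages were processed. */
int processed[QUEUE_CAP];
size_t processed_count = 0;

void record_message(const message *m)
{
    if (processed_count < QUEUE_CAP)
        processed[processed_count++] = m->id;
}
```

Posting three messages and then calling `run_message_loop(&q, record_message)` processes them in arrival order. A handler that never returns would stall every later message, which is the weakness of the cooperative model.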
Advantages & Disadvantages
- Advantage
  - Safe and easy to use: because only one thread runs at a time, there is no need to worry about other threads changing shared variables
- Disadvantages
  - Only one thread can be active
  - Threads depend on each other to yield control, which degrades performance on heavily loaded systems
Threading Models from Windows NT to 2003 (Preemptive Threading)
Preemptive Multiprocessing
- Preemptive multi-processing operating system
- The OS schedules the CPU time
- The application can be preempted by the OS scheduler
[Figure: thread state diagram. States: Ready, Running, Blocked, Exited. Transitions: Create (to Ready); Scheduler dispatch (Ready to Running); Yield or Interrupt (Running to Ready, call scheduler); Block for resource (Running to Blocked, call scheduler); Resource free or I/O completion interrupt (Blocked to Ready, move to ready queue); Terminate (Running to Exited, call scheduler).]
* Kai Li - Non-Preemptive and Preemptive Threads
Windows Thread
- The unit of execution (in UNIX, the process is the unit)
- Basically a one-to-one mapping
  - Fiber library for the M:M model
- Each thread contains
  - A thread ID
  - A register set
  - Separate user and kernel stacks
  - A private data storage area
- The register set, stacks, and private storage area are known as the context of the thread
- The primary data structures of a thread include:
  - ETHREAD (executive thread block)
  - KTHREAD (kernel thread block)
  - TEB (thread environment block)
Windows Thread Types
- Single Threading
  - Each process is started with a single thread
- Multiple Threading
  - A thread can be created by the Win32 Pthread or Windows Thread API
- Hyper Threading
  - Simultaneous multithreading technology on the Pentium 4 microarchitecture by Intel
  - Supported by Windows 2000 or later
Windows Threading Models
- Win32 Threading Model
  - Win32 Pthread or Windows Thread API
- COM (Component Object Model) Threading Model
  - Single Threaded Apartments (STA)
  - Multi Threaded Apartments (MTA)
  - Both Threading Model (STA or MTA)
Win32 Threading API Calls
[Table: some of the Win32 calls for managing processes, threads, and fibers]
Win32 Threading Example

    DWORD WINAPI server(LPVOID arg);   // forward declaration of the start routine

    void start_servers(void) {
        HANDLE thread;
        DWORD id;
        thread = CreateThread(
            0,           // security attributes
            0,           // initial stack size in bytes (0 = default)
            server,      // start routine
            (LPVOID)0,   // argument
            0,           // creation flags
            &id);        // thread ID
        WaitForSingleObject(thread, INFINITE);
        ...
    }

    DWORD WINAPI server(LPVOID arg) {
        while (TRUE) {
            // get and handle request
        }
        return 0;
    }

To create a thread, one calls the CreateThread routine. This skeleton code for a server application creates a number of threads, each to handle client requests. If CreateThread returns successfully, then a new thread has been created that is now executing independently of the caller of CreateThread. The handle for the new thread is returned; its ID is returned via the last (result) argument.

The first parameter is a pointer to the security attributes to be associated with the thread; we supply this as 0 throughout the course. The next parameter is the initial stack size in bytes: one megabyte of virtual memory is reserved for the stack by default, and this parameter indicates how much of it is initially backed by real memory; 0 means to use the default. The third parameter is the address of the first routine that our thread executes; the next parameter is the argument that's passed to that routine. The next-to-last parameter specifies various creation flags; we don't supply any here.

If CreateThread fails, GetLastError can be used to determine the cause of the failure. The definition of server has the rather odd-looking "WINAPI" as part of its signature. This indicates the subroutine-calling convention used for this routine.
Win32 Threading Example cont.

    void rlogind(int r_in, int r_out, int l_in, int l_out) {
        HANDLE in_thread, out_thread;
        DWORD id;   // receives the thread ID (unused here)
        two_ints_t in = {r_in, l_out}, out = {l_in, r_out};

        in_thread = CreateThread(0, 0, incoming, &in, 0, &id);
        out_thread = CreateThread(0, 0, outgoing, &out, 0, &id);

        WaitForSingleObject(in_thread, INFINITE);
        CloseHandle(in_thread);
        WaitForSingleObject(out_thread, INFINITE);
        CloseHandle(out_thread);
    }

Here's the Win32 analog of the rlogind example we showed for POSIX threads. The routine WaitForSingleObject acts, at least in this example, much like pthread_join: it causes the caller to wait until either the thread mentioned in its first argument terminates or the period of time given in its second argument (in milliseconds) elapses.

However, unlike the case with POSIX threads, the programmer has control over the lifetime of the internal object representing the thread. When a thread is created, a thread object is created in the kernel and the creating thread is passed a handle (reference) to it via the value returned by CreateThread (this is implemented in a manner similar to how file descriptors refer to open files in Unix). The thread object maintains a reference count, which is the number of handles that refer to it. Initially this is two: one handle is returned to the creating thread, the other is implicitly held by the new thread itself and closed when that thread terminates. The kernel object disappears when (and only when) the reference count drops to zero.

Thus, in the example on the slide, when in_thread and out_thread terminate, the reference counts in their thread objects drop from two to one. The creating thread returns from WaitForSingleObject; it must then explicitly call CloseHandle so that its hold on the thread object is released, dropping the reference count to zero.
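The reference-count bookkeeping described above can be traced with a small portable model. This is only a sketch of the counting rule as the notes state it, not the real kernel data structure: `thread_object`, `model_create`, and `model_close_handle` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel thread object (not the real Windows structure). */
typedef struct {
    int  ref_count;   /* number of handles that refer to the object */
    bool destroyed;   /* set once the object has been released */
} thread_object;

/* Models thread creation: one handle goes to the creator and one is
 * implicitly held by the new thread itself, so the count starts at two. */
void model_create(thread_object *t)
{
    assert(t != NULL);
    t->ref_count = 2;
    t->destroyed = false;
}

/* Models closing one handle: either the implicit close when the thread
 * terminates, or an explicit CloseHandle by the creator. The object
 * disappears when, and only when, the count reaches zero. */
void model_close_handle(thread_object *t)
{
    assert(t != NULL && t->ref_count > 0);
    t->ref_count--;
    if (t->ref_count == 0)
        t->destroyed = true;
}
```

In this model, thread termination alone leaves the count at one; the object is released only after the creator's matching close. Forgetting CloseHandle in real code therefore leaks the thread object.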
Win32 Threading Example cont.

    ExitThread((DWORD) value);    // terminate the calling thread explicitly, or
    return((DWORD) value);        // return from the thread's first procedure

    WaitForSingleObject(thread, timeOutValue);
    GetExitCodeThread(thread, &value);
    CloseHandle(thread);

A thread terminates either by calling ExitThread or by returning from its first procedure. In either case, it supplies a value that can be retrieved via a call (by some other thread) to GetExitCodeThread.

One should be careful to distinguish between terminating a thread and terminating a process. With the latter, all the threads in the process are forcibly terminated. So, if any thread in a process calls ExitProcess, the entire process is terminated, along with its threads. Similarly, if a thread returns from main, this also terminates the entire process, since returning from main is equivalent to calling ExitProcess. The only thread that can legally return from main is the one that called it in the first place. All other threads (those that did not call main) do not terminate the entire process when they return from their first procedures; they merely terminate themselves.
COM Threading
- Components don't live on threads
- An instance is a 'chunk' of memory associated with an apartment
- Apartments determine which threads can call the component
- A thread switch is decided by the proxy, based on the apartment and threading model
COM Threading (STA vs. MTA)
[Figure: a COM object in a single-threaded apartment compared with a COM object in a multi-threaded apartment]
COM Threading Example

    int main() {
        /* ::CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);   for STA */
        ::CoInitializeEx(NULL, COINIT_MULTITHREADED);          /* for MTA */
        DisplayCurrentThreadId();
        {
            // Scope the smart pointer so the object is released
            // before CoUninitialize is called.
            ILegacyCOMObject1Ptr spILegacyCOMObject1;
            spILegacyCOMObject1.CreateInstance(__uuidof(LegacyCOMObject1));
            spILegacyCOMObject1->TestMethod1();
        }
        ::CoUninitialize();
        return 0;
    }
Threading Model for Multicore System
Thread Management
- The program actively assigns software threads to hardware threads
  - Assigning a thread strongly suggests which hardware thread the software thread should run on
- The program passively relies on the Windows scheduler to assign software threads to hardware threads
  - The efficiency of the threading then depends on the scheduling algorithm
Hardware Design Variance
- Two hardware threads share one core
  - This is known as simultaneous multi-threading (aka Hyper-Threading)
- Multiple cores within the same CPU, with one or more hardware threads on each core
  - Existing architectures include dual-core and quad-core
Detecting Multicore CPUs and Hardware Threads
- Windows relied on threading packages provided by processor manufacturers to detect the number of CPU cores and available hardware threads
- Detect the CPU core topology: how many real hardware threads exist
- Detect the relationship between the hardware threads, such as sharing data caches or sharing an instruction set
Mechanics of the Windows Scheduler
- Preemptive, time-slicing based
  - Uses the system clock to interrupt each thread
  - Each thread is allocated a fixed amount of time, the quantum
- Priority driven
  - The highest-priority ready thread always runs first
  - A higher-priority thread will interrupt a lower-priority thread before its time slice is used up, or even before it starts its quantum
- Manages processor affinity
  - Assigns a thread to a particular processor
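The priority-driven rule on this slide can be sketched as two small functions. This is a simplified model under stated assumptions, not the Windows dispatcher: `sim_thread`, `pick_next`, and `should_preempt` are hypothetical names, a larger number means higher priority, and ties go to the earliest entry in the array.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified thread record for the sketch (hypothetical fields). */
typedef struct {
    int id;
    int priority;   /* larger value = higher priority (assumption) */
    int ready;      /* nonzero if the thread is in the ready state */
} sim_thread;

/* Return the index of the highest-priority ready thread, or -1 if no
 * thread is ready. Ties are resolved in favor of the earliest entry. */
int pick_next(const sim_thread *threads, size_t n)
{
    int best = -1;
    assert(threads != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!threads[i].ready)
            continue;
        if (best < 0 || threads[i].priority > threads[best].priority)
            best = (int)i;
    }
    return best;
}

/* A running thread is preempted as soon as a ready thread with strictly
 * higher priority exists, regardless of how much quantum remains. */
int should_preempt(const sim_thread *running, const sim_thread *candidate)
{
    assert(running != NULL && candidate != NULL);
    return candidate->ready && candidate->priority > running->priority;
}
```

Quantum expiry is deliberately left out: it would only affect the choice among threads of equal priority, which this sketch resolves by array order.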
Processor Preference for the Windows Scheduler
- Each thread maintains two CPU numbers, stored in the kernel thread block:
  - Ideal processor: the preferred processor the thread should run on (often specified by the programmer)
  - Last processor: the processor on which the thread last ran
- Scheduler processor assignment preference:
  1. If the ideal processor is idle, pick the ideal processor
  2. Pick the last processor if it is idle
  3. Pick the executing processor, where the current scheduling code is running
  4. Scan all the CPUs from the highest CPU number to the lowest
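The preference order above can be written as a short selection function. This is a sketch, not the kernel's code: `choose_processor` and its parameters are hypothetical, and it assumes that the executing processor in step 3 is chosen only if it is idle and that the final scan looks for any idle CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pick a processor for a thread following the slide's preference order.
 * idle[i] is true when CPU i is idle. Returns the chosen CPU number,
 * or -1 if no CPU is idle (what happens then is outside this sketch). */
int choose_processor(const bool *idle, int ncpu,
                     int ideal, int last, int current)
{
    assert(idle != NULL && ncpu > 0);
    assert(ideal >= 0 && ideal < ncpu);
    assert(last >= 0 && last < ncpu);
    assert(current >= 0 && current < ncpu);

    if (idle[ideal])   return ideal;     /* 1. ideal processor */
    if (idle[last])    return last;      /* 2. last processor */
    if (idle[current]) return current;   /* 3. processor running the scheduler (assumed: only if idle) */

    /* 4. scan from the highest CPU number down to the lowest */
    for (int cpu = ncpu - 1; cpu >= 0; cpu--)
        if (idle[cpu])
            return cpu;
    return -1;
}
```

Preferring the ideal and last processors keeps a thread close to caches it has already warmed, which is the usual motivation for this ordering.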
Additional Slides
Processes and Threads (1) Basic concepts used for CPU and resource management
Processes and Threads (2) Relationship between jobs, processes, threads, and fibers
Windows Threading Architecture
One-to-one Threading Model
- A process in Windows XP is inert; it executes nothing
- A process simply owns a 4 GB address space that contains the code and data for an application
- In addition, a process owns other resources, such as files, memory allocations, and threads
- Every process in Windows XP has a primary thread
- Threads in Windows XP are kernel-level threads
- Per-thread data structures:
  - Total user/kernel time, kernel stack, thread-scheduling info
  - Thread-local storage array, thread environment block (TEB)
  - List of objects the thread is waiting on, synchronization info, etc.
Fibers vs. Threads
- Fibers are often called "lightweight" threads
  - They allow an application to schedule its own "threads" of execution
- Fibers are invisible to the kernel
  - They are implemented in user mode in Kernel32.dll
- Fibers interface
  - ConvertThreadToFiber() converts a thread to a running fiber
  - A new fiber can be created using CreateFiber(); it starts running when another fiber switches to it with SwitchToFiber()
  - Once running, the fiber runs until it exits or until it calls SwitchToFiber() itself
- Fibers provide the functionality of the many-to-many model
Thread Cancellation
- Terminating a thread before it has finished
- Two general approaches:
  - Asynchronous cancellation terminates the target thread immediately
  - Deferred cancellation allows the target thread to periodically check whether it should be cancelled
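Deferred cancellation is commonly built from a shared flag that the target thread polls at safe points. The sketch below shows that pattern with C11 atomics. It is one possible approach rather than a Windows API: `cancel_token`, `cancel_init`, `cancel_request`, `cancel_pending`, and `run_worker` are hypothetical names, and the "work" is just a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag set by the cancelling thread and polled by the target thread. */
typedef struct {
    atomic_bool requested;
} cancel_token;

void cancel_init(cancel_token *t)
{
    assert(t != NULL);
    atomic_init(&t->requested, false);
}

/* Called by any thread that wants the target to stop. */
void cancel_request(cancel_token *t)
{
    assert(t != NULL);
    atomic_store(&t->requested, true);
}

/* Called by the target thread at points where stopping is safe. */
bool cancel_pending(cancel_token *t)
{
    assert(t != NULL);
    return atomic_load(&t->requested);
}

/* Worker body: performs up to max_steps units of work and checks for
 * cancellation before each one. Returns the number of steps completed. */
int run_worker(cancel_token *t, int max_steps)
{
    int done = 0;
    while (done < max_steps) {
        if (cancel_pending(t))
            break;          /* cancellation point: stop cleanly */
        done++;             /* one unit of work */
    }
    return done;
}
```

Because the worker stops only at its own check, it can release locks and other resources before exiting, which asynchronous cancellation cannot guarantee.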
References
1. Khang Nguyen and Shihjong Kuo, Detecting Multi-Core Processor Topology in an IA-32 Platform, Intel Software Network.
2. David A. Solomon and Mark E. Russinovich, Inside Microsoft Windows 2000.
3. Charles Petzold, Programming Windows 95, Microsoft Press.