Thread Implementation Issues Andrew Whitaker
Where Do Threads Come From?
A few choices: the operating system, a user-mode library, or some combination of the two… How expensive are system calls?
Option #1: Kernel Threads
Threads implemented inside the OS. Thread operations (creation, deletion, yield) are system calls, and scheduling is handled by the OS scheduler. Described as "one-to-one": one user thread mapped to one kernel thread. Every invocation of Thread.start() creates a kernel thread.
[Diagram: one process whose user threads each map to their own OS thread]
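Under the one-to-one model, every started Java thread corresponds to a kernel thread; a minimal sketch (on a one-to-one JVM, each start() below ends up in a system call such as clone() on Linux):

```java
public class KernelThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        // start() asks the kernel for a new schedulable entity;
        // the OS scheduler decides when and where it runs.
        Thread worker = new Thread(() -> {
            System.out.println("running on: " + Thread.currentThread().getName());
        }, "worker-1");
        worker.start();   // kernel thread created here
        worker.join();    // block until the kernel finishes running it
    }
}
```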
Option #2: User Threads
Implemented as a library inside a process. All operations (creation, destruction, yield) are normal procedure calls. Described as "many-to-one": many user-perceived threads map to a single OS process/thread.
[Diagram: one process whose user threads all map to a single OS thread]
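The many-to-one idea can be sketched as a toy cooperative scheduler: many tasks multiplexed onto whichever single thread calls run(). This is only an illustration of the shape; a real user-thread library switches stacks and program counters, whereas here a task "yields" by enqueuing its remaining work as another Runnable (the class and method names are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy "many-to-one" scheduler: spawning and switching tasks are
// ordinary procedure calls -- no system call involved.
class UserScheduler {
    private final Deque<Runnable> ready = new ArrayDeque<>();

    void spawn(Runnable task) { ready.addLast(task); }  // "thread create"

    void run() {                                        // the one OS thread
        while (!ready.isEmpty()) {
            ready.removeFirst().run();                  // dispatch next task
        }
    }
}

public class UserThreadDemo {
    public static void main(String[] args) {
        UserScheduler sched = new UserScheduler();
        StringBuilder trace = new StringBuilder();
        // Task A does one step, then "yields" by re-enqueuing its second step.
        sched.spawn(() -> { trace.append("A1 "); sched.spawn(() -> trace.append("A2 ")); });
        sched.spawn(() -> trace.append("B1 "));
        sched.run();
        System.out.println(trace);  // tasks interleave: A1 B1 A2
    }
}
```

Note the downside the next slide makes precise: if any task blocks in the kernel, the single underlying OS thread blocks, and every task stalls.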
Process Address Space Review
Every process has a user stack and a program counter. In addition, each process has a kernel stack and program counter (not shown here).
[Diagram: address space with stack (SP), heap (dynamically allocated memory), static data (data segment), and code (text segment, PC)]
Threaded Address Space
Every thread always has its own user stack and program counter; this holds for both user and kernel threads. For user threads, there is only a single kernel stack, program counter, PCB, etc.
[Diagram: one user address space with a per-thread stack (SP T1–T3) and program counter (PC T1–T3) for each of three threads, all sharing the heap, static data, and code segments]
User Threads vs. Kernel Threads
User threads are faster: operations do not pass through the OS. But user threads suffer from a lack of physical parallelism (they run on only a single processor) and poor performance with I/O (a single blocking operation stalls the entire application). For these reasons, most major OSes provide some form of kernel threads.
When Would User Threads Be Useful?
The 15-Puzzle solver? The web server? The Fibonacci GUI? This boils down to understanding why we were using threads for these applications in the first place. The 15-Puzzle solver uses threads to take advantage of physical parallelism, so pure user threads are not very useful there. Likewise, the web server relies on multiple processors and blocking I/O operations, neither of which user threads handle well. Only the Fibonacci GUI stands to benefit from user threads: essentially, we're using threads to make the system more responsive.
Option #3: Two-Level Model
The OS supports native multi-threading, and a user library maps multiple user threads to a single kernel thread: "many-to-many". Potentially captures the best of both worlds: cheap thread operations and parallelism.
[Diagram: one process whose many user threads are multiplexed over a smaller set of OS threads]
Problems with Many-to-Many Threads
Lack of coordination between the user and kernel schedulers: the left hand isn't talking to the right. Specific problems: poor performance (e.g., the OS preempts a thread holding a crucial lock) and deadlock (given K kernel threads, at most K user threads can block; other runnable threads are starved out!).
Scheduler Activations (UW, 1991)
Adds a layer of communication between the kernel and user schedulers. Examples: the kernel tells user mode that a task has blocked, so the user scheduler can re-use that execution context; the kernel tells user mode that a task is ready to resume. This allows the user scheduler to alter the user-thread/kernel-thread mapping. Supported by the newest release of NetBSD.
Implementation Spilling Over into the Interface
In practice, programmers have learned to live with expensive kernel threads. For example, thread pools re-use a static set of threads throughout the lifetime of the program.
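A thread pool pays the kernel-thread creation cost once and re-uses those threads for every task; a minimal sketch using java.util.concurrent:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Four kernel threads created up front, then re-used for every
        // submitted task -- no per-task thread creation system call.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10; i++) {
            final int task = i;
            pool.submit(() -> System.out.println("task " + task
                    + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();                            // stop accepting new work
        pool.awaitTermination(5, TimeUnit.SECONDS); // wait for tasks to drain
    }
}
```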
Locks
Used for implementing critical sections. Modern languages (Java, C#) implicitly acquire and release locks.

interface Lock {
    public void acquire();  // only one thread allowed between an
                            // acquire and a release
    public void release();
}
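Java's synchronized keyword is the implicit form of this interface: the monitor lock is acquired on entry and released on exit, even if an exception is thrown. A small sketch:

```java
public class Counter {
    private int value = 0;

    // Entering a synchronized method acquires this object's monitor
    // lock; leaving it (normally or via an exception) releases it.
    public synchronized void increment() { value++; }

    public synchronized int get() { return value; }
}
```

With the explicit interface above, the body of increment() would be bracketed by acquire() and release() calls instead.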
Two Varieties of Locks
Spin locks: threads busy-wait until the lock is freed; the thread stays in the ready/running state.
Blocking locks: threads yield the processor until the lock is freed; the thread transitions to the blocked state.
Why Use Spin Locks?
In general, busy waiting is bad. Why would we ever favor spin locks? Spin locks can be faster: no context switching is required. And sometimes blocking is not an option, for example in the kernel scheduler implementation. Spin locks are never used on a uniprocessor.
Bogus Spin Lock Implementation

class SpinLock implements Lock {
    private volatile boolean isLocked = false;

    public void acquire() {
        while (isLocked) { ; }  // busy wait
        isLocked = true;
    }

    public void release() {
        isLocked = false;
    }
}

Multiple threads can acquire this lock! Two threads can both observe isLocked == false, both exit the while loop, and both set the flag.
Hardware Support for Locking Problem: Lack of atomicity in testing and setting the isLocked flag Solution: Hardware-supported atomic instructions e.g., atomic test-and-set Java conveniently abstracts these primitives (AtomicInteger, and friends)
Corrected Spin Lock

class SpinLock implements Lock {
    private final AtomicBoolean isLocked = new AtomicBoolean(false);

    public void acquire() {
        // get the old value, set a new value
        while (isLocked.getAndSet(true)) { ; }
    }

    public void release() {
        assert (isLocked.get() == true);
        isLocked.set(false);
    }
}
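A quick exercise of the corrected lock (the SpinLock class is repeated here so the sketch is self-contained): because getAndSet is a single atomic hardware test-and-set, only one thread can observe false and win the race, so no increments are lost.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLockDemo {
    static class SpinLock {
        private final AtomicBoolean isLocked = new AtomicBoolean(false);
        void acquire() { while (isLocked.getAndSet(true)) { ; } }  // busy wait
        void release() { isLocked.set(false); }
    }

    static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        SpinLock lock = new SpinLock();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.acquire();
                counter++;          // critical section
                lock.release();
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(counter);  // 200000: no updates lost
    }
}
```

Running the same experiment with the bogus volatile-flag lock typically loses updates.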
Blocking Locks: Acquire Implementation
Atomically test-and-set the locked status. If the lock is already held: set the thread state to blocked, add the PCB (task_struct) to a wait queue, and invoke the scheduler. Problem: must ensure thread-safe access to the wait queue!
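Inside the kernel the wait queue holds PCBs; the same shape can be sketched at user level in Java with LockSupport, where park() stands in for "set state to blocked and invoke the scheduler" and unpark() for making the next task runnable. This sketch (a simplified version of the FIFO mutex pattern from the LockSupport documentation, with interrupt handling omitted) sidesteps the thread-safety problem by using an already thread-safe queue, which is exactly what the kernel needs interrupt-disabling for:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class BlockingLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();

    public void acquire() {
        Thread current = Thread.currentThread();
        waiters.add(current);                      // join the wait queue
        // Block until we are first in line AND win the test-and-set.
        while (waiters.peek() != current || !locked.compareAndSet(false, true)) {
            LockSupport.park(this);                // deschedule this thread
        }
        waiters.remove();                          // we hold the lock now
    }

    public void release() {
        locked.set(false);
        LockSupport.unpark(waiters.peek());        // wake the next waiter, if any
    }
}
```

The while loop is essential: park() may return spuriously, so a woken thread re-checks that it is at the head of the queue and re-runs the test-and-set before proceeding.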
Disabling Interrupts
Prevents the processor from being interrupted, serving as a coarse-grained lock. Must be used with extreme care: no I/O or timers can be processed.
Thread-Safe Blocking Locks
Atomically test-and-set the locked status. If the lock is already held: set the thread state to blocked, disable interrupts, add the PCB (task_struct) to a wait queue, and invoke the scheduler. The next task re-enables interrupts.
Disabling Interrupts on a Multiprocessor
Disabling interrupts can be done locally or globally (for all processors); global disabling is extremely heavyweight. Linux's spin_lock_irq disables interrupts on the local processor and grabs a spin lock to lock out other processors.