Threads
CSE 410, Spring 2008 Computer Systems
http://www.cs.washington.edu/410
© 2006-07 Perkins, DW Johnson and University of Washington
Reading and References
- Chapter 4, Operating System Concepts, Silberschatz, Galvin, and Gagne
  - see Section 3.6 for networking examples
- Other references
  - www.java.sun.com
  - Microsoft Windows Internals
  - Pthreads Programming, Nichols, Buttlar, and Farrell
A Process
A complete process includes numerous things:
- address space (all the code and data pages)
- OS resources and accounting information
- a "thread of control", which defines where the process is currently executing
  - the Program Counter
  - CPU registers
Thread vs. Process
Thread: a lightweight flow of control
- contains a PC and a stack pointer, and shares the process's address space
Process: a heavyweight unit of control
- PCB and other data structures
- its own address space
- resources, files, etc.
- a thread container
- a repository for everything that describes an active program
Processes are heavyweight objects
Creating a new process is costly:
- lots of data must be allocated and initialized
  - operating system control data structures
  - memory allocation for the process
Communicating between processes is costly:
- most communication goes through the OS
- a context switch is needed for each process
Parallelism
With multiple paths of execution, we can implement (or simulate) simultaneous actions.
Why build a parallel program?
- responsiveness to the user
  - the user interface always responds quickly
- server handling simultaneous requests (web, etc.)
  - each request is handled independently
- execute faster on a multiprocessor
  - two CPUs can run two programs at once
Parallel processes are expensive
There's a lot of performance cost:
- creating separate processes
- coordinating them through the OS
There's a lot of duplication:
- same program code, protection, etc.
Maybe there's a simpler way (at least some of the time)...
Process definition
What is fundamental in a process?
- code and data
- access and control privileges
- operating system management
  - scheduling, address space/memory map, ...
What else is there?
- Program Counter, registers, and stack
Separate the idea of a "process" from the idea of a "thread of control" (PC, SP, registers).
Threads are "Lightweight Processes"
Most operating systems now support two entities:
- the process, which defines the address space and general process attributes
- the thread, which defines one or more execution paths within a process
Threads are the unit of scheduling.
Processes are the "containers" in which threads execute.
Multi-threaded design benefits
Separating the execution path from the address space simplifies the design of parallel applications.
Some benefits of threaded designs:
- improved responsiveness to user actions
- handling concurrent events (e.g., web requests)
- simplified program structure (code, data)
- more efficient, and so less impact on the system
- map easily to multiprocessor systems
[Figure: one thread vs. three threads in a process. Each thread has its own stack, stack pointer ($sp), and PC; all threads share the heap and code.]
Cookbook Analogy
Think of a busy kitchen: 3 cooks and 1 cookbook.
- Each cook maintains a pointer to where they are in the cookbook (the Program Counter).
- Two cooks could both be making the same thing (threads running the same procedure).
- The cooks must coordinate access to the kitchen appliances (resource access control).
Implementation
A thread is bound to the process that provides its address space.
Each process has one or more threads.
How are threads actually implemented?
- Kernel threads: in the kernel (OS) and user-mode libraries combined
- User threads: in user-mode libraries alone
Kernel Threads
The operating system knows about and manages the threads in every program.
Thread operations (create, yield, ...) all require kernel involvement.
The major benefit is that threads in a process are scheduled independently:
- one blocked thread does not block the others
- threads in a process can run on different CPUs
Kernel Thread Performance
Kernel threads have performance issues. Even though threads avoid process overhead, operations on kernel threads are still slow:
- a thread operation requires a kernel call
- kernel threads may be overly general, in order to support the needs of different users, languages, etc.
- the kernel can't trust the user, so there must be lots of checking on kernel calls
User Threads
To make thread operations faster, they can be implemented at the user level.
Each thread is managed by the run-time system:
- user-mode libraries are linked with your program
Each thread is represented simply by a PC, registers, a stack, and a control block, all managed in the user's address space.
User Thread Performance
All activities happen in the user address space, so thread operations can be faster.
But OS scheduling takes place at the process level:
- the entire process blocks if a single thread blocks on I/O
- the OS may run a process that is just running an idle thread
Win2K provides "fibers" as user-mode threads:
- the application can schedule its own "lightweight threads" in user-mode code
Multithreaded Operating Systems
Modern OSes are multithreaded:
- a thread for interrupts, memory management, etc.
- even a thread to do nothing (the idle thread)
- if managed code: a thread for garbage collection
- a thread monitoring user input
All use some mapping of user threads to kernel threads:
- many-to-one (*nix model)
- one-to-one (XP model)
- many-to-many
- two-level model: many-to-many, but allow one-to-one processor-affinity binding (old Solaris, HP-UX)
Many-to-One
Many user threads map to one kernel thread.
- Thread management is in user code (+)
- The whole process blocks if any one thread makes a blocking system call (-)
- Only one thread can access the kernel at a time, so there is no multiprocessor support (-)
What is an obvious extension here? Hint: think thread pools.
One-to-One
Every user thread maps to its own kernel thread.
- Greater concurrency (+)
- Only the thread that blocks is blocked (+)
- Creating a user thread means creating a kernel thread (-)
- The per-thread creation cost and overhead imply an upper bound on the number of user threads (-)
Many-to-Many
Many user threads are multiplexed onto an equal or smaller number of kernel threads.
- The kernel thread count can be tuned per architecture, or per application (+)
- Essentially, a kernel-managed thread pool (see the sketch after this list)
- Removes the 1:1 model's limit on the number of threads the user perceives they can create
- True multiprocessor (MP) and multithreading (MT) support
- Blocked-thread reassignment
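The slide above describes kernel-level multiplexing, but the same "many tasks onto a few workers" idea shows up at the application level. The following is a minimal, hypothetical Java sketch (not from the slides) using ExecutorService; the class name PoolDemo and the pool size of 4 are invented for illustration.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolDemo {
    public static void main(String[] args) {
        // A fixed pool of 4 worker threads; many tasks are multiplexed onto them.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            final int id = i;
            pool.execute(new Runnable() {
                public void run() {
                    System.out.println("task " + id + " ran on " + Thread.currentThread().getName());
                }
            });
        }
        pool.shutdown();   // stop accepting new tasks; queued tasks still finish
    }
}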
Implementation of Thread Libraries
- Initialize attributes and thread information
  - stack size, priority, etc.
- Create and run the thread
- The parent thread must be able to wait for its children
  - join or wait()
- Child threads must signal their exit to the parent
  - return from the thread function, or call pthread_exit()
POSIX Threads
#include <pthread.h>   /* free third-party implementations exist on Win32 */

void *runnable(void *param);   /* thread start routine */

int main(int argc, char *argv[]) {
    pthread_t tid;
    pthread_attr_t attr;

    pthread_attr_init(&attr);                          /* default attributes */
    pthread_create(&tid, &attr, runnable, argv[1]);    /* create and run the thread */
    pthread_join(tid, NULL);                           /* wait for the thread to exit */
    return 0;
}
Pthreads Continued
void *runnable(void *param) {
    /* do stuff */
    pthread_exit(0);   /* in Pthreads, we signal the exit explicitly */
}

Notes:
- *nix and Mac OS X use this standard.
- pthread_create is really create-and-run, unlike Java, where new neither creates nor runs the underlying thread.
Win32 Threads
#include <windows.h>   /* to access the API */

DWORD sum;   /* global shared data goes here */

DWORD WINAPI runFctn(LPVOID param);   /* thread function, defined on the next slide */

int main(void) {
    DWORD param = 0;
    DWORD threadId;

    HANDLE threadHandle = CreateThread(NULL, 0, runFctn, &param, 0, &threadId);
    WaitForSingleObject(threadHandle, INFINITE);   /* wait for the thread to exit */
    CloseHandle(threadHandle);
    return 0;
}
Win32 Threads Continued
DWORD WINAPI runFctn(LPVOID param) {
    DWORD uLInt = *(DWORD *)param;   /* read the argument passed from main */
    /* do something here */
    return 0;
}

Notes:
- There is no official exit function (unlike Pthreads); the thread just returns.
- Notice that the creation flags allow for a suspended start.
Java Threads
Java provides rather generic threading support for applications.
Two approaches:
- quick and functional (implement Runnable)
- full inheritance (extend Thread)
See the example below.
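The example the slide refers to is not reproduced in these notes; the following is a minimal sketch of the two approaches named above (the class names Task, Worker, and JavaThreadsDemo are invented here for illustration).

// Approach 1: "quick and functional" -- implement Runnable and hand it to a Thread.
class Task implements Runnable {
    public void run() {
        System.out.println("Runnable task on " + Thread.currentThread().getName());
    }
}

// Approach 2: "full inheritance" -- subclass Thread and override run().
class Worker extends Thread {
    public void run() {
        System.out.println("Thread subclass on " + getName());
    }
}

public class JavaThreadsDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(new Task());
        Thread t2 = new Worker();
        t1.start();    // unlike pthread_create, new does not run the thread; start() does
        t2.start();
        t1.join();     // wait for both threads, like pthread_join
        t2.join();
    }
}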
Multithreading Performance
True or false?
- Multithreading always increases performance.
  - (It is easy to make a multithreaded app perform worse than a well-engineered single-threaded one.)
- Just like pipelining, an architecture with support for 4 threads is 4x faster.
Now that we have multiprocessors that can leverage multithreading, we introduce new dilemmas, such as:
- deadlock
- memory coherence and corruption
- concurrency control
- synchronization holes in your locks
- starvation
(A small sketch of one such hazard, a data race, follows this list.)
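As a concrete illustration of the "memory coherence and corruption" hazard above, here is a small hypothetical Java sketch of a data race (not from the slides; the class name RaceDemo and the loop counts are invented): two threads increment a shared counter with no locking, so updates can be lost.

public class RaceDemo {
    static int counter = 0;   // shared, unsynchronized data

    public static void main(String[] args) throws InterruptedException {
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < 1000000; i++) {
                    counter++;   // read-modify-write is not atomic, so increments can be lost
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        // Without synchronization, the result is usually less than 2000000.
        System.out.println("counter = " + counter);
    }
}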
XNA Threading
Specific architectures: {Xbox 360, Zune, PC}
- 360: 3 cores, 2 hardware threads per core
- See SetProcessorAffinity()