Advanced Programming, Lecture 7
Rabie A. Ramadan
Multithreading: An Overview
Some of the slides are excerpted from a presentation by Jonathan Amsterdam.
Processing Elements Architecture
Processing Elements
Flynn's simple classification, by number of instruction and data streams:
- SISD: conventional computers
- SIMD: data parallel, vector computing
- MISD: systolic arrays
- MIMD: very general, multiple approaches
Our current focus is the MIMD model, using general-purpose processors with no shared memory.
SISD: A Conventional Computer
Speed is limited by the rate at which the computer can transfer information internally.
(Diagram: instructions and input data flow into a single processor, which produces the output data.)
Examples: PCs, Macintoshes, workstations.
The MISD Architecture
More of an intellectual exercise than a practical configuration: a few have been built, but none are commercially available.
SIMD Architecture
One instruction stream drives several processors, each operating on its own data stream (diagram: processors A, B, and C each read input stream A/B/C and write output stream A/B/C).
Example: CRAY-style vector processing, C[i] = A[i] * B[i].
MIMD Architecture
Unlike SISD and MISD machines, a MIMD computer works asynchronously: each processor has its own instruction stream and data stream (diagram: processors A, B, and C, each with an independent instruction stream, input stream, and output stream).
Two flavors: shared-memory (tightly coupled) MIMD and distributed-memory (loosely coupled) MIMD.
Shared-Memory MIMD
(Diagram: processors A, B, and C each connect through a memory bus to one global memory system.)
- Communication: the source PE writes data to global memory and the destination PE retrieves it.
- Easy to build; the conventional operating systems of SISD machines can be ported easily.
- Limitations: reliability and expandability. A memory component or processor failure affects the whole system, and adding processors leads to memory contention.
- Example: Silicon Graphics supercomputers.
Distributed-Memory MIMD
(Diagram: processors A, B, and C each have their own memory system, connected by a network.)
- Communication is based on a high-speed network, which can be configured as a tree, mesh, cube, etc.
- Unlike shared-memory MIMD, it is easily and readily expandable.
- Highly reliable: a CPU failure does not affect the whole system.
Serial vs. Parallel
(Diagram: a queue of customers served by a single counter, versus the same queue split across two counters.)
Single and Multithreaded Processes
(Diagram: a single-threaded process has one instruction stream; a multithreaded process has multiple instruction streams, with its threads of execution sharing a common address space.)
OS: Multi-Processing, Multi-Threaded
Threaded libraries and multithreaded I/O give better response times in multiple-application environments and higher throughput for parallelizable applications (diagram: applications scheduled across multiple CPUs).
Multi-threading, continued
A multithreaded OS enables parallel, scalable I/O: multiple independent I/O requests can be satisfied simultaneously, because all the major disk, tape, and network drivers are multithreaded, allowing any given driver to run on multiple CPUs at once.
Applications Could Have One or More Processes
A process is a program in execution. It consists of three components:
- An executable program
- The data needed by the program
- The execution context of the program: all the information the operating system needs to manage the process
What Are Threads?
A thread is a piece of code that can execute concurrently with other threads; it is the entity that is scheduled on a processor.
A thread object carries its own hardware context (registers, status word, program counter) and local state, and shares global state with the other threads.
What Is a Thread?
- A single sequential flow of control; a unit of concurrent execution.
- Multiple threads can exist within the same process and share its memory resources; processes, by contrast, each have their own address space.
- Every program has at least one thread, called the main thread.
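A minimal pthreads sketch of the idea above: the main thread spawns a second flow of control and waits for it. The names `worker` and `run_one_thread` are illustrative, not from the slides.

```c
#include <pthread.h>

/* Illustrative worker: runs concurrently with the main thread. */
static void *worker(void *arg) {
    int *n = (int *)arg;
    *n = *n * 2;              /* do some work on the shared argument */
    return NULL;
}

int run_one_thread(void) {
    int value = 21;
    pthread_t tid;
    pthread_create(&tid, NULL, worker, &value);  /* spawn a second thread */
    pthread_join(tid, NULL);                     /* main thread waits for it */
    return value;                                /* 42 once the worker has run */
}
```

Even this tiny program has two threads: the main thread and the one created explicitly.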
Thread Resources
Each thread has its own:
- Program counter (point of execution)
- Control stack (procedure call/return)
- Data stack (local variables)
All threads share:
- The heap (objects): memory allocated dynamically for the process
- The program code
- Class and instance variables
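The split above can be seen in a small sketch (hypothetical names): each thread's `local` variable lives on that thread's private stack, while the global array is one piece of shared memory visible to all threads.

```c
#include <pthread.h>

static int results[2];          /* shared: visible to every thread */

static void *work(void *arg) {
    long id = (long)arg;
    int local = (int)id + 10;   /* `local` is on THIS thread's private stack */
    results[id] = local * 2;    /* publish the result through shared memory */
    return NULL;
}

int stack_demo(void) {
    pthread_t t[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, work, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return results[0] + results[1];   /* 20 + 22 = 42 */
}
```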
Threaded Process Model
(Diagram: each thread has its own stack; the data and text segments are shared memory within the process.)
- Threads are independently executable.
- All threads are part of one process, so communication between them is easier and simpler.
The Multi-Threading Concept
On a uniprocessor, a threading library creates threads T0, T1, and T2 within task A and assigns processor time to each thread in turn.
Multi-Threading on Multi-Processors
With four processors, the threads T0, T1, and T2 of task A can run on different processors at the same time.
Why Multiple Threads?
Speeding up computations:
- Two threads each solve half of the problem, then combine their results.
Improving responsiveness:
- One thread computes while another handles the user interface.
- One thread loads an image from the network while another computes.
Why Multiple Threads? (continued)
Performing housekeeping tasks:
- One thread does garbage collection while another computes.
- One thread rebalances the search tree while another uses the tree.
Performing multiuser tasks:
- Several threads run animations simultaneously, for example.
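The "two threads each solve half, then combine" pattern can be sketched in pthreads. This is an illustrative example, not from the slides; `sum_range` and `parallel_sum` are made-up names.

```c
#include <pthread.h>

#define N 1000
static int data[N];

struct range { int lo, hi; long sum; };

static void *sum_range(void *arg) {
    struct range *r = (struct range *)arg;
    r->sum = 0;
    for (int i = r->lo; i < r->hi; i++)   /* sum this thread's half */
        r->sum += data[i];
    return NULL;
}

long parallel_sum(void) {
    for (int i = 0; i < N; i++) data[i] = i + 1;   /* fill with 1..1000 */
    struct range lo = {0, N / 2, 0}, hi = {N / 2, N, 0};
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_range, &lo);  /* each thread takes half */
    pthread_create(&t2, NULL, sum_range, &hi);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return lo.sum + hi.sum;                     /* combine partial results */
}
```

Because the two halves touch disjoint data, no lock is needed until the results are combined after both joins.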
Simple Example
main: run thread2, then loop forever printing 1.
thread2: loop forever printing 2.
One possible interleaving of the output: 1 1 2 1 2 2 1 2 ...
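A testable sketch of the two-printer example: instead of printing forever, each thread records its digit a few times in a shared buffer. The order in which 1s and 2s appear is up to the scheduler; only the counts are guaranteed. Names here are illustrative.

```c
#include <pthread.h>

static char log_buf[16];
static int  log_len = 0;
static pthread_mutex_t log_mu = PTHREAD_MUTEX_INITIALIZER;

/* Each thread appends its digit four times; interleaving is arbitrary. */
static void *printer(void *arg) {
    char digit = *(char *)arg;
    for (int i = 0; i < 4; i++) {
        pthread_mutex_lock(&log_mu);     /* protect the shared buffer */
        log_buf[log_len++] = digit;
        pthread_mutex_unlock(&log_mu);
    }
    return NULL;
}

int count_digit(char d) {
    int n = 0;
    for (int i = 0; i < log_len; i++)
        if (log_buf[i] == d) n++;
    return n;
}

int run_printers(void) {
    char one = '1', two = '2';
    pthread_t t1, t2;
    pthread_create(&t1, NULL, printer, &one);
    pthread_create(&t2, NULL, printer, &two);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return log_len;   /* 8 entries total, in some interleaved order */
}
```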
Scheduling
The scheduler is the part of the operating system that determines which thread runs next. There are two types:
- Pre-emptive: the scheduler can interrupt the running thread.
- Cooperative: a thread must voluntarily yield.
Most modern operating systems are pre-emptive.
Thread Life Cycle
- New: the thread is not yet considered alive.
- Runnable (ready-to-run): start() has been invoked, but the thread is not actually running; the scheduler is aware of it and may schedule it later.
- Running: the thread is currently executing.
- Blocked: the thread is waiting for resources held by another thread.
- Dead: once a thread reaches this state, it can never run again.
Software Models for Multithreaded Programming
- Boss/worker model
- Work crew model
- Pipelining model
- Combinations of models
Boss/Worker Model
One thread functions as the boss and assigns tasks to worker threads. Each worker performs its task until finished, at which point it notifies the boss that it is ready to receive another. Alternatively, the boss polls the workers periodically to see whether each one is ready for another task.
A variation of the boss/worker model is the work queue model: the boss places tasks in a queue, and workers check the queue and take tasks to perform.
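A minimal sketch of the work queue variant (illustrative names, not from the slides): the boss fills a queue of task numbers up front, and a crew of workers pulls tasks until the queue is drained.

```c
#include <pthread.h>

#define NTASKS 20
static int queue[NTASKS], head = 0, tail = 0;
static long results = 0;
static pthread_mutex_t qmu = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qmu);
        if (head == tail) {               /* queue drained: worker exits */
            pthread_mutex_unlock(&qmu);
            return NULL;
        }
        int task = queue[head++];         /* take the next task */
        pthread_mutex_unlock(&qmu);

        long r = (long)task * task;       /* "perform" the task: square it */

        pthread_mutex_lock(&qmu);
        results += r;                     /* record the result */
        pthread_mutex_unlock(&qmu);
    }
}

long run_queue(void) {
    for (int i = 1; i <= NTASKS; i++)     /* boss enqueues tasks 1..20 */
        queue[tail++] = i;
    pthread_t w[3];
    for (int i = 0; i < 3; i++)           /* three workers share the queue */
        pthread_create(&w[i], NULL, worker, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(w[i], NULL);
    return results;                       /* sum of squares 1..20 = 2870 */
}
```

Note that the lock is held only while touching the queue and the total, not while performing the task, so the workers genuinely run in parallel.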
Work Crew Model
Multiple threads work together on a single task. The task is divided horizontally into pieces that are performed in parallel; each thread performs one piece.
Example: a group of people cleaning a building. Each person cleans certain rooms or performs certain kinds of work (washing floors, polishing furniture, and so forth), and each works independently.
Pipelining Model
A task is divided vertically into steps that must be performed in sequence to produce a single instance of the desired result. The work done in each step (except the first and last) is based on the previous step and is a prerequisite for the work in the next step.
Combinations of Models
If your task is complex, you may find it appropriate to combine these models in a single program.
Bad News
Multithreaded programs are hard to write, hard to understand, and incredibly hard to debug. Anyone who thinks that concurrent programming is easy should have his or her threads examined.
Thread Assumptions
- Threads may be executed in any order, not necessarily alternating line by line.
- Bugs may show up rarely and may be hard to reproduce.
- More than one thread may try to change memory at the same time, so the usual assumptions about execution no longer apply. For example: what is the value of i after i = 1?
Memory Conflicts
When two threads access the same memory location, they can conflict with each other, and the resulting state may be wrong. For example, two threads may try to increment the same counter.
Terminology
- Critical section: a section of code that reads or writes shared data.
- Race condition: the potential for interleaved execution of a critical section by multiple threads; the results are non-deterministic.
- Mutual exclusion: a synchronization mechanism that avoids race conditions by ensuring exclusive execution of critical sections.
- Deadlock: permanent blocking of threads.
- Starvation: one or more threads are denied resources; without those resources, the program can never finish its task.
Four Requirements for Deadlock
- Mutual exclusion: only one thread at a time can use a resource.
- Hold and wait: a thread holding at least one resource is waiting to acquire additional resources held by other threads.
- No preemption: resources are released only voluntarily by the thread holding them, after the thread is finished with them.
- Circular wait: there exists a set {T1, ..., Tn} of waiting threads such that T1 waits for a resource held by T2, T2 waits for a resource held by T3, ..., and Tn waits for a resource held by T1.
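Breaking any one of the four conditions prevents deadlock. A common fix is to break circular wait by imposing a global lock order: in this illustrative sketch, both threads need both locks but always acquire A before B, so no cycle of waiters can form and both threads finish.

```c
#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int shared = 0;

/* Both threads take both locks in the SAME global order (A then B),
   which breaks the circular-wait condition. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);
        shared++;                     /* critical section under both locks */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
    }
    return NULL;
}

int run_ordered(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared;   /* 2000: both threads completed, no deadlock */
}
```

If one thread instead took B before A, the two could each hold one lock while waiting for the other's, satisfying all four conditions at once.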
Memory Synchronization
Thread Synchronization Methods
- Mutex locks
- Condition variables
- Semaphores
Mutex Locks
If a data item is shared by a number of threads, race conditions can occur unless the shared item is protected properly. The easiest protection mechanism is a lock:
- Before a thread accesses the shared data, it acquires the lock. Once the lock is successfully acquired, the thread becomes the owner of the lock and the lock is locked.
- The owner can then access the protected items.
- Afterwards, the owner must release the lock; the lock becomes unlocked, and another thread can acquire it.
Mutex Locks
The use of a lock simply establishes a critical section. Before entering a critical section, a thread acquires the lock; if it succeeds, the thread enters the critical section and the lock is locked. As a result, all subsequent acquire requests are queued until the lock is unlocked.
Mutex Lock Restrictions
- Only the owner can release the lock. Suppose thread A is the current owner of lock L and thread B also wants the lock. If a non-owner could unlock a lock, thread B could unlock the lock that thread A owns; then either both threads would be executing in the same critical section, or thread B would preempt thread A and execute the instructions of the critical section itself.
- Recursive lock acquisition is not allowed: the current owner of the lock may not acquire the same lock again.
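The acquire/use/release protocol from the previous slides, as a concrete pthreads sketch (illustrative names): two threads each increment a shared counter 100,000 times. Because `counter++` is a read-modify-write, the lock is what guarantees no updates are lost.

```c
#include <pthread.h>

static long counter = 0;
static pthread_mutex_t counter_mu = PTHREAD_MUTEX_INITIALIZER;

/* Without the lock, two threads could interleave the read-modify-write
   of counter++ and lose updates. */
static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&counter_mu);    /* acquire: become owner */
        counter++;                          /* critical section */
        pthread_mutex_unlock(&counter_mu);  /* release: let others in */
    }
    return NULL;
}

long run_counter(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;   /* exactly 200000 with the lock in place */
}
```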
Mutex Example: The Dining Philosophers Problem
Imagine five philosophers who spend their lives just thinking and eating. In the middle of the dining room is a circular table with five chairs. The table has a big plate of spaghetti, but only five chopsticks are available. Each philosopher thinks; when he gets hungry, he sits down and picks up the two chopsticks closest to him. If a philosopher can pick up both chopsticks, he eats for a while. After he finishes eating, he puts down the chopsticks and starts to think again.
The Dining Philosophers Problem: Analysis
(Diagram: the philosopher's think/eat cycle and control flow.)
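One well-known deadlock-free sketch, modeling each chopstick as a mutex (illustrative code, not from the slides): every philosopher picks up the lower-numbered chopstick first. This imposes a global lock order, so a cycle of waiters cannot form.

```c
#include <pthread.h>

#define NPHIL 5
static pthread_mutex_t chopstick[NPHIL];   /* one mutex per chopstick */
static int meals[NPHIL];

/* Resource-ordering fix: always lock the lower-numbered chopstick first. */
static void *philosopher(void *arg) {
    int id = (int)(long)arg;
    int a = id, b = (id + 1) % NPHIL;      /* the two adjacent chopsticks */
    int first  = a < b ? a : b;
    int second = a < b ? b : a;
    for (int i = 0; i < 100; i++) {
        pthread_mutex_lock(&chopstick[first]);
        pthread_mutex_lock(&chopstick[second]);
        meals[id]++;                       /* eat */
        pthread_mutex_unlock(&chopstick[second]);
        pthread_mutex_unlock(&chopstick[first]);
    }
    return NULL;
}

int total_meals(void) {
    for (int i = 0; i < NPHIL; i++)
        pthread_mutex_init(&chopstick[i], NULL);
    pthread_t t[NPHIL];
    for (long i = 0; i < NPHIL; i++)
        pthread_create(&t[i], NULL, philosopher, (void *)i);
    for (int i = 0; i < NPHIL; i++)
        pthread_join(t[i], NULL);
    int sum = 0;
    for (int i = 0; i < NPHIL; i++) sum += meals[i];
    return sum;   /* 500 if every philosopher finished all meals */
}
```

If every philosopher instead grabbed his left chopstick first, all five could hold one chopstick and wait forever for the other: the circular-wait condition from the deadlock slide.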
C++ Language Support for Synchronization
Languages that support exceptions, like C++, are problematic: it is easy to make a non-local exit without releasing the lock. Consider:

void Rtn() {
  lock.acquire();
  …
  DoFoo();
  …
  lock.release();
}

void DoFoo() {
  …
  if (exception) throw errException;
  …
}

Notice that an exception in DoFoo() will exit Rtn() without releasing the lock.
C++ Language Support for Synchronization (cont'd)
We must catch all exceptions in the critical section: catch the exception, release the lock, and re-throw:

void Rtn() {
  lock.acquire();
  try {
    …
    DoFoo();
    …
  } catch (…) {      // catch any exception
    lock.release();  // release the lock
    throw;           // re-throw the exception
  }
  lock.release();
}

void DoFoo() {
  …
  if (exception) throw errException;
  …
}

Even better: the auto_ptr facility (see the C++ specification) can release the lock regardless of how the routine exits.
Java Language Support for Synchronization
Java has explicit support for threads and thread synchronization. A bank account example:

class Account {
  private int balance;

  // object constructor
  public Account(int initialBalance) {
    balance = initialBalance;
  }

  public synchronized int getBalance() {
    return balance;
  }

  public synchronized void deposit(int amount) {
    balance += amount;
  }
}

Every object has an associated lock, which is automatically acquired and released on entry to and exit from a synchronized method.
Condition Variables
Condition Variables (CV)
A condition variable allows a thread to block its own execution until some shared data reaches a particular state. It is a synchronization object used in conjunction with a mutex: the mutex controls access to the shared data, while the condition variable allows threads to wait for that data to enter a defined state. The mutex is combined with the CV to avoid race conditions.
Condition Variable Routines
Waiting and signaling on condition variables:
- pthread_cond_wait(condition, mutex): blocks the thread until the condition is signaled. It should be called with the mutex locked; it automatically releases the mutex while it waits, and when it returns (the condition has been signaled), the mutex is locked again.
- pthread_cond_signal(condition): wakes up a thread waiting on the condition variable. It is called after the mutex is locked, and the caller must unlock the mutex afterwards.
- pthread_cond_broadcast(condition): used when multiple threads are blocked on the condition.
Condition Variables for Signaling
Think of the producer-consumer problem: producers and consumers run in separate threads; the producer produces data and the consumer consumes it. The producer has to inform the consumer when data is available, and the consumer has to inform the producer when buffer space is available.
Without Condition Variables
/* Globals */
int data_avail = 0;
pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;

void *producer(void *arg) {
  pthread_mutex_lock(&data_mutex);
  /* produce data and insert it into the queue */
  data_avail = 1;
  pthread_mutex_unlock(&data_mutex);
}
void *consumer(void *arg) {
  while (!data_avail)
    ;  /* busy-wait: do nothing, keep looping! */
  pthread_mutex_lock(&data_mutex);
  /* extract data from the queue;
     if the queue is now empty, set data_avail = 0 */
  pthread_mutex_unlock(&data_mutex);
  consume_data();
}
With Condition Variables
int data_avail = 0;
pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t data_cond = PTHREAD_COND_INITIALIZER;

void *producer(void *arg) {
  pthread_mutex_lock(&data_mutex);
  /* produce data and insert it into the queue */
  data_avail = 1;
  pthread_cond_signal(&data_cond);
  pthread_mutex_unlock(&data_mutex);
}
void *consumer(void *arg) {
  pthread_mutex_lock(&data_mutex);
  while (!data_avail) {
    /* sleep on the condition variable */
    pthread_cond_wait(&data_cond, &data_mutex);
  }
  /* woken up: extract data from the queue;
     if the queue is now empty, set data_avail = 0 */
  pthread_mutex_unlock(&data_mutex);
  consume_data();
}
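The slide's sketch leaves the produce/consume steps as placeholders. Below is one complete, runnable version of the same pattern under stated assumptions: a one-slot "mailbox" instead of a queue, one producer handing ten integers to one consumer, and a `done` flag so the consumer knows when to stop. All names are illustrative.

```c
#include <pthread.h>

static int slot, slot_full = 0, done = 0;
static long consumed_sum = 0;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= 10; i++) {
        pthread_mutex_lock(&mu);
        while (slot_full)                  /* wait until the slot is empty */
            pthread_cond_wait(&cv, &mu);
        slot = i; slot_full = 1;           /* "insert data into the queue" */
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&mu);
    }
    pthread_mutex_lock(&mu);
    done = 1;                              /* no more items will arrive */
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mu);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&mu);
        while (!slot_full && !done)
            pthread_cond_wait(&cv, &mu);   /* releases mu while sleeping */
        if (slot_full) {
            consumed_sum += slot;          /* "extract data from the queue" */
            slot_full = 0;
            pthread_cond_signal(&cv);      /* tell producer the slot is free */
            pthread_mutex_unlock(&mu);
        } else {                           /* done, and nothing left */
            pthread_mutex_unlock(&mu);
            break;
        }
    }
    return NULL;
}

long run_mailbox(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return consumed_sum;   /* 1 + 2 + ... + 10 = 55 */
}
```

Note the `while` (not `if`) around each wait: the condition is re-checked after every wakeup, exactly as the routine descriptions above require.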
So Far…
- Condition variable: a queue of threads waiting for something inside a critical section. The key idea is to allow sleeping inside the critical section by atomically releasing the lock at the moment we go to sleep.
- Lock: provides mutual exclusion for shared data. Always acquire it before accessing the shared data structure, and always release it after finishing with the shared data.
Semaphore
An extension of mutex locks: a semaphore is an object with two methods, Wait and Signal, a private integer counter, and a private queue of threads.
Example
Assume that our corporate print room has five printers online. The print spool manager allocates a semaphore set with five semaphores in it, one for each printer on the system. Since each printer is physically capable of printing only one job at a time, each of the five semaphores is initialized to a value of one, meaning that the printer is online and accepting requests.
John sends a print request to the spooler. The print manager looks at the semaphore set and finds the first semaphore whose value is one. Before sending John's request to the physical device, the print manager decrements the semaphore for the corresponding printer by one; that semaphore's value is now zero.
Example (continued)
A value of zero represents 100% utilization of that semaphore: no other request can be sent to that printer until its value is no longer zero. When John's print job completes, the print manager increments the value of the semaphore corresponding to the printer. Its value is back up to one, which means the printer is available again.
Semaphore
A synchronized counting variable. Formally, a semaphore comprises an integer value and two operations:
- P(), also known as wait() (e.g., the consumer): while the value is 0, sleep; then decrement the value.
- V(), also known as signal() (e.g., the producer): increment the value; if any threads are sleeping, waiting for the value to become non-zero, wake up at least one of them.
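A sketch of the printer example using POSIX counting semaphores (assumes a POSIX system such as Linux where `sem_init` is available; names and the job count are illustrative). `sem_wait` plays the role of P() and `sem_post` the role of V(); a small mutex-protected gauge records how many "printers" were ever busy at once.

```c
#include <semaphore.h>
#include <pthread.h>

static sem_t printers;                     /* counts available printers */
static int in_use = 0, max_in_use = 0;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;

static void *print_job(void *arg) {
    (void)arg;
    sem_wait(&printers);                   /* P(): block if all printers busy */
    pthread_mutex_lock(&mu);
    in_use++;
    if (in_use > max_in_use) max_in_use = in_use;
    pthread_mutex_unlock(&mu);
    /* ... the job would be sent to the physical device here ... */
    pthread_mutex_lock(&mu);
    in_use--;
    pthread_mutex_unlock(&mu);
    sem_post(&printers);                   /* V(): printer available again */
    return NULL;
}

int run_jobs(void) {
    sem_init(&printers, 0, 2);             /* two printers online */
    pthread_t t[6];
    for (int i = 0; i < 6; i++)            /* six jobs compete for them */
        pthread_create(&t[i], NULL, print_job, NULL);
    for (int i = 0; i < 6; i++)
        pthread_join(t[i], NULL);
    return max_in_use;                     /* never exceeds the count of 2 */
}
```

With the counter initialized to 1 instead of 2, this degenerates to a mutex, which is why a semaphore is described as an extension of mutex locks.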
The assignment is posted.