
1 Multiprocessors and Multi-computers
Multi-computers
–Distributed address space accessible by local processors
–Requires message passing
–Programming tends to be more difficult
Multiprocessors
–Single address space accessible by all processors
–Simultaneous access to shared variables can produce inconsistent results
–Generally programming is more convenient
–Doesn't scale to more than about sixteen processors

2 Shared Memory Hardware
[Diagrams: processors connected to memory modules through a single shared bus (bus configuration) and through a crossbar switch (crossbar switch configuration)]

3 Shared Memory Access
Critical Section
–A section of code that needs to be protected from simultaneous access
Mutual Exclusion
–The mechanism used to enforce a critical section
–Locks
–Semaphores
–Monitors
–Condition Variables
[Diagram: Process 1 writes x = 1 while Process 2 writes x = 2 to the same shared variable, so the final value is unpredictable]
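
A minimal sketch (not from the slide) of why unsynchronized access is unsafe: two Pthreads increment a shared counter without mutual exclusion, and updates are lost. The counter name and iteration count are illustrative.

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000

    long counter = 0;              /* shared variable */

    void *work(void *arg) {
        for (long i = 0; i < N; i++)
            counter++;             /* read-modify-write is not atomic, so increments interleave */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, work, NULL);
        pthread_create(&t2, NULL, work, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Expected 2*N, but typically prints less because updates are lost */
        printf("counter = %ld (expected %d)\n", counter, 2 * N);
        return 0;
    }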

4 Locks
Locks are the simplest mutual exclusion mechanism
Single-bit variable
–1 = locked, 0 = unlocked
–"Enter the door and lock it on entry"
Spin locks (busy-wait locks)
    while (lock == 1) ;   /* spin */
    lock = 1;
    // Critical section
    lock = 0;
–Note: the while test and the setting of lock must be a single atomic operation
Advantages
–Simple and easy to understand
Disadvantages
–Poor use of the CPU if the process busy-waits instead of blocking
–It is easy to forget the lock = 0 (release) statement
Examples: Pthreads, OpenMP
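
A spin-lock sketch that makes the test-and-set atomic, as the note above requires. This assumes a GCC-compatible compiler, since it uses the __sync_lock_test_and_set / __sync_lock_release builtins; it is an illustration, not the slide's code.

    #include <pthread.h>
    #include <stdio.h>

    static volatile int lock = 0;        /* 1 = locked, 0 = unlocked */
    static long counter = 0;

    static void spin_lock(volatile int *l) {
        /* atomically write 1 and return the previous value; loop until it was 0 */
        while (__sync_lock_test_and_set(l, 1))
            ;                            /* busy wait */
    }

    static void spin_unlock(volatile int *l) {
        __sync_lock_release(l);          /* atomically write 0 */
    }

    void *work(void *arg) {
        for (int i = 0; i < 100000; i++) {
            spin_lock(&lock);
            counter++;                   /* critical section */
            spin_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, work, NULL);
        pthread_create(&t2, NULL, work, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);   /* 200000 with the lock in place */
        return 0;
    }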

5 Semaphores
Allows limited concurrent access to a critical section
An integer variable, s, controls the mechanism
Operations
–P operation (passeren, Dutch for "to pass"):
    s--; while (s < 0) wait();
–V operation (vrijgeven, Dutch for "to release"):
    s++; if (s <= 0) unblock a waiting process;
Usage
    P(s);
    // Critical section
    V(s);
Notes
–Set s = 1 initially for a binary semaphore, which acts like a lock
–Set s = k > 1 initially if k simultaneous entries are allowed
–Set s = k <= 0 for consumer processes waiting for data to be produced
Disadvantage
–It is easy to forget the V operation
Example: UNIX
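
A sketch (not from the slide) of the P/V pattern using POSIX semaphores, where sem_wait plays the role of P and sem_post the role of V. The initial value k and the thread count are illustrative.

    #include <semaphore.h>
    #include <pthread.h>
    #include <stdio.h>

    sem_t s;                        /* counting semaphore */

    void *worker(void *arg) {
        sem_wait(&s);               /* P: decrement, block if no entries remain */
        /* critical section: at most k threads are in here at once */
        printf("thread %ld in critical section\n", (long)arg);
        sem_post(&s);               /* V: increment, wake one waiting thread */
        return NULL;
    }

    int main(void) {
        sem_init(&s, 0, 1);         /* k = 1 gives a binary semaphore (a lock) */
        pthread_t t[4];
        for (long i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        sem_destroy(&s);
        return 0;
    }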

6 Monitors
A class mechanism that limits access to a shared resource
    public class DoIt {
        public DoIt() { /* constructor logic */ }
        public synchronized void critMethod() throws InterruptedException {
            wait();     // wait until another thread signals
            // critical section code
            notify();   // wake one waiting thread
        }
    }
Advantage: the most natural mutual exclusion mechanism
Disadvantage: requires a language that supports the construct
Examples: Java, Ada, Modula-2

7 Condition Variables
Mechanism to guarantee that a global condition holds before critical section entry
Advantages
–Reduce the overhead of checking a global condition
–Avoid having to repeatedly "poll" the global variable
Disadvantage: it is easy to forget the unlock operation
Example: Pthreads
Notes
–wait() unlocks and re-locks the mutex automatically
–Threads must already be waiting on the condition variable when the signal is sent; otherwise the signal is lost
Example
    Thread 1
        lock(mutex);
        while (c != VALUE) wait(condVar, mutex);
        // Critical section
        unlock(mutex);
    Thread 2
        if (c == VALUE) signal(condVar);

8 Deadlock
Resources needed by a process can never be allocated
Necessary conditions
–Circular wait
–Limited resource
–Non-preemptive
–Hold and wait
Deadly embrace
[Diagrams: a circular wait among processes P1 … Pn and resources R1 … Rn, and a two-process deadlock in which P1 and P2 each hold one of R1, R2 and request the other]
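
A minimal sketch (not from the slide) of the two-process deadly embrace, using two Pthreads mutexes as the resources; each thread holds one lock and waits on the other.

    #include <pthread.h>
    #include <unistd.h>

    pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;   /* resource R1 */
    pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;   /* resource R2 */

    void *p1(void *arg) {
        pthread_mutex_lock(&r1);     /* hold R1 ...        */
        sleep(1);
        pthread_mutex_lock(&r2);     /* ... and wait on R2 */
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
        return NULL;
    }

    void *p2(void *arg) {
        pthread_mutex_lock(&r2);     /* hold R2 ...        */
        sleep(1);
        pthread_mutex_lock(&r1);     /* ... and wait on R1: circular wait, both block forever */
        pthread_mutex_unlock(&r1);
        pthread_mutex_unlock(&r2);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);      /* never returns: the threads are deadlocked */
        pthread_join(t2, NULL);
        return 0;
    }

Acquiring the locks in the same order in both threads breaks the circular wait and removes the deadlock.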

9 Cache Coherence
Cache coherence protocol
–Update: all caches are immediately updated with the altered data
–Invalidate: the altered data is invalidated in all other caches; updates take place only if the block is subsequently referenced
False sharing: cache updates take place because multiple processors access the same cache block, even though they touch different locations within it
[Diagram: Processor 1 and Processor 2 each cache the memory block holding x and y; one writes x, the other writes y, and the whole block ping-pongs between the caches]
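
An illustrative sketch (not from the slide) of false sharing: two threads update different counters that would normally share a cache line. The padding assumes 64-byte cache lines; removing it and timing the run typically shows the slowdown.

    #include <pthread.h>

    #define ITER 100000000

    /* Without padding, a[0].value and a[1].value usually sit in the same cache line,
       so each write by one thread invalidates the other thread's cached copy. */
    struct counter {
        long value;
        char pad[64 - sizeof(long)];   /* remove this padding to observe false sharing */
    } a[2];

    void *work(void *arg) {
        long id = (long)arg;
        for (long i = 0; i < ITER; i++)
            a[id].value++;             /* each thread touches only its own counter */
        return NULL;
    }

    int main(void) {
        pthread_t t[2];
        for (long i = 0; i < 2; i++) pthread_create(&t[i], NULL, work, (void *)i);
        for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
        return 0;                      /* time this run with and without the padding */
    }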

10 Multiprocessor Cache Considerations
Cache coherence in multiprocessor systems
–Temporal locality is reduced
–Smart compiler optimizations require knowledge of the cache to alter data layout
–Optimizations might require special programming
–Multiple caches exist instead of one
Shared memory distributed systems
–NUMA (SGI Origin), COMA (KSR)
–Hardware or software cache coherence algorithms
–Performance of software coherence can be inadequate

11 Shared Memory Programming Alternatives
–Heavyweight processes
–Thread programming standards (Pthreads, Java threads)
–Programming languages designed for parallel processing (Ada)
–Library routines used with an existing language
–Modified syntax of an existing language (HP Fortran)
–Compiler directives to specify parallel programs (OpenMP)

12 Heavyweight Processes
The operating system switches context from one process to another
Advantage: a blocked process doesn't block the other processes
Not popular because the overhead is significant
–Each process has a separate process control block
UNIX fork-join pattern
    pid = fork();
    if (pid == 0) { /* child (slave) code */ }
    else          { /* parent code */ }
    if (pid == 0) exit(0);   /* child terminates               */
    else wait(0);            /* parent waits for child to join */
[Diagram: fork-join structure]

13 Using Heavyweight Processes
Create a new process
    pid = fork();
Wait for any child process to complete
    pid = wait(&status);
Wait for a specific child process to complete
    pid = waitpid(pid, &status, options);   /* options = 0 blocks, WNOHANG does not */
Exit a process
    exit(status);
Attaching shared memory
    id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0666);
    addr = shmat(id, NULL, 0);
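
A compact sketch (not from the slide) that combines these calls: the parent creates a shared integer, forks a child that writes to it, waits for the child, then reads and releases the segment. Names and the value written are illustrative.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/shm.h>

    int main(void) {
        /* create and attach a shared memory segment holding one int */
        int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0666);
        int *shared = (int *)shmat(id, NULL, 0);
        *shared = 0;

        pid_t pid = fork();
        if (pid == 0) {              /* child: update the shared variable */
            *shared = 42;
            exit(0);
        } else {                     /* parent: wait for the child, then read it */
            int status;
            waitpid(pid, &status, 0);
            printf("shared value = %d\n", *shared);
        }

        shmdt(shared);               /* detach and remove the segment */
        shmctl(id, IPC_RMID, NULL);
        return 0;
    }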

14 Processes and Threads
[Diagram: a single-thread process has one set of code, resources, listeners, heap, stack, and instruction pointer; a dual-thread process shares the code, resources, listeners, and heap but gives each thread its own stack and instruction pointer]
Notes
–Threads can be three orders of magnitude faster than processes
–Thread-safe library routines can be used by multiple concurrent threads
–Synchronization uses shared variables

15 Pthreads – Create and Join
IEEE Portable Operating System Interface standard (POSIX)
Spawn an attached thread and wait for it
    pthread_create(&thread1, NULL, proc1, &arg);
    pthread_join(thread1, &status);
Thread execution
    void *proc1(void *arg) {
        // Thread code
        return status;      /* value delivered to pthread_join */
    }
Detached threads
–Join is not needed
–The OS destroys the thread's resources when it terminates
–An attribute parameter in the create call indicates a detached thread
Note: the Pthreads library must be available
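
A minimal, self-contained create/join sketch; the function name, argument, and printed text are illustrative rather than the slide's.

    #include <pthread.h>
    #include <stdio.h>

    void *proc1(void *arg) {
        int n = *(int *)arg;                 /* unpack the argument */
        printf("thread received %d\n", n);
        return NULL;
    }

    int main(void) {
        pthread_t thread1;
        int arg = 7;
        pthread_create(&thread1, NULL, proc1, &arg);   /* spawn an attached thread */
        pthread_join(thread1, NULL);                   /* wait for it to finish    */
        return 0;
    }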

16 Pthread Locks
Mutex variables
–Wait until the lock opens before proceeding
–Thread selection when the lock opens is non-deterministic
–Only the thread owning the lock can unlock it
Example: declaring and using a Pthread lock
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock(&mutex);
    // Critical section code
    pthread_mutex_unlock(&mutex);
Example: test whether a lock is open
–Returns EBUSY if already locked, or 0 if the lock was acquired
    pthread_mutex_trylock(&mutex);

17 Pthread Condition Variables
Example code
Declarations
    pthread_cond_t cond1 = PTHREAD_COND_INITIALIZER;
    pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
    int c;                           /* shared counter guarded by mutex1 */
Thread 1 (waits until c reaches 0)
    pthread_mutex_lock(&mutex1);
    while (c != 0)
        pthread_cond_wait(&cond1, &mutex1);
    pthread_mutex_unlock(&mutex1);
    // Code to take action
Thread 2 (decrements c and signals)
    pthread_mutex_lock(&mutex1);
    c--;
    if (c == 0) pthread_cond_signal(&cond1);
    pthread_mutex_unlock(&mutex1);
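
A complete, runnable version of this pattern (a sketch with illustrative names): the main thread waits on the condition variable until worker threads have decremented a shared counter to zero.

    #include <pthread.h>
    #include <stdio.h>

    pthread_cond_t cond1 = PTHREAD_COND_INITIALIZER;
    pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
    int c = 4;                                   /* number of workers still running */

    void *worker(void *arg) {
        /* ... do some work ... */
        pthread_mutex_lock(&mutex1);
        c--;
        if (c == 0) pthread_cond_signal(&cond1); /* last worker wakes the waiter */
        pthread_mutex_unlock(&mutex1);
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);

        pthread_mutex_lock(&mutex1);
        while (c != 0)                           /* re-test the condition after each wakeup */
            pthread_cond_wait(&cond1, &mutex1);  /* unlocks mutex1 while waiting, re-locks on return */
        pthread_mutex_unlock(&mutex1);
        printf("all workers finished\n");

        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        return 0;
    }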

18 Java Threads
Instantiate and run a thread
    ThreadClass t = new ThreadClass();
    t.start();
Thread class
    class ThreadClass extends Thread {
        public ThreadClass() { /* constructor */ }
        public void run() {
            while (true) {
                // yield or sleep periodically
                // thread code goes here
            }
        }
    }

19 Modify Existing Language Syntax
Example constructs
Declaration of a shared memory variable
    shared int x;
Specify statements that execute concurrently
    par { s1(); s2(); s3(); ... sn(); }
Iterations assigned to different processors
    forall (i = 0; i < n; i++) { /* code */ }
Examples: High Performance Fortran and C

20 Smart Compilers
Perform Bernstein's dependency analysis
–Iᵢ is the set of memory locations read by code block i
–Oᵢ is the set of memory locations written by code block i
–If Iᵢ ∩ Oⱼ = ∅ and Oᵢ ∩ Oⱼ = ∅ for i ≠ j, the blocks can execute in parallel
–Questions: consider the instructions a = x + y and b = x + z
    Can these instructions execute in parallel?
    What are I₁, I₂, O₁, and O₂?
    What are I₁ ∩ O₂, I₂ ∩ O₁, and O₁ ∩ O₂?
Other optimizations
–Relocate data to optimize cache utilization
–Reorder instructions to optimize register use
–Loop decomposition example
    Before: for (i = 3; i <= 20; i++)  a[i] = a[i-2] + 4;
    After:  for (i = 3; i <= 20; i += 2) a[i] = a[i-2] + 4;
            for (i = 4; i <= 20; i += 2) a[i] = a[i-2] + 4;

21 Compiler Directives (OpenMP)
–Consists of compiler directives and library routines
–Recognized industry standard developed in the late 1990s
–Designed for shared memory programming
–Interfaces to C/C++, Fortran, and Java (JOMP)
–Uses a fork-join model based on threads
–Parallel sections of code execute using "teams of threads"
–OpenMP directives have the form
    C: #pragma omp ...
    JOMP: //omp ...

22 OMP Parallel Directive
Description
–All threads in the team execute the structured block
–An implicit barrier is at the end of the block
    #pragma omp parallel
    {
        // code here
    }
Example
    #pragma omp parallel private(x, num_threads)
    {
        x = omp_get_thread_num();
        num_threads = omp_get_num_threads();
        a[x] = 10 * num_threads;
    }
Notes
–omp_get_thread_num() returns the rank of the current thread
–omp_get_num_threads() returns the total number of threads
–a[] is shared; x and num_threads are private to each thread
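
A self-contained version of the example above (a sketch; the array bound and output are illustrative). It assumes an OpenMP-capable compiler, for example gcc -fopenmp.

    #include <omp.h>
    #include <stdio.h>

    #define MAX_THREADS 64            /* assumed upper bound on the team size */

    int a[MAX_THREADS];
    int team_size;

    int main(void) {
        #pragma omp parallel
        {
            int x = omp_get_thread_num();        /* variables declared inside the block are private */
            int num_threads = omp_get_num_threads();
            a[x] = 10 * num_threads;             /* a[] is shared by all threads */
            #pragma omp single
            team_size = num_threads;             /* record the team size once */
        }                                        /* implicit barrier at the end of the block */

        for (int i = 0; i < team_size; i++)
            printf("a[%d] = %d\n", i, a[i]);
        return 0;
    }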

23 OMP: Establish Thread Team Size
Techniques a program can use
–Specify in the parallel directive: #pragma omp parallel num_threads(value)
–Call the library routine: omp_set_num_threads(value)
–Set the environment variable: OMP_NUM_THREADS
–Use the system-dependent default if none is specified
Notes
–The num_threads clause sets the team size for that directive only
–The other approaches set a value that remains in effect until it is changed

24 OMP: Section and For Constructs
Parallel sections (the threads of a team divide the structured blocks among themselves)
–Syntax
    #pragma omp sections
    {
        #pragma omp section
        { /* block of code */ }
        #pragma omp section
        { /* block of code */ }
        ...
    }
–Note: there is an implied barrier at the end of the construct unless nowait appears on the pragma line
Parallel for loop (iterations are divided among the threads)
–Syntax
    #pragma omp for
    for (i = 0; i < MAX; i++) { /* block of code */ }
–Note: there is an implied barrier at the end of the loop unless nowait appears on the pragma line
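
A small sketch combining both work-sharing constructs inside one parallel region; the array names and sizes are placeholders, not the slide's.

    #include <omp.h>
    #include <stdio.h>

    #define MAX 8

    int main(void) {
        int a[MAX], b[MAX];

        #pragma omp parallel
        {
            /* two independent blocks, each executed by one thread of the team */
            #pragma omp sections
            {
                #pragma omp section
                { for (int i = 0; i < MAX; i++) a[i] = i; }
                #pragma omp section
                { for (int i = 0; i < MAX; i++) b[i] = 2 * i; }
            }   /* implied barrier: a[] and b[] are complete before continuing */

            /* loop iterations divided among the threads of the team */
            #pragma omp for
            for (int i = 0; i < MAX; i++)
                a[i] = a[i] + b[i];
        }

        for (int i = 0; i < MAX; i++) printf("a[%d] = %d\n", i, a[i]);
        return 0;
    }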

25 OMP: Sequential within Parallel Blocks
Single: the block is executed by exactly one thread of the team
–Syntax: #pragma omp single { /* code */ }
–Note: there is an implied barrier at the end of the construct unless nowait appears on the pragma line
Master: the block is executed only by the master thread
–Syntax: #pragma omp master { /* code */ }
–Note: there is no implied barrier in this construct

26 OMP: Critical Sections and Synchronization
Critical sections (one thread at a time)
–Syntax: #pragma omp critical(name) { /* code */ }
–Notes
    The name ties together all critical sections with that name
    The system uses a global default if no name is specified
Barrier
–Syntax: #pragma omp barrier
–Note: all threads must be able to reach the barrier
Atomic expression (the single following statement updates memory atomically)
–Syntax: #pragma omp atomic
Note: keep critical sections small; threads waiting to enter one execute serially
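
A brief sketch (illustrative names, not from the slide) contrasting critical, atomic, and barrier while accumulating a shared sum.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000

    int main(void) {
        int a[N], sum = 0, count = 0;
        for (int i = 0; i < N; i++) a[i] = 1;

        #pragma omp parallel
        {
            #pragma omp for
            for (int i = 0; i < N; i++) {
                #pragma omp critical(update_sum)   /* named critical section, one thread at a time */
                sum += a[i];
            }

            #pragma omp atomic                     /* cheaper than critical for a single update */
            count++;

            #pragma omp barrier                    /* every thread waits here before the report */
            #pragma omp single
            printf("sum = %d computed by %d threads\n", sum, count);
        }
        return 0;
    }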

27 Example Program (summing 1000 numbers)
Heavyweight UNIX processes (Section 8.7.1)
–Concurrency: fork, exit, and wait
–Shared memory: shmget, shmat, shmctl
–Data sharing: semget, semctl, semop used to build P and V functions
Lightweight threads (Section 8.7.2)
–Concurrency: pthread_create and pthread_join
–Shared memory: automatic
–Data sharing: pthread_mutex_init, pthread_mutex_lock, and pthread_mutex_unlock
Java (Section 8.7.3)
–Concurrency: Thread extension class
–Shared memory: automatic
–Data sharing: synchronized methods provide monitor capability
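
A hedged sketch of the Pthreads variant (not the textbook's code from Section 8.7.2): each thread sums one slice of the array privately, then adds its partial sum to the global total inside a short critical section.

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000
    #define NTHREADS 4
    #define SLICE (N / NTHREADS)

    int data[N];
    long sum = 0;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

    void *slave(void *arg) {
        long id = (long)arg;
        long local = 0;
        for (int i = id * SLICE; i < (id + 1) * SLICE; i++)
            local += data[i];                 /* sum this thread's slice privately */
        pthread_mutex_lock(&mutex);
        sum += local;                         /* one short critical section per thread */
        pthread_mutex_unlock(&mutex);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (int i = 0; i < N; i++) data[i] = i + 1;
        for (long i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, slave, (void *)i);
        for (int i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
        printf("sum = %ld\n", sum);           /* 1 + 2 + ... + 1000 = 500500 */
        return 0;
    }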

