Introduction to Parallel Programming Models


1 Introduction to Parallel Programming Models
CS 5802 Monica Borra

2 Overview
Types of parallel programming models; the shared memory model; OpenMP; POSIX Threads; Cilk/Cilk Plus/Cilk++; Threading Building Blocks

3 Parallel Programming Model
A set of software technologies for expressing parallel algorithms and matching applications to the underlying parallel systems: “an abstraction above hardware and memory architectures.” Types of parallel programming models: the shared memory model, the threads model, the distributed memory model, and hybrid models.

4 Programming models are NOT specific to a particular type of machine or memory architecture.
Example: “virtual shared memory.” Machine memory is physically distributed across networked machines but appears to the user as a single shared global address space. Every task has direct access to the global address space, yet message passing (sending and receiving messages with MPI) can still be implemented.

5 Shared Memory: a common block of read/write memory shared among processes.
Create: the first process creates the shared memory segment with a unique key. Other processes that know the key can attach to the segment and share data with one another. (Diagram: Proc. 1 creates the shared memory segment with a unique key and size MAX; Procs. 2 to 5 attach to it, and each obtains a pointer, ptr.) The System V call that creates or locates the segment: int shmget(key_t key, size_t size, int shmflg);
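A minimal sketch of the create/attach pattern above, assuming a Linux/POSIX system with System V IPC. The function name shm_demo is hypothetical, and so is one design choice: the segment uses IPC_PRIVATE instead of an agreed-upon key (unrelated processes would instead share a key, e.g. one produced by ftok). The calling process creates the segment with shmget; each forked child attaches with shmat, writes only its own slot, and detaches with shmdt; the creator then reads the results and removes the segment with shmctl(IPC_RMID).

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// Creator (the calling process) makes a segment; forked children attach to it.
// Returns the sum of the values the children wrote, or -1 on error.
long shm_demo(int nproc) {
    const std::size_t size = sizeof(long) * static_cast<std::size_t>(nproc);
    // Assumption: IPC_PRIVATE instead of a shared key (see ftok).
    int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid == -1) return -1;

    std::vector<pid_t> children;
    for (int i = 0; i < nproc; ++i) {
        pid_t pid = fork();
        if (pid == 0) {                               // child process
            void* p = shmat(shmid, nullptr, 0);       // attach: get ptr into the segment
            if (p == (void*)-1) _exit(1);
            static_cast<long*>(p)[i] = (i + 1) * 10L; // write only this child's slot
            shmdt(p);                                 // detach
            _exit(0);
        }
        if (pid > 0) children.push_back(pid);
    }
    for (pid_t c : children) waitpid(c, nullptr, 0);  // wait for every child

    long sum = -1;
    void* p = shmat(shmid, nullptr, 0);               // creator attaches to read results
    if (p != (void*)-1) {
        sum = 0;
        for (int i = 0; i < nproc; ++i) sum += static_cast<long*>(p)[i];
        shmdt(p);
    }
    shmctl(shmid, IPC_RMID, nullptr);                 // remove the segment
    return sum;
}
```

For example, shm_demo(4) should return 10 + 20 + 30 + 40 = 100: newly created segments are zero-initialized, and no two children write the same slot, so no locking is needed here.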

6 Thread Models Program is a collection of threads of control.
Threads can be created dynamically, mid-execution, in some languages. Each thread has a set of private variables, e.g., local stack variables, and also a set of shared variables, e.g., static variables, shared common blocks, or the global heap. Threads communicate implicitly by writing and reading shared variables. The data race problem: synchronization is required to ensure that no more than one thread updates the same global address at any time.
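To make the data race concrete, here is a small sketch that uses C++11 std::thread and std::mutex as a stand-in for the thread libraries listed in this presentation; the function name parallel_count is hypothetical. The counter is shared by all threads, each thread's loop index is private to that thread, and the mutex guarantees that only one thread updates the counter at a time.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Each of `nthreads` threads increments one shared counter
// `increments_per_thread` times; returns the final count.
long parallel_count(int nthreads, int increments_per_thread) {
    long counter = 0;   // shared variable: visible to every thread
    std::mutex m;       // protects `counter`
    std::vector<std::thread> team;
    for (int t = 0; t < nthreads; ++t) {
        team.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {  // `i` is private
                std::lock_guard<std::mutex> guard(m);  // one updater at a time
                ++counter;                             // without the lock: data race
            }
        });
    }
    for (auto& th : team) th.join();  // wait for all threads before reading
    return counter;
}
```

With the lock, parallel_count(4, 10000) returns 40000 on every run. Removing the lock_guard makes ++counter a data race (undefined behavior), and lost updates typically make the result smaller.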

7 Several Thread Libraries/systems
Pthreads: the POSIX standard. OpenMP: a standard for application-level programming. TBB: Threading Building Blocks. Cilk: a language of the C “ilk”. Java threads.

8 Distributed memory model
A set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines. Tasks exchange data through communications by sending and receiving messages. Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation.
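The send/receive pairing can be sketched without an MPI installation. This is a stand-in, not MPI: two POSIX processes play the role of tasks, and two pipes play the role of the network. The helper names send_all/recv_all and the function remote_sum are hypothetical. Each process keeps its data in its own memory, and every write by one process is matched by a read in the other.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// "send": write exactly n bytes to fd (loops because write may be partial).
static bool send_all(int fd, const void* buf, std::size_t n) {
    const char* p = static_cast<const char*>(buf);
    while (n > 0) {
        ssize_t k = write(fd, p, n);
        if (k <= 0) return false;
        p += k;
        n -= static_cast<std::size_t>(k);
    }
    return true;
}

// "receive": read exactly n bytes from fd; fails on error or end of stream.
static bool recv_all(int fd, void* buf, std::size_t n) {
    char* p = static_cast<char*>(buf);
    while (n > 0) {
        ssize_t k = read(fd, p, n);
        if (k <= 0) return false;
        p += k;
        n -= static_cast<std::size_t>(k);
    }
    return true;
}

// The caller sends `data` to a worker process, which sums its own copy and
// sends the result back. Returns the worker's sum, or -1 on error.
long remote_sum(const std::vector<long>& data) {
    int to_worker[2], from_worker[2];
    if (pipe(to_worker) == -1) return -1;
    if (pipe(from_worker) == -1) {
        close(to_worker[0]); close(to_worker[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid == 0) {                                    // worker task
        close(to_worker[1]); close(from_worker[0]);
        std::size_t n = 0;
        long sum = 0;
        if (recv_all(to_worker[0], &n, sizeof n)) {   // matches the caller's first send
            std::vector<long> local(n);                // worker's private memory
            if (recv_all(to_worker[0], local.data(), n * sizeof(long)))
                for (long v : local) sum += v;
        }
        send_all(from_worker[1], &sum, sizeof sum);    // matched by the caller's receive
        _exit(0);
    }
    close(to_worker[0]); close(from_worker[1]);
    long result = -1;
    if (pid > 0) {
        std::size_t n = data.size();
        if (send_all(to_worker[1], &n, sizeof n) &&
            send_all(to_worker[1], data.data(), n * sizeof(long)))
            recv_all(from_worker[0], &result, sizeof result);
    }
    close(to_worker[1]); close(from_worker[0]);
    if (pid > 0) waitpid(pid, nullptr, 0);
    return result;
}
```

For example, remote_sum({1, 2, 3, 4, 5}) returns 15, computed entirely in the worker process's private copy of the data.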

9 Open Multi-Processing (OpenMP): A simple API for adding parallelism to existing source code without significantly rewriting it. Programming in C/C++/Fortran. It is a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer. It is composed of a set of compiler directives, library routines, and environment variables. Easier to understand and maintain.

10 Fork-Join Model.

11 OpenMP: Because it is based on compiler directives, it requires a compiler that supports OpenMP. The directives can be added incrementally, allowing gradual parallelization. (Note that launching more threads than there are processing units available can actually slow down the whole program.)
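A sketch of incremental parallelization, assuming a compiler with OpenMP support (for example GCC or Clang with -fopenmp). The serial loop becomes parallel by adding a single directive; reduction(+:total) gives each thread a private copy of total and combines the copies when the loop ends. Without -fopenmp the pragma is ignored and the same code runs serially with the same result. The function names sum_to and default_team_size are hypothetical.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Sums 1..n. The only change from the serial version is the pragma.
long sum_to(long n) {
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 1; i <= n; ++i) {
        total += i;  // each thread adds into its private copy of `total`
    }
    return total;    // private copies have been combined at this point
}

// Upper bound on the team size of a parallel region without num_threads.
int default_team_size() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;        // compiled without OpenMP: serial execution
#endif
}
```

The default team size is chosen by the runtime (commonly the number of available cores) and can be overridden with the OMP_NUM_THREADS environment variable, which helps avoid launching more threads than there are processing units.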

12 OpenMP Example:
#include <iostream>
#include <omp.h>
using namespace std;

/* Sample OpenMP program that has 4 threads in stage 1 and 2 threads in stage 2 */
int main() {
    #pragma omp parallel num_threads(4) // create 4 threads; all of them execute the block below
    {
        #pragma omp critical // allow only one thread at a time to execute the statement below
        cout << "Thread Id in OpenMP stage 1 = " << omp_get_thread_num() << endl;
    } // here all threads join back into a single thread

    cout << "I am alone" << endl;

    #pragma omp parallel num_threads(2) // create two threads
    {
        #pragma omp critical
        cout << "Thread Id in OpenMP stage 2 = " << omp_get_thread_num() << endl;
    }
    return 0;
}

Compile with OpenMP enabled (for example, g++ -fopenmp) and run the executable named a.out on Linux: ./a.out
Output (the order of the thread IDs can differ from run to run):
Thread Id in OpenMP stage 1 = 2
Thread Id in OpenMP stage 1 = 0
Thread Id in OpenMP stage 1 = 3
Thread Id in OpenMP stage 1 = 1
I am alone
Thread Id in OpenMP stage 2 = 1
Thread Id in OpenMP stage 2 = 0

13 OpenMP Advantages: The programmer need not specify the processors (nodes).
No need for message passing, since it uses shared memory. Its style of coding fits both serial and parallel paradigms. Able to exploit coarse-grain parallelism on shared memory. Disadvantages: Runs efficiently only on shared memory platforms. Scalability is limited by the shared memory architecture. No reliable error handling mechanism. Synchronization between a subset of threads isn't allowed.

14 POSIX THREADS POSIX: Portable Operating System Interface for UNIX, an interface to operating system utilities. Pthreads: the POSIX threading interface. Implementations of the API are available in C/C++ on many Unix-like operating systems. On Windows, however, third-party packages such as pthreads-w32 are needed; pthreads-w32 implements Pthreads on top of the existing Windows API. Pthreads defines a set of programming language types, functions, and constants. It is implemented with a pthread.h header and a thread library. There are around 100 Pthreads procedures, all prefixed "pthread_", and they can be categorized into four groups: thread management, mutexes, condition variables, and synchronization (e.g., read/write locks and barriers).
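A minimal sketch that touches three of the four groups at once: thread management (pthread_create/pthread_join), a mutex, and a condition variable. The Mailbox struct and the functions producer and receive_from_thread are hypothetical names. The waiting thread rechecks its condition in a loop because pthread_cond_wait may wake up spuriously.

```cpp
#include <pthread.h>

// Hypothetical one-slot mailbox: the mutex guards the fields, and the
// condition variable lets the consumer sleep until `ready` becomes true.
struct Mailbox {
    pthread_mutex_t lock;
    pthread_cond_t filled;
    bool ready;
    int value;
};

static void* producer(void* arg) {
    Mailbox* box = static_cast<Mailbox*>(arg);
    pthread_mutex_lock(&box->lock);
    box->value = 42;
    box->ready = true;
    pthread_cond_signal(&box->filled);   // wake the waiting consumer
    pthread_mutex_unlock(&box->lock);
    return nullptr;
}

// Returns the value handed over by a producer thread, or -1 on failure.
int receive_from_thread() {
    Mailbox box;
    pthread_mutex_init(&box.lock, nullptr);
    pthread_cond_init(&box.filled, nullptr);
    box.ready = false;
    box.value = 0;

    int result = -1;
    pthread_t tid;
    if (pthread_create(&tid, nullptr, producer, &box) == 0) {
        pthread_mutex_lock(&box.lock);
        while (!box.ready)                            // guard against spurious wakeups
            pthread_cond_wait(&box.filled, &box.lock); // releases lock while waiting
        result = box.value;
        pthread_mutex_unlock(&box.lock);
        pthread_join(tid, nullptr);
    }
    pthread_cond_destroy(&box.filled);
    pthread_mutex_destroy(&box.lock);
    return result;
}
```

receive_from_thread() returns 42 regardless of whether the producer runs before or after the consumer starts waiting, because the ready flag records that the signal already happened.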

15 Forking a POSIX Thread:
int pthread_create(pthread_t *, const pthread_attr_t *, void * (*)(void *), void *);
Example call: errcode = pthread_create(&thread_id, &thread_attribute, thread_fun, &fun_arg);
thread_id: the thread ID or handle (used to join, cancel, etc.)
thread_attribute: various attributes
a. standard default values are obtained by passing a NULL pointer
b. sample attribute: minimum stack size
thread_fun: the function to be run (takes and returns void*)
fun_arg: an argument that is passed to thread_fun when it starts
errcode: set to a nonzero error number if the create operation fails
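The example call above, expanded into a runnable sketch: the attribute object sets a minimum stack size, fun_arg points at a record holding the input and the result, and the return code of pthread_create is checked. The 1 MiB stack size and the names Work, square, and square_in_thread are illustrative assumptions.

```cpp
#include <pthread.h>
#include <cstddef>

struct Work {            // hypothetical argument/result record passed as fun_arg
    int input;
    int output;
};

// thread_fun: takes and returns void*, as pthread_create requires.
static void* square(void* arg) {
    Work* w = static_cast<Work*>(arg);
    w->output = w->input * w->input;
    return nullptr;
}

// Returns x*x computed on a new thread, or -1 if the thread could not be created.
int square_in_thread(int x) {
    pthread_attr_t thread_attribute;
    pthread_attr_init(&thread_attribute);
    // Sample attribute: minimum stack size (1 MiB is an arbitrary illustrative value).
    pthread_attr_setstacksize(&thread_attribute, static_cast<std::size_t>(1) << 20);

    Work fun_arg{x, 0};
    pthread_t thread_id;
    int errcode = pthread_create(&thread_id, &thread_attribute, square, &fun_arg);
    pthread_attr_destroy(&thread_attribute);  // not needed once create has returned
    if (errcode != 0) return -1;              // nonzero error number on failure
    pthread_join(thread_id, nullptr);         // wait so fun_arg.output is ready
    return fun_arg.output;
}
```

Passing nullptr instead of &thread_attribute would give the default attributes, as noted in point a.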

16 Some other functions:
pthread_yield(); Informs the scheduler that the thread is willing to yield its quantum; requires no arguments. (pthread_yield is a nonstandard extension; the portable POSIX equivalent is sched_yield().)
pthread_exit(void *value); Exits the thread and passes value to the joining thread (if one exists).
pthread_join(pthread_t thread, void **result); Waits for the specified thread to finish and places its exit value into *result.
pthread_t me; me = pthread_self(); Allows a pthread to obtain its own identifier.
pthread_t thread; pthread_detach(thread); Informs the library that the thread's exit status will not be needed by a subsequent pthread_join call, so the thread's resources can be reclaimed as soon as it exits.
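A sketch of pthread_exit, pthread_join, pthread_detach, and pthread_self; the helper names triple, join_result, noop, detach_works, and self_is_self are hypothetical. The integer result travels through the void* exit value by way of intptr_t, a common idiom that assumes the value fits in a pointer. Thread IDs are compared with pthread_equal because pthread_t is an opaque type.

```cpp
#include <pthread.h>
#include <cstdint>

// Thread body that hands its result back through pthread_exit.
static void* triple(void* arg) {
    std::intptr_t x = reinterpret_cast<std::intptr_t>(arg);
    pthread_exit(reinterpret_cast<void*>(x * 3));  // value delivered to pthread_join
}

// Starts a thread, waits for it, and returns the value it passed to pthread_exit.
long join_result(long x) {
    pthread_t t;
    void* arg = reinterpret_cast<void*>(static_cast<std::intptr_t>(x));
    if (pthread_create(&t, nullptr, triple, arg) != 0) return -1;
    void* result = nullptr;
    pthread_join(t, &result);                      // pthread_t by value, void** out
    return static_cast<long>(reinterpret_cast<std::intptr_t>(result));
}

static void* noop(void*) { return nullptr; }

// A detached thread cannot be joined; its resources are reclaimed when it exits.
bool detach_works() {
    pthread_t t;
    if (pthread_create(&t, nullptr, noop, nullptr) != 0) return false;
    return pthread_detach(t) == 0;
}

// pthread_self returns the caller's own ID; compare IDs with pthread_equal.
bool self_is_self() {
    pthread_t me = pthread_self();
    return pthread_equal(me, pthread_self()) != 0;
}
```

For example, join_result(14) returns 42, the value the thread passed to pthread_exit.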

17 Simple Example:
#include <pthread.h>
#include <stdio.h>

void* SayHello(void *foo) {
    printf("Hello, world!\n");
    return NULL;
}

int main() {
    pthread_t threads[16];
    int tn;
    for (tn = 0; tn < 16; tn++) {
        pthread_create(&threads[tn], NULL, SayHello, NULL);
    }
    for (tn = 0; tn < 16; tn++) {
        pthread_join(threads[tn], NULL);
    }
    return 0;
}
Compile using gcc with -pthread (or link with -lpthread).

18 CILK/CILK PLUS/CILK++
Programming languages that extend C and C++. Initially developed at MIT and based on ANSI C; now owned by Intel. Initially, Cilk was used only in high-performance computing. Intel Cilk Plus keywords:
cilk_spawn: Specifies that a function call can execute asynchronously, without requiring the caller to wait for it to return. This expresses an opportunity for parallelism, not a command that mandates parallelism; the Intel Cilk Plus runtime chooses whether to run the function in parallel with its caller.
cilk_sync: Specifies that all spawned calls in a function must complete before execution continues. There is an implied cilk_sync at the end of every function that contains a cilk_spawn.
cilk_for: Allows iterations of the loop body to be executed in parallel.
Cilk Plus also introduces “reducers”, which provide a lock-free mechanism that allows parallel code to use private “views” of a variable that are merged at the next sync.

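19 Example of Cilk Plus
#include <cilk/cilk.h>

int fib(int n) {
    if (n < 2) return n;
    int x = cilk_spawn fib(n-1);  // may run in parallel with the caller
    int y = fib(n-2);
    cilk_sync;                    // wait for the spawned call before using x
    return x + y;
}

// Loop fragments (inside some function; do_work is defined elsewhere).
// Spawning each iteration explicitly:
for (int i = 0; i < 8; ++i) {
    cilk_spawn do_work(i);
}
cilk_sync;

// The same loop written with cilk_for:
cilk_for (int i = 0; i < 8; ++i) {
    do_work(i);
}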
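Cilk Plus support has since been removed from GCC (as of GCC 8), so the sketch below mimics the fib example with standard C++ instead. This is a swapped-in technique, not Cilk: std::async plays the role of cilk_spawn, and future::get() plays the role of cilk_sync. Unlike the Cilk runtime's work stealing, std::launch::async starts a new thread per spawn, so the hypothetical spawn_depth parameter stops spawning after a few levels and recurses serially below that.

```cpp
#include <future>

// fib with a spawn/sync structure; spawns only in the top `spawn_depth` levels.
long fib(int n, int spawn_depth = 4) {
    if (n < 2) return n;
    if (spawn_depth <= 0)                       // below the cutoff: plain recursion
        return fib(n - 1, 0) + fib(n - 2, 0);
    // "spawn": fib(n-1) runs on another thread while the caller continues
    std::future<long> x = std::async(std::launch::async, fib, n - 1, spawn_depth - 1);
    long y = fib(n - 2, spawn_depth - 1);       // caller keeps working meanwhile
    return x.get() + y;                         // "sync": wait for the spawned call
}
```

For example, fib(20) returns 6765, the same value the serial recursion produces.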

20 Threading Building Blocks (TBB)
A C++ template library developed by Intel for parallel programming on multi-core processors. TBB lets you specify tasks instead of threads. TBB is compatible with other threading packages. A TBB program creates, synchronizes, and destroys graphs of dependent tasks according to algorithms, i.e., high-level parallel programming paradigms (algorithmic skeletons). TBB emphasizes scalable, data-parallel programming. It optimizes core utilization, but may incur scheduling overhead.

21 TBB COMPONENTS
Basic algorithms: parallel_for, parallel_reduce, parallel_scan
Advanced algorithms: parallel_while, parallel_do, parallel_pipeline, parallel_sort
Containers: concurrent_queue, concurrent_priority_queue, concurrent_vector, concurrent_hash_map
Memory allocation: scalable_malloc, scalable_free, scalable_realloc, scalable_calloc, scalable_allocator, cache_aligned_allocator
Mutual exclusion: mutex, spin_mutex, queuing_mutex, spin_rw_mutex, queuing_rw_mutex, recursive_mutex
Atomic operations: fetch_and_add, fetch_and_increment, fetch_and_decrement, compare_and_swap, fetch_and_store
TBB relies on generic programming, similar to the Standard Template Library (STL).
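A sketch of the split/join pattern behind a reduction algorithm such as parallel_reduce, written with std::async rather than TBB so that it needs no extra library; it illustrates the idea and is not TBB's API. A range larger than a grain size is split in half, one half is reduced as a separate task, and the partial sums are combined. The function range_sum and its default grain of 10000 elements are assumptions.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Sums v[begin, end) by recursive range splitting.
long range_sum(const std::vector<long>& v, std::size_t begin, std::size_t end,
               std::size_t grain = 10000) {
    if (end - begin <= grain || end - begin < 2) {  // small enough: serial loop
        long s = 0;
        for (std::size_t i = begin; i < end; ++i) s += v[i];
        return s;
    }
    std::size_t mid = begin + (end - begin) / 2;    // split the range in half
    // Left half becomes a separate task; std::cref avoids copying the vector.
    auto left = std::async(std::launch::async, range_sum, std::cref(v), begin, mid, grain);
    long right = range_sum(v, mid, end, grain);     // right half on this thread
    return left.get() + right;                      // join: combine partial sums
}
```

For example, with 100000 elements and the default grain, the range is split into 16 leaf tasks of 6250 elements each before the partial sums are combined.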


23 High Performance Fortran
Extension of Fortran 90 with constructs that support parallel computing. Allows efficient implementation on both SIMD and MIMD style architectures. Implicit parallelization (mapping, distribution, communication, synchronization). High productivity.

24 Parallel Virtual Machine (PVM)
Enables a collection of heterogeneous computers to be used as a coherent and flexible concurrent computational resource. Supports software execution on each machine in a user-configurable pool. Supports heterogeneous applications that can exploit the specific strengths of individual machines on a network. Provides a set of dynamic resource management and powerful process control functions. Fault tolerant (can survive host or task failures) and portable.

25 THANK YOU!

