Introduction to Parallel Programming Models

Introduction to Parallel Programming Models CS 5802 Monica Borra

Overview
- Types of parallel programming models
- Shared memory model
- OpenMP
- POSIX Threads
- Cilk / Cilk Plus / Cilk++
- Thread Building Blocks

Parallel Programming Model
A set of software technologies for expressing parallel algorithms and matching applications to the underlying parallel systems: "an abstraction above hardware and memory architectures."
Types of parallel programming models: the shared memory model, the threads model, the distributed memory model, and hybrid models.

Programming models are NOT specific to a particular type of machine or memory architecture. For example, on a "virtual shared memory" machine, memory is physically distributed across networked machines but appears to the user as a single shared global address space: every task has direct access to the global address space, and yet the ability to send and receive messages using MPI can still be implemented on top of it.

Shared Memory
A common block of read/write memory shared among processes. The shared memory segment is created by the first process under a unique key; other processes that know the key can attach the segment to their own address space and share data with one another.
[Figure: processes Proc. 1 through Proc. 5 each hold a pointer attached to a single shared memory segment created with a unique key.]
int shmget(key_t key, size_t size, int shmflg);
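A minimal sketch of this pattern in C, assuming System V shared memory: shmget (shown above) creates or opens the segment for a given key, and shmat/shmdt attach and detach it. The key value and segment size here are illustrative.

#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    key_t key = 0x1234;                 /* illustrative key agreed on by all processes */
    /* create (or open) a 1 KB segment, owner read/write */
    int shmid = shmget(key, 1024, IPC_CREAT | 0600);
    if (shmid < 0) { perror("shmget"); return 1; }

    char *ptr = shmat(shmid, NULL, 0);  /* attach the segment to this address space */
    if (ptr == (char *) -1) { perror("shmat"); return 1; }

    strcpy(ptr, "hello from shared memory");  /* visible to every attached process */

    shmdt(ptr);                         /* detach; the segment persists until removed */
    return 0;
}

Any other process calling shmget with the same key and then shmat sees the same bytes.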

Thread Models
A program is a collection of threads of control, which in some languages can be created dynamically in mid-execution. Each thread has a set of private variables (e.g., local stack variables) and a set of shared variables (e.g., static variables, shared common blocks, or the global heap). Threads communicate implicitly by writing and reading shared variables, which raises the data race problem: synchronization is required to ensure that no more than one thread is updating the same global address at any time. The sketch below illustrates this.
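A minimal sketch of a data race and its fix, using POSIX threads (covered later in these slides); the counter and iteration count are illustrative. Without the mutex, the two threads' increments can interleave and updates are lost.

#include <pthread.h>
#include <stdio.h>

long counter = 0;                          /* shared variable */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);         /* remove this pair and the final count becomes unpredictable */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);    /* always 2000000 with the mutex */
    return 0;
}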

Several Thread Libraries/Systems
- PThreads: the POSIX standard
- OpenMP: standard for application-level programming
- TBB: Thread Building Blocks
- Cilk: language of the C "ilk"
- Java threads

Distributed Memory Model
A set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines. Tasks exchange data by sending and receiving messages. Data transfer usually requires cooperative operations to be performed by each process; for example, a send operation must have a matching receive operation.
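A minimal sketch of such matching send/receive pairs using MPI (mentioned earlier); the payload, ranks, and tag are illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;                                        /* illustrative payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* to rank 1, tag 0 */
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,     /* matches the send above */
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}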

Open Multi-Processing (OpenMP)
A simple API that allows parallelism to be added to existing source code without significantly rewriting it; it supports programming in C/C++/Fortran. It is a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications on platforms ranging from the desktop to the supercomputer. It is composed of a set of compiler directives, library routines, and environment variables, and the resulting code is easier to understand and maintain.

Fork-Join Model: an OpenMP program starts as a single master thread, forks a team of threads at each parallel region, and joins them back into a single thread at the end of the region.

OpenMP
(Note that launching more threads than the number of available processing units can actually slow the whole program down.) Since OpenMP is based on compiler directives, it requires a compiler that supports them. The directives can be added incrementally, giving gradual parallelization, as the sketch below shows.
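A minimal sketch of this incremental style, assuming a plain array sum: the serial loop becomes parallel by adding a single directive, and the reduction clause takes care of synchronizing the shared sum.

#include <omp.h>
#include <cstdio>

int main() {
    const int N = 1000000;
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0;

    double sum = 0.0;
    // The only change from the serial version is this one directive;
    // reduction(+:sum) gives each thread a private partial sum and
    // combines the partial sums when the loop ends.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        sum += a[i];
    }

    std::printf("sum = %f\n", sum);   // 1000000.000000
    return 0;
}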

OpenMP Example:

#include <iostream>
#include <omp.h>
using namespace std;

/* Sample OpenMP program which at stage 1 has 4 threads
   and at stage 2 has 2 threads */
int main() {
    #pragma omp parallel num_threads(4)  // create 4 threads; all of them execute the region below
    {
        #pragma omp critical             // allow one thread at a time to execute the next statement
        cout << "Thread Id in OpenMP stage 1 = " << omp_get_thread_num() << endl;
    }
    // here all threads merge back into a single thread
    cout << "I am alone" << endl;
    #pragma omp parallel num_threads(2)  // create 2 threads
    {
        #pragma omp critical             // serialize the output here as well
        cout << "Thread Id in OpenMP stage 2 = " << omp_get_thread_num() << endl;
    }
    return 0;
}

Compile with g++ -fopenmp and run the executable (e.g., ./a.out on Linux). One possible output (thread order is nondeterministic):

Thread Id in OpenMP stage 1 = 2
Thread Id in OpenMP stage 1 = 0
Thread Id in OpenMP stage 1 = 3
Thread Id in OpenMP stage 1 = 1
I am alone
Thread Id in OpenMP stage 2 = 1
Thread Id in OpenMP stage 2 = 0

OpenMP
Advantages:
- The programmer need not specify the processors (nodes).
- No need for message passing, since it uses shared memory.
- Its coding style fits both serial and parallel paradigms.
- Handles coarse-grain parallelism with shared memory.
Disadvantages:
- Runs efficiently only on shared memory platforms.
- Scalability is limited by the shared memory architecture.
- No reliable error handling mechanisms.
- Synchronization among a subset of threads is not allowed.

POSIX Threads
POSIX: Portable Operating System Interface (for UNIX), an interface to operating system utilities. PThreads is the POSIX threading interface. Implementations of the API are available in C/C++ on many Unix-like operating systems; on Windows, third-party packages such as pthreads-w32 implement PThreads on top of the existing Windows API. PThreads defines a set of programming language types, functions, and constants, implemented with a pthread.h header and a thread library. There are around 100 PThreads procedures, all prefixed "pthread_", which fall into four groups: thread management, mutexes, condition variables, and synchronization.

Forking a POSIX Thread:

int pthread_create(pthread_t *, const pthread_attr_t *, void * (*)(void *), void *);

Example call:

errcode = pthread_create(&thread_id, &thread_attribute, &thread_fun, &fun_arg);

- thread_id is the thread id or handle (used to halt the thread, etc.)
- thread_attribute holds various attributes; standard default values are obtained by passing a NULL pointer (a sample attribute: minimum stack size)
- thread_fun is the function to be run (it takes and returns void*)
- fun_arg is an argument that is passed to thread_fun when it starts
- errcode is set nonzero if the create operation fails

Some other functions:
- pthread_yield(); Informs the scheduler that the thread is willing to yield its quantum; requires no arguments.
- pthread_exit(void *value); Exits the thread and passes value to the joining thread (if one exists).
- pthread_join(pthread_t thread, void **result); Waits for the specified thread to finish and places its exit value into *result.
- pthread_t me = pthread_self(); Allows a thread to obtain its own identifier.
- pthread_detach(thread); Informs the library that the thread's exit status will not be needed by subsequent pthread_join calls, resulting in better thread performance.
The sketch below shows pthread_exit and pthread_join working together.
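A minimal sketch, assuming the caller wants a computed result back from a thread, of how pthread_exit delivers a value to pthread_join; the square computation is illustrative.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *square(void *arg) {
    int n = *(int *) arg;
    int *result = malloc(sizeof *result);  /* heap storage outlives the thread's stack */
    *result = n * n;
    pthread_exit(result);                  /* exit value handed to the joiner */
}

int main(void) {
    pthread_t t;
    int n = 7;
    pthread_create(&t, NULL, square, &n);

    void *res;
    pthread_join(t, &res);                 /* blocks until t finishes */
    printf("square = %d\n", *(int *) res);
    free(res);
    return 0;
}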

Simple Example:

#include <pthread.h>
#include <stdio.h>

void *SayHello(void *foo) {
    printf("Hello, world!\n");
    return NULL;
}

int main() {
    pthread_t threads[16];
    int tn;
    for (tn = 0; tn < 16; tn++) {
        pthread_create(&threads[tn], NULL, SayHello, NULL);
    }
    for (tn = 0; tn < 16; tn++) {
        pthread_join(threads[tn], NULL);
    }
    return 0;
}

Compile using gcc -lpthread.

CILK/CILK PLUS/CILK++
Programming languages that extend C and C++. Cilk was initially developed at MIT, based on ANSI C, and now belongs to Intel; its initial applications were in high performance computing.
Intel Cilk Plus keywords:
- cilk_spawn: specifies that a function call can execute asynchronously, without requiring the caller to wait for it to return. This expresses an opportunity for parallelism, not a command that mandates it; the Intel Cilk Plus runtime chooses whether to run the function in parallel with its caller.
- cilk_sync: specifies that all spawned calls in a function must complete before execution continues. There is an implied cilk_sync at the end of every function that contains a cilk_spawn.
- cilk_for: allows iterations of the loop body to be executed in parallel.
Cilk Plus also introduces "reducers", which provide a lock-free mechanism that lets parallel code use private "views" of a variable, merged at the next sync.

Example of Cilk Plus (uses the header file <cilk/cilk.h>):

int fib(int n) {
    if (n < 2) return n;
    int x = cilk_spawn fib(n-1);  // may run in parallel with the next line
    int y = fib(n-2);
    cilk_sync;                    // wait for the spawned call before using x
    return x + y;
}

Spawning in a loop:

for (int i = 0; i < 8; ++i) {
    cilk_spawn do_work(i);
}
cilk_sync;

or, equivalently, with a parallel loop:

cilk_for (int i = 0; i < 8; ++i) {
    do_work(i);
}
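A minimal sketch of the reducers mentioned on the previous slide, assuming the classic Cilk Plus reducer library (header <cilk/reducer_opadd.h>); each parallel strand updates a private view of sum, and the views are merged without locks.

#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>
#include <cstdio>

int main() {
    cilk::reducer_opadd<long> sum(0);  // each strand gets its own private view
    cilk_for (long i = 1; i <= 1000; ++i) {
        sum += i;                      // no lock, no data race
    }
    std::printf("sum = %ld\n", sum.get_value());  // 500500
    return 0;
}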

Thread Building Blocks (TBB)
A C++ template library developed by Intel for parallel programming on multi-core processors. TBB lets you specify tasks instead of threads and is compatible with other threading packages. A TBB program creates, synchronizes, and destroys graphs of dependent tasks according to high-level parallel programming paradigms (algorithmic skeletons). TBB emphasizes scalable, data-parallel programming and optimizes core utilization, though this may incur scheduling overhead.

TBB Components
- Basic algorithms: parallel_for, parallel_reduce, parallel_scan
- Advanced algorithms: parallel_while, parallel_do, parallel_pipeline, parallel_sort
- Containers: concurrent_queue, concurrent_priority_queue, concurrent_vector, concurrent_hash_map
- Memory allocation: scalable_malloc, scalable_free, scalable_realloc, scalable_calloc, scalable_allocator, cache_aligned_allocator
- Mutual exclusion: mutex, spin_mutex, queuing_mutex, spin_rw_mutex, queuing_rw_mutex, recursive_mutex
- Atomic operations: fetch_and_add, fetch_and_increment, fetch_and_decrement, compare_and_swap, fetch_and_store
TBB relies on generic programming and is similar in spirit to the Standard Template Library (STL).
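A minimal sketch of parallel_reduce from the list above, in its lambda form, assuming a plain vector sum; TBB recursively splits the range into subranges, sums each one, and combines the partial results.

#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <vector>
#include <cstdio>

int main() {
    std::vector<double> a(1000000, 1.0);

    double sum = tbb::parallel_reduce(
        tbb::blocked_range<size_t>(0, a.size()),   // range TBB splits into subranges
        0.0,                                       // identity value for the reduction
        [&](const tbb::blocked_range<size_t>& r, double partial) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                partial += a[i];                   // sum one subrange
            return partial;
        },
        [](double x, double y) { return x + y; }   // combine partial sums
    );

    std::printf("sum = %f\n", sum);
    return 0;
}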

High Performance Fortran
- An extension of Fortran 90 with constructs that support parallel computing
- Allows efficient implementation on both SIMD and MIMD style architectures
- Implicit parallelization (mapping, distribution, communication, synchronization)
- High productivity

Parallel Virtual Machine (PVM)
- Enables a collection of heterogeneous computers to be used as a coherent and flexible concurrent computational resource
- Supports software execution on each machine in a user-configurable pool
- Lets heterogeneous applications exploit the specific strengths of individual machines on a network
- Provides a dynamic resource manager and a set of powerful process control functions
- Fault tolerant (applications can survive host or task failures) and portable
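A minimal sketch of PVM-style process control and message passing, based on the classic PVM 3 C API (pvm_mytid, pvm_spawn, pvm_initsend, pvm_pkint, pvm_send, pvm_recv, pvm_upkint); the executable name "hello" and the message tag are illustrative assumptions.

#include <pvm3.h>
#include <stdio.h>

int main(void) {
    pvm_mytid();                               /* enroll this process in the virtual machine */
    int parent = pvm_parent();                 /* tid of the task that spawned us, if any */

    if (parent == PvmNoParent) {
        /* parent: spawn one copy of this program and wait for its message */
        int child;
        pvm_spawn("hello", NULL, PvmTaskDefault, "", 1, &child);
        pvm_recv(-1, 1);                       /* receive from any task, message tag 1 */
        int value;
        pvm_upkint(&value, 1, 1);              /* unpack one int from the message */
        printf("parent received %d\n", value);
    } else {
        /* child: pack an int and send it to the parent with tag 1 */
        int value = 42;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&value, 1, 1);
        pvm_send(parent, 1);
    }

    pvm_exit();                                /* leave the virtual machine */
    return 0;
}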

THANK YOU!