Lecture 4: Parallel Programming Models

Parallel Programming Models
- Data parallelism / Task parallelism
- Explicit parallelism / Implicit parallelism
- Shared memory / Distributed memory
- Other programming paradigms
  - Object-oriented
  - Functional and logic

Data Parallelism
- Parallel programs that emphasize concurrent execution of the same task on different data elements (data-parallel programs).
- Most programs for scalable parallel computers are data parallel in nature.

Task Parallelism
- Parallel programs that emphasize the concurrent execution of different tasks on the same or different data.
- Used for modularity reasons.
- Parallel programs structured as a task-parallel composition of data-parallel components are common.

[Figure: data parallelism compared with task parallelism]
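The distinction between the two styles can be sketched with POSIX threads. This is only an illustration under assumed names: the array `data`, the `slice` struct, and the functions `sum_slice`, `max_slice`, `data_parallel_sum`, and `task_parallel_sum_and_max` are hypothetical and do not come from the lecture.

```c
#include <pthread.h>
#include <stddef.h>

#define N 8

static const int data[N] = {3, 1, 4, 1, 5, 9, 2, 6};

/* A half-open index range [begin, end) plus a slot for the thread's result. */
struct slice { size_t begin, end; long result; };

/* Task A: sum the elements of a slice. */
static void *sum_slice(void *arg) {
    struct slice *s = arg;
    s->result = 0;
    for (size_t i = s->begin; i < s->end; i++) s->result += data[i];
    return NULL;
}

/* Task B: find the maximum element of a (non-empty) slice. */
static void *max_slice(void *arg) {
    struct slice *s = arg;
    s->result = data[s->begin];
    for (size_t i = s->begin + 1; i < s->end; i++)
        if (data[i] > s->result) s->result = data[i];
    return NULL;
}

/* Data parallelism: the SAME task runs on DIFFERENT halves of the data. */
long data_parallel_sum(void) {
    struct slice lo = {0, N / 2, 0}, hi = {N / 2, N, 0};
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_slice, &lo);
    pthread_create(&t2, NULL, sum_slice, &hi);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return lo.result + hi.result;
}

/* Task parallelism: DIFFERENT tasks run concurrently on the same data. */
void task_parallel_sum_and_max(long *sum, long *max) {
    struct slice a = {0, N, 0}, b = {0, N, 0};
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_slice, &a);
    pthread_create(&t2, NULL, max_slice, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    *sum = a.result;
    *max = b.result;
}
```

Both threads in the task-parallel version only read the shared array, so no locking is needed in this sketch.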

Explicit Parallelism
- The programmer directly specifies the activities of the multiple concurrent "threads of control" that form a parallel computation.
- Gives the programmer more control over program behavior and hence can be used to achieve higher performance.

Implicit Parallelism
- The programmer provides a high-level specification of program behavior.
- It is then the responsibility of the compiler or library to implement this parallelism efficiently and correctly.

Shared Memory
- The programmer's task is to specify the activities of a set of processes that communicate by reading and writing shared memory.
- Advantage: the programmer need not be concerned with data-distribution issues.
- Disadvantage: high-performance implementations may be difficult on computers that lack hardware support for shared memory, and race conditions tend to arise more easily.

Distributed Memory
- Processes have only local memory and must use some other mechanism (e.g., message passing or remote procedure call) to exchange information.
- Advantage: programmers have explicit control over data distribution and communication.

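Shared vs Distributed Memory
[Figure: in the shared-memory system, four processors (P) reach a single memory over a bus; in the distributed-memory system, four processors (P), each with its own local memory (M), are connected by a network.]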

Parallel Programming Tools
- Parallel Virtual Machine (PVM): distributed memory, explicit parallelism
- Message-Passing Interface (MPI): distributed memory, explicit parallelism
- PThreads: shared memory, explicit parallelism
- OpenMP: shared memory, explicit parallelism
- High-Performance Fortran (HPF): implicit parallelism
- Parallelizing compilers: implicit parallelism
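To give a flavor of one of these tools, here is a minimal OpenMP sketch in C. It estimates pi by applying the midpoint rule to the integral of 4/(1+x^2) over [0, 1]. The function name `pi_openmp` is an assumption made for illustration. If the compiler is not run with OpenMP enabled, the pragma is ignored and the loop simply runs sequentially.

```c
/* Midpoint-rule estimate of pi using n intervals of width w = 1/n. */
double pi_openmp(int n) {
    double w = 1.0 / n;
    double sum = 0.0;

    /* Each thread accumulates a private partial sum; OpenMP combines
     * the partial sums into `sum` when the loop finishes. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        double x = w * (i - 0.5);       /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return w * sum;
}
```

The programmer still marks the parallel loop explicitly, but thread creation and the distribution of iterations are left to the OpenMP runtime.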

Message Passing Model
- Used on distributed-memory MIMD architectures.
- Multiple processes execute in parallel asynchronously.
- Process creation may be static or dynamic.
- Processes communicate by using send and receive primitives.

- Blocking send: waits until all data is received.
- Non-blocking send: continues execution after placing the data in the buffer.
- Blocking receive: if data is not ready, waits until it arrives.
- Non-blocking receive: reserves a buffer and continues execution. A later wait operation copies the data into memory once it is ready.
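As a rough analogy only, the difference between waiting and not waiting for data can be tried out with an ordinary POSIX pipe standing in for a message channel. The helper names `open_channel`, `send_int`, and `try_receive` are hypothetical. Note one difference from the scheme described above: this sketch polls for data instead of posting a buffer and calling a later wait operation.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a channel: ch[0] is the receive end, ch[1] the send end.
 * The receive end is switched to non-blocking mode. Returns 0 on success. */
int open_channel(int ch[2]) {
    if (pipe(ch) != 0) return -1;
    int flags = fcntl(ch[0], F_GETFL);
    if (flags < 0) return -1;
    return fcntl(ch[0], F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Buffered send: returns as soon as the value sits in the pipe's buffer,
 * whether or not anyone has received it yet. Returns 0 on success. */
int send_int(int fd, int value) {
    return write(fd, &value, sizeof value) == (ssize_t)sizeof value ? 0 : -1;
}

/* Non-blocking receive attempt.
 * Returns 1 if a value was received, 0 if no data is available yet,
 * and -1 on error or if the channel was closed. */
int try_receive(int fd, int *value) {
    ssize_t n = read(fd, value, sizeof *value);
    if (n == (ssize_t)sizeof *value) return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return 0;
    return -1;
}
```

A receiver can call `try_receive` between pieces of useful work; a result of 0 means "nothing yet, keep computing", which is what lets communication overlap with computation.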

- Synchronous message passing: sender and receiver processes are synchronized (blocking send / blocking receive).
- Asynchronous message passing: no synchronization between sender and receiver processes. Large buffers are required; because buffer size is finite, the sender may eventually block.

Advantages of the message-passing model
- Programs are highly portable.
- Provides the programmer with explicit control over the location of data in memory.

Disadvantage of the message-passing model
- The programmer is required to pay attention to such details as the placement of data in memory and the ordering of communication.

Factors that influence the performance of the message-passing model
- Bandwidth
- Latency
- Ability to overlap communication with computation
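How these three factors interact can be made concrete with a common first-order cost model. The model and the symbols below (alpha for latency, beta for bandwidth, m for message size) are assumptions introduced for illustration and are not taken from the lecture.

```latex
% Time to transfer one message of m bytes:
T_{\mathrm{msg}}(m) \approx \alpha + \frac{m}{\beta}

% Total time of a step with computation time T_comp and communication time T_comm:
T_{\mathrm{no\ overlap}} = T_{\mathrm{comp}} + T_{\mathrm{comm}}
\qquad
T_{\mathrm{full\ overlap}} \approx \max\left(T_{\mathrm{comp}},\, T_{\mathrm{comm}}\right)

% Illustrative numbers: \alpha = 10\,\mu s, \beta = 10^{9} bytes/s
T_{\mathrm{msg}}(10^{6}\ \mathrm{bytes}) \approx 10\,\mu s + 1000\,\mu s = 1010\,\mu s
\qquad
T_{\mathrm{msg}}(8\ \mathrm{bytes}) \approx 10\,\mu s + 0.008\,\mu s \approx 10\,\mu s
```

Under this model, large messages are limited by bandwidth and small messages by latency, which is why sending a few large messages usually costs less than sending many small ones.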

Example: Pi calculation

$$\pi = \int_0^1 f(x)\,dx = \int_0^1 \frac{4}{1+x^2}\,dx \approx w \sum_{i=1}^{n} f(x_i)$$

where $f(x) = 4/(1+x^2)$, $n = 10$, $w = 1/n$, and $x_i = w(i - 0.5)$.

[Figure: f(x) plotted against x on [0, 1], with a sample point x_i marked.]
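For readers who want the missing step, the integral evaluates to pi because the antiderivative of 1/(1+x^2) is the arctangent. The sum is the midpoint rule: each x_i is the center of the i-th interval of width w.

```latex
\int_0^1 \frac{4}{1+x^2}\,dx
  = 4\left[\arctan x\right]_0^1
  = 4\left(\frac{\pi}{4} - 0\right)
  = \pi

% Midpoint rule with n intervals of width w = 1/n:
\pi \approx w \sum_{i=1}^{n} \frac{4}{1 + x_i^2},
\qquad x_i = w\,(i - 0.5)

% For n = 10: w = 0.1 and x_i = 0.05,\ 0.15,\ \ldots,\ 0.95
```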

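Sequential Code

    #include <stdio.h>

    /* Integrand: the integral of f over [0, 1] equals pi. */
    #define f(x) (4.0 / (1.0 + (x) * (x)))

    int main(void) {
        int n, i;
        double w, x, sum, pi;

        printf("n?\n");
        if (scanf("%d", &n) != 1 || n < 1) return 1;
        w = 1.0 / n;                 /* width of each interval */
        sum = 0.0;
        for (i = 1; i <= n; i++) {
            x = w * (i - 0.5);       /* midpoint of interval i */
            sum += f(x);
        }
        pi = w * sum;                /* pi is approximately w * sum of f(x_i) */
        printf("%f\n", pi);
        return 0;
    }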

Parallel PVM program

Master:
- Creates workers
- Sends initial values to workers
- Receives local "sum"s from workers
- Calculates and prints "pi"

Workers:
- Receive initial values from master
- Calculate local "sum"s
- Send local "sum"s to master

[Figure: Master connected to workers W0, W1, W2, W3]
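The master/worker exchange above can be sketched without PVM by using `fork` to create worker processes and POSIX pipes as the send/receive channels. This is a stand-in for illustration, not PVM code. The names `master_pi`, `local_sum`, and `MAX_WORKERS`, and the cyclic assignment of intervals to workers, are assumptions.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16

/* Worker's share: intervals rank+1, rank+1+nworkers, ... (cyclic split). */
static double local_sum(int n, int rank, int nworkers) {
    double w = 1.0 / n, sum = 0.0;
    for (int i = rank + 1; i <= n; i += nworkers) {
        double x = w * (i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    return sum;
}

/* Master: creates workers, sends n, receives local sums, returns pi.
 * Returns a negative value on failure. */
double master_pi(int n, int nworkers) {
    int down[MAX_WORKERS][2], up[MAX_WORKERS][2];
    if (n < 1 || nworkers < 1 || nworkers > MAX_WORKERS) return -1.0;

    for (int r = 0; r < nworkers; r++) {              /* create workers */
        if (pipe(down[r]) != 0 || pipe(up[r]) != 0) return -1.0;
        pid_t pid = fork();
        if (pid < 0) return -1.0;
        if (pid == 0) {                               /* worker process */
            int my_n;
            close(down[r][1]);
            close(up[r][0]);
            /* receive initial value from master */
            if (read(down[r][0], &my_n, sizeof my_n) != (ssize_t)sizeof my_n)
                _exit(1);
            double s = local_sum(my_n, r, nworkers);
            /* send local sum to master */
            if (write(up[r][1], &s, sizeof s) != (ssize_t)sizeof s)
                _exit(1);
            _exit(0);
        }
        close(down[r][0]);                            /* master keeps the */
        close(up[r][1]);                              /* opposite ends    */
    }

    for (int r = 0; r < nworkers; r++)                /* send initial values */
        if (write(down[r][1], &n, sizeof n) != (ssize_t)sizeof n) return -1.0;

    double sum = 0.0;
    for (int r = 0; r < nworkers; r++) {              /* receive local sums */
        double s;
        if (read(up[r][0], &s, sizeof s) != (ssize_t)sizeof s) return -1.0;
        sum += s;
        close(down[r][1]);
        close(up[r][0]);
    }
    while (wait(NULL) > 0) { }                        /* reap all children */
    return sum / n;                                   /* pi = w * sum */
}
```

Each worker has only its own copy of memory after `fork`, so the value of `n` and the local sums really do travel as messages, as they would between PVM tasks.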

Parallel Virtual Machine (PVM): Data Distribution
[Figure: two plots of f(x) against x on [0, 1], each with a sample point x_i marked.]
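The transcript does not say which distribution the slide used. One common choice, shown here purely as an assumption, is a block distribution in which each worker receives a contiguous run of intervals. The function name `block_range` is hypothetical.

```c
/* Block distribution: compute the half-open range [*lo, *hi) of interval
 * indices (0-based) owned by worker `rank` when n intervals are split
 * into nworkers contiguous blocks. The first (n % nworkers) workers
 * each receive one extra interval. */
void block_range(int n, int nworkers, int rank, int *lo, int *hi) {
    int base = n / nworkers;
    int extra = n % nworkers;
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

With n = 10 and four workers this yields the ranges [0, 3), [3, 6), [6, 8), and [8, 10). The usual alternative is a cyclic distribution, where worker `rank` takes every interval i with `i % nworkers == rank`.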

SPMD Parallel PVM program

Master:
- Creates workers
- Sends initial values to workers
- Receives "pi" from W0 and prints it

Workers:
- Receive initial values from master
- Calculate local "sum"s
- Workers other than W0: send local "sum"s to W0
- W0: receives local "sum"s from other workers, calculates "pi", and sends "pi" to master

[Figure: Master connected to workers W0, W1, W2, W3]

Shared Memory Model
- Used on shared-memory MIMD architectures.
- Program consists of many independent threads.
- Concurrently executing threads all share a single, common address space.
- Threads can exchange information by reading and writing to memory using normal variable assignment operations.
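A minimal sketch of this model using POSIX threads follows. It estimates pi with the midpoint rule on the integral of 4/(1+x^2) over [0, 1], and the threads communicate only through ordinary variables. The names `pi_threads`, `worker`, `partial`, `n_intervals`, and `NTHREADS` are assumptions made for illustration.

```c
#include <pthread.h>

#define NTHREADS 4

static int n_intervals;            /* shared: written by the main thread, read by workers */
static double partial[NTHREADS];   /* shared: one result slot per worker */

/* Each worker sums f(x_i) for i = rank+1, rank+1+NTHREADS, ... */
static void *worker(void *arg) {
    int rank = *(int *)arg;
    double w = 1.0 / n_intervals, sum = 0.0;
    for (int i = rank + 1; i <= n_intervals; i += NTHREADS) {
        double x = w * (i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    partial[rank] = sum;           /* plain assignment into shared memory */
    return NULL;
}

/* Midpoint-rule estimate of pi computed by NTHREADS threads. */
double pi_threads(int n) {
    pthread_t tid[NTHREADS];
    int rank[NTHREADS];

    n_intervals = n;               /* visible to threads created afterwards */
    for (int r = 0; r < NTHREADS; r++) {
        rank[r] = r;
        pthread_create(&tid[r], NULL, worker, &rank[r]);
    }

    double sum = 0.0;
    for (int r = 0; r < NTHREADS; r++) {
        pthread_join(tid[r], NULL);   /* after join, the worker's write is visible */
        sum += partial[r];
    }
    return sum / n;                   /* pi = w * sum */
}
```

No send or receive calls appear: the initial value and the partial sums move between threads simply because every thread sees the same variables. Each worker writes only its own slot of `partial`, so this sketch needs no lock.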

Memory Coherence Problem
- The problem: ensuring that the latest value of a variable updated in one thread is used when that same variable is accessed in another thread.
- Hardware support and compiler support are required.
- Cache-coherency protocol

[Figure: Thread 1 and Thread 2 both access a shared variable X.]

Distributed Shared Memory (DSM) Systems
- Implement the shared-memory model on distributed-memory MIMD architectures.
- Concurrently executing threads all share a single, common address space.
- Threads can exchange information by reading and writing to memory using normal variable assignment operations.
- Use a message-passing layer as the means for communicating updated values throughout the system.

Synchronization operations in the Shared Memory Model
- Monitors
- Locks
- Critical sections
- Condition variables
- Semaphores
- Barriers
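As one example from this list, a lock can protect a critical section so that concurrent updates to a shared variable are not lost. The sketch below uses a POSIX mutex. The names `run_counter`, `add`, `counter`, `NTHREADS`, and `INCREMENTS` are assumptions chosen for illustration.

```c
#include <pthread.h>

#define NTHREADS 4
#define INCREMENTS 100000

static long counter;                                     /* shared by all threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the shared counter INCREMENTS times. */
static void *add(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&lock);      /* enter critical section */
        counter++;                      /* read-modify-write on shared data */
        pthread_mutex_unlock(&lock);    /* leave critical section */
    }
    return NULL;
}

/* Runs NTHREADS threads and returns the final counter value. */
long run_counter(void) {
    pthread_t tid[NTHREADS];
    counter = 0;
    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, add, NULL);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    return counter;
}
```

Without the lock, two threads could read the same old value of `counter` and both write back old value plus one, losing an increment. That is the kind of race condition the shared-memory model makes easy to introduce.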

PThreads

In the UNIX environment a thread:
- Exists within a process and uses the process resources
- Has its own independent flow of control
- Duplicates only the essential resources it needs to be independently schedulable
- May share the process resources with other threads
- Dies if the parent process dies
- Is "lightweight" because most of the overhead has already been incurred when its process was created

PThreads

Because threads within the same process share resources:
- Changes made by one thread to shared system resources will be seen by all other threads.
- Two pointers having the same value point to the same data.
- Reading and writing to the same memory locations is possible, and therefore requires explicit synchronization by the programmer.