PARALLEL PARADIGMS AND PROGRAMMING MODELS – 6th week

References
Parallel programming paradigms
Programmability issues
Parallel programming models
– Implicit parallelism
– Explicit parallel models
– Other programming models

REFERENCES
Scalable Parallel Computing: Technology, Architecture and Programming, Kai Hwang and Zhiwei Xu, Chapter 12
Parallel Processing Course – Yang-Suk Kee, School of EECS, Seoul National University

PARALLEL PROGRAMMING PARADIGMS
Parallel programming paradigms/models are the ways to
– Design a parallel program
– Structure the algorithm of a parallel program
– Deploy/run the program on a parallel computer system
Commonly used algorithmic paradigms [Ref.] (see the sketch after this list)
– Phase parallel
– Synchronous and asynchronous iteration
– Divide and conquer
– Pipeline
– Process farm
– Work pool
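As an illustration of the last paradigm in the list, the sketch below shows a minimal work pool in C with POSIX threads. It is added here for illustration only and is not from the original slides; the task function, the counts, and all names are assumptions.

    /* Work-pool sketch: a fixed pool of worker threads repeatedly grabs the
     * next task index from a shared, mutex-protected counter until the pool
     * of tasks is exhausted. All names and sizes are illustrative. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_TASKS   64
    #define NUM_WORKERS 4

    static int next_task = 0;                     /* shared work-pool cursor   */
    static double results[NUM_TASKS];             /* one result slot per task  */
    static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

    static double do_task(int id) {               /* stand-in for real work    */
        return id * 0.5;
    }

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&pool_lock);       /* claim the next task       */
            int id = (next_task < NUM_TASKS) ? next_task++ : -1;
            pthread_mutex_unlock(&pool_lock);
            if (id < 0) break;                    /* pool empty: worker retires */
            results[id] = do_task(id);
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid[NUM_WORKERS];
        for (int i = 0; i < NUM_WORKERS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NUM_WORKERS; i++)
            pthread_join(tid[i], NULL);
        printf("first result: %f\n", results[0]);
        return 0;
    }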

PARALLEL PROGRAMMABILITY ISSUES
The programmability of a parallel programming model is
– How easy the system is to use for developing and deploying parallel programs
– How well the system supports the various parallel algorithmic paradigms
Programmability is the combination of
– Structuredness
– Generality
– Portability

STRUCTUREDNESS
A program is structured if it is composed of structured constructs, each of which has these three properties:
– It is a single-entry, single-exit construct
– Different semantic entities are clearly identified
– Related operations are enclosed in one construct
Structuredness mostly depends on
– The programming language
– The design of the program

GENERALITY
A program class C is as general as, or more general than, a program class D if:
– For any program Q in D, we can write a program P in C
– Both P and Q have the same semantics
– P performs as well as or better than Q

PORTABILITY
A program is portable across a set of computer systems if it can be transferred from one machine to another with little effort
Portability largely depends on
– The language of the program
– The target machine's architecture
Levels of portability (from least to most portable):
1. Users must change the program's algorithm
2. Users only have to change the source code
3. Users only have to recompile and relink the program
4. Users can use the executable directly

PARALLEL PROGRAMMING MODELS
Widely accepted programming models are
– Implicit parallelism
– Data-parallel model
– Message-passing model
– Shared-variable model (shared address space model)

IMPLICIT PARALLELISM
The compiler and the run-time support system automatically exploit the parallelism in a sequential-looking program written by the user
Ways to implement implicit parallelism
– Parallelizing compilers
– User direction
– Run-time parallelization

PARALLELIZING COMPILER
A parallelizing (restructuring) compiler must
– Perform dependence analysis on a sequential program's source code
– Use transformation techniques to convert sequential code into native parallel code
Dependence analysis is the identification of
– Data dependence
– Control dependence

PARALLELIZING COMPILER (cont'd)
Data dependence example:
    X = X + 1
    Y = X + Y
Control dependence example:
    if f(X) = 1 then Y = Y + Z
When dependencies do exist, transformation/optimizing techniques should be used
– To eliminate those dependencies, or
– To make the code parallelizable, if possible

SOME OPTIMIZING TECHNIQUES FOR ELIMINATING DATA DEPENDENCIES
Privatization technique
Before (sequential):
    Do i=1,N
    P:  A    = ...
    Q:  X(i) = A + ...
        ...
    End Do
Q needs the value A produced by P, so the N iterations of the Do loop cannot be parallelized.
After privatization (parallel):
    ParDo i=1,N
    P:  A(i) = ...
    Q:  X(i) = A(i) + ...
        ...
    End Do
Each iteration of the Do loop has its own private copy A(i), so we can execute the Do loop in parallel.
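For comparison, the same idea can be expressed with OpenMP's private clause in C. This is an illustrative sketch added here, not part of the original slides; the loop body and the names x and a are assumptions.

    /* Privatization with OpenMP: each thread gets its own copy of the
     * scalar a, so the loop iterations become independent. */
    #include <omp.h>

    void privatized_loop(double *x, int n) {
        double a;
        #pragma omp parallel for private(a)
        for (int i = 0; i < n; i++) {
            a    = (i + 0.5) * 0.1;   /* P: private temporary, one copy per thread */
            x[i] = a + 1.0;           /* Q: uses only this iteration's value of a  */
        }
    }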

SOME OPTIMIZING TECHNIQUES FOR ELIMINATING DATA DEPENDENCIES (cont'd)
Reduction technique
Before (sequential):
    Do i=1,N
    P:  X(i) = ...
    Q:  Sum  = Sum + X(i)
        ...
    End Do
The Do loop cannot be executed in parallel, since computing Sum in the i-th iteration needs the value from the previous iteration.
After reduction (parallel):
    ParDo i=1,N
    P:  X(i) = ...
    Q:  Sum  = sum_reduce(X(i))
        ...
    End Do
A parallel reduction function is used to avoid the data dependency.
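The same transformation in OpenMP uses the reduction clause. This sketch is added for illustration and is not from the original slides; the function name is an assumption.

    /* Reduction with OpenMP: each thread accumulates a private partial sum,
     * and the partial sums are combined when the loop ends. */
    #include <omp.h>

    double reduce_sum(const double *x, int n) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum = sum + x[i];         /* Q: per-thread partial sum */
        return sum;
    }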

USER DIRECTION
The user helps the compiler parallelize the code by
– Providing additional information to guide the parallelization process
– Inserting compiler directives (pragmas) in the source code
The user is responsible for ensuring that the code is still correct after parallelization
Example (Convex Exemplar C):
    #pragma _CNX loop_parallel
    for (i = 0; i < 1000; i++) {
        A[i] = foo(B[i], C[i]);
    }
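The same kind of user directive in the widely used OpenMP notation would look roughly as follows. This is a sketch added for illustration, not from the slides; foo, A, B, and C are assumed placeholders.

    /* OpenMP equivalent of the directive above: the user asserts that the
     * loop iterations are independent, and remains responsible for that
     * assertion being true. */
    double foo(double b, double c) { return b + c; }

    void user_directed(double *A, const double *B, const double *C) {
        #pragma omp parallel for
        for (int i = 0; i < 1000; i++)
            A[i] = foo(B[i], C[i]);
    }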

RUN-TIME PARALLELIZATION
Parallelization involves both the compiler and the run-time system
– Additional constructs are used to decompose the sequential program into multiple tasks and to specify how each task will access data
– The compiler and the run-time system recognize and exploit parallelism at both compile time and run time
Example: the Jade language (Stanford University)
– More parallelism can be recognized
– Irregular and dynamic parallelism is exploited automatically

IMPLICIT PARALLELISM: A CONCLUSION
Advantages of the implicit programming model
– Ease of use for users (programmers)
– Reusability of old code and legacy sequential applications
– Faster application development time
Disadvantages
– The underlying run-time systems and parallelizing compilers are very complicated to implement and require a lot of research
– Research results show that automatic parallelization is not very efficient (achieving from 4% to 38% of the performance of parallel code written by experienced programmers)

EXPLICIT PROGRAMMING MODELS
– Data-parallel
– Message-passing
– Shared-variable

DATA-PARALLEL MODEL
– Applies to either SIMD or SPMD modes
– The same instruction or program segment executes over different data sets simultaneously
– Massive parallelism is exploited at the data-set level
– Has a single thread of control
– Has a global name space
– Applies loosely synchronous operation

DATA-PARALLEL MODEL: AN EXAMPLE
Example: a data-parallel program to compute the constant "pi"

    main() {
        long   i, j, t, N = 100000;
        double local[N], tmp[N], pi, w;
    A:  w = 1.0/N;
    B:  forall (i = 0; i < N; i++) {              /* data-parallel operations */
    P:      local[i] = (i + 0.5)*w;
    Q:      tmp[i]   = 4.0/(1.0 + local[i]*local[i]);
        }
    C:  pi = sum(tmp);                            /* reduction operation */
    D:  printf("pi is %f\n", pi*w);
    }   // end main

MESSAGE-PASSING MODEL
Multithreading: the program consists of multiple processes
– Each process has its own thread of control
– Both control parallelism (MPMD) and data parallelism (SPMD) are supported
Asynchronous parallelism
– All processes execute asynchronously
– Special operations must be used to synchronize processes
Multiple address spaces
– Data variables in one process are invisible to the others
– Processes interact by sending/receiving messages
Explicit interactions
– The programmer must resolve all interaction issues: data mapping, communication, synchronization, and aggregation
Explicit allocation
– Both workload and data are explicitly allocated to the processes by the user

MESSAGE-PASSING MODEL: AN EXAMPLE
Example: a message-passing (MPI) program to compute the constant "pi"

    #include <stdio.h>
    #include <mpi.h>
    #define N 100000                 /* same problem size as the data-parallel example */

    int main(int argc, char *argv[]) {
        double x, local = 0.0, pi, w;
        int i, taskid, numtask;
    A:  w = 1.0/N;
        MPI_Init(&argc, &argv);                       /* message-passing operations */
        MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
        MPI_Comm_size(MPI_COMM_WORLD, &numtask);
    B:  for (i = taskid; i < N; i = i + numtask) {
    P:      x     = (i + 0.5)*w;
    Q:      local = local + 4.0/(1.0 + x*x);          /* per-process partial sum */
        }
    C:  MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    D:  if (taskid == 0) printf("pi is %f\n", pi*w);
        MPI_Finalize();
        return 0;
    }   // end main
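A program like this is typically built with an MPI wrapper compiler and launched under a process manager. Assuming an MPICH- or Open MPI-style installation and a hypothetical file name pi_mpi.c, the usual commands are roughly:

    mpicc pi_mpi.c -o pi_mpi
    mpirun -np 4 ./pi_mpi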

SHARED-VARIABLE MODEL
– Has a single address space
– Is multithreaded and asynchronous
– Data reside in a single, shared address space and thus do not have to be explicitly allocated
– Workload can be implicitly or explicitly allocated
– Communication is done implicitly, through reading and writing shared variables
– Synchronization is explicit

SHARED-VARIABLE MODEL: AN EXAMPLE
Example: a shared-variable program (SGI Power C style pragmas) to compute the constant "pi"

    #define N 100000

    main() {
        double x, local = 0.0, pi = 0.0, w;
        long i;
    A:  w = 1.0/N;
    B:  #pragma parallel
        #pragma shared(pi, w)
        #pragma local(i, x, local)
        {
            #pragma pfor iterate (i=0; N; 1)
            for (i = 0; i < N; i++) {
    P:          x     = (i + 0.5)*w;
    Q:          local = local + 4.0/(1.0 + x*x);  /* per-thread partial sum */
            }
    C:      #pragma critical
            pi = pi + local;
        }
    D:  printf("pi is %f\n", pi*w);
    }   // end main
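For reference, here is a sketch of the same computation using OpenMP directives. It is added here as an illustration and is not part of the original slides.

    /* OpenMP sketch of the shared-variable pi computation: iterations are
     * shared among threads and partial sums are combined by a reduction. */
    #include <stdio.h>
    #define N 100000

    int main(void) {
        double pi = 0.0, w = 1.0/N;
        #pragma omp parallel for reduction(+:pi)
        for (long i = 0; i < N; i++) {
            double x = (i + 0.5) * w;
            pi += 4.0 / (1.0 + x * x);
        }
        printf("pi is %f\n", pi * w);
        return 0;
    }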

COMPARISON OF FOUR MODELS

    Issues                        | Implicit   | Data-parallel          | Message-passing     | Shared-variable
    Platform-independent examples | Kap, Forge | Fortran 90, HPF, HPC++ | PVM, MPI            | X3H5
    Platform-dependent examples   | –          | CM C*                  | SP2 MPL, Paragon Nx | Cray Craft, SGI Power C

The table also rates each model on: parallelism issues, allocation issues, communication, synchronization, aggregation, irregularity, termination, determinacy, correctness, generality, portability, and structuredness (the per-model rating marks are not recoverable from this transcript).

COMPARISON OF FOUR MODELS (cont'd)
Implicit parallelism
– Easy to use
– Can reuse existing sequential programs
– Programs are portable among different architectures
Data parallelism
– Programs are always deterministic and free of deadlocks/livelocks
– Some loosely synchronous programs are difficult to express

COMPARISON OF FOUR MODELS (cont'd)
Message-passing model
– More flexible than the data-parallel model
– Lacks support for the work-pool paradigm and for applications that need to manage a global data structure
– Is widely accepted
– Exploits large-grain parallelism and can be executed on machines with a native shared-variable model (multiprocessors: DSMs, PVPs, SMPs)
Shared-variable model
– No widely accepted standard → programs have low portability
– Programs are more difficult to debug than message-passing programs

OTHER PROGRAMMING MODELS
– Functional programming
– Logic programming
– Computing-by-learning
– Object-oriented programming