1 Parallel Programming Aaron Bloomfield CS 415 Fall 2005

2 Why Parallel Programming?
– Predict weather
– Predict spread of SARS
– Predict path of hurricanes
– Predict oil slick propagation
– Model growth of bio-plankton/fisheries
– Structural simulations
– Predict path of forest fires
– Model formation of galaxies
– Simulate nuclear explosions

3 Code that can be parallelized
do i = 1 to max
    a[i] = b[i] + c[i] * d[i]
end do
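
A C rendering of the same loop, to make the independence of the iterations concrete; the array names and the fixed size are assumptions for illustration, not code from the course.

    #define N 1000   /* "max" is assumed to be a compile-time constant here */

    void update(double a[N], const double b[N],
                const double c[N], const double d[N])
    {
        for (int i = 0; i < N; i++) {
            /* Each iteration reads and writes only its own index i,
               so no iteration depends on any other. */
            a[i] = b[i] + c[i] * d[i];
        }
    }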

4 Parallel Computers
Programming model types:
– Shared memory
– Message passing

5 Distributed Memory Architecture
– Each processor has direct access only to its local memory
– Processors are connected via high-speed interconnect
– Data structures must be distributed
– Data exchange is done via explicit processor-to-processor communication: send/receive messages
– Programming models
  – Widely used standard: MPI
  – Others: PVM, Express, P4, Chameleon, PARMACS, ...
[Diagram: processors P0, P1, ..., Pn, each with its own memory, connected by a communication interconnect]
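
As a concrete illustration of explicit send/receive messages, here is a minimal MPI sketch in C; the value being exchanged and the use of exactly two ranks are assumptions made for the example.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;   /* lives only in rank 0's local memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* the only way rank 1 can see the data is to receive it explicitly */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Run with at least two processes, e.g. mpirun -np 2 ./a.out.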

6 Message Passing Interface
MPI provides:
– Point-to-point communication
– Collective operations
  – Barrier synchronization
  – Gather/scatter operations
  – Broadcast, reductions
– Different communication modes
  – Synchronous/asynchronous
  – Blocking/non-blocking
  – Buffered/unbuffered
– Predefined and derived datatypes
– Virtual topologies
– Parallel I/O (MPI-2)
– C/C++ and Fortran bindings
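
A sketch of one collective operation from the list above, a sum reduction over all processes; the per-rank values are made up for illustration.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local = rank + 1;   /* each rank contributes one value */
        int total = 0;

        /* collective call: every rank participates, rank 0 receives the sum */
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %d\n", size, total);

        MPI_Finalize();
        return 0;
    }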

7 Shared Memory Architecture
– Processors have direct access to global memory and I/O through a bus or fast switching network
– Cache coherency protocol guarantees consistency of memory and I/O accesses
– Each processor also has its own memory (cache)
– Data structures are shared in a global address space
– Concurrent access to shared memory must be coordinated
– Programming models
  – Multithreading (thread libraries)
  – OpenMP
[Diagram: processors P0, P1, ..., Pn, each with a cache, connected to a global shared memory over a shared bus]
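
For the multithreading model, here is a small POSIX threads sketch in C (one possible thread library); the shared counter, the lock, and the thread count are all illustrative. It shows that every thread sees the same variable, so access has to be coordinated.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static long counter = 0;   /* shared: all threads see the same variable */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *work(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);     /* coordinate concurrent access */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, work, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);

        printf("counter = %ld\n", counter);   /* 400000 with the lock in place */
        return 0;
    }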

8 OpenMP
– OpenMP: portable shared memory parallelism
– Higher-level API for writing portable multithreaded applications
– Provides a set of compiler directives and library routines for parallel application programmers
– API bindings for Fortran, C, and C++
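
As a sketch, the loop from slide 3 might be parallelized with a single OpenMP directive in C; the array names, size, and initial values are carried over as assumptions.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        double a[N], b[N], c[N], d[N];

        for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0; d[i] = 3.0; }

        /* the directive asks the runtime to split the iterations among threads;
           the loop body itself is unchanged */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = b[i] + c[i] * d[i];

        printf("a[10] = %g, max threads = %d\n", a[10], omp_get_max_threads());
        return 0;
    }

Compile with an OpenMP-aware compiler, e.g. gcc -fopenmp.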

10 Approaches
– Parallel algorithms
– Parallel languages
– Message passing (low-level)
– Parallelizing compilers

11 Parallel Languages
– CSP: Hoare's notation for parallelism as a network of sequential processes exchanging messages.
– Occam: a real language based on CSP. Used for the transputer, in Europe.

12 Fortran for parallelism
– Fortran 90: array language. Triplet notation for array sections. Operations and intrinsic functions possible on array sections.
– High Performance Fortran (HPF): similar to Fortran 90, but includes data layout specifications to help the compiler generate efficient code.

13 More parallel languages
– ZPL: array-based language at UW. Compiles into C code (highly portable).
– C*: C extended for parallelism

14 Object-Oriented
– Concurrent Smalltalk
– Threads in Java, Ada, and thread libraries for use in C/C++
  – This uses a library of parallel routines

15 Functional
– NESL, Multilisp
– Id & Sisal (more dataflow)

16 Parallelizing Compilers
Automatically transform a sequential program into a parallel program.
1. Identify loops whose iterations can be executed in parallel.
2. Often done in stages.
Q: Which loops can be run in parallel?
Q: How should we distribute the work/data?

17 Data Dependences
– Flow dependence (RAW, Read-After-Write): a "true" dependence. Read a value after it has been written into a variable.
– Anti-dependence (WAR, Write-After-Read): write a new value into a variable after the old value has been read.
– Output dependence (WAW, Write-After-Write): write a new value into a variable and then later write another value into the same variable.

18 Example
1: A = 90;
2: B = A;
3: C = A + D;
4: A = 5;
Statements 1 → 2 and 1 → 3 are flow dependences (A is read after being written), 2 → 4 and 3 → 4 are anti-dependences (A is rewritten after being read), and 1 → 4 is an output dependence (A is written twice).

19 Dependencies
A parallelizing compiler must identify loops that do not have dependences BETWEEN ITERATIONS of the loop.
Example:
do I = 1, 1000
    A(I) = B(I) + C(I)
    D(I) = A(I)
end do

20 Example
– Fork one thread for each processor
– Each thread executes the loop:
do I = my_lo, my_hi
    A(I) = B(I) + C(I)
    D(I) = A(I)
end do
– Wait for all threads to finish before proceeding
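
A C sketch of that scheme using POSIX threads; the thread count, array size, and the even block split into my_lo..my_hi ranges are assumptions made for illustration.

    #include <pthread.h>

    #define N 1000
    #define NTHREADS 4        /* assume N divides evenly among the threads */

    static double A[N], B[N], C[N], D[N];   /* shared arrays */

    static void *chunk(void *arg)
    {
        int id = *(int *)arg;
        int lo = id * (N / NTHREADS);        /* my_lo */
        int hi = lo + (N / NTHREADS);        /* my_hi (exclusive) */

        for (int i = lo; i < hi; i++) {
            A[i] = B[i] + C[i];
            D[i] = A[i];
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        int id[NTHREADS];

        for (int i = 0; i < NTHREADS; i++) {   /* fork one thread per processor */
            id[i] = i;
            pthread_create(&t[i], NULL, chunk, &id[i]);
        }
        for (int i = 0; i < NTHREADS; i++)     /* wait for all threads to finish */
            pthread_join(t[i], NULL);
        return 0;
    }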

21 Another Example
do I = 1, 1000
    A(I) = B(I) + C(I)
    D(I) = A(I+1)
end do
Here iteration I reads A(I+1), which iteration I+1 later overwrites, so there is an anti-dependence between iterations and the loop cannot simply be run in parallel as written.
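
One way (not from the slides) that a compiler or programmer could remove this anti-dependence is to split the loop in two, so that all the old A(I+1) values are read before any A(I) is overwritten; each resulting loop is then free of cross-iteration dependences. A C sketch, with the array size and the extra element of A assumed:

    #define N 1000

    void transformed(double A[N + 1], const double B[N],
                     const double C[N], double D[N])
    {
        /* read all the old A values first ... */
        for (int i = 0; i < N; i++)
            D[i] = A[i + 1];          /* uses A before it is overwritten */

        /* ... then do all the writes; both loops now parallelize trivially */
        for (int i = 0; i < N; i++)
            A[i] = B[i] + C[i];
    }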

22 Yet Another Example
do I = 1, 1000
    A( X(I) ) = B(I) + C(I)
    D(I) = A( X(I) )
end do
Whether the iterations are independent now depends on the values in X: if two iterations share the same X(I), they touch the same element of A, and the compiler cannot tell this at compile time.

23 Parallel Compilers
Two concerns:
– Parallelizing code
  – The compiler will move code around to uncover parallel operations
– Data locality
  – If a parallel operation has to get data from another processor's memory, that's bad

24 Distributed computing
– Take a big task that has natural parallelism
– Split it up among many different computers across a network
– Examples: prime number searches, Google Compute, etc.
– Distributed computing is a form of parallel computing