Copyright © 2009 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
Principles of Parallel Programming, First Edition, by Calvin Lin and Lawrence Snyder
Chapter 4: First Steps Toward Parallel Programming

Toward writing parallel programs
Build intuition for parallelism: when to parallelize, and when the overhead is too great.
Consider:
– Data allocation
– Work allocation
– Data structure design
– Algorithms

3 ways to formulate parallel computations
– Unlimited parallelism
– Fixed parallelism
– Scalable parallelism

2 classes of parallel algorithms
– Data parallel
– Task parallel

Data parallel
Perform the same computation on different data items at the same time. The parallelism grows as the data grows.
Example:
– P chefs prepare N meals
– Each chef prepares N/P meals
– As N increases, P can also increase, limited only by practical constraints
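Not from the text: a minimal C/OpenMP sketch of the data-parallel idea, where the same operation is applied to every element of an array and the iterations are divided among the available threads (the array and its contents are invented for illustration).

#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 16 };                      /* number of "meals" (data items) */
    double meals[N];
    for (int i = 0; i < N; i++) meals[i] = i;

    #pragma omp parallel for              /* same computation, different data items */
    for (int i = 0; i < N; i++)
        meals[i] = meals[i] * 2.0;        /* each "chef" handles a share of the meals */

    printf("meals[%d] = %.1f\n", N - 1, meals[N - 1]);
    return 0;
}

As N grows, the same code exposes more parallelism; only the thread count limits how much of it is exploited.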

Task parallel
Perform distinct computations at the same time. The number of tasks is typically fixed, so task parallelism by itself is not scalable.
Example:
– One chef for the salad, one for the dessert, one for the appetizer
– There are dependencies among the tasks
– Pipelining is used to overlap them
A hybrid of data and task parallelism is often used.
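Again not from the text: a hedged C/OpenMP sketch of task parallelism using sections, one per distinct computation; the three task functions are illustrative placeholders for the chefs.

#include <stdio.h>
#include <omp.h>

static void make_salad(void)     { printf("salad ready\n"); }
static void make_dessert(void)   { printf("dessert ready\n"); }
static void make_appetizer(void) { printf("appetizer ready\n"); }

int main(void) {
    #pragma omp parallel sections        /* distinct computations at the same time */
    {
        #pragma omp section
        make_salad();
        #pragma omp section
        make_dessert();
        #pragma omp section
        make_appetizer();
    }
    return 0;
}

The number of sections is fixed in the source code, which is exactly why task parallelism alone does not scale with the problem size.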

Pseudocode: Peril-L
– Minimal and easy to learn
– Not tied to any particular programming language
– Allows reasoning about performance
– Extends C

Peril-L threads

forall (i in (1..12))
  printf("Hello %i\n", i);

Prints 12 Hello's, in random order. The threads compete and execute in parallel.
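For readers more comfortable with C, a rough OpenMP analogue of this forall (an analogy, not Peril-L itself): the iterations run concurrently, so the greeting lines appear in an unpredictable order.

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel for             /* roughly: forall (i in (1..12)) */
    for (int i = 1; i <= 12; i++)
        printf("Hello %d\n", i);         /* order of lines is nondeterministic */
    return 0;
}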

Peril-L exclusive
– Only one thread at a time executes the body:

forall (i in (1..12)) {
  exclusive {
    printf("Hello %i\n", i);
  }
}

barrier
– Forces all threads to stop at the barrier until every thread has arrived, at which point they all continue
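In C/OpenMP terms (an analogue, not the book's notation), the exclusive block above corresponds roughly to a critical section: the iterations still run in parallel, but only one thread at a time executes the guarded statement.

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel for
    for (int i = 1; i <= 12; i++) {
        #pragma omp critical             /* roughly Peril-L's exclusive { ... } */
        printf("Hello %d\n", i);         /* one thread in the body at a time */
    }
    return 0;
}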

Peril-L barrier
– All threads wait for all to arrive, then continue:

forall (i in (1..12)) {
  printf("tweedle dee\n");
  barrier;
  printf("tweedle dum\n");
}

All of the tweedle dee's print before any of the tweedle dum's.
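A C/OpenMP analogue of the barrier (again an analogy, not Peril-L): every thread prints the first line, waits, and only then prints the second.

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(12)
    {
        printf("tweedle dee\n");
        #pragma omp barrier              /* all threads wait here for each other */
        printf("tweedle dum\n");         /* no "dum" appears before every "dee" */
    }
    return 0;
}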

Peril-L memory model
Global variables
– Visible to all threads
– Declared outside a forall
– Written underlined
Local variables
– Visible only to the owning thread
– Declared inside a forall
– Not underlined
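As a loose analogy (my assumption, not a statement from the text), Peril-L's global/local distinction resembles OpenMP's shared/private variables:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int global_total = 0;                      /* shared: visible to all threads */
    #pragma omp parallel
    {
        int local_id = omp_get_thread_num();   /* private: visible only to this thread */
        #pragma omp atomic
        global_total += local_id;              /* updates to shared data must be coordinated */
    }
    printf("sum of thread ids = %d\n", global_total);
    return 0;
}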

Peril-L reads and writes
– Multiple threads may read a location concurrently
– Concurrent writes are allowed but create a race condition; one write wins (the last write)

Connecting global and local memory
Global memory is distributed across the local memories. localize() takes a thread's portion of a global array and makes it local:

int allData[n];                            // global array
forall (thdID in (0..P-1)) {               // spawn P threads
  int size = n/P;                          // size of each thread's allocation
  int locData[size] = localize(allData[]); // map this thread's share of the global array to locals
}
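Not the book's code: a C/OpenMP sketch of the same block decomposition, in which each thread works through a pointer into its own contiguous slice of the global array (assuming, for simplicity, that the thread count divides N).

#include <stdio.h>
#include <omp.h>

#define N 16

int main(void) {
    int allData[N];                            /* the "global" array */
    for (int i = 0; i < N; i++) allData[i] = 1;

    #pragma omp parallel
    {
        int P     = omp_get_num_threads();
        int thdID = omp_get_thread_num();
        int size  = N / P;                     /* size of each allocation (assume P divides N) */
        int *locData = &allData[thdID * size]; /* this thread's slice, in the spirit of localize() */
        for (int i = 0; i < size; i++)
            locData[i] *= 2;                   /* local indices start at 0 */
    }
    printf("allData[0] = %d\n", allData[0]);
    return 0;
}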

Connecting global and local memory (cont.)
Modifying the localized data is the same as modifying the global data, but without the λ delay of accessing non-local memory.

Issues in localizing global memory
– Localized global arrays use local indices, which start at 0
– Multiple threads on one processor each keep their data local to the thread
– There is no separate local copy; the local and global names refer to the same memory locations

Handy functions
size = mySize(global, i)
– Returns the size of the ith dimension of the local portion of the global array
localToGlobal(locData, i, j)
– Returns the global index corresponding to the ith index in the jth dimension of the local array locData
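mySize and localToGlobal are Peril-L built-ins, not C functions. Purely as a hypothetical illustration of the arithmetic such helpers perform for a 1-D block distribution (the names, signatures, and distribution rule below are my assumptions, not the book's definitions):

#include <stdio.h>

/* Hypothetical: size of thread t's block when n items are split across P threads. */
static int my_size(int n, int P, int t) {
    int base = n / P, extra = n % P;
    return base + (t < extra ? 1 : 0);       /* first 'extra' threads get one more item */
}

/* Hypothetical: global index corresponding to local index i in thread t's block. */
static int local_to_global(int n, int P, int t, int i) {
    int base = n / P, extra = n % P;
    int start = t * base + (t < extra ? t : extra);
    return start + i;
}

int main(void) {
    printf("thread 2 owns %d items; its local index 0 is global index %d\n",
           my_size(10, 4, 2), local_to_global(10, 4, 2, 0));
    return 0;
}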

Full/empty (FE) variables – synchronization
– Combine a value with a full/empty synchronization state; the semantics are given in Table 4.1 (next slide)
– Accessing one incurs overhead, like global memory access (λ)

int t' = 0;   // declare FE variable t and fill it with 0
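Full/empty variables are a Peril-L language feature with no direct C equivalent. The sketch below is one way (my assumption, using POSIX threads) to emulate the usual full/empty behavior: a write blocks while the variable is full and leaves it full; a read blocks while it is empty and leaves it empty.

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int value;
    bool full;                               /* the synchronization state */
    pthread_mutex_t lock;
    pthread_cond_t  changed;
} fe_int;

#define FE_INT_INIT { 0, false, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

void fe_write(fe_int *v, int x) {            /* wait until empty, store, mark full */
    pthread_mutex_lock(&v->lock);
    while (v->full)
        pthread_cond_wait(&v->changed, &v->lock);
    v->value = x;
    v->full = true;
    pthread_cond_broadcast(&v->changed);
    pthread_mutex_unlock(&v->lock);
}

int fe_read(fe_int *v) {                     /* wait until full, take value, mark empty */
    pthread_mutex_lock(&v->lock);
    while (!v->full)
        pthread_cond_wait(&v->changed, &v->lock);
    int x = v->value;
    v->full = false;
    pthread_cond_broadcast(&v->changed);
    pthread_mutex_unlock(&v->lock);
    return x;
}

static fe_int t_fe = FE_INT_INIT;            /* stands in for t; note it starts empty here */

static void *producer(void *arg) { (void)arg; fe_write(&t_fe, 42); return NULL; }

int main(void) {                             /* compile with -pthread */
    pthread_t p;
    pthread_create(&p, NULL, producer, NULL);
    int got = fe_read(&t_fe);                /* blocks until the producer fills t_fe */
    pthread_join(p, NULL);
    return got == 42 ? 0 : 1;
}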

Table 4.1 Semantics of full/empty variables.

Reduce / Scan
Reduce combines a set of values to produce a single value.
– Written with /
– +/count   // add the elements of count
Scan is a parallel prefix computation; it embodies the logic of performing a sequential operation in parts while carrying along the intermediate results.
– Written with \
– min\items   // prefix minima: the smallest value in each prefix of items
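A C/OpenMP analogue (not Peril-L): the reduction clause plays the role of +/count, and a short loop computes the inclusive prefix minima that min\items denotes; the scan is written sequentially here purely for clarity, and the data are invented.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int count[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int items[8] = {5, 3, 7, 2, 9, 2, 8, 1};

    int total = 0;                           /* reduce: roughly +/count */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < 8; i++)
        total += count[i];

    int prefix_min[8];                       /* scan: roughly min\items */
    prefix_min[0] = items[0];
    for (int i = 1; i < 8; i++)
        prefix_min[i] = items[i] < prefix_min[i - 1] ? items[i] : prefix_min[i - 1];

    printf("total = %d, prefix_min[7] = %d\n", total, prefix_min[7]);
    return 0;
}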

Additional example

least = min/dataArray;   // the smallest element, stored in the local variable least in each thread

Reduce and scan can combine values across multiple threads.

More examples – reduce

count is local to each thread:

total = +/count;

The local counts are combined into a single result, which is stored in total in every thread.

More examples – scan

count is local to each thread:

beforeMe = +\count;

The count values are accumulated so that the ith thread's beforeMe is assigned the sum of the first i count values (the counts of the threads that precede it).
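A C/OpenMP sketch of the same pattern (my construction, not the book's code): each thread publishes a local count into a shared array, and after a barrier each thread computes its own beforeMe as the sum of the earlier threads' counts.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int counts[8] = {0};                     /* one slot per thread */

    #pragma omp parallel num_threads(8)
    {
        int t = omp_get_thread_num();
        counts[t] = t + 1;                   /* stand-in for a locally computed count */
        #pragma omp barrier                  /* every count must be published first */

        int beforeMe = 0;                    /* sum of the counts of earlier threads */
        for (int i = 0; i < t; i++)
            beforeMe += counts[i];

        #pragma omp critical
        printf("thread %d: beforeMe = %d\n", t, beforeMe);
    }
    return 0;
}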

Implied reduce/scan synchronization
Consider:

largest = max/localTotal;

All threads must arrive at this statement before the reduction can be performed, and threads proceed only after the assignment completes.

Programming consideration

exclusive { total += priv_count; }   // combining done serially

versus

total = +/priv_count;                // combining done with a tree structure

Using reduce converts the combining step from O(P) to O(lg P).
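In C/OpenMP terms (an analogue, not the book's notation), the same trade-off is the difference between funneling every update through a critical section and letting a reduction combine per-thread partial sums at the end.

#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 1000000 };
    long serial_total = 0, reduced_total = 0;

    #pragma omp parallel for                 /* every addition serializes on one lock */
    for (int i = 0; i < N; i++) {
        #pragma omp critical
        serial_total += 1;
    }

    #pragma omp parallel for reduction(+:reduced_total)
    for (int i = 0; i < N; i++)              /* private partials, combined at the end */
        reduced_total += 1;

    printf("%ld %ld\n", serial_total, reduced_total);
    return 0;
}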

Figure 4.1 The Count 3s computation (Try 3) written in the Peril-L notation.
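The Peril-L figure itself is not reproduced in this transcript. As a stand-in, here is a hedged C/OpenMP sketch in the spirit of that solution (each thread counts its share privately and the private counts are then combined); it is not the book's code, and the input array is invented.

#include <stdio.h>
#include <omp.h>

#define N 16

int main(void) {
    int array[N] = {3, 1, 3, 2, 3, 3, 0, 7, 3, 5, 3, 3, 9, 3, 3, 3};
    int count = 0;

    #pragma omp parallel
    {
        int priv_count = 0;                  /* per-thread private count */
        #pragma omp for
        for (int i = 0; i < N; i++)
            if (array[i] == 3)
                priv_count++;
        #pragma omp atomic                   /* combine the private counts */
        count += priv_count;
    }

    printf("number of 3s: %d\n", count);
    return 0;
}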

Formulating parallelism
Fixed parallelism
– Code is written for a particular machine
– Improving the machine may not increase the parallelism
Unlimited parallelism
– Use forall (i in (0..n-1))
– Will use whatever resources are available
– Will require substantial thread communication

Figure 4.2 Fixed Parallelism solution to Count 3s (t = 4).

Formulating parallelism (cont.)
Scalable parallelism proceeds as follows:
– Determine how the components (data structures, work load, etc.) grow as n increases
– Formulate a set S of substantial subproblems, assigning natural units of the solution to each
– Solve each subproblem independently
This approach exploits locality.

Figure 4.3 Scalable Parallelism solution to Count 3s. Notice that the array segment has been localized.

Table 4.2 Helper functions.

Figure 4.4 Odd/Even Interchange to alphabetize a list L of records on field x.
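The figure's Peril-L program is not reproduced in this transcript. As a rough illustration only, here is a sequential C sketch of the odd/even interchange idea applied to an array of strings (the record type and the field x are simplified away, and the list is invented); within each phase the compared pairs are disjoint, which is what makes the exchanges parallelizable.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define N 6

int main(void) {
    const char *L[N] = {"pear", "apple", "fig", "date", "cherry", "banana"};

    bool swapped = true;
    while (swapped) {                                  /* repeat until a full even+odd pass makes no swaps */
        swapped = false;
        for (int phase = 0; phase < 2; phase++) {      /* even phase, then odd phase */
            for (int i = phase; i + 1 < N; i += 2) {   /* disjoint, independent pairs */
                if (strcmp(L[i], L[i + 1]) > 0) {
                    const char *tmp = L[i];
                    L[i] = L[i + 1];
                    L[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }

    for (int i = 0; i < N; i++)
        printf("%s\n", L[i]);
    return 0;
}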

Figure 4.5 Fixed 26-way parallel solution to alphabetizing. The function letRank(x) returns the 0-origin rank of the Latin letter x.

Figure 4.6

Table 4.3 Merge operations.

Figure 4.7 Peril-L program using Batcher's sort to alphabetize records in L.

Figure 4.7 Peril-L program using Batcher's sort to alphabetize records in L. (cont.)