CSCI-455/552 Introduction to High Performance Computing Lecture 11.

1 CSCI-455/552 Introduction to High Performance Computing Lecture 11

3 Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved. 4.9 Bucket Sort. One “bucket” is assigned to hold the numbers that fall within each of m equal regions of the interval. The numbers in each bucket are sorted using a sequential sorting algorithm. Sequential sorting time complexity: O(n log(n/m)). Works well if the original numbers are uniformly distributed across a known interval, say 0 to a - 1.
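The sequential scheme on this slide can be sketched as follows. This is a minimal illustration, not code from the slides: the function name `bucket_sort` and the integer-input assumption are mine, and Python's built-in sort stands in for the unspecified sequential sorting algorithm.

```python
def bucket_sort(numbers, m, a):
    """Sort integers in the range 0 .. a-1 using m equal-width buckets."""
    buckets = [[] for _ in range(m)]
    for x in numbers:
        # Index of the equal-width region of [0, a) that contains x.
        buckets[x * m // a].append(x)
    result = []
    for bucket in buckets:
        # Each bucket is sorted sequentially (here with the built-in sort).
        result.extend(sorted(bucket))
    return result

data = [29, 3, 72, 44, 8, 91, 15, 60]
print(bucket_sort(data, m=4, a=100))
```

With n numbers spread evenly, each bucket holds about n/m numbers, which is where the n log(n/m) term for the m sequential sorts comes from.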

5 4.10 Parallel Version of Bucket Sort. Simple approach: assign one processor for each bucket.

6 4.11 Further Parallelization. Partition the sequence into m regions, one region for each processor (so m = p). Each processor maintains p “small” buckets and separates the numbers in its region into its own small buckets. The small buckets are then emptied into p final buckets for sorting, which requires each processor to send one small bucket to each of the other processors (bucket i to processor i).
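The phases described on this slide can be simulated in a single Python process. This is only a sketch under stated assumptions: the "processors" are plain lists, the exchange of small buckets is a nested loop rather than real message passing, and the name `parallel_bucket_sort` is hypothetical.

```python
def parallel_bucket_sort(numbers, p, a):
    """Simulate the p-processor bucket sort on integers in 0 .. a-1."""
    n = len(numbers)
    # Phase 1: partition the sequence into p regions, one per simulated processor.
    size = (n + p - 1) // p
    regions = [numbers[i * size:(i + 1) * size] for i in range(p)]

    # Phase 2: each processor separates its region into its own p small buckets.
    small = [[[] for _ in range(p)] for _ in range(p)]
    for proc, region in enumerate(regions):
        for x in region:
            small[proc][x * p // a].append(x)

    # Phase 3: small bucket i of every processor is sent to processor i.
    final = [[] for _ in range(p)]
    for proc in range(p):
        for i in range(p):
            final[i].extend(small[proc][i])

    # Phase 4: each processor sorts its final bucket; concatenating them is sorted.
    return [x for bucket in final for x in sorted(bucket)]

print(parallel_bucket_sort([5, 1, 9, 3], p=2, a=10))
```

Phase 3 is the step where every processor communicates with every other processor.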

7 4.12 Another Parallel Version of Bucket Sort. Introduces a new message-passing operation: all-to-all broadcast.

9 4.13 “all-to-all” Broadcast Routine. Sends data from each process to every other process.

10 4.14 The “all-to-all” routine actually transfers rows of an array to columns: it transposes a matrix.
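The transpose view can be made concrete with a small simulation. The function `all_to_all` below is a hypothetical stand-in written for illustration; in MPI the corresponding collective is `MPI_Alltoall`, which this sketch does not call.

```python
def all_to_all(send):
    """send[i][j] is the item process i sends to process j.

    Returns recv, where recv[j][i] is the item process j received from process i.
    """
    p = len(send)
    return [[send[i][j] for i in range(p)] for j in range(p)]

# Row i holds what process i sends; column j is what process j should receive.
send = [["A0", "A1", "A2"],
        ["B0", "B1", "B2"],
        ["C0", "C1", "C2"]]
recv = all_to_all(send)
print(recv[0])  # ['A0', 'B0', 'C0']
```

Row i of `send` becomes column i of `recv`, which is exactly a matrix transpose.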

11 Parallel Bucket and Sample Sort. The critical aspect of the above algorithm is assigning ranges to processors. This is done by suitable splitter selection. The splitter selection method divides the n elements into p blocks of size n/p each and sorts each block using quicksort. From each sorted block it chooses p – 1 evenly spaced elements. The p(p – 1) elements selected from all the blocks represent the sample used to determine the buckets. This scheme guarantees that the number of elements ending up in each bucket is roughly uniform (less than 2n/p).

12 Parallel Bucket and Sample Sort. An example of the execution of sample sort on an array with 24 elements on three processes.

13 Parallel Bucket and Sample Sort. The splitter selection scheme can itself be parallelized. Each processor generates its p – 1 local splitters in parallel. All processors share their splitters using a single all-to-all broadcast operation. Each processor sorts the p(p – 1) elements it receives and selects p – 1 uniformly spaced splitters from them.
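Splitter selection for sample sort can be sketched as below, again simulated in one Python process. The slides do not say exactly which positions count as "evenly spaced", so the index formula `k * size // p` is an assumption, as are the helper names `local_splitters`, `global_splitters` and `bucket_sizes` and the requirement that p divides n.

```python
from bisect import bisect_right

def local_splitters(block, p):
    """Sort one process's block and pick p - 1 evenly spaced elements (assumed positions)."""
    block = sorted(block)
    size = len(block)
    return [block[k * size // p] for k in range(1, p)]

def global_splitters(data, p):
    """Simulate splitter selection for p processes (assumes p divides len(data))."""
    size = len(data) // p
    blocks = [data[i * size:(i + 1) * size] for i in range(p)]
    # Concatenating the local splitters stands in for the all-to-all broadcast,
    # after which every process holds all p(p - 1) sample elements.
    sample = sorted(s for block in blocks for s in local_splitters(block, p))
    # Every process picks the same p - 1 evenly spaced global splitters.
    return [sample[k * len(sample) // p] for k in range(1, p)]

def bucket_sizes(data, splitters):
    """Count how many elements fall into each of the len(splitters) + 1 buckets."""
    sizes = [0] * (len(splitters) + 1)
    for x in data:
        sizes[bisect_right(splitters, x)] += 1
    return sizes

data = [(7 * i) % 24 for i in range(24)]  # a permutation of 0..23
splitters = global_splitters(data, 3)
print(splitters, bucket_sizes(data, splitters))
```

For this 24-element input on three simulated processes, every bucket stays below 2n/p = 16 elements.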

14 4.11 Parallel Complexity Analysis

17 4.15 Numerical Integration Using Rectangles. Each region is calculated using an approximation given by rectangles. [Figure: aligning the rectangles.]
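A rectangle-based integration split across regions might look like the following sketch. The figure is not available here, so taking each rectangle's height at the midpoint of its base is an assumption about how the rectangles are aligned; the names `rectangle_area` and `integrate_in_regions` are hypothetical, and the regions are processed in a loop rather than by separate processes.

```python
def rectangle_area(f, a, b, n):
    """Approximate the integral of f over [a, b] with n rectangles (height at midpoint)."""
    delta = (b - a) / n
    return delta * sum(f(a + (i + 0.5) * delta) for i in range(n))

def integrate_in_regions(f, a, b, p, n):
    """Split [a, b] into p regions (one per simulated process) and add the partial areas."""
    width = (b - a) / p
    partial = [rectangle_area(f, a + r * width, a + (r + 1) * width, n)
               for r in range(p)]
    return sum(partial)

# Integral of x^2 over [0, 1]; the exact value is 1/3.
print(integrate_in_regions(lambda x: x * x, 0.0, 1.0, p=4, n=250))
```

Each region is independent, so the only communication a parallel version needs is the final sum of the partial areas.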

18 4.16 Numerical Integration Using Trapezoidal Method. May not be better!
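For comparison, a composite trapezoidal rule can be sketched as follows; the name `integrate_trapezoid` is hypothetical. One reading of "may not be better" is that, for smooth functions, the leading error term of the trapezoidal rule is about twice that of midpoint rectangles and of opposite sign, so the extra geometry does not automatically buy accuracy.

```python
def integrate_trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] with n trapezoids of equal width."""
    delta = (b - a) / n
    # The end points are shared by one trapezoid each, interior points by two.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * delta)
    return total * delta

# Integral of x^2 over [0, 1]; the exact value is 1/3.
print(integrate_trapezoid(lambda x: x * x, 0.0, 1.0, 1000))
```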

20 4.17 Adaptive Quadrature. The solution adapts to the shape of the curve. Use three areas, A, B, and C. Computation is terminated when the largest of A and B is sufficiently close to the sum of the remaining two areas.
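The figure defining A, B and C is not available here, so the sketch below swaps in a common variant of adaptive quadrature rather than the slide's exact test: it compares one trapezoid over the whole interval with the sum of the two half-interval trapezoids and subdivides only where they disagree. The name `adaptive_trapezoid`, the tolerance handling and the depth limit are all assumptions.

```python
def adaptive_trapezoid(f, a, b, tol=1e-8, depth=50):
    """Adaptively integrate f over [a, b], refining where the curve bends most."""
    mid = 0.5 * (a + b)
    whole = 0.5 * (b - a) * (f(a) + f(b))
    left = 0.5 * (mid - a) * (f(a) + f(mid))
    right = 0.5 * (b - mid) * (f(mid) + f(b))
    # Stop when refining the interval no longer changes the estimate noticeably.
    if depth == 0 or abs(left + right - whole) <= tol:
        return left + right
    # Otherwise recurse on both halves, splitting the tolerance between them.
    return (adaptive_trapezoid(f, a, mid, tol / 2, depth - 1)
            + adaptive_trapezoid(f, mid, b, tol / 2, depth - 1))

# Integral of x^2 over [0, 1]; the exact value is 1/3.
print(adaptive_trapezoid(lambda x: x * x, 0.0, 1.0))
```

Because the amount of subdivision depends on the curve, the work per region is not known in advance, which matters when regions are handed to different processes.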

21 4.18 Adaptive Quadrature with False Termination. Some care might be needed in choosing when to terminate. The termination test might cause us to terminate early, as two large regions are the same (i.e., C = 0).
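A false termination is easy to provoke. The sketch below is an illustration under assumptions, not the slide's own example: it uses a trapezoid-based test (one trapezoid over the whole interval against two half-interval trapezoids) and the hypothetical helper `first_level_estimates`. For sin²(x) on [0, 2π] the function is essentially zero at both ends and at the midpoint, so the very first test sees no difference and would stop, even though the true integral is π.

```python
import math

def first_level_estimates(f, a, b):
    """Return the whole-interval trapezoid and the two half-interval trapezoids."""
    mid = 0.5 * (a + b)
    whole = 0.5 * (b - a) * (f(a) + f(b))
    left = 0.5 * (mid - a) * (f(a) + f(mid))
    right = 0.5 * (b - mid) * (f(mid) + f(b))
    return whole, left, right

def f(x):
    return math.sin(x) ** 2

whole, left, right = first_level_estimates(f, 0.0, 2 * math.pi)
# The refinement changes nothing, so a naive test would terminate here...
print("change after refinement:", abs(left + right - whole))
# ...yet the estimate is far from the true value of pi.
print("error of the estimate:", abs(math.pi - (left + right)))
```

One common safeguard, offered here only as a suggestion, is to force a minimum number of subdivisions before the termination test is allowed to succeed.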

