CS4402 – Parallel Computing, Lecture 7: Simple Parallel Sorting and Parallel Merge Sort

1 CS4402 – Parallel Computing Lecture 7 - Simple Parallel Sorting. - Parallel Merge Sort.

2 Sorting. Rearrange the elements of a = (a[i], i=0,1,…,n-1) into increasing or decreasing order. Several sequential algorithms are available:
- Internal sorting swaps elements and does not use extra memory.
- External sorting uses extra (external) memory, e.g. linear sorting.
- Complexity O(n^2) for the simple, non-optimal algorithms: counting (rank), bubble, sequential insertion, etc.
- Complexity O(n*log n) for the optimal algorithms: quick, merge, binary insertion, etc.
- Linear complexity O(n) when the array has some special properties.

6 Rank Sort – Naive Description. For each element, count how many elements are smaller; that count is the element's final position. The slide's double loop, with the missing closing brace restored:

    for (i = 0; i < n; i++) {
        for (rank[i] = 0, j = 0; j < n; j++) {
            if (a[i] > a[j]) rank[i]++;
        }
    }
    for (i = 0; i < n; i++) {
        b[rank[i]] = a[i];
    }
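One caveat this version hides: with duplicate keys, two equal elements receive the same rank and collide in b. A common fix, an assumption here rather than anything on the slide, breaks ties by index, which also keeps the sort stable:

    for (i = 0; i < n; i++) {
        for (rank[i] = 0, j = 0; j < n; j++) {
            /* count strictly smaller elements, plus equal ones appearing earlier */
            if (a[j] < a[i] || (a[j] == a[i] && j < i)) rank[i]++;
        }
    }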

7 Rank Sort – MPI Implementation. Using a parallel machine with size processors. Some remarks:
1. Each processor must know the whole array.
2. The counting process is then partitioned across the processors.
3. Each processor counts ranks for only a chunk of the array.
4. Processor rank generates the array ranking = (ranking[i], i=0,…,n/size-1), where ranking[i] = the rank of a[rank*n/size+i] in a.
5. The ranking arrays are gathered and then the array b is restored.

8 MPI_Rank_sort(int n, int * a, int root, MPI_Comm comm). This MPI function must have the following steps:
1. Bcast the whole array to the processors.
2. Generate the array ranking = (ranking[i], i=0,…,n/size-1).
3. Gather the arrays ranking on processor root.
4. If root, then generate/restore the array b.
Question: can we avoid the serial restore step? If yes, at what price?
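A minimal sketch of these four steps in C with MPI; it assumes n is divisible by size, distinct keys (or the tie-breaking test above), and that root already holds a. The names ranking and b follow the slide; everything else is illustrative.

    #include <mpi.h>
    #include <stdlib.h>

    void MPI_Rank_sort(int n, int *a, int root, MPI_Comm comm) {
        int rank, size, i, j;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        /* 1. broadcast the whole array */
        MPI_Bcast(a, n, MPI_INT, root, comm);

        /* 2. rank only the local chunk a[rank*n/size .. (rank+1)*n/size - 1] */
        int *ranking = malloc(n / size * sizeof(int));
        for (i = 0; i < n / size; i++) {
            ranking[i] = 0;
            for (j = 0; j < n; j++)
                if (a[j] < a[rank * n / size + i]) ranking[i]++;
        }

        /* 3. gather the partial rankings on root */
        int *all = NULL;
        if (rank == root) all = malloc(n * sizeof(int));
        MPI_Gather(ranking, n / size, MPI_INT, all, n / size, MPI_INT, root, comm);

        /* 4. root restores the sorted array b and copies it back */
        if (rank == root) {
            int *b = malloc(n * sizeof(int));
            for (i = 0; i < n; i++) b[all[i]] = a[i];
            for (i = 0; i < n; i++) a[i] = b[i];
            free(b); free(all);
        }
        free(ranking);
    }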

9 Linear Sort: Suppose that the array a = (a[i], i=0,…,n-1) contains only integers in 0,1,…,m-1. In this case we can count how many times each value j = 0,1,…,m-1 occurs in a, and then reuse that information to regenerate the array in order. Example: a = (2,1,3,2,1,3,0,1,1,2,0,3,1); count[0]=2, count[1]=5, count[2]=3, count[3]=3; a is restored with 2 0-s, 5 1-s, 3 2-s and 3 3-s: a = (0,0,1,1,1,1,1,2,2,2,3,3,3).

10 Linear Sort:

    // reset the counters
    for (j = 0; j < m; j++) count[j] = 0;
    // generate the counters
    for (i = 0; i < n; i++) count[a[i]]++;
    // restore the array in order based on the counters (note i restarts from 0)
    for (i = 0, j = 0; j < m; j++)
        for (k = 0; k < count[j]; k++) a[i++] = j;

Complexity is O(n + m).

11 MPI_Linear_sort(int n, int * a, int m, int root, MPI_Comm comm). The MPI routine should:
1. Scatter the array a onto the processors.
2. Count on the scattered sub-arrays.
3. Sum-reduce the count arrays on all processors.
4. If root, restore the array.
The linear complexity makes this computation perhaps unsuitable for parallel computation: there is little local work to offset the communication.
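A minimal sketch under the same assumptions as before (n divisible by size, values in 0..m-1; headers as in the rank-sort sketch):

    void MPI_Linear_sort(int n, int *a, int m, int root, MPI_Comm comm) {
        int rank, size, i, j, k;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        /* 1. scatter: each processor receives a chunk of n/size elements */
        int *chunk = malloc(n / size * sizeof(int));
        MPI_Scatter(a, n / size, MPI_INT, chunk, n / size, MPI_INT, root, comm);

        /* 2. count the occurrences in the local chunk */
        int *count = calloc(m, sizeof(int));
        for (i = 0; i < n / size; i++) count[chunk[i]]++;

        /* 3. sum-reduce the counters on all processors */
        int *total = malloc(m * sizeof(int));
        MPI_Allreduce(count, total, m, MPI_INT, MPI_SUM, comm);

        /* 4. root rewrites a in sorted order from the global counters */
        if (rank == root)
            for (i = 0, j = 0; j < m; j++)
                for (k = 0; k < total[j]; k++) a[i++] = j;

        free(total); free(count); free(chunk);
    }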

12 Bucket Sort: Suppose that array = (array[i], i=0,…,n-1) has all its elements in the interval [0, a]. Use multiple buckets / collectors to filter the elements into the buckets, then sort each bucket and concatenate them.

15 Bucket Sort:

    // empty the buckets
    for (j = 0; j < m; j++) bucket[j] = empty;
    // sweep the array and collect each element in the correct bucket;
    // with m buckets of width amax/m over [0, amax] (amax is the slide's
    // interval bound a), the bucket index is a[i] / (amax/m) = a[i]*m/amax
    for (i = 0; i < n; i++) {
        bucket_id = (int)(a[i] * m / amax);
        push(a[i], bucket[bucket_id]);
    }
    // sort all buckets
    for (j = 0; j < m; j++) sort(bucket[j]);
    // append all buckets in order
    for (j = 0; j < m; j++) push(a, bucket[j]);
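A concrete version of this pseudocode, as a sketch only: fixed-capacity buckets, qsort per bucket, and amax for the interval bound; all names are illustrative.

    #include <stdlib.h>

    static int cmp_int(const void *x, const void *y) {
        int u = *(const int *)x, v = *(const int *)y;
        return (u > v) - (u < v);
    }

    /* sort a[0..n-1], values in [0, amax), using m buckets */
    void bucket_sort(int *a, int n, int m, int amax) {
        int **bucket = malloc(m * sizeof(int *));
        int *cnt = calloc(m, sizeof(int));
        int i, j, k, id;
        for (j = 0; j < m; j++) bucket[j] = malloc(n * sizeof(int)); /* worst case */
        for (i = 0; i < n; i++) {                    /* sweep and collect */
            id = (int)((long long)a[i] * m / amax);
            bucket[id][cnt[id]++] = a[i];
        }
        for (j = 0, i = 0; j < m; j++) {             /* sort, then append in order */
            qsort(bucket[j], cnt[j], sizeof(int), cmp_int);
            for (k = 0; k < cnt[j]; k++) a[i++] = bucket[j][k];
            free(bucket[j]);
        }
        free(bucket); free(cnt);
    }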

18 MPI_Bucket_sort(int n, int * a, int, int root, MPI_Comm comm). The MPI routine should:
1. Bcast the array a to the processors.
2. Each processor collects the elements of bucket rank from the array.
3. Bucket rank is sorted.
4. The sorted buckets are then gathered to root.
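A sketch of these steps, assuming one bucket per processor (so the number of buckets equals size), values in [0, amax) with amax standing in for the slide's unnamed third parameter, and MPI_Gatherv to handle the unequal bucket sizes (cmp_int as defined above):

    void MPI_Bucket_sort(int n, int *a, int amax, int root, MPI_Comm comm) {
        int rank, size, i, cnt = 0;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        /* 1. broadcast the whole array */
        MPI_Bcast(a, n, MPI_INT, root, comm);

        /* 2. keep only the elements that fall into bucket `rank` */
        int *bucket = malloc(n * sizeof(int));
        for (i = 0; i < n; i++)
            if ((long long)a[i] * size / amax == rank) bucket[cnt++] = a[i];

        /* 3. sort the local bucket */
        qsort(bucket, cnt, sizeof(int), cmp_int);

        /* 4. gather the variable-sized buckets on root, in bucket order */
        int *counts = NULL, *displs = NULL;
        if (rank == root) {
            counts = malloc(size * sizeof(int));
            displs = malloc(size * sizeof(int));
        }
        MPI_Gather(&cnt, 1, MPI_INT, counts, 1, MPI_INT, root, comm);
        if (rank == root)
            for (displs[0] = 0, i = 1; i < size; i++)
                displs[i] = displs[i-1] + counts[i-1];
        MPI_Gatherv(bucket, cnt, MPI_INT, a, counts, displs, MPI_INT, root, comm);

        free(bucket);
        if (rank == root) { free(counts); free(displs); }
    }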

19 Strategies for || Algorithms: Divide and Conquer

20 Main Strategies to Develop Parallel Algorithms: partitioning, divide and conquer, pipelining, etc. Partitioning is the most popular strategy:
- the problem is split into several parts;
- the results are combined to obtain the final result.
Data is partitioned → domain decomposition. Computation is partitioned → functional decomposition.
Remarks:
- Embarrassingly parallel computation uses partitioning.
- The simplest partitioning is when # processors = # parts.
Divide and conquer is recursive partitioning, applied until the parts become small enough.

21 The Summation Problem. Find the sum of the array x[0], x[1], …, x[n-1] using m processors. The sequential solution uses n-1 additions. The array is divided into m sub-arrays; each processor computes a partial sum, and all the partial sums are collected by the master to find the final sum. The important problem is how to make the communication efficient:
1. send / receive routines;
2. scatter / reduce routines (see the sketch below);
3. divide and conquer?
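Option 2 is the shortest to write. A minimal sketch, assuming n is divisible by the number of processors; the collectives are standard MPI, the rest is illustrative:

    /* scatter the data, sum locally, reduce the partial sums */
    double parallel_sum(double *x, int n, int root, MPI_Comm comm) {
        int rank, size, i;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        double *chunk = malloc(n / size * sizeof(double));
        MPI_Scatter(x, n / size, MPI_DOUBLE, chunk, n / size, MPI_DOUBLE, root, comm);

        double local = 0.0, total = 0.0;
        for (i = 0; i < n / size; i++) local += chunk[i];   /* local partial sum */

        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, root, comm);
        free(chunk);
        return total;   /* meaningful on root only */
    }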

28 More Elements. Suppose that there are p = 2^q processors. The D&C tree has the following elements:
- It has q+1 = log(p)+1 levels.
- The active nodes on level l are P(i*p/2^l), i = 0, 1, …, 2^l - 1.
- The receiver nodes on level l are the active nodes P(i*p/2^l) with i odd (they have just received their data).
- The active node P(rank) sends half of its data to P(rank + p/2^(l+1)).

29 For the processor P(rank) we work with:
- P(rank) is active on level l if rank % (p/pow(2,l)) == 0.
- P(rank) is a receiver on level l if it is active && rank / (p/pow(2,l)) is odd.
- If active, P(rank) sends half of its data to P(rank + p/pow(2,l+1)).
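The program on the next slides calls isActive, isReceiver and isSender without showing them. A direct translation of the three conditions above, with size playing the role of p (a sketch; the isSender definition is my inference that, on the way back up, the receivers of level l become the senders of level l):

    /* P(rank) is active on level l: rank is a multiple of size/2^l */
    int isActive(int rank, int size, int level) {
        return rank % (size / (int)pow(2, level)) == 0;
    }

    /* P(rank) receives on level l: active, and an odd multiple of size/2^l */
    int isReceiver(int rank, int size, int level) {
        return isActive(rank, size, level)
            && (rank / (size / (int)pow(2, level))) % 2 == 1;
    }

    /* bottom-top phase: the level-l receivers are the level-l senders */
    int isSender(int rank, int size, int level) {
        return isReceiver(rank, size, level);
    }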

31 The Algorithm.
Step 1. Top-Bottom. For l = 0, 1, 2, …, q-1:
- if rank is a receiver then receive n/pow(2,l) data from rank - p/pow(2,l)
- if rank is active then send n/pow(2,l+1) data to rank + p/pow(2,l+1)
Step 2. Computation: find the summation of the local array.
Step 3. Bottom-Top. For l = q-1, …, 2, 1, 0:
- if rank is active then receive received_sum from rank + p/pow(2,l+1) and set sum = sum + received_sum
- if rank is a sender then send sum to rank - p/pow(2,l)

32 The Program.

    /* STAGE 1 - TOP->DOWN */
    /* using the D&C tree, scatter the array onto the processors */
    for (level = 0; level <= q; level++) {
        /* receivers (except on the root level) get their data from the left */
        if (isReceiver(rank, size, level) && level > 0)
            MPI_Recv(a, n/(int)pow(2, level), MPI_DOUBLE,
                     rank - size/(int)pow(2, level), 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        /* active nodes (except on the leaf level) pass their upper half right */
        if (isActive(rank, size, level) && level < q)
            MPI_Send(&a[n/(int)pow(2, level+1)], n/(int)pow(2, level+1), MPI_DOUBLE,
                     rank + size/(int)pow(2, level+1), 0, MPI_COMM_WORLD);
    }

33 The Program.

    /* STAGE 3 - DOWN->TOP */
    for (level = q; level >= 0; level--) {
        /* active nodes collect and add the partial sum from the right */
        if (isActive(rank, size, level) && level < q) {
            MPI_Recv(&tmpSum, 1, MPI_DOUBLE, rank + size/(int)pow(2, level+1),
                     0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            s += tmpSum;
        }
        /* senders pass their sum up to the parent */
        if (isSender(rank, size, level) && level > 0)
            MPI_Send(&s, 1, MPI_DOUBLE, rank - size/(int)pow(2, level),
                     0, MPI_COMM_WORLD);
    }
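The slides skip Stage 2, which sits between these two fragments: after the scatter each processor holds n/size elements, and the local summation is a one-liner (shown here for completeness, reusing the same variables s and a):

    /* STAGE 2 - LOCAL SUMMATION of the n/size elements held after Stage 1 */
    for (s = 0.0, i = 0; i < n/size; i++) s += a[i];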

35 Parallel Merge Sort (1). Parallel merge sort uses the D&C tree to sort in parallel. The stages of the D&C computation are as follows:
Stage 1. The array is scattered / communicated through the tree from root to leaves.
Stage 2. The leaves sort their smaller arrays.
Stage 3. The sorted arrays are gathered / merged / communicated through the tree from leaves to root. Each node of the tree computes:
a. receive an array from the right-hand-side child;
b. merge the received array with the local array;
c. send the new array to its father, if a sender.

36 Parallel Merge Sort (2).
Step 1. Top-Bottom. For l = 0, 1, 2, …, q-1:
- if rank is a receiver then receive n/pow(2,l) data from rank - p/pow(2,l)
- if rank is active then send n/pow(2,l+1) data to rank + p/pow(2,l+1)
Step 2. Computation: sort the local array of n/size elements.
Step 3. Bottom-Top. For l = q-1, …, 2, 1, 0:
- if rank is active then receive an array from rank + p/pow(2,l+1) and merge the local array with the received array
- if rank is a sender then send the local array over to rank - p/pow(2,l)
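Step 3 relies on a sequential merge of two sorted arrays, which the slides do not show. A standard version, as a sketch with illustrative names:

    /* merge sorted x[0..nx-1] and y[0..ny-1] into out[0..nx+ny-1] */
    void merge(double *x, int nx, double *y, int ny, double *out) {
        int i = 0, j = 0, k = 0;
        while (i < nx && j < ny)
            out[k++] = (x[i] <= y[j]) ? x[i++] : y[j++];
        while (i < nx) out[k++] = x[i++];   /* drain the leftovers */
        while (j < ny) out[k++] = y[j++];
    }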