Problem Solving Strategies

Presentation transcript:

Problem Solving Strategies

Partitioning
- Divide the problem into disjoint parts
- Compute each part separately

Divide and Conquer
- Divide phase: recursively create sub-problems of the same type
- Base case reached: execute an algorithm
- Conquer phase: merge the results as the recursion unwinds
- Traditional example: Merge Sort (a short sketch follows below)

Where is the work?
- Partitioning: creating the disjoint parts of the problem (traditional example: Quick Sort)
- Divide and Conquer: merging the separate results
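As a concrete illustration of the divide-and-conquer pattern (a sketch added to these notes, not code from the slides), here is a minimal sequential merge sort in C: the divide phase is the pair of recursive calls, the base case is a single element, and the conquer phase is the merge performed as the recursion unwinds.

    #include <string.h>

    /* Illustrative merge sort (assumed example).
       Divide: split [lo, hi) in half.  Base case: one element.
       Conquer: merge the two sorted halves as the recursion unwinds. */
    static void merge(int a[], int tmp[], int lo, int mid, int hi) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi)
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
    }

    void mergeSort(int a[], int tmp[], int lo, int hi) {
        if (hi - lo < 2) return;            /* base case: nothing to divide */
        int mid = lo + (hi - lo) / 2;
        mergeSort(a, tmp, lo, mid);         /* divide phase */
        mergeSort(a, tmp, mid, hi);
        merge(a, tmp, lo, mid, hi);         /* conquer phase: the work is here */
    }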

Parallel Sorting Considerations

Distributed memory
- Precision differences across a distributed system can cause unpredictable results
- Traditional algorithms can require excessive communication
- Modified algorithms minimize the communication requirements
- Typically, the data is scattered to the P processors

Shared memory
- Critical sections and mutual exclusion locks can inhibit performance
- Modified algorithms eliminate the need for locks
- Each processor can sort N/P data points, or the processors can cooperate at a finer grain; no inter-processor communication is needed

Two Related Sorts: Bubble Sort and Odd-Even (Transposition) Sort

Bubble Sort

    void bubble(char *x[], int N) {
        int sorted = 0, i, size = N - 1;
        char *temp;
        while (!sorted) {
            sorted = 1;
            for (i = 0; i < size; i++) {
                if (strcmp(x[i], x[i+1]) > 0) {
                    temp = x[i];              /* swap the string pointers */
                    x[i] = x[i+1];
                    x[i+1] = temp;
                    sorted = 0;
                }
            }
            size--;      /* the largest remaining value is now in place */
        }
    }

Odd-Even Sort

    void oddEven(char *x[], int N) {
        int sorted = 0, i, size = N - 1;
        char *temp;
        while (!sorted) {
            sorted = 1;
            /* even phase: compare pairs (0,1), (2,3), ... */
            for (i = 0; i < size; i += 2) {
                if (strcmp(x[i], x[i+1]) > 0) {
                    temp = x[i]; x[i] = x[i+1]; x[i+1] = temp;
                    sorted = 0;
                }
            }
            /* odd phase: compare pairs (1,2), (3,4), ... */
            for (i = 1; i < size; i += 2) {
                if (strcmp(x[i], x[i+1]) > 0) {
                    temp = x[i]; x[i] = x[i+1]; x[i+1] = temp;
                    sorted = 0;
                }
            }
            /* both phases run each pass, so the loop only stops after a
               full even+odd sweep makes no exchanges */
        }
    }

Sequential version: Odd-Even has no advantage over Bubble Sort.
Parallel version: the comparisons within a phase are independent, so processors can work without data conflicts.
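A small driver showing how the routines above might be called (an assumed usage example; the test strings are hypothetical):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* hypothetical test data */
        char *names[] = { "delta", "alpha", "charlie", "bravo" };
        int n = sizeof(names) / sizeof(names[0]);

        oddEven(names, n);            /* or bubble(names, n); */

        for (int i = 0; i < n; i++)
            printf("%s\n", names[i]);
        return 0;
    }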

Bubble vs. Odd-Even Example
(figures: one bubble pass and one odd-even pass)

- Bubble: smaller values move left one spot per pass, and the largest value moves all the way to the end, so the loop size can shrink by one after each pass.
- Odd-Even: the loop size cannot shrink, but all interchanges within a phase can occur in parallel.

One Parallel Iteration (Odd-Even Transposition)

Distributed memory
    Even processors (rank r):
        sendRecv(pr data, pr+1 data);  mergeLow(pr data, pr+1 data)
        if (r >= 1) { sendRecv(pr data, pr-1 data);  mergeHigh(pr data, pr-1 data) }
    Odd processors (rank r):
        sendRecv(pr data, pr-1 data);  mergeHigh(pr data, pr-1 data)
        if (r <= P-2) { sendRecv(pr data, pr+1 data);  mergeLow(pr data, pr+1 data) }

Shared memory
    Even processors (rank r):
        mergeLow(pr data, pr+1 data);  Barrier
        if (r >= 1) mergeHigh(pr data, pr-1 data);  Barrier
    Odd processors (rank r):
        mergeHigh(pr data, pr-1 data);  Barrier
        if (r <= P-2) mergeLow(pr data, pr+1 data);  Barrier

Notation: r = processor rank, P = number of processors, pr data = the block of data belonging to processor r.
Note: P/2 of these combined even+odd iterations are needed to complete the sort.

A Distributed Memory Implementation

- Scatter the data among the available processors
- Locally sort the N/P items on each processor
- Even passes: even processors p (p < P-1) exchange data with processor p+1; p and p+1 perform a partial merge in which p keeps the lower half and p+1 keeps the upper half
- Odd passes: even processors p (p >= 2) exchange data with processor p-1; p and p-1 perform a partial merge in which p keeps the upper half and p-1 keeps the lower half
- Exchanging data: MPI_Sendrecv (a sketch follows below)
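A minimal MPI sketch of this scheme, written for these notes rather than taken from the slides: it assumes integer keys, the same block size n on every process, and P merge-split phases after the local sort. The names oddEvenTranspositionSort, mergeSplit, and cmpInt are invented for the illustration; only MPI_Sendrecv comes from the slide.

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    static int cmpInt(const void *a, const void *b) {
        return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
    }

    /* Keep the n smallest (keepLow) or n largest keys of two sorted n-element blocks. */
    static void mergeSplit(int *local, int *remote, int *work, int n, int keepLow) {
        int i, j, k;
        if (keepLow) {
            i = j = 0;
            for (k = 0; k < n; k++)
                work[k] = (j >= n || (i < n && local[i] <= remote[j])) ? local[i++] : remote[j++];
        } else {
            i = j = n - 1;
            for (k = n - 1; k >= 0; k--)
                work[k] = (j < 0 || (i >= 0 && local[i] >= remote[j])) ? local[i--] : remote[j--];
        }
        memcpy(local, work, n * sizeof(int));
    }

    void oddEvenTranspositionSort(int *local, int n, MPI_Comm comm) {
        int rank, P, phase, partner;
        int *remote = malloc(n * sizeof(int));
        int *work   = malloc(n * sizeof(int));
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &P);

        qsort(local, n, sizeof(int), cmpInt);        /* local sort first */

        for (phase = 0; phase < P; phase++) {
            if (phase % 2 == 0)                      /* even phase: pairs (0,1), (2,3), ... */
                partner = (rank % 2 == 0) ? rank + 1 : rank - 1;
            else                                     /* odd phase: pairs (1,2), (3,4), ... */
                partner = (rank % 2 == 0) ? rank - 1 : rank + 1;
            if (partner < 0 || partner >= P)
                continue;                            /* no partner at the ends this phase */

            MPI_Sendrecv(local, n, MPI_INT, partner, 0,
                         remote, n, MPI_INT, partner, 0,
                         comm, MPI_STATUS_IGNORE);
            /* the lower-ranked member of the pair keeps the low half */
            mergeSplit(local, remote, work, n, rank < partner);
        }
        free(remote);
        free(work);
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        enum { N_LOCAL = 8 };                        /* assumed block size for the demo */
        int local[N_LOCAL];
        srand(rank + 1);
        for (int i = 0; i < N_LOCAL; i++) local[i] = rand() % 1000;
        oddEvenTranspositionSort(local, N_LOCAL, MPI_COMM_WORLD);
        /* each rank now holds a sorted block, and the blocks are ordered by rank */
        MPI_Finalize();
        return 0;
    }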

Partial Merge - Lower Keys

Store the lowest n keys from arrays a and b (each holding n sorted keys) into array c:

    void mergeLow(char *a[], char *b[], char *c[], int n) {
        int countA = 0, countB = 0, countC = 0;
        while (countC < n) {
            /* countA + countB = countC < n, so neither index can run past n-1 */
            if (strcmp(a[countA], b[countB]) < 0)
                c[countC++] = a[countA++];
            else
                c[countC++] = b[countB++];
        }
    }

To merge the upper keys instead: initialize the counters to n-1, decrement them instead of incrementing, change the loop test from countC < n to countC >= 0, and take the larger key at each comparison (see the mergeHigh sketch below).
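Applying those changes gives the companion routine; this sketch is added here for completeness, since the slide only describes the upper-key merge in words:

    /* Store the highest n keys from the sorted n-element arrays a and b into c. */
    void mergeHigh(char *a[], char *b[], char *c[], int n) {
        int countA = n - 1, countB = n - 1, countC = n - 1;
        while (countC >= 0) {
            if (strcmp(a[countA], b[countB]) > 0)      /* take the larger key */
                c[countC--] = a[countA--];
            else
                c[countC--] = b[countB--];
        }
    }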

Bitonic Sequence

    [3,5,8,9,10,12,14,20] [95,90,60,40,35,23,18,0]

A bitonic sequence increases and then decreases, where the increasing run may wrap around the end. For example, the rotation

    [10,12,14,20] [95,90,60,40,35,23,18,0] [3,5,8,9]

is also bitonic.

Bitonic Sort Example

Unsorted: 10,20,5,9,3,8,12,14,90,0,60,40,23,35,95,18

Step 1 - sort adjacent pairs in alternating directions:
    10,20  9,5  3,8  14,12  0,90  60,40  23,35  95,18

Step 2 - merge bitonic sequences of length 4, alternating directions:
    [9,5][10,20]  [14,12][3,8]  [0,40][60,90]  [95,35][23,18]
    5,9,10,20  14,12,8,3  0,40,60,90  95,35,23,18

Step 3 - merge bitonic sequences of length 8, alternating directions:
    [5,9,8,3][14,12,10,20]  [95,40,60,90][0,35,23,18]
    [5,3][8,9][10,12][14,20]  [95,90][60,40][23,35][0,18]
    3,5,8,9,10,12,14,20  95,90,60,40,35,23,18,0

Step 4 - merge the final bitonic sequence of length 16, ascending:
    [3,5,8,9,10,12,14,0] [95,90,60,40,35,23,18,20]
    [3,5,8,0][10,12,14,9]  [35,23,18,20][95,90,60,40]
    [3,0][8,5]  [10,9][14,12]  [18,20][35,23]  [60,40][95,90]

Sorted: 0,3,5,8,9,10,12,14,18,20,23,35,40,60,90,95

Bitonic Sorting Functions

    void bitonicMerge(int lo, int n, int dir) {
        if (n > 1) {
            int m = n / 2;
            for (int i = lo; i < lo + m; i++)
                compareExchange(i, i + m, dir);
            bitonicMerge(lo, m, dir);
            bitonicMerge(lo + m, m, dir);
        }
    }

    void bitonicSort(int lo, int n, int dir) {
        if (n > 1) {
            int m = n / 2;
            bitonicSort(lo, m, UP);          /* first half ascending   */
            bitonicSort(lo + m, m, DOWN);    /* second half descending */
            bitonicMerge(lo, n, dir);        /* merge the bitonic result */
        }
    }

Notes:
- dir = 0 for DOWN and 1 for UP
- compareExchange(i, j, dir) moves the low value left when dir = UP and the high value left when dir = DOWN
- n is assumed to be a power of two
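For reference, a self-contained, runnable version of the two routines above; the array a, the UP/DOWN constants, compareExchange, and main are assumptions added so the sketch compiles and sorts the 16-number example from the previous slide.

    #include <stdio.h>

    #define UP   1
    #define DOWN 0

    /* the 16 unsorted keys from the worked example */
    static int a[] = {10,20,5,9,3,8,12,14,90,0,60,40,23,35,95,18};

    /* move the low value left when dir == UP, the high value left when dir == DOWN */
    static void compareExchange(int i, int j, int dir) {
        if ((dir == UP && a[i] > a[j]) || (dir == DOWN && a[i] < a[j])) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }

    static void bitonicMerge(int lo, int n, int dir) {
        if (n > 1) {
            int m = n / 2;
            for (int i = lo; i < lo + m; i++)
                compareExchange(i, i + m, dir);
            bitonicMerge(lo, m, dir);
            bitonicMerge(lo + m, m, dir);
        }
    }

    static void bitonicSort(int lo, int n, int dir) {
        if (n > 1) {
            int m = n / 2;
            bitonicSort(lo, m, UP);
            bitonicSort(lo + m, m, DOWN);
            bitonicMerge(lo, n, dir);
        }
    }

    int main(void) {
        int n = sizeof(a) / sizeof(a[0]);      /* must be a power of two */
        bitonicSort(0, n, UP);
        for (int i = 0; i < n; i++) printf("%d ", a[i]);
        printf("\n");
        return 0;
    }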

Bitonic Sort Partners/Direction (algorithm steps)

       level:    1     2    2     3    3    3     4     4    4    4
           j:    0     0    1     0    1    2     0     1    2    3

    rank  0:   1/L   2/L  1/L   4/L  2/L  1/L   8/L   4/L  2/L  1/L
    rank  1:   0/H   3/L  0/H   5/L  3/L  0/H   9/L   5/L  3/L  0/H
    rank  2:   3/H   0/H  3/L   6/L  0/H  3/L  10/L   6/L  0/H  3/L
    rank  3:   2/L   1/H  2/H   7/L  1/H  2/H  11/L   7/L  1/H  2/H
    rank  4:   5/L   6/H  5/H   0/H  6/L  5/L  12/L   0/H  6/L  5/L
    rank  5:   4/H   7/H  4/L   1/H  7/L  4/H  13/L   1/H  7/L  4/H
    rank  6:   7/H   4/L  7/H   2/H  4/H  7/L  14/L   2/H  4/H  7/L
    rank  7:   6/L   5/L  6/L   3/H  5/H  6/H  15/L   3/H  5/H  6/H
    rank  8:   9/L  10/L  9/L  12/H 10/H  9/H   0/H  12/L 10/L  9/L
    rank  9:   8/H  11/L  8/H  13/H 11/H  8/L   1/H  13/L 11/L  8/H
    rank 10:  11/H   8/H 11/L  14/H  8/L 11/H   2/H  14/L  8/H 11/L
    rank 11:  10/L   9/H 10/H  15/H  9/L 10/L   3/H  15/L  9/H 10/H
    rank 12:  13/L  14/H 13/H   8/L 14/H 13/H   4/H   8/H 14/L 13/L
    rank 13:  12/H  15/H 12/L   9/L 15/H 12/L   5/H   9/H 15/L 12/H
    rank 14:  15/H  12/L 15/H  10/L 12/L 15/H   6/H  10/H 12/H 15/L
    rank 15:  14/L  13/L 14/L  11/L 13/L 14/L   7/H  11/H 13/H 14/H

    partner   = rank ^ (1 << (level - j - 1))
    direction = L when (rank < partner) == ((rank & (1 << level)) == 0), otherwise H
    (L: keep the low values; H: keep the high values)

Java Partner/Direction Code

    public static void main(String[] args) {
        int nproc = 16, partner, levels = (int) (Math.log(nproc) / Math.log(2));
        for (int rank = 0; rank < nproc; rank++) {
            System.out.printf("rank = %2d partners = ", rank);
            for (int level = 1; level <= levels; level++) {
                for (int j = 0; j < level; j++) {
                    partner = rank ^ (1 << (level - j - 1));
                    String dir = ((rank < partner) == ((rank & (1 << level)) == 0)) ? "L" : "H";
                    System.out.printf("%3d/%s", partner, dir);
                }
                if (level < levels) System.out.print(", ");
            }
            System.out.println();
        }
    }

Note: smaller j values mean the partner is farther away (a longer shift).

Parallel Bitonic Pseudocode

    IF master processor
        Create or retrieve the data to sort
        Scatter it among all processors (including the master)
    ELSE
        Receive the portion to sort
    Sort the local data using an algorithm of preference
    FOR (level = 1; level <= lg(P); level++)
        FOR (j = 0; j < level; j++)
            partner = rank ^ (1 << (level - j - 1))
            Exchange data with partner
            IF ((rank < partner) == ((rank & (1 << level)) == 0))
                Extract the low values from the local and received data (mergeLow)
            ELSE
                Extract the high values from the local and received data (mergeHigh)
    Gather the sorted data at the master
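The exchange loop translates almost line for line into MPI. The sketch below is an illustration, not the course's reference code: it assumes P is a power of two, reuses the cmpInt and mergeSplit helpers from the odd-even sketch earlier, and the name bitonicBlockSort is invented here.

    /* Hedged sketch of the bitonic exchange loop in MPI.
       Assumes: P is a power of two, every rank holds n int keys in local[],
       and cmpInt/mergeSplit are the helpers defined in the odd-even sketch above. */
    void bitonicBlockSort(int *local, int n, MPI_Comm comm) {
        int rank, P;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &P);
        int *remote = malloc(n * sizeof(int));
        int *work   = malloc(n * sizeof(int));
        int levels = 0;
        for (int p = P; p > 1; p >>= 1) levels++;          /* lg(P) */

        qsort(local, n, sizeof(int), cmpInt);              /* local sort first (ascending) */

        for (int level = 1; level <= levels; level++) {
            for (int j = 0; j < level; j++) {
                int partner = rank ^ (1 << (level - j - 1));
                MPI_Sendrecv(local, n, MPI_INT, partner, 0,
                             remote, n, MPI_INT, partner, 0,
                             comm, MPI_STATUS_IGNORE);
                /* keep the low half when the slide's direction rule says "L" */
                int keepLow = (rank < partner) == ((rank & (1 << level)) == 0);
                mergeSplit(local, remote, work, n, keepLow);
            }
        }
        free(remote);
        free(work);
    }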

Bucket Sort Partitioning

Algorithm: assign a range of values to each processor; each processor sorts the values assigned to it; the results are forwarded to the master.

Steps
- Scatter N/P numbers to each processor
- Each processor:
  - Creates smaller buckets of numbers, one destined for each processor
  - Sends the designated buckets to the other processors and receives the buckets destined for it
  - Sorts its section
  - Sends its sorted data back to the processor with rank 0

Bucket Sort Partitioning
(figures: sequential and parallel bucket sort, showing the unsorted numbers distributed to buckets/processors P1..Pp and the sorted result)

Sequential bucket sort
- Drop sections of the data into b buckets by value range
- Sort each bucket
- Copy the sorted bucket data back into the primary array
- Complexity: O(b * (n/b) lg(n/b)) = O(n lg(n/b))
- A small sketch follows below

Parallel bucket sort notes
- Bucket sort works well for uniformly distributed data
- Finding medians can help equalize the bucket sizes
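A small sequential bucket sort sketch (illustrative only; the key range [0, maxVal) and the per-bucket insertion sort are assumptions, not part of the slide):

    #include <stdlib.h>
    #include <string.h>

    /* Sort n keys in [0, maxVal) using b buckets. */
    void bucketSort(int *a, int n, int maxVal, int b) {
        /* worst case: every key lands in one bucket, so size each bucket at n */
        int *buckets = malloc((size_t)b * n * sizeof(int));
        int *counts  = calloc(b, sizeof(int));

        /* drop each key into its bucket by value range */
        for (int i = 0; i < n; i++) {
            int idx = (int)((long long)a[i] * b / maxVal);
            buckets[idx * n + counts[idx]++] = a[i];
        }

        /* sort each bucket (insertion sort) and copy it back into the primary array */
        int k = 0;
        for (int idx = 0; idx < b; idx++) {
            int *bk = buckets + idx * n;
            for (int i = 1; i < counts[idx]; i++) {
                int key = bk[i], j = i - 1;
                while (j >= 0 && bk[j] > key) { bk[j + 1] = bk[j]; j--; }
                bk[j + 1] = key;
            }
            memcpy(a + k, bk, counts[idx] * sizeof(int));
            k += counts[idx];
        }
        free(buckets);
        free(counts);
    }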

Rank (Enumeration) Sort

For each number src[i], count the entries that are smaller, or that are duplicates with a smaller index; the count is the final position of src[i] in the sorted array.

    for (i = 0; i < N; i++) {
        count = 0;
        for (j = 0; j < N; j++)
            if (src[j] < src[i] || (src[j] == src[i] && j < i))
                count++;
        dest[count] = src[i];
    }

Shared memory parallel implementation
- Change the outer for to a forall (see the OpenMP sketch below)
- With N processors, the time drops from O(N^2) to O(N)
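A shared-memory version of that loop might look like the following OpenMP sketch; OpenMP itself is an assumption here, since the slide only says to turn the outer for into a forall.

    #include <omp.h>

    /* Rank sort: each iteration of the outer loop is independent, so it can be
       run as a parallel "forall" with no locks. */
    void rankSort(const int *src, int *dest, int N) {
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            int count = 0;
            for (int j = 0; j < N; j++)
                if (src[j] < src[i] || (src[j] == src[i] && j < i))
                    count++;
            dest[count] = src[i];   /* the tie-break makes every count distinct */
        }
    }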

Counting Sort
- Works on fixed-point primitive types: int, char, long, etc.
- Assumption: the keys are drawn from a fixed, known set of values
- The master scatters the data among the processors
- In parallel, each processor counts the occurrences of each value in its N/P data points
- The processors combine their counts with a collective reduce (sum) operation
- The processors perform a collective prefix-sum operation on the counts
- In parallel, each processor stores its V/P values in the correct positions of the output array, where V is the number of unique values
- The sorted data is gathered at the master processor
- A sequential sketch of the count, prefix-sum, and placement steps follows below

Note: counting sort is also used to reduce the memory needed for radix sort.
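To make those steps concrete, here is a sequential counting sort sketch (added for illustration; the parallel version described on the slide distributes the same count and prefix-sum steps across processors with collective operations):

    #include <stdlib.h>

    /* Counting sort for keys in [0, V): count, prefix-sum, then place. */
    void countingSort(const int *src, int *dest, int n, int V) {
        int *count = calloc(V, sizeof(int));

        for (int i = 0; i < n; i++)          /* count occurrences of each value    */
            count[src[i]]++;

        int sum = 0;                         /* exclusive prefix sum: count[v] is  */
        for (int v = 0; v < V; v++) {        /* the first output slot for value v  */
            int c = count[v];
            count[v] = sum;
            sum += c;
        }

        for (int i = 0; i < n; i++)          /* place each key at its final index  */
            dest[count[src[i]]++] = src[i];

        free(count);
    }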

Merge Sort
(figure: the merge tree over processors P0..P7)

- Scatter N/P items to each processor
- Sort phase: each processor sorts its data with a method of choice
- Merge phase: data is routed up a tree and a merge is performed at each level (an MPI sketch follows below)

    for (gap = 1; gap < P; gap *= 2) {
        if ((p / gap) % 2 != 0) {
            Send data to processor p - gap;
            break;                         /* this processor is finished */
        } else {
            Receive data from processor p + gap;
            Merge it with the local data;
        }
    }
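A hedged MPI sketch of the merge phase above (an illustration assuming P is a power of two, equal initial block sizes, and heap-allocated blocks; the function name treeMerge is invented for these notes):

    #include <mpi.h>
    #include <stdlib.h>

    /* On entry every rank holds n sorted ints in *data (heap-allocated);
       on exit rank 0 holds all P*n sorted keys in *data. */
    void treeMerge(int **data, int n, MPI_Comm comm) {
        int p, P;
        MPI_Comm_rank(comm, &p);
        MPI_Comm_size(comm, &P);
        int have = n;                                   /* keys currently held */

        for (int gap = 1; gap < P; gap *= 2) {
            if ((p / gap) % 2 != 0) {                   /* send left, then stop */
                MPI_Send(*data, have, MPI_INT, p - gap, 0, comm);
                break;
            } else {                                    /* receive from the right and merge */
                int incoming = have;                    /* the partner has gathered the same amount */
                int *recvBuf = malloc(incoming * sizeof(int));
                int *merged  = malloc((have + incoming) * sizeof(int));
                MPI_Recv(recvBuf, incoming, MPI_INT, p + gap, 0, comm, MPI_STATUS_IGNORE);
                int i = 0, j = 0, k = 0;                /* ordinary two-way merge */
                while (i < have && j < incoming)
                    merged[k++] = ((*data)[i] <= recvBuf[j]) ? (*data)[i++] : recvBuf[j++];
                while (i < have)     merged[k++] = (*data)[i++];
                while (j < incoming) merged[k++] = recvBuf[j++];
                free(*data);
                free(recvBuf);
                *data = merged;
                have += incoming;
            }
        }
    }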

Quick Sort (work-pool version)

Slave processors:
    WHILE true
        Request a sub-list from the work pool
        IF a termination message is received THEN exit
        IF data to partition is received
            Quicksort-partition the data
            Sort partitions whose length is <= threshold and return the sorted data
            Return the unsorted partitions to the master

Master processor:
    WHILE the data is not fully sorted
        IF a "data sorted" message is pending THEN place the data in the output array
        IF a partition message is received THEN add the partition to the work pool
        IF the work pool is not empty THEN
            Partition the data into two parts
            Add partitions whose length > threshold back to the work pool
            Sort partitions whose length <= threshold and place them in the output array
        IF the work pool is not empty AND a request is pending THEN
            Send a work-pool entry to the requesting slave processor
    Send a termination message to all processors

Notes:
- The work pool initially contains a single entry: the unsorted data.
- Distributed work pools can improve speedup; when a local work pool runs low, a processor requests work from its neighbors.