Parallel Sorting Algorithms

Presentation transcript:

Parallel Sorting Algorithms

The simple sorting algorithms (Bubble Sort, Insertion Sort, Selection Sort, …) are O(n^2). The lower bound on comparison-based algorithms (Merge Sort, Quicksort, Heap Sort, …) is O(n log n). The best we can hope to do when parallelizing a comparison-based algorithm using n processors is O(n log n)/n = O(log n).

Question: What can we expect if we use n^2 processors? Answer: Typically, it is still O(log n).

In-place sorting: sorting the values within the same array.
Not in-place (or out-of-place) sorting: sorting the values into a new array.
Stable sorting algorithm: identical values remain in the same relative order as in the original after sorting.

Compare and Exchange
Most sorting algorithms (although not all) compare two values and possibly exchange them. Two numbers in the array are compared with each other: if (A[i] > A[j]) { temp = A[i]; A[i] = A[j]; A[j] = temp; }

Message-passing Compare and Exchange (version 1)
In a distributed-memory system, we have to send messages to exchange values. P1 sends its value A[i] to P2. P2 compares the two values, keeps the larger, and sends the smaller back to P1.
[Figure: Processor 1 sends A[i] to Processor 2; the larger value is kept on Processor 2 and the smaller value is returned to Processor 1.]

Message-passing Compare and Exchange (version 2)
Both processors send their values to each other, and both compare the values. P1 keeps the smaller value; P2 keeps the larger value.
[Figure: Processor 1 sends A[i] and Processor 2 sends A[j]; the smaller value is kept on Processor 1 and the larger value on Processor 2.]

Message-passing Compare and Exchange (version 1)
When the number of values (n) is much larger than the number of processors (p), each processor is responsible for n/p values. Each processor sorts its own values first.
Example: Processor 1 holds 88 50 28 25 and Processor 2 holds 98 80 43 42. Processor 1 sends its values to Processor 2, which merges the two lists into 98 88 80 50 43 42 28 25. Processor 2 keeps the larger numbers (98 88 80 50) as its final values and sends the smaller numbers (43 42 28 25) back to Processor 1 as its final values.

Message-passing Compare and Exchange (version 2)
When the number of values (n) is much larger than the number of processors (p), each processor is responsible for n/p values.
Example: Processor 1 holds 88 50 28 25 and Processor 2 holds 98 80 43 42. Each processor sends its original values to the other, and both merge the two lists into 98 88 80 50 43 42 28 25. Processor 1 keeps the smaller numbers (43 42 28 25); Processor 2 keeps the larger numbers (98 88 80 50).
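The block version of compare and exchange can be sketched in plain C as a merge followed by a split. This is a single-process illustration only: the function names are hypothetical, the two arrays stand in for the blocks held by two processors, no message passing is performed, and the blocks are kept in ascending order (the slide example lists them in descending order).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge two ascending blocks of length m into out (length 2m). */
static void merge_blocks(const int *x, const int *y, int m, int *out)
{
    int i = 0, j = 0, k = 0;
    while (i < m && j < m)
        out[k++] = (x[i] <= y[j]) ? x[i++] : y[j++];
    while (i < m) out[k++] = x[i++];
    while (j < m) out[k++] = y[j++];
}

/* Block compare-and-exchange: after the call, lo holds the m smallest
   values and hi holds the m largest, both in ascending order. */
void compare_exchange_blocks(int *lo, int *hi, int m)
{
    assert(m > 0);
    int *merged = malloc(2 * (size_t)m * sizeof *merged);
    assert(merged != NULL);
    merge_blocks(lo, hi, m, merged);
    memcpy(lo, merged, (size_t)m * sizeof *lo);      /* smaller half */
    memcpy(hi, merged + m, (size_t)m * sizeof *hi);  /* larger half */
    free(merged);
}
```

With the slide's values, calling it on {25, 28, 50, 88} and {42, 43, 80, 98} leaves the first block with the four smallest values and the second with the four largest.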

Parallelizing Common Sorting Algorithms

Bubble Sort
n passes are made, comparing and exchanging adjacent values. Larger values settle to the bottom (Sediment Sort?). After pass m, the m largest values are in the last m locations of the array.

After pass 1, the largest value is in its final location.

Complexity of Bubble Sort
The number of compare/exchange operations in Bubble Sort is n(n - 1)/2 = O(n^2). It is very slow but easy to parallelize: we can use a pipelining technique.

Parallel Bubble Sort
[Figure: pipelined bubble sort on processors P0 to P7 over time, with Phase 1 and Phase 2 overlapping.]

Even-Odd Transposition Sort
Uses alternating even and odd phases.
Even phase: even processors exchange with the next higher ranked processor; odd processors exchange with the next lower ranked processor.
Odd phase: even processors exchange with the next lower ranked processor; odd processors exchange with the next higher ranked processor.

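Even-Odd Sort
Values held by P0 to P7 over time:
Start:            4 2 7 8 5 1 3 6
After even phase: 2 4 7 8 1 5 3 6
After odd phase:  2 4 7 1 8 3 5 6
After even phase: 2 4 1 7 3 8 5 6
After odd phase:  2 1 4 3 7 5 8 6
After even phase: 1 2 3 4 5 7 6 8
After odd phase:  1 2 3 4 5 6 7 8
After even phase: 1 2 3 4 5 6 7 8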
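The alternating phases can be sketched as a sequential simulation in C, where array index i plays the role of processor Pi. The function names are hypothetical, and the inner loop stands in for compare-exchanges that would all happen at the same time on a real machine.

```c
#include <assert.h>

/* Swap a[i] and a[j] if they are out of order (i < j). */
static void compare_exchange(int *a, int i, int j)
{
    if (a[i] > a[j]) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
}

/* n phases are enough to sort n values. Within one phase every
   compare-exchange touches a disjoint pair, so the pairs are independent. */
void even_odd_transposition_sort(int *a, int n)
{
    assert(n >= 0);
    for (int phase = 0; phase < n; phase++) {
        /* even phase pairs (P0,P1), (P2,P3), ...; odd phase pairs (P1,P2), (P3,P4), ... */
        int start = (phase % 2 == 0) ? 0 : 1;
        for (int i = start; i + 1 < n; i += 2)
            compare_exchange(a, i, i + 1);
    }
}
```

Because each of the n phases costs one parallel step, this is where the O(n) time with n processors comes from.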

What is the time complexity of Parallel Bubble Sort with n processors? Answer: 2n = O(n).
What is the time complexity of Even-Odd Sort with n processors? Answer: O(n).

Divide and Conquer Pattern
Characterized by dividing a problem into sub-problems of the same form as the larger problem. Further divisions into still smaller sub-problems are usually done by recursion, which creates a tree. Recursive divide and conquer is amenable to parallelization because separate processes can be used for the divided parts. Also, the data usually becomes naturally localized.

Sorting: several sorting algorithms can be partitioned or constructed in a recursive divide-and-conquer fashion, e.g. Mergesort and Quicksort. Searching algorithms can also divide the search space recursively.

Merge Sort
Merge Sort is a “Divide and Conquer” algorithm. The array is divided into smaller and smaller parts. Then the small parts are merged into new sorted subarrays.

Merge Sort
[Figure: merge sort on 8 values over time, using processors P0 to P7.]
Divide: 4 2 7 8 5 1 3 6 is split in half repeatedly until each part holds one value.
Merge: 2 4 | 7 8 | 1 5 | 3 6
Merge: 2 4 7 8 | 1 3 5 6
Merge: 1 2 3 4 5 6 7 8

Complexity of Parallel Merge Sort
Complexity of the sequential merge sort: O(n log n). Complexity of parallel merge sort with n processors: there are 2 log n steps, but each merge step takes longer and longer, so it turns out to be O(n).

Quicksort
Pick a pivot. Then move all the values less than the pivot to the left side of the array and all the values greater than the pivot to the right side. Then recursively sort the two sides.

Quicksort
Works well when the pivot is approximately in the middle. If the pivot is chosen poorly, quicksort degrades to O(n^2).
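A minimal sequential sketch of these steps in C follows. Choosing the last element as the pivot is an assumption made here for brevity, not something the slides specify; on already sorted input that choice gives the poorly chosen pivot case. The two recursive calls work on disjoint parts of the array, which is what a parallel version would hand to separate processors.

```c
#include <assert.h>

static void swap_ints(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}

/* Sorts a[lo..hi] (inclusive bounds) in ascending order. */
void quicksort(int *a, int lo, int hi)
{
    if (lo >= hi)
        return;
    int pivot = a[hi];               /* assumed pivot choice: last element */
    int i = lo;                      /* a[lo..i-1] holds values less than the pivot */
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot)
            swap_ints(&a[i++], &a[j]);
    swap_ints(&a[i], &a[hi]);        /* pivot lands in its final position */
    quicksort(a, lo, i - 1);         /* left side: values less than the pivot */
    quicksort(a, i + 1, hi);         /* right side: values greater than or equal to it */
}
```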

Neither Merge Sort nor Quicksort parallelizes efficiently: both are still O(n) with n processors. Any tree structure (reduction, merge, quicksort) cannot be parallelized efficiently by n processors.

Batcher’s Parallel Sorting Algorithms
Odd-even Merge Sort (not the same as even-odd transposition sort)
Bitonic Merge Sort
Complexity with n processors: O(log^2 n)
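The slides do not spell out either algorithm, so the following is only a sketch of the commonly used iterative form of bitonic sort, simulated sequentially in C. The function name and the returned stage count are illustrative choices. It assumes n is a power of two. Every compare-exchange inside one stage is independent, so with n processors each stage is one parallel step.

```c
#include <assert.h>

/* Sorts a[0..n-1] ascending; n must be a power of two.
   Returns the number of compare-exchange stages performed. */
int bitonic_sort(int *a, int n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    int stages = 0;
    for (int k = 2; k <= n; k *= 2) {           /* size of the sequences being merged */
        for (int j = k / 2; j > 0; j /= 2) {    /* distance between compared elements */
            for (int i = 0; i < n; i++) {       /* independent pairs within a stage */
                int partner = i ^ j;
                if (partner > i) {
                    int ascending = ((i & k) == 0);
                    if (ascending ? a[i] > a[partner] : a[i] < a[partner]) {
                        int temp = a[i];
                        a[i] = a[partner];
                        a[partner] = temp;
                    }
                }
            }
            stages++;
        }
    }
    return stages;
}
```

For n = 8 the loops run 1 + 2 + 3 = 6 stages, i.e. log n (log n + 1)/2 with log n = 3, which is the source of the O(log^2 n) bound.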

Odd-Even Merge Sort

Summary (using n processors to sort n numbers)
Parallel Bubble Sort and Odd-even Transposition Sort: O(n)
Parallel Mergesort: O(n), but unbalanced processor load and communication (because of the tree)
Parallel Quicksort: O(n), but unbalanced processor load and communication (because of the tree), and can degenerate to O(n^2)
Odd-even Mergesort: O(log^2 n)

Sorting on Specific Networks
Two network structures have received specific attention, the mesh and the hypercube, because parallel computers have been built with these networks. They are of less interest nowadays because the underlying architecture is often hidden from the user. There are MPI features for mapping algorithms onto meshes. We can always use a mesh or hypercube algorithm even if the underlying architecture is not the same.

Mesh - Two-Dimensional Sorting Final sorted list -- could be row by row or snakelike.

Shearsort
Alternate row and column sorting until the list is fully sorted. Rows are sorted in alternating directions, so the result is a snake-like ordering. On a √n x √n mesh, this requires √n(log n + 1) steps.
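A sequential simulation of shearsort in C might look like the sketch below. SIDE, sort_line, and shearsort are hypothetical names; the mesh is stored row by row in a flat array; and an insertion sort stands in for the parallel sort that each row or column would run (for example even-odd transposition in about √n steps). It uses log n + 1 phases, with log taken base 2: odd phases sort rows in alternating directions, even phases sort columns.

```c
#include <assert.h>

#define SIDE 4  /* hypothetical mesh side; the mesh holds SIDE * SIDE values */

/* Sort `count` values spaced `stride` apart, ascending or descending. */
static void sort_line(int *v, int count, int stride, int ascending)
{
    for (int i = 1; i < count; i++) {
        int key = v[i * stride];
        int j = i - 1;
        while (j >= 0 && (ascending ? v[j * stride] > key : v[j * stride] < key)) {
            v[(j + 1) * stride] = v[j * stride];
            j--;
        }
        v[(j + 1) * stride] = key;
    }
}

void shearsort(int mesh[SIDE * SIDE])
{
    int phases = 1;                          /* log2(n) + 1 phases, n = SIDE * SIDE */
    for (int n = SIDE * SIDE; n > 1; n /= 2)
        phases++;
    for (int p = 1; p <= phases; p++) {
        if (p % 2 == 1) {                    /* odd phase: rows, alternating direction */
            for (int r = 0; r < SIDE; r++)
                sort_line(&mesh[r * SIDE], SIDE, 1, r % 2 == 0);
        } else {                             /* even phase: columns, top to bottom */
            for (int c = 0; c < SIDE; c++)
                sort_line(&mesh[c], SIDE, SIDE, 1);
        }
    }
}
```

The result is in snake-like order: row 0 reads left to right, row 1 right to left, and so on.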

Shearsort

Other Sorts (Coming Up Next): Rank Sort, Counting Sort, Radix Sort

Rank Sort
The idea of rank sort is to count the number of values less than a[i]. That count is the “rank” of that value. The rank is the position in the sorted array where that value will be placed, so we can put it into that spot: b[rank] = a[i].

Rank Sort (Sequential Code)
    for (i = 0; i < n; i++) {        // for each number
        x = 0;
        for (j = 0; j < n; j++)      // count numbers less than it
            if (a[i] > a[j]) x++;
        b[x] = a[i];                 // copy number into correct place
    }
This doesn’t handle duplicate values (how can this be fixed?). The complexity is O(n^2). However, this is easy to parallelize.

Rank Sort (Parallel Code)
    forall (i = 0; i < n; i++) {     // each processor allocated a different value
        x = 0;
        for (j = 0; j < n; j++)      // count numbers less than it
            if (a[i] > a[j]) x++;
        b[x] = a[i];                 // copy number into correct place
    }
Uses n processors and the complexity is O(n). Very easy to implement in OpenMP. As good as many of the parallel sorts (except bitonic and shear sort).
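One possible OpenMP rendering of the forall loop is sketched below. The function name is hypothetical. The tie-break on the index is one way (an assumption here, not necessarily the fix the slides have in mind) to give duplicate values distinct ranks, which also keeps equal values in their original order.

```c
#include <assert.h>

/* Rank sort: b[] receives the values of a[] in ascending order. */
void rank_sort(const int *a, int *b, int n)
{
    assert(n >= 0);
    /* Every iteration writes a different b[rank], so iterations are
       independent. The pragma is ignored when OpenMP is not enabled. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int rank = 0;
        for (int j = 0; j < n; j++)
            /* count smaller values, plus equal values that come earlier */
            if (a[j] < a[i] || (a[j] == a[i] && j < i))
                rank++;
        b[rank] = a[i];
    }
}
```

Compiled with an OpenMP flag such as `-fopenmp` the outer loop is shared among threads; without it the same code runs sequentially in O(n^2).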

Rank Sort using n^2 processors
Instead of one processor comparing a[i] with the other values, we use n - 1 processors. Incrementing the counter is a critical section and must be sequential, so this is still O(n).

Rank Sort using n^2 processors
We can do the counting as a reduction. The complexity is O(log n) with n^2 processors, but processor efficiency is low.

Summary
Sorting Algorithm | Sequential Complexity | Parallel Complexity with n processors
Bubble Sort | O(n^2) | O(n)
Even-Odd Sort | O(n^2) | O(n)
Merge Sort | O(n log n) | O(n)
Quicksort | O(n log n) | O(n)
Odd-Even Merge Sort | | O(log^2 n)
Shearsort | | O(√n(log n + 1))
Rank Sort | O(n^2) | O(n)

Counting Sort
The best sequential comparison-based sorting algorithms are O(n log n). We can do better if we can make assumptions about the data. If the values are all integers within a relatively small range, we can do Counting Sort in O(n) instead of O(n^2) or O(n log n).

Counting Sort
Counting Sort is a stable sort (identical values remain in the same relative position as in the original array). However, it is not an “in-place” sort; furthermore, we need extra space for a count array. Suppose the original values are in the array a[ ] and the sorted array will be b[ ]. Create an array c[ ] of size m, where the values of a[ ] are in the range 1..m.

Counting Sort – Step 1
c[ ] is the histogram of the values in the array; in other words, c[v] is the number of times the value v occurs.
    for (i = 1; i <= m; i++) c[i] = 0;
    for (i = 1; i <= n; i++) c[ a[i] ]++;
Complexity is O(m + n).

Counting Sort – Step 2
We can find the rank of each value by doing a prefix sum on c[ ]:
    for (i = 2; i <= m; i++) c[i] = c[i] + c[i-1];
Complexity is O(m).
[Figure: prefix sum over c[ ]; each entry becomes the sum of itself and all earlier entries.]

Counting Sort – Step 3
We can now use the prefix sum to place the values where they should go:
    for (i = n; i >= 1; i--) {
        b[ c[ a[i] ] ] = a[i];
        c[ a[i] ]--;             // ensures stable sorting
    }
Sequential complexity for all 3 steps is O(m + n). If m is linear in n, then the complexity is O(n).
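Put together, the three steps can be sketched as one C function. The function name is hypothetical, and unlike the slide code the arrays a[] and b[] are 0-based here, so the 1-based position held in c[] is shifted by one when writing into b[]. The values themselves are still assumed to lie in 1..m.

```c
#include <assert.h>
#include <stdlib.h>

/* Counting sort for n values in the range 1..m. */
void counting_sort(const int *a, int *b, int n, int m)
{
    int *c = calloc((size_t)m + 1, sizeof *c);   /* Step 1: zeroed histogram */
    assert(c != NULL);
    for (int i = 0; i < n; i++) {                /* Step 1: count each value */
        assert(a[i] >= 1 && a[i] <= m);
        c[a[i]]++;
    }
    for (int v = 2; v <= m; v++)                 /* Step 2: prefix sum */
        c[v] += c[v - 1];
    for (int i = n - 1; i >= 0; i--) {           /* Step 3: move backwards through a[] */
        b[c[a[i]] - 1] = a[i];                   /* c[] holds a 1-based position */
        c[a[i]]--;                               /* next equal value goes one place earlier */
    }
    free(c);
}
```

For an assumed example input of 5 2 3 7 5 6 4 1 with m = 7, the prefix sum becomes 1 2 3 4 6 7 8, so the later 5 is placed at position 6 and the earlier 5 at position 5.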

Counting Sort
[Figure: worked example of counting sort. Step 1 builds the histogram c[ ]. Step 2 turns it into the prefix sum 1 2 3 4 6 7 8. Step 3 moves backwards through a[ ] to fill b[ ]; for example, 5 is moved to position 6 and c[5] is then decremented.]

Parallel Counting Sort (using n processors)
Step 1, creating the histogram: O(log n). We use the same tree as with rank sort, because updating c[a[i]]++ is a critical section when there are duplicate values in a[ ].
Step 2, parallel prefix sum using n processors: O(log n).
Step 3, placing the numbers: O(log n) again, because of possible duplicate values (c[a[i]]-- is a critical section).
All 3 steps: O(log n).

Radix Sort
Assumption: values are either unsigned decimal or unsigned binary. Sort the values based on the least significant digit, then sort on the next least significant digit, and continue until the most significant digit has been used. This works if we use a stable sorting algorithm. Since each digit is in the range 0-9 or 0-1, we can use Counting Sort for each digit.

Radix Sort on Decimal Values

Radix Sort on Binary Values

Radix Sort
When sorting binary values, we can improve on the counting sort. To sort a column of bits, the prefix sum gives us the number of 1’s, which we can use to place the ones (decremented each time to deal with the duplicates). The prefix sum of the inverse of the bits (called the diminished prefix sum) gives us the number of 0’s and can be used to place the zeros. The complexity of sorting this column is O(log n), the same as using counting sort on the column.
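The per-bit placement can be sketched sequentially in C as a stable split. The function names are hypothetical. Running counters play the role of the prefix sums: for each element they hold the number of 0 bits (or 1 bits) that appear before it. This forward variant is an assumption made for clarity; the slides describe placing the ones from a decremented prefix sum instead. It also assumes `bits` does not exceed the width of `unsigned`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stable split on one bit: zeros first, then ones, each keeping its order. */
static void split_on_bit(const unsigned *in, unsigned *out, int n, int bit)
{
    int zeros = 0;
    for (int i = 0; i < n; i++)                  /* total number of 0 bits */
        zeros += !((in[i] >> bit) & 1u);

    int zeros_before = 0, ones_before = 0;       /* running prefix counts */
    for (int i = 0; i < n; i++) {
        if ((in[i] >> bit) & 1u)
            out[zeros + ones_before++] = in[i];  /* ones go after all zeros */
        else
            out[zeros_before++] = in[i];
    }
}

/* Sorts n unsigned values that fit in `bits` bits, ascending. */
void radix_sort_binary(unsigned *a, int n, int bits)
{
    assert(n >= 0 && bits >= 0);
    unsigned *tmp = malloc((n > 0 ? (size_t)n : 1) * sizeof *tmp);
    assert(tmp != NULL);
    for (int bit = 0; bit < bits; bit++) {       /* least significant bit first */
        split_on_bit(a, tmp, n, bit);
        memcpy(a, tmp, (size_t)n * sizeof *a);
    }
    free(tmp);
}
```

In a parallel version the two counters would come from parallel prefix sums over the bit column, which is where the O(log n) cost per bit arises.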

Radix Sort
The complexity of Radix Sort is the complexity of sorting each digit times the number of digits. If the range of values is 1..N, then the number of digits is log_2 N for binary numbers and log_10 N for decimal. The parallel complexity is O(log n * log N), where n is the number of values and N is the range of values.

Summary
Sorting Algorithm | Sequential Complexity | Parallel Complexity with n processors
Bubble Sort | O(n^2) | O(n)
Even-Odd Sort | O(n^2) | O(n)
Merge Sort | O(n log n) | O(n)
Quicksort | O(n log n) | O(n)
Odd-Even Merge Sort | | O(log^2 n)
Shearsort | | O(√n(log n + 1))
Rank Sort | O(n^2) | O(n)
Counting Sort | O(n) | O(log n)
Radix Sort | O(n log N) | O(log n * log N)

Questions