CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Outline
Sorting problem
Sequential quicksort
Parallel quicksort
Hyperquicksort
Parallel sorting by regular sampling

Sorting Problem Permute: unordered sequence → ordered sequence. Typically the key (the value being sorted) is part of a record with additional values (satellite data). Most parallel sorts were designed for theoretical parallel models and are not practical. Our focus: internal sorts based on comparison of keys.

Sequential Quicksort Unordered list of values

Sequential Quicksort Choose pivot value

Sequential Quicksort Low list (≤ 17) High list (> 17)

Sequential Quicksort Recursively apply quicksort to low list

Sequential Quicksort Recursively apply quicksort to high list

Sequential Quicksort Sorted list of values

Attributes of Sequential Quicksort Average-case time complexity: Θ(n log n). Worst-case time complexity: Θ(n²), which occurs when the low and high lists are maximally unbalanced at every partitioning step. Can make the worst case less probable by using sampling to choose the pivot value. Example: “Median of 3” technique.
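
The median-of-3 idea is easy to see in code. Below is a minimal sequential C sketch (my own illustration, not code from the slides): the pivot is the median of the first, middle, and last elements, which defuses the sorted-input worst case.

#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Order a[lo], a[mid], a[hi], then park their median at a[hi] as the pivot.
   This makes the Θ(n²) worst case far less likely on nearly sorted input. */
static void median_of_3(int a[], int lo, int hi) {
    int mid = lo + (hi - lo) / 2;
    if (a[mid] < a[lo]) swap(&a[mid], &a[lo]);
    if (a[hi]  < a[lo]) swap(&a[hi],  &a[lo]);
    if (a[hi]  < a[mid]) swap(&a[hi], &a[mid]);
    swap(&a[mid], &a[hi]);
}

static void quicksort(int a[], int lo, int hi) {
    if (lo >= hi) return;
    median_of_3(a, lo, hi);
    int pivot = a[hi], i = lo - 1;
    for (int j = lo; j < hi; j++)        /* Lomuto partition: low list <= pivot */
        if (a[j] <= pivot) swap(&a[++i], &a[j]);
    swap(&a[i + 1], &a[hi]);             /* pivot lands between the two lists */
    quicksort(a, lo, i);                 /* recursively sort the low list */
    quicksort(a, i + 2, hi);             /* recursively sort the high list */
}

int main(void) {
    int a[] = {75, 91, 15, 64, 21, 8, 88, 54};   /* P0's values from the slides */
    quicksort(a, 0, 7);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}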

Quicksort Good Starting Point for Parallel Algorithm Speed Generally recognized as fastest sort in average case Preferable to base parallel algorithm on fastest sequential algorithm Natural concurrency Recursive sorts of low, high lists can be done in parallel

Definitions of “Sorted” Definition 1: the sorted list is held in the memory of a single processor. Definition 2: the portion of the list in every processor’s memory is sorted, and the value of the last element on Pi’s list is less than or equal to the value of the first element on Pi+1’s list. We adopt Definition 2: it allows the problem size to scale with the number of processors.
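
Definition 2 can be checked with a single boundary exchange between neighboring processes. A hedged MPI sketch of mine (not from the slides); globally_sorted is an invented helper name, and every local list is assumed non-empty and locally sorted.

#include <mpi.h>

/* Definition 2 check: my left neighbor's last key must be <= my first key. */
int globally_sorted(const int *local, int n, MPI_Comm comm) {
    int rank, p, left_last = 0, ok = 1, all_ok;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);
    int left  = (rank == 0)     ? MPI_PROC_NULL : rank - 1;
    int right = (rank == p - 1) ? MPI_PROC_NULL : rank + 1;
    /* Ship my last key to the right; receive my left neighbor's last key. */
    MPI_Sendrecv(&local[n - 1], 1, MPI_INT, right, 0,
                 &left_last,    1, MPI_INT, left,  0,
                 comm, MPI_STATUS_IGNORE);
    if (rank > 0 && left_last > local[0]) ok = 0;
    MPI_Allreduce(&ok, &all_ok, 1, MPI_INT, MPI_LAND, comm);
    return all_ok;
}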

Parallel Quicksort
P0: 75, 91, 15, 64, 21, 8, 88, 54
P1: 50, 12, 47, 72, 65, 54, 66, 22
P2: 83, 66, 67, 0, 70, 98, 99, 82
P3: 20, 40, 89, 47, 19, 61, 86, 85

Parallel Quicksort: process P0 chooses and broadcasts a randomly chosen pivot value.
P0: 75, 91, 15, 64, 21, 8, 88, 54
P1: 50, 12, 47, 72, 65, 54, 66, 22
P2: 83, 66, 67, 0, 70, 98, 99, 82
P3: 20, 40, 89, 47, 19, 61, 86, 85

Parallel Quicksort: exchange “lower half” and “upper half” values.
P0: 75, 91, 15, 64, 21, 8, 88, 54
P1: 50, 12, 47, 72, 65, 54, 66, 22
P2: 83, 66, 67, 0, 70, 98, 99, 82
P3: 20, 40, 89, 47, 19, 61, 86, 85

Parallel Quicksort: after the exchange step.
P0 (lower “half”): 75, 15, 64, 21, 8, 54, 66, 67, 0, 70
P1 (lower “half”): 50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61
P2 (upper “half”): 83, 98, 99, 82, 91, 88
P3 (upper “half”): 89, 86, 85

Parallel Quicksort: processes P0 and P2 choose and broadcast randomly chosen pivots.
P0 (lower “half”): 75, 15, 64, 21, 8, 54, 66, 67, 0, 70
P1 (lower “half”): 50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61
P2 (upper “half”): 83, 98, 99, 82, 91, 88
P3 (upper “half”): 89, 86, 85

Parallel Quicksort: exchange values.
P0 (lower “half”): 75, 15, 64, 21, 8, 54, 66, 67, 0, 70
P1 (lower “half”): 50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61
P2 (upper “half”): 83, 98, 99, 82, 91, 88
P3 (upper “half”): 89, 86, 85

Parallel Quicksort: after exchanging values.
P0 (lower “half” of lower “half”): 15, 21, 8, 0, 12, 20, 19
P1 (upper “half” of lower “half”): 50, 47, 72, 65, 54, 66, 22, 40, 47, 61, 75, 64, 54, 66, 67, 70
P2 (lower “half” of upper “half”): 83, 82, 91, 88, 89, 86, 85
P3 (upper “half” of upper “half”): 98, 99

Parallel Quicksort: each processor sorts the values it controls.
P0 (lower “half” of lower “half”): 0, 8, 12, 15, 19, 20, 21
P1 (upper “half” of lower “half”): 22, 40, 47, 47, 50, 54, 54, 61, 64, 65, 66, 66, 67, 70, 72, 75
P2 (lower “half” of upper “half”): 82, 83, 85, 86, 88, 89, 91
P3 (upper “half” of upper “half”): 98, 99
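
To make the broadcast-and-exchange pattern concrete, here is a hedged MPI sketch of the scheme just walked through (my own code, not from the slides; pquick_level and split are invented names). It assumes the number of processes is a power of 2 and integer keys.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* In-place partition of local[0..n): returns the count of keys <= pivot. */
static int split(int *local, int n, int pivot) {
    int i = 0, j = n - 1;
    while (i <= j) {
        if (local[i] <= pivot) i++;
        else { int t = local[i]; local[i] = local[j]; local[j] = t; j--; }
    }
    return i;
}

/* One recursion level: the group's root broadcasts a pivot, each process in
   the lower half swaps its "high" keys with its partner's "low" keys, and
   the two halves then recurse independently. */
static void pquick_level(int **plocal, int *pn, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    if (size == 1) {                          /* base case: sort what we hold */
        qsort(*plocal, *pn, sizeof(int), cmp_int);
        return;
    }
    int pivot = (*pn > 0) ? (*plocal)[rand() % *pn] : 0;
    MPI_Bcast(&pivot, 1, MPI_INT, 0, comm);   /* only the root's pick survives */

    int mid = split(*plocal, *pn, pivot);
    int lower = (rank < size / 2);
    int partner = lower ? rank + size / 2 : rank - size / 2;
    int sendn = lower ? *pn - mid : mid;      /* lower half ships its highs */
    int *sendbuf = lower ? *plocal + mid : *plocal;
    int keepn = *pn - sendn;

    int recvn;
    MPI_Sendrecv(&sendn, 1, MPI_INT, partner, 0,
                 &recvn, 1, MPI_INT, partner, 0, comm, MPI_STATUS_IGNORE);
    int *merged = malloc((keepn + recvn) * sizeof(int));
    memcpy(merged, lower ? *plocal : *plocal + mid, keepn * sizeof(int));
    MPI_Sendrecv(sendbuf, sendn, MPI_INT, partner, 1,
                 merged + keepn, recvn, MPI_INT, partner, 1,
                 comm, MPI_STATUS_IGNORE);
    free(*plocal);
    *plocal = merged;
    *pn = keepn + recvn;

    MPI_Comm subcomm;                         /* recurse within each half */
    MPI_Comm_split(comm, lower, rank, &subcomm);
    pquick_level(plocal, pn, subcomm);
    MPI_Comm_free(&subcomm);
}

Calling pquick_level(&local, &n, MPI_COMM_WORLD) leaves the data sorted in the sense of Definition 2; the base-case sort is exactly the final step the walkthrough shows.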

Analysis of Parallel Quicksort Execution time dictated by when last process completes Algorithm likely to do a poor job balancing number of elements sorted by each process Cannot expect pivot value to be true median Can choose a better pivot value

Hyperquicksort Start where parallel quicksort ends: each process sorts its sublist First “sortedness” condition is met To meet second, processes must still exchange values Process can use median of its sorted list as the pivot value This is much more likely to be close to the true median

Hyperquicksort: the number of processors is a power of 2.
P0: 75, 91, 15, 64, 21, 8, 88, 54
P1: 50, 12, 47, 72, 65, 54, 66, 22
P2: 83, 66, 67, 0, 70, 98, 99, 82
P3: 20, 40, 89, 47, 19, 61, 86, 85

Hyperquicksort: each process sorts the values it controls.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
P3: 19, 20, 40, 47, 61, 85, 86, 89

Hyperquicksort: process P0 broadcasts its median value (54).
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
P3: 19, 20, 40, 47, 61, 85, 86, 89

Hyperquicksort: processes will exchange “low” and “high” lists.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99
P3: 19, 20, 40, 47, 61, 85, 86, 89

Hyperquicksort: processes merge kept and received values.
P0: 0, 8, 15, 21, 54
P1: 12, 19, 20, 22, 40, 47, 47, 50, 54
P2: 64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99
P3: 61, 65, 66, 72, 85, 86, 89

Hyperquicksort: processes P0 and P2 broadcast their median values.
P0: 0, 8, 15, 21, 54
P1: 12, 19, 20, 22, 40, 47, 47, 50, 54
P2: 64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99
P3: 61, 65, 66, 72, 85, 86, 89

Hyperquicksort: communication pattern for the second exchange (P0 with P1, and P2 with P3).
P0: 0, 8, 15, 21, 54
P1: 12, 19, 20, 22, 40, 47, 47, 50, 54
P2: 64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99
P3: 61, 65, 66, 72, 85, 86, 89

Hyperquicksort: after the exchange-and-merge step.
P0: 0, 8, 12, 15
P1: 19, 20, 21, 22, 40, 47, 47, 50, 54, 54
P2: 61, 64, 65, 66, 66, 67, 70, 72, 75, 82
P3: 83, 85, 86, 88, 89, 91, 98, 99
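
Two details distinguish hyperquicksort from the parallel quicksort above, and both are easy to show in code: the pivot is simply the middle element of the root's already-sorted list, and the exchanged runs are combined with a linear-time merge instead of a re-sort. A standalone C sketch of mine (not slide code), using P0's numbers from the walkthrough:

#include <stdio.h>

/* Merge two sorted runs a[0..na) and b[0..nb) into out[0..na+nb). */
static void merge(const int *a, int na, const int *b, int nb, int *out) {
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];   /* leftover kept values */
    while (j < nb) out[k++] = b[j++];   /* leftover received values */
}

int main(void) {
    int sorted_local[] = {8, 15, 21, 54, 64, 75, 88, 91};  /* P0 after sorting */
    int pivot = sorted_local[(8 - 1) / 2];  /* median of a sorted list: 54 here */
    printf("pivot = %d\n", pivot);

    /* P0 keeps its low run and receives P2's low run (just 0), then merges. */
    int kept[] = {8, 15, 21, 54}, received[] = {0}, out[5];
    merge(kept, 4, received, 1, out);
    for (int k = 0; k < 5; k++) printf("%d ", out[k]);     /* 0 8 15 21 54 */
    printf("\n");
    return 0;
}

Merging two sorted runs of total length about n/p costs Θ(n/p), which is where the Θ((n/p) log p) total for the log p merge steps on the next slides comes from.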

Complexity Analysis Assumptions Average-case analysis Lists stay reasonably balanced Communication time dominated by message transmission time, rather than message latency

Complexity Analysis The initial quicksort step has time complexity Θ((n/p) log (n/p)). Total comparisons needed for the log p merge steps: Θ((n/p) log p). Total communication time for the log p exchange steps: Θ((n/p) log p).
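
The three Θ terms combine neatly, since log(n/p) + log p = log n. A worked restatement (my addition, not on the original slide):

\[
T_{\text{comp}} = \Theta\!\left(\frac{n}{p}\log\frac{n}{p}\right) + \Theta\!\left(\frac{n}{p}\log p\right) = \Theta\!\left(\frac{n}{p}\log n\right),
\qquad
T_{\text{comm}} = \Theta\!\left(\frac{n}{p}\log p\right).
\]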

Isoefficiency Analysis Sequential time complexity: Θ(n log n). Parallel overhead: Θ(n log p). Isoefficiency relation: n log n ≥ C n log p ⇒ log n ≥ C log p ⇒ n ≥ p^C. The value of C determines the scalability. Scalability depends on the ratio of communication speed to computation speed.

Another Scalability Concern Our analysis assumes lists remain balanced As p increases, each processor’s share of list decreases Hence as p increases, likelihood of lists becoming unbalanced increases Unbalanced lists lower efficiency Would be better to get sample values from all processes before choosing median

Parallel Sorting by Regular Sampling (PSRS Algorithm)
1. Each process sorts its share of the elements.
2. Each process selects a regular sample of its sorted list.
3. One process gathers and sorts the samples, chooses pivot values from the sorted sample list, and broadcasts these pivot values.
4. Each process partitions its list into p pieces, using the pivot values.
5. Each process sends its partitions to the other processes.
6. Each process merges its partitions.

PSRS Algorithm: the number of processors does not have to be a power of 2.
P0: 75, 91, 15, 64, 21, 8, 88, 54
P1: 50, 12, 47, 72, 65, 54, 66, 22
P2: 83, 66, 67, 0, 70, 98, 99, 82

PSRS Algorithm: each process sorts its list using quicksort.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99

PSRS Algorithm: each process chooses p regular samples (here 15, 54, 75; 22, 50, 65; and 66, 70, 83).
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99

PSRS Algorithm: one process collects the p² regular samples: 15, 54, 75, 22, 50, 65, 66, 70, 83.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99

PSRS Algorithm: that process sorts the p² regular samples: 15, 22, 50, 54, 65, 66, 70, 75, 83.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99

PSRS Algorithm: it then chooses p - 1 pivot values (here 50 and 66) from the sorted sample list 15, 22, 50, 54, 65, 66, 70, 75, 83.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99

PSRS Algorithm: one process broadcasts the p - 1 pivot values.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99

PSRS Algorithm: each process divides its list, based on the pivot values.
P0: 8, 15, 21, 54, 64, 75, 88, 91
P1: 12, 22, 47, 50, 54, 65, 66, 72
P2: 0, 66, 67, 70, 82, 83, 98, 99

PSRS Algorithm: each process sends its partitions to the correct destination process.
P0 sends: 8, 15, 21 | 54, 64 | 75, 88, 91
P1 sends: 12, 22, 47, 50 | 54, 65, 66 | 72
P2 sends: 0 | 66 | 67, 70, 82, 83, 98, 99

PSRS Algorithm: each process merges its p partitions.
P0: 0, 8, 12, 15, 21, 22, 47, 50
P1: 54, 54, 64, 65, 66, 66
P2: 67, 70, 72, 75, 82, 83, 88, 91, 98, 99
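
The whole PSRS pipeline fits in one routine. Below is a hedged MPI sketch (my own code, not from the slides): psrs and cmp_int are invented names, the sampling rule (every w-th key starting at index w/2, with w = (n/p)/p) is one plausible choice that happens to reproduce the walkthrough's samples and pivots, and a final qsort stands in for the p-way merge of step 6. It assumes every process starts with the same number of keys.

#include <mpi.h>
#include <stdlib.h>

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts the n_local keys in local[] across all processes of comm; returns a
   freshly allocated, locally sorted block and its length via *out_n. */
int *psrs(int *local, int n_local, int *out_n, MPI_Comm comm) {
    int p, rank;
    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &rank);

    /* Step 1: each process sorts its share. */
    qsort(local, n_local, sizeof(int), cmp_int);

    /* Step 2: choose p regular samples (every w-th key, starting at w/2). */
    int w = n_local / p;
    int *samples = malloc(p * sizeof(int));
    for (int i = 0; i < p; i++) samples[i] = local[w / 2 + i * w];

    /* Step 3: rank 0 gathers the p*p samples, sorts them, takes the samples
       ranked p, 2p, ..., (p-1)p as pivots, and broadcasts the pivots. */
    int *all = (rank == 0) ? malloc(p * p * sizeof(int)) : NULL;
    MPI_Gather(samples, p, MPI_INT, all, p, MPI_INT, 0, comm);
    int *pivots = malloc((p - 1) * sizeof(int));
    if (rank == 0) {
        qsort(all, p * p, sizeof(int), cmp_int);
        for (int i = 1; i < p; i++) pivots[i - 1] = all[i * p - 1];
    }
    MPI_Bcast(pivots, p - 1, MPI_INT, 0, comm);

    /* Step 4: partition the sorted local list at the pivot boundaries. */
    int *sendcnt = calloc(p, sizeof(int)), *senddsp = malloc(p * sizeof(int));
    for (int i = 0, part = 0; i < n_local; i++) {
        while (part < p - 1 && local[i] > pivots[part]) part++;
        sendcnt[part]++;
    }
    senddsp[0] = 0;
    for (int i = 1; i < p; i++) senddsp[i] = senddsp[i - 1] + sendcnt[i - 1];

    /* Step 5: exchange partition sizes, then the partitions themselves. */
    int *recvcnt = malloc(p * sizeof(int)), *recvdsp = malloc(p * sizeof(int));
    MPI_Alltoall(sendcnt, 1, MPI_INT, recvcnt, 1, MPI_INT, comm);
    recvdsp[0] = 0;
    for (int i = 1; i < p; i++) recvdsp[i] = recvdsp[i - 1] + recvcnt[i - 1];
    *out_n = recvdsp[p - 1] + recvcnt[p - 1];
    int *result = malloc(*out_n * sizeof(int));
    MPI_Alltoallv(local, sendcnt, senddsp, MPI_INT,
                  result, recvcnt, recvdsp, MPI_INT, comm);

    /* Step 6: combine the p received sorted runs (a re-sort stands in for
       the p-way merge the slides describe). */
    qsort(result, *out_n, sizeof(int), cmp_int);

    free(samples); free(all); free(pivots);
    free(sendcnt); free(senddsp); free(recvcnt); free(recvdsp);
    return result;
}

With the walkthrough's data and p = 3, rank 0 gathers the samples 15, 54, 75, 22, 50, 65, 66, 70, 83, sorts them, and picks the samples ranked p and 2p (50 and 66) as pivots, exactly as on the slides.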

Assumptions Each process ends up merging close to n/p elements Experimental results show this is a valid assumption Processor interconnection network supports p simultaneous message transmissions at full speed 4-ary hypertree is an example of such a network

Time Complexity Analysis Computations: initial quicksort Θ((n/p) log (n/p)); sorting the regular samples Θ(p² log p); merging the sorted sublists Θ((n/p) log p); overall Θ((n/p)(log n + log p) + p² log p). Communications: gathering samples and broadcasting pivots Θ(log p); all-to-all exchange Θ(n/p); overall Θ(n/p).

Isoefficiency Analysis Sequential time complexity: Θ(n log n). Parallel overhead: Θ(n log p). Isoefficiency relation: n log n ≥ C n log p ⇒ log n ≥ C log p. The scalability function is the same as for hyperquicksort: scalability depends on the ratio of communication to computation speeds.

Summary Three parallel algorithms based on quicksort. Keeping list sizes balanced: parallel quicksort poor; hyperquicksort better; PSRS algorithm excellent. Average number of times each key is moved: parallel quicksort and hyperquicksort (log p)/2; PSRS algorithm (p-1)/p.