1 Tuesday, November 14, 2006 “UNIX was never designed to keep people from doing stupid things, because that policy would also keep them from doing clever things.” -Doug Gwyn

2 Sorting Algorithms §Sorting is a fundamental part of computer operations. §The most common method of sorting is comparison-based.

3 §The basic operation of comparison-based sorting is compare-exchange. §The lower bound on any sequential comparison-based sort of n numbers is ?

4 §The lower bound on any sequential comparison-based sort of n numbers is Θ(n log n).

5 Sorting: Basics What is a parallel sorted sequence? Where are the input and output lists stored?

6 Sorting: Basics §We assume that the input and output lists are distributed among the processes. §Sorting can be an intermediate step in another algorithm.

7 Sorting: Basics §The sorted output list is partitioned with the property that each partitioned list is sorted and each element in processor P_i's list is less than every element in P_j's list if i < j.
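
The property on this slide can be checked mechanically. The sketch below is only an illustration: the function name `is_partitioned_sorted` and the representation of the distributed list as a Python list of per-process blocks are assumptions, not part of the lecture. It also accepts equal keys across neighbouring blocks, which is slightly weaker than the strict "less than" stated above.

```python
def is_partitioned_sorted(blocks):
    """Check the output property of a parallel sort.

    blocks[i] is assumed to be the list held by process P_i.
    Returns True if every block is sorted and no element of an
    earlier block exceeds an element of a later block.
    """
    # Each local block must be sorted (non-decreasing).
    for block in blocks:
        if any(block[k] > block[k + 1] for k in range(len(block) - 1)):
            return False
    # Compare the largest key of each block with the smallest key of the next
    # non-empty block; since blocks are sorted this covers all pairs i < j.
    non_empty = [b for b in blocks if b]
    for left, right in zip(non_empty, non_empty[1:]):
        if left[-1] > right[0]:
            return False
    return True


print(is_partitioned_sorted([[1, 2], [3, 4], [5, 9]]))
print(is_partitioned_sorted([[1, 5], [3, 4]]))
```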

8 Sorting: Parallel Compare Exchange Operation One element per process. Communication cost of one compare-exchange step: ts + tw. Overall runtime is dominated by inter-process communication.
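
A minimal sequential sketch of what a parallel compare-exchange computes, assuming P_i and P_j (with i < j) each hold one element and exchange them. The function name `compare_exchange` is hypothetical, and the message passing itself is not modelled.

```python
def compare_exchange(a_i, a_j):
    """Simulate one compare-exchange between processes P_i and P_j, i < j.

    Both processes are assumed to send their element to the other.
    P_i keeps the smaller element and P_j keeps the larger one.
    Returns the pair (new value at P_i, new value at P_j).
    """
    return min(a_i, a_j), max(a_i, a_j)


print(compare_exchange(7, 3))
```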

9 Sorting: Parallel Compare Split Operation More than one element per process. The time for a compare-split operation is ? (assuming that the two partial lists were initially sorted).

10 Sorting: Parallel Compare Split Operation More than one element per process. The time for a compare-split operation is (ts + tw·n/p) (assuming that the two partial lists were initially sorted). For larger block sizes, the time to merge two blocks is O(n/p).
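
The compare-split step can be sketched sequentially as a merge followed by a split. This is an illustration under assumptions: the name `compare_split` is made up here, both blocks are taken to be already sorted, and communication is not modelled. The merge uses the standard library's `heapq.merge`, which runs in time linear in the combined block length.

```python
import heapq


def compare_split(block_i, block_j):
    """Simulate a compare-split between P_i and P_j, i < j.

    Both input blocks must already be sorted. The blocks are merged,
    P_i keeps the smaller half and P_j keeps the larger half, so each
    process still holds as many elements as before.
    """
    merged = list(heapq.merge(block_i, block_j))
    k = len(block_i)
    return merged[:k], merged[k:]


low, high = compare_split([1, 6, 8, 11, 13], [2, 7, 9, 10, 12])
print(low, high)
```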

11 Sorting Networks §Networks of comparators designed specifically for sorting.

12 Sorting Networks §We denote an increasing comparator by ⊕ and a decreasing comparator by ⊖.

13 The speed of the network is proportional to ?

14 The speed of the network is proportional to its depth

15 Sorting Networks: Bitonic Sort §A bitonic sequence has two tones: an increasing part followed by a decreasing part. §Any cyclic rotation of such a sequence is also considered bitonic (i.e. a sequence that becomes increasing-decreasing after cyclically shifting its elements). §Is ⟨1,2,4,7,6,0⟩ a bitonic sequence?

16 Sorting Networks: Bitonic Sort §Is ⟨8,9,2,1,0,4⟩ a bitonic sequence?

17 Sorting Networks: Bitonic Sort §Is ⟨8,9,2,1,0,4⟩ a bitonic sequence? §Yes, it is a cyclic shift of ⟨0,4,8,9,2,1⟩.
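
The two questions above can be answered by brute force. The following sketch, with the hypothetical helper names `is_up_down` and `is_bitonic`, simply tries every cyclic rotation and checks whether one of them is non-decreasing and then non-increasing. It takes quadratic time and is meant only to make the definition concrete.

```python
def is_up_down(seq):
    """True if seq is non-decreasing and then non-increasing."""
    n = len(seq)
    i = 0
    # Walk up the increasing part.
    while i + 1 < n and seq[i] <= seq[i + 1]:
        i += 1
    # Walk down the decreasing part.
    while i + 1 < n and seq[i] >= seq[i + 1]:
        i += 1
    # The sequence qualifies only if both walks consumed all of it.
    return i >= n - 1


def is_bitonic(seq):
    """True if some cyclic rotation of seq is increasing-decreasing."""
    seq = list(seq)
    return any(is_up_down(seq[k:] + seq[:k]) for k in range(max(1, len(seq))))


print(is_bitonic([1, 2, 4, 7, 6, 0]))
print(is_bitonic([8, 9, 2, 1, 0, 4]))
print(is_bitonic([1, 3, 2, 4]))
```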

18 Sorting Networks: Bitonic Sort §Let s = ⟨a_0, a_1, …, a_{n-1}⟩ be a bitonic sequence such that a_0 ≤ a_1 ≤ ··· ≤ a_{n/2-1} and a_{n/2} ≥ a_{n/2+1} ≥ ··· ≥ a_{n-1}. §Consider the following subsequences of s: s_1 = ⟨min{a_0, a_{n/2}}, min{a_1, a_{n/2+1}}, …, min{a_{n/2-1}, a_{n-1}}⟩ and s_2 = ⟨max{a_0, a_{n/2}}, max{a_1, a_{n/2+1}}, …, max{a_{n/2-1}, a_{n-1}}⟩.

19 Sorting Networks: Bitonic Sort s_1 = ⟨min{a_0, a_{n/2}}, min{a_1, a_{n/2+1}}, …, min{a_{n/2-1}, a_{n-1}}⟩ and s_2 = ⟨max{a_0, a_{n/2}}, max{a_1, a_{n/2+1}}, …, max{a_{n/2-1}, a_{n-1}}⟩. In s_1 there is an element b_i such that all elements before b_i are from the increasing part and all elements after b_i are from the decreasing part. In s_2 there is an element b'_i such that all elements before b'_i are from the decreasing part and all elements after b'_i are from the increasing part.

20 Sorting Networks: Bitonic Sort s_1 = ⟨min{a_0, a_{n/2}}, min{a_1, a_{n/2+1}}, …, min{a_{n/2-1}, a_{n-1}}⟩ and s_2 = ⟨max{a_0, a_{n/2}}, max{a_1, a_{n/2+1}}, …, max{a_{n/2-1}, a_{n-1}}⟩. §s_1 and s_2 are both bitonic and each element of s_1 is less than every element in s_2.

21 Sorting Networks: Bitonic Sort s_1 = ⟨min{a_0, a_{n/2}}, min{a_1, a_{n/2+1}}, …, min{a_{n/2-1}, a_{n-1}}⟩ and s_2 = ⟨max{a_0, a_{n/2}}, max{a_1, a_{n/2+1}}, …, max{a_{n/2-1}, a_{n-1}}⟩. §s_1 and s_2 are both bitonic and each element of s_1 is less than every element in s_2. §We have divided a bigger problem into two smaller problems (a bitonic split). §We can apply the procedure recursively on s_1 and s_2 to get the sorted sequence.
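
A small sketch of one bitonic split, assuming the input is bitonic and has even length. The function name `bitonic_split` and the 16-element example sequence are illustrative choices rather than material from the slides.

```python
def bitonic_split(s):
    """Split a bitonic sequence s of even length into (s1, s2).

    s1 takes the element-wise minimum of the two halves and s2 the
    element-wise maximum, following the definitions of s_1 and s_2.
    """
    half = len(s) // 2
    s1 = [min(s[i], s[i + half]) for i in range(half)]
    s2 = [max(s[i], s[i + half]) for i in range(half)]
    return s1, s2


s = [3, 5, 8, 9, 10, 12, 14, 20, 95, 90, 60, 40, 35, 23, 18, 0]
s1, s2 = bitonic_split(s)
print(s1)
print(s2)
# Every element of s1 is at most every element of s2.
print(max(s1) <= min(s2))
```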

22 Sorting Networks: Bitonic Sort Bitonic merge: the procedure of sorting a bitonic sequence using bitonic splits.
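
Bitonic merge can be sketched as a recursive application of the split. The code below is a sequential illustration, assuming the input is bitonic and its length is a power of two. The name `bitonic_merge` and the `ascending` flag (True for an increasing result, False for a decreasing one) are assumptions made for this example.

```python
def bitonic_merge(s, ascending=True):
    """Sort a bitonic sequence by repeated bitonic splits.

    Assumes len(s) is a power of two and s is bitonic.
    """
    n = len(s)
    if n <= 1:
        return list(s)
    half = n // 2
    # One bitonic split: element-wise min and max of the two halves.
    low = [min(s[i], s[i + half]) for i in range(half)]
    high = [max(s[i], s[i + half]) for i in range(half)]
    # Both halves are bitonic again, and low <= high element-for-element,
    # so sorting each half and concatenating sorts the whole sequence.
    if ascending:
        return bitonic_merge(low, True) + bitonic_merge(high, True)
    return bitonic_merge(high, False) + bitonic_merge(low, False)


s = [3, 5, 8, 9, 10, 12, 14, 20, 95, 90, 60, 40, 35, 23, 18, 0]
print(bitonic_merge(s))
```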

23 Sorting Networks: Bitonic Sort A bitonic merging network for n = 16. Note: input is a bitonic sequence. A ⊕BM[16] bitonic merging network: the network takes a bitonic sequence and outputs it in sorted order.

24 §How many splits are required?

25 §The number of splits required is log(n). §There are log(n) columns in a bitonic merge network.

26 §How do we sort n unordered elements?

27 §How do we sort n unordered elements using bitonic merge? §We must first build a single bitonic sequence from the given sequence.

28 §A sequence of length 2 is a bitonic sequence.

29 §A bitonic sequence of length 4 can be built by sorting the first two elements using ⊕BM[2] and the next two using ⊖BM[2]. §This process can be repeated to generate larger bitonic sequences.
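
The construction described above (sort one half increasing, the other half decreasing, then merge) can be sketched recursively. This is a sequential illustration, assuming the input length is a power of two. The names `bitonic_merge` and `bitonic_sort` and the `ascending` flag are chosen for this example, with `ascending=True` playing the role of ⊕BM and `ascending=False` the role of ⊖BM.

```python
def bitonic_merge(s, ascending):
    """Sort a bitonic sequence (length a power of two) via bitonic splits."""
    n = len(s)
    if n <= 1:
        return list(s)
    half = n // 2
    low = [min(s[i], s[i + half]) for i in range(half)]
    high = [max(s[i], s[i + half]) for i in range(half)]
    if ascending:
        return bitonic_merge(low, True) + bitonic_merge(high, True)
    return bitonic_merge(high, False) + bitonic_merge(low, False)


def bitonic_sort(s, ascending=True):
    """Sort an arbitrary sequence whose length is a power of two."""
    n = len(s)
    if n <= 1:
        return list(s)
    half = n // 2
    # Sorting the halves in opposite directions yields a bitonic sequence.
    first = bitonic_sort(s[:half], True)
    second = bitonic_sort(s[half:], False)
    # A final bitonic merge sorts it in the requested direction.
    return bitonic_merge(first + second, ascending)


data = [10, 20, 5, 9, 3, 8, 12, 14, 90, 0, 60, 40, 23, 35, 95, 18]
print(bitonic_sort(data))
```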

30 Sorting Networks: Bitonic Sort The last merging network (⊕BM[16]) sorts the input. In this example, n = 16.

31 Sorting Networks: Bitonic Sort The comparator network that transforms an input sequence of 16 unordered numbers into a bitonic sequence.

32 Sorting Networks: Bitonic Sort A bitonic merging network for n = 16. Note: input is a bitonic sequence. A ⊕BM[16] bitonic merging network: the network takes a bitonic sequence and outputs it in sorted order.

33 d(n) = d(n/2) +log(n)
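
The recurrence above can be unrolled to obtain the depth of the full sorting network. The derivation below is a sketch under stated assumptions: logarithms are base 2, n is a power of two, and the base case d(1) = 0 (a single wire needs no comparators) is chosen here for illustration.

```latex
\begin{align*}
d(n) &= d(n/2) + \log n \\
     &= d(n/4) + \log(n/2) + \log n \\
     &= \cdots \\
     &= d(1) + \sum_{i=1}^{\log n} i \\
     &= \frac{\log n \,(\log n + 1)}{2} \;=\; \Theta(\log^2 n).
\end{align*}
```

For n = 16 this gives 1 + 2 + 3 + 4 = 10 columns of comparators.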

34 §The depth of the network is Θ(log² n). §A serial implementation of the network would have complexity Θ(n log² n).

35 §The bitonic algorithm is communication intensive. §A parallel formulation must take into account the topology of the underlying network.

36 §Bitonic sorting network for n elements: log n stages; stage i consists of i columns of n/2 comparators. §On a parallel computer, each compare-exchange operation is performed by a pair of processes.

37 Sorting Networks: Bitonic Sort A bitonic merging network for n = 16. Note: input is a bitonic sequence. A ⊕BM[16] bitonic merging network: the network takes a bitonic sequence and outputs it in sorted order.

38 §One element per processor. §How to map processes? Compare-exchange should ideally be between neighboring processes.

39 Mapping Bitonic Sort to Hypercubes §The compare-exchange operation is performed between two wires only if their labels differ in exactly one bit! §This implies a direct mapping of wires to processors. All communication is nearest neighbor!
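
The one-bit-difference observation can be made concrete with a few lines of code. In this sketch the names `partner` and `bitonic_schedule` are hypothetical, and bits and stages are numbered from 0 (least significant bit first), which is a convention chosen here. Flipping bit j of a label gives the communication partner along hypercube dimension j.

```python
def partner(label, j):
    """Label of the process reached by flipping bit j (dimension j)."""
    return label ^ (1 << j)


def bitonic_schedule(d):
    """List the (stage, dimension) pairs used on a d-dimensional hypercube.

    Stage i uses dimensions i, i-1, ..., 0, so stage i has i + 1 steps.
    """
    return [(i, j) for i in range(d) for j in range(i, -1, -1)]


for i, j in bitonic_schedule(3):
    pairs = sorted({tuple(sorted((p, partner(p, j)))) for p in range(8)})
    print(f"stage {i}, dimension {j}: {pairs}")
```

Each printed pair differs in exactly one bit, so every exchange travels over a single hypercube link.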

44 Mapping Bitonic Sort to Hypercubes Communication characteristics of bitonic sort on a hypercube. During each stage of the algorithm, processes communicate along the dimensions shown.

45 Mapping Bitonic Sort to Hypercubes Parallel formulation of bitonic sort on a hypercube with n = 2^d processes. What is the parallel runtime?

46 Mapping Bitonic Sort to Hypercubes Parallel formulation of bitonic sort on a hypercube with n = 2^d processes. Parallel runtime: T_P = O(log² n). Cost optimal?
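
A sequential simulation of the one-element-per-process formulation may help here. The sketch below is not the slide's own listing, which did not survive in this transcript. It assumes n = 2^d values, one per process label, and a common rule for choosing which side keeps the maximum: a process keeps the larger value when bit i+1 and bit j of its label differ. The function name `hypercube_bitonic_sort` is hypothetical.

```python
def hypercube_bitonic_sort(values):
    """Simulate bitonic sort with one element per hypercube process.

    values[label] is the element held by the process with that label.
    len(values) must be a power of two.
    """
    n = len(values)
    d = n.bit_length() - 1
    if n == 0 or (1 << d) != n:
        raise ValueError("number of elements must be a power of two")
    a = list(values)
    for i in range(d):                 # stage i
        for j in range(i, -1, -1):     # exchange along dimension j
            nxt = a[:]
            for label in range(n):
                other = label ^ (1 << j)
                # Keep the max if bit (i+1) and bit j of the label differ.
                keep_max = ((label >> (i + 1)) & 1) != ((label >> j) & 1)
                pick = max if keep_max else min
                nxt[label] = pick(a[label], a[other])
            a = nxt                    # all processes step in lockstep
    return a


data = [10, 20, 5, 9, 3, 8, 12, 14, 90, 0, 60, 40, 23, 35, 95, 18]
print(hypercube_bitonic_sort(data))
```

The two nested loops run d(d+1)/2 times in total, which is where the log² n factor in the runtime comes from.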

47 Block of Elements Per Processor §Each process is assigned a block of n/p elements.

48 Block of Elements Per Processor §The first step is a local sort of the local block.

49 Block of Elements Per Processor: Hypercube §Initially the processes sort their n/p elements (using merge sort) in time Θ((n/p) log(n/p)) and then perform Θ(log² p) compare-split steps. §The parallel run time of this formulation is T_P = Θ((n/p) log(n/p)) + Θ((n/p) log² p), since each compare-split step takes Θ(n/p) time. §Compared to an optimal sort, the algorithm can efficiently use up to p = Θ(2^√(log n)) processes.
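
The block formulation can be simulated sequentially in the same way as the single-element case, with compare-split taking the place of compare-exchange. This is a sketch under assumptions: p is a power of two, n is divisible by p, Python's built-in `sorted` stands in for the local merge sort, and the name `block_bitonic_sort` is hypothetical. Communication costs are not modelled.

```python
import heapq


def block_bitonic_sort(values, p):
    """Simulate bitonic sort on p hypercube processes with n/p elements each.

    Returns a list of p sorted blocks; block k belongs to process k.
    """
    n = len(values)
    d = p.bit_length() - 1
    if p < 1 or (1 << d) != p or n % p != 0:
        raise ValueError("p must be a power of two that divides len(values)")
    size = n // p
    # Step 1: every process sorts its local block.
    blocks = [sorted(values[k * size:(k + 1) * size]) for k in range(p)]
    # Step 2: d(d+1)/2 compare-split steps along hypercube dimensions.
    for i in range(d):
        for j in range(i, -1, -1):
            nxt = list(blocks)
            for label in range(p):
                other = label ^ (1 << j)
                merged = list(heapq.merge(blocks[label], blocks[other]))
                # Same rule as with one element per process, applied to halves.
                keep_high = ((label >> (i + 1)) & 1) != ((label >> j) & 1)
                nxt[label] = merged[size:] if keep_high else merged[:size]
            blocks = nxt
    return blocks


print(block_bitonic_sort([9, 3, 7, 1, 8, 2, 6, 4, 5, 0, 11, 10], 4))
```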