Tuesday, November 14, 2006
“UNIX was never designed to keep people from doing stupid things, because that policy would also keep them from doing clever things.” - Doug Gwyn
Sorting Algorithms
§Sorting is a fundamental part of computer operations.
§The most common method of sorting is comparison-based.
§The basic operation of comparison-based sorting is compare-exchange.
§The lower bound on any sequential comparison-based sort of n numbers is $\Theta(n \log n)$.
Sorting: Basics
§What is a parallel sorted sequence?
§Where are the input and output lists stored?
§We assume that the input and output lists are distributed across the processes.
§Sorting can be an intermediate step in another algorithm.
§The sorted output list is partitioned such that each partition is sorted, and every element in processor $P_i$'s list is less than every element in $P_j$'s list if $i < j$.
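The definition of a parallel sorted sequence can be checked directly. The sketch below assumes the distributed list is modeled as a Python list of per-process lists (`blocks[k]` is what $P_k$ holds); the function name `is_parallel_sorted` is hypothetical. It compares with `<=` so that duplicate keys on neighboring processes are allowed.

```python
def is_parallel_sorted(blocks):
    """Return True if blocks[k] (the list held by process P_k) is sorted for every k
    and every element on P_i is <= every element on P_j whenever i < j."""
    # Each local partition must itself be sorted.
    if any(block != sorted(block) for block in blocks):
        return False
    # Since each partition is sorted, it suffices to compare the last element of
    # one non-empty partition with the first element of the next non-empty one.
    nonempty = [block for block in blocks if block]
    return all(prev[-1] <= nxt[0] for prev, nxt in zip(nonempty, nonempty[1:]))
```

For example, `[[1, 3], [4, 7], [8, 9]]` is parallel sorted, while `[[1, 5], [4, 7]]` is not, because 5 on $P_0$ exceeds 4 on $P_1$.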
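Sorting: Parallel Compare-Exchange Operation
§One element per process.
§Processes $P_i$ and $P_j$ ($i < j$) exchange their elements; $P_i$ then keeps the smaller and $P_j$ the larger. One compare-exchange takes time $t_s + t_w$.
§The overall runtime is dominated by inter-process communication.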
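A minimal sketch of one compare-exchange, assuming two Python threads with a `queue.Queue` inbox each stand in for two message-passing processes (a real implementation would use a message-passing library instead). Each side sends one element and receives one element, which is where the $t_s + t_w$ cost comes from; the function name `compare_exchange` is hypothetical.

```python
import queue
import threading


def compare_exchange(a_i, a_j):
    """Simulate a compare-exchange between P_i and P_j (i < j).

    Each simulated process sends its single element to the other and receives
    the partner's element; P_i then keeps the minimum and P_j the maximum.
    """
    inbox = {"i": queue.Queue(), "j": queue.Queue()}
    result = {}

    def process(me, other, value, keep):
        inbox[other].put(value)            # send my element to the partner
        partner_value = inbox[me].get()    # receive the partner's element
        result[me] = keep(value, partner_value)

    threads = [
        threading.Thread(target=process, args=("i", "j", a_i, min)),
        threading.Thread(target=process, args=("j", "i", a_j, max)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["i"], result["j"]
```

Calling `compare_exchange(9, 4)` leaves 4 on $P_i$ and 9 on $P_j$. The computation is a single comparison, so with one element per process the cost is essentially the message exchange.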
Sorting: Parallel Compare-Split Operation
§More than one element per process: each process holds a block of n/p elements.
§$P_i$ and $P_j$ ($i < j$) exchange blocks; $P_i$ keeps the smaller n/p elements and $P_j$ the larger n/p elements.
§Assuming that the two partial lists were initially sorted, the time for a compare-split operation is $t_s + t_w n/p$. For larger block sizes, the time to merge the two blocks is $O(n/p)$.
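The compare-split step can be sketched sequentially, assuming both blocks are already sorted and of equal size n/p. `heapq.merge` performs the linear-time merge of two sorted inputs; the function name `compare_split` is hypothetical.

```python
import heapq


def compare_split(block_i, block_j):
    """Compare-split between P_i and P_j (i < j), each holding a sorted block of n/p elements.

    After the blocks are exchanged, the two sorted blocks are merged in linear time;
    P_i keeps the smaller half and P_j keeps the larger half.
    """
    assert len(block_i) == len(block_j), "blocks are assumed to have equal size n/p"
    merged = list(heapq.merge(block_i, block_j))   # O(n/p) merge of two sorted blocks
    half = len(block_i)
    return merged[:half], merged[half:]
```

For example, `compare_split([1, 6, 8, 11, 13], [2, 7, 9, 10, 12])` returns `([1, 2, 6, 7, 8], [9, 10, 11, 12, 13])`. In a real distributed version each process only needs to produce its own half, but the asymptotic cost per process is the same.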
Sorting Networks
§Networks of comparators designed specifically for sorting.
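§We denote an increasing comparator by ⊕ and a decreasing comparator by ⊖. Given inputs x and y, an increasing comparator outputs min{x, y} on its first output and max{x, y} on its second; a decreasing comparator does the opposite.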
§The speed of the network is proportional to its depth: comparators in the same column operate in parallel, so the running time is set by the number of columns.
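A comparator network can be modeled as a list of columns, where each column is a list of `(i, j, increasing)` comparators on disjoint wires. The sketch below is a simulation under that assumed representation (the names `compare`, `run_network`, and `BM4` are hypothetical); it reports the depth as the number of columns, which is the quantity the network's speed depends on.

```python
def compare(values, i, j, increasing=True):
    """Apply one comparator to wires i and j.

    An increasing comparator puts min on wire i and max on wire j;
    a decreasing comparator does the opposite.
    """
    lo, hi = min(values[i], values[j]), max(values[i], values[j])
    if increasing:
        values[i], values[j] = lo, hi
    else:
        values[i], values[j] = hi, lo


def run_network(columns, inputs):
    """Run a comparator network given as a list of columns.

    Comparators in one column touch disjoint wires, so they could all fire in
    parallel; the network's depth is therefore the number of columns.
    """
    values = list(inputs)
    for column in columns:
        for i, j, increasing in column:
            compare(values, i, j, increasing)
    return values, len(columns)


# Example network on 4 wires (a bitonic merging network for n = 4):
# compare wires i and i+2, then wires i and i+1, all with increasing comparators.
BM4 = [[(0, 2, True), (1, 3, True)],
       [(0, 1, True), (2, 3, True)]]
```

Running `run_network(BM4, [1, 3, 4, 2])` returns `([1, 2, 3, 4], 2)`: the bitonic input is sorted by a network of depth 2.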
Sorting Networks: Bitonic Sort
§A bitonic sequence has two tones: first increasing, then decreasing.
§Any cyclic rotation of such a sequence is also considered bitonic (i.e., a sequence that becomes increasing-then-decreasing after cyclically shifting its elements).
Is 1,2,4,7,6,0 a bitonic sequence? Yes: it increases up to 7 and then decreases.
Is 8,9,2,1,0,4 a bitonic sequence? Yes, it is a cyclic shift of 0,4,8,9,2,1.
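Both examples can be checked mechanically. This is a simple quadratic-time sketch (the names `is_inc_dec` and `is_bitonic` are hypothetical): it tries every cyclic rotation and tests whether that rotation is non-decreasing and then non-increasing.

```python
def is_inc_dec(seq):
    """True if seq is non-decreasing up to some index and non-increasing after it."""
    i, n = 0, len(seq)
    while i + 1 < n and seq[i] <= seq[i + 1]:   # climb the increasing part
        i += 1
    while i + 1 < n and seq[i] >= seq[i + 1]:   # descend the decreasing part
        i += 1
    return i >= n - 1                           # reached the end: two tones at most


def is_bitonic(seq):
    """True if some cyclic rotation of seq is increasing-then-decreasing."""
    n = len(seq)
    return any(is_inc_dec(seq[k:] + seq[:k]) for k in range(max(n, 1)))
```

`is_bitonic([1, 2, 4, 7, 6, 0])` and `is_bitonic([8, 9, 2, 1, 0, 4])` are both true (the second via the rotation starting at 0), while `[1, 3, 2, 4]` is not bitonic under any rotation.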
Let $s = \langle a_0, a_1, \ldots, a_{n-1} \rangle$ be a bitonic sequence such that $a_0 \le a_1 \le \cdots \le a_{n/2-1}$ and $a_{n/2} \ge a_{n/2+1} \ge \cdots \ge a_{n-1}$. Consider the following subsequences of $s$:
$$s_1 = \langle \min\{a_0, a_{n/2}\}, \min\{a_1, a_{n/2+1}\}, \ldots, \min\{a_{n/2-1}, a_{n-1}\} \rangle$$
$$s_2 = \langle \max\{a_0, a_{n/2}\}, \max\{a_1, a_{n/2+1}\}, \ldots, \max\{a_{n/2-1}, a_{n-1}\} \rangle$$
In $s_1$ there is an element $b_i$ such that all elements before $b_i$ are from the increasing part and all elements after $b_i$ are from the decreasing part. In $s_2$ there is an element $b'_i$ such that all elements before $b'_i$ are from the decreasing part and all elements after $b'_i$ are from the increasing part.
§$s_1$ and $s_2$ are both bitonic, and every element of $s_1$ is less than or equal to every element of $s_2$.
§This bitonic split divides a bigger problem into two smaller problems. Applying the procedure recursively to $s_1$ and $s_2$ and concatenating the results yields the sorted sequence.
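A bitonic split is one pass of pairwise min/max between $a_i$ and $a_{i+n/2}$. The sketch below (function name `bitonic_split` is hypothetical) assumes an input of even length and returns $s_1$ and $s_2$ as defined above.

```python
def bitonic_split(s):
    """Split a bitonic sequence of even length n into
    s1[i] = min(a_i, a_{i+n/2}) and s2[i] = max(a_i, a_{i+n/2})."""
    half = len(s) // 2
    s1 = [min(s[i], s[i + half]) for i in range(half)]
    s2 = [max(s[i], s[i + half]) for i in range(half)]
    return s1, s2
```

For the bitonic input `[3, 5, 8, 9, 7, 4, 2, 1]` this gives `s1 = [3, 4, 2, 1]` and `s2 = [7, 5, 8, 9]`: both halves are bitonic, and the largest element of `s1` (4) does not exceed the smallest element of `s2` (5).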
§Bitonic merge: the procedure of sorting a bitonic sequence using repeated bitonic splits.
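A recursive sketch of bitonic merge, assuming the input length is a power of two (the name `bitonic_merge` is hypothetical). The `increasing` flag plays the role of ⊕ versus ⊖: for a decreasing merge, the maxima go to the first half instead of the minima.

```python
def bitonic_merge(s, increasing=True):
    """Sort a bitonic sequence (length a power of two) by repeated bitonic splits."""
    if len(s) <= 1:
        return list(s)
    half = len(s) // 2
    lo = [min(s[i], s[i + half]) for i in range(half)]   # s1: pairwise minima
    hi = [max(s[i], s[i + half]) for i in range(half)]   # s2: pairwise maxima
    if not increasing:
        lo, hi = hi, lo                                  # decreasing order: larger half first
    # Both halves are bitonic, so recurse on each and concatenate.
    return bitonic_merge(lo, increasing) + bitonic_merge(hi, increasing)
```

`bitonic_merge([3, 5, 8, 9, 7, 4, 2, 1])` returns `[1, 2, 3, 4, 5, 7, 8, 9]`. Each level of recursion is one split (one column of comparators), so a sequence of length n needs $\log n$ levels.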
Figure: A bitonic merging network for n = 16, denoted ⊕BM[16]. Its input is a bitonic sequence, and it outputs that sequence in sorted order.
§Number of splits required: $\log n$.
§There are $\log n$ columns of comparators in a bitonic merging network.
§How do we sort n unordered elements using bitonic merge?
§We must first build a single bitonic sequence from the given sequence.
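§Any sequence of length 2 is a bitonic sequence.
§A bitonic sequence of length 4 can be built by sorting the first two elements using ⊕BM[2] and the next two using ⊖BM[2].
§This process can be repeated to generate larger bitonic sequences: sorting two adjacent bitonic sequences of length k in opposite directions (with ⊕BM[k] and ⊖BM[k]) yields a bitonic sequence of length 2k.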
§The last merging network (⊕BM[16]) sorts the input. In this example, n = 16.
Figure: The comparator network that transforms an input sequence of 16 unordered numbers into a bitonic sequence.
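The depth $d(n)$ of the network satisfies
$$d(n) = d(n/2) + \log n, \qquad d(2) = 1,$$
since the two halves are first sorted in opposite directions by networks of size $n/2$ operating in parallel, and the resulting bitonic sequence is then sorted by a bitonic merge of depth $\log n$. Unrolling the recurrence gives
$$d(n) = \sum_{i=1}^{\log n} i = \frac{\log n \,(\log n + 1)}{2},$$
so the depth of the network is $\Theta(\log^2 n)$. Each column has $n/2$ comparators, so a serial implementation of the network would have complexity $\Theta(n \log^2 n)$.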
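The whole sorting network can be simulated column by column. This sketch uses a common iterative formulation of the bitonic sorting network, assuming n is a power of two (the name `bitonic_sort_network` and the loop variables `k` and `j` are our own). The outer loop builds sorted blocks of size `k` in alternating directions, which makes each block of size `2k` bitonic; the inner loop is one column of comparators between wires `i` and `i ^ j`. It also counts columns and comparators so the depth and serial cost can be checked.

```python
def bitonic_sort_network(values):
    """Simulate the bitonic sorting network on a list whose length is a power of two.

    Returns the sorted list, the depth (number of comparator columns), and the
    total number of comparators applied.
    """
    a = list(values)
    n = len(a)
    assert n & (n - 1) == 0, "n must be a power of two"
    depth = comparators = 0
    k = 2
    while k <= n:                  # stage: sort blocks of size k, alternating direction
        j = k // 2
        while j >= 1:              # one column: n/2 comparators on wires i and i ^ j
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0   # ⊕ if the k-bit of i is 0, otherwise ⊖
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
                    comparators += 1
            depth += 1
            j //= 2
        k *= 2
    return a, depth, comparators
```

For n = 16 the simulation applies 10 columns (1 + 2 + 3 + 4) and 80 comparators (10 columns of 8), matching the $\Theta(\log^2 n)$ depth and $\Theta(n \log^2 n)$ serial work.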
§The bitonic algorithm is communication intensive.
§Its mapping must take into account the topology of the underlying interconnection network.
§A bitonic sorting network for n elements has:
  - $\log n$ stages;
  - stage $i$ consisting of $i$ columns of $n/2$ comparators.
§On a parallel computer, each compare-exchange operation is performed by a pair of processes.
§One element per processor.
§How should the wires of the network be mapped to processes?
  - Compare-exchange operations should ideally be between neighboring processes.
Mapping Bitonic Sort to Hypercubes
§The compare-exchange operation is performed between two wires only if their labels differ in exactly one bit!
§In a hypercube, two processes are directly connected exactly when their labels differ in one bit. This implies a direct mapping of wires to processors, and all communication is nearest neighbor!
Figure: Communication characteristics of bitonic sort on a hypercube. During each stage of the algorithm, processes communicate along the dimensions shown.
Parallel formulation of bitonic sort on a hypercube with $n = 2^d$ processes.
§What is the parallel runtime? $T_P = \Theta(\log^2 n)$: there are $d(d+1)/2$ compare-exchange steps, each taking constant time.
§Is it cost optimal? Its cost, $\Theta(n \log^2 n)$, matches a serial implementation of the bitonic network but exceeds the $\Theta(n \log n)$ of an optimal comparison-based sort.
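The hypercube formulation can be simulated sequentially, one list entry per process label. This sketch follows a common textbook rule, which we read with 0-indexed bits: in stage `i`, step `j`, a process exchanges with the neighbor across dimension `j` and keeps the maximum if bit `i+1` of its label differs from bit `j`, otherwise the minimum. The function name `hypercube_bitonic_sort` is hypothetical, and synchronous steps are an assumption of the simulation.

```python
def hypercube_bitonic_sort(values):
    """Simulate bitonic sort on a hypercube of n = 2**d processes, one element each.

    values[label] is the element initially held by the process with that label.
    Returns each process's final element and the dimension used at every step.
    """
    n = len(values)
    d = n.bit_length() - 1
    assert n >= 1 and n == 1 << d, "n must be a power of two"
    vals = list(values)
    dims = []
    for i in range(d):                    # stage i: merge bitonic blocks of size 2**(i+1)
        for j in range(i, -1, -1):        # stage i uses dimensions i, i-1, ..., 0
            new = vals[:]
            for label in range(n):
                partner = label ^ (1 << j)           # neighbor: labels differ only in bit j
                keep_max = ((label >> (i + 1)) & 1) != ((label >> j) & 1)
                pair = (vals[label], vals[partner])
                new[label] = max(pair) if keep_max else min(pair)
            vals = new
            dims.append(j)
    return vals, dims
```

With 16 processes (d = 4) the result is sorted by label, and the dimension sequence is `[0, 1, 0, 2, 1, 0, 3, 2, 1, 0]`: 10 steps, each a single nearest-neighbor exchange.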
Block of Elements Per Processor
§Each process is assigned a block of n/p elements.
§The first step is a local sort of the local block.
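Block of Elements Per Processor: Hypercube
§Initially the processes sort their n/p elements (using merge sort) in time $\Theta((n/p)\log(n/p))$ and then perform $\Theta(\log^2 p)$ compare-split steps, each taking $\Theta(n/p)$ time for communication and merging.
§The parallel run time of this formulation is therefore
$$T_P = \Theta\left(\frac{n}{p}\log\frac{n}{p}\right) + \Theta\left(\frac{n}{p}\log^2 p\right).$$
§Compared with an optimal $\Theta(n \log n)$ sort, the algorithm stays efficient only while $\log^2 p = O(\log n)$, so it can efficiently use up to $p = \Theta(2^{\sqrt{\log n}})$ processes.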
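The block version keeps the same hypercube communication pattern but replaces each compare-exchange with a compare-split. The sketch below simulates it sequentially, assuming p is a power of two and n is a multiple of p; the names `compare_split` and `block_bitonic_sort` are hypothetical, and the keep-max rule is the same 0-indexed bit test as in the one-element formulation.

```python
import heapq


def compare_split(mine, theirs, keep_small):
    """Merge two sorted blocks of equal size and keep the smaller or larger half."""
    merged = list(heapq.merge(mine, theirs))
    half = len(mine)
    return merged[:half] if keep_small else merged[half:]


def block_bitonic_sort(data, p):
    """Simulate bitonic sort with n/p elements per process on a p-process hypercube."""
    d = p.bit_length() - 1
    assert p >= 1 and p == 1 << d, "p must be a power of two"
    assert len(data) % p == 0, "n is assumed to be a multiple of p"
    size = len(data) // p
    # Step 1: every process sorts its local block, Theta((n/p) log(n/p)).
    blocks = [sorted(data[k * size:(k + 1) * size]) for k in range(p)]
    steps = 0
    # Step 2: Theta(log^2 p) compare-split steps, each Theta(n/p).
    for i in range(d):
        for j in range(i, -1, -1):
            new_blocks = []
            for label in range(p):
                partner = label ^ (1 << j)
                keep_max = ((label >> (i + 1)) & 1) != ((label >> j) & 1)
                new_blocks.append(compare_split(blocks[label], blocks[partner], not keep_max))
            blocks = new_blocks
            steps += 1
    return blocks, steps
```

With 32 elements on 4 processes, each process ends with 8 consecutive values of the sorted order after 3 compare-split steps ($d(d+1)/2$ with $d = 2$).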