Outline
- Introduction
- Sorting Networks
- Bubble Sort and its Variants
Introduction
- Sorting is one of the most common operations performed by a computer.
- Internal or external sorting.
- Comparison-based sorting, Θ(n log n), and non-comparison-based sorting, Θ(n).
Background
- Where are the input and output sequences stored?
  - Stored on one process, or
  - Distributed among the processes, which is useful when sorting is an intermediate step.
- What is the order of the output sequence among the processes? It follows a global enumeration of the processes.
How comparisons are performed
- Compare-exchange is not easy to do efficiently in parallel sorting algorithms.
- With one element per process, each compare-exchange costs ts + tw; since ts >> tw, the result is poor performance.
How comparisons are performed (cont'd)
- With more than one element per process, each process holds n/p elements; after the exchange, every element of Ai is <= every element of Aj.
- Compare-split: the communication costs ts + tw·n/p, and the whole step is Θ(n/p) (see the sketch below).
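As a concrete illustration of the compare-split step, here is a minimal Python sketch; it assumes both blocks are already sorted locally, and the function name is illustrative. Process i keeps the smaller half and process j the larger half.

def compare_split(block_i, block_j):
    """Compare-split: merge two locally sorted blocks of n/p elements and
    split the result, so that every kept element of process i is <= every
    kept element of process j."""
    merged = sorted(block_i + block_j)   # heapq.merge would give the Θ(n/p) merge
    k = len(block_i)
    return merged[:k], merged[k:]        # smaller half for i, larger half for j

For example, compare_split([1, 5, 9], [2, 3, 8]) returns ([1, 2, 3], [5, 8, 9]).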
Outline
- Introduction
- Sorting Networks
  - Bitonic sort
  - Mapping bitonic sort to hypercube and mesh
- Bubble Sort and its Variants
Sorting Networks
- Networks that sort in Θ(log² n) time.
- Key component: the comparator.
  - Increasing comparator
  - Decreasing comparator
A typical sorting network
- Depth: the number of columns the network contains.
- The speed of the network is proportional to its depth.
Bitonic sort: Θ(log² n)
- Bitonic sequence: monotonically increasing and then monotonically decreasing, or a sequence for which some cyclic shift of the indices satisfies this property. Example: 8 9 2 1 0 4 5 7.
- How do we rearrange a bitonic sequence into a monotonic (sorted) sequence?
  - Let s = <a0, a1, ..., a_{n-1}> be a bitonic sequence and split it into s1 = <min(a0, a_{n/2}), min(a1, a_{n/2+1}), ...> and s2 = <max(a0, a_{n/2}), max(a1, a_{n/2+1}), ...>.
  - s1 and s2 are both bitonic, and every element of s1 is smaller than every element of s2.
  - Applying this bitonic split recursively gives the bitonic merge, implemented by a bitonic-merging network (see the sketch below).
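A minimal Python sketch of the bitonic split/merge (assuming the length is a power of two; names are illustrative): the split pairs element i with element i + n/2, after which each half is bitonic and can be merged recursively.

def bitonic_merge(seq, ascending=True):
    """Turn a bitonic sequence into a sorted one by repeated bitonic splits."""
    n = len(seq)
    if n == 1:
        return list(seq)
    half = n // 2
    lo, hi = list(seq[:half]), list(seq[half:])
    for i in range(half):                    # bitonic split: pair i with i + n/2
        if (lo[i] > hi[i]) == ascending:
            lo[i], hi[i] = hi[i], lo[i]
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)

For the example above, bitonic_merge([8, 9, 2, 1, 0, 4, 5, 7]) returns [0, 1, 2, 4, 5, 7, 8, 9].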
Example of bitonic merging
Bitonic merging network
- The network has log n columns of comparators.
Sorting n unordered elements
- Bitonic sort, implemented by a bitonic sorting network.
- Depth recurrence: d(n) = d(n/2) + log n, so d(n) = Θ(log² n) (see the sketch below).
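Building on the bitonic_merge sketch above, a recursive bitonic sort sketch (again assuming a power-of-two length): sort one half ascending and the other descending to form a bitonic sequence, then merge it.

def bitonic_sort(seq, ascending=True):
    """Bitonic sorting network expressed recursively: d(n) = d(n/2) + log n columns."""
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    first = bitonic_sort(seq[:half], True)            # ascending half
    second = bitonic_sort(seq[half:], False)          # descending half
    return bitonic_merge(first + second, ascending)   # their concatenation is bitonic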
The first three stages
How to map bitonic sort to a hypercube? (one element per process)
- How do we map the bitonic sort algorithm onto a general-purpose parallel computer?
  - A process plays the role of a wire, and the compare-exchange function is performed by a pair of processes.
- Bitonic sort is communication intensive, so the topology of the interconnection network must be considered: a poor mapping makes elements travel long distances before they are compared, degrading performance.
- Observation: communication happens between pairs of wires whose labels differ in exactly one bit.
The last stage of bitonic sort
Communication characteristics
Bitonic sort on a hypercube of 2^d processes
- TP = Θ(log² n), which is cost-optimal with respect to the sequential bitonic sort.
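To make the mapping concrete, a small serial Python sketch (names illustrative) that enumerates the compare-exchange partners of bitonic sort on 2^d wires: in every substage a process is paired with the process whose label differs in exactly one bit, i.e. a direct hypercube neighbor, and the d(d+1)/2 substages give TP = Θ(log² n).

def bitonic_stages(d):
    """Yield (stage, substage, rank, partner, ascending) for bitonic sort on 2**d wires.
    partner = rank XOR 2**j, i.e. the hypercube neighbor across dimension j."""
    n = 1 << d
    for stage in range(1, d + 1):           # merging bitonic sequences of length 2**stage
        for j in range(stage - 1, -1, -1):  # substages within the stage
            for rank in range(n):
                partner = rank ^ (1 << j)
                if partner > rank:
                    ascending = (rank & (1 << stage)) == 0   # merge direction for this pair
                    yield stage, j, rank, partner, ascending

Every (rank, partner) pair differs in a single bit, so each compare-exchange crosses exactly one hypercube link.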
Mapping bitonic sort to a mesh
The last stage of the bitonic sort
A block of elements per process
- Each process holds n/p elements.
- Solution 1: think of each process as consisting of n/p smaller processes. This gives a poor parallel implementation.
- Solution 2: replace each compare-exchange with a compare-split, costing Θ(n/p) computation plus Θ(n/p) communication.
- The difference: in Solution 2 the blocks are initially sorted locally.
- Applies to both the hypercube and the mesh mappings.
Performance on different architectures
- Bitonic sort is neither very efficient nor very scalable, since the underlying sequential algorithm is suboptimal.
Outline
- Introduction
- Sorting Networks
- Bubble Sort and its Variants
Bubble sort
- O(n²) comparisons.
- Inherently sequential.
Odd-even transposition
- n phases, each performing Θ(n) comparisons; odd and even phases alternate over the odd- and even-indexed pairs of neighbors (see the sketch below).
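A serial Python sketch of odd-even transposition sort (the phase-numbering convention is illustrative):

def odd_even_transposition_sort(a):
    """n phases; odd phases compare-exchange pairs (1,2), (3,4), ...,
    even phases pairs (2,3), (4,5), ... (1-based element numbering)."""
    n = len(a)
    for phase in range(1, n + 1):
        start = 0 if phase % 2 == 1 else 1   # 0-based index of the first pair
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a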
Odd-even transposition
Parallel formulation
- With one element per process, the n phases of neighbor compare-exchanges give a parallel runtime of O(n).
Shellsort
- Drawback of odd-even sort: a sequence with only a few elements out of order still needs Θ(n²) work to sort.
- Idea: add a preprocessing phase that moves elements across long distances, thereby reducing the number of odd and even phases needed afterwards.
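The slides describe a parallel shellsort variant; as a sketch of the underlying idea only (not the parallel formulation), here is a sequential Shell sort in Python, where decreasing gaps move out-of-place elements long distances early:

def shellsort(a):
    """Insertion sort over decreasing gaps; the early large-gap passes do the
    long-distance preprocessing, the final gap-1 pass is plain insertion sort."""
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            val, j = a[i], i
            while j >= gap and a[j - gap] > val:
                a[j] = a[j - gap]
                j -= gap
            a[j] = val
        gap //= 2
    return a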
Shellsort
Conclusion
- Sorting Networks
  - Bitonic network
  - Mapping to hypercube and mesh
- Bubble Sort and its Variants
  - Odd-even sort
  - Shellsort
Outline
- Issues in Sorting
- Sorting Networks
- Bubble Sort and its Variants
- Quick sort
- Bucket and Sample sort
- Other sorting algorithms
Quick Sort
- Features: simple, low overhead; Θ(n log n) on average, Θ(n²) in the worst case.
- Idea:
  - Choose a pivot (how?).
  - Partition the array into two parts around the pivot, Θ(n).
  - Recursively solve the two sub-problems.
- Complexity:
  - Worst case: T(n) = T(n-1) + Θ(n), which gives Θ(n²).
  - Balanced case: T(n) = 2T(n/2) + Θ(n), which gives Θ(n log n).
The sequential algorithm
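The slide's pseudocode did not survive extraction; as a stand-in, a minimal sequential quicksort sketch in Python (Lomuto partition with the last element as pivot, an illustrative choice):

def quicksort(a, lo=0, hi=None):
    """In-place quicksort: partition around a pivot in Θ(n), then recurse."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):              # partition step
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]            # put the pivot in its final place
        quicksort(a, lo, i - 1)
        quicksort(a, i + 1, hi)
    return a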
Parallelizing quicksort
- Solution 1: recursive decomposition.
  - Drawback: each partition is handled by a single process and takes Ω(n), so the parallel runtime is Ω(n) and, with n processes, the cost is Ω(n²).
- Solution 2: perform the partition itself in parallel.
  - We could partition an array of size n into two smaller arrays in Θ(1) time by using Θ(n) processes. How?
  - Considered on the CRCW PRAM, shared-address-space, and message-passing models.
Parallel formulation for the CRCW PRAM (cost-optimal)
- Assumptions: n elements, n processes; write conflicts are resolved arbitrarily.
- Executing quicksort can be visualized as constructing a binary tree of pivots: each pivot becomes a node, and the smaller and larger elements go to its left and right subtrees.
Example
Algorithm

procedure BUILD_TREE(A[1...n])
begin
    for each process i do
    begin
        root := i;                      // concurrent writes; one arbitrary process wins
        parent_i := root;
        leftchild[i] := rightchild[i] := n + 1;
    end for
    repeat for each process i ≠ root do
    begin
        if (A[i] < A[parent_i]) or (A[i] = A[parent_i] and i < parent_i) then
        begin
            leftchild[parent_i] := i;
            if i = leftchild[parent_i] then exit
            else parent_i := leftchild[parent_i];
        end
        else
        begin
            rightchild[parent_i] := i;
            if i = rightchild[parent_i] then exit
            else parent_i := rightchild[parent_i];
        end
    end repeat
end BUILD_TREE

Assuming a balanced tree: at each level the partition is distributed to all processes in O(1) time, and there are Θ(log n) levels, so the construction takes Θ(log n) · Θ(1) = Θ(log n) time.
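A sequential Python simulation of the tree construction may help to see the concurrent writes at work; the random winner models the arbitrary resolution of write conflicts, and all names are illustrative.

import random

def build_tree(A):
    """Simulate BUILD_TREE: in each round, all still-unattached processes that
    target the same (parent, side) slot write concurrently; one arbitrary winner
    becomes the child, the losers adopt the winner as their new parent."""
    n = len(A)
    root = random.randrange(n)                    # arbitrary winner of the write to root
    parent = [root] * n
    left, right = [None] * n, [None] * n
    active = [i for i in range(n) if i != root]
    while active:
        writes = {}
        for i in active:                          # decide which slot to write
            smaller = A[i] < A[parent[i]] or (A[i] == A[parent[i]] and i < parent[i])
            writes.setdefault((parent[i], 'L' if smaller else 'R'), []).append(i)
        active = []
        for (p, side), procs in writes.items():
            winner = random.choice(procs)         # arbitrary concurrent-write resolution
            if side == 'L':
                left[p] = winner
            else:
                right[p] = winner
            for i in procs:
                if i != winner:
                    parent[i] = winner            # descend and retry next round
                    active.append(i)
    return root, left, right

An in-order traversal of the resulting tree visits the elements of A in sorted order; with balanced pivots the number of rounds is Θ(log n).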
Parallel formulation for a shared-address-space architecture
- Assumptions: n elements, p processes, shared memory.
- How to parallelize? Idea of the algorithm:
  - Each process is assigned a block of elements.
  - Select a pivot element and broadcast it.
  - Local rearrangement: each process splits its block around the pivot.
  - Global rearrangement: collect the smaller elements into a block S and the larger elements into a block L.
  - Redistribute the blocks to the processes (how many processes for each side?).
  - Repeat until the array is broken into p parts.
Example: how is the location of each element computed?
Example (cont'd)
How to do the global rearrangement? (see the sketch below)
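A serial Python sketch of how prefix sums answer the "where do I write?" question (function and variable names are illustrative): each process's smaller and larger pieces get exclusive-prefix-sum offsets into the S and L regions of the shared array.

from itertools import accumulate

def global_rearrangement(blocks, pivot):
    """Local split around the pivot, then prefix sums of the piece sizes give
    each process its write offsets for the global rearrangement."""
    smaller = [[x for x in b if x <= pivot] for b in blocks]
    larger  = [[x for x in b if x >  pivot] for b in blocks]
    s_sizes = [len(s) for s in smaller]
    l_sizes = [len(l) for l in larger]
    s_off = [0] + list(accumulate(s_sizes))[:-1]          # exclusive prefix sums
    boundary = sum(s_sizes)                               # where block L starts
    l_off = [boundary + o for o in [0] + list(accumulate(l_sizes))[:-1]]
    result = [None] * sum(len(b) for b in blocks)
    for p in range(len(blocks)):                          # each "process" copies its pieces
        result[s_off[p]:s_off[p] + s_sizes[p]] = smaller[p]
        result[l_off[p]:l_off[p] + l_sizes[p]] = larger[p]
    return result, boundary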
Analysis
- Assumption: pivot selection results in balanced partitions, so there are log p recursion steps.
- Per step:
  - Broadcasting the pivot: Θ(log p)
  - Local rearrangement: Θ(n/p)
  - Prefix sums: Θ(log p)
  - Global rearrangement: Θ(n/p)
Parallel formulation for a message-passing architecture
- Similar to the shared-address-space formulation.
- Difference: the array is physically distributed to the p processes.
Pivot selection
- Random selection. Drawback: a bad pivot leads to significant performance degradation.
- Median selection. Assumption: the initial distribution of elements in each process is uniform, so a local median approximates the global median.
Outline
- Issues in Sorting
- Sorting Networks
- Bubble Sort and its Variants
- Quick sort
- Bucket and Sample sort
- Other sorting algorithms
Bucket Sort
- Assumption: the n elements are distributed uniformly over an interval [a, b].
- Idea (see the sketch below):
  - Divide [a, b] into m equal-sized subintervals (buckets).
  - Place each element into its bucket.
  - Sort each bucket: Θ(n log(n/m)) overall, which is Θ(n) when m = Θ(n).
- Compare with quicksort.
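A minimal Python sketch of bucket sort under the uniformity assumption (names are illustrative):

def bucket_sort(a, lo, hi, m):
    """Split [lo, hi) into m equal subintervals, scatter, then sort each bucket."""
    buckets = [[] for _ in range(m)]
    width = (hi - lo) / m
    for x in a:
        idx = min(int((x - lo) / width), m - 1)   # clamp the x == hi edge case
        buckets[idx].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))                     # small buckets on average under uniformity
    return out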
Parallelization on a message-passing architecture
- n elements and p processes, hence p buckets.
- Preliminary idea:
  - Distribute n/p elements to each process.
  - Assign each process a subinterval and redistribute the elements to their buckets.
  - Sort locally.
- Drawback: the uniformity assumption is not realistic, so bucket sizes can be skewed and performance degrades.
- Solution: sample sort, which selects splitters from a sample of the data and guarantees that each bucket receives fewer than 2n/m elements (see the sketch below).
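A serial Python sketch of one common splitter-selection scheme for sample sort (assumes p >= 2; the exact sampling positions vary between presentations, so treat the indexing as illustrative):

def choose_splitters(blocks, p):
    """Each of the p processes sorts its block and contributes p-1 evenly spaced
    samples; the combined p*(p-1) samples are sorted and p-1 of them, again
    evenly spaced, become the global splitters."""
    samples = []
    for b in blocks:
        b.sort()                                  # local sort (in place)
        step = max(len(b) // p, 1)
        samples.extend(b[step::step][:p - 1])     # p-1 local samples
    samples.sort()
    return samples[p - 1::p - 1][:p - 1]          # p-1 global splitters

Each process then partitions its block against the splitters (for example with binary search) and sends each piece to its destination bucket.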
Example
Analysis
- Distributing the elements: n/p per process.
- Local sort and sample selection: Θ((n/p) log(n/p)) for the local sort, Θ(p) for the sample selection.
- Combining the samples: Θ(p²); sorting them: Θ(p² log p); selecting the global splitters: Θ(p).
- Partitioning the elements against the splitters: Θ(p log(n/p)); redistribution: O(n) + O(p log p).
- Final local sorting.
Outline
- Issues in Sorting
- Sorting Networks
- Bubble Sort and its Variants
- Quick sort
- Bucket and Sample sort
- Other sorting algorithms
Enumeration Sort
- Assumptions: n elements, n² processes, CRCW PRAM (concurrent writes to the same location are combined by summing the written values).
- Feature: based on computing the rank of each element; runs in Θ(1) time.
Algorithm

procedure ENUM_SORT(n)
begin
    for each process P1,j do
        C[j] := 0;
    for each process Pi,j do
        if (A[i] < A[j]) or (A[i] = A[j] and i < j) then
            C[j] := 1;              // concurrent writes to C[j] are summed, giving the rank of A[j]
        else
            C[j] := 0;
    for each process P1,j do
        A[C[j]] := A[j];            // write each element directly to its ranked position
end ENUM_SORT

Shared data structures: arrays A[n] and C[n].
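A serial Python simulation of the same idea (in the PRAM version the n² comparisons and the summed concurrent writes all happen in Θ(1) time):

def enumeration_sort(A):
    """Rank-based sort: C[j] counts the elements that precede A[j]; each element
    is then written directly to position C[j]."""
    n = len(A)
    C = [0] * n
    for i in range(n):
        for j in range(n):
            if A[i] < A[j] or (A[i] == A[j] and i < j):   # ties broken by index
                C[j] += 1
    out = [None] * n
    for j in range(n):
        out[C[j]] = A[j]
    return out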
Radix Sort
- Assumptions: n elements, n processes.
- Feature: based on the binary representation of the elements, processed r bits at a time from least to most significant; each pass leverages enumeration-style ranking via prefix sums.
Algorithm

procedure RADIX_SORT(A, r)
begin
    for i := 0 to b/r - 1 do                         // b = number of bits per element
    begin
        offset := 0;
        for j := 0 to 2^r - 1 do
        begin
            flag := 0;
            if the i-th least significant r-bit block of A[Pk] = j then
                flag := 1;
            index := prefix_sum(flag);               // Θ(log n)
            if flag = 1 then
                rank := offset + index;
            offset := offset + parallel_sum(flag);   // Θ(log n); accumulate counts of smaller digit values
        end for
        each process Pk sends its element A[Pk] to process P_rank;   // Θ(n)
    end for
end RADIX_SORT
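A serial Python sketch that mirrors the structure of the parallel algorithm, processing b-bit non-negative keys r bits at a time (the parameter defaults are illustrative); the per-digit bucketing plays the role of the prefix-sum rank computation:

def radix_sort(a, b=32, r=4):
    """LSD radix sort on r-bit digits of b-bit keys; each pass is stable, so
    after b/r passes the keys are fully sorted."""
    mask = (1 << r) - 1
    for i in range(b // r):
        shift = i * r
        buckets = [[] for _ in range(1 << r)]
        for x in a:
            buckets[(x >> shift) & mask].append(x)        # digit value selects the bucket
        a = [x for bucket in buckets for x in bucket]     # stable redistribution by rank
    return a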
Conclusion
- Sorting Networks: bitonic network, mapping to hypercube and mesh
- Bubble Sort and its Variants: odd-even sort, shellsort
- Quick sort: parallel formulations for the CRCW PRAM and for shared-address-space and message-passing architectures
- Bucket and Sample sort
- Other sorting algorithms: enumeration and radix sort