Outline
- Introduction
- Sorting Networks
- Bubble Sort and its Variants
Introduction
- Sorting is one of the most common operations performed by a computer.
- Internal or external sorting.
- Comparison-based sorting, Θ(n log n), and non-comparison-based sorting, Θ(n).
Background
- Where are the input and output sequences stored?
  - Stored on one process, or
  - Distributed among the processes, which is useful when sorting is an intermediate step.
- What is the order of the output sequence among the processes? It follows a global enumeration of the processes.
How comparisons are performed
- Compare-exchange is not easy to do efficiently in parallel sorting algorithms.
- With one element per process, each compare-exchange costs ts + tw; since ts >> tw, the result is poor performance.
How comparisons are performed (cont'd)
- With more than one element per process, each process holds n/p elements; after the exchange, every element of Ai is <= every element of Aj.
- Compare-split: the communication costs ts + tw·n/p, and the whole step is Θ(n/p) (see the sketch below).
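As a concrete illustration of the compare-split step, here is a minimal Python sketch; it assumes both blocks are already sorted locally, and the function name is illustrative. Process i keeps the smaller half and process j the larger half.

def compare_split(block_i, block_j):
    """Compare-split: merge two locally sorted blocks of n/p elements and
    split the result, so that every kept element of process i is <= every
    kept element of process j."""
    merged = sorted(block_i + block_j)   # heapq.merge would give the Θ(n/p) merge
    k = len(block_i)
    return merged[:k], merged[k:]        # smaller half for i, larger half for j

For example, compare_split([1, 5, 9], [2, 3, 8]) returns ([1, 2, 3], [5, 8, 9]).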
Outline
- Introduction
- Sorting Networks
  - Bitonic sort
  - Mapping bitonic sort to hypercube and mesh
- Bubble Sort and its Variants
Sorting Networks
- Networks that sort in Θ(log² n) time.
- Key component: the comparator.
  - Increasing comparator
  - Decreasing comparator
A typical sorting network
- Depth: the number of columns the network contains.
- The speed of the network is proportional to its depth.
Bitonic sort: Θ(log² n)
- Bitonic sequence: monotonically increasing and then monotonically decreasing, or a sequence for which some cyclic shift of the indices satisfies this property. Example: 8 9 2 1 0 4 5 7.
- How do we rearrange a bitonic sequence into a monotonic (sorted) sequence?
  - Let s = <a0, a1, ..., a_{n-1}> be a bitonic sequence and split it into s1 = <min(a0, a_{n/2}), min(a1, a_{n/2+1}), ...> and s2 = <max(a0, a_{n/2}), max(a1, a_{n/2+1}), ...>.
  - s1 and s2 are both bitonic, and every element of s1 is smaller than every element of s2.
  - Applying this bitonic split recursively gives the bitonic merge, implemented by a bitonic-merging network (see the sketch below).
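A minimal Python sketch of the bitonic split/merge (assuming the length is a power of two; names are illustrative): the split pairs element i with element i + n/2, after which each half is bitonic and can be merged recursively.

def bitonic_merge(seq, ascending=True):
    """Turn a bitonic sequence into a sorted one by repeated bitonic splits."""
    n = len(seq)
    if n == 1:
        return list(seq)
    half = n // 2
    lo, hi = list(seq[:half]), list(seq[half:])
    for i in range(half):                    # bitonic split: pair i with i + n/2
        if (lo[i] > hi[i]) == ascending:
            lo[i], hi[i] = hi[i], lo[i]
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)

For the example above, bitonic_merge([8, 9, 2, 1, 0, 4, 5, 7]) returns [0, 1, 2, 4, 5, 7, 8, 9].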
Example of bitonic merging
Bitonic merging network
- The network has log n columns of comparators.
Sorting n unordered elements
- Bitonic sort, implemented by a bitonic sorting network.
- Depth recurrence: d(n) = d(n/2) + log n, so d(n) = Θ(log² n) (see the sketch below).
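Building on the bitonic_merge sketch above, a recursive bitonic sort sketch (again assuming a power-of-two length): sort one half ascending and the other descending to form a bitonic sequence, then merge it.

def bitonic_sort(seq, ascending=True):
    """Bitonic sorting network expressed recursively: d(n) = d(n/2) + log n columns."""
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    first = bitonic_sort(seq[:half], True)            # ascending half
    second = bitonic_sort(seq[half:], False)          # descending half
    return bitonic_merge(first + second, ascending)   # their concatenation is bitonic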
The first three stages
How to map bitonic sort to a hypercube? (one element per process)
- How do we map the bitonic sort algorithm onto a general-purpose parallel computer?
  - A process plays the role of a wire, and the compare-exchange function is performed by a pair of processes.
- Bitonic sort is communication intensive, so the topology of the interconnection network must be considered: a poor mapping makes elements travel long distances before they are compared, degrading performance.
- Observation: communication happens between pairs of wires whose labels differ in exactly one bit.
The last stage of bitonic sort
Communication characteristics
Bitonic sort on a hypercube of 2^d processes
- TP = Θ(log² n), which is cost-optimal with respect to the sequential bitonic sort.
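To make the mapping concrete, a small serial Python sketch (names illustrative) that enumerates the compare-exchange partners of bitonic sort on 2^d wires: in every substage a process is paired with the process whose label differs in exactly one bit, i.e. a direct hypercube neighbor, and the d(d+1)/2 substages give TP = Θ(log² n).

def bitonic_stages(d):
    """Yield (stage, substage, rank, partner, ascending) for bitonic sort on 2**d wires.
    partner = rank XOR 2**j, i.e. the hypercube neighbor across dimension j."""
    n = 1 << d
    for stage in range(1, d + 1):           # merging bitonic sequences of length 2**stage
        for j in range(stage - 1, -1, -1):  # substages within the stage
            for rank in range(n):
                partner = rank ^ (1 << j)
                if partner > rank:
                    ascending = (rank & (1 << stage)) == 0   # merge direction for this pair
                    yield stage, j, rank, partner, ascending

Every (rank, partner) pair differs in a single bit, so each compare-exchange crosses exactly one hypercube link.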
Mapping bitonic sort to a mesh
The last stage of the bitonic sort
A block of elements per process
- Each process holds n/p elements.
- Solution 1: think of each process as consisting of n/p smaller processes. This gives a poor parallel implementation.
- Solution 2: replace each compare-exchange with a compare-split, costing Θ(n/p) computation plus Θ(n/p) communication.
- The difference: in Solution 2 the blocks are initially sorted locally.
- Applies to both the hypercube and the mesh mappings.
Performance on different architectures
- Bitonic sort is neither very efficient nor very scalable, since the underlying sequential algorithm is suboptimal.
Outline
- Introduction
- Sorting Networks
- Bubble Sort and its Variants
Bubble sort
- O(n²) comparisons.
- Inherently sequential.
Odd-even transposition
- n phases, each performing Θ(n) comparisons; odd and even phases alternate over the odd- and even-indexed pairs of neighbors (see the sketch below).
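A serial Python sketch of odd-even transposition sort (the phase-numbering convention is illustrative):

def odd_even_transposition_sort(a):
    """n phases; odd phases compare-exchange pairs (1,2), (3,4), ...,
    even phases pairs (2,3), (4,5), ... (1-based element numbering)."""
    n = len(a)
    for phase in range(1, n + 1):
        start = 0 if phase % 2 == 1 else 1   # 0-based index of the first pair
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a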
Odd-even transposition
Parallel formulation
- With one element per process, the n phases of neighbor compare-exchanges give a parallel runtime of O(n).
Shellsort
- Drawback of odd-even sort: a sequence with only a few elements out of order still needs Θ(n²) work to sort.
- Idea: add a preprocessing phase that moves elements across long distances, thereby reducing the number of odd and even phases needed afterwards.
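The slides describe a parallel shellsort variant; as a sketch of the underlying idea only (not the parallel formulation), here is a sequential Shell sort in Python, where decreasing gaps move out-of-place elements long distances early:

def shellsort(a):
    """Insertion sort over decreasing gaps; the early large-gap passes do the
    long-distance preprocessing, the final gap-1 pass is plain insertion sort."""
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            val, j = a[i], i
            while j >= gap and a[j - gap] > val:
                a[j] = a[j - gap]
                j -= gap
            a[j] = val
        gap //= 2
    return a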
Shellsort
Conclusion
- Sorting Networks
  - Bitonic network
  - Mapping to hypercube and mesh
- Bubble Sort and its Variants
  - Odd-even sort
  - Shellsort
Outline
- Issues in Sorting
- Sorting Networks
- Bubble Sort and its Variants
- Quick sort
- Bucket and Sample sort
- Other sorting algorithms
Quick Sort
- Features: simple, low overhead; Θ(n log n) on average, Θ(n²) in the worst case.
- Idea:
  - Choose a pivot (how?).
  - Partition the array into two parts around the pivot, Θ(n).
  - Recursively solve the two sub-problems.
- Complexity:
  - Worst case: T(n) = T(n-1) + Θ(n), which gives Θ(n²).
  - Balanced case: T(n) = 2T(n/2) + Θ(n), which gives Θ(n log n).
The sequential algorithm
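The slide's pseudocode did not survive extraction; as a stand-in, a minimal sequential quicksort sketch in Python (Lomuto partition with the last element as pivot, an illustrative choice):

def quicksort(a, lo=0, hi=None):
    """In-place quicksort: partition around a pivot in Θ(n), then recurse."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):              # partition step
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]            # put the pivot in its final place
        quicksort(a, lo, i - 1)
        quicksort(a, i + 1, hi)
    return a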
Parallelizing quicksort
- Solution 1: recursive decomposition.
  - Drawback: each partition is handled by a single process and takes Ω(n), so the parallel runtime is Ω(n) and, with n processes, the cost is Ω(n²).
- Solution 2: perform the partition itself in parallel.
  - We could partition an array of size n into two smaller arrays in Θ(1) time by using Θ(n) processes. How?
  - Considered on the CRCW PRAM, shared-address-space, and message-passing models.
Parallel formulation for the CRCW PRAM (cost-optimal)
- Assumptions: n elements, n processes; write conflicts are resolved arbitrarily.
- Executing quicksort can be visualized as constructing a binary tree of pivots: each pivot becomes a node, and the smaller and larger elements go to its left and right subtrees.
Example
Algorithm

procedure BUILD_TREE(A[1...n])
begin
    for each process i do
    begin
        root := i;                      // concurrent writes; one arbitrary process wins
        parent_i := root;
        leftchild[i] := rightchild[i] := n + 1;
    end for
    repeat for each process i ≠ root do
    begin
        if (A[i] < A[parent_i]) or (A[i] = A[parent_i] and i < parent_i) then
        begin
            leftchild[parent_i] := i;
            if i = leftchild[parent_i] then exit
            else parent_i := leftchild[parent_i];
        end
        else
        begin
            rightchild[parent_i] := i;
            if i = rightchild[parent_i] then exit
            else parent_i := rightchild[parent_i];
        end
    end repeat
end BUILD_TREE

Assuming a balanced tree: at each level the partition is distributed to all processes in O(1) time, and there are Θ(log n) levels, so the construction takes Θ(log n) · Θ(1) = Θ(log n) time.
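A sequential Python simulation of the tree construction may help to see the concurrent writes at work; the random winner models the arbitrary resolution of write conflicts, and all names are illustrative.

import random

def build_tree(A):
    """Simulate BUILD_TREE: in each round, all still-unattached processes that
    target the same (parent, side) slot write concurrently; one arbitrary winner
    becomes the child, the losers adopt the winner as their new parent."""
    n = len(A)
    root = random.randrange(n)                    # arbitrary winner of the write to root
    parent = [root] * n
    left, right = [None] * n, [None] * n
    active = [i for i in range(n) if i != root]
    while active:
        writes = {}
        for i in active:                          # decide which slot to write
            smaller = A[i] < A[parent[i]] or (A[i] == A[parent[i]] and i < parent[i])
            writes.setdefault((parent[i], 'L' if smaller else 'R'), []).append(i)
        active = []
        for (p, side), procs in writes.items():
            winner = random.choice(procs)         # arbitrary concurrent-write resolution
            if side == 'L':
                left[p] = winner
            else:
                right[p] = winner
            for i in procs:
                if i != winner:
                    parent[i] = winner            # descend and retry next round
                    active.append(i)
    return root, left, right

An in-order traversal of the resulting tree visits the elements of A in sorted order; with balanced pivots the number of rounds is Θ(log n).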
Parallel formulation for a shared-address-space architecture
- Assumptions: n elements, p processes, shared memory.
- How to parallelize? Idea of the algorithm:
  - Each process is assigned a block of elements.
  - Select a pivot element and broadcast it.
  - Local rearrangement: each process splits its block around the pivot.
  - Global rearrangement: collect the smaller elements into a block S and the larger elements into a block L.
  - Redistribute the blocks to the processes (how many processes for each side?).
  - Repeat until the array is broken into p parts.
Example: how is the location of each element computed?
Example (cont'd)
How to do the global rearrangement? (see the sketch below)
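A serial Python sketch of how prefix sums answer the "where do I write?" question (function and variable names are illustrative): each process's smaller and larger pieces get exclusive-prefix-sum offsets into the S and L regions of the shared array.

from itertools import accumulate

def global_rearrangement(blocks, pivot):
    """Local split around the pivot, then prefix sums of the piece sizes give
    each process its write offsets for the global rearrangement."""
    smaller = [[x for x in b if x <= pivot] for b in blocks]
    larger  = [[x for x in b if x >  pivot] for b in blocks]
    s_sizes = [len(s) for s in smaller]
    l_sizes = [len(l) for l in larger]
    s_off = [0] + list(accumulate(s_sizes))[:-1]          # exclusive prefix sums
    boundary = sum(s_sizes)                               # where block L starts
    l_off = [boundary + o for o in [0] + list(accumulate(l_sizes))[:-1]]
    result = [None] * sum(len(b) for b in blocks)
    for p in range(len(blocks)):                          # each "process" copies its pieces
        result[s_off[p]:s_off[p] + s_sizes[p]] = smaller[p]
        result[l_off[p]:l_off[p] + l_sizes[p]] = larger[p]
    return result, boundary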
Analysis
- Assumption: pivot selection results in balanced partitions, so there are log p recursion steps.
- Per step:
  - Broadcasting the pivot: Θ(log p)
  - Local rearrangement: Θ(n/p)
  - Prefix sums: Θ(log p)
  - Global rearrangement: Θ(n/p)
Parallel formulation for a message-passing architecture
- Similar to the shared-address-space formulation.
- Difference: the array is physically distributed to the p processes.
Pivot selection
- Random selection. Drawback: a bad pivot leads to significant performance degradation.
- Median selection. Assumption: the initial distribution of elements in each process is uniform, so a local median approximates the global median.
Outline
- Issues in Sorting
- Sorting Networks
- Bubble Sort and its Variants
- Quick sort
- Bucket and Sample sort
- Other sorting algorithms
Bucket Sort
- Assumption: the n elements are distributed uniformly over an interval [a, b].
- Idea (see the sketch below):
  - Divide [a, b] into m equal-sized subintervals (buckets).
  - Place each element into its bucket.
  - Sort each bucket: Θ(n log(n/m)) overall, which is Θ(n) when m = Θ(n).
- Compare with quicksort.
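A minimal Python sketch of bucket sort under the uniformity assumption (names are illustrative):

def bucket_sort(a, lo, hi, m):
    """Split [lo, hi) into m equal subintervals, scatter, then sort each bucket."""
    buckets = [[] for _ in range(m)]
    width = (hi - lo) / m
    for x in a:
        idx = min(int((x - lo) / width), m - 1)   # clamp the x == hi edge case
        buckets[idx].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))                     # small buckets on average under uniformity
    return out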
Parallelization on a message-passing architecture
- n elements and p processes, hence p buckets.
- Preliminary idea:
  - Distribute n/p elements to each process.
  - Assign each process a subinterval and redistribute the elements to their buckets.
  - Sort locally.
- Drawback: the uniformity assumption is not realistic, so bucket sizes can be skewed and performance degrades.
- Solution: sample sort, which selects splitters from a sample of the data and guarantees that each bucket receives fewer than 2n/m elements (see the sketch below).
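A serial Python sketch of one common splitter-selection scheme for sample sort (assumes p >= 2; the exact sampling positions vary between presentations, so treat the indexing as illustrative):

def choose_splitters(blocks, p):
    """Each of the p processes sorts its block and contributes p-1 evenly spaced
    samples; the combined p*(p-1) samples are sorted and p-1 of them, again
    evenly spaced, become the global splitters."""
    samples = []
    for b in blocks:
        b.sort()                                  # local sort (in place)
        step = max(len(b) // p, 1)
        samples.extend(b[step::step][:p - 1])     # p-1 local samples
    samples.sort()
    return samples[p - 1::p - 1][:p - 1]          # p-1 global splitters

Each process then partitions its block against the splitters (for example with binary search) and sends each piece to its destination bucket.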
Example
Analysis
- Distributing the elements: n/p per process.
- Local sort and sample selection: Θ((n/p) log(n/p)) for the local sort, Θ(p) for the sample selection.
- Combining the samples: Θ(p²); sorting them: Θ(p² log p); selecting the global splitters: Θ(p).
- Partitioning the elements against the splitters: Θ(p log(n/p)); redistribution: O(n) + O(p log p).
- Final local sorting.
Outline
- Issues in Sorting
- Sorting Networks
- Bubble Sort and its Variants
- Quick sort
- Bucket and Sample sort
- Other sorting algorithms
Enumeration Sort
- Assumptions: n elements, n² processes, CRCW PRAM (concurrent writes to the same location are combined by summing the written values).
- Feature: based on computing the rank of each element; runs in Θ(1) time.
Algorithm

procedure ENUM_SORT(n)
begin
    for each process P1,j do
        C[j] := 0;
    for each process Pi,j do
        if (A[i] < A[j]) or (A[i] = A[j] and i < j) then
            C[j] := 1;              // concurrent writes to C[j] are summed, giving the rank of A[j]
        else
            C[j] := 0;
    for each process P1,j do
        A[C[j]] := A[j];            // write each element directly to its ranked position
end ENUM_SORT

Shared data structures: arrays A[n] and C[n].
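A serial Python simulation of the same idea (in the PRAM version the n² comparisons and the summed concurrent writes all happen in Θ(1) time):

def enumeration_sort(A):
    """Rank-based sort: C[j] counts the elements that precede A[j]; each element
    is then written directly to position C[j]."""
    n = len(A)
    C = [0] * n
    for i in range(n):
        for j in range(n):
            if A[i] < A[j] or (A[i] == A[j] and i < j):   # ties broken by index
                C[j] += 1
    out = [None] * n
    for j in range(n):
        out[C[j]] = A[j]
    return out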
Radix Sort
- Assumptions: n elements, n processes.
- Feature: based on the binary representation of the elements, processed r bits at a time from least to most significant; each pass leverages enumeration-style ranking via prefix sums.
Algorithm

procedure RADIX_SORT(A, r)
begin
    for i := 0 to b/r - 1 do                         // b = number of bits per element
    begin
        offset := 0;
        for j := 0 to 2^r - 1 do
        begin
            flag := 0;
            if the i-th least significant r-bit block of A[Pk] = j then
                flag := 1;
            index := prefix_sum(flag);               // Θ(log n)
            if flag = 1 then
                rank := offset + index;
            offset := offset + parallel_sum(flag);   // Θ(log n); accumulate counts of smaller digit values
        end for
        each process Pk sends its element A[Pk] to process P_rank;   // Θ(n)
    end for
end RADIX_SORT
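A serial Python sketch that mirrors the structure of the parallel algorithm, processing b-bit non-negative keys r bits at a time (the parameter defaults are illustrative); the per-digit bucketing plays the role of the prefix-sum rank computation:

def radix_sort(a, b=32, r=4):
    """LSD radix sort on r-bit digits of b-bit keys; each pass is stable, so
    after b/r passes the keys are fully sorted."""
    mask = (1 << r) - 1
    for i in range(b // r):
        shift = i * r
        buckets = [[] for _ in range(1 << r)]
        for x in a:
            buckets[(x >> shift) & mask].append(x)        # digit value selects the bucket
        a = [x for bucket in buckets for x in bucket]     # stable redistribution by rank
    return a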
Conclusion
- Sorting Networks: bitonic network, mapping to hypercube and mesh
- Bubble Sort and its Variants: odd-even sort, shellsort
- Quick sort: parallel formulations for the CRCW PRAM and for shared-address-space and message-passing architectures
- Bucket and Sample sort
- Other sorting algorithms: enumeration and radix sort