Sorting
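Quick Sort Example
S = {6, 5, 9, 2, 4, 3, 5, 1, 7, 5, 8}
[Figure: recursion tree. The first partition rearranges S as 2, 4, 3, 1, 5, 5, 6, 9, 7, 8, 5; deeper levels refine each part (for example 2, 1, 3, 4, 5); the sorted output is 1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 9.]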
Quick Sort
Step 1. If n = 1 then return. O(1)
Step 2. Find the median m of the input array A. O(n)
Step 3. Use m to partition A into two subsequences B and C. O(n)
Step 4. Quick sort B. t(n/2)
Step 5. Quick sort C. t(n/2)
The time complexity of Quick sort t(n)= O(1) + O(n) + O(n) + 2t(n/2) = cn + 2t(n/2) = O(n log n)
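The five steps above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the name `quicksort_median` is hypothetical, `statistics.median_low` is only a convenient stand-in for a linear-time median algorithm (it sorts internally), and the partition keeps the copies of the median in a separate middle part so that the recursion always terminates when values repeat.

```python
import statistics

def quicksort_median(a):
    """Sort a list by recursively partitioning around its median."""
    if len(a) <= 1:                       # Step 1: nothing left to sort
        return list(a)
    m = statistics.median_low(a)          # Step 2: median (stand-in, not O(n))
    b = [x for x in a if x < m]           # Step 3: elements below the median
    equal = [x for x in a if x == m]      #         copies of the median itself
    c = [x for x in a if x > m]           #         elements above the median
    # Steps 4 and 5: sort both parts recursively and concatenate.
    return quicksort_median(b) + equal + quicksort_median(c)

S = [6, 5, 9, 2, 4, 3, 5, 1, 7, 5, 8]     # the example sequence from the slides
print(quicksort_median(S))
```

Because the median splits the input into two parts of at most n/2 elements, the recursion depth is about log n, which is where the 2t(n/2) term comes from.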
Merging Networks
(1, 1)-merger (comparator): a single comparator, which outputs the smaller of its two inputs on one line and the larger on the other.
Merging Networks
(2, 2)-merger: two (1, 1)-mergers followed by one more comparator, three comparators in all.
Sorting Networks
Running Time
$s(2^i)$: the time required in the $i$th stage. There are $\log n$ stages in all.
$s(2) = 1$ for $i = 1$
$s(2^i) = s(2^{i-1}) + 1$ for $i > 1$
Hence $s(2^i) = i$.
$t(n) = s(2^1) + s(2^2) + \dots + s(2^{\log n}) = 1 + 2 + \dots + \log n = O(\log^2 n)$
Number of Processors
$q(2^i)$: the number of comparators required by one merger in the $i$th stage.
$q(2) = 1$ for $i = 1$
$q(2^i) = 2q(2^{i-1}) + 2^{i-1} - 1$ for $i > 1$
Hence $q(2^i) = (i-1)2^{i-1} + 1$.
$q(2^i) = 2q(2^{i-1}) + 2^{i-1} - 1$
$= 2^2 q(2^{i-2}) + 2^{i-1} + 2^{i-1} - 2^1 - 2^0$
$= 2^3 q(2^{i-3}) + 2^{i-1} + 2^{i-1} + 2^{i-1} - 2^2 - 2^1 - 2^0$
$\dots$
$= 2^{i-1} q(2^1) + (i-1)2^{i-1} - (2^0 + 2^1 + \dots + 2^{i-2})$
$= 2^{i-1} + (i-1)2^{i-1} - (2^{i-1} - 1)$
$= (i-1)2^{i-1} + 1$
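As a quick sanity check of the closed form, the recurrence can be evaluated directly and compared with $(i-1)2^{i-1} + 1$. The function names below are hypothetical and chosen only for this sketch.

```python
def q_recurrence(i):
    """Comparators in one stage-i merger, straight from the recurrence."""
    if i == 1:
        return 1                                  # q(2) = 1
    return 2 * q_recurrence(i - 1) + 2 ** (i - 1) - 1

def q_closed(i):
    """Closed form obtained by unrolling the recurrence."""
    return (i - 1) * 2 ** (i - 1) + 1

# Print i, the recurrence value, and the closed-form value side by side.
for i in range(1, 6):
    print(i, q_recurrence(i), q_closed(i))
```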
$p(n) = 2^{(\log n)-1} q(2^1) + 2^{(\log n)-2} q(2^2) + \dots + 2^0 q(2^{\log n}) = O(n \log^2 n)$
Cost
$c(n) = p(n) \times t(n) = O(n \log^2 n) \times O(\log^2 n) = O(n \log^4 n)$, which is not cost optimal, since sequential sorting takes $O(n \log n)$ time.
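The counts above can be checked on a concrete network. The sketch below assumes that the mergers in these slides are Batcher's odd-even mergers, which is what the recurrences for the stage time and the comparator count suggest; the generator is the widely used recursive formulation and all function names are hypothetical. For n = 8 it should report 4·1 + 2·3 + 1·9 = 19 comparators and a depth of 1 + 2 + 3 = 6 parallel steps.

```python
from itertools import product

def odd_even_merge(lo, n, r):
    """Comparators merging the n wires from lo, taken with stride r (n a power of 2)."""
    m = 2 * r
    if m < n:
        yield from odd_even_merge(lo, n, m)        # merge even-indexed wires
        yield from odd_even_merge(lo + r, n, m)    # merge odd-indexed wires
        for i in range(lo + r, lo + n - r, m):     # final column of comparators
            yield (i, i + r)
    else:
        yield (lo, lo + r)                         # a single comparator

def odd_even_merge_sort(lo, n):
    """Comparators sorting the n wires lo .. lo + n - 1 (n a power of 2)."""
    if n > 1:
        half = n // 2
        yield from odd_even_merge_sort(lo, half)
        yield from odd_even_merge_sort(lo + half, half)
        yield from odd_even_merge(lo, n, 1)

def run_network(network, values):
    """Apply each comparator (a, b): the smaller value ends up on wire a."""
    v = list(values)
    for a, b in network:
        if v[a] > v[b]:
            v[a], v[b] = v[b], v[a]
    return v

def network_depth(network, n):
    """Number of parallel steps when every comparator fires as early as possible."""
    ready = [0] * n
    for a, b in network:
        ready[a] = ready[b] = max(ready[a], ready[b]) + 1
    return max(ready)

net = list(odd_even_merge_sort(0, 8))
print(len(net), network_depth(net, 8))
# Zero-one principle: a comparator network that sorts all 0/1 inputs sorts everything.
print(all(run_network(net, bits) == sorted(bits) for bits in product([0, 1], repeat=8)))
```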
Sorting on a linear array (odd-even transposition)
Procedure ODD-EVEN TRANSPOSITION (S)
for j = 1 to ⌈n/2⌉ do
  (1) for i = 1, 3, …, 2⌊n/2⌋ − 1 do in parallel
        if x_i > x_{i+1} then x_i ↔ x_{i+1} end if
      end for
  (2) for i = 2, 4, …, 2⌊(n−1)/2⌋ do in parallel
        if x_i > x_{i+1} then x_i ↔ x_{i+1} end if
      end for
end for
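Time: ⌈n/2⌉ iterations of two steps each (one odd, one even), so $t(n) = O(n)$.
Cost: $c(n) = p(n) \times t(n) = n \times O(n) = O(n^2)$, which is not cost optimal.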
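A sequential simulation of procedure ODD-EVEN TRANSPOSITION is sketched below. The function name is hypothetical, indices are 0-based (so the slides' odd positions 1, 3, … become 0, 2, …), and the inner loop over pairs stands in for the processors that act in parallel within one step.

```python
import math

def odd_even_transposition(x):
    """Simulate odd-even transposition sort on a linear array of len(x) cells."""
    x = list(x)
    n = len(x)
    for _ in range(math.ceil(n / 2)):         # for j = 1 to ceil(n/2)
        for start in (0, 1):                  # 0: step (1), 1: step (2)
            # All pairs in one step are disjoint, so on the linear array
            # they would be compared and exchanged simultaneously.
            for i in range(start, n - 1, 2):
                if x[i] > x[i + 1]:
                    x[i], x[i + 1] = x[i + 1], x[i]
    return x

print(odd_even_transposition([7, 6, 5, 4, 3, 2, 1]))
```

A reversed input is a useful test because every element has to travel the full length of the array, which is why about n steps are needed.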
Two ways of reducing the cost: (i) reduce the running time; (ii) reduce the number of processors.
Reducing the running time is hopeless, since the lower bound for sorting on a linear array of n processors is $\Omega(n)$.
Sorting on a linear array (odd-even transposition)
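N processors are available, N < n. Each processor stores n/N data elements.
Stage 1: Sort sequentially in each processor. O((n/N) log(n/N))
Stage 2: Odd-even transposition sort with ⌈N/2⌉ iterations, in which each comparison-exchange is replaced with a merge-split. O(n/N) per merge-split
Using a sequential merge, each merge-split takes O(n/N) time and there are about N/2 iterations, so Stage 2 takes O(n) time and the total time is $t(n) = O((n/N)\log(n/N)) + O(n)$.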
$c(n) = p(n) \times t(n) = N \times [O((n/N)\log(n/N)) + O(n)] = O(n \log(n/N)) + O(nN)$
The algorithm is cost optimal when $N \le \log n$.
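The two stages can be simulated sequentially as below. This is a sketch under stated assumptions: the names `merge_split` and `merge_split_sort` are hypothetical, each block plays the role of one processor's local memory, and `heapq.merge` serves as the sequential merge.

```python
import heapq
import math

def merge_split(a, b):
    """Merge two sorted blocks; the left keeps the smallest len(a) items."""
    merged = list(heapq.merge(a, b))
    return merged[:len(a)], merged[len(a):]

def merge_split_sort(S, N):
    """Sort S on N simulated processors; returns the N sorted blocks in order."""
    size = math.ceil(len(S) / N)
    # Stage 1: every processor sorts its own n/N elements sequentially.
    blocks = [sorted(S[p * size:(p + 1) * size]) for p in range(N)]
    # Stage 2: odd-even transposition over the blocks, using merge-split
    # wherever the element-level procedure does a comparison-exchange.
    for _ in range(math.ceil(N / 2)):
        for start in (0, 1):
            for p in range(start, N - 1, 2):
                blocks[p], blocks[p + 1] = merge_split(blocks[p], blocks[p + 1])
    return blocks

S = [2, 8, 5, 10, 15, 1, 12, 6, 14, 3, 11, 7, 9, 4, 13, 16]
print(merge_split_sort(S, 4))
```

Each merge-split touches 2n/N elements, which is the O(n/N) per-step cost used in the analysis.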
CRCW Sort
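Procedure CRCW SORT(S)
Step 1. for i = 1 to n do in parallel
          for j = 1 to n do in parallel
            if (s_i > s_j) or (s_i = s_j and i > j)
              then P(i, j) writes 1 in c_i
              else P(i, j) writes 0 in c_i
            end if
          end for
        end for
Step 2. for i = 1 to n do in parallel
          P(i, 1) stores s_i in position 1 + c_i of S
        end for
Each step takes O(1) time. Concurrent writes to c_i are resolved by storing the sum of the values written, so c_i is the number of elements that precede s_i in sorted order.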
$p(n) = n^2$, $t(n) = O(1)$, $c(n) = O(n^2)$
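Procedure CRCW SORT can be simulated sequentially as below. The sketch assumes that concurrent writes to the same c_i are combined by summation, which is what makes c_i a rank; the function name is hypothetical and the two nested loops stand in for the n² processors P(i, j).

```python
def crcw_sort_sketch(S):
    """Rank every element by counting, then write it to its final position."""
    n = len(S)
    c = [0] * n
    # Step 1: P(i, j) contributes 1 to c[i] when S[j] must come before S[i].
    # Ties are broken by index so that equal values receive distinct ranks.
    for i in range(n):
        for j in range(n):
            if S[i] > S[j] or (S[i] == S[j] and i > j):
                c[i] += 1                 # summed concurrent write
    # Step 2: P(i, 1) stores s_i in position 1 + c_i (0-based: c[i]).
    out = [None] * n
    for i in range(n):
        out[c[i]] = S[i]
    return out

print(crcw_sort_sketch([5, 2, 4, 5]))
```

The tie-breaking test `i > j` is what guarantees that the ranks form a permutation of 0 … n−1, so no two elements are written to the same position.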
CREW Sort (using CREW MERGE)
S = {2, 8, 5, 10, 15, 1, 12, 6, 14, 3, 11, 7, 9, 4, 13, 16}, N = 4
Step 1. P_1: {2, 8, 5, 10}  P_2: {15, 1, 12, 6}  P_3: {14, 3, 11, 7}  P_4: {9, 4, 13, 16}
Step 2. P_1: {2, 5, 8, 10}  P_2: {1, 6, 12, 15}  P_3: {3, 7, 11, 14}  P_4: {4, 9, 13, 16}
Step 3. P_1, P_2: {1, 2, 5, 6, 8, 10, 12, 15}  P_3, P_4: {3, 4, 7, 9, 11, 13, 14, 16}
        P_1, P_2, P_3, P_4: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
Procedure CREW SORT(S)
Step 1. for i = 1 to N do in parallel
          Processor P_i
          1.1 reads a distinct subsequence S_i of S of size n/N
          1.2 QUICKSORT(S_i)
          1.3 S(1, i) ← S_i
          1.4 P(1, i) ← {P_i}
        end for
Step 1 takes O((n/N) log(n/N)) time.
Step 2. u ← 1
        v ← N
        while v > 1 do
          for m = 1 to ⌊v/2⌋ do in parallel
            P(u+1, m) ← P(u, 2m−1) ∪ P(u, 2m)
            The processors in the set P(u+1, m) perform CREW MERGE(S(u, 2m−1), S(u, 2m), S(u+1, m))
          end for
          if v is odd then
            P(u+1, ⌈v/2⌉) ← P(u, v)
            S(u+1, ⌈v/2⌉) ← S(u, v)
          end if
          u ← u + 1
          v ← ⌈v/2⌉
        end while
Each iteration takes O(n/N + log n) time and there are ⌈log N⌉ iterations, so Step 2 takes ⌈log N⌉ × O(n/N + log n) time in total.
$t(n) = O((n/N)\log(n/N)) + O((n/N) + \log n) \times \log N$
$= O((n/N)\log n - (n/N)\log N) + O((n/N)\log N + \log n \log N)$
$= O((n/N)\log n + \log n \log N)$
$c(n) = p(n) \times t(n) = N \times O((n/N)\log n + \log n \log N) = O(n \log n + N \log n \log N)$
The algorithm is cost optimal when $N \le n/\log N$.
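The control structure of CREW SORT (local sorts, then about log N rounds of pairwise merging) can be traced with the sequential sketch below. It is only an illustration: `crew_sort_levels` is a hypothetical name, `sorted` stands in for QUICKSORT, and `heapq.merge` stands in for CREW MERGE, so the sketch shows which sequences are merged in each round but not how the processors cooperate inside one merge.

```python
import heapq
import math

def crew_sort_levels(S, N):
    """Return the sorted subsequences held after Step 1 and after each round of Step 2."""
    size = math.ceil(len(S) / N)
    # Step 1: each of the N processors sorts its own subsequence.
    level = [sorted(S[i * size:(i + 1) * size]) for i in range(N)]
    levels = [level]
    v = N
    # Step 2: merge neighbouring sequences until one remains.
    while v > 1:
        nxt = [list(heapq.merge(level[2 * m], level[2 * m + 1]))
               for m in range(v // 2)]
        if v % 2 == 1:                    # an unpaired sequence is carried over
            nxt.append(level[v - 1])
        level = nxt
        levels.append(level)
        v = math.ceil(v / 2)
    return levels

S = [2, 8, 5, 10, 15, 1, 12, 6, 14, 3, 11, 7, 9, 4, 13, 16]
for level in crew_sort_levels(S, 4):
    print(level)
```

With the 16-element example and N = 4, the printed levels correspond to Step 2 and the two merge rounds of Step 3 in the worked example.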
Sorting on the EREW Model
Simulating Procedure CREW SORT: use MULTIPLE BROADCAST in place of each concurrent read. This costs an extra factor of log N in time.
$t(n) = O((n/N)\log n + \log n \log N) \times O(\log N) = O([(n/N) + \log N] \log n \log N)$
$c(n) = O([(n/N) + \log N] \log n \log N) \times N = O((n + N \log N) \log n \log N)$, which is not cost optimal.
Sorting by Conflict-Free Merging
Use EREW MERGE in place of CREW MERGE. Each EREW MERGE takes $O((n/N) + \log n \log N)$ time and it is performed $\log N$ times, so the total time is
$t(n) = O((n/N)\log(n/N)) + O((n/N) + \log n \log N) \times \log N = O([(n/N) + \log^2 N] \log n)$
$c(n) = O((n + N \log^2 N) \log n)$, which is cost optimal when $N \le n/\log^2 N$.
Parallel Quicksort
S = {5, 9, 12, 16, 18, 2, 10, 13, 17, 4, 7, 18, 18, 11, 3, 17, 20, 19, 14, 8, 5, 17, 1, 11, 15, 10, 6}, n = 27, N = 5
$N = n^{1-x} = 27^{1-x}$, so $x \approx 0.5$.
Divide the n data items into $2^{1/x}$ blocks of $n/2^{1/x}$ items each (the reason is explained later): $2^{1/x} = 2^{1/0.5} = 4$ and $n/2^{1/x} = 27/4 \approx 7$.
PARALLEL SELECT therefore finds the 7th, 14th, and 21st smallest elements, called m_1, m_2, and m_3: m_1 = 6, m_2 = 11, m_3 = 17.
S_1 = {5, 2, 4, 3, 5, 1, 6}  S_2 = {9, 10, 7, 8, 10, 11, 11}  S_3 = {12, 16, 13, 14, 15, 17, 17}  S_4 = {18, 18, 18, 20, 19, 17}  (m_1 = 6, m_2 = 11, m_3 = 17)
For each subsequence: n = 7, $N = n^{1-x} = 7^{1-x} \approx 2$.
Divide the n data items into $2^{1/x} = 4$ blocks of $n/2^{1/x} = 7/4 \approx 2$ items each.
P_1, P_2: S_1 = {5, 2, 4, 3, 5, 1, 6}, m_1 = 2, m_2 = 4, m_3 = 5
P_3, P_4: S_2 = {9, 10, 7, 8, 10, 11, 11}, m_1 = 8, m_2 = 10, m_3 = 11
P_1, P_2: S = {5, 2, 4, 3, 5, 1, 6}, m_1 = 2, m_2 = 4, m_3 = 5
          S_1 = {1, 2}  S_2 = {3, 4}  S_3 = {5, 5}  S_4 = {6}
P_3, P_4: S = {9, 10, 7, 8, 10, 11, 11}, m_1 = 8, m_2 = 10, m_3 = 11
          S_1 = {7, 8}  S_2 = {9, 10}  S_3 = {10, 11}  S_4 = {11}
S_1 = {5, 2, 4, 3, 5, 1, 6}  S_2 = {9, 10, 7, 8, 10, 11, 11}  S_3 = {12, 16, 13, 14, 15, 17, 17}  S_4 = {18, 18, 18, 20, 19, 17}  (m_1 = 6, m_2 = 11, m_3 = 17)
For each subsequence: n = 7, $N = n^{1-x} = 7^{1-x} \approx 2$.
Divide the n data items into $2^{1/x} = 4$ blocks of $n/2^{1/x} = 7/4 \approx 2$ items each.
P_1, P_2: S_3 = {12, 16, 13, 14, 15, 17, 17}, m_1 = 13, m_2 = 15, m_3 = 17
P_3, P_4: S_4 = {18, 18, 18, 20, 19, 17}, m_1 = 18, m_2 = 18, m_3 = 20
P_1, P_2: S = {12, 16, 13, 14, 15, 17, 17}, m_1 = 13, m_2 = 15, m_3 = 17
          S_1 = {12, 13}  S_2 = {14, 15}  S_3 = {16, 17}  S_4 = {17}
P_3, P_4: S = {18, 18, 18, 20, 19, 17}, m_1 = 18, m_2 = 18, m_3 = 20
          S_1 = {17, 18}  S_2 = {18, 18}  S_3 = {19, 20}  S_4 = { }
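procedure EREW SORT (S)
if |S| ≤ k
  then QUICKSORT (S)
  else (1) for i = 1 to k−1 do
             PARALLEL SELECT (S, ⌈i|S|/k⌉)  {Obtain m_i}
           end for
       (2) S_1 ← {s ∈ S : s ≤ m_1}
       (3) for i = 2 to k−1 do
             S_i ← {s ∈ S : m_{i−1} ≤ s ≤ m_i}
           end for
       (4) S_k ← {s ∈ S : s ≥ m_{k−1}}
       (5) for i = 1 to k/2 do in parallel
             EREW SORT (S_i)
           end for
       (6) for i = (k/2) + 1 to k do in parallel
             EREW SORT (S_i)
           end for
end if
Elements equal to some m_i are divided between the neighbouring subsequences so that every S_i has about |S|/k elements, as in the example.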
$N = n^{1-x}$: number of processors, $k = 2^{1/x}$: number of subsequences.
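Why $k = 2^{1/x}$? $n$ elements use $n^{1-x}$ processors. $n/k$ elements use $(n/k)^{1-x}$ processors. Sorting $k/2$ subsequences at a time therefore needs $(k/2)(n/k)^{1-x} = n^{1-x}$ processors when $k = 2^{1/x}$, exactly the number available.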
Time: $t(n) = O(n^x \log n)$.
Cost: $c(n) = p(n) \times t(n) = n^{1-x} \times O(n^x \log n) = O(n \log n)$, which is cost optimal.
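The splitting scheme of the parallel quicksort example can be traced with the sequential sketch below. Several things here are assumptions made for illustration: the names `select`, `split`, and `erew_sort_sketch` are hypothetical, `heapq.nsmallest` stands in for PARALLEL SELECT, `sorted` stands in for QUICKSORT, the splitter ranks are taken as ⌈i·n/k⌉ (which reproduces the ranks 7, 14, 21 of the 27-element example), and copies of a splitter are dealt out so that each block reaches its target size. The recursive calls run one after another here, whereas the procedure sorts k/2 subsequences at a time in parallel.

```python
import heapq
import math

def select(S, r):
    """r-th smallest element of S (1-based); stand-in for PARALLEL SELECT."""
    return heapq.nsmallest(r, S)[-1]

def split(S, k):
    """Return the k-1 splitters and the k blocks they induce."""
    n = len(S)
    ranks = [math.ceil(i * n / k) for i in range(1, k)]
    splitters = [select(S, r) for r in ranks]
    blocks, rest, prev = [], list(S), 0
    for m, r in zip(splitters, ranks):
        size = r - prev                          # target size of this block
        less = [s for s in rest if s < m]
        equal = [s for s in rest if s == m]
        greater = [s for s in rest if s > m]
        take = size - len(less)                  # copies of m that still fit
        blocks.append(less + equal[:take])
        rest = equal[take:] + greater
        prev = r
    blocks.append(rest)                          # everything above the last splitter
    return splitters, blocks

def erew_sort_sketch(S, k=4):
    """Sort by splitting into k blocks and sorting each block recursively."""
    if len(S) <= k:
        return sorted(S)                         # stand-in for QUICKSORT
    _, blocks = split(S, k)
    out = []
    for block in blocks:
        out.extend(erew_sort_sketch(block, k))
    return out

S = [5, 9, 12, 16, 18, 2, 10, 13, 17, 4, 7, 18, 18, 11, 3, 17, 20, 19,
     14, 8, 5, 17, 1, 11, 15, 10, 6]
splitters, blocks = split(S, 4)
print(splitters)
print(blocks)
print(erew_sort_sketch(S, 4))
```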