Published by Stephanie Holland. Modified over 9 years ago.
1
Binary Merge-Sort

Merge-Sort(A,i,j)
  if (i < j) then
    m = (i+j)/2;              Divide
    Merge-Sort(A,i,m);        Conquer
    Merge-Sort(A,m+1,j);
    Merge(A,i,m,j)            Combine

[Figure: Merge step on the sorted runs 1 2 8 10 and 7 9 13 19; output so far: 1 2 7]

Merge is linear in the number of items to be merged.
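The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the slide author's code: it assumes 0-based indices with inclusive bounds i and j, and the body of `merge` is one straightforward way to fill in the Merge(A,i,m,j) step that the slide leaves unspecified.

```python
def merge(a, i, m, j):
    """Merge the sorted slices a[i..m] and a[m+1..j] (inclusive bounds) in place."""
    left, right = a[i:m + 1], a[m + 1:j + 1]
    k, l, r = i, 0, 0
    # Repeatedly copy the smaller head item; linear in the number of items merged.
    while l < len(left) and r < len(right):
        if left[l] <= right[r]:
            a[k] = left[l]
            l += 1
        else:
            a[k] = right[r]
            r += 1
        k += 1
    # One side is exhausted; copy what remains of the other.
    a[k:j + 1] = left[l:] + right[r:]


def merge_sort(a, i, j):
    """Sort a[i..j] in place, following the slide's Merge-Sort(A,i,j)."""
    if i < j:
        m = (i + j) // 2          # divide
        merge_sort(a, i, m)       # conquer
        merge_sort(a, m + 1, j)
        merge(a, i, m, j)         # combine
```

Calling `merge_sort(data, 0, len(data) - 1)` sorts the whole list `data`.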
2
Few key observations

Items = (short) strings = atomic...
On English Wikipedia, about 10^9 tokens to sort.
Θ(n log n) memory accesses (I/Os ??)
[5 ms] × n log_2 n ≈ 1.5 × 10^8 s, i.e. a few years
In practice it is faster. Why?
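The back-of-the-envelope estimate on this slide can be reproduced with a few lines of Python. The sketch assumes the pessimistic model the slide hints at: every one of the n log_2 n accesses is a random disk access costing 5 ms. The function name and the 365-day year are choices made for this illustration only.

```python
import math


def naive_sort_time_years(n, access_seconds):
    """Time for n * log2(n) accesses if each one costs access_seconds."""
    accesses = n * math.log2(n)
    seconds = accesses * access_seconds
    return seconds / (365 * 24 * 3600)  # assumes a 365-day year


# n = 10^9 tokens and 5 ms per access are the slide's figures.
years = naive_sort_time_years(10**9, 5e-3)
```

Under these assumptions the result is on the order of a few years, which is why the number of I/Os, rather than the number of comparisons, is the quantity worth optimizing.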
3
Recursion

[Figure: recursion tree of binary merge-sort on the 16 items 10 2 5 1 13 19 9 7 15 4 8 3 12 17 6 11, split down to single items; the tree has log_2 N levels]
4
Implicit Caching…

[Figure: the same recursion tree, merged bottom-up: sorted pairs (2 10, 1 5, 13 19, 7 9, 4 15, 3 8, 12 17, 6 11), then runs of four (1 2 5 10, 7 9 13 19, 3 4 8 15, 6 11 12 17), then runs of eight, then the sorted sequence 1 2 3 4 5 6 7 8 9 10 11 12 13 15 17 19. The levels up to runs of size M fit in internal memory; log_2 (N/M) levels remain above them.]

N/M runs, each sorted in internal memory (no I/Os).
Each of the log_2 (N/M) levels above takes 2 passes (one Read/one Write) = 2 * (N/B) I/Os.
I/O-cost for binary merge-sort is ≈ 2 (N/B) log_2 (N/M).
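The run-formation step ("N/M runs, each sorted in internal memory") can be sketched as below. This is only an illustration: a Python list stands in for the data on disk, `m` plays the role of the memory size M measured in items, and the name `make_runs` is hypothetical.

```python
def make_runs(items, m):
    """Split items into chunks of at most m items and sort each chunk in memory."""
    return [sorted(items[start:start + m]) for start in range(0, len(items), m)]


# The 16 items from the recursion figure, with room for 4 items in memory.
items = [10, 2, 5, 1, 13, 19, 9, 7, 15, 4, 8, 3, 12, 17, 6, 11]
runs = make_runs(items, 4)
```

With m = 4 this yields the four runs of the figure's third level; only the merge levels above them would touch the disk.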
5
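A key inefficiency

[Figure: two sorted runs on disk (1 2 4 7 9 10 13 19 and 3 5 6 8 11 12 15 17), each read one page of B items at a time; an output buffer of B items holds 1, 2, 3; the output run on disk continues with 4, ...]

After a few steps, every run is longer than B!
We are using only 3 pages, but memory contains M/B pages ≈ 2^30 / 2^15 = 2^15.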
6
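Multi-way Merge-Sort

Sort N items with main memory M and disk pages of size B:
Pass 1: produce N/M sorted runs.
Pass i: merge X = M/B - 1 runs at a time (one page is reserved for the output) ⇒ log_X (N/M) passes.

[Figure: main memory holds buffers of B items each: one page per input run (run 1, run 2, ..., run X) plus one output page; runs are read from and written to disk]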
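A single X-way merge step can be sketched with a min-heap that holds the current head item of each run. This is an in-memory illustration under simplifying assumptions: each run is a Python list rather than a sequence of disk pages, so the per-run page buffers and the output page of the slide are not modelled. The heap is one common way to pick the smallest head among X runs; the slide does not say which structure is used.

```python
import heapq


def multiway_merge(runs):
    """Merge any number of sorted lists into one sorted list."""
    # Heap entries are (value, run index, position inside that run).
    heap = [(run[0], idx, 0) for idx, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, idx, pos = heapq.heappop(heap)
        out.append(value)
        # Refill the heap with the next item of the run we just consumed from.
        if pos + 1 < len(runs[idx]):
            heapq.heappush(heap, (runs[idx][pos + 1], idx, pos + 1))
    return out
```

For example, `multiway_merge([[1, 2, 5, 10], [7, 9, 13, 19], [3, 4, 8, 15], [6, 11, 12, 17]])` merges four runs in one pass instead of the two levels a binary merge would need.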
7
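How it works

[Figure: sorted runs of size M (1 2 5 10, 7 9 13 19, ...) are merged X at a time into longer runs (1 2 5 7 ...) and finally into the sorted sequence 1 2 3 4 5 6 7 8 9 10 11 12 13 15 17 19; the merge tree has log_X (N/M) levels]

N/M runs, each sorted in internal memory = 2 (N/B) I/Os.
Each level takes 2 passes (one Read/one Write) = 2 * (N/B) I/Os.
I/O-cost for X-way merge is ≈ 2 (N/B) I/Os per level.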
8
Cost of Multi-way Merge-Sort

Number of passes = log_X (N/M) ≈ log_{M/B} (N/M)
Total I/O-cost is Θ( (N/B) log_{M/B} (N/M) ) I/Os
Large fan-out (M/B) decreases #passes.
In practice M/B ≈ 10^5 ⇒ #passes = 1 ⇒ few mins.
Tuning depends on disk features.
Compression would decrease the cost of a pass!
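The effect of the fan-out on the number of passes can be checked numerically. The sketch below counts merge passes by repeatedly dividing the number of runs by the fan-in, and charges 2 (N/B) I/Os for run formation plus 2 (N/B) per merge pass, following the per-level cost stated on the slides. The function names and the example sizes (10^9 items, 10^6 items of memory) are assumptions chosen for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return (a + b - 1) // b


def merge_passes(n, m, fan_in):
    """Merge passes needed to reduce ceil(n/m) initial runs to a single run."""
    runs = ceil_div(n, m)
    passes = 0
    while runs > 1:
        runs = ceil_div(runs, fan_in)
        passes += 1
    return passes


def total_ios(n, m, b, fan_in):
    """2 * (N/B) I/Os for run formation plus 2 * (N/B) per merge pass."""
    return 2 * ceil_div(n, b) * (1 + merge_passes(n, m, fan_in))


binary_passes = merge_passes(10**9, 10**6, 2)       # binary merge-sort
multiway_passes = merge_passes(10**9, 10**6, 1000)  # fan-in of 1000
```

With 1000 initial runs, binary merging needs about log_2 1000 passes while a fan-in of 1000 finishes in one, which matches the slide's point that a large M/B brings the number of passes down to 1.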