External Sorting
– Access to secondary storage is orders of magnitude slower than access to main memory.
– The goal is to minimize the number of accesses to secondary storage (tape or disk).
– Data may also have to be read sequentially, as on tape.
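Simple merge example: sort M records at a time (M = 3), using 4 tapes (T_a1, T_a2, T_b1, T_b2).
[Figure: initial state. T_a1 holds the input, shown as five groups separated by semicolons, the last being the single record 15; T_a2, T_b1, and T_b2 are empty.]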
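[Figure: after the run-constructing pass. T_a1 and T_a2 are empty; T_b1 holds three runs (the last is the single record 15) and T_b2 holds two runs.]
[Figure: after the first merge pass. T_a1 holds two runs (the last is 15) and T_a2 holds one run; T_b1 and T_b2 are empty.]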
– Read M records at a time and sort them internally.
– A set of sorted records is called a run.
– With N records in total, merging requires ⌈log2(N/M)⌉ passes, plus the initial run-constructing pass.
– Example: 10 million records of 128 bytes each, with 4 MB of internal memory:
  N = 10 × 10^6, M = 4 × 10^6 / 128 = 31,250
  # of runs = N/M = 320
  # of passes = ⌈log2(320)⌉ + 1 = 9 + 1 = 10
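The run-construction step and the pass count can be sketched in Python as follows. This is only an illustrative sketch: the function names `make_runs` and `two_way_passes` are hypothetical, and in-memory lists stand in for tapes.

```python
from itertools import islice


def make_runs(records, m):
    """Read m records at a time and sort each chunk internally.

    Returns a list of sorted runs (each run has at most m records).
    """
    it = iter(records)
    runs = []
    while True:
        chunk = list(islice(it, m))  # next m records, fewer at the end
        if not chunk:
            break
        runs.append(sorted(chunk))
    return runs


def two_way_passes(num_runs):
    """Count merge passes: each 2-way pass halves the number of runs (rounding up)."""
    passes = 0
    while num_runs > 1:
        num_runs = (num_runs + 1) // 2
        passes += 1
    return passes


# The numbers from the example above.
n = 10 * 10**6            # records
m = 4 * 10**6 // 128      # records that fit in memory: 31,250
num_runs = -(-n // m)     # ceiling division: 320
total_passes = two_way_passes(num_runs) + 1  # merge passes + run construction
```

With these values `num_runs` is 320 and `total_passes` is 10, matching the count above. Counting by repeated halving avoids floating-point rounding in the logarithm.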
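[Figure: after the second merge pass. T_a1 and T_a2 are empty; T_b1 holds one run and T_b2 holds the single record 15.]
[Figure: after the third merge pass. T_a1 holds the fully sorted output; T_a2, T_b1, and T_b2 are empty.]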
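The four-tape example can be simulated with a short Python sketch. Everything here is an assumption made for illustration: each tape is modelled as a list of runs, the names `merge_pass` and `tape_sort` are hypothetical, and the 13 input values are arbitrary rather than the ones on the slides.

```python
from heapq import merge


def merge_pass(src1, src2):
    """One merge pass: merge the i-th runs of two source tapes,
    writing the results alternately onto two destination tapes."""
    dst1, dst2 = [], []
    outputs = (dst1, dst2)
    for i in range(max(len(src1), len(src2))):
        run_a = src1[i] if i < len(src1) else []  # a tape may have run out of runs
        run_b = src2[i] if i < len(src2) else []
        outputs[i % 2].append(list(merge(run_a, run_b)))
    return dst1, dst2


def tape_sort(records, m):
    """Balanced 2-way merge sort; returns (sorted records, number of merge passes)."""
    # Run-constructing pass: sort m records at a time, alternating between two tapes.
    t1, t2 = [], []
    for i in range(0, len(records), m):
        (t1, t2)[(i // m) % 2].append(sorted(records[i:i + m]))
    passes = 0
    while len(t1) + len(t2) > 1:
        # The destination tapes of this pass become the sources of the next one.
        t1, t2 = merge_pass(t1, t2)
        passes += 1
    return (t1[0] if t1 else []), passes


data = [44, 7, 63, 21, 90, 5, 38, 72, 16, 55, 29, 81, 15]  # illustrative values
result, passes = tape_sort(data, 3)
```

For 13 records and M = 3 this builds five runs and needs three merge passes, the same number of merge passes as in the tape figures.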
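Multiway Merge
– Use k input devices instead of just 2.
– E.g., k = 3 for the previous example.
[Figure: initial state. T_a1 holds the input, shown as five groups separated by semicolons, the last being the single record 15; T_a2, T_a3, T_b1, T_b2, and T_b3 are empty.]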
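[Figure: after the run-constructing pass. T_a1, T_a2, and T_a3 are empty; T_b1 holds two runs, T_b2 holds two runs (the last is the single record 15), and T_b3 holds one run.]
[Figure: after the first merge pass. T_a1 and T_a2 hold one run each and T_a3 is empty; T_b1, T_b2, and T_b3 are empty.]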
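[Figure: after the second merge pass. T_a1, T_a2, and T_a3 are empty; T_b1 holds the fully sorted output; T_b2 and T_b3 are empty.]
– A k-way merge requires ⌈log_k(N/M)⌉ passes, plus the initial run-constructing pass.
– For N = 10 × 10^6, M = 4 × 10^6 / 128, and k = 5:
  # of passes = ⌈log_5(10 × 128 / 4)⌉ + 1 = ⌈log_5(320)⌉ + 1 = 4 + 1 = 5
Skip rest of Chapter 7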
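A k-way merge and its pass count can be sketched in Python as below. This is a minimal sketch under stated assumptions: runs are in-memory lists rather than tapes, the function names are hypothetical, the input values are arbitrary, and the standard-library `heapq.merge` does the k-way merging of sorted runs.

```python
from heapq import merge


def k_way_pass(runs, k):
    """Merge each consecutive group of k sorted runs into one longer run."""
    return [list(merge(*runs[i:i + k])) for i in range(0, len(runs), k)]


def k_way_sort(records, m, k):
    """Return (sorted records, number of merge passes) for a k-way merge sort."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Run-constructing pass: sort m records at a time.
    runs = [sorted(records[i:i + m]) for i in range(0, len(records), m)]
    passes = 0
    while len(runs) > 1:
        runs = k_way_pass(runs, k)
        passes += 1
    return (runs[0] if runs else []), passes


def merge_passes(num_runs, k):
    """Count merge passes: each k-way pass divides the run count by k, rounding up."""
    if k < 2:
        raise ValueError("k must be at least 2")
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // k)  # ceiling division
        passes += 1
    return passes


data = [44, 7, 63, 21, 90, 5, 38, 72, 16, 55, 29, 81, 15]  # illustrative values
result, passes = k_way_sort(data, 3, 3)
```

With 13 records, M = 3, and k = 3 the sort finishes in two merge passes, and `merge_passes(320, 5) + 1` gives the 5 passes computed above (compared with `merge_passes(320, 2) + 1`, which gives 10 for the 2-way merge).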