Lecture 6: External Sorting
Bong-Soo Sohn, Assistant Professor
School of Computer Science and Engineering, Chung-Ang University
External Sorting
- A sorting algorithm that can handle massive amounts of data by using external memory
- Required when the data does not fit into main memory
- Out-of-core algorithm vs. in-core algorithm
Motivation
- Sometimes the data to sort are too large to fit in memory (why not virtual memory?)
- Use external memory (disk)
- Disk performance
  - Seek time (the major factor)
  - Rotational latency
  - Transfer time
- Primary rule for disk access: minimize the number of disk accesses
- Assume external (secondary) memory is divided into equal-sized blocks (e.g. 1 KB, 4 KB, …)
- Block: the unit in which data is stored and retrieved
External Merge Sort: Idea
Example: sorting 900 MB of data using only 100 MB of RAM
1. Read 100 MB of the data into main memory and sort it by some conventional method (usually quicksort).
2. Write the sorted data to disk.
3. Repeat steps 1 and 2 until all of the data is in sorted 100 MB chunks (9 chunks in total), which now need to be merged into one single output file.
4. Read the first 10 MB of each sorted chunk into main memory (call these the input buffers, 90 MB total) and allocate the remaining 10 MB for an output buffer.
5. Perform a 9-way merge and store the result in the output buffer.
   - When the output buffer is full, write it to the final sorted file.
   - When any of the 9 input buffers becomes empty, refill it with the next 10 MB of its associated 100 MB sorted chunk. If the chunk has no more data, mark the buffer as exhausted and stop using it in the merge.
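The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lecture's own implementation: the input is assumed to be a text file with one integer per line, the memory limit is expressed as a number of lines (`chunk_size`) rather than megabytes, and the function names `write_sorted_runs`, `merge_runs`, and `external_sort` are hypothetical. Python's built-in file buffering stands in for the explicit 10 MB input and output buffers, and `heapq.merge` performs the k-way merge.

```python
import heapq
import itertools
import os
import tempfile


def write_sorted_runs(input_path, chunk_size, tmp_dir):
    """Steps 1-3: sort memory-sized chunks and write each one to disk as a run."""
    run_paths = []
    with open(input_path) as src:
        while True:
            # Read at most chunk_size lines; this models "what fits in RAM".
            chunk = [int(line) for line in itertools.islice(src, chunk_size)]
            if not chunk:
                break
            chunk.sort()  # in-core sort of a single chunk
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{value}\n" for value in chunk)
            run_paths.append(path)
    return run_paths


def merge_runs(run_paths, output_path):
    """Steps 4-5: k-way merge of all sorted runs into one output file."""
    files = [open(path) for path in run_paths]
    try:
        # Each stream lazily yields integers from one run; only a small
        # buffered window of every run is held in memory at a time.
        streams = [(int(line) for line in f) for f in files]
        with open(output_path, "w") as out:
            out.writelines(f"{value}\n" for value in heapq.merge(*streams))
    finally:
        for f in files:
            f.close()


def external_sort(input_path, output_path, chunk_size):
    """Sort input_path into output_path; returns the number of runs created."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = write_sorted_runs(input_path, chunk_size, tmp_dir)
        merge_runs(run_paths, output_path)
    return len(run_paths)
```

With 900 input lines and `chunk_size=100`, the sketch creates 9 runs and merges them in a single 9-way pass, mirroring the 900 MB / 100 MB example.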
2-way merge sort
  Initial runs: R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20
  Pass 1: S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
  Pass 2: T1 T2 T3 T4 T5
  Pass 3: U1 U2 U3
  Pass 4: V1 V2
  Pass 5: W1
# of passes: 5
5-way merge sort
  Initial runs: R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20
  Pass 1: S1 S2 S3 S4
  Pass 2: T1
We can reduce the # of passes (2 instead of 5).
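The pass counts on the 2-way and 5-way slides can be checked with a short calculation. The sketch below assumes that each pass merges groups of up to `fan_in` runs into one run; the function name `merge_passes` is hypothetical.

```python
def merge_passes(num_runs, fan_in):
    """Count the merge passes needed to reduce num_runs sorted runs to one."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    while num_runs > 1:
        # Each pass merges groups of up to fan_in runs (ceiling division).
        num_runs = -(-num_runs // fan_in)
        passes += 1
    return passes


print(merge_passes(20, 2))  # 5  (20 -> 10 -> 5 -> 3 -> 2 -> 1)
print(merge_passes(20, 5))  # 2  (20 -> 4 -> 1)
```

Since every pass reads and writes the whole data set once, a larger merge fan-in directly reduces the total number of disk accesses.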