Improve Run Generation

Improve Run Generation
Overlap input,output, and internal CPU work. Reduce the number of runs (equivalently, increase average run length). DISK MEMORY

Internal Quick Sort 6 2 8 5 11 10 4 1 9 7 3 Use 6 as the pivot (median of 3). Input first, middle, and last blocks first. In-place partitioning. 4 2 3 5 1 6 10 11 9 7 8 In quick sort we can’t start the in-place partitioning until all keys are known because (a) selection of pivot using median of 3 rule (say) requires knowledge of first, middle, and last keys. Overcome this problem by reading first block and then the middle block, and then the next block. Compares in the in-place scheme are done from the extremities of the array to the middle. To facilitate this, we can read the blocks into memory in this fashion. We can begin output as soon as the leftmost block is sorted. Once a block is output, we can get its replacement block for next round. However, we need all records before placement of pivot is determined. Many of the subsequent recursive calls need to be done w/o any I/O taking place. When leftmost block is sorted we can begin to output and replace with block for next run. Cumbersome scheme with limited overlap of I/O/CPU. In a merge sort, we could reorganize the merges so as to work with the merges of segments already input. However, we cannot generate the first block of the output until all input blocks have been read. In neither case can we get runs that are larger than memory capacity. Input blocks from the ends toward the middle. Sort left and right groups recursively. Can begin output as soon as left most block is ready.

Alternative Internal Sort Scheme
Partition into 3 areas, each may be more than 1 block in size. A1 A2 A3 DISK Alternatively, read in 1/3 memory load. Then sort while simultaneously reading in next load into 1/3 memory. In steady state output 1/3-memory sorted sequence, sort 1/3-memory load, read into remaining 1/3-memory. Run size is reduced to 1/3-memory capacity. Effective overlap though. Note A1 etc may be more than 1 block in length. DISK

Steady State Operation
Read from disk Write to disk Run generation Read/write multiple blocks. Synchronization is done when the current internal sort terminates.

New Strategy Use 2 input and 2 output buffers.
DISK MEMORY Output 0 Output 1 Loser Tree Input 0 Input 1 Buffers are 1 block long. Actually, 3 buffers suffice as the output buffer fills at the same rate as the input buffer empties. At any time, one buffer is the input buffer that is being filled from disk, another the output buffer that is being written to disk, and the third is used to feed the loser tree and also is filled from the loser tree. Use 2 input and 2 output buffers. Rest of memory is used for a min loser tree. Actually, 3 buffers adequate.

Steady State Operation
Read from disk Write to disk Run generation Read/write single block. Synchronization is done when the active input buffer gets empty (the active output buffer will be full at this time).

Initialize O0 O1 3 4 8 4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8 I0 I1 Fill From Disk

Initialize O0 O1 3 6 1 4 8 5 7 4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8 I0 I1 Fill From Disk

Initialize O0 O1 1 3 6 3 2 4 8 5 7 6 9 4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8 I0 I1 Fill From Disk

Initialize O0 O1 1 3 2 6 3 4 2 4 8 5 7 6 9 5 8 4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8 I0 I1 Fill From Disk

Generate Run 1 1 4 3 6 8 5 7 2 9 O0 O1 I0 I1 3 5 4 Fill From Tree
Fill From Disk

Generate Run 1 Fill From Tree O0 1 O1 3 2 3 2 6 3 4 5 4 8 5 7 6 9 5 8 4 3 6 8 3 5 7 3 2 6 9 4 5 2 5 8 I0 3 5 4 I1 Fill From Disk

Generate Run 1 Fill From Tree O0 1 O1 2 3 3 4 2 6 3 5 4 5 4 8 5 7 6 9 5 8 4 3 6 8 3 5 7 3 5 6 9 4 5 2 5 8 I0 3 5 4 I1 Fill From Disk

Interchange Role Of Buffers
Generate Run 1 Fill From Tree O0 1 O1 2 3 2 3 4 2 6 3 5 4 5 4 8 5 7 6 9 5 8 4 3 6 8 3 5 7 3 5 6 9 4 5 4 5 8 I0 3 5 4 I1 Interchange Role Of Buffers Fill From Disk

Write To Disk Fill From Tree O0 1 O1 2 3 2 3 4 2 6 3 5 4 5 4 8 5 7 6 9 5 8 4 3 6 8 3 5 7 3 5 6 9 4 5 4 5 8 1 9 2 I0 I1 Fill From Disk

Continue With Run 1 Write To Disk Fill From Tree O0 1 O1 2 4 3 2 3 4 6 3 5 4 5 4 8 5 7 6 9 5 8 4 3 6 8 3 5 7 3 5 6 9 4 5 4 5 8 1 9 2 I0 I1 Fill From Disk

Continue With Run 1 Write To Disk Fill From Tree O0 1 O1 3 2 4 2 3 4 6 5 3 5 4 5 4 8 1 5 7 6 9 5 8 Blue bold numbers denote members of run 2. 4 3 6 8 5 4 1 5 7 3 6 9 4 5 5 8 1 9 2 I0 I1 Fill From Disk

Continue With Run 1 Write To Disk Fill From Tree O0 1 O1 3 2 3 4 2 5 3 4 6 5 7 3 5 4 5 4 8 1 5 9 7 6 9 5 8 Blue bold numbers denote members of run 2. 4 3 6 8 1 5 7 9 5 6 9 4 5 4 5 8 1 9 2 I0 I1 Fill From Disk

Continue With Run 1 Write To Disk Fill From Tree O0 1 O1 3 2 3 4 2 3 5 3 4 6 7 5 3 5 4 5 4 8 1 5 9 7 6 9 5 8 Blue bold numbers denote members of run 2. 4 2 6 8 5 7 9 5 1 6 9 4 5 4 5 8 1 9 2 I0 I1 Interchange Role Of Buffers Fill From Disk

Fill From Tree Interchange Role Of Buffers Write To Disk O0 O1 3 3 4 3 5 3 4 6 5 7 3 5 4 5 4 8 1 5 9 7 6 9 5 8 Blue bold numbers denote members of run 2. 4 2 6 8 9 5 4 1 5 7 6 9 4 5 5 8 6 1 3 I0 I1 Fill From Disk

Continue With Run 1 Fill From Tree Write To Disk O0 O1 3 3 4 3 5 3 4 6 5 7 3 5 4 5 2 4 8 1 5 9 7 6 9 5 8 Blue bold numbers denote members of run 2. 4 2 6 8 9 5 1 5 7 6 9 4 5 4 5 8 6 1 3 I0 I1 Fill From Disk

Continue With Run 1 Fill From Tree Write To Disk O0 4 O1 3 3 5 4 3 6 5 3 4 6 5 7 3 5 4 5 2 4 8 1 5 9 7 6 9 5 8 6 2 6 8 5 7 9 5 1 6 9 4 5 4 5 8 6 1 3 I0 I1 Fill From Disk

Continue With Run 1 Fill From Tree Write To Disk O0 4 O1 3 4 3 5 4 3 6 5 3 5 4 6 5 7 3 5 9 4 5 2 4 8 1 5 9 7 6 1 9 5 8 6 2 6 8 5 7 9 5 1 6 9 1 5 4 5 8 6 1 3 I0 I1 Fill From Disk

RUN SIZE Let k be number of external nodes in the loser tree.
Run size >= k. Sorted input => 1 run. Reverse of sorted input => n/k runs. Average run size is ~2k.

Comparison Memory capacity = m records.
Run size using fill memory, sort, and output run scheme = m. Use loser tree scheme. Assume block size is b records. Need memory for 4 buffers (4b records). Loser tree k = m – 4b. Average run size = 2k = 2(m – 4b). 2k >= m when m >= 8b. M >= 6b when only 3 buffers are used.

Comparison Assume b = 100. m 600 1000 5000 10000 k 200 4600 9600 2k
400 1200 9200 19200

Comparison Total internal processing time using fill memory, sort, and output run scheme = O((n/m) m log m) = O(n log m). Total internal processing time using loser tree = O(n log k). Loser tree scheme generates runs that differ in their lengths. Note that k is O(m). Comparison with min heap. Same asymptotic complexity and same average run length. Constant factors in run times will be different. If input file is already sorted (or nearly sorted), then replacing current min element with next element from input buffer will move from heap root to leaf. At each level, 2 compares are done, for a total of 2log n compares. A loser tree inserts new element with log n compares. Min heaps wins out only when we move half way down the heap (on average) when replacing current min with new element. In both min heap and loser tree, we can do away with external nodes when the records are small. For large records we maintain the external nodes to avoid moving records too many times.

Merging Runs Of Different Length
4 3 6 9 22 4 3 6 9 22 13 7 15 7 Cost = 42 Cost = 44 Best merge sequence?

Improve Run Generation

Similar presentations

Presentation on theme: "Improve Run Generation"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Improve Run Generation

Similar presentations

Presentation on theme: "Improve Run Generation"— Presentation transcript:

Similar presentations

About project

Feedback