Chapter 7: Cosequential Processing and the Sorting of Large Files
Cosequential Operations
  - Coordinated processing of two or more sequential lists to produce a single output list
  - Kinds of operations
      - merging (union)
      - matching (intersection)
      - combinations of the above
2019-04-16 K.O. Lee
Matching Operation
  - Output the names common to the two lists (a match, or intersection)
  - Four steps
      1. Initializing
      2. Synchronizing
      3. Handling end-of-file conditions
      4. Recognizing errors
Matching Operation (2)
  - Algorithm: p. 261, Figure 7.2
  - Three-way conditional statement
      if NAME_1 < NAME_2
          read the next name from LIST_1
      else if NAME_1 > NAME_2
          read the next name from LIST_2
      else
          output the name
          read the next name from both lists
Matching Operation (3)
  - Key to the algorithm
      - After each step, control always returns to the head of the main loop
      - End-of-file test: the MORE_NAMES_EXIST flag keeps the loop running until either of the two lists reaches end-of-file
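The three-way comparison and the end-of-file test described above can be sketched in Python. This is a minimal illustration, not the code of Figure 7.2: the function name `match` and the use of in-memory lists instead of files are assumptions, and each list is assumed to be sorted in ascending order with no repeated names.

```python
def match(list_1, list_2):
    """Return the names common to two sorted lists (intersection)."""
    out = []
    i = j = 0
    # The loop condition plays the role of MORE_NAMES_EXIST:
    # it becomes false as soon as either list is exhausted.
    while i < len(list_1) and j < len(list_2):
        name_1, name_2 = list_1[i], list_2[j]
        if name_1 < name_2:
            i += 1              # read the next name from LIST_1
        elif name_1 > name_2:
            j += 1              # read the next name from LIST_2
        else:
            out.append(name_1)  # names match: output once
            i += 1              # and read the next name from both lists
            j += 1
    return out


common = match(["ADAMS", "CARTER", "DAVIS"],
               ["ADAMS", "BECH", "DAVIS", "ROSEWALD"])
```

Every branch advances at least one list and then falls back to the single loop test, which is why no separate end-of-file handling is needed inside the branches.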
Merging Two Lists
  - Based on the matching operation: p. 264, Figure 7.5
  - Differences
      - Must read each of the lists completely
      - MORE_NAMES_EXIST behaves differently: it stays true until both lists reach end-of-file
      - HIGH_VALUE: a value that comes after all legal input values in the file's ordered sequence
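A sketch of the merge with a HIGH_VALUE sentinel follows. It is an illustration under assumptions, not the procedure of Figure 7.5: the names are strings, `"\uffff"` is assumed to sort after every legal name, and a name that appears in both lists is output once.

```python
HIGH_VALUE = "\uffff"  # assumption: sorts after every legal input name


def merge(list_1, list_2):
    """Merge two sorted lists of names into one sorted list (union)."""
    it_1, it_2 = iter(list_1), iter(list_2)
    # Reading past the end of a list yields HIGH_VALUE instead of failing.
    name_1 = next(it_1, HIGH_VALUE)
    name_2 = next(it_2, HIGH_VALUE)
    out = []
    # Continue until BOTH lists have reached end-of-file.
    while name_1 != HIGH_VALUE or name_2 != HIGH_VALUE:
        if name_1 < name_2:
            out.append(name_1)
            name_1 = next(it_1, HIGH_VALUE)
        elif name_1 > name_2:
            out.append(name_2)
            name_2 = next(it_2, HIGH_VALUE)
        else:
            out.append(name_1)  # same name in both lists: output once
            name_1 = next(it_1, HIGH_VALUE)
            name_2 = next(it_2, HIGH_VALUE)
    return out


merged = merge(["ADAMS", "CARTER"], ["ADAMS", "BECH", "DAVIS"])
```

Because an exhausted list always compares high, the remaining names of the other list drain through the ordinary comparison branches with no extra end-of-file code.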
Cosequential Processing Model
  - Assumptions
      - Two or more input files are processed in parallel
      - Each file is sorted
  - Comments
      - The output may be the same file as one of the input files
      - The files need not all have the same record structure
Cosequential Processing Model (2)
  - Assumptions
      - A high key value and a low key value must exist
      - Records are in logical sorted order
  - Comments
      - Not strictly necessary, but it reduces complexity
      - Physical ordering can have a large impact on processing
Cosequential Processing Model (3)
  - Assumptions
      - For each file there is only one current record
      - Records are manipulated only in internal memory
  - Comments
      - Looking ahead or looking back is not prohibited, but such operations should be restricted to subprocedures
      - A program cannot alter a record in place on secondary storage
Cosequential Processing Model (4)
  - Components
      - Initialization: read the first record from each file
      - Synchronization loop: runs as long as relevant records remain
      - Selection: performed in the main synchronization loop
      - High values as the end-of-file condition: no special code is needed to deal with end-of-file
Cosequential Processing Model (5)
  - Components (cont'd)
      - I/O and error detection are relegated to subprocedures, which hide the details
  - Simple and robust
  - Example: General Ledger Program, pp. 268-276
Multiway Merging
  - K-way merge: merge K input lists to create a single, ordered output list
  - p. 277, Figure 7.16
  - Works well when K is less than 8 or so
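For small K, the merge can find the lowest key with a plain scan over the K current items. The sketch below assumes in-memory sorted lists and a hypothetical function name `k_way_merge`; unlike the two-list merge, it keeps a key that appears in several lists once per occurrence.

```python
def k_way_merge(lists):
    """Merge K sorted lists by scanning the K current items for the minimum."""
    positions = [0] * len(lists)  # index of the current item in each list
    out = []
    while True:
        best = None               # index of the list holding the lowest key
        for k, lst in enumerate(lists):
            if positions[k] >= len(lst):
                continue          # this list is exhausted
            if best is None or lst[positions[k]] < lists[best][positions[best]]:
                best = k
        if best is None:          # every list is exhausted
            return out
        out.append(lists[best][positions[best]])
        positions[best] += 1      # read the next item from that list only


result = k_way_merge([[1, 5], [2, 3], [4]])
```

Each output item costs up to K - 1 comparisons, which is cheap for small K and is the cost the selection tree is meant to reduce for larger K.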
Multiway Merging (2)
  - Selection tree
      - For larger K, the set of comparisons in a K-way merge becomes expensive
      - Time vs. space trade-off
      - A kind of tournament tree
      - Each higher-level node represents the winner of its two descendant keys
      - The depth of the tree is ⌈log2 K⌉
Selection Tree
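One way to realize a selection tree is a winner tree stored in an array, with the K current keys at the leaves and the smaller of each pair copied upward. The sketch below is an assumption-laden illustration, not the figure from the slides: the name `selection_tree_merge`, the array layout, and the `(1, None, src)` tuple used as a high value are all choices made here.

```python
_END = object()  # marks an exhausted input list


def selection_tree_merge(lists):
    """Merge K sorted lists with a winner (selection) tree."""
    k = len(lists)
    if k == 0:
        return []
    size = 1
    while size < k:               # pad the leaf count to a power of two
        size *= 2
    iters = [iter(lst) for lst in lists]

    def next_entry(src):
        # (0, key, src) is a real key; (1, None, src) acts as a high value.
        item = next(iters[src], _END) if src < k else _END
        return (1, None, src) if item is _END else (0, item, src)

    tree = [None] * (2 * size)    # tree[1] is the root, leaves start at size
    for src in range(size):
        tree[size + src] = next_entry(src)
    for node in range(size - 1, 0, -1):
        tree[node] = min(tree[2 * node], tree[2 * node + 1])

    out = []
    while tree[1][0] == 0:        # the root holds the overall winner
        _, key, src = tree[1]
        out.append(key)
        node = size + src
        tree[node] = next_entry(src)   # refill the winner's leaf
        node //= 2
        while node >= 1:               # replay only the path to the root
            tree[node] = min(tree[2 * node], tree[2 * node + 1])
            node //= 2
    return out
```

After each output only the nodes on one leaf-to-root path are recomputed, so the work per item grows with the depth of the tree rather than with K.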
Sorting in RAM
  - Can we improve on the time of a RAM sort?
      - Perform some of the parts in parallel
      - A selection tree is good, but it cannot be used to sort an entire file
  - Heapsort
      - Sorting and reading can occur in parallel
      - All of the keys are kept in a heap
Heapsort
  - Heap
      - A child node's key is greater than or equal to its parent's key
      - The children of node i are nodes 2i and 2i+1
      - Fig. 7.20, Fig. 7.21
  - Overlapping processing with I/O
      - Use more than one buffer: p. 284, Figure 7.22
      - Fill the next buffer while building the heap
      - Procedure for outputting: Fig. 7.23
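The heap properties above map directly onto an array in which slot 0 is left unused, so that the children of node i sit at 2i and 2i+1. The following sketch uses hypothetical function names and an in-memory list of keys; it does not model the buffering of Figure 7.22, only the insert and remove steps that the buffering would overlap with I/O.

```python
def heap_insert(heap, key):
    """Insert key into a 1-based min-heap (heap[0] is unused)."""
    heap.append(key)
    i = len(heap) - 1
    # Move the new key up while it is smaller than its parent at i // 2.
    while i > 1 and heap[i] < heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2


def heap_remove_min(heap):
    """Remove and return the smallest key (the root at index 1)."""
    smallest = heap[1]
    last = heap.pop()
    n = len(heap) - 1             # number of keys still in the heap
    if n >= 1:
        heap[1] = last
        i = 1
        while 2 * i <= n:
            child = 2 * i         # pick the smaller of children 2i and 2i+1
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1
            if heap[i] <= heap[child]:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest


def heapsort(keys):
    """Sort a list of keys by inserting each one, then removing the minimum."""
    heap = [None]                 # index 0 is a placeholder
    for key in keys:              # each key can be inserted as soon as it is read
        heap_insert(heap, key)
    return [heap_remove_min(heap) for _ in range(len(keys))]
```

Because a key can be placed in the heap the moment it arrives, building the heap does not have to wait for the whole file to be read, which is what makes the overlap with input possible.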
Sorting Large Files on Disk
  - Keysort shortcomings
      - Cost of seeking
      - Cannot sort a really large file: all key/pointer pairs must fit in RAM
  - Multiway merge algorithm
      - Run: a sorted subfile
Sorting Large Files on Disk (2)
Sorting Large Files on Disk (3)
  - Multiway merging
      - Can be extended to files of any size
      - Reading during the run-creation step is sequential, so there is no seeking
      - Reading and writing during merging are also sequential
      - I/O can be overlapped with processing by using heapsort
      - Tape can be used
How Much Time Does a Merge Sort Take?
  - Merge sort vs. keysort: pp. 287-290 (in the 10-minute range vs. 5 hours)
  - Four steps
      1. Reading records and forming runs
      2. Writing sorted runs
      3. Reading sorted runs for merging
      4. Writing the sorted file
Sorting a Very Large File
  - Kinds of I/O
      - Sort phase: sequential if heapsort is used, so there is no room for improvement
      - Merge phase: random access (proportional to the number of runs)
  - Way to improve performance
      - Cut down the number of random accesses in the merge phase
Cost of Increasing the File Size
  - For a K-way merge of K runs, the buffer size for each run is
      (1/K) × size of RAM = (1/K) × size of each run
  - Each run therefore needs K seeks, so the merge operation requires K^2 seeks
  - Measured in seeks, merge sort is an O(K^2) operation
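The K^2 figure follows from a short count, sketched below. The function name and the run counts are illustrative assumptions, and the model counts only the seeks needed to read the runs, with each run assumed to be as large as RAM.

```python
def single_step_merge_seeks(num_runs):
    """Seeks needed to read all runs in one K-way merge of K runs."""
    # Each run's buffer holds 1/K of the run, so reading a whole run
    # takes K buffer refills, each requiring one seek.
    seeks_per_run = num_runs
    return num_runs * seeks_per_run   # K runs × K seeks = K^2


# Making the file 10 times larger (10 times as many runs)
# multiplies the number of seeks by 100.
small = single_step_merge_seeks(80)
large = single_step_merge_seeks(800)
```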
Cost of Increasing the File Size (2)
  - Ways to reduce the time
      - Use more hardware
      - Merge in more than one step: reduces the order of each merge and increases the buffer size for each run
      - Increase the length of the initial sorted runs
      - Overlap I/O operations
Hardware-based Improvements
  - Possible configurations
      - Increasing the amount of RAM
      - Increasing the number of disk drives
      - Increasing the number of I/O channels
Multiple-Step Merging
  - Break the original set of runs into small groups and merge the runs in each group separately
  - Fewer seeks, but extra transmission time in the second pass
  - Every record is read twice: once to form the intermediate runs and once to form the final sorted file
Multiple-Step Merging (2)
  - Essence of multiple-step merging
      - Increases the buffer space available for each run
      - Trade-off: an extra pass vs. fewer random accesses
  - More than two steps?
      - Trade-off: reduced seek and rotational times vs. increased transmission time
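The seek savings of a two-step merge can be estimated with the same simple model used for the single-step case. The sketch below is a rough count under assumptions: every initial run is as large as RAM, the runs divide evenly into groups, only read seeks are counted, and the values 800 and 32 are illustrative choices rather than figures taken from these slides.

```python
def single_step_merge_seeks(num_runs):
    """One K-way merge: K runs, K seeks per run."""
    return num_runs * num_runs


def two_step_merge_seeks(num_runs, group_size):
    """Read seeks for merging groups of runs first, then the group results."""
    if num_runs % group_size:
        raise ValueError("num_runs must be a multiple of group_size")
    num_groups = num_runs // group_size
    # First pass: each group is a group_size-way merge, so each original
    # run is read in group_size pieces.
    first_pass = num_runs * group_size
    # Second pass: each intermediate run is group_size times the RAM size
    # and gets 1/num_groups of RAM as its buffer, so it is read in
    # group_size * num_groups = num_runs pieces.
    second_pass = num_groups * num_runs
    return first_pass + second_pass


one_step = single_step_merge_seeks(800)
two_step = two_step_merge_seeks(800, 32)
```

The count covers seeks only; the second pass still transmits every record one more time, which is the cost side of the trade-off noted above.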
Increasing Run Lengths
  - Longer initial runs
      - Fewer total runs
      - Bigger buffers
      - Fewer seeks
  - Replacement selection
Replacement Selection
  - Idea
      - Always select the key in memory that has the lowest value
      - Output that key
      - Replace it with a new key from the input list
  - Implementation: p. 299; p. 300, Figure 7.27
Replacement Selection (2)
  - What about a key that arrives in memory too late to be output in its proper position?
      - Use a second heap: p. 301, Figure 7.28
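Replacement selection with a second heap can be sketched with Python's standard `heapq` module. The function name `replacement_selection`, the parameter `p` (the number of memory locations), and the sample keys are assumptions for illustration; keys that arrive too late for the current run are parked in the second heap, which becomes the primary heap when the current run ends.

```python
import heapq
from itertools import islice

_END = object()  # marks the end of the input


def replacement_selection(keys, p):
    """Split keys into sorted runs using p memory locations."""
    source = iter(keys)
    current = list(islice(source, p))  # fill memory with the first p keys
    heapq.heapify(current)             # primary heap: keys for the current run
    frozen = []                        # second heap: keys for the next run
    runs, run = [], []
    while current:
        smallest = heapq.heappop(current)
        run.append(smallest)           # output the lowest key in memory
        new_key = next(source, _END)   # replace it with a key from the input
        if new_key is not _END:
            if new_key >= smallest:
                heapq.heappush(current, new_key)  # still fits in this run
            else:
                heapq.heappush(frozen, new_key)   # too late: save for next run
        if not current:                # current run is complete
            runs.append(run)
            run = []
            current, frozen = frozen, []
    return runs


runs = replacement_selection(
    [5, 47, 16, 12, 67, 21, 7, 17, 14, 58, 24, 18, 33], 3)
```

With three memory locations, this input produces two runs of six and seven keys, both longer than the three-key runs that sorting one memory load at a time would give.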
Replacement Selection (4)
  - Two questions
      - Given P locations in memory, how long a run can we expect replacement selection to produce, on average? pp. 301-302
      - What are the costs of using replacement selection? pp. 303-304
  - Less than 1/3 as many seeks as RAM sorting