Published by Kristin Phelps. Modified over 9 years ago.
1
Sorting by the Numbers
Sorting, Part Four
2
Question
Suppose you are given the task of writing an application to sort a big data file. What do you need to know to pick a good solution?
File Size = 1 GB
Record Size = 250 Bytes
Available Memory = ¼ GB
3
How many Runs? How big is each Run?
Total Records to Process
1 billion bytes in the file / 250 bytes for each record = 4 million records in the file
Run Size
1 GB file / ¼ GB memory = 4 Runs of 1 million records each
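The arithmetic on this slide can be checked with a short sketch. The function name `plan_runs` and the constant names are illustrative, not from the slides, and 1 GB is taken as 1 billion bytes, as the slide does.

```python
# Figures from the slide; 1 GB is treated as 1 billion bytes.
FILE_SIZE_BYTES = 1_000_000_000
RECORD_SIZE_BYTES = 250
MEMORY_BYTES = FILE_SIZE_BYTES // 4  # 1/4 GB of available memory


def plan_runs(file_size, record_size, memory):
    """Return (total records, records per run, number of runs)."""
    total_records = file_size // record_size
    records_per_run = memory // record_size
    # Ceiling division, so a partial final run still counts as a run.
    num_runs = -(-total_records // records_per_run)
    return total_records, records_per_run, num_runs


total_records, records_per_run, num_runs = plan_runs(
    FILE_SIZE_BYTES, RECORD_SIZE_BYTES, MEMORY_BYTES
)
print(total_records, records_per_run, num_runs)  # 4000000 1000000 4
```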
4
Time to Create the Runs
Sorting One Run
Using either Quicksort or an Ordered Binary Tree: N log2 N = 1 million * 20 = approximately 20 million comparisons of internal memory locations
Sorting Four Runs
4 * 20 million = 80 million internal memory comparisons
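As a rough check of the N log2 N estimate, the sketch below rounds the log factor to a whole number, the way the slide rounds log2 of 1 million to 20. The helper name `sort_compares` is an assumption made for illustration.

```python
import math


def sort_compares(n):
    """Estimate N * log2(N) comparisons, with the log factor rounded."""
    return n * round(math.log2(n))


per_run = sort_compares(1_000_000)  # log2(1 million) is about 19.93, rounded to 20
four_runs = 4 * per_run
print(per_run, four_runs)  # 20000000 80000000
```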
5
Refresher on Merging Files
File One: 1 3 5 7 9
File Two: 2 4 6 8 10
So, merging 2 files of N random records each requires about 2N compares.
File One: 1 2 3 4 5
File Two: 6 7 8 9 10
And merging 2 files whose runs were built from a sorted file requires N compares.
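To see where the two counts come from, here is a minimal two-way merge that counts its comparisons, run on the two pairs of files from the slide. In-memory lists stand in for files, and `merge_count` is a name chosen for this sketch. With N = 5, the interleaved pair takes 9 compares (2N - 1, which the slide rounds to 2N) and the pair built from sorted input takes 5 (N).

```python
def merge_count(a, b):
    """Merge two sorted lists; return (merged list, number of compares)."""
    merged, compares = [], 0
    i = j = 0
    while i < len(a) and j < len(b):
        compares += 1  # one compare per record emitted while both inputs remain
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # Once one input is exhausted, the rest is copied without comparing.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged, compares


_, interleaved = merge_count([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
_, presorted = merge_count([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(interleaved, presorted)  # 9 5
```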
6
Merging the Four Files
[Diagram: runs R1, R2, R3, and R4 merged through temporary files T1 and T2, shown twice; merge steps labeled 2 million, 4 million, 3 million, 2 million, and 4 million compares]
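The diagram did not survive extraction, so the following is only one plausible reading of it: a comparison of two merge orders for four runs of 1 million records each. It uses the refresher's rule of thumb that merging random records costs about one compare per record written. The names `merge_cost`, `pairwise`, and `chained` are hypothetical.

```python
RUN = 1_000_000  # records in each of the four runs


def merge_cost(a_records, b_records):
    """Rule of thumb: about one compare per record in the merged output."""
    return a_records + b_records


# Pairwise order (assumed): R1+R2 -> T1, R3+R4 -> T2, then T1+T2.
pairwise = (
    merge_cost(RUN, RUN) + merge_cost(RUN, RUN) + merge_cost(2 * RUN, 2 * RUN)
)

# Chained order (assumed): R1+R2 -> T1, T1+R3 -> T2, then T2+R4.
chained = (
    merge_cost(RUN, RUN) + merge_cost(2 * RUN, RUN) + merge_cost(3 * RUN, RUN)
)

print(pairwise, chained)  # 8000000 9000000
```

The pairwise total of 8 million matches the merge figure used on the Total Processing Time slide.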
7
Total Processing Time
Time to Create the 4 Runs: 80 million comparisons
Time to Merge the 4 Runs: 8 million comparisons
Assuming a File Read takes just 100 times longer than a Memory Read:
Total Time = 80 million + (8 million * 100) = 880 million time units
Note: we have omitted the time to read the runs into memory and to write the runs to temp files.
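The total can be reproduced in a few lines. This sketch assumes one time unit per memory compare and charges each merge compare at the file-read rate; `total_time` and `FILE_TO_MEMORY_RATIO` are illustrative names.

```python
FILE_TO_MEMORY_RATIO = 100  # a file read costs 100 memory reads (slide's assumption)


def total_time(sort_compares, merge_compares, ratio=FILE_TO_MEMORY_RATIO):
    """Time units: memory compares cost 1 each, merge compares cost `ratio` each."""
    return sort_compares + merge_compares * ratio


print(total_time(80_000_000, 8_000_000))  # 880000000
```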
8
Second Example
2 Runs of 2 Million Records each
Internal Sorting: N log2 N = 2 million * 21 = 42 million compares; 84 million to create both runs
File Merging: 4 million compares
Total Time = 84 million + (4 million * 100) = 484 million time units
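To compare the two examples side by side, the sketch below wraps the same cost model in one hypothetical function, `estimate_total`. It assumes pairwise merging, where each merge pass costs about one compare per record, and it computes the log factor directly, which rounds to 20 for 1 million records and 21 for 2 million.

```python
import math


def estimate_total(total_records, num_runs, file_ratio=100):
    """Estimated time units to sort `total_records` split into `num_runs` runs."""
    records_per_run = total_records // num_runs

    # Internal sorting: N * log2(N) compares per run, log factor rounded.
    sort_compares = num_runs * records_per_run * round(math.log2(records_per_run))

    # Pairwise merging: each pass halves the number of runs and
    # costs about one compare per record.
    passes, runs = 0, num_runs
    while runs > 1:
        runs = (runs + 1) // 2
        passes += 1
    merge_compares = total_records * passes

    return sort_compares + merge_compares * file_ratio


print(estimate_total(4_000_000, 4))  # 880000000
print(estimate_total(4_000_000, 2))  # 484000000
```

Under these assumptions the file merging dominates the total, so halving the number of merge passes saves far more than the larger in-memory sorts cost.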
9
Next in this course
So how much time does it take to access the disk?