Putting things in order

1 Putting things in order
Sorting: putting things in order. This is the big-picture version; it includes sort_i and sort_e and would be suitable for a data structures course. Copyright © Curt Hill

2 Sorting is Important
Sorting is one of the largest consumers of computing resources. It has been estimated that in the 1960s one third of all CPU power was used on sorts, and one sort was recorded to take months. Sorts are expensive and still consume cycles or accesses, yet too many things do not work without them. There are entire graduate-level classes on just sorting.

3 Definitions
Internal sorting: all the data is in memory, most often in an array. No I/Os are required.
External sorting: the data will not fit in memory. Usually the I/O costs exceed the CPU costs.
Merge: start with two (or more) sorted files and produce one sorted result.
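The merge definition can be sketched in a few lines of Python. This is only an illustration, assuming the two sorted "files" are in-memory lists; the function name `merge_two` is hypothetical and not from the slides.

```python
def merge_two(left, right):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    # Repeatedly take the smaller of the two front items.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One input is exhausted; the rest of the other is already in order.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


merged = merge_two([1, 4, 9], [2, 3, 10, 11])
```

For more than two inputs, the standard library's `heapq.merge` does the same job lazily over any number of sorted iterables.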

4 Keys
There is often more than one key.
Primary key: the most important key. The DB notion of a primary key is different from the sort one.
Secondary key: discriminates between two records where the primary key is the same. There may be multiple secondary keys, ordered by importance.
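As a small illustration of primary and secondary keys, the sketch below sorts records by last name (primary) and then first name (secondary). The record layout and field names are made up for the example.

```python
from operator import itemgetter

# Hypothetical records; "last" is the primary key, "first" the secondary key.
records = [
    {"last": "Smith", "first": "Zoe"},
    {"last": "Adams", "first": "Bob"},
    {"last": "Smith", "first": "Al"},
]

# itemgetter with two names returns a (last, first) tuple; tuples compare
# element by element, so "first" only matters when "last" is equal.
by_name = sorted(records, key=itemgetter("last", "first"))
```

The two Smith records tie on the primary key, so the secondary key decides their order.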

5 The Sort Merge
A generalized program that will sort any file. It is a standard program on mainframes, but less common on other platforms. The user specifies the sort and file characteristics, and the program then creates the sorted file.

6 Sort Characteristics
Primary and secondary keys: each is described by its position, length, type, and whether it is ascending or descending.
Exits: points in the algorithm where a user-written program could be invoked, usually to filter or reformat the data.
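The characteristics above can be mimicked in a short Python sketch. It is not the interface of any real sort/merge utility: `KeySpec`, `extract`, `sort_records`, and the `input_exit` parameter are hypothetical names, and the records are assumed to be fixed-width strings.

```python
from dataclasses import dataclass


@dataclass
class KeySpec:
    position: int             # 0-based offset of the key in the record
    length: int               # number of characters in the key
    kind: type = str          # key type: str or int in this sketch
    descending: bool = False  # sort direction


def extract(record, spec):
    """Cut the key out of a fixed-width record and convert it to its type."""
    raw = record[spec.position:spec.position + spec.length]
    return spec.kind(raw.strip())


def sort_records(records, keys, input_exit=None):
    """Sort by keys, most important first; input_exit may filter or reformat."""
    if input_exit is not None:
        transformed = (input_exit(r) for r in records)
        records = [r for r in transformed if r is not None]
    result = list(records)
    # Python's sort is stable, so sorting by the least important key first
    # and the most important key last gives the combined ordering.
    for spec in reversed(keys):
        result.sort(key=lambda r, s=spec: extract(r, s),
                    reverse=spec.descending)
    return result


# Name in columns 0-5 (ascending), amount in columns 6-9 (descending).
keys = [KeySpec(0, 6, str), KeySpec(6, 4, int, descending=True)]
data = ["smith 0042", "adams 0100", "smith 0007", "jones 0000"]


def drop_zero_amounts(record):
    """Example exit: returning None removes the record from the sort."""
    return None if int(record[6:10]) == 0 else record


sorted_data = sort_records(data, keys, input_exit=drop_zero_amounts)
```

Here the exit runs before sorting; a real utility might offer several exit points, which this sketch does not attempt to model.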

7 The Sort Merge
The program did as much as it could using internal sorting. It usually had to resort to file merging for files that were too large to fit in memory.

8 Internal Sorting
The data fits in memory, typically in an array or table. Examined first, in Sort_I.ppt.

9 External Sorting
This is the usual situation with a DBMS, and it occurs elsewhere as well. The file has too many pages to fit in memory. The internal sorts mostly require random access to individual records, while we have random access to pages. We want to minimize the number of pages accessed. See sort_e.ppt.
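One common way to sort data that does not fit in memory is to sort memory-sized runs, write each run to a temporary file, and then merge the runs. The sketch below shows that idea under simplifying assumptions: records are newline-terminated strings, `run_size` stands in for the available memory, and there are few enough runs to merge in a single pass. The function name `external_sort` is hypothetical, and the details in sort_e.ppt may differ.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(lines, run_size):
    """Yield newline-terminated strings in sorted order.

    At most run_size lines are held in memory while building each run.
    """
    run_paths = []
    source = iter(lines)
    try:
        # Phase 1: sort memory-sized chunks and write each one out as a run.
        while True:
            chunk = list(islice(source, run_size))
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as run_file:
                run_file.writelines(chunk)
            run_paths.append(path)
        # Phase 2: merge all runs, reading each run sequentially.
        files = [open(path) for path in run_paths]
        try:
            yield from heapq.merge(*files)
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary run files once merging is done.
        for path in run_paths:
            os.remove(path)


data = ["pear\n", "apple\n", "fig\n", "kiwi\n", "date\n"]
result = list(external_sort(data, run_size=2))
```

Both phases read and write the data sequentially, which suits page-at-a-time access better than an internal sort's record-level random access.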

10 Parallelism
With multiple CPUs some gain in sorting may occur. Each CPU may do part of a sort in parallel. Only the final merge needs a single CPU to handle it properly.
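The split, sort, and merge structure can be sketched as follows. This is an assumption-laden illustration: `parallel_sort` is a hypothetical name, and a thread pool is used only to keep the example short. In CPython, threads do not give true multi-CPU speedup for this work; a `ProcessPoolExecutor` would be the closer match to the slide.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(data, workers=4):
    """Sort chunks concurrently, then merge the sorted chunks in one place."""
    if not data:
        return []
    size = -(-len(data) // workers)  # ceiling division: items per chunk
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Each worker sorts its own chunk independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, chunks))
    # The final merge is done by a single worker.
    return list(heapq.merge(*runs))


ordered = parallel_sort([5, 3, 8, 1, 9, 2, 7], workers=3)
```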

11 Using a BTree
Suppose that the file in question has a BTree index on the field in question. Should that be used instead of sorting? The answer depends on whether this is a clustered or unclustered index.

12 Well?
Clustered: traverse from the root to the initial value, then ride the leaves as long as needed. This eliminates the need for a sort, since the leaves are already sorted.
Unclustered: we will have to use the index for each record. Generally a sequential scan followed by a sort will be faster for normal index sizes.
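A rough page-I/O comparison shows why the unclustered case usually loses. The model below is a simplified textbook-style estimate, not something taken from the slides: it assumes an external merge sort that reads and writes every page once per pass, a clustered scan that touches each data page once, and an unclustered scan that in the worst case costs one page I/O per record. It ignores index page reads, and all function names and the sample numbers are hypothetical.

```python
import math


def external_sort_io(pages, buffers):
    """Estimated page I/Os for an external merge sort (buffers >= 3)."""
    runs = math.ceil(pages / buffers)  # initial sorted runs
    passes = 1                         # the run-creation pass
    while runs > 1:
        # Each merge pass combines up to (buffers - 1) runs into one.
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return 2 * pages * passes          # every pass reads and writes each page


def clustered_index_io(pages):
    """Data pages are in key order, so each is read about once."""
    return pages


def unclustered_index_io(records):
    """Worst case: every record fetch lands on a different page."""
    return records


pages, records_per_page, buffers = 10_000, 100, 100
sort_cost = external_sort_io(pages, buffers)
clustered_cost = clustered_index_io(pages)
unclustered_cost = unclustered_index_io(pages * records_per_page)
```

With these made-up numbers the clustered index is cheapest, the scan-and-sort comes next, and the unclustered index is far behind, which matches the slide's conclusion.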

