Published by Kristopher Garrett. Modified over 9 years ago.
1
External Sorting
- A classic problem in computer science!
- Data requested in sorted order
  - e.g., find students in increasing cap order
- Sorting is used in many applications
  - First step in bulk loading operations.
  - Sorting is useful for eliminating duplicate copies in a collection of records (How?)
  - The sort-merge join algorithm involves sorting.
- Problem: sort 1 GB of data with 1 MB of RAM.
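One possible answer to the "(How?)" on the duplicate-elimination bullet: after sorting, equal records sit next to each other, so a single scan that compares each record with the previous one is enough. A minimal sketch, assuming records are plain comparable values and fit in memory (the function name `dedupe_sorted` is made up for illustration):

```python
def dedupe_sorted(records):
    """Remove duplicates by sorting, then keeping only records
    that differ from the one kept just before them."""
    unique = []
    for rec in sorted(records):
        # After sorting, duplicates are adjacent, so one comparison suffices.
        if not unique or rec != unique[-1]:
            unique.append(rec)
    return unique


print(dedupe_sorted([3, 1, 3, 2, 1]))  # [1, 2, 3]
```

In an external sort the same comparison can be applied while merging runs, so duplicates are dropped without an extra pass.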
2
2-Way Sort: Requires 3 Buffers
- Pass 0: Read a page, sort it, write it.
  - Only one buffer page is used.
- Pass 1, 2, 3, …, etc.:
  - Three buffer pages are used.
[Figure: main memory buffers INPUT 1, INPUT 2 and OUTPUT, with pages read from and written back to disk]
3
Two-Way External Merge Sort
- Each pass we read + write each page in file.
- N pages in the file => the number of passes is $\lceil \log_2 N \rceil + 1$
- So total cost is: $2N (\lceil \log_2 N \rceil + 1)$
- Idea: Divide and conquer: sort subfiles and merge

Example (7-page input file, 2 records per page; "|" separates runs):
Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 (1-page runs):  3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1 (2-page runs):  2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2 (4-page runs):  2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3 (8-page runs):  1,2 2,3 3,4 4,5 6,6 7,8 9
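The pass structure on this slide can be traced with a small in-memory simulation. This is only a sketch: pages are modeled as Python lists, runs as flat sorted lists, and `heapq.merge` stands in for the two-input, one-output buffer merge. The function name `two_way_merge_sort` is an assumption for illustration.

```python
from heapq import merge


def two_way_merge_sort(pages):
    """Simulate two-way external merge sort on a list of pages.

    Returns (sorted_records, number_of_passes)."""
    if not pages:
        return [], 0
    # Pass 0: sort each page on its own, giving 1-page runs.
    runs = [sorted(page) for page in pages]
    passes = 1
    # Later passes: merge runs pairwise until one run is left.
    while len(runs) > 1:
        runs = [list(merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        passes += 1
    return runs[0], passes


# The 7-page input file from the slide's example.
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
records, passes = two_way_merge_sort(pages)
print(records)                  # [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9]
print(passes)                   # 4
print(2 * len(pages) * passes)  # 56 page I/Os in total
```

With N = 7 pages the sort takes 4 passes, and reading and writing every page in every pass gives 2 × 7 × 4 = 56 page I/Os.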
4
General External Merge Sort
- More than 3 buffer pages. How can we utilize them?
- To sort a file with N pages using B buffer pages:
  - Pass 0: use B buffer pages. Produce $\lceil N/B \rceil$ sorted runs of B pages each.
  - Pass 1, 2, …, etc.: merge B-1 runs.
[Figure: B main memory buffers (INPUT 1, INPUT 2, ..., INPUT B-1 and OUTPUT), with runs read from and written back to disk]
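A file-based sketch of the general scheme follows. It assumes one integer per line in each run file and a hypothetical `records_per_page` parameter to turn "B buffer pages" into a record count. Python's own file buffering and `heapq.merge` only approximate page-level buffer management, so treat this as an illustration of the pass structure rather than a faithful DBMS implementation.

```python
import heapq
import os
import tempfile


def write_run(values, directory):
    """Write one sorted run to a temporary file, one integer per line."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{v}\n" for v in values)
    return path


def read_run(path):
    """Stream integers back from a run file without loading it whole."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values, records_per_page, B, directory):
    """Sort ints holding at most B pages of records in memory in pass 0,
    then merge B-1 runs at a time. Returns (sorted_list, passes)."""
    if B < 3:
        raise ValueError("need at least 3 buffer pages")
    capacity = B * records_per_page
    runs, chunk = [], []
    # Pass 0: fill all B buffer pages, sort in memory, write out a run.
    for v in values:
        chunk.append(v)
        if len(chunk) == capacity:
            runs.append(write_run(sorted(chunk), directory))
            chunk = []
    if chunk:
        runs.append(write_run(sorted(chunk), directory))
    passes = 1
    # Passes 1, 2, ...: B-1 input buffers, one output buffer.
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs), B - 1):
            group = runs[i:i + B - 1]
            merged = heapq.merge(*(read_run(p) for p in group))
            next_runs.append(write_run(merged, directory))
            for p in group:
                os.remove(p)  # input runs are no longer needed
        runs = next_runs
        passes += 1
    result = list(read_run(runs[0])) if runs else []
    return result, passes


# 216 records at 2 records per page = 108 pages, sorted with B = 5.
data = [(i * 37) % 101 for i in range(216)]
with tempfile.TemporaryDirectory() as tmp:
    result, passes = external_sort(data, records_per_page=2, B=5, directory=tmp)
print(result == sorted(data), passes)  # True 4
```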
5
Cost of External Merge Sort
- Number of passes: $1 + \lceil \log_{B-1} \lceil N/B \rceil \rceil$
- Cost = 2N * (# of passes)
- E.g., with 5 buffer pages, to sort 108 page file:
  - Pass 0: $\lceil 108/5 \rceil$ = 22 sorted runs of 5 pages each (last run is only 3 pages)
  - Pass 1: $\lceil 22/4 \rceil$ = 6 sorted runs of 20 pages each (last run is only 8 pages)
  - Pass 2: 2 sorted runs, 80 pages and 28 pages
  - Pass 3: Sorted file of 108 pages
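The 108-page example can be checked mechanically. The sketch below counts runs pass by pass with integer arithmetic instead of evaluating a logarithm, which avoids floating-point rounding. The helper names `run_counts` and `sort_cost` are made up for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)


def run_counts(N, B):
    """Number of runs remaining after each pass, for N pages and B buffers."""
    if N < 1 or B < 3:
        raise ValueError("need N >= 1 pages and B >= 3 buffer pages")
    counts = [ceil_div(N, B)]  # pass 0 produces ceil(N/B) runs
    while counts[-1] > 1:
        # each later pass merges B-1 runs into one
        counts.append(ceil_div(counts[-1], B - 1))
    return counts


def sort_cost(N, B):
    """Return (number of passes, total page I/Os = 2N * passes)."""
    passes = len(run_counts(N, B))
    return passes, 2 * N * passes


print(run_counts(108, 5))  # [22, 6, 2, 1]
print(sort_cost(108, 5))   # (4, 864)
```

The run counts match the slide's passes 0 to 3, and the total cost is 2 × 108 × 4 = 864 page I/Os.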
6
Number of Passes of External Sort
7
Sequential vs Random I/Os
- Is minimizing passes optimal? Would merging as many runs as possible be the best solution?
- Suppose we have 80 runs, each 80 pages long, and we have 80 pages of buffer space.
- We can merge all 80 runs in a single pass
  - each page requires a seek to access (Why?)
  - there are 80 pages per run, so 80 seeks per run
  - total cost = 80 runs × 80 seeks = 6,400 seeks
8
Sequential vs Random I/Os (Cont)
- We can merge all 80 runs in two steps
  - 5 sets of 16 runs each
    - read 80/16 = 5 pages of one run at a time
    - 16 runs result in a sorted run of 1280 pages
    - each merge requires (80/5) × 16 = 256 seeks
    - for 5 sets, we have 5 × 256 = 1280 seeks
  - merge 5 runs of 1280 pages
    - read 80/5 = 16 pages of one run at a time => 1280/16 = 80 seeks per run
    - 5 runs => 5 × 80 = 400 seeks
  - total: 1280 + 400 = 1680 seeks!!!
- Number of passes increases, but number of seeks decreases!
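The seek arithmetic for both strategies can be reproduced with one small function. Like the slides, this sketch counts only read seeks, splits the whole buffer evenly among the input runs, and ignores the output buffer and write seeks. The name `merge_seeks` is an assumption for illustration.

```python
from math import ceil


def merge_seeks(num_runs, pages_per_run, buffer_pages):
    """Read seeks needed to merge num_runs runs in one step, assuming
    the buffer is split evenly among the runs and each refill of a
    run's buffer costs one seek."""
    pages_per_refill = buffer_pages // num_runs
    if pages_per_refill < 1:
        raise ValueError("not enough buffer pages for this many runs")
    seeks_per_run = ceil(pages_per_run / pages_per_refill)
    return num_runs * seeks_per_run


# One step: merge all 80 runs at once, 1 buffer page per run.
one_step = merge_seeks(80, 80, 80)

# Two steps: 5 merges of 16 runs each, then one merge of 5 long runs.
first = 5 * merge_seeks(16, 80, 80)
second = merge_seeks(5, 16 * 80, 80)

print(one_step)                       # 6400
print(first, second, first + second)  # 1280 400 1680
```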
9
Using B+ Trees for Sorting
- Scenario: Table to be sorted has a B+ tree index on the sorting column(s).
- Idea: Can retrieve records in order by traversing leaf pages.
- Is this a good idea?
- Cases to consider:
  - B+ tree is clustered -- Good idea!
  - B+ tree is not clustered -- Could be a very bad idea!
10
Clustered B+ Tree Used for Sorting
- Cost: root to the left-most leaf, then retrieve all leaf pages (<key, record> pair organization)
- If <key, rid> pair organization is used? Additional cost of retrieving data records: each page fetched just once.
- Always better than external sorting!
[Figure: index (directs search) on top, data entries ("sequence set") below it, data records at the bottom]
11
Unclustered B+ Tree Used for Sorting
- Each data entry contains the rid of a data record. In general, one I/O per data record!
[Figure: index (directs search) on top, data entries ("sequence set") below it, data records at the bottom]
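A rough way to see why the unclustered case can be so bad is to put numbers on the two scans. The sketch below uses a simplified cost model that is an assumption, not something the slides state in full: the root-to-leaf descent is ignored, a clustered scan reads each data page once, and an unclustered scan pays one I/O per record. The table sizes are hypothetical.

```python
def clustered_scan_cost(data_pages, leaf_pages=0):
    """Page I/Os to read all records in order through a clustered index.
    leaf_pages is 0 when the leaves hold the records themselves."""
    return leaf_pages + data_pages


def unclustered_scan_cost(leaf_pages, num_records):
    """Page I/Os through an unclustered index: all leaf pages,
    plus (in general) one I/O per data record."""
    return leaf_pages + num_records


def external_sort_cost(data_pages, passes):
    """Every pass reads and writes every page."""
    return 2 * data_pages * passes


# Hypothetical table: 1,000 data pages, 100 records per page,
# 100 index leaf pages, and an external sort that needs 3 passes.
N, per_page, leaves = 1000, 100, 100
print(clustered_scan_cost(N, leaves))              # 1100
print(external_sort_cost(N, passes=3))             # 6000
print(unclustered_scan_cost(leaves, N * per_page)) # 100100
```

Under these assumed numbers the clustered scan beats external sorting, while the unclustered scan is more than an order of magnitude worse than sorting the file.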
12
Summary
- External sorting is important; DBMS may dedicate part of buffer pool for sorting!
- External merge sort minimizes disk I/O cost:
  - Pass 0: Produces sorted runs of size B (# buffer pages). Later passes: merge runs.
  - # of runs merged at a time depends on B, and block size.
  - Larger block size means less I/O cost per page.
  - Larger block size means smaller # runs merged.
  - In practice, # of passes rarely more than 2 or 3.
- Clustered B+ tree is good for sorting; unclustered tree is usually very bad.
13
Database Tuning
Database tuning is the activity of making a database application run more quickly. "More quickly" usually means higher throughput, though it may mean lower response time for time-critical applications.
14
1 - Basic Principles
Tuning Principles
- Think globally, fix locally
- Partitioning breaks bottlenecks (temporal and spatial)
- Start-up costs are high; running costs are low
- Render unto server what is due unto server
- Be prepared for trade-offs (indexes and inserts)
15
Tuning Mindset
1. Set reasonable performance tuning goals
2. Measure and document current performance
3. Identify the current system performance bottleneck
4. Identify the current OS bottleneck
5. Tune the required components, e.g., application, DB, I/O, contention, OS, etc.
6. Track and exercise change-control procedures
7. Measure and document current performance
8. Repeat steps 3 through 7 until the goal is met
16
Goals Met?
- Appreciation of DBMS architecture
- Study the effect of various components on the performance of the systems
- Tuning principles
- Troubleshooting techniques for chasing down performance problems
- Hands-on experience in tuning