CPSC 404, Laks V.S. Lakshmanan. External Sorting. Chapter 13: Ramakrishnan & Gehrke and Chapter 2.3: Garcia-Molina et al.

2 Motivation
• A classic problem in computer science!
• Data is often requested in sorted order
  – e.g., find students in increasing GPA order
• Sorting
  – is the first step in bulk loading (i.e., creating) a B+ tree index.
  – is also useful for eliminating duplicate copies in a collection of records and for aggregation (Why?)
  – (later) enables efficient joins of relations, i.e., the sort-merge join algorithm.
• Problem: sort 1 GB of data with 50 MB of RAM.

3 I/O Computation Model
• Disk I/O (reading or writing a block) is very expensive compared to processing the block once it is in memory.
• Random block accesses are very common.
• A reasonable model for computation using secondary storage is therefore to count only disk I/Os.

4 Desiderata for good DBMS algorithms
• Overlap I/O and CPU processing as much as possible.
• Use as much data from each block read as possible; this depends on record clustering.
• Cluster blocks that are accessed together in consecutive pages.
• Buffer frequently accessed blocks in memory.

5 Sorting Example
Setup:
• 10M records of 100 bytes = 1 GB file.
  – Stored on a Megatron 747 disk, with 4 KB blocks, each holding 40 records + header information.
  – Entire file takes 250,000 blocks = 1000 cylinders.
• 50 MB of available main memory = 12,800 blocks = about 1/20th of the file.
• Sort by primary key field.

6 Merge Sort
• Common main-memory sorting algorithms don't optimize disk I/Os. Variants of Merge Sort do better.
• Merge = take two sorted lists and repeatedly choose the smaller of the "heads" of the lists (head = first of the unchosen).
• Example: merge 1,3,4,8 with 2,5,7,9 = 1,2,3,4,5,7,8,9.
• Merge Sort is based on a recursive algorithm: divide the records into two parts, recursively mergesort the parts, and merge the resulting lists.
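The merge step and the recursive algorithm described on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration only; the function names `merge` and `merge_sort` are chosen here for the example and do not come from the slides.

```python
def merge(left, right):
    """Merge two already-sorted lists by repeatedly taking the smaller head."""
    merged = []
    i = j = 0  # indexes of the first unchosen element in each list
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; the rest of the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(records):
    """Recursive merge sort: split in two, sort each part, merge the results."""
    if len(records) <= 1:
        return list(records)
    mid = len(records) // 2
    return merge(merge_sort(records[:mid]), merge_sort(records[mid:]))


print(merge([1, 3, 4, 8], [2, 5, 7, 9]))
```

Calling `merge` on the slide's two lists reproduces the merged list given in the example.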

7 Two-Phase, Multiway Merge Sort
• Merge Sort is still not very good in the disk I/O model.
• It makes log₂ n passes, so each record is read from and written to disk log₂ n times.
• 2PMMS: 2 reads + 2 writes per block.
Phase 1
1. Fill main memory with records.
2. Sort using your favorite main-memory sort.
3. Write the sorted sublist to disk.
4. Repeat until all records have been put into one of the sorted lists.
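Phase 1 can be sketched as follows. This is a simplified simulation, not a disk-based implementation: records arrive from any iterable, `memory_capacity` (a hypothetical parameter) stands in for the number of records that fit in main memory, and each "sorted sublist on disk" is represented by a Python list.

```python
from itertools import islice


def make_sorted_runs(records, memory_capacity):
    """Split the input into memory-sized chunks and sort each chunk.

    Each returned list stands in for one sorted sublist written to disk.
    """
    if memory_capacity < 1:
        raise ValueError("memory_capacity must be at least 1")
    source = iter(records)
    runs = []
    while True:
        # Step 1: fill "main memory" with as many records as fit.
        chunk = list(islice(source, memory_capacity))
        if not chunk:
            break  # Step 4: all records have been placed in some run.
        # Step 2: sort with any main-memory sort (here, Python's built-in).
        chunk.sort()
        # Step 3: "write" the sorted sublist out.
        runs.append(chunk)
    return runs


print(make_sorted_runs([5, 1, 4, 2, 3, 9, 8], memory_capacity=3))
```

With seven records and room for three at a time, the sketch produces three runs, the last one shorter than the others.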

8 2PMMS (contd.)
Phase 2
• Use one buffer for each of the sorted sublists and one buffer for an output block.
[Figure: input buffers, each with a pointer to its first unchosen record; the smallest unchosen record is selected and moved to the output buffer.]

9 2PMMS (contd.)
• Initially load input buffers with the first blocks of their respective sorted lists.
• Repeatedly run a competition among the first unchosen records of each of the buffered blocks.
  – Move the record with the least key to the output block; it is now "chosen."
• Manage the buffers as needed:
  – If an input block is exhausted, get the next block from the same list (i.e., file).
  – If the output block is full, write it to disk.
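The Phase 2 buffer management above can be sketched as a small simulation. The assumptions are specific to this example: each sorted sublist is a Python list standing in for a file, `block_size` is the number of records per block, and the "competition" is run with a binary heap from the standard `heapq` module, which is one reasonable choice rather than something the slides prescribe.

```python
import heapq


def multiway_merge(runs, block_size):
    """Merge sorted runs using one input buffer per run and one output buffer.

    Returns the list of output blocks in the order they would be written.
    """
    # Load each input buffer with the first block of its run.
    buffers = [run[:block_size] for run in runs]
    next_start = [block_size] * len(runs)  # where the next block begins
    position = [0] * len(runs)             # first unchosen record per buffer

    # The heap holds (key, run index) for the head of every non-empty buffer.
    heap = [(buf[0], r) for r, buf in enumerate(buffers) if buf]
    heapq.heapify(heap)

    output_block, written = [], []
    while heap:
        key, r = heapq.heappop(heap)  # the record with the least key wins
        output_block.append(key)
        if len(output_block) == block_size:
            written.append(output_block)  # output block full: "write" it
            output_block = []
        position[r] += 1
        if position[r] == len(buffers[r]):
            # Input block exhausted: fetch the next block of the same run.
            start = next_start[r]
            buffers[r] = runs[r][start:start + block_size]
            next_start[r] = start + block_size
            position[r] = 0
        if buffers[r]:
            heapq.heappush(heap, (buffers[r][position[r]], r))
    if output_block:
        written.append(output_block)  # flush the final partial block
    return written


print(multiway_merge([[1, 4, 5], [2, 3, 9], [8]], block_size=2))
```

Only one block per run is held in memory at any time, which is what lets Phase 2 merge many sublists at once with a modest number of buffers.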

10 Analysis of Naive Implementation
• Assume blocks are stored at random, so average access time is about 15 ms.
• File stored on 250,000 blocks, each read and written once in each phase.
• 1,000,000 disk I/Os × 0.015 seconds = 15,000 seconds = 250 minutes = 4+ hours.
Problem and Interlude: How many records can you sort with 2PMMS, under our assumptions about records, main memory, and the disk? What would you do if there were more?
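The arithmetic of the naive analysis can be checked with a short calculation. All figures come from the slides; the variable names are chosen for this example, and times are kept in integer milliseconds to avoid floating-point rounding.

```python
BLOCKS_IN_FILE = 250_000
IOS_PER_BLOCK = 4        # one read and one write in each of the two phases
AVG_ACCESS_MS = 15       # assumed average time for a random block access

total_ios = BLOCKS_IN_FILE * IOS_PER_BLOCK
total_seconds = total_ios * AVG_ACCESS_MS / 1000
total_minutes = total_seconds / 60
total_hours = total_minutes / 60

print(total_ios, total_seconds, total_minutes, round(total_hours, 2))
```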

11 Improving the Running Time of 2PMMS
Here are some techniques that sometimes make secondary-memory algorithms more efficient:
1. Group blocks by cylinder.
2. One big disk → several smaller disks.
3. Mirror disks = multiple copies of the same data.
4. "Prefetching" or "double buffering."
5. Disk scheduling; the "elevator" algorithm.

12 Cylindrification
If we are going to read or write blocks in a known order, place them by cylinder, so once we reach that cylinder we can read block after block, with no seek time or rotational latency.
Application to Phase 1 of 2PMMS:
1. Initially, records are on 1000 consecutive cylinders.
2. Load main memory from 50 consecutive cylinders.
   • The order in which blocks are read is unimportant, so the only time besides transfer time is one random seek and 49 one-cylinder seeks (negligible).
   • Time to transfer 12,800 blocks at 0.5 ms/block = 6.4 sec.
3. Write each sorted list onto 50 consecutive cylinders, so write time is also about 6.4 sec.
But in Phase 2: blocks are read from the 20 sublists in a data-dependent order, so we cannot take much advantage of cylinders.
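The Phase 1 figures above can be worked through numerically. The block counts and the 0.5 ms transfer time are from the slides; the Phase 1 total is a derived upper bound added here for illustration, since the last memory fill is only partly full and seek time is neglected.

```python
import math

BLOCKS_IN_FILE = 250_000
BLOCKS_PER_FILL = 12_800        # blocks that fit in main memory
TRANSFER_US_PER_BLOCK = 500     # 0.5 ms per block, in microseconds

# Number of times main memory is filled, i.e., number of sorted sublists.
fills = math.ceil(BLOCKS_IN_FILE / BLOCKS_PER_FILL)

# Transfer time to read (or write) one memory-load of blocks.
seconds_per_fill = BLOCKS_PER_FILL * TRANSFER_US_PER_BLOCK / 1_000_000

# Upper bound on Phase 1: every fill is read once and written once.
phase1_seconds_upper_bound = fills * 2 * seconds_per_fill

print(fills, seconds_per_fill, phase1_seconds_upper_bound)
```

Under these assumptions Phase 1 drops from hours to a few minutes, because almost all of the time is now pure transfer time.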

13 Prefetching
If we have extra space for main-memory buffers, consider loading buffers in advance of need. Similarly, keep output blocks buffered in main memory until it is convenient to write them to disk.
Example: Phase 2 of 2PMMS
With 50 MB of main memory, we can afford to buffer two cylinders for each sublist and for the output (one cylinder = 1 MB).
• Consume one cylinder for each sublist while the other is being loaded into main memory.
• Similarly, write one output cylinder while the other is being constructed.
• Thus, seek and rotational latency are almost eliminated, lowering total read and write times to about two minutes each.
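A quick check of the double-buffering example: whether the buffers fit in memory, and where "about two minutes" comes from. The 20 sublists, 250,000 blocks, and 0.5 ms per-block transfer time are taken from the earlier slides; treating the read time as pure transfer time is the simplifying assumption here.

```python
SUBLISTS = 20
MEMORY_MB = 50
CYLINDER_MB = 1
BUFFERS_PER_STREAM = 2          # one being consumed, one being filled

# One stream per sublist, plus one for the output.
streams = SUBLISTS + 1
buffer_mb = streams * BUFFERS_PER_STREAM * CYLINDER_MB
fits_in_memory = buffer_mb <= MEMORY_MB

# With seek and rotational latency hidden, only transfer time remains.
BLOCKS_IN_FILE = 250_000
TRANSFER_US_PER_BLOCK = 500     # 0.5 ms per block, in microseconds
transfer_seconds = BLOCKS_IN_FILE * TRANSFER_US_PER_BLOCK / 1_000_000
transfer_minutes = transfer_seconds / 60

print(buffer_mb, fits_in_memory, transfer_seconds, round(transfer_minutes, 2))
```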

14 Using B+ Trees for Sorting
• Scenario: the table to be sorted has a B+ tree index on the sorting column(s).
• Idea: can retrieve records in order by traversing leaf pages.
• Is this a good idea?
• Cases to consider:
  – B+ tree is clustered: good idea!
  – B+ tree is not clustered: could be a very bad idea!

15 Clustered B+ Tree Used for Sorting
• Cost: go from the root to the leftmost leaf, then retrieve all leaf pages (Alternative 1).
• If Alternative 2 is used? Additional cost of retrieving data records: each page is fetched just once.
[Figure: the index (directs search) sits above the data entries ("sequence set"), which point to the data records.]

16 Unclustered B+ Tree Used for Sorting
• Alternative (2) for data entries; each data entry contains the rid of a data record. In general, one I/O per data record!
[Figure: the index (directs search) sits above the data entries ("sequence set"), which point to the data records.]
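To see why the unclustered case can be so bad, here is a rough comparison of data-record I/Os only. It borrows the 10M-record, 40-records-per-block file from the sorting example as an assumption, ignores the cost of reading the index and leaf pages, and takes the worst case of one I/O per record for the unclustered index.

```python
RECORDS = 10_000_000
RECORDS_PER_BLOCK = 40

# Clustered index: each data page is fetched just once.
clustered_data_ios = RECORDS // RECORDS_PER_BLOCK

# Unclustered index: in general, one I/O per data record.
unclustered_data_ios = RECORDS

slowdown = unclustered_data_ios // clustered_data_ios

print(clustered_data_ios, unclustered_data_ios, slowdown)
```

Under these assumptions the unclustered traversal performs as many extra I/Os as there are records per block, and those I/Os are random rather than sequential.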

17 Sorting from Raw Data – Merge Sort
• Idea: given B buffer pages of main memory
  – (Sort phase) read in B pages of data each time and sort internally
  – suppose the entire data is stored in m × B pages
  – the sort phase produces m sorted runs of B pages each
  – (Merge phase) repeatedly merge two sorted runs
• (Pass 0: the sort phase)
• Pass 1 produces m/2 sorted runs of 2B pages each
• Pass 2 produces m/4 sorted runs of 4B pages each
• Continue until one sorted run of m × B pages is produced
• The 2-way merge can be optimized to a (B-1)-way merge (see Ch. 11.2)
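The pass structure above can be turned into a small calculator. This sketch counts passes by simulating how the number of runs shrinks, and charges one read and one write of every page per pass; the function names and the choice of the 250,000-page, 12,800-buffer example from the earlier slides are assumptions made for illustration.

```python
import math


def count_passes(total_pages, buffer_pages, fan_in):
    """Number of passes, including Pass 0 (the sort phase)."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    runs = math.ceil(total_pages / buffer_pages)  # runs produced by Pass 0
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / fan_in)  # each merge pass combines fan_in runs
        passes += 1
    return passes


def total_ios(total_pages, passes):
    """Every pass reads and writes each page once."""
    return 2 * total_pages * passes


pages, buffers = 250_000, 12_800
two_way = count_passes(pages, buffers, fan_in=2)
multiway = count_passes(pages, buffers, fan_in=buffers - 1)
print(two_way, total_ios(pages, two_way))
print(multiway, total_ios(pages, multiway))
```

For this file the (B-1)-way merge needs only two passes, which matches the 1,000,000 disk I/Os counted in the naive analysis of 2PMMS, while the 2-way merge needs three times as many.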

18 Conclusion: Sorting Records!
• Sorting has become a blood sport!
  – Parallel sorting is the name of the game...
• Datamation: sort 1M records of size 100 bytes
  – Typical DBMS: 15 minutes
  – World record: 3.5 seconds (12-CPU SGI machine, 96 disks, 2 GB of RAM)
• New benchmarks proposed:
  – Minute Sort: How many can you sort in 1 minute?
  – Dollar Sort: How many can you sort for $1.00?

