CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Professor Elke A. Rundensteiner.

Slides:



Advertisements
Similar presentations
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
Advertisements

Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
CS 277 – Spring 2002Notes 21 CS 277: Database System Implementation Notes 02: Hardware Arthur Keller.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
External Sorting “There it was, hidden in alphabetical order.” Rita Holt R&G Chapter 13.
External Sorting CS634 Lecture 10, Mar 5, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 11 External Sorting.
1 Advanced Database Technology February 12, 2004 DATA STORAGE (Lecture based on [GUW ], [Sanders03, ], and [MaheshwariZeh03, ])
Disk Access Model. Using Secondary Storage Effectively In most studies of algorithms, one assumes the “RAM model”: –Data is in main memory, –Access to.
External Sorting R & G Chapter 13 One of the advantages of being
FALL 2006CENG 351 Data Management and File Structures1 External Sorting.
External Sorting R & G Chapter 11 One of the advantages of being disorderly is that one is constantly making exciting discoveries. A. A. Milne.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
External Sorting 198:541. Why Sort?  A classic problem in computer science!  Data requested in sorted order e.g., find students in increasing gpa order.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Using the Disk, and Disk Optimizations Professor Elke A. Rundensteiner.
External Sorting Chapter 13.. Why Sort? A classic problem in computer science! Data requested in sorted order  e.g., find students in increasing gpa.
CS4432: Database Systems II Data Storage (Better Block Organization) 1.
Lecture 11: DMBS Internals
BTrees & Sorting 11/3. Announcements I hope you had a great Halloween. Regrade requests were due a few minutes ago…
Chapter 111 Chapter 11: Hardware (Slides by Hector Garcia-Molina,
Sorting.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13 (Sec ): Ramakrishnan & Gehrke and Chapter 11 (Sec ): G-M et al. (R2) OR.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
1 External Sorting. 2 Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing gpa order.
1 Database Systems ( 資料庫系統 ) December 7, 2011 Lecture #11.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
ZA classic problem in computer science! zData requested in sorted order ye.g., find students in increasing cap order zSorting is used in many applications.
B+ Trees: An IO-Aware Index Structure Lecture 13.
DBMS 2001Notes 2: Hardware1 Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
Introduction to Database Systems1 External Sorting Query Processing: Topic 0.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapters 13: 13.1—13.5.
CPSC Why do we need Sorting? 2.Complexities of few sorting algorithms ? 3.2-Way Sort 1.2-way external merge sort 2.Cost associated with external.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
External Sorting. Why Sort? A classic problem in computer science! Data requested in sorted order –e.g., find students in increasing gpa order Sorting.
External Sorting Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
1 CSE232A: Database System Principles Hardware. Data + Indexes Database System Architecture Query ProcessingTransaction Management SQL query Parser Query.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2007.
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
Lecture 16: Data Storage Wednesday, November 6, 2006.
External Sorting Chapter 13
External Sorting Chapter 13 (Sec ): Ramakrishnan & Gehrke and
Lecture 11: DMBS Internals
Database Management Systems (CS 564)
Lecture#12: External Sorting (R&G, Ch13)
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
External Sorting Chapter 13
Selected Topics: External Sorting, Join Algorithms, …
CS222P: Principles of Data Management UCI, Fall 2018 Notes #09 External Sorting Instructor: Chen Li.
CS222: Principles of Data Management Lecture #10 External Sorting
External Sorting.
CS222P: Principles of Data Management Lecture #10 External Sorting
Database Systems (資料庫系統)
External Sorting Chapter 13
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #09 External Sorting Instructor: Chen Li.
CS 245: Database System Principles Notes 02: Hardware
External Sorting Dina Said
Presentation transcript:

CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Professor Elke A. Rundensteiner

CS 4432lecture #32 Quick Logistics

CS 4432lecture #33 Where have I been ? (at EDBT’04 in sunny Greece) backlog …

CS 4432lecture #34 Use MyWPI for general questions Use for specific questions to me only; but make sure to have “CS4432” in BS/MS Credit for this class (talk to me) MQPs in DB for next year (talk to me) I’ll stay after class today to address anything that needs immediate attention.

CS 4432lecture #35 Lecture #3 Still on Chapter 2 (in textbook) Today : On Disk Optimizations Next Week: On Storage Layout

CS 4432lecture #36 Thus far : Hardware: Disks Architecture: Layers of Access Access Times and Abstractions Example - Megatron 747 from textbook

CS 4432lecture #37 TODAY : Using secondary storage effectively (Sec. 2.3) SKIP part of chapter 2 : Disk Failure Issues

CS 4432lecture #38 One Simple Idea : Prefetching Problem: Have a File » Sequence of Blocks B1, B2 Have a Program » Process B1 » Process B2 » Process B3...

CS 4432lecture #39 Single Buffer Solution (1) Read B1  Buffer (2) Process Data in Buffer (3) Read B2  Buffer (4) Process Data in Buffer...

CS 4432lecture #310 SayP = time to process/block R = time to read in 1 block n = # blocks Single buffer time = n(P+R)

CS 4432lecture #311 Question: Could the DBMS know something about behavior of such future block accesses ? What if: If we knew more about sequence of future block accesses, what and how could we do better ?

CS 4432lecture #312 Idea : Double Buffering/Prefetching Memory: Disk: ABCDGEFA B done process A C B done

CS 4432lecture #313 Say P  R What is processing time now? P = Processing time/block R = IO time/block n = # blocks Double buffering time = ?

CS 4432lecture #314 Say P  R P = Processing time/block R = IO time/block n = # blocks Double buffering time = R + nP Single buffering time = n(R+P)

CS 4432lecture #315 Block Size Selection? Question : Do we want Small or Big Block Sizes ? Pros ? Cons ?

CS 4432lecture #316 Block Size Selection? Big Block  Amortize I/O Cost –For seek and rotational delays are reduced … Big Block  Read in more useless stuff! and takes longer to read Unfortunately...

CS 4432lecture #317 Trend As memory prices drop, blocks get bigger... Trend

CS 4432lecture #318 Using secondary storage effectively Example: Sorting data on disk General Wisdom : –I/O costs dominate –Design algorithms to reduce I/O

Disk IO Model Of Comptations  Efficient Use of Disk Example: Sort Task

CS 4432lecture #320 “Good” DBMS Algorithms Try to make sure if we read a block, we use much of data on that block Try to put blocks together that are accessed together Try to buffer commonly used blocks in main memory

Why Sort Example ? A classic problem in computer science! Data requested in sorted order – e.g., find students in increasing gpa order Sorting is first step in bulk loading B+ tree index. Sorting useful for eliminating duplicate copies in a collection of records (Why?) Sort-merge join algorithm involves sorting. Problem: sort 1Gb of data with 1Mb of RAM. – why not virtual memory?

CS 4432lecture #322 Sorting Algorithms Any examples algorithms you know ?? Typically they are main-memory oriented They don’t look too good when you take disk I/Os into account ( why? )

CS 4432lecture #323 Merge Sort Merge : Merge two sorted lists and repeatedly choose the smaller of the two “heads” of the lists Merge Sort: Divide records into two parts; merge-sort those recursively, and then merge the lists.

2-Way Sort: Requires 3 Buffers Pass 1: Read a page, sort it, write it. – only one buffer page is used Pass 2, 3, …, etc.: – three buffer pages used. Main memory buffers INPUT 1 INPUT 2 OUTPUT Disk

Two-Way External Merge Sort Idea: Divide and conquer: sort subfiles and merge Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 9 3,4 6,2 9,48,75,63,1 2 3,45,62,64,97,8 1,32 2,3 4,6 4,7 8,9 1,3 5,62 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5 6,6 7,8

Two-Way External Merge Sort Costs for each pass? How many passes do we need ? What is the total cost for sorting? Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 9 3,4 6,2 9,48,75,63,1 2 3,45,62,64,97,8 1,32 2,3 4,6 4,7 8,9 1,3 5,62 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5 6,6 7,8

Two-Way External Merge Sort Each pass we read + write each page in file. = 2 * N N pages in file => number of passes: So total cost is: Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 9 3,4 6,2 9,48,75,63,1 2 3,45,62,64,97,8 1,32 2,3 4,6 4,7 8,9 1,3 5,62 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5 6,6 7,8

General External Merge Sort What if we had more buffer pages? How do we utilize them ?

General External Merge Sort B Main memory buffers INPUT ? OUTPUT? Disk INPUT ?... To sort file with N pages using B buffer pages?

General External Merge Sort To sort file with N pages using B buffer pages Phase 1 (pass 0): – Fill memory with records – Sort using any favorite main-memory sort – Write sorted records to disk – Repeat above, until all records have been put into one sorted list B Main memory buffers INPUT 1 INPUT B Disk INPUT 2...

General External Merge Sort Phase 1 (pass 0): using B buffer pages – Produce what output ??? – Cost (in terms of I/Os) ??? B Main memory buffers INPUT 1 INPUT B Disk INPUT 2...

General External Merge Sort To sort file with N pages using B buffer pages: – Produce output: Sorted runs of B pages each Run Sizes: B pages each run. How many runs: [ N / B ] runs. – Cost : ? B Main memory buffers INPUT 1 INPUT B-1 OUTPUT Disk INPUT 2...

General External Merge Sort To sort file with N pages using B buffer pages: – Pass 0: use B buffer pages. – Produce output: Sorted runs of B pages each Run Sizes: B pages each run. How many runs: [ N / B ] runs. – Cost: 2 * N I/Os B Main memory buffers INPUT 1 INPUT B-1 OUTPUT Disk INPUT 2...

General External Merge Sort Sort N pages using B buffer pages: – Phase 1 (which is pass 0 ). Produce sorted runs of B pages each. – Phase 2 (may involve several passes 2, 3, etc.) Each pass merges B – 1 runs. B Main memory buffers INPUT 1 INPUT B-1 OUTPUT Disk INPUT 2...

CS 4432lecture #335 Phase 2 Initially load input buffers with the first blocks of respective sorted run Repeatedly run a competition among list unchosen records of each of buffered blocks –Move record with least key to output Manage buffers as needed: –If input block exhausted, get next block from file –If output block is full, write it to disk

General External Merge Sort Sort N pages using B buffer pages: – Phase 1 (which is pass 0 ). Produce sorted runs of B pages each. – Phase 2 (may involve several passes 2, 3, etc.) Number of passes ? Cost of each pass? B Main memory buffers INPUT 1 INPUT B-1 OUTPUT Disk INPUT 2...

Cost of External Merge Sort Number of passes: Cost = 2N * (# of passes) Total Cost : multiply above

Example Buffer : with 5 buffer pages, File to sort : 108 pages – Pass 0: Size of each run? Number of runs? – Pass 1: Size of each run? Number of runs? – Pass 2: ???

Example Buffer : with 5 buffer pages File to sort : 108 pages – Pass 0: = 22 sorted runs of 5 pages each (last run is only 3 pages) – Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages) – Pass 2: 2 sorted runs, 80 pages and 28 pages – Pass 3: Sorted file of 108 pages Total I/O costs: ?

Example Buffer : with 5 buffer pages File to sort : 108 pages – Pass 0: = 22 sorted runs of 5 pages each (last run is only 3 pages) – Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages) – Pass 2: 2 sorted runs, 80 pages and 28 pages – Pass 3: Sorted file of 108 pages Total I/O costs: 2*N ( 4 )

Number of Passes of External Sort

CS 4432lecture #342 How large a file can be sorted in 2 passes with a given buffer size M? ???

Double Buffering (Useful here) To reduce wait time for I/O request to complete, can prefetch into `shadow block’. – Potentially, more passes; in practice, most files still sorted in 2 or at most 3 passes. OUTPUT OUTPUT' Disk INPUT 1 INPUT k INPUT 2 INPUT 1' INPUT 2' INPUT k' block size b B main memory buffers, k-way merge

Sorting Summary External sorting is important; DBMS may dedicate part of buffer pool for sorting! External merge sort minimizes disk I/O cost – Larger block size means less I/O cost per page. – Larger block size means smaller # runs merged In practice, # of runs rarely > 2 or 3

CS 4432lecture #345 Recap Today : On Disk Optimizations Next Week: On Storage Layout Start to read chapter 3 in textbook

CS 4432lecture #346 Homework 1 Out: Today Due: Next Friday (in class)