Lecture 31: The IO Model 2 Repacking

Slides:



Advertisements
Similar presentations
CS 400/600 – Data Structures External Sorting.
Advertisements

External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
Sorting Really Big Files Sorting Part 3. Using K Temporary Files Given  N records in file F  M records will fit into internal memory  Use K temp files,
Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
CS 4432query processing - lecture 161 CS4432: Database Systems II Lecture #16 Join Processing Algorithms Professor Elke A. Rundensteiner.
6.830 Lecture 9 10/1/2014 Join Algorithms. Database Internals Outline Front End Admission Control Connection Management (sql) Parser (parse tree) Rewriter.
Lecture 8 Join Algorithms. Intro Until now, we have used nested loops for joining data – This is slow, n^2 comparisons How can we do better? – Sorting.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
External Sorting CS634 Lecture 10, Mar 5, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 11 External Sorting.
1 Advanced Database Technology February 12, 2004 DATA STORAGE (Lecture based on [GUW ], [Sanders03, ], and [MaheshwariZeh03, ])
FALL 2004CENG 351 Data Management and File Structures1 External Sorting Reference: Chapter 8.
External Sorting R & G Chapter 13 One of the advantages of being
FALL 2006CENG 351 Data Management and File Structures1 External Sorting.
External Sorting R & G Chapter 11 One of the advantages of being disorderly is that one is constantly making exciting discoveries. A. A. Milne.
CPSC 231 Sorting Large Files (D.H.)1 LEARNING OBJECTIVES Sorting of large files –merge sort –performance of merge sort –multi-step merge sort.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
External Sorting 198:541. Why Sort?  A classic problem in computer science!  Data requested in sorted order e.g., find students in increasing gpa order.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
External Sorting Problem: Sorting data sets too large to fit into main memory. –Assume data are stored on disk drive. To sort, portions of the data must.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13.
B+ Trees: An IO-Aware Index Structure Lecture 13.
Introduction to Database Systems1 External Sorting Query Processing: Topic 0.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapters 13: 13.1—13.5.
CPSC Why do we need Sorting? 2.Complexities of few sorting algorithms ? 3.2-Way Sort 1.2-way external merge sort 2.Cost associated with external.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
External Sorting. Why Sort? A classic problem in computer science! Data requested in sorted order –e.g., find students in increasing gpa order Sorting.
List Algorithms Taken from notes by Dr. Neil Moore & Dr. Debby Keen
Lecture 16: Data Storage Wednesday, November 6, 2006.
External Sorting Chapter 13
Simple Sorting Algorithms
Chapter 12: Query Processing
Lecture 11: DMBS Internals
Database Management Systems (CS 564)
List Algorithms Taken from notes by Dr. Neil Moore
Database Management Systems (CS 564)
Lecture#12: External Sorting (R&G, Ch13)
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
Improve Run Merging Reduce number of merge passes.
Midterm Review – Part I ( Disk, Buffer and Index )
External Sorting Chapter 13
Selected Topics: External Sorting, Join Algorithms, …
Improve Run Generation
CS222P: Principles of Data Management UCI, Fall 2018 Notes #09 External Sorting Instructor: Chen Li.
Lecture 30: The IO Model 1 External Sorting
Lecture 2- Query Processing (continued)
Lecture 28: Index 3 B+ Trees
Merge Sort Michael Morton.
Lecture 27: Index 2 B+ Trees
CS222: Principles of Data Management Lecture #10 External Sorting
Chapter 12 Query Processing (1)
Lecture 20: Intro to Transactions & Logging II
Lecture 19b: Intro to Transactions & Logging
Evaluation of Relational Operations: Other Techniques
External Sorting.
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
CS222P: Principles of Data Management Lecture #10 External Sorting
CENG 351 Data Management and File Structures
External Sorting Adapt fastest internal-sort methods.
Database Systems (資料庫系統)
External Sorting Chapter 13
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #09 External Sorting Instructor: Chen Li.
Lecture 20: Representing Data Elements
External Sorting Dina Said
Presentation transcript:

Lecture 31: The IO Model 2 Repacking Professor Xiannong Meng Spring 2018 Lecture and activity contents are based on what Prof Chris Ré of Stanford used in his CS 145 in the fall 2016 term with permission

Repacking

Repacking for even longer initial runs With B+1 buffer pages, we can now start with B+1-length initial runs (and use B-way merges) to get 2𝑁( log 𝐵 𝑵 𝑩+𝟏 +1) IO cost… Can we reduce this cost more by getting even longer initial runs? Use repacking- produce longer initial runs by “merging” in buffer as we sort at initial stage

Repacking Example: 3 page buffer Start with unsorted single input file, and load 2 pages Disk Main Memory Buffer 31,12 10,33 44,55 F1 18,22 57,24 3,98 F2

Repacking Example: 3 page buffer Start with unsorted single input file, and load 2 pages Disk Main Memory Buffer 44,55 F1 18,22 57,24 3,98 F2 31,12 10,33

Repacking Example: 3 page buffer Also keep track of max (last) value in current run… Take the minimum two values, and put in output page Disk Main Memory Buffer m=12 44,55 F1 18,22 57,24 3,98 F2 31,12 31 10,33 33 10,12

Repacking Example: 3 page buffer Next, repack Disk Main Memory Buffer m=12 44,55 F1 18,22 57,24 3,98 F2 33 10,12 31,33

Repacking Example: 3 page buffer Next, repack, then load another page and continue! Disk Main Memory Buffer m=33 m=12 44,55 F1 18,22 57,24 3,98 F2 10,12 31,33

Repacking Example: 3 page buffer Now, however, the smallest values are less than the largest (last) in the sorted run… Disk Main Memory Buffer m=33 F1 57,24 3,98 44,55 F2 18,22 10,12 31,33 18,22 We call these values frozen because we can’t add them to this run…

Repacking Example: 3 page buffer Now, however, the smallest values are less than the largest (last) in the sorted run… Disk Main Memory Buffer m=55 F1 3,98 F2 57,24 10,12 31,33 44,55 18,22

Repacking Example: 3 page buffer Now, however, the smallest values are less than the largest (last) in the sorted run… Disk Main Memory Buffer m=55 F1 F2 57,24 3,98 10,12 31,33 44,55 18,22

Repacking Example: 3 page buffer Now, however, the smallest values are less than the largest (last) in the sorted run… Disk Main Memory Buffer m=55 F1 F2 3,24 57,98 10,12 31,33 44,55 18,22

Repacking Example: 3 page buffer Once all buffer pages have a frozen value, or input file is empty, start new run with the frozen values Disk Main Memory Buffer m=0 F1 F2 3,24 10,12 31,33 44,55 18,22 57,98 F3

Repacking Note that, for buffer with B+1 pages: If input file is sorted  nothing is frozen  we get a single run! If input file is reverse sorted (worst case)  everything is frozen  we get runs of length B+1 In general, with repacking we do no worse than without it! What if the file is already sorted? Engineer’s approximation: runs will have ~2(B+1) length ~2𝑁( log 𝐵 𝑵 𝟐(𝑩+𝟏) +1)

Summary Basics of IO and buffer management. See notebook for more fun! (Learn about sequential flooding) We introduced the IO cost model using sorting. Saw how to do merges with few IOs, Works better than main-memory sort algorithms. Described a few optimizations for sorting