External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.

Slides:



Advertisements
Similar presentations
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
Advertisements

Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
External Sorting “There it was, hidden in alphabetical order.” Rita Holt R&G Chapter 13.
External Sorting CS634 Lecture 10, Mar 5, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 11 External Sorting.
External Sorting R & G Chapter 13 One of the advantages of being
External Sorting R & G Chapter 11 One of the advantages of being disorderly is that one is constantly making exciting discoveries. A. A. Milne.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Professor Elke A. Rundensteiner.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
External Sorting 198:541. Why Sort?  A classic problem in computer science!  Data requested in sorted order e.g., find students in increasing gpa order.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
External Sorting Chapter 13.. Why Sort? A classic problem in computer science! Data requested in sorted order  e.g., find students in increasing gpa.
Sorting.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
1 External Sorting. 2 Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing gpa order.
Introduction to Database Systems1 External Sorting Query Processing: Topic 0.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapters 13: 13.1—13.5.
CPSC Why do we need Sorting? 2.Complexities of few sorting algorithms ? 3.2-Way Sort 1.2-way external merge sort 2.Cost associated with external.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
External Sorting. Why Sort? A classic problem in computer science! Data requested in sorted order –e.g., find students in increasing gpa order Sorting.
External Sorting Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
CF 1334 Sistem Basis Data (3 SKS)
Diskusi-08 Jelaskan dan berikan contoh penggunaan theta join, equijoin, natural join, outer join, dan semijoin The slides for this text are organized into.
Diskusi-5 Sebutkan perangkat (tools) yang berpotensi mendukung kebutuhan tugas-tugas manajerial (management work) Jelaskan enam karakteristik informasi.
Tree-Structured Indexes
Storage and Indexes Chapter 8 & 9
Diskusi-1 Bacalah materi chapter-01, lalu buatlah ringkasan yang berisi tentang : Definisi EUIS Siapa end user Dampak euis pada lingkungan kerja Perencanaan.
External Sorting Chapter 13
Latihan Answer the following questions using the relational schema from the Exercises at the end of Chapter 3: Create the Hotel table using the integrity.
External Sorting Chapter 13 (Sec ): Ramakrishnan & Gehrke and
Diskusi-16 Buatlah ringkasan tentang pertimbangan dalam desain yang ergonomis pada tiga perangkat utama komputer yaitu monitor, keyboard dan mouse (lihat.
Hash-Based Indexes Chapter 11
Latihan Create a separate table with the same structure as the Booking table to hold archive records. Using the INSERT statement, copy the records from.
Tugas-05 a. Sebutkan primary key masing-masing tabel
Introduction to Query Optimization
Relational Algebra Chapter 4, Part A
File Organizations Chapter 8 “How index-learning turns no student pale
Database Management Systems (CS 564)
Lecture#12: External Sorting (R&G, Ch13)
B+-Trees and Static Hashing
File Organizations and Indexing
File Organizations and Indexing
Tree-Structured Indexes
Team Project, Part II NOMO Auto, Part II IST 210 Section 4
Hash-Based Indexes Chapter 10
External Sorting Chapter 13
Selected Topics: External Sorting, Join Algorithms, …
Relational Algebra Chapter 4, Sections 4.1 – 4.2
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
CS222P: Principles of Data Management UCI, Fall 2018 Notes #09 External Sorting Instructor: Chen Li.
CS222: Principles of Data Management Lecture #10 External Sorting
Evaluation of Relational Operations: Other Techniques
External Sorting.
Distributed Databases
CS222P: Principles of Data Management Lecture #10 External Sorting
Database Systems (資料庫系統)
Introduction to Database Systems
Chapter 11 Instructor: Xin Zhang
Tree-Structured Indexes
External Sorting Chapter 13
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #09 External Sorting Instructor: Chen Li.
File Organizations and Indexing
COSC 3480 Projects & Homeworks Fall 2003
External Sorting Dina Said
Presentation transcript:

External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter 2: The Entity-Relationship Model Chapter 3: The Relational Model Chapter 4 (Part A): Relational Algebra Chapter 4 (Part B): Relational Calculus Chapter 5: SQL: Queries, Programming, Triggers Chapter 6: Query-by-Example (QBE) Chapter 7: Storing Data: Disks and Files Chapter 8: File Organizations and Indexing Chapter 9: Tree-Structured Indexing Chapter 10: Hash-Based Indexing Chapter 11: External Sorting Chapter 12 (Part A): Evaluation of Relational Operators Chapter 12 (Part B): Evaluation of Relational Operators: Other Techniques Chapter 13: Introduction to Query Optimization Chapter 14: A Typical Relational Optimizer Chapter 15: Schema Refinement and Normal Forms Chapter 16 (Part A): Physical Database Design Chapter 16 (Part B): Database Tuning Chapter 17: Security Chapter 18: Transaction Management Overview Chapter 19: Concurrency Control Chapter 20: Crash Recovery Chapter 21: Parallel and Distributed Databases Chapter 22: Internet Databases Chapter 23: Decision Support Chapter 24: Data Mining Chapter 25: Object-Database Systems Chapter 26: Spatial Data Management Chapter 27: Deductive Databases Chapter 28: Additional Topics 1

Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing gpa order Sorting is first step in bulk loading B+ tree index. Sorting useful for eliminating duplicate copies in a collection of records (Why?) Sort-merge join algorithm involves sorting. Problem: sort 1Gb of data with 1Mb of RAM. why not virtual memory? 4

2-Way Sort: Requires 3 Buffers Pass 1: Read a page, sort it, write it. only one buffer page is used Pass 2, 3, …, etc.: three buffer pages used. INPUT 1 OUTPUT INPUT 2 Main memory buffers Disk Disk 5

Two-Way External Merge Sort 3,4 6,2 9,4 8,7 5,6 3,1 2 Input file Each pass we read + write each page in file. N pages in the file => the number of passes So toal cost is: Idea: Divide and conquer: sort subfiles and merge PASS 0 3,4 2,6 4,9 7,8 5,6 1,3 2 1-page runs PASS 1 2,3 4,7 1,3 2-page runs 4,6 8,9 5,6 2 PASS 2 2,3 4,4 1,2 4-page runs 6,7 3,5 8,9 6 PASS 3 1,2 2,3 3,4 8-page runs 4,5 6,6 7,8 9 6

General External Merge Sort More than 3 buffer pages. How can we utilize them? To sort a file with N pages using B buffer pages: Pass 0: use B buffer pages. Produce sorted runs of B pages each. Pass 2, …, etc.: merge B-1 runs. INPUT 1 . . . INPUT 2 . . . . . . OUTPUT INPUT B-1 Disk Disk B Main memory buffers 7

Cost of External Merge Sort Number of passes: Cost = 2N * (# of passes) E.g., with 5 buffer pages, to sort 108 page file: Pass 0: = 22 sorted runs of 5 pages each (last run is only 3 pages) Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages) Pass 2: 2 sorted runs, 80 pages and 28 pages Pass 3: Sorted file of 108 pages 8

Number of Passes of External Sort 9

I/O for External Merge Sort … longer runs often means fewer passes! Actually, do I/O a page at a time In fact, read a block of pages sequentially! Suggests we should make each buffer (input/output) be a block of pages. But this will reduce fan-out during merge passes! In practice, most files still sorted in 2-3 passes.

Number of Passes of Optimized Sort Block size = 32, initial pass produces runs of size 2B. 13

Double Buffering To reduce wait time for I/O request to complete, can prefetch into `shadow block’. Potentially, more passes; in practice, most files still sorted in 2-3 passes. INPUT 1 INPUT 1' INPUT 2 OUTPUT INPUT 2' OUTPUT' b block size Disk INPUT k Disk INPUT k' B main memory buffers, k-way merge

Sorting Records! Sorting has become a blood sport! Parallel sorting is the name of the game ... Datamation: Sort 1M records of size 100 bytes Typical DBMS: 15 minutes World record: 3.5 seconds 12-CPU SGI machine, 96 disks, 2GB of RAM New benchmarks proposed: Minute Sort: How many can you sort in 1 minute? Dollar Sort: How many can you sort for $1.00?

Summary External sorting is important; DBMS may dedicate part of buffer pool for sorting! External merge sort minimizes disk I/O cost: Pass 0: Produces sorted runs of size B (# buffer pages). Later passes: merge runs. # of runs merged at a time depends on B, and block size. Larger block size means less I/O cost per page. Larger block size means smaller # runs merged. In practice, # of runs rarely more than 2 or 3. The best sorts are wildly fast: Despite 40+ years of research, we’re still improving! 19