Improve Run Generation Overlap input,output, and internal CPU work. Reduce the number of runs (equivalently, increase average run length). DISK MEMORY.

Slides:



Advertisements
Similar presentations
CS 400/600 – Data Structures External Sorting.
Advertisements

Sorting Really Big Files Sorting Part 3. Using K Temporary Files Given  N records in file F  M records will fit into internal memory  Use K temp files,
Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
Algorithms Analysis Lecture 6 Quicksort. Quick Sort Divide and Conquer.
Improve Run Merging Reduce number of merge passes.  Use higher order merge.  Number of passes = ceil(log k (number of initial runs)) where k is the merge.
Divide And Conquer Distinguish between small and large instances. Small instances solved differently from large ones.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
External Sorting “There it was, hidden in alphabetical order.” Rita Holt R&G Chapter 13.
1 Today’s Material Divide & Conquer (Recursive) Sorting Algorithms –QuickSort External Sorting.
CS 206 Introduction to Computer Science II 04 / 27 / 2009 Instructor: Michael Eckmann.
Processing Data in External Storage CS Data Structures Mehmet H Gunes Modified from authors’ slides.
Given data file.
FALL 2004CENG 351 Data Management and File Structures1 External Sorting Reference: Chapter 8.
CS 206 Introduction to Computer Science II 12 / 05 / 2008 Instructor: Michael Eckmann.
E.G.M. Petrakissorting1 Sorting  Put data in order based on primary key  Many methods  Internal sorting:  data in arrays in main memory  External.
External Sorting R & G Chapter 13 One of the advantages of being
FALL 2006CENG 351 Data Management and File Structures1 External Sorting.
External Sorting R & G Chapter 11 One of the advantages of being disorderly is that one is constantly making exciting discoveries. A. A. Milne.
External Sorting Access to secondary storage is orders of magnitude slower than memory access. Minimize access to secondary storage (tape or disk).
External Sorting 198:541. Why Sort?  A classic problem in computer science!  Data requested in sorted order e.g., find students in increasing gpa order.
Sorting Rearrange n elements into ascending order. 7, 3, 6, 2, 1  1, 2, 3, 6, 7.
Liang, Introduction to Java Programming, Sixth Edition, (c) 2007 Pearson Education, Inc. All rights reserved L18 (Chapter 23) Algorithm.
External Sorting Chapter 13.. Why Sort? A classic problem in computer science! Data requested in sorted order  e.g., find students in increasing gpa.
External Sorting Problem: Sorting data sets too large to fit into main memory. –Assume data are stored on disk drive. To sort, portions of the data must.
Divide-And-Conquer Sorting Small instance.  n
External Sorting Sort n records/elements that reside on a disk. Space needed by the n records is very large.  n is very large, and each record may be.
Nattee Niparnan. Recall  Complexity Analysis  Comparison of Two Algos  Big O  Simplification  From source code  Recursive.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Merge Sort. What Is Sorting? To arrange a collection of items in some specified order. Numerical order Lexicographical order Input: sequence of numbers.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13.
1 Sorting Algorithms Sections 7.1 to Comparison-Based Sorting Input – 2,3,1,15,11,23,1 Output – 1,1,2,3,11,15,23 Class ‘Animals’ – Sort Objects.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
1 External Sorting. 2 Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing gpa order.
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Divide And Conquer A large instance is solved as follows:  Divide the large instance into smaller instances.  Solve the smaller instances somehow. 
Lecture 6 : External Sorting Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University.
Chapter 15 A External Methods. © 2004 Pearson Addison-Wesley. All rights reserved 15 A-2 A Look At External Storage External storage –Exists beyond the.
Tim Roughgarden Introduction Merge Sort (Analysis ) Design and Analysis of Algorithms I.
External Sorting Adapt fastest internal-sort methods.
FALL 2005CENG 351 Data Management and File Structures1 External Sorting Reference: Chapter 8.
1 Ch.19 Divide and Conquer. 2 BIRD’S-EYE VIEW Divide and conquer algorithms Decompose a problem instance into several smaller independent instances May.
Introduction to Database Systems1 External Sorting Query Processing: Topic 0.
Chapter 4, Part II Sorting Algorithms. 2 Heap Details A heap is a tree structure where for each subtree the value stored at the root is larger than all.
CPSC Why do we need Sorting? 2.Complexities of few sorting algorithms ? 3.2-Way Sort 1.2-way external merge sort 2.Cost associated with external.
External Sorting Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Sorting – Lecture 3 More about Merge Sort, Quick Sort.
CMPT 238 Data Structures More on Sorting: Merge Sort and Quicksort.
CENG 3511 External Sorting. CENG 3512 Outline Introduction Heapsort Multi-way Merging Multi-step merging Replacement Selection in heap-sort.
Sorting.
External Sorting Sort n records/elements that reside on a disk.
External Sort Any sort algorithm which uses external memory, such as tape or disk, during the sort. The best algorithms for processing large amounts of.
Quick-Sort 9/13/2018 1:15 AM Quick-Sort     2
External Sorting Adapt fastest internal-sort methods.
Selection Selection 1 Quick-Sort Quick-Sort 10/30/16 13:52
MergeSort Source: Gibbs & Tamassia.
Database Management Systems (CS 564)
Improve Run Merging Reduce number of merge passes.
Improve Run Generation
CS222: Principles of Data Management Lecture #10 External Sorting
Chapter 12 Query Processing (1)
General External Merge Sort
Quick-Sort 4/25/2019 8:10 AM Quick-Sort     2
CENG 351 Data Management and File Structures
External Sorting Adapt fastest internal-sort methods.
Database Systems (資料庫系統)
External Sorting Dina Said
Presentation transcript:

Improve Run Generation Overlap input,output, and internal CPU work. Reduce the number of runs (equivalently, increase average run length). DISK MEMORY DISK

Internal Quick Sort Use 6 as the pivot (median of 3). Input first, middle, and last blocks first. In-place partitioning. Input blocks from the ends toward the middle. Sort left and right groups recursively. Can begin output as soon as left most block is ready

Alternative Internal Sort Scheme DISK B1B2B3 Partition into 3 buffers.

Steady State Operation Read from disk Write to disk Run generation Synchronization is done when the active input buffer gets empty (the active output buffer will be full at this time).

DISK MEMORY DISK New Strategy Use 2 input and 2 output buffers. Rest of memory is used for a min loser tree. Input 1Input 0Output 0Output 1 Loser Tree

Steady State Operation Read from disk Write to disk Run generation Synchronization is done when the active input buffer gets empty (the active output buffer will be full at this time).

O0O1 I0 I1 Initialize

O0O1 I0 I1

Initialize O0O1 I0 I1

Initialize O0O1 I0 I1

Initialize O0O1 I0 I1

Initialize O0O1 I0 I1

Generate Run O0O1 I0 I

Generate Run O0O1 I0 I

5 O0 2 3 Generate Run O1 I0 I

45 O0 2 3 Generate Run O1 I0 I

5 O O1 I0 I

5 O O1 I0 I Continue With Run 1

O O I0 I Continue With Run 1 1 5

1 O O I0 I Continue With Run

9 1 O O I0 I Continue With Run

9 1 O O I0 I

9 1 O O I0 I Continue With Run 1 2

2 9 1 O O I0 I Continue With Run

2 9 1 O O I0 I Continue With Run

Buffer Reduction We may reduce the number of buffers to 3. At any time, 1 us used to read into, 1 to write from, and the 3 rd both feeds the loser tree and is filled from the tree. The 3 physical buffers rotate through these three roles.

Let k be number of external nodes in loser tree. Run size >= k. Sorted input => 1 run. Reverse of sorted input => n/k runs. Average run size is ~2k.

Memory capacity = m records. Run size using fill memory, sort, and output run scheme = m. Use loser tree scheme.  Assume block size is b records.  Need memory for 4 buffers (4b records).  Loser tree k = m – 4b.  Average run size = 2k = 2(m – 4b).  2k >= m when m >= 8b.

Assume b = 100. m k k

Total internal processing time using fill memory, sort, and output run scheme = O((n/m) m log m) = O(n log m). Total internal processing time using loser tree = O(n log k). Loser tree scheme generates runs that differ in their lengths.

4369 Merging Runs Of Different Length