Putting things in order

Presentation transcript:

Sorting: Putting things in order
This is the big-picture version. It includes sort_i and sort_e and would be suitable for a data structures course. Copyright © 2003-2011 Curt Hill

Sorting is Important
Sorting is one of the largest consumers of resources. It is estimated that in the 1960s 1/3 of all CPU power was used on sorts. One sort was recorded to take months.
Sorts are expensive. They still consume cycles or accesses, and too many things do not work without them.
There are entire grad-level classes on just sorting.

Definitions
Internal sorting: All the data is in memory, most often in an array. No I/Os are required.
External sorting: The data will not fit in memory. Usually the I/O costs exceed the CPU costs.
Merge: Start with two (or more) sorted files. Produce one sorted result.
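The merge definition above can be sketched in a few lines. This is a minimal illustration that uses in-memory lists to stand in for the two sorted files; the name `merge_sorted` is hypothetical and does not come from the slides.

```python
def merge_sorted(left, right):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    # Repeatedly take the smaller front item from either input.
    while i < len(left) and j < len(right):
        # On ties take from the left input, so equal items keep their order.
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One input is exhausted; the rest of the other is already sorted.
    result.extend(left[i:])
    result.extend(right[j:])
    return result
```

Each input is read once from front to back, which is why merging suits sequential files.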

Keys
Often there is more than one key.
Primary key: The most important key. The DB notion is different from the sort one.
Secondary key: Discriminates between two records where the primary key is the same. There may be multiple secondaries, ordered by importance.
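As a small illustration of primary and secondary sort keys, the sketch below sorts some made-up records by last name (primary) and then first name (secondary). The record layout and the name `sort_by_keys` are assumptions for the example only.

```python
from operator import itemgetter

# Hypothetical records: (last_name, first_name, age).
records = [
    ("Smith", "Zoe", 31),
    ("Jones", "Al", 45),
    ("Smith", "Adam", 27),
]

def sort_by_keys(rows):
    # Field 0 is the primary key; field 1 is the secondary key and only
    # matters when two rows share the same primary key.
    return sorted(rows, key=itemgetter(0, 1))

sorted_records = sort_by_keys(records)
```

The two Smith records are ordered by first name because their primary keys are equal.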

The Sort Merge
A generalized program that will sort any file. It is a standard program on mainframes, but less common on other platforms. The user specifies the sort and file characteristics, and the program then creates the sorted file.

Sort Characteristics
Primary and secondary keys: Position, length, type, and ascending or descending order.
Exits: Points in the algorithm where a user-written program could be invoked, usually to filter or reformat the data.
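The characteristics above might be expressed roughly as follows. This is a sketch under stated assumptions, not how any particular mainframe sort utility works: `KeySpec`, `sort_records`, and `exit_hook` are hypothetical names, records are fixed-width strings, and the exit is modeled as a single function called once per record before sorting.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

@dataclass
class KeySpec:
    position: int                              # zero-based offset of the key in the record
    length: int                                # number of characters in the key
    convert: Callable[[str], object] = str     # key type, e.g. int for numeric keys
    descending: bool = False                   # ascending unless set

def sort_records(records: Iterable[str],
                 keys: List[KeySpec],
                 exit_hook: Optional[Callable[[str], Optional[str]]] = None) -> List[str]:
    # Exit: user code may reformat a record or drop it by returning None.
    if exit_hook is not None:
        rows = [r for r in map(exit_hook, records) if r is not None]
    else:
        rows = list(records)
    # Python's sort is stable, so sorting by the least important key first
    # and the primary key last yields the combined ordering.
    for spec in reversed(keys):
        rows.sort(
            key=lambda r, s=spec: s.convert(r[s.position:s.position + s.length]),
            reverse=spec.descending,
        )
    return rows

# Hypothetical layout: name in columns 0-6, age in columns 7-9.
data = ["SMITH  031", "JONES  045", "SMITH  027", "ADAMS  000"]
specs = [KeySpec(0, 7), KeySpec(7, 3, int, descending=True)]
drop_zero_age = lambda r: None if int(r[7:10]) == 0 else r
result = sort_records(data, specs, exit_hook=drop_zero_age)
```

Here the name is the primary key in ascending order, the age is a secondary key in descending order, and the exit filters out the record whose age is zero.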

The Sort Merge
It did as much as it could using internal sorting. It usually had to resort to file merging for files that were too large to fit in memory.

Internal Sorting
The data fits in memory, typically in an array or table. Examined first: Sort_I.ppt

External Sorting
This is the usual situation with a DBMS, though it occurs elsewhere as well. The file has too many pages to fit in memory. The internal sorts mostly require random access to individual records, while we have random access only to pages. We want to minimize the number of page accesses. See sort_e.ppt
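One common way to sort a file that does not fit in memory is to sort memory-sized chunks into runs and then merge the runs. The sketch below illustrates that idea under simplifying assumptions: the file is line-oriented text, every line ends with a newline, lines are compared as strings, and a line count stands in for the memory limit. The name `external_sort` and its parameters are hypothetical.

```python
import heapq
import os
import tempfile
from itertools import islice

def external_sort(in_path, out_path, max_lines):
    """Sort a text file using at most max_lines lines of memory per run.

    Returns the number of sorted runs that were created.
    """
    run_paths = []
    try:
        # Pass 1: read a chunk that fits in memory, sort it internally,
        # and write it out as a sorted run.
        with open(in_path) as src:
            while True:
                chunk = list(islice(src, max_lines))
                if not chunk:
                    break
                chunk.sort()
                fd, path = tempfile.mkstemp(text=True)
                with os.fdopen(fd, "w") as run:
                    run.writelines(chunk)
                run_paths.append(path)
        # Pass 2: merge all runs, reading each one sequentially.
        runs = [open(p) for p in run_paths]
        try:
            with open(out_path, "w") as out:
                out.writelines(heapq.merge(*runs))
        finally:
            for f in runs:
                f.close()
    finally:
        for p in run_paths:
            os.remove(p)
    return len(run_paths)
```

Both passes read and write sequentially, so no random access to individual records is needed. A real DBMS would work in pages and might need several merge passes when there are more runs than buffer pages.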

Parallelism
With multiple CPUs some gain in sorting may occur. Each CPU may do part of the sort in parallel. Only the final merge needs a single CPU to handle it properly.
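The structure described above, where workers sort partitions independently and a single final merge combines them, can be sketched as follows. The name `parallel_sort` is hypothetical. A thread pool is used here only to keep the example short and self-contained; in CPython, threads do not run pure-Python code on several CPUs at once, so a process pool would be the usual substitute when a real speedup is wanted.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(items, workers=4):
    """Sort partitions in separate workers, then merge them in one thread."""
    items = list(items)
    if not items:
        return []
    # Split the input into at most `workers` contiguous partitions.
    size = -(-len(items) // workers)  # ceiling division
    parts = [items[i:i + size] for i in range(0, len(items), size)]
    # Each worker sorts its own partition independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_parts = list(pool.map(sorted, parts))
    # The final merge is done by a single thread.
    return list(heapq.merge(*sorted_parts))
```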

Using a BTree
Suppose that the file in question has a BTree index on the field in question. Should that be used instead of sorting? The answer depends on whether the index is clustered or unclustered.

Well?
Clustered: Traverse from the root to the initial value, then ride the leaves as long as needed. This eliminates the need for a sort, because the leaves are already sorted.
Unclustered: We will have to use the index for each record. Generally a sequential scan followed by a sort will be faster for normal index sizes.
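A rough way to see the difference is to count data-page reads when records are visited in key order. The sketch below is a toy model with made-up numbers: it assumes a buffer that holds only one data page, ignores the cost of reading index pages, and models an unclustered index as a random placement of records on pages. The names `page_fetches`, `RECORDS_PER_PAGE`, and `NUM_RECORDS` are hypothetical.

```python
import random

def page_fetches(page_of_record):
    """Count data-page reads when visiting records in key order,
    assuming a buffer that holds a single page."""
    fetches = 0
    current = None
    for page in page_of_record:
        if page != current:
            fetches += 1
            current = page
    return fetches

RECORDS_PER_PAGE = 100
NUM_RECORDS = 10_000

# Clustered: key order matches physical order, so consecutive
# records in key order sit on the same page.
clustered = [i // RECORDS_PER_PAGE for i in range(NUM_RECORDS)]

# Unclustered: key order is unrelated to physical placement.
rng = random.Random(0)
unclustered = clustered[:]
rng.shuffle(unclustered)

clustered_cost = page_fetches(clustered)      # one read per data page
unclustered_cost = page_fetches(unclustered)  # close to one read per record
```

In this model the clustered scan reads each of the 100 data pages once, while the unclustered scan needs nearly one page read per record, which is why scanning the file and then sorting it tends to win in the unclustered case.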