Chapter 24: Querying Data Efficiently

Slides:



Advertisements
Similar presentations
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
Advertisements

Performing Queries Using PROC SQL Chapter 1 1 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Modern Information Retrieval
1 Overview of Storage and Indexing Chapter 8 (part 1)
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Chapter 17 Methodology – Physical Database Design for Relational Databases Transparencies © Pearson Education Limited 1995, 2005.
Team Dosen UMN Physical DB Design Connolly Book Chapter 18.
Chapter 18: Modifying SAS Data Sets and Tracking Changes 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 8.
Chapter 3: Combining Tables Horizontally using PROC SQL 1 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina Chapter 17 supplement: Review of Formatting Data STAT 541.
Chapter 15: Combining Data Horizontally 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 13 Query Processing Melissa Jamili CS 157B November 11, 2004.
Academic Year 2014 Spring. MODULE CC3005NI: Advanced Database Systems “QUERY OPTIMIZATION” Academic Year 2014 Spring.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 8 “How index-learning turns no student pale Yet holds.
© Pearson Education Limited, Chapter 13 Physical Database Design – Step 4 (Choose File Organizations and Indexes) Transparencies.
1 Overview of Storage and Indexing Chapter 8 (part 1)
Chapter 16: Using Lookup Tables to Match Data 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Storage and Indexing Chapter 8.
1 Overview of Storage and Indexing Chapter 8. 2 Data on External Storage  Disks: Can retrieve random page at fixed cost  But reading several consecutive.
Chapter 22: Using Best Practices 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Databases.  A database is simply a collection of information stored in an orderly manner.  A database can be as simple as a birthday book, address book.
Chapter 17: Formatting Data 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Chapter 19: Introduction to Efficient SAS Programming 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
File Organizations and Indexing
Chapter 4: Combining Tables Vertically using PROC SQL 1 © Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 17 Supplement: Alternatives to IF-THEN/ELSE Processing STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South.
Chapter 23: Selecting Efficient Sorting Strategies 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 21: Controlling Data Storage Space 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Table Structures and Indexing. The concept of indexing If you were asked to search for the name “Adam Wilbert” in a phonebook, you would go directly to.
Chapter 14: Combining Data Vertically 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 8 Jianping Fan Dept of Computer Science UNC-Charlotte.
CS222: Principles of Data Management Lecture #4 Catalogs, Buffer Manager, File Organizations Instructor: Chen Li.
Module 11: File Structure
CS522 Advanced database Systems
Record Storage, File Organization, and Indexes
Indexing Goals: Store large files Support multiple search keys
Database Management System
Storage and Indexes Chapter 8 & 9
File Organizations and Indexes
Choosing Access Path The basic methods.
Chapter 19: Introduction to Efficient SAS Programming
Methodology – Physical Database Design for Relational Databases
Chapter 3: Working With Your Data
Chapter 12: Query Processing
Chapter 13: Creating Samples and Indexes
Chapter 18: Modifying SAS Data Sets and Tracking Changes
Former Chapter 23: Selecting Efficient Sorting Strategies
CS222P: Principles of Data Management Notes #6 Index Overview and ISAM Tree Index Instructor: Chen Li.
國立臺北科技大學 課程:資料庫系統 fall Chapter 18
File organization and Indexing
Chapter 11: Indexing and Hashing
Chapter 7: Macros in SAS Macros provide for more flexible programming in SAS Macros make SAS more “object-oriented”, like R Not a strong suit of text ©
Lecture 12 Lecture 12: Indexing.
Chapter 24 (4th ed.): Creating Functions
File Organizations and Indexing
File Organizations and Indexing
Chapter 8 – Part II. A glimpse at indices and workloads
CS222/CS122C: Principles of Data Management Lecture #4 Catalogs, File Organizations Instructor: Chen Li.
Using Macros to Solve the Collation Problem
Database Management System
2018, Spring Pusan National University Ki-Joune Li
CS222/CS122C: Principles of Data Management Notes #6 Index Overview and ISAM Tree Index Instructor: Chen Li.
Chapter 12 Query Processing (1)
CS222p: Principles of Data Management Lecture #4 Catalogs, File Organizations Instructor: Chen Li.
Chapter 11: Indexing and Hashing
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #05 Index Overview and ISAM Tree Index Instructor: Chen Li.
Presentation transcript:

Chapter 24: Querying Data Efficiently STAT 541 Chapter 24: Querying Data Efficiently ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina

Outline Using an index for efficient WHERE processing Identifying available indexes Identifying conditions that can be optimized Estimating the number of observations Comparing probable resource usage Deciding whether to create an index Comparing procedures that produce detail reports Comparing tools for summarizing data

Using an Index for Efficient WHERE Processing A WHERE statement can use sequential access or direct access (e.g., with an index) to search observations An index is effective when the WHERE group is small There is overhead associated with indexes 3

Identifying Available Indexes SAS will use an index for a variable in a WHERE statement only if The variable is the key variable in a simple index The variable(s) is(are) the first variable(s) in a composite index SAS will use the same index for WHERE and BY statements when they are both present Consecutive ordering in a composite index is important 4

Identifying Conditions that can be Optimized WHERE conditions will not be tested for optimization with an index if they contain: functions other than TRIM or SUBSTR SUBSTR, under certain conditions =* (sounds like) arithmetic operators variable-to-variable comparisons Compound WHERE conditions have additional constraints 5

Estimating the Number of Observations SAS estimates the subset size specified by the WHERE condition in deciding to use an index SAS actually stores quantiles with indexes to help estimate subset size Percentage of Data Set SAS Action 0-3% Direct Access 3-33% Probably Direct Access 33%-100% Probably Sequential Access 6

Comparing Probable Resource Usage Direct access will always be more costly in retrieving data SAS compares the number of predicted I/O swaps for direct access vs the number of I/O swaps for sequential access to decide whether to use an index Other factors can affect I/O swaps (e.g., order of the data, whether data is compressed) 7

Deciding Whether to Create an Index Do not create an index when the file is small Indexes do require overhead—do not create them needlessly Sort the data by the index variables before using the index 8