Creating and Managing Indexes Using PROC SQL
Chapter 6
What Is an Index?
An index is an auxiliary file that stores the physical location of values for one or more specified columns in a table.
An index stores each unique value of the specified column(s) in ascending order.
An index includes value/identifier pairs that allow you to access a row directly, by value.
You cannot create an index on a view.
Why Use an Index?
When a query is submitted on a table without an index, PROC SQL accesses rows in the order in which they are stored, beginning with the first row and reading every row in the table.
For large tables, reading all the rows can be time-consuming.
An index on a table may allow a PROC SQL query to execute more efficiently.
We can refer to the SAS log to view real time and CPU time (we will learn about STIMER in Chapter 8) for a partial measure of efficiency gains.
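As a rough sketch of how the log can be used for this comparison, the step below runs a query with detailed timing statistics turned on. The table meddb and the column ssn are taken from the examples later in this chapter; the SSN value is hypothetical, and ssn is assumed here to be a character column. FULLSTIMER is the more detailed companion of the STIMER system option.

```sas
/* Assumption: meddb exists and ssn is a character column. */
options fullstimer;   /* write detailed real time and CPU time to the log */

proc sql;
   select *
      from meddb
      where ssn = '123-45-6789';   /* hypothetical value */
quit;
```

Running the same step once before and once after an index is created on ssn, and comparing the real time and CPU time notes in the log, gives a partial picture of any gain.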
Simple and Composite Indexes
Two types of indexes can be created:
– Simple – based on one column. The name of the index must be the name of the indexed column.
– Composite – based on two or more columns. The name of the composite index cannot be the name of any existing column or index in the table.
Example - Creating a Simple Index
Suppose we had a table (meddb) containing all doctors' claims for 100,000 patients. If we wanted to create an index on the patient's Social Security Number (ssn), we would use the following code:
   proc sql;
      create index ssn
         on meddb(ssn);
   quit;
Example - Creating a Composite Index
If we wanted to create a composite index on the patient's Social Security Number (ssn) and the type of service the patient received (tos), we would use the following code:
   proc sql;
      create index ssntos
         on meddb(ssn,tos);
   quit;
Unique Indexes
Simple and composite indexes can be defined as unique indexes.
Add the keyword UNIQUE before the keyword INDEX when creating a unique index.
If a table contains multiple occurrences of the same value in a column, a unique index cannot be defined on that column.
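A minimal sketch of the syntax follows. The table patients is hypothetical and is assumed to hold exactly one row per patient; the claims table meddb would not qualify, since a patient with several claims repeats the same ssn.

```sas
/* Assumption: patients is a hypothetical table with one row per ssn. */
proc sql;
   /* the statement fails if patients.ssn contains duplicate values */
   create unique index ssn
      on patients(ssn);
quit;
```

Once the unique index exists, SAS also rejects later attempts to add a row whose ssn duplicates an existing value.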
Costs of Using an Index
Additional CPU time
Additional input/output requests may be required
Additional memory
Additional disk space to store the index file
Guidelines for Creating Indexes
Keep the number of indexes to a minimum
Do not create an index for small tables
Do not create an index on a column that has a small number of distinct values (low cardinality)
Use indexes for queries that return a small subset of rows (< 15%)
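One way to check the cardinality guideline before creating an index is to count distinct values with an ordinary query. The sketch below assumes the meddb table with the ssn and tos columns used in this chapter's examples; the output column names are made up for illustration.

```sas
/* Assumption: meddb has the columns ssn and tos. */
proc sql;
   select count(*)            as n_rows,   /* total rows in the table  */
          count(distinct ssn) as n_ssn,    /* distinct patient numbers */
          count(distinct tos) as n_tos     /* distinct service types   */
      from meddb;
quit;
```

If n_tos turns out to be a handful of values against a very large n_rows, a simple index on tos alone is unlikely to be worthwhile.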
DESCRIBE TABLE Statement
Use the DESCRIBE TABLE statement to display in the SAS log a CREATE INDEX statement for each index that is defined on the table.
If no indexes are defined for the specified table, no CREATE INDEX statement appears in the SAS log.
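For instance, assuming the meddb table from the earlier examples, the statement can be submitted as follows:

```sas
/* Assumption: meddb is the table used in the earlier examples. */
proc sql;
   describe table meddb;
quit;
```

The log shows the CREATE TABLE statement for meddb, followed by one CREATE INDEX statement per index if any are defined.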
MSGLEVEL= Option
To determine whether an index is used, specify the SAS system option MSGLEVEL=I, which adds informational messages (including index usage) to the log.
The default is MSGLEVEL=N, which displays notes, warnings, and error messages only.
   OPTIONS MSGLEVEL=I;
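A short sketch of the option in use is given below. It assumes that meddb has the simple index on ssn created earlier and that ssn is a character column; the SSN value is hypothetical.

```sas
/* Assumptions: meddb has an index on ssn; ssn is a character column. */
options msglevel=i;   /* request informational messages in the log */

proc sql;
   select *
      from meddb
      where ssn = '123-45-6789';   /* hypothetical value */
quit;

options msglevel=n;   /* restore the default */
```

If SAS decides to use an index for the WHERE expression, the log should contain an INFO line naming that index; if no such line appears, the table was read sequentially.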
IDXWHERE= and IDXNAME= Data Set Options
IDXWHERE=YES
– Tells SAS to choose the best index to process the WHERE expression
IDXWHERE=NO
– Tells SAS to ignore all indexes
IDXNAME=index-name
– Tells SAS to use the specified index
IDXWHERE= and IDXNAME= cannot be specified together on the same table.
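Both options are written in parentheses after the table name. The sketch below assumes the meddb table with the ssn and ssntos indexes from the earlier examples, and that ssn and tos are character columns; the SSN and type-of-service values are hypothetical.

```sas
/* Assumptions: meddb has the indexes ssn and ssntos; */
/* ssn and tos are character columns.                 */
proc sql;
   /* ignore all indexes and read the table sequentially */
   select *
      from meddb(idxwhere=no)
      where ssn = '123-45-6789';

   /* ask SAS to use the composite index by name */
   select *
      from meddb(idxname=ssntos)
      where ssn = '123-45-6789' and tos = 'OV';
quit;
```

Running these with MSGLEVEL=I in effect makes it possible to confirm in the log which index, if any, was used.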
Dropping Indexes
Use the DROP INDEX statement to drop (delete) one or more indexes:
   proc sql;
      drop index ssn
         from claims;
   quit;
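To drop more than one index in a single statement, list the index names separated by commas. The sketch below assumes that the ssn and ssntos indexes from the earlier creation examples both exist on meddb.

```sas
/* Assumption: both indexes were created on meddb as shown earlier. */
proc sql;
   drop index ssn, ssntos
      from meddb;
quit;
```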