Index Tuning Conventional index
Secondary index
To speed up queries on attributes that are not part of the primary key
Primary index
–Determines the placement of records in the data file
–Each table has only one primary index
Secondary index
–Only gives the location of the records
–One table may have multiple secondary indexes
–Always dense
Secondary indexes Sequence field
Secondary indexes Sequence field Sparse index does not make sense!
Secondary indexes [Figure: sequence field; dense index at the lowest level; sparse index at the higher level]
With secondary indexes:
–Lowest level is dense
–Other levels are sparse
Also: pointers are record pointers (not block pointers; not computed)
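The two-level layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `records` data, the `BLOCK_SIZE` of 2, and the `lookup` helper are hypothetical, keys are assumed unique, and a record pointer is modeled as a list position.

```python
from bisect import bisect_right

# Data file: records are NOT ordered by the search key (here: dept).
records = [
    {"name": "Ann", "dept": 30},
    {"name": "Bob", "dept": 10},
    {"name": "Cid", "dept": 50},
    {"name": "Dee", "dept": 20},
    {"name": "Eve", "dept": 40},
    {"name": "Fay", "dept": 60},
]

# Lowest level: dense, one (key, record pointer) entry per record, sorted by key.
dense = sorted((r["dept"], rid) for rid, r in enumerate(records))

# Split the dense level into index blocks of a hypothetical fixed size.
BLOCK_SIZE = 2
blocks = [dense[i:i + BLOCK_SIZE] for i in range(0, len(dense), BLOCK_SIZE)]

# Upper level: sparse, one entry (the first key) per block of the dense level.
sparse = [block[0][0] for block in blocks]

def lookup(key):
    """Return the records whose dept equals key (keys assumed unique)."""
    b = bisect_right(sparse, key) - 1  # last block whose first key <= key
    if b < 0:
        return []
    return [records[rid] for k, rid in blocks[b] if k == key]

print(lookup(20))  # [{'name': 'Dee', 'dept': 20}]
```

The sparse level works only because the dense level below it is sorted; the data file itself is not, which is why the lowest level needs an entry for every record.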
Application of secondary indexes in clustered file
Given relations
–Movie(title, year, length, incolor, studioName, producerC#)
–Studio(name, address, presC#)
Suppose the following query is typical
–SELECT title, year FROM Movie, Studio WHERE presC# = zzz AND Movie.studioName = Studio.name;
With a clustered file structure, a secondary index on presC# can minimize disk I/Os!
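A rough sketch of why this helps, assuming the clustered file stores each Studio record immediately followed by its Movie records (the slide's figure is not available, so this layout is an assumption). The sample data and the `movies_of_president` helper are hypothetical, and presC# is assumed unique per studio.

```python
# Clustered file: each Studio record is followed immediately by its Movie records.
clustered_file = [
    ("studio", {"name": "S1", "address": "A1", "presC#": 101}),
    ("movie", {"title": "M1", "year": 1990, "studioName": "S1"}),
    ("movie", {"title": "M2", "year": 1994, "studioName": "S1"}),
    ("studio", {"name": "S2", "address": "A2", "presC#": 202}),
    ("movie", {"title": "M3", "year": 2001, "studioName": "S2"}),
]

# Secondary index on presC#: key -> position of the Studio record in the file.
pres_index = {
    rec["presC#"]: pos
    for pos, (kind, rec) in enumerate(clustered_file)
    if kind == "studio"
}

def movies_of_president(pres_c):
    """Answer SELECT title, year ... WHERE presC# = pres_c with one index probe."""
    pos = pres_index.get(pres_c)
    if pos is None:
        return []
    result = []
    pos += 1
    # The studio's movies are stored right after it; stop at the next studio.
    while pos < len(clustered_file) and clustered_file[pos][0] == "movie":
        rec = clustered_file[pos][1]
        result.append((rec["title"], rec["year"]))
        pos += 1
    return result

print(movies_of_president(101))  # [('M1', 1990), ('M2', 1994)]
```

One index probe finds the studio, and the matching movies are read sequentially from the neighbouring positions, so no scan of the whole file is needed.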
Duplicate values & secondary indexes
Duplicate values & secondary indexes
one option...
Problem: excess overhead!
–disk space
–search time
Duplicate values & secondary indexes another option Problem: variable size records in index!
Duplicate values & secondary indexes Another idea (suggested in class): Chain records with same key? Problems: Need to add fields to records Need to follow chain to know records
Duplicate values & secondary indexes buckets Using Indirection!
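A minimal sketch of the indirection idea, assuming the index holds one fixed-size entry per distinct key that points to the start of that key's run in a separate bucket file. The `build` and `lookup` helpers and the sample records are hypothetical.

```python
from bisect import bisect_left

# Data file with duplicate key values; record pointers are list positions.
records = [
    {"name": "a", "key": 20},
    {"name": "b", "key": 10},
    {"name": "c", "key": 20},
    {"name": "d", "key": 30},
    {"name": "e", "key": 10},
    {"name": "f", "key": 20},
]

def build(records, field):
    """Return (index, bucket_file): one index entry per distinct key."""
    by_key = {}
    for rid, rec in enumerate(records):
        by_key.setdefault(rec[field], []).append(rid)
    index, bucket_file = [], []
    for key in sorted(by_key):
        index.append((key, len(bucket_file)))  # key -> start offset of its bucket
        bucket_file.extend(by_key[key])        # record pointers for this key
    return index, bucket_file

def lookup(index, bucket_file, key):
    """Return the record pointers stored in the bucket for key."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return []
    start = index[i][1]
    # The bucket ends where the next key's bucket begins.
    end = index[i + 1][1] if i + 1 < len(index) else len(bucket_file)
    return bucket_file[start:end]

index, bucket_file = build(records, "key")
print(index)                           # [(10, 0), (20, 2), (30, 5)]
print(lookup(index, bucket_file, 20))  # [0, 2, 5]
```

Each key appears once in the index and the index entries stay fixed-size, which addresses the two problems noted for the earlier options.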
Why the “bucket” idea is useful
Indexes: Name: primary; Dept: secondary; Floor: secondary
Records: EMP (name, dept, floor, ...)
We can use the pointers in the buckets to help answer queries without looking at most of the records in the data file!
Query: Get employees in (Toy Dept) ^ (2nd floor)
[Figure: the Dept. index entry for Toy and the Floor index entry for 2nd each point to a bucket of pointers into EMP]
Intersect the Toy bucket and the 2nd-floor bucket to get the set of matching EMPs
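The intersection can be sketched as follows. The `emp` rows and the `build_buckets` helper are hypothetical; each bucket is modeled as a set of record pointers (list positions).

```python
# EMP data file; record pointers are list positions.
emp = [
    {"name": "Ann", "dept": "Toy", "floor": 2},
    {"name": "Bob", "dept": "Toy", "floor": 1},
    {"name": "Cid", "dept": "Shoe", "floor": 2},
    {"name": "Dee", "dept": "Toy", "floor": 2},
]

def build_buckets(records, field):
    """Secondary index with buckets: key -> set of record pointers."""
    buckets = {}
    for rid, rec in enumerate(records):
        buckets.setdefault(rec[field], set()).add(rid)
    return buckets

dept_index = build_buckets(emp, "dept")
floor_index = build_buckets(emp, "floor")

# Intersect the two buckets BEFORE touching the data file.
matching = dept_index.get("Toy", set()) & floor_index.get(2, set())

# Only the records that satisfy both conditions are fetched.
result = [emp[rid]["name"] for rid in sorted(matching)]
print(result)  # ['Ann', 'Dee']
```

The Toy bucket holds three pointers and the 2nd-floor bucket holds three, but only the two records in the intersection are read from the data file.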
This idea is used in text information retrieval
Documents: ...the cat is fat... / ...was raining cats and dogs... / ...Fido the dog...
Inverted lists: cat, dog
IR QUERIES
–Find articles with “cat” and “dog”
–Find articles with “cat” or “dog”
–Find articles with “cat” and not “dog”
–Find articles with “cat” in title
–Find articles with “cat” and “dog” within 5 words
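The first three queries map directly onto set operations over inverted lists. The sketch below uses the three document snippets from the previous slide; the `normalize` function is a crude stand-in for stemming and is an assumption, as are the document ids 1 to 3.

```python
from collections import defaultdict

docs = {1: "the cat is fat", 2: "was raining cats and dogs", 3: "Fido the dog"}

def normalize(word):
    """Crude stand-in for stemming (an assumption): lowercase, drop a plural 's'."""
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

# Inverted lists: term -> set of ids of the documents containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        inverted[normalize(word)].add(doc_id)

def postings(term):
    return inverted.get(normalize(term), set())

cat_and_dog = postings("cat") & postings("dog")  # "cat" and "dog"
cat_or_dog = postings("cat") | postings("dog")   # "cat" or "dog"
cat_not_dog = postings("cat") - postings("dog")  # "cat" and not "dog"

print(cat_and_dog, cat_or_dog, cat_not_dog)  # {2} {1, 2, 3} {1}
```

The last two queries (term in title, terms within 5 words) cannot be answered from document ids alone; they need extra information stored with each posting.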
Common technique: more info in inverted list
[Figure: inverted lists for cat and dog; each posting carries a type (Title, Author, Abstract), a position, and a location (a pointer to document d1, d2, or d3)]
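A sketch of postings that carry type and position, under stated assumptions: the documents below are invented, positions are word offsets that restart in each field, and a proximity match requires both words to be in the same field of the same document.

```python
from collections import defaultdict

# Hypothetical documents, each split into typed fields.
docs = {
    "d1": {"title": "the cat is fat", "abstract": "a story about a dog"},
    "d2": {"title": "weather report", "abstract": "it was raining cat and dog all day"},
    "d3": {"title": "fido the dog", "abstract": "fido never met a cat"},
}

# Each posting records (location, type, position).
postings = defaultdict(list)
for doc_id, fields in docs.items():
    for ftype, text in fields.items():
        for pos, word in enumerate(text.split()):
            postings[word].append((doc_id, ftype, pos))

def docs_with_term_in(term, ftype):
    """Documents that contain term in a field of the given type."""
    return sorted({d for d, t, _ in postings.get(term, []) if t == ftype})

def docs_within(term_a, term_b, k):
    """Documents where both terms occur in the same field, at most k words apart."""
    hits = set()
    for da, ta, pa in postings.get(term_a, []):
        for db, tb, pb in postings.get(term_b, []):
            if da == db and ta == tb and abs(pa - pb) <= k:
                hits.add(da)
    return sorted(hits)

print(docs_with_term_in("cat", "title"))  # ['d1']
print(docs_within("cat", "dog", 5))       # ['d2']
```

Both queries are answered from the postings alone, without opening any document.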
Summary so far
Conventional index
–Basic Ideas: sparse, dense, multi-level …
–Duplicate Keys
–Deletion/Insertion
–Secondary indexes
–Buckets of Postings List
Conventional indexes
Advantage:
- Simple
- Index is a sequential file, good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance
Example
[Figure: index (sequential) with continuous free space, plus an overflow area (not sequential)]
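The trade-off can be sketched with a toy index that keeps a sorted main area and an unsorted overflow area. The `SequentialIndex` class and its methods are hypothetical, and the free space inside the sequential area is not modeled.

```python
from bisect import bisect_left

class SequentialIndex:
    """Sorted main area plus an unsorted overflow area (illustrative only)."""

    def __init__(self, sorted_entries):
        self.main = list(sorted_entries)      # (key, pointer) pairs, sorted by key
        self.keys = [k for k, _ in self.main]
        self.overflow = []                    # entries in arrival order

    def insert(self, key, pointer):
        # Cheap insert: append to overflow instead of shifting the main area.
        self.overflow.append((key, pointer))

    def lookup(self, key):
        i = bisect_left(self.keys, key)       # binary search in the main area
        if i < len(self.keys) and self.keys[i] == key:
            return self.main[i][1]
        for k, p in self.overflow:            # linear scan: sequentiality is lost
            if k == key:
                return p
        return None

    def reorganize(self):
        # Expensive: merge overflow back to restore a fully sequential index.
        self.main = sorted(self.main + self.overflow)
        self.keys = [k for k, _ in self.main]
        self.overflow = []

idx = SequentialIndex([(10, "r0"), (20, "r1"), (30, "r2")])
idx.insert(25, "r3")
idx.insert(15, "r4")
print(idx.lookup(25))  # r3
```

Inserts stay cheap only as long as lookups are allowed to fall back to the overflow scan; restoring sequential order requires the costly `reorganize` step.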
summary