Indexing Techniques
Advanced Databases, Indexing Techniques
The Problem
  What can we introduce to make search more efficient?
    – Indices!
  What is an index?
Definitions
  Index: an auxiliary data structure that speeds up record retrieval
  Search key: the field(s) of a table on which the index is built
  Storage: index files that contain index records
    – each entry stores one of:
        the actual data record, or
        a search key value k and a record ID, or
        a search key value k and a list of record IDs
  Types: ordered and unordered (hash) indices
  [Figure: pages i and i+1 holding the records Paul, Anna, Tim]
Types of Ordered Indices (1/3)
  Assuming ordered data files
  Depending on which field is indexed
    – Primary index: the search key is the ordering key field; one pointer for each page
    – Secondary index: the search key is a non-ordering field
  [Figure: primary and secondary indices over the records Paul, Anna, Matt, Tim, Carol, Rob]
Types of Ordered Indices (2/3)
  Depending on the density of index records
    – Dense index: an index record for each distinct search key value (i.e. for every record when search key values are unique)
    – Sparse index: index records for only some search key values; each entry holds the search key value of the first record in a page and a pointer to that page
  [Figure: sparse and dense indices over the records Paul, Anna, Matt, Tim, Carol, Rob]
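A sparse index lookup can be sketched as follows. This is a minimal illustration, not part of the original slides: the in-memory `pages` list, the `sparse_lookup` helper and the use of Python's `bisect` module are assumptions made for the example; only the record names come from the slides.

```python
from bisect import bisect_right

# Hypothetical ordered data file: each inner list is one page,
# and records are sorted on the search key across pages.
pages = [
    ["Anna", "Carol"],
    ["Matt", "Paul"],
    ["Rob", "Tim"],
]

# Sparse index: one entry per page, holding the search key value of
# the first record in the page and a pointer (here, the page number).
sparse_index = [(page[0], page_no) for page_no, page in enumerate(pages)]
index_keys = [key for key, _ in sparse_index]


def sparse_lookup(key):
    """Return (page_no, slot) for key, or None if the key is absent."""
    # Find the last index entry whose key is <= the search key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first record in the file
    page_no = sparse_index[pos][1]
    page = pages[page_no]  # one data page read
    if key in page:
        return page_no, page.index(key)
    return None
```

A dense index would instead hold one entry per record, so the final scan inside the page would not be needed, at the cost of a larger index.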
Types of Ordered Indices (3/3)
  The ordering field is a non-key field (may have duplicates)
    – Clustered index
    – Unclustered index
  [Figure: clustered and unclustered indices over records with duplicate values (Paul, Tim)]
Indices Exercise
  2^15 records, 128 bytes/record
  2^10 bytes/page
  Ordered file, unspanned organization
  Cost of an equality search on the ordering field
    – without an index
    – with a primary index on a field of size 12 bytes (assume pointers are 4 bytes long)
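One way to work through the arithmetic is sketched below. It assumes that the search without an index is a binary search over the data pages, that the primary index is sparse with one entry per data page, and that the index itself is searched by binary search; these assumptions and all variable names are illustrative and are not stated in the exercise.

```python
from math import ceil, log2

records = 2**15
record_size = 128                # bytes per record
page_size = 2**10                # bytes per page
key_size, pointer_size = 12, 4   # bytes

# Unspanned organization: a record never straddles two pages.
records_per_page = page_size // record_size        # 8 records per page
data_pages = ceil(records / records_per_page)      # 4096 = 2^12 pages

# Without an index: binary search over the ordered data pages.
cost_without_index = ceil(log2(data_pages))        # 12 page reads

# With a primary index (assumed sparse: one entry per data page).
entry_size = key_size + pointer_size               # 16 bytes per entry
entries_per_page = page_size // entry_size         # 64 entries per page
index_pages = ceil(data_pages / entries_per_page)  # 64 = 2^6 pages

# Binary search over the index pages, plus one read of the data page.
cost_with_index = ceil(log2(index_pages)) + 1      # 6 + 1 = 7 page reads
```

Under these assumptions the index cuts the search from 12 page reads to 7.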
Multi-level Indices (1/2)
  If access using the first-level index is still expensive
  Build a sparse index on the first-level index
    – Multi-level index
  Fan-out: the index blocking factor (index records per page)
  [Figure: first-level and second-level indices over the records Paul, Anna, Matt, Tim, Carol, Rob]
Multi-level Indices (2/2)
  2^6 index records/page (fan-out)
  2^15 index records
  1st level
    – 2^9 pages
  2nd level
    – 2^9 index records
    – 2^3 pages
  3rd level
    – 2^3 index records
    – 1 page
  The top level must fit in one page: 2^15 / (2^6)^t <= 1
  t = ceil(log_{2^6} 2^15) = ceil(15/6) = 3
  In general: t = ceil(log_fo(#index-records))
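The level count can be checked with a short sketch that builds the levels bottom-up, as on the slide. The function name `level_pages` is an assumption made for this example.

```python
def level_pages(index_records, fan_out):
    """Return the number of pages at each index level, first level first.

    Each level is a sparse index over the pages of the level below it,
    and the construction stops once a level fits in a single page.
    """
    if fan_out < 2:
        raise ValueError("fan-out must be at least 2")
    levels = []
    entries = index_records
    while True:
        pages = -(-entries // fan_out)   # ceiling division
        levels.append(pages)
        if pages == 1:
            return levels
        entries = pages                  # one index record per page below


pages_per_level = level_pages(2**15, 2**6)   # [512, 8, 1]
t = len(pages_per_level)                     # 3 levels
```

The integer loop avoids floating-point logarithms, and for the slide's numbers it matches t = ceil(log_fo(#index-records)).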
Dynamic multi-level indices
  So far we assumed that indices are physically ordered files
    – expensive insertions and deletions
  Dynamic multi-level indices
    – B trees
    – B+ trees
Tree-structured Indices
  Node layout: <P_1, K_1, …, K_{i-1}, P_i, K_i, …, K_{q-1}, P_q>
  For each node: K_1 < K_2 < … < K_{q-1}
  For each value X in the subtree pointed to by P_i
    – K_{i-1} < X < K_i, for 1 < i < q
    – X < K_i, for i = 1
    – K_{i-1} < X, for i = q
B tree
  Problems: empty nodes, unbalanced trees
    – solution: B trees
B tree: Definition
  Each node: <P_1, <K_1, Pr_1>, P_2, <K_2, Pr_2>, …, <K_{q-1}, Pr_{q-1}>, P_q>
    – P_i is a tree pointer, K_i a search value, Pr_i a data pointer
  For each node: K_1 < K_2 < … < K_{q-1}
  For each value X in the subtree pointed to by P_i
    – K_{i-1} < X < K_i, for 1 < i < q
    – X < K_i, for i = 1
    – K_{i-1} < X, for i = q
  Each node has at most q tree pointers
    – the B tree is of order q
  Each node has at least ceil(q/2) tree pointers
    – except the root
  An internal node with p pointers has p-1 values
  All leaves are at the same level
    – balanced tree
B tree: Example
  [Figure: B tree with root values 5 and 8 and leaf nodes (1, 3), (6, 7), (9, 12); legend: tree pointer, data pointer, ø = null pointer]
B+ tree
  Most implementations of the B tree are B+ trees
  Data pointers only in leaves
    – more entries in internal nodes than in regular B trees
    – fewer internal nodes
    – fewer levels
    – faster access
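The effect of moving data pointers to the leaves can be illustrated with a small calculation. The sketch below assumes a 2^10-byte page, 12-byte keys and 4-byte tree and data pointers (the data pointer size in particular is an assumption), and the helper name `max_order` is hypothetical.

```python
def max_order(page_size, key_size, tree_ptr_size, data_ptr_size=0):
    """Largest q such that q tree pointers and q-1 entries fit in a page.

    Solves q*tree_ptr_size + (q-1)*(key_size + data_ptr_size) <= page_size.
    """
    entry = key_size + data_ptr_size
    return (page_size + entry) // (tree_ptr_size + entry)


# B tree node: every search value carries a data pointer.
b_tree_order = max_order(2**10, key_size=12, tree_ptr_size=4, data_ptr_size=4)

# B+ tree internal node: search values and tree pointers only.
b_plus_order = max_order(2**10, key_size=12, tree_ptr_size=4)
```

With these sizes the B tree node holds at most 52 tree pointers while the B+ tree internal node holds 64, which is where the higher fan-out and the smaller number of levels come from.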
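B+ tree: Definition
  Internal nodes: <P_1, K_1, P_2, K_2, …, P_{q-1}, K_{q-1}, P_q>
  Leaf nodes: <<K_1, Pr_1>, <K_2, Pr_2>, …, <K_{q-1}, Pr_{q-1}>, P_next>
    – Pr_i points to a data record or to a block of pointers to such records
    – P_next points to the next leaf node
  Leaf order: the maximum number of <K_i, Pr_i> entries in a leaf node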
B+ tree: Search
  At each level, find the smallest K_i that is larger than the search key
  Follow the associated pointer P_i
    – if no K_i is larger than the search key, follow the last pointer P_q
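The search rule can be sketched as below. The `Internal` and `Leaf` classes and the tiny example tree are assumptions made for illustration; the sketch also assumes that a search value equal to K_i is stored in the subtree to the right of K_i, which is what "smallest K_i larger than the search key" implies.

```python
from bisect import bisect_left, bisect_right


class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # K_1 < K_2 < ... < K_{q-1}
        self.children = children    # P_1 ... P_q (one more than keys)


class Leaf:
    def __init__(self, keys, record_ptrs):
        self.keys = keys                # sorted search key values
        self.record_ptrs = record_ptrs  # Pr_i, parallel to keys


def search(node, key):
    """Return the data pointer stored for key, or None if it is absent."""
    while isinstance(node, Internal):
        # Index of the smallest K_i larger than key; equals len(keys)
        # when no such K_i exists, which selects the last pointer P_q.
        i = bisect_right(node.keys, key)
        node = node.children[i]
    pos = bisect_left(node.keys, key)
    if pos < len(node.keys) and node.keys[pos] == key:
        return node.record_ptrs[pos]
    return None


# Hypothetical two-level tree; the record pointers are just labels here.
leaves = [
    Leaf([1, 3], ["r1", "r3"]),
    Leaf([5, 6, 7], ["r5", "r6", "r7"]),
    Leaf([8, 9, 12], ["r8", "r9", "r12"]),
]
root = Internal([5, 8], leaves)
```

For example, `search(root, 6)` descends through the second pointer of the root, because 8 is the smallest root value larger than 6.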
B+ tree: Insert
  Nodes may overflow or underflow
  Ignoring overflow and underflow, to insert a data record with search key value k
    – find the leaf node
    – if k is found
        add the record to the file
        create an indirect block if there isn't one
        add a record pointer to the indirect block
    – if k is not found
        add the data record to the file
        insert a record pointer in the leaf node (keeping all search keys in order)
B+ tree: Delete
  Ignoring overflow and underflow
  Find the leaf node with search key value k
  Find the data record pointer and delete the record
  Delete the index record
    – and the indirect block, if there is one and it is now empty
B+ tree: Simple Insert
  [Figure: example tree for the insert]
B+ tree: Leaf Overflow (1/2)
  [Figure: example tree for an insert into a full leaf; pointer label k < 100]
B+ tree: Leaf Overflow (2/2)
  The first ceil(n/2) entries stay in the existing node, the rest go to a new leaf node
  n = 3 + 1 = 4
  [Figure: the tree after the leaf split]
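The split rule can be sketched on a single leaf. The function name, the representation of a leaf as a sorted list of keys, and the example keys are assumptions; the sketch also assumes that the first key of the new leaf is copied up to the parent as the separator, which the slides do not spell out.

```python
from bisect import insort
from math import ceil


def insert_into_leaf(keys, key, capacity):
    """Insert key into a sorted leaf, splitting it on overflow.

    Returns (left_keys, right_keys, separator). When the leaf does not
    overflow, right_keys and separator are None.
    """
    keys = list(keys)
    insort(keys, key)                 # keep all search keys in order
    if len(keys) <= capacity:
        return keys, None, None
    n = len(keys)                     # capacity + 1 entries to distribute
    cut = ceil(n / 2)
    left, right = keys[:cut], keys[cut:]
    # Assumed convention: the smallest key of the new leaf is copied
    # into the parent so that later searches are routed correctly.
    return left, right, right[0]


left, right, separator = insert_into_leaf([3, 5, 11], 7, capacity=3)
```

Here n = 3 + 1 = 4, so the first two entries (3, 5) stay in the existing leaf and the remaining two (7, 11) move to the new one. If the parent has no room for the separator, the same idea is applied one level up, which is the internal node overflow case.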
B+ tree: Internal Node Overflow (1/3)
  Insert 210, insert
  [Figure: example tree before the inserts]
B+ tree: Internal Node Overflow (2/3)
  Leaf split
  [Figure: the tree after the leaf split; key 930 shown]
B+ tree: Internal Node Overflow (3/3)
  [Figure: the tree after the internal node split; key 930 shown]
B+ tree: New Root (1/2)
  Insert 210, insert
B+ tree: New Root (2/2)
Index Insert Exercise
  Insert 8, 7,
B+ tree: Delete
  Simple delete case
  Underflow case:
    – redistribute records
    – coalesce with siblings
    – update parents
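The choice between redistribution and coalescing can be sketched for a leaf and its right sibling. This is a simplified illustration under stated assumptions: leaves are plain sorted lists of keys, only the right sibling is considered, and the function name and return format are hypothetical.

```python
def handle_underflow(leaf, right_sibling, min_keys):
    """Repair a leaf after a delete; returns (action, leaf, right_sibling).

    leaf and right_sibling are sorted lists of keys, and every key in
    right_sibling is larger than every key in leaf.
    """
    if len(leaf) >= min_keys:
        return "none", leaf, right_sibling        # simple delete case
    if len(right_sibling) > min_keys:
        # Redistribute: borrow the smallest key of the sibling. The
        # parent's separator must then become the sibling's new first key.
        return "redistribute", leaf + right_sibling[:1], right_sibling[1:]
    # Coalesce: the sibling cannot spare an entry, so merge both leaves.
    # The parent loses one separator and one pointer, and may itself
    # underflow, in which case the repair continues one level up.
    return "coalesce", leaf + right_sibling, []
```

For instance, with `min_keys=2`, a leaf `[5]` next to `[7, 9, 11]` is repaired by redistribution, while the same leaf next to `[7, 9]` has to be coalesced into `[5, 7, 9]`.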
B+ tree: Simple Delete (1/2)
  [Figure: example tree before the delete]
B+ tree: Simple Delete (2/2)
  Leaf updated
B+ tree: Delete Redistribution (1/2)
  [Figure: example tree before the delete]
B+ tree: Delete Redistribution (2/2)
  Redistribute entries
    – with the left or right sibling
B+ tree: Delete Coalesce (1/4)
  [Figure: example tree before the delete]
B+ tree: Delete Coalesce (2/4)
  Leaf updated
  No redistribution possible
    – coalesce with sibling
B+ tree: Delete Coalesce (3/4)
  Leaf updated
  No redistribution possible
    – coalesce with sibling
B+ tree: Delete Coalesce (4/4)
  Redistribution
Hashing Techniques
Static Hashing (1/2)
  Store records in buckets with overflow chains
  Allocate a fixed number of buckets M
  Problems:
    – small M: long overflow chains, slow search, delete and insert
  [Figure: hash function h mapping keys to buckets with overflow chains ending in null]
Static Hashing (2/2)
  Problems:
    – large M: wasted space, slow scan
  [Figure: hash function h mapping keys to mostly empty buckets]
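Both problems can be made concrete with a small sketch. The class below is an illustrative assumption, not the slides' implementation: keys are integers, the hash function is h(k) = k mod M, and each bucket is a chain of pages with a fixed capacity.

```python
class StaticHashFile:
    def __init__(self, num_buckets, page_capacity):
        self.M = num_buckets
        self.capacity = page_capacity
        # Each bucket is a chain of pages: the primary page first,
        # followed by any overflow pages.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[key % self.M]    # assumed h(k) = k mod M

    def insert(self, key):
        chain = self._chain(key)
        if len(chain[-1]) == self.capacity:
            chain.append([])                 # add an overflow page
        chain[-1].append(key)

    def search(self, key):
        """Return the number of pages read to find key, or None."""
        for pages_read, page in enumerate(self._chain(key), start=1):
            if key in page:
                return pages_read
        return None

    def empty_buckets(self):
        return sum(1 for chain in self.buckets if not chain[0])


small = StaticHashFile(num_buckets=2, page_capacity=2)
large = StaticHashFile(num_buckets=16, page_capacity=2)
for k in range(10):
    small.insert(k)
    large.insert(k)
```

With M = 2, finding key 8 means walking an overflow chain of three pages; with M = 16 it takes a single page read, but six of the sixteen buckets stay empty and a full scan still has to visit them.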
Dynamic Hashing
  Split and coalesce buckets as the database grows and shrinks
  One scheme: Extendible Hashing
  The hash function generates large values, e.g. 32 bits
    – use i bits, and change i as the database size changes
  On overflow, double the number of buckets
    – use i+1 bits of the hash function
    – but this is expensive: read all M pages and redistribute the records over 2*M pages
  Solution: use a directory and double the size of the directory instead
    – split only the bucket that overflowed
Extendible Hashing (1/4)
  [Figure: directory pointing to buckets A, B, C, D; key 18 is located through h(18)]
Extendible Hashing (2/4)
  [Figure: directory and buckets A, B, C, D; key 4 is inserted through h(4)]
Extendible Hashing (3/4)
  [Figure: buckets A, B, C, D and the new bucket A1]
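Extendible Hashing (4/4)
  Global Depth (GD): the number of hash bits used by the directory
  Local Depth (LD): the number of hash bits used by a bucket
  If a bucket is full:
    – split the bucket
    – increment its LD
  If GD = LD for that bucket before the split
    – increment GD
    – double the directory
  [Figure: directory with its global depth and buckets A, B, C, D with their local depths]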
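The split and doubling rules can be sketched as follows. This is a minimal illustration under several assumptions: keys are distinct integers, the directory is indexed by the low GD bits of the key (h(k) = k mod 2^GD), the bucket capacity of 2 is arbitrary, and the class and method names are hypothetical. Deletion and bucket merging are left out.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=2, global_depth=1):
        self.capacity = capacity          # assumed keys per bucket
        self.global_depth = global_depth
        self.directory = [Bucket(global_depth) for _ in range(2 ** global_depth)]

    def _slot(self, key):
        return key % (2 ** self.global_depth)    # low GD bits of the key

    def search(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        # Assumes distinct keys; many keys sharing all low bits would
        # keep splitting the same bucket.
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Double the directory: slot j and slot j + 2^GD share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split only the bucket that overflowed.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)        # the split image
        new_bit = 1 << (bucket.local_depth - 1)
        for slot, b in enumerate(self.directory):
            if b is bucket and slot & new_bit:
                self.directory[slot] = image
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        self.insert(key)                          # retry; may split again


table = ExtendibleHash(capacity=2, global_depth=1)
for k in (14, 18, 22, 3, 9):
    table.insert(k)
```

With this assumed capacity, inserting 22 forces two directory doublings in a row, because 14, 18 and 22 agree on their two lowest bits. Buckets that were never split keep a local depth below the global depth and stay shared by several directory entries.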
Extendible Hashing: Delete
  If a deletion makes a bucket empty
    – merge it with its split image
  If every directory pointer points to the same bucket as its split image
    – the directory is halved
Extendible Hashing: Summary
  Avoids overflow pages
  The directory can get large
  Key search requires just 2 page reads
  Space utilization fluctuates
    – 59-90% for uniformly distributed records
Extendible Hashing: Exercise
  Initially GD = LD = 1
  M = 2 buckets
  Hash function: h(k) = k mod 2^i
  Inserts: 14, 18, 22, 3, 9
  Deletes: 9, 22,