Indexing Techniques

Advanced Databases, Indexing Techniques, slide 2: The Problem
What can we introduce to make search more efficient?
– Indices!
What is an index?

Slide 3: Definitions
Index: an auxiliary data structure to speed up record retrieval
Search key: the field(s) of a table that are indexed
Storage: index files that contain index records
– Each entry stores one of:
  the actual data record, or
  search key value k and a record ID, or
  search key value k and a list of record IDs
Types: ordered and unordered (hash) indices
[Figure: data pages i and i+1 holding the records Paul, Anna, Tim]

Slide 4: Types of Ordered Indices (1/3)
Assuming ordered data files
Depending on which field is indexed
– Primary index: search key is the ordering key field
  Pointer for each page
– Secondary index: search key is a non-ordering field
[Figure: data file with records Paul 00112233, Anna 00112234, Matt 00112235, Tim 00112236, Carol 00112237, Rob 00112238; index entries 00112233, 00112235, 00112236, 00112238 labelled primary; index entries Anna, Carol, Paul, Tim labelled secondary]

Slide 5: Types of Ordered Indices (2/3)
Depending on the density of index records
– Dense index: an index record for each distinct search key value, i.e. every record
– Sparse index: index records for only some search key values
  search key value for first record in page
  pointer to page
[Figure: the same data file with a sparse index (00112233, 00112235, 00112236, 00112238) and a dense index (00112233, 00112234, 00112235, 00112236, 00112237, 00112238)]
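A sparse index lookup can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: pages are Python lists holding two records each, the index is a list of (first key on page, page number) pairs, and the names `pages`, `sparse_index`, and `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

# Assumed layout: an ordered data file split into pages of two records each.
pages = [
    [("00112233", "Paul"), ("00112234", "Anna")],
    [("00112235", "Matt"), ("00112236", "Tim")],
    [("00112237", "Carol"), ("00112238", "Rob")],
]

# Sparse index: one entry per page (key of the first record, page number).
sparse_index = [(page[0][0], page_no) for page_no, page in enumerate(pages)]


def sparse_lookup(index, data_pages, key):
    """Return the record stored under key, or None if it is absent."""
    first_keys = [first_key for first_key, _ in index]
    # Last index entry whose first key is <= the search key.
    pos = bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    page_no = index[pos][1]
    # Scan the single page the index entry points to.
    for record_key, record in data_pages[page_no]:
        if record_key == key:
            return record
    return None
```

A dense index would skip the page scan, at the cost of one index entry per record instead of one per page.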

Slide 6: Types of Ordered Indices (3/3)
Ordering field is nonkey (may have duplicates)
– Clustered index
– Unclustered index
[Figure: data file in which names repeat (Paul 00112233, Anna 00112234, Matt 00112235, Tim 00112236, Carol 00112237, Rob 00112238, Paul 01112233, Tim 01112236, Tim 02112236); index entries Anna, Carol, Matt, Paul, Rob, Tim labelled clustered; index entries 00112233 through 02112236 labelled unclustered]

Slide 7: Indices Exercise
$2^{15}$ records
128 bytes/record
$2^{10}$ bytes/page
ordered file
equality search on ordering field, unspanned organization
– without an index
– with a primary index on field of size 12 bytes
  assume pointer 4 bytes long
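One possible way to work through the exercise is sketched below. It rests on assumptions the slide does not state: the ordered file and the index are both searched with binary search over pages, the primary index is sparse with one entry per data page, and the cost is counted in page reads. The function names are hypothetical.

```python
def ceil_log2(n: int) -> int:
    """Smallest t with 2**t >= n, computed with integers only."""
    return (n - 1).bit_length()


def exercise_page_reads(records=2**15, record_size=128, page_size=2**10,
                        key_size=12, pointer_size=4):
    """Return (page reads without index, page reads with primary index)."""
    # Unspanned organization: only whole records fit in a page.
    records_per_page = page_size // record_size            # 8
    data_pages = -(-records // records_per_page)           # 2**12 = 4096

    # Without an index: binary search over the ordered data pages.
    without_index = ceil_log2(data_pages)                  # 12

    # With a primary index: one (key, pointer) entry per data page.
    entries_per_page = page_size // (key_size + pointer_size)  # 64
    index_pages = -(-data_pages // entries_per_page)           # 64
    # Binary search over the index pages, then one read of the data page.
    with_index = ceil_log2(index_pages) + 1                    # 6 + 1 = 7

    return without_index, with_index
```

Under these assumptions the index cuts the search from 12 page reads to 7.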

Slide 8: Multi-level Indices (1/2)
If access using first-level index is still expensive
Build a sparse index on the first-level index
– Multi-level index
Fan-out: index blocking factor
[Figure: the data file with a first-level index (00112233, 00112234, 00112235, 00112236, 00112237, 00112238) and a second-level index (00112233, 00112235, 00112236)]

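Slide 9: Multi-level Indices (2/2)
$2^{6}$ index records/page (fan-out)
$2^{15}$ index records
1st level
– $2^{9}$ pages
2nd level
– $2^{9}$ index records
– $2^{3}$ pages
3rd level
– $2^{3}$ index records
– 1 page
The number of levels t is the smallest integer for which the top level fits in one page:
$2^{15} / (2^{6})^{t} \le 1$
$t = \lceil \log_{2^{6}} 2^{15} \rceil = 3$
In general, with fan-out fo: $t = \lceil \log_{fo} (\#\text{index records}) \rceil$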
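The level count can be checked by building the levels bottom-up. This is a small sketch, not part of the original slides; `index_levels` is a hypothetical helper, and it uses integer arithmetic so no floating-point logarithm is involved.

```python
def index_levels(num_index_records: int, fan_out: int):
    """Return (number of levels, pages per level from 1st level upward)."""
    pages_per_level = []
    entries = num_index_records
    while True:
        pages = -(-entries // fan_out)   # ceiling division
        pages_per_level.append(pages)
        if pages == 1:                   # the top level fits in one page
            break
        entries = pages                  # next level: one entry per page
    return len(pages_per_level), pages_per_level
```

For the numbers on the slide, `index_levels(2**15, 2**6)` gives three levels with 512, 8, and 1 pages.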

Slide 10: Dynamic Multi-level Indices
So far assumed indices are physically ordered files
– expensive insertions and deletions
Dynamic multi-level indices
– B trees
– B+ trees

Slide 11: Tree-structured Indices
Node layout: $\langle P_1, K_1, \ldots, K_{i-1}, P_i, K_i, \ldots, K_{q-1}, P_q \rangle$
For each node: $K_1 < K_2 < \ldots < K_{q-1}$
For each value X in the subtree pointed to by $P_i$
– $K_{i-1} < X < K_i$, for $1 < i < q$
– $X < K_i$, for $i = 1$
– $K_{i-1} < X$, for $i = q$

Slide 12: B tree
Problems: empty nodes, unbalanced trees
– solution: B trees

Slide 13: B tree: Definition
Each node: $\langle P_1, \langle K_1, Pr_1 \rangle, P_2, \langle K_2, Pr_2 \rangle, \ldots, \langle K_{q-1}, Pr_{q-1} \rangle, P_q \rangle$
$P_i$ tree pointer, $K_i$ search value, $Pr_i$ data pointer
For each node: $K_1 < K_2 < \ldots < K_{q-1}$
For each value X in the subtree pointed to by $P_i$
– $K_{i-1} < X < K_i$, for $1 < i < q$
– $X < K_i$, for $i = 1$
– $K_{i-1} < X$, for $i = q$
Each node has at most q tree pointers
– B tree is of order q
Each node has at least $\lceil q/2 \rceil$ tree pointers
– except the root
Internal node with p pointers has p-1 values
All leaves at the same level
– balanced tree

Slide 14: B tree: Example
[Figure: B tree with root values 5 and 8 over the leaves (1, 3), (6, 7), (9, 12); legend: tree pointer, data pointer, ø null pointer]

Slide 15: B+ tree
Most implementations of B tree are B+ tree
Data pointers only in leaves
– more entries in internal nodes than regular B trees
– fewer internal nodes
– fewer levels
– faster access

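Slide 16: B+ tree: Definition
Internal nodes: $\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$
Leaf nodes: $\langle \langle K_1, Pr_1 \rangle, \langle K_2, Pr_2 \rangle, \ldots, \langle K_{q-1}, Pr_{q-1} \rangle, P_{next} \rangle$
$Pr_i$ points to a data record or to a block of pointers to such records
[Figure: internal node (120, 150, 180) over the leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200), linked in leaf order]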

Slide 17: B+ tree: Search
At each level, find the smallest $K_i$ larger than the search key
Follow the associated pointer $P_i$
[Figure: example tree with root (100), internal nodes (30) and (120, 150, 180), and leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]
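The search rule can be sketched as a short descent loop. This is an in-memory illustration only: the `Node` class and `bplus_search` are hypothetical names, pages are ordinary Python objects, and a key equal to a separator is assumed to live in the right-hand subtree (as the example tree suggests, where 100 sits in the leaf to the right of the root value 100).

```python
from bisect import bisect_right


class Node:
    """Hypothetical in-memory node; a leaf has children set to None."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children


def bplus_search(root, k):
    """Descend from the root to a leaf and report whether k is stored there."""
    node = root
    while node.children is not None:
        # Position of the smallest K_i larger than k; if no key is larger,
        # this is the last pointer P_q.
        i = bisect_right(node.keys, k)
        node = node.children[i]
    return k in node.keys


# The example tree from the slide.
leaves = [Node(keys) for keys in (
    [3, 5, 11], [30, 35], [100, 101, 110],
    [120, 130], [150, 156, 179], [180, 200],
)]
root = Node([100], [Node([30], leaves[:2]),
                    Node([120, 150, 180], leaves[2:])])
```

Each iteration of the loop corresponds to one page read, so the cost of a search equals the height of the tree.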

Slide 18: B+ tree: Insert
Nodes may overflow or underflow
Ignoring overflow or underflow
Inserting a data record with search key value k
– find leaf node
– if k found
  add record to file, create indirect block if there isn't one
  add record pointer to indirect block
– if k not found
  add data record to file
  insert record pointer in leaf node (all search keys in order)

Slide 19: B+ tree: Delete
Ignoring overflow or underflow
Find leaf node with search key value k
Find data record pointer, delete record
Delete index record
– and indirect block, if any, if empty

Slide 20: B+ tree: Simple Insert
Insert 42
[Figure: the example tree; 42 follows the k < 100 branch of the root]

Slide 21: B+ tree: Leaf Overflow (1/2)
Insert 9
[Figure: the example tree after inserting 42, with leaves (3, 5, 11) and (30, 35, 42) under the k < 100 branch]

Slide 22: B+ tree: Leaf Overflow (2/2)
First $\lceil n/2 \rceil$ entries stay in the existing node, the rest go in a new leaf node
n = 3 + 1 = 4
[Figure: the k < 100 branch now has internal node (9, 30) over the leaves (3, 5), (9, 11), (30, 35, 42)]
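The leaf split rule can be sketched as follows. The function name, the list-based leaf, and the convention of copying the first key of the new leaf up to the parent are assumptions made for illustration; the slides only state how the entries are divided.

```python
import math
from bisect import insort


def insert_into_leaf(keys, k, capacity):
    """Insert k into a sorted leaf; split if the leaf overflows.

    Returns (left_keys, right_keys, separator). right_keys and separator
    are None when no split was needed.
    """
    keys = list(keys)      # work on a copy of the leaf contents
    insort(keys, k)        # keep all search keys in order
    if len(keys) <= capacity:
        return keys, None, None
    n = len(keys)          # capacity + 1 entries after the insert
    cut = math.ceil(n / 2)
    left, right = keys[:cut], keys[cut:]
    # Assumed convention: the smallest key of the new leaf becomes the
    # separator inserted into the parent node.
    return left, right, right[0]
```

With the leaf (3, 5, 11), a capacity of 3, and the key 9, this yields the leaves (3, 5) and (9, 11) with separator 9, consistent with the parent (9, 30) on the slide.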

Slide 23: B+ tree: Internal Node Overflow (1/3)
Insert 210, insert 205
[Figure: tree after inserting 210; root (100), right internal node (120, 150, 180), last leaf (180, 200, 210)]

Slide 24: B+ tree: Internal Node Overflow (2/3)
Leaf split
[Figure: the last leaf is split into (180, 200) and (205, 210); the internal node (120, 150, 180) is full]

Slide 25: B+ tree: Internal Node Overflow (3/3)
[Figure: the internal node is split; root (100, 150) over internal nodes (9, 30), (120), and (180, 205)]

Slide 26: B+ tree: New Root (1/2)
Insert 210, insert 205
[Figure: tree whose root is (120, 150, 180) over the leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200), (205, 210)]

Slide 27: B+ tree: New Root (2/2)
[Figure: new root (150) over internal nodes (120) and (180, 205)]

Slide 28: Index Insert Exercise
Insert 8, 7, 41
[Figure: starting tree with internal node (9, 30)]

Slide 29: B+ tree: Delete
Simple delete case
Underflow case:
– redistribute records
– coalesce with siblings
– update parents

Slide 30: B+ tree: Simple Delete (1/2)
Delete 110
[Figure: root (150), internal nodes (120) and (180, 205), leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200), (205, 210, 215)]

Slide 31: B+ tree: Simple Delete (2/2)
Leaf updated
[Figure: first leaf is now (100, 101)]

Slide 32: B+ tree: Delete Redistribution (1/2)
Delete 180
[Figure: same tree; 180 is in the leaf (180, 200)]

Slide 33: B+ tree: Delete Redistribution (2/2)
Redistribute entries
– left or right sibling
[Figure: 179 moves from the left sibling; leaves (150, 156) and (179, 200), internal node (179, 205)]

Slide 34: B+ tree: Delete Coalesce (1/4)
Delete 101
[Figure: same tree; 101 is in the leaf (100, 101)]

Slide 35: B+ tree: Delete Coalesce (2/4)
Leaf updated
No redistribution
– sibling coalesce
[Figure: first leaf is now (100), next to (120, 130)]

Slide 36: B+ tree: Delete Coalesce (3/4)
Leaf updated
No redistribution
– sibling coalesce
[Figure: leaves coalesced into (100, 120, 130); the internal node (120) is no longer shown]

Slide 37: B+ tree: Delete Coalesce (4/4)
Redistribution
[Figure: root (179) over internal nodes (150) and (205); leaves (100, 120, 130), (150, 156), (179, 200), (205, 210, 215)]

Hashing Techniques

Slide 39: Static Hashing (1/2)
Store records in buckets with overflow chains
Allocate a fixed number of buckets M
Problems:
– small M: long overflow chains, slow search-delete-insert
[Figure: hash function h mapping into a few buckets with long overflow chains]

Slide 40: Static Hashing (2/2)
Problems:
– large M: wasted space, slow scan
[Figure: hash function h mapping into many mostly empty buckets]
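The effect of M on overflow chains can be illustrated with a toy model. Everything here is an assumption for the sketch: integer keys, the hash function h(k) = k mod M, pages modelled as Python lists, and the hypothetical class name `StaticHash`.

```python
class StaticHash:
    """Toy static hash file: M buckets, each a chain of fixed-size pages."""

    def __init__(self, num_buckets, page_capacity):
        self.m = num_buckets
        self.capacity = page_capacity
        # Each bucket starts with one empty primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[key % self.m]   # assumed h(k) = k mod M
        if len(chain[-1]) == self.capacity:
            chain.append([])                 # add an overflow page
        chain[-1].append(key)

    def search(self, key):
        """Return (found, number of pages read along the chain)."""
        pages_read = 0
        for page in self.buckets[key % self.m]:
            pages_read += 1
            if key in page:
                return True, pages_read
        return False, pages_read
```

Inserting 0, 2, 4, 6, 8 with M = 2 and two keys per page puts all five keys in one bucket, so finding 8 takes three page reads; with M = 8 the same keys spread out and 8 is found in one read, at the price of several empty buckets.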

Slide 41: Dynamic Hashing
Splitting and coalescing buckets as the database grows or shrinks
One scheme: Extendible Hashing
Hash function generates large values, e.g. 32 bits
– use i bits, change i as database size changes
If overflow, double the number of buckets
– use i+1 bits of the hash function
– but, expensive: read all M pages and distribute records in 2*M pages
  solution: use a directory and double the size of the directory
– only split the bucket that overflowed

Slide 42: Extendible Hashing (1/4)
$h(18) = 10010_2$
[Figure: directory with global depth 2 (entries 00, 01, 10, 11) pointing to buckets A, B, C, D; bucket A holds 16 and 20; 37 and 18 are stored in other buckets; each bucket is labelled with its local depth]

Slide 43: Extendible Hashing (2/4)
$h(4) = 00100_2$
[Figure: same directory; 4 maps to the bucket holding 16 and 20]

Slide 44: Extendible Hashing (3/4)
[Figure: bucket A is split into A (16) and its split image A1 (20, 4), each with local depth 3; the directory still has global depth 2]

Slide 45: Extendible Hashing (4/4)
[Figure: directory doubled to global depth 3 (entries 000 through 111); legend: Global Depth, Local Depth]
If bucket full:
– split bucket
– increment LD
If GD=LD
– increment GD
– double directory
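The insert procedure can be sketched in Python. This is a minimal in-memory illustration under stated assumptions, not the slides' own algorithm listing: keys are distinct non-negative integers, the hash function is the identity, the directory is indexed by the GD least significant bits (matching h(18) = 10010 landing in entry 10), and buckets hold two keys. The class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        # Assumed h(k) = k; use the global_depth low-order bits.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Double the directory: slot s and slot s + 2**GD share
            # their low GD bits, so the new half mirrors the old one.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split only the bucket that overflowed.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        for slot, b in enumerate(self.directory):
            if b is bucket and slot & new_bit:
                self.directory[slot] = image   # repoint to split image
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (image if k & new_bit else bucket).keys.append(k)
        # Retry; splits again if every key landed on the same side.
        # Terminates only because keys are assumed distinct.
        self.insert(key)
```

Inserting 16, 20, 18, 37 and then 4 reproduces the sequence on the slides under these assumptions: the bucket holding 16 and 20 splits into (16) and (20, 4) at local depth 3, and the directory doubles to global depth 3.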

Slide 46: Extendible Hashing: Delete
If a deletion makes a bucket empty
– merge with split image
If every directory pointer points to the same bucket as its split image
– directory halved

Slide 47: Extendible Hashing: Summary
Avoids overflow pages
Directory can get large
Key search requires just 2 page reads
Space utilization fluctuates
– 59-90% for uniformly distributed records

Slide 48: Extendible Hashing: Exercise
Initially GD = LD = 1
M = 2 buckets
Hash function: $h(k) = k \bmod 2^{i}$
Inserts: 14, 18, 22, 3, 9
Deletes: 9, 22, 3
[Figure: initial directory with global depth 1; one bucket holds 12 and 8, the other holds 5, each with local depth 1]
