Published by Surya Kusuma
1
Yan Huang - CSCI5330 Database Implementation – Access Methods
This is a modified version of Prof. Hector Garcia Molina’s slides. All copyrights belong to the original author. 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Access Methods
2
Basic Concepts
Search key: the set of attributes used to look up records in a file.
[Figure: given a value, an index entry (search key, pointer) leads to the matching record.]
3
Index Evaluation Metrics
- Access types supported efficiently, e.g.:
  - Point query: find “Tom”
  - Range query: find students whose age is between 20 and 40
- Access time
- Update time
- Space overhead
4
Ordered Indices
In an ordered index, index entries are stored sorted on the search key value, e.g., the author catalog in a library.
5
Primary index
- Also called clustering index: the index and the file are in the same order on the search key.
- The search key of a primary index is usually, but not necessarily, the primary key.
[Figure: index entries 10, 30, 50, 70, 90, ... pointing into a sequential file holding records 10 through 100 in the same order.]
6
Secondary index
- Also called non-clustering index: the index is in a different order from the file.
[Figure: index entries 10, 20, 30, 40, 50, 60, 70, ... pointing into a file whose records are not sorted on the search key.]
7
Dense Index
Dense index: contains an index record for every search-key value.
[Figure: dense index 10, 20, 30, ..., 120 over a sequential file holding records 10 through 100.]
8
Sparse Index
Sparse index: contains index records for only some search-key values. Applicable when records are sequentially ordered on the search key.
[Figure: sparse index 10, 30, 50, 70, 90, ... over a sequential file holding records 10 through 100.]
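A sparse-index lookup can be sketched as follows. This is only an illustration under assumptions: the block contents and the names `blocks`, `sparse_index`, and `sparse_lookup` are made up here, with one index entry per block holding that block's first key.

```python
from bisect import bisect_right

# Hypothetical sequential file: blocks of records sorted on the search key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: one entry (the first key) per block.
sparse_index = [block[0] for block in blocks]  # [10, 30, 50, 70, 90]


def sparse_lookup(key):
    """Return (block_no, position) of key, or None if it is absent."""
    # Find the last index entry whose key is <= the search key.
    block_no = bisect_right(sparse_index, key) - 1
    if block_no < 0:
        return None
    block = blocks[block_no]
    # The index only narrows the search to one block; scan that block.
    if key in block:
        return block_no, block.index(key)
    return None
```

A dense index would instead hold every key, so the final scan of the block would not be needed, at the cost of a larger index.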
9
Secondary indexes
A sparse index does not make sense when the file is not ordered on the indexed field: keys without an index entry could not be located.
[Figure: a sparse index (30, 20, 80, 100, 90, ...) over a file that is not sorted on the sequence field.]
10
Multilevel Index
[Figure: a sparse second-level index (10, 90, 170, 250, 330, 410, 490, 570) over a first-level index (10, 30, 50, 70, 90, ...), which points into the sequential file.]
11
Multilevel Index: secondary indexes
- Lowest level is dense
- Other levels are sparse
[Figure: a sparse high-level index (10, 50, 90, ...) over a dense index (10, 20, 30, 40, 50, 60, 70, ...) on a file that is not sorted on the sequence field.]
12
Conventional indexes
Advantages:
- Simple
- Index is a sequential file, good for scans
Disadvantage:
- Inserts are expensive
13
Outline:
- Conventional indexes
- B+-Tree (NEXT)
14
NEXT: Another type of index
- Give up on sequentiality of index
- Try to get “balance”
15
B+Tree Example (n=4)
[Figure: root with key 100; non-leaf nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200).]
16
Sample non-leaf node
Keys: 57, 81, 95. The four pointers lead to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95.
A key is moved (not copied) from a lower-level non-leaf node to an upper-level non-leaf node.
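The choice of pointer in a non-leaf node can be sketched with a binary search. The function name `child_slot` is an assumption made for this illustration; the keys 57, 81, 95 are the ones from the sample node.

```python
from bisect import bisect_right


def child_slot(keys, k):
    """Index of the pointer to follow for search key k in a non-leaf node.

    With keys [57, 81, 95]: slot 0 covers k < 57, slot 1 covers
    57 <= k < 81, slot 2 covers 81 <= k < 95, and slot 3 covers k >= 95.
    """
    return bisect_right(keys, k)


sample_keys = [57, 81, 95]
```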
17
Sample leaf node
Keys: 57, 81, 95. The node is reached from a non-leaf node. Each key has a pointer to the record with that key (57, 81, 95), and a final pointer leads to the next leaf in sequence.
A key is copied (not moved) from a leaf node to a non-leaf node.
18
[Figure: a leaf node and a non-leaf node drawn with keys 30 and 35.]
19
Size of nodes:
- n pointers
- n-1 keys
20
Don’t want nodes to be too empty
Use at least:
- Root: 2 pointers
- Non-leaf: ⌈n/2⌉ pointers
- Leaf: ⌈(n-1)/2⌉ keys
21
[Figure, n=4: a full non-leaf node (120, 150, 180) and a minimum non-leaf node (30); a full leaf (3, 5, 11) and a minimum leaf (30, 35). A pointer counts even if null.]
22
B+tree rules (tree of order n)
(1) All leaves are at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”
23
(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n          n-1        ⌈n/2⌉           ⌈n/2⌉-1
Leaf (non-root)       n          n-1        ⌈(n-1)/2⌉       ⌈(n-1)/2⌉
Root                  n          n-1        2               1
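The occupancy limits can be computed for a given order n as in the sketch below. It assumes the minimums are rounded up (ceilings), which agrees with the deck's n=4 examples (a minimum leaf with 2 keys, a minimum non-leaf node with 1 key); the function name `occupancy` and the dictionary layout are illustrative only.

```python
from math import ceil


def occupancy(n):
    """Pointer/key limits per node type for a B+tree with n pointers per node."""
    return {
        "non-leaf": {"max_ptrs": n, "max_keys": n - 1,
                     "min_ptrs": ceil(n / 2), "min_keys": ceil(n / 2) - 1},
        "leaf": {"max_ptrs": n, "max_keys": n - 1,
                 "min_ptrs": ceil((n - 1) / 2), "min_keys": ceil((n - 1) / 2)},
        "root": {"max_ptrs": n, "max_keys": n - 1,
                 "min_ptrs": 2, "min_keys": 1},
    }
```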
24
Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
25
(a) Insert key = 32 (n=4)
[Figure: root 100, non-leaf node (30), leaves (3, 5, 11) and (30, 31). The leaf (30, 31) has space, so 32 is added to it.]
26
(b) Insert key = 7 (n=4)
[Figure: the leaf (3, 5, 11) is full, so it splits into (3, 5) and (7, 11); key 7 is added to the parent next to 30.]
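Cases (a) and (b) can be sketched for a single leaf as below. This is a simplified illustration, not a full B+tree: the function name `insert_into_leaf` and its return convention are assumptions, and propagating the separator into the parent is left out.

```python
from bisect import insort


def insert_into_leaf(leaf_keys, key, n):
    """Insert key into a sorted leaf of a B+tree with n pointers per node.

    Returns (left, right, separator). right and separator are None when
    the leaf had room and no split was needed.
    """
    keys = list(leaf_keys)
    insort(keys, key)
    if len(keys) <= n - 1:          # a leaf holds at most n-1 keys
        return keys, None, None
    mid = (len(keys) + 1) // 2      # left half keeps the extra key, if any
    left, right = keys[:mid], keys[mid:]
    # The first key of the new right leaf is COPIED up to the parent.
    return left, right, right[0]
```

With n=4, inserting 32 into (30, 31) needs no split, while inserting 7 into (3, 5, 11) yields the leaves (3, 5) and (7, 11) with 7 as the key to add to the parent.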
27
(c) Insert key = 160 (n=4)
[Figure: the leaf (150, 156, 179) overflows and splits into (150, 156) and (160, 179). The parent (120, 150, 180) then overflows and splits as well, and key 160 moves up into the root next to 100.]
28
(d) New root, insert 45 (n=4)
[Figure: old root (10, 20, 30) over leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). Inserting 45 splits the last leaf into (30, 32) and (40, 45); the old root overflows and splits into (10, 20) and (40), and a new root with key 30 is created.]
29
Deletion from B+tree
(a) Simple case (no example)
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
30
(b) Coalesce with sibling: delete 50 (n=5)
[Figure: parent (10, 40, 100) over leaves (10, 20, 30) and (40, 50). After 50 is deleted, the leaf (40) is too empty; 40 moves into the sibling, giving (10, 20, 30, 40), and key 40 is removed from the parent.]
31
(c) Redistribute keys: delete 50 (n=5)
[Figure: parent (10, 40, 100) over leaves (10, 20, 30, 35) and (40, 50). After 50 is deleted, key 35 moves over from the sibling to give (35, 40), and the parent key 40 becomes 35.]
32
(d) Non-leaf coalesce: delete 37 (n=5)
[Figure: root 25 over non-leaf nodes (10, 20) and (30, 40), with leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). Deleting 37 leaves a leaf too empty; after the leaves coalesce, the non-leaf nodes coalesce as well, pulling 25 down into a new root.]
33
B+tree deletions in practice
Often, coalescing is not implemented: too hard and not worth it!
34
Index Definition in SQL
Create an index:
    create index <index-name> on <relation-name> (<attribute-list>)
    E.g.: create index gindex on country(gdp);
To drop an index:
    drop index <index-name>
    E.g.: drop index gindex;
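The two statements can be tried out with Python's built-in `sqlite3` module, as in this minimal sketch. The `country` table definition (columns `name` and `gdp`) and the helper `index_names` are assumptions for the demonstration; other database systems may differ in syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the gdp column matters for the example.
conn.execute("create table country (name text, gdp real)")

# Create the index from the slide.
conn.execute("create index gindex on country(gdp)")


def index_names(connection):
    """List the indexes SQLite currently knows about."""
    rows = connection.execute(
        "select name from sqlite_master where type = 'index'")
    return [name for (name,) in rows]


after_create = index_names(conn)

# Drop the index again.
conn.execute("drop index gindex")
after_drop = index_names(conn)
```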
35
Multi-key Index
Motivation: find records where DEPT = “Toy” AND SAL > 50k
36
Strategy I:
- Use one index, say Dept.
- Get all Dept = “Toy” records and check their salary.
37
Strategy II:
- Use 2 indexes; manipulate pointers: intersect the pointers for Dept = “Toy” with the pointers for Sal > 50k.
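Strategy II can be sketched by intersecting the pointer sets obtained from two single-attribute indexes. Everything below is illustrative: the records, the record ids standing in for pointers, and the names `dept_index`, `sal_index`, and `toy_and_high_salary` are assumptions.

```python
from bisect import bisect_right

# Hypothetical records: record id -> (dept, salary).
records = {
    1: ("Toy", 60_000),
    2: ("Toy", 40_000),
    3: ("Sales", 70_000),
    4: ("Toy", 55_000),
    5: ("Art", 52_000),
}

# Index on Dept: value -> set of record ids (the "pointers").
dept_index = {}
for rid, (dept, _) in records.items():
    dept_index.setdefault(dept, set()).add(rid)

# Ordered index on Sal: sorted (salary, record id) pairs.
sal_index = sorted((sal, rid) for rid, (_, sal) in records.items())


def toy_and_high_salary(min_sal):
    """Record ids with Dept = "Toy" AND Sal > min_sal."""
    toy_ptrs = dept_index.get("Toy", set())
    # Skip every entry with salary <= min_sal.
    start = bisect_right(sal_index, (min_sal, float("inf")))
    sal_ptrs = {rid for _, rid in sal_index[start:]}
    # Only records whose pointer appears in both sets are fetched.
    return sorted(toy_ptrs & sal_ptrs)
```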
38
Strategy III: Multiple Key Index
One idea: an index I1 on the first attribute whose entries point to indexes (I2, I3, ...) on the second attribute.
39
Example
[Figure: a Dept index (Art, Sales, Toy) whose entries point to Salary indexes (10k, 15k, 17k, 21k and 12k, 15k, 15k, 19k). Example record: Name=Joe, DEPT=Sales, SAL=15k.]
40
For which queries is this index good?
- Find RECs Dept = “Sales” AND SAL = 20k
- Find RECs Dept = “Sales” AND SAL > 20k
- Find RECs Dept = “Sales”
- Find RECs SAL = 20k
41
Interesting application:
Geographic Data
DATA: <X1,Y1, Attributes>, <X2,Y2, Attributes>, ...
[Figure: data points plotted on x and y axes.]
42
Queries:
- What city is at <Xi,Yi>?
- What is within 5 miles from <Xi,Yi>?
- Which is the closest point to <Xi,Yi>?
43
Example
[Figure: points a through o in the plane, partitioned by splits on x and y (at values such as 5, 10, 15, 20, 25, 30, 35, 40) and organized as a tree; used to search for points near f and points near b.]
44
Queries
- Find points with Yi > 20
- Find points with Xi < 5
- Find points “close” to i = <12,38>
- Find points “close” to b = <7,24>
45
Many types of geographic index structures have been suggested:
- Quad Trees
- R Trees
46
Two more types of multi key indexes
- Grid
- Bitmap index
47
Grid Index
[Figure: a two-dimensional grid with Key 1 values V1, V2, ..., Vn on one axis and Key 2 values X1, X2, ..., Xn on the other; each cell points to the records with that combination, e.g. key1=V3, key2=X2.]
48
CLAIM
Can quickly find records with:
- key 1 = Vi AND key 2 = Xj
- key 1 = Vi
- key 2 = Xj
And also ranges, e.g., key 1 ≥ Vi AND key 2 < Xj
49
But there is a catch with Grid Indexes!
- How is a Grid Index stored on disk? Like an array: the entries X1, X2, X3, X4 for V1, then those for V2, then those for V3, ...
- Problem: need regularity so we can compute the position of the <Vi,Xj> entry.
50
Solution: Use Indirection
[Figure: a regular grid (rows V1, V2, ...; columns X1, X2, X3) whose cells point to buckets.]
* Grid only contains pointers to buckets
51
With indirection:
- Grid can be regular without wasting space
- We do pay the price of indirection
52
Can also index grid on value ranges
[Figure: a grid over Salary and Dept.
Linear scale for Salary: 0-20K → 1, 20K-50K → 2, 50K-∞ → 3.
Linear scale for Dept: Toy → 1, Sales → 2, Personnel → 3.]
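A grid on value ranges with indirection can be sketched as follows. The linear scales follow the figure (salary ranges 0-20K, 20K-50K, 50K and up; departments Toy, Sales, Personnel), with positions numbered from 0 instead of 1. The bucket layout, the choice to let the whole Personnel row share one bucket, and all function names are assumptions made for illustration.

```python
from bisect import bisect_right

# Linear scales: boundaries for salary, and an ordered list for dept.
salary_scale = [20_000, 50_000]          # 0-20K, 20K-50K, 50K and up
dept_scale = ["Toy", "Sales", "Personnel"]

# The grid is a regular array of pointers (bucket numbers). Several cells
# may share one bucket, so sparsely filled cells do not waste space.
grid = [
    [0, 1, 2],   # Toy
    [3, 4, 5],   # Sales
    [6, 6, 6],   # Personnel: all three cells share bucket 6
]
buckets = [[] for _ in range(7)]


def cell(dept, salary):
    """Map a (dept, salary) pair to its (row, column) in the grid."""
    return dept_scale.index(dept), bisect_right(salary_scale, salary)


def insert(name, dept, salary):
    row, col = cell(dept, salary)
    buckets[grid[row][col]].append((name, dept, salary))


def lookup(dept, salary):
    """Follow the grid pointer, then filter inside the bucket."""
    row, col = cell(dept, salary)
    return [rec for rec in buckets[grid[row][col]]
            if rec[1] == dept and rec[2] == salary]
```

Because a bucket can be shared by several cells, a lookup still has to filter the records it finds there; that extra hop and filter is the price of indirection.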
53
Grid files
+ Good for multiple-key search
- Space, management overhead (nothing is free)
- Need partitioning ranges that evenly split keys
54
Example Grid File for account
Two attributes as search key: branch-name and balance.
- Divide branch-name into non-uniform intervals
- Divide balance into non-uniform intervals
- Which cell holds branch-name < Central and 10k <= balance < 50k?
- What about Central <= branch-name < Townsend and 50k <= balance?
55
Example Grid File for account
[Figure: the grid array, with cells pointing to buckets Bj and Bk.]
56
Grid Files (Cont.)
- Linear scales must be chosen to uniformly distribute records across cells; otherwise there will be too many overflow buckets.
- Periodic re-organization to increase grid size will help, but reorganization can be very expensive.
- Space overhead of the grid array can be high.
57
Bitmap Indices
Another kind of index that can be used for queries on multiple search keys
58
Bitmap Indices (Cont.)
[Figure: a relation with bitmaps for gender and income-level. There is one bitmap per unique value of gender and per unique value of income-level, and each bitmap has one bit per record (size = table size). E.g., the income-level value of record 3 is L1.]
59
Bitmap Indices (Cont.)
Some properties of bitmap indices:
- Number of bitmaps for each attribute?
- Size of each bitmap?
- When is the bitmap matrix sparse, and what attributes are good for bitmap indices?
60
Bitmap Indices (Cont.)
- Bitmap indices are generally very small compared with relation size.
  - E.g., if a record is 100 bytes (800 bits), a single bitmap with one bit per record takes 1/800 of the space used by the relation.
  - If the number of distinct attribute values is 8, the 8 bitmaps together take only 8/800 = 1% of the relation size.
- What about insertion? Deletion?
61
Bitmap Indices Queries
Sample query: males with income level L1
- AND the bitmap for males with the bitmap for L1: 10010 AND (bitmap for L1) = 10000
- What about the number of males with income level L1? Even faster!
62
Bitmap Indices Queries
Queries are answered using bitmap operations:
- Intersection (and)
- Union (or)
- Complementation (not)
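These operations can be sketched with Python integers used as bitmaps. The five-record relation below is hypothetical, chosen so that the bitmap for males comes out as 10010 as in the sample query; the helper names `build_bitmaps` and `show` are likewise assumptions.

```python
# Hypothetical relation with 5 records (record 1 is the leftmost bit).
genders = ["m", "f", "f", "m", "f"]
income_levels = ["L1", "L2", "L1", "L4", "L3"]
n = len(genders)


def build_bitmaps(values):
    """One bitmap per distinct value, with one bit per record."""
    bitmaps = {}
    for r, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << (len(values) - 1 - r))
    return bitmaps


def show(bitmap):
    """Render a bitmap as a fixed-width string of 0s and 1s."""
    return format(bitmap, f"0{n}b")


gender_bm = build_bitmaps(genders)
income_bm = build_bitmaps(income_levels)
all_ones = (1 << n) - 1

males_l1 = gender_bm["m"] & income_bm["L1"]      # intersection (and)
males_or_l1 = gender_bm["m"] | income_bm["L1"]   # union (or)
not_male = ~gender_bm["m"] & all_ones            # complementation (not)

# Counting matches needs no access to the records themselves.
count_males_l1 = bin(males_l1).count("1")
```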
63
Hashing
[Figure: the hash function h maps a key to h(key), the bucket (typically 1 disk block) that holds <key>.]
64
Two alternatives
(1) h(key) gives the bucket that holds the records themselves.
65
Two alternatives
(2) h(key) gives a bucket of index entries; each entry holds a key and a pointer to the record.
Alt (2) is used for a “secondary” search key.
66
Example hash function
- Key = ‘x1 x2 … xn’, an n-byte character string
- Have b buckets
- h: add x1 + x2 + … + xn, compute the sum modulo b
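The example hash function can be written out directly. The sketch assumes the key is encoded as UTF-8 bytes; the name `h` follows the slide.

```python
def h(key: str, b: int) -> int:
    """Add up the bytes of the key and take the sum modulo b buckets."""
    return sum(key.encode("utf-8")) % b
```

Keys that contain the same bytes in a different order, such as "ab" and "ba", always land in the same bucket, which is one reason a byte sum is a weak choice.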
67
This may not be the best function …
Good hash function: the expected number of keys/bucket is the same for all buckets.
68
Within a bucket:
- Do we keep keys sorted?
- Yes, if CPU time is critical and inserts/deletes are not too frequent.
69
Next: example to illustrate inserts, overflows, deletes
[Figure: key K is mapped by h(K) to a bucket.]
70
EXAMPLE: 2 records/bucket
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, then h(e) = 1
[Figure: buckets 0 to 3. Bucket 0 holds d, bucket 1 holds a and c, bucket 2 holds b; e does not fit in bucket 1 and goes to an overflow block.]
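The insert sequence can be replayed with a small sketch of buckets and overflow blocks. The hash values are supplied directly, as on the slide; the `Block` class and the `insert` and `chain` helpers are assumptions made for illustration.

```python
BUCKET_CAPACITY = 2  # 2 records/bucket, as in the example


class Block:
    """One disk block: a few records plus a link to an overflow block."""

    def __init__(self):
        self.records = []
        self.overflow = None


def insert(buckets, bucket_no, record):
    """Put record into bucket bucket_no, chaining an overflow block if full."""
    block = buckets[bucket_no]
    while len(block.records) >= BUCKET_CAPACITY:
        if block.overflow is None:
            block.overflow = Block()
        block = block.overflow
    block.records.append(record)


def chain(buckets, bucket_no):
    """Contents of a bucket, one list per block in its overflow chain."""
    out = []
    block = buckets[bucket_no]
    while block is not None:
        out.append(list(block.records))
        block = block.overflow
    return out


buckets = [Block() for _ in range(4)]
for record, bucket_no in [("a", 1), ("b", 2), ("c", 1), ("d", 0), ("e", 1)]:
    insert(buckets, bucket_no, record)
```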
71
EXAMPLE: deletion
Delete: e, f
[Figure: buckets 0 to 3 holding records a through g, with an overflow block. After the deletions, maybe move “g” up from the overflow block into the freed slot.]
72
Rule of thumb:
- Try to keep space utilization between 50% and 80%
- Utilization = (# keys used) / (total # keys that fit)
- If < 50%, wasting space
- If > 80%, overflows are significant; this depends on how good the hash function is and on # keys/bucket
73
How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
  - Extensible
  - Linear
74
Extensible hashing: two ideas
(a) Use i of the b bits output by the hash function; i grows over time.
[Figure: h(K) produces b bits, of which i are used.]
75
(b) Use a directory: h(K)[i] selects a directory entry, which points to the bucket.
76
Example: h(k) is 4 bits; 2 keys/bucket
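[Figure: i = 1. Directory entry 0 points to bucket (0001) and entry 1 points to bucket (1001, 1100), both with local depth 1. Inserting 1010 overflows the second bucket, which splits into (1001, 1010) and (1100) with local depth 2; the new directory has i = 2 and entries 00, 01, 10, 11.]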
77
Example continued
[Figure: i = 2, directory 00, 01, 10, 11. Buckets: (0001) with local depth 1, (1001, 1010) with depth 2, (1100) with depth 2. Insert 0111, then 0000: the bucket (0001, 0111) overflows and splits into (0000, 0001) and (0111), both with depth 2.]
78
Example continued
[Figure: i = 2; buckets (0000, 0001), (0111), (1001, 1010), (1100), all with local depth 2. Inserting 1001 overflows the bucket (1001, 1010); the directory doubles to i = 3 with entries 000 through 111, and the bucket splits into (1001, 1001) and (1010) with local depth 3.]
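The example can be replayed with the sketch below. It is an in-memory illustration under assumptions, not a definitive implementation: the class and attribute names are invented, keys are given directly as 4-bit hash values, and the leading i bits select the directory entry, as the example's directory labels suggest.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of leading hash bits this bucket uses
        self.keys = []


class ExtensibleHash:
    def __init__(self, hash_bits=4, capacity=2):
        self.b = hash_bits
        self.capacity = capacity
        self.i = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, h):
        # Use the leading i bits of the b-bit hash value.
        return h >> (self.b - self.i)

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(h)
            return
        if bucket.depth == self.b:
            raise OverflowError("bucket cannot be split any further")
        if bucket.depth == self.i:
            # Double the directory: every entry is duplicated in place.
            self.directory = [bk for bk in self.directory for _ in (0, 1)]
            self.i += 1
        # Split the bucket on its next hash bit.
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        for slot, bk in enumerate(self.directory):
            if bk is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = sibling
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        self.insert(h)   # retry; may trigger another split


table = ExtensibleHash()
for key in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    table.insert(key)
```

After these inserts the directory has 8 entries (i = 3), yet only the bucket that overflowed last has local depth 3; the other buckets are still shared by several directory entries.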
79
Extensible hashing: deletion
Two options:
- No merging of blocks
- Merge blocks and cut directory if possible (reverse insert procedure)
80
Deletion example: run through the insert example in reverse!
81
Extensible hashing: Summary
+ Can handle growing files
  - with less wasted space
  - with no full reorganizations
- Indirection (not bad if directory is in memory)
- Directory doubles in size (now it fits, now it does not)
82
Linear hashing
Another dynamic hashing scheme. Two ideas:
(a) Use the i low-order bits of the b-bit hash; i grows over time.
(b) File grows linearly.
83
Example b=4 bits, i =2, 2 keys/bucket
[Figure: buckets 00 and 01 are in use; buckets 10 and 11 are future growth buckets; m = 01 (max used block). Bucket 00 holds 0000 and 1010; bucket 01 holds 1111, and the inserted key 0101 goes there too. Buckets can have overflow chains!]
Rule: if h(k)[i] ≤ m, then look at bucket h(k)[i]; else, look at bucket h(k)[i] - 2^(i-1).
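The lookup rule can be sketched as a small function. The name `bucket_for` and its parameters are assumptions; `h` stands for the hash value h(k), `i` for the number of low-order bits in use, and `m` for the number of the max used block.

```python
def bucket_for(h, i, m):
    """Bucket to look at for hash value h, using its i low-order bits."""
    low = h & ((1 << i) - 1)       # h(k)[i]: the i low-order bits
    if low <= m:
        return low
    # That bucket does not exist yet: drop the top one of the i bits.
    return low - (1 << (i - 1))
```

With i = 2 and m = 01, keys 0000 and 1010 both map to bucket 00, while 0101 and 1111 both map to bucket 01.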
84
Example b=4 bits, i =2, 2 keys/bucket
[Figure: the file grows linearly. Bucket 10 is added and 1010 moves into it from bucket 00; then bucket 11 is added and 1111 moves into it from bucket 01. m advances from 01 to 10 to 11.]
85
Example Continued: How to grow beyond this?
[Figure: with m = 11 (max used block), all buckets for i = 2 are in use, so i grows from 2 to 3; buckets 00 through 11 become 000 through 011, and new buckets 100, 101, ... are added one at a time.]
86
When do we expand the file?
- Keep track of: U = (# used slots) / (total # of slots)
- If U > threshold, then increase m (and maybe i)
87
Linear Hashing: Summary
+ Can handle growing files
  - with less wasted space
  - with no full reorganizations
+ No indirection like extensible hashing
- Can still have overflow chains
88
Example: BAD CASE
[Figure: some buckets are very full while others are very empty. Relieving the full ones would need m to move all the way there, which would waste space.]
89
Summary
Hashing
- How it works
- Dynamic hashing
  - Extensible
  - Linear
90
Indexing vs Hashing
Hashing is good for probes given a key, e.g.,
    SELECT … FROM R WHERE R.A = 5
91
Indexing vs Hashing
INDEXING is good for range searches, e.g.,
    SELECT … FROM R WHERE R.A > 5
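The contrast can be sketched with a Python dict standing in for a hash index and a sorted list standing in for an ordered index. The values of R.A and the names `probe` and `range_greater` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical values of R.A for rows 0..5.
values = [3, 9, 5, 1, 7, 5]

# Hash index: good for probes, but it keeps no order among keys.
hash_index = {}
for row, a in enumerate(values):
    hash_index.setdefault(a, []).append(row)

# Ordered index: sorted (value, row) pairs.
ordered_index = sorted((a, row) for row, a in enumerate(values))


def probe(a):
    """Rows with R.A = a, found with one hash lookup."""
    return hash_index.get(a, [])


def range_greater(a):
    """Rows with R.A > a: one search, then a sequential scan of the index."""
    start = bisect_right(ordered_index, (a, len(values)))
    return [row for _, row in ordered_index[start:]]
```

Answering the range query from the hash index alone would mean examining every key, since hashing scatters neighboring values across buckets.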