Presentation is loading. Please wait.

Presentation is loading. Please wait.

Index Structures Chapter 13 of GUW September 16, 2019

Similar presentations


Presentation on theme: "Index Structures Chapter 13 of GUW September 16, 2019"— Presentation transcript:

1 Index Structures Chapter 13 of GUW September 16, 2019
ICS 541: Index Structures

2 Objectives Different ways of organizing blocks
What is the best way to organize record in blocks to minimize: Query cost Exact match Partial match Range Join Insertion cost Deletion cost Update cost Storage cost September 16, 2019 ICS 541: Index Structures

3 - Lecture outline Basic Concepts Index on Sequential files
Secondary Indexes B-Trees Hash Tables September 16, 2019 ICS 541: Index Structures

4 - Basic Concepts … Disk Index blocks Data blocks Empty blocks
September 16, 2019 ICS 541: Index Structures

5 … - Basic Concepts … Index blocks result Query Data blocks
September 16, 2019 ICS 541: Index Structures

6 … - Basic Concepts Block pointer Record pointer Record pointers take more space than block pointers. Why? September 16, 2019 ICS 541: Index Structures

7 - Indexes Sequential Files
Dense Index Sparse Index Multiple level of Index Index with Duplicate Search Keys Managing Indexes During Data Modification September 16, 2019 ICS 541: Index Structures

8 -- Sequential Files Sequential File 10 20 30 40 50 60 70 80 90 100
September 16, 2019 ICS 541: Index Structures

9 -- Dense Index Sequential File Dense Index 10 20 30 40 50 60 70 80 90
Search key 50 60 70 80 60 50 80 70 90 100 110 120 100 90 September 16, 2019 ICS 541: Index Structures

10 -- Sparse Index Sequential File Sparse Index 10 20 30 40 50 60 70 80
90 110 130 150 60 50 80 70 170 190 210 230 100 90 September 16, 2019 ICS 541: Index Structures

11 -- Dense Vs Sparse Index: Example
Relation Relation R with 1,000,000 tuples A block of size 4096 bytes (4k) 10 R tuples per block Data space required => /10 * 4k = 400MB. Dense index record size: 30 Bytes for search key + 8 bytes for record pointer Can fit 100 index records per block Dense index space = /100 * 4k = 40MB. Binary search cost log2(10000) = 14 disk accesses at most Keeping blocks (1/2, ¼, ¾, 1/8, …) in memory can lower disk access. Sparse index 1000 index blocks = 4MB Binary search cost log2(1000) = 10 disk accesses at most September 16, 2019 ICS 541: Index Structures

12 -- Multiple level of Index
Sequential File Sparse 2nd level 20 10 10 90 170 250 10 30 50 70 40 30 90 110 130 150 330 410 490 570 60 50 80 70 170 190 210 230 100 90 September 16, 2019 ICS 541: Index Structures

13 -- Contiguous sequential file
R1 K1 say: 1024 B per block if we want K3 block: get it at offset (3-1)1024 = 2048 bytes R2 K2 K3 R3 K4 R4 September 16, 2019 ICS 541: Index Structures

14 -- Sparse vs. Dense Tradeoff
Less index space per record can keep more of index in memory Dense Can tell if any record exists without accessing file Must be used for secondary index September 16, 2019 ICS 541: Index Structures

15 --Terms Index sequential file Search key (  primary key)
Primary index (on Sequencing field) Secondary index Dense index (all Search Key values in) Sparse index Multi-level index September 16, 2019 ICS 541: Index Structures

16 -- Duplicate keys … 10 10 20 20 30 30 40 45 September 16, 2019
ICS 541: Index Structures

17 Dense index, one way to implement?
-- Duplicate keys … Dense index, one way to implement? 10 10 10 10 10 10 10 10 20 10 20 10 20 20 30 20 30 20 20 20 30 30 30 30 30 30 30 30 45 40 45 40 September 16, 2019 ICS 541: Index Structures

18 … -- Duplicate keys … 10 10 20 20 30 30 40 45 Dense index, better way?
September 16, 2019 ICS 541: Index Structures

19 careful if looking for 20 or 30! … -- Duplicate keys …. 10 10 20 20 30
Sparse index, one way? careful if looking for 20 or 30! 10 10 10 20 20 10 30 30 20 30 45 40 September 16, 2019 ICS 541: Index Structures

20 place first new key from block
… -- Duplicate keys … place first new key from block 10 should this be 40? 10 20 30 20 10 30 30 20 30 45 40 Sparse index, another way? September 16, 2019 ICS 541: Index Structures

21 a a a b … -- Duplicate keys
Incase of primary index may point to first instance of each value only File Index a a a . b September 16, 2019 ICS 541: Index Structures

22 -- Managing Indexes During Data Modification
Deletion from Sparse Index Insertion into Sparse Index September 16, 2019 ICS 541: Index Structures

23 --- Deletion from sparse index …
20 10 10 30 50 40 30 70 60 50 90 110 130 80 70 150 September 16, 2019 ICS 541: Index Structures

24 … --- Deletion from sparse index …
delete record 40 20 10 10 30 50 40 30 70 60 50 90 110 130 80 70 150 September 16, 2019 ICS 541: Index Structures

25 … --- Deletion from sparse index …
delete record 30 20 10 10 40 30 50 40 30 70 60 50 90 110 130 80 70 150 September 16, 2019 ICS 541: Index Structures

26 … ---- Deletion from sparse index …
delete records 30 & 40 20 10 10 50 70 30 50 40 30 70 60 50 90 110 130 80 70 150 September 16, 2019 ICS 541: Index Structures

27 … --- Deletion from dense index …
20 10 10 20 30 40 30 40 60 50 50 60 70 80 70 80 September 16, 2019 ICS 541: Index Structures

28 … --- Deletion from dense index
delete record 30 20 10 10 20 40 40 30 30 40 40 60 50 50 60 70 80 70 80 September 16, 2019 ICS 541: Index Structures

29 --- Insertion, sparse index case …
20 10 10 30 40 30 60 50 40 60 September 16, 2019 ICS 541: Index Structures

30 … --- Insertion, sparse index case …
insert record 34 20 10 10 30 40 30 34 our lucky day! we have free space where we need it! 60 50 40 60 September 16, 2019 ICS 541: Index Structures

31 … --- Insertion, sparse index case …
insert record 15 20 10 15 20 30 10 30 40 30 60 50 40 Illustrated: Immediate reorganization Variation: insert new block (chained file) update index 60 September 16, 2019 ICS 541: Index Structures

32 … --- Insertion, sparse index case
insert record 25 20 10 25 overflow blocks (reorganize later...) 10 30 40 30 60 50 40 60 September 16, 2019 ICS 541: Index Structures

33 Similar Often more expensive . . . --- Insertion, dense index case
September 16, 2019 ICS 541: Index Structures

34 - Secondary Indexes Design of Secondary Indexes
Duplicate Values and Secondary Indexes Applications of Secondary Indexes Indirection in Secondary Indexes September 16, 2019 ICS 541: Index Structures

35 -- Design of Secondary Indexes …
Sequence field 50 30 70 20 40 80 10 100 60 90 September 16, 2019 ICS 541: Index Structures

36 does not make sense! … -- Design of Secondary Indexes … 30 50 20 70 80
Sequence field Sparse index does not make sense! 50 30 30 20 80 100 70 20 90 ... 40 80 10 100 60 90 September 16, 2019 ICS 541: Index Structures

37 … -- Design of Secondary Indexes …
Sequence field Dense index 10 20 30 40 50 60 70 ... 50 30 10 50 90 ... sparse high level 70 20 40 80 10 100 60 90 September 16, 2019 ICS 541: Index Structures

38 … -- Design of Secondary Indexes
Lowest level is dense Record pointers Other levels are sparse Block pointers September 16, 2019 ICS 541: Index Structures

39 -- Duplicate Values and Secondary Indexes …
10 20 40 20 40 10 40 10 40 30 September 16, 2019 ICS 541: Index Structures

40 -- Duplicate Values and Secondary Indexes …
one option... 10 20 10 20 40 20 Problem: excess overhead! disk space search time 20 30 40 40 10 40 10 40 ... 40 30 September 16, 2019 ICS 541: Index Structures

41 -- Duplicate Values and Secondary Indexes …
another option... 10 20 10 40 20 20 Problem: variable size records in index! 40 10 30 40 40 10 40 30 September 16, 2019 ICS 541: Index Structures

42 -- Duplicate Values and Secondary Indexes …
10 20 10 20 30 40 40 20 50 60 ... 40 10 40 10 40 30 buckets September 16, 2019 ICS 541: Index Structures

43 ---Why “bucket” idea is useful …
Indexes Records Name: primary EMP (name,dept,floor,...) Dept: secondary Floor: secondary September 16, 2019 ICS 541: Index Structures

44 Query: Get employees in (Toy Dept) ^ (2nd floor)
… ---Why “bucket” idea is useful … Query: Get employees in (Toy Dept) ^ (2nd floor) Dept. index EMP Floor index Toy 2nd Intersect toy bucket and 2nd Floor bucket to get set of matching EMP’s Used for text retrieval September 16, 2019 ICS 541: Index Structures

45 This idea used in text information retrieval.
… ---Why “bucket” idea is useful This idea used in text information retrieval. Documents Inverted lists cat dog ...the cat is fat ... ...was raining cats and dogs... ...Fido the dog ... September 16, 2019 ICS 541: Index Structures

46 -- Conventional indexes
Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance September 16, 2019 ICS 541: Index Structures

47 --- Example Index (sequential) 10 39 31 35 36 32 38 34 33 20 30 40 50
continuous free space 10 39 31 35 36 32 38 34 33 overflow area (not sequential) 20 30 40 50 60 70 80 90 September 16, 2019 ICS 541: Index Structures

48 -- Next Another type of index Btree Has Schemes
Give up on sequentiality of index Try to get “balance” Btree Has Schemes September 16, 2019 ICS 541: Index Structures

49 B+Tree Example n=3 Root 100 120 150 180 30 3 5 11 120 130 180 200 30 35 100 101 110 150 156 179 September 16, 2019 ICS 541: Index Structures

50 -- Sample non-leaf 57 81 95 To keys 57> and <=81 To keys
>95 To keys <57 September 16, 2019 ICS 541: Index Structures

51 -- Sample leaf node: 57 81 95 with key 57 with key 81 To record
From non-leaf node to next leaf in sequence 57 81 95 with key 57 with key 81 To record with key 85 September 16, 2019 ICS 541: Index Structures

52 Non-leaf: (n+1)/2 pointers Leaf: (n+1)/2 pointers to data
-- Size of Nodes n keys n + 1 Pointers Use at least Non-leaf: (n+1)/2 pointers Leaf: (n+1)/2 pointers to data September 16, 2019 ICS 541: Index Structures

53 --- Example: n = 3 Full node min. node Non-leaf Leaf 120 150 180 30 3
11 30 35 counts even if null September 16, 2019 ICS 541: Index Structures

54 -- B+tree rules tree of order n
All leaves at same lowest level (balanced tree) Pointers in leaves point to records except for “sequence pointer” September 16, 2019 ICS 541: Index Structures

55 -- Insert into B+tree (a) simple case (b) leaf overflow
space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root September 16, 2019 ICS 541: Index Structures

56 --- Insert key = 32 n=3 100 30 3 5 11 30 31 32 September 16, 2019
ICS 541: Index Structures

57 n=3 --- Insert key = 7 100 30 7 3 5 11 30 31 3 5 7 September 16, 2019
ICS 541: Index Structures

58 n=3 --- Insert key = 160 100 160 120 150 180 180 150 156 179 180 200 160 179 September 16, 2019 ICS 541: Index Structures

59 --- New root, insert 45 n=3 30 new root 10 20 30 40 1 2 3 10 12 20 25
32 40 40 45 September 16, 2019 ICS 541: Index Structures

60 -- Deletion from B+tree
Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf September 16, 2019 ICS 541: Index Structures

61 n=4 --- Coalesce with sibling: Delete 50 10 40 100 40 10 20 30 40 50
September 16, 2019 ICS 541: Index Structures

62 n=4 --- Redistribute key: Delete 50 10 40 100 35 10 20 30 35 40 50
September 16, 2019 ICS 541: Index Structures

63 n=4 --- Non-leaf coalesce: Delete 37 new root 25 25 10 20 30 40 40 30
26 1 3 10 14 20 22 30 37 40 45 September 16, 2019 ICS 541: Index Structures

64 --- B+tree deletions in practice
Often, coalescing is not implemented Too hard and not worth it! September 16, 2019 ICS 541: Index Structures

65 - Hashing key  h(key) <key> Buckets (typically 1 disk block) .
September 16, 2019 ICS 541: Index Structures

66 . . (1) key  h(key) -- Two alternatives … records September 16, 2019
ICS 541: Index Structures

67 (2) key  h(key) … -- Two alternatives record Index
Alt (2) for “secondary” search key September 16, 2019 ICS 541: Index Structures

68 -- Example hash function …
Key = ‘x1 x2 … xn’ n byte character string Have b buckets h: add x1 + x2 + ….. Xn compute sum modulo b September 16, 2019 ICS 541: Index Structures

69 … -- Example hash function
This may not be best function … Read Knuth Vol. 3 if you really need to select a good function. Good hash  Expected number of function: keys/bucket is the same for all buckets September 16, 2019 ICS 541: Index Structures

70 -- Within a bucket Do we keep keys sorted? Yes
if CPU time critical, and Inserts/Deletes not too frequent September 16, 2019 ICS 541: Index Structures

71 -- Example to illustrate inserts, overflows, deletes
h(K) September 16, 2019 ICS 541: Index Structures

72 … -- Example to illustrate insert …
1 2 3 INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 d a c b e h(e) = 1 September 16, 2019 ICS 541: Index Structures

73 Delete: e f c … -- Example to illustrate delete … a b d c d e f g 1 2
1 2 3 a d b d c c e maybe move “g” up f g September 16, 2019 ICS 541: Index Structures

74 --- Rule of thumb Try to keep space utilization between 50% and 80%
Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket September 16, 2019 ICS 541: Index Structures

75 -- How do we cope with growth?
Static hashing Overflows and reorganizations Very expensive Solution Dynamic hashing Extensible Linear September 16, 2019 ICS 541: Index Structures

76 -- Extensible hashing: two ideas ..
(a) Use i of b bits output by hash function b h(K)  use i  grows over time…. September 16, 2019 ICS 541: Index Structures

77 … -- Extensible hashing: two ideas
(b) Use directory h(K)[i ] to bucket . . September 16, 2019 ICS 541: Index Structures

78 --- Example: h(k) is 4 bits; 2 keys/bucket …
New directory 2 00 01 10 11 i = 1 i = 0001 1 1 1001 1 1100 1010 1100 Insert 1010 September 16, 2019 ICS 541: Index Structures

79 … --- Example: h(k) is 4 bits; 2 keys/bucket …
0000 0111 0001 i = 2 00 01 10 11 1 0001 0111 2 1001 1010 Insert: 0111 0000 2 1100 September 16, 2019 ICS 541: Index Structures

80 … --- Example: h(k) is 4 bits; 2 keys/bucket
000 001 010 011 100 101 110 111 3 i = 0000 2 i = 0001 2 00 01 10 11 0111 2 1001 1010 2 1001 1010 Insert: 1001 2 1100 September 16, 2019 ICS 541: Index Structures

81 --- Extensible hashing: deletion
Two option: No merging of blocks Merge blocks and cut directory if possible September 16, 2019 ICS 541: Index Structures

82 ---- Deletion example Run thru insert example in reverse!
September 16, 2019 ICS 541: Index Structures

83 -- Summary: Extensible hashing
Can handle growing files - No full reorganizations + Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - September 16, 2019 ICS 541: Index Structures

84 -- Summary: Extensible hashing
Advantage No reorganization is needed One disk access per record Disadvantage Doubling bucket array is expensive The size of the bucket array may no longer fit into memory The number of bucket may be much bigger than the blocks Example: Splitting records can only be done in higher bits. September 16, 2019 ICS 541: Index Structures

85 -- Linear hashing 01110101 Another dynamic hashing scheme Two ideas:
(a) Use i low order bits of hash grows b i (b) File grows linearly September 16, 2019 ICS 541: Index Structures

86 Example b=4 bits, i =2, 2 keys/bucket
0101 can have overflow chains! insert 0101 Future growth buckets 0000 0101 1010 1111 m = 01 (max used block) If h(k)[i ]  m, then look at bucket h(k)[i ] else, look at bucket h(k)[i ] - 2i -1 Rule September 16, 2019 ICS 541: Index Structures

87 Example b=4 bits, i =2, 2 keys/bucket
0101 insert 0101 1111 0101 Future growth buckets 11 0000 1010 0101 10 1010 1111 m = 01 (max used block) September 16, 2019 ICS 541: Index Structures

88 Example Continued: How to grow beyond this?
3 i = 2 0000 100 0101 101 0101 1010 1111 0101 . . . m = 11 (max used block) September 16, 2019 ICS 541: Index Structures

89 --- When do we expand file?
Keep track of: # used slots total # of slots = U If U > threshold then increase m (and maybe i ) September 16, 2019 ICS 541: Index Structures

90 -- Summary: Linear Hashing
Can handle growing files - with less wasted space - with no full reorganizations No indirection like extensible hashing + + Can still have overflow chains - September 16, 2019 ICS 541: Index Structures

91 -- Hashing Summary Hashing - How it works - Dynamic hashing
- Extensible - Linear September 16, 2019 ICS 541: Index Structures

92 -- Indexing Vs Hashing …
Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5 September 16, 2019 ICS 541: Index Structures

93 … -- Indexing vs Hashing
INDEXING (Including B Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5 September 16, 2019 ICS 541: Index Structures

94 - Reading list Chapter 13 of GUW B-tree (WebCt) September 16, 2019
ICS 541: Index Structures

95 END September 16, 2019 ICS 541: Index Structures


Download ppt "Index Structures Chapter 13 of GUW September 16, 2019"

Similar presentations


Ads by Google