Published by Alban Wilcox. Modified over 9 years ago.
1
1 CSCE 520 Test 2 Info: Indexing
Modified from slides of Hector Garcia-Molina and Jeff Ullman
2
2 Physical Storage Media
– Speed of data access
– Cost per unit of data
– Reliability
  – Data loss (power failure or system crash)
  – Physical failure (storage device)
Storage types
– Volatile storage
– Non-volatile storage
3
3 Memory Hierarchy
[Figure: memory hierarchy. Labels: Cache, Main Memory, Disk, Tertiary Storage; Virtual Memory, File System; DBMS; Programs, Main Memory DBMS]
4
4 Disk Access Characteristics
Move data to main memory:
– Position head on cylinder
– Find and access sector
Steps of reading a block:
– Processor and disk controller process the request
– Seek time: position the head
– Rotational latency: rotate the sector under the head
– Transfer time: sector/block read by the head
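The read steps above add up to a per-block access time. A minimal sketch of that arithmetic follows. The function name and the disk figures (5 ms seek, 6000 rpm, 0.5 ms transfer) are hypothetical, chosen only for illustration, and the average rotational latency is assumed to be half a revolution.

```python
def block_read_time_ms(seek_ms, rpm, transfer_ms, controller_ms=0.0):
    """Estimate the time to read one block, in milliseconds.

    Assumption: average rotational latency is half of one revolution.
    """
    ms_per_revolution = 60_000.0 / rpm
    rotational_latency_ms = ms_per_revolution / 2
    return controller_ms + seek_ms + rotational_latency_ms + transfer_ms


# Hypothetical disk, not taken from the slides.
print(block_read_time_ms(seek_ms=5.0, rpm=6000, transfer_ms=0.5))  # prints 10.5
```

With these made-up numbers the mechanical steps (seek and rotation) account for 10 of the 10.5 ms, which is why the later slides count block accesses rather than CPU work.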
5
5 Disk Access Characteristics
Steps of writing a block:
– Read the block into main memory
– Change the main memory copy of the block
– Write the new content back to disk
– Verify correctness of the write
6
6 How to find records efficiently?
– Primary key: sequential organization
– Search key? High I/O cost
– INDEXING
7
7 Cost of Indexing
Where is the time spent when answering a query?
– Fast: processing in memory
– Slow: fetching from secondary storage
Cost of indexing:
– Index on several attributes: fast retrieval but slow writes (the index structure must be maintained)
8
8 Topics
– Conventional indexes
– B-trees
– Hashing schemes (read only)
9
9 Sequential File
[Figure: sequential file, two records per block: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100)]
10
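10 Sequential File, Dense Index
[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100) and a dense index with one entry per record; index blocks (10, 20, 30, 40), (50, 60, 70, 80), (90, 100, 110, 120)]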
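A dense index lookup can be sketched as a binary search over the index alone. This is an illustration, not the slides' implementation: the in-memory lists stand in for disk blocks, and the names `blocks`, `dense_index` and `dense_lookup` are hypothetical. Only the ten records visible in the figure are used.

```python
from bisect import bisect_left

# Sequential file from the figure: two records per block, sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number, slot) entry per record, in key order.
dense_index = [(key, b, s)
               for b, block in enumerate(blocks)
               for s, key in enumerate(block)]
index_keys = [entry[0] for entry in dense_index]


def dense_lookup(key):
    """Return (block number, slot) for key, or None, using only the index."""
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        _, b, s = dense_index[i]
        return b, s
    return None  # absence is decided without touching the data file


print(dense_lookup(40))  # (1, 1)
print(dense_lookup(45))  # None
```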
11
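11 Sequential File, Sparse Index
[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100) and a sparse index with one entry per block; index blocks (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230)]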
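With a sparse index the search finds the last index entry whose key is not greater than the target, then reads that one data block. A minimal sketch under the same assumptions as before (in-memory lists instead of disk blocks; `sparse_keys` and `sparse_lookup` are hypothetical names):

```python
from bisect import bisect_right

# Sequential file from the figure: two records per block, sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: only the first key of each block.
sparse_keys = [block[0] for block in blocks]


def sparse_lookup(key):
    """Return (block number, slot) for key, or None."""
    b = bisect_right(sparse_keys, key) - 1  # last block whose first key <= key
    if b < 0:
        return None  # key is smaller than every key in the file
    block = blocks[b]  # in a real system this is one block read from the file
    return (b, block.index(key)) if key in block else None


print(sparse_lookup(40))  # (1, 1)
print(sparse_lookup(45))  # None, but only after reading block 2
```

Unlike the dense case, a miss such as 45 is only detected after the data block has been fetched.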
12
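12 Sequential File, Sparse 2nd level
[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); first-level sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230); second-level sparse index blocks (10, 90, 170, 250), (330, 410, 490, 570)]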
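The second level is simply a sparse index on the first-level index blocks, so a search applies the same step twice. A sketch using the three first-level blocks shown in the figure (the names `level1`, `level2` and `locate` are hypothetical):

```python
from bisect import bisect_right

# First-level sparse index blocks as drawn: four keys per index block.
level1 = [[10, 30, 50, 70], [90, 110, 130, 150], [170, 190, 210, 230]]

# Second-level sparse index: the first key of each first-level block.
level2 = [blk[0] for blk in level1]


def locate(key):
    """Return (index block, entry) of the first-level entry covering key."""
    i = bisect_right(level2, key) - 1      # pick the first-level block
    if i < 0:
        return None                        # key precedes the whole file
    j = bisect_right(level1[i], key) - 1   # pick the entry inside that block
    return i, j


print(locate(40))  # (0, 1): entry 30 in the first index block
print(locate(95))  # (1, 0): entry 90 in the second index block
```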
13
13 Sparse vs. Dense Tradeoff
– Sparse: less index space per record, so more of the index can be kept in memory
– Dense: can tell whether a record exists without accessing the file
14
14 Terms
– Index sequential file
– Search key (not necessarily the primary key)
– Primary index (on sequencing field)
– Secondary index
– Dense index (all search key values in)
– Sparse index
– Multi-level index
15
15 Next:
– Duplicate keys
– Deletion/Insertion
– Secondary indexes
16
16 Duplicate keys
[Figure: sequential file with duplicate key values, drawn from 10, 20, 30, 40, 45]
17
17 Duplicate keys
Dense index, one way to implement?
[Figure: dense index over the file with duplicate keys]
18
18 Duplicate keys
Dense index, better way?
[Figure: dense index with entries 10, 20, 30, 40 over the file with duplicate keys]
19
19 Duplicate keys
Sparse index, one way?
Careful if looking for 20 or 30!
[Figure: sparse index over the file with duplicate keys]
20
20 Duplicate keys
Sparse index, another way? Place the first new key from each block.
Should this be 40?
[Figure: sparse index over the file with duplicate keys]
21
21 Summary: duplicate values, primary index
Index may point to the first instance of each value only.
[Figure: File and Index, with values a, a, a, b]
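When the index points only to the first instance of each value, the remaining duplicates are found by scanning forward in the sorted file. A small sketch of that idea. The record list is made up (the figure shows only a, a, a, b), and `index` and `find_all` are hypothetical names.

```python
# Hypothetical sorted file with duplicate search-key values.
records = ["a", "a", "a", "b", "c", "c"]

# Index: one entry per distinct value, pointing at its first occurrence.
index = {}
for pos, value in enumerate(records):
    index.setdefault(value, pos)


def find_all(value):
    """Return the positions of every record with the given value."""
    start = index.get(value)
    if start is None:
        return []
    end = start
    # The file is sorted, so duplicates sit next to each other.
    while end < len(records) and records[end] == value:
        end += 1
    return list(range(start, end))


print(find_all("a"))  # [0, 1, 2]
```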
22
22 Deletion from sparse index
[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index entries 10, 30, 50, 70, 90, 110, 130, 150]
23
23 Deletion from sparse index – delete record 40
[Figure: same file and index; record 40 is removed]
24
24 Deletion from sparse index – delete record 30
[Figure: same file and index; the index entry 30 is replaced by 40]
25
25 Deletion from sparse index – delete records 30 & 40
[Figure: same file and index; the index entries 50 and 70 move up]
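The three deletion cases can be sketched in one function: nothing changes in the index, the block's index key changes, or the index entry disappears. This is an illustrative in-memory version, assuming one index entry per block that holds the block's first key. The name `delete` and its signature are hypothetical.

```python
from bisect import bisect_right


def delete(blocks, index, key):
    """Delete key from the file and keep the sparse index consistent.

    blocks[i] is a sorted list of keys; index[i] is the first key of blocks[i].
    Returns True if the key was found and removed.
    """
    i = bisect_right(index, key) - 1
    if i < 0 or key not in blocks[i]:
        return False
    blocks[i].remove(key)
    if not blocks[i]:
        # Block is now empty: drop it and its index entry,
        # so later entries move up.
        del blocks[i]
        del index[i]
    else:
        # Only changes anything if the deleted key was the block's first key.
        index[i] = blocks[i][0]
    return True


blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [10, 30, 50, 70]
delete(blocks, index, 40)
print(index)  # [10, 30, 50, 70]: index untouched
delete(blocks, index, 30)
print(index)  # [10, 50, 70]: entries 50 and 70 moved up
```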
26
26 Deletion from dense index
[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80); dense index entries 10, 20, 30, 40, 50, 60, 70, 80]
27
27 Deletion from dense index – delete record 30
[Figure: same file and index; 40 takes the place of 30]
28
28 Insertion, sparse index case
[Figure: sequential file with blocks (10, 20), (30), (40, 50), (60); sparse index entries 10, 30, 40, 60]
29
29 Insertion, sparse index case – insert record 34
Our lucky day! We have free space where we need it!
30
30 Insertion, sparse index case – insert record 15
[Figure: 15 joins 10 in the first block, 20 moves to the second block next to 30, and the index entry 30 becomes 20]
Illustrated: immediate reorganization
Variation:
– insert new block (chained file)
– update index
31
31 Insertion, sparse index case – insert record 25
Overflow blocks (reorganize later...)
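The lucky case (free space) and the overflow-block case can be sketched together. This is a simplified in-memory model, not the slides' implementation: `BLOCK_CAPACITY`, `insert` and the per-block `overflow` lists are hypothetical, and immediate reorganization is deliberately left out.

```python
from bisect import bisect_right, insort

BLOCK_CAPACITY = 2  # records per block, as in the figures


def insert(blocks, overflow, index, key):
    """Insert key into the block that the sparse index selects.

    blocks[i] is a sorted list; overflow[i] holds records that did not fit
    and would be merged in at a later reorganization. The index changes
    only when key is smaller than every key already in the file.
    """
    i = max(bisect_right(index, key) - 1, 0)
    if len(blocks[i]) < BLOCK_CAPACITY:
        insort(blocks[i], key)      # free space where we need it
    else:
        overflow[i].append(key)     # block full: chain an overflow record
    index[i] = min(index[i], key)   # keep the entry a lower bound for its chain


blocks = [[10, 20], [30], [40, 50], [60]]
overflow = [[] for _ in blocks]
index = [10, 30, 40, 60]
insert(blocks, overflow, index, 34)
insert(blocks, overflow, index, 25)
print(blocks)    # [[10, 20], [30, 34], [40, 50], [60]]
print(overflow)  # [[25], [], [], []]
```

A lookup in this model has to check the overflow list of the selected block as well, which is the price of postponing reorganization.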
32
32 Insertion, dense index case
– Similar
– Often more expensive...
33
33 Summary so far
Conventional index
– Basic ideas: sparse, dense, multi-level…
– Duplicate keys
– Deletion/Insertion
– Secondary indexes
34
34 Conventional indexes
Advantage:
- Simple
- Index is a sequential file, good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance
35
35 NEXT: Another type of index
– Give up on sequentiality of index
– Try to get “balance”
36
36 B+Tree Example (n = 3)
[Figure: root with key 100; non-leaf nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]
37
37 Sample non-leaf node
Keys: 57, 81, 95
Pointers lead, left to right, to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95
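Choosing which pointer to follow in a non-leaf node is a binary search over its keys. A minimal sketch for the sample node, assuming the usual convention that a key equal to a separator goes to the right (`child_slot` is a hypothetical helper name):

```python
from bisect import bisect_right


def child_slot(keys, k):
    """Return the position of the pointer to follow for search key k.

    Slot 0 covers k < keys[0]; slot i covers keys[i-1] <= k < keys[i];
    the last slot covers k >= keys[-1].
    """
    return bisect_right(keys, k)


keys = [57, 81, 95]
print([child_slot(keys, k) for k in (56, 57, 80, 81, 95, 100)])
# [0, 1, 1, 2, 3, 3]
```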
38
38 Sample leaf node
Keys: 57, 81, 95
– From non-leaf node
– To next leaf in sequence
– To record with key 57
– To record with key 81
– To record with key 95
39
39 Size of nodes: n+1 pointers, n keys (fixed)
40
40 Don’t want nodes to be too empty
Use at least:
– Non-leaf: ⌈(n+1)/2⌉ pointers
– Leaf: ⌊(n+1)/2⌋ pointers to data
41
41 Full node vs. min. node (n = 3)
– Non-leaf: full (120, 150, 180); min. (30)
– Leaf: full (3, 5, 11); min. (30, 35)
A pointer counts even if null.
42
42 B+tree rules (tree of order n)
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”
43
43 (3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs to data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉          ⌈(n+1)/2⌉ − 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋          ⌊(n+1)/2⌋
Root                  n+1        n          1                  1
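The bounds can be checked for a concrete n with a few lines of integer arithmetic. The sketch assumes the usual rounding for these minimums, ceiling for non-leaf nodes and floor for leaves. The function name `bplus_bounds` and the dictionary layout are hypothetical.

```python
def bplus_bounds(n):
    """Return max/min pointer and key counts for a B+tree of order n."""
    half_up = (n + 2) // 2    # ceil((n + 1) / 2) for integer n
    half_down = (n + 1) // 2  # floor((n + 1) / 2)
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": half_up, "min_keys": half_up - 1},
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": half_down, "min_keys": half_down},
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 1, "min_keys": 1},
    }


for kind, bounds in bplus_bounds(3).items():
    print(kind, bounds)
```

For n = 3 this gives a minimum of 2 pointers (1 key) in a non-leaf node and 2 keys in a leaf. For n = 4 the non-leaf minimum rises to 3 pointers while the leaf minimum stays at 2 keys.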
44
44 Insert into B+tree (read only)
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
45
45 (a) Insert key = 32 (n = 3)
[Figure: parent with keys 30, 100; leaves (3, 5, 11) and (30, 31); 32 goes into the free slot of leaf (30, 31)]
46
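46 (b) Insert key = 7 (n = 3)
[Figure: parent with keys 30, 100; leaves (3, 5, 11) and (30, 31); leaf (3, 5, 11) is full, so it splits and key 7 is added to the parent]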
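Cases (a) and (b) differ only in whether the leaf still fits after the new key goes in. A sketch of the leaf-level step, assuming that on overflow the left leaf keeps ⌈(n+1)/2⌉ keys and the first key of the new right leaf is copied to the parent. The function name and return shape are hypothetical, and parent updates are out of scope here.

```python
from bisect import insort


def insert_into_leaf(leaf, key, n):
    """Insert key into a sorted leaf that may hold at most n keys.

    Returns (left, right, separator). right and separator are None
    when the key fits without a split.
    """
    keys = list(leaf)
    insort(keys, key)
    if len(keys) <= n:
        return keys, None, None          # case (a): space available
    mid = (len(keys) + 1) // 2           # left keeps ceil((n + 1) / 2) keys
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]         # case (b): separator goes to the parent


print(insert_into_leaf([30, 31], 32, n=3))   # ([30, 31, 32], None, None)
print(insert_into_leaf([3, 5, 11], 7, n=3))  # ([3, 5], [7, 11], 7)
```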
47
47 Deletion from B+tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
48
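48 (b) Coalesce with sibling – Delete 50 (n = 4)
[Figure: parent with keys 10, 40, 100; leaves (10, 20, 30) and (40, 50); after deleting 50, key 40 moves into the sibling leaf]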
49
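49 (c) Redistribute keys – Delete 50 (n = 4)
[Figure: parent with keys 10, 40, 100; leaves (10, 20, 30, 35) and (40, 50); after deleting 50, key 35 moves to the right leaf and becomes the new parent key]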
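The choice between cases (b) and (c) at leaf level can be sketched as follows. The policy used here, coalesce when both leaves fit in one node and otherwise borrow one key from the left sibling, is an assumption read off the two examples rather than something the slides state. The function name and return shape are hypothetical, and the parent update is only reported, not performed.

```python
def delete_from_right_leaf(left, right, key, n):
    """Delete key from `right`, the right one of two adjacent sibling leaves.

    Returns (leaves, action, separator), where action is "none",
    "coalesce" or "redistribute", and separator is the new parent key
    after a redistribution (None otherwise).
    """
    min_keys = (n + 1) // 2               # floor((n + 1) / 2) keys per leaf
    left = list(left)
    right = [k for k in right if k != key]
    if len(right) >= min_keys:
        return [left, right], "none", None
    if len(left) + len(right) <= n:
        # Everything fits in one leaf; the parent loses a key and a pointer.
        return [left + right], "coalesce", None
    right.insert(0, left.pop())           # borrow the sibling's largest key
    return [left, right], "redistribute", right[0]


print(delete_from_right_leaf([10, 20, 30], [40, 50], 50, n=4))
# ([[10, 20, 30, 40]], 'coalesce', None)
print(delete_from_right_leaf([10, 20, 30, 35], [40, 50], 50, n=4))
# ([[10, 20, 30], [35, 40]], 'redistribute', 35)
```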
50
50 B+tree deletions in practice
– Often, coalescing is not implemented
– Too hard and not worth it!