CSCE 520 Test 2 Info: Indexing (modified from slides of Hector Garcia-Molina and Jeff Ullman)
Physical Storage Media
- Speed of data access
- Cost per unit of data
- Reliability
  - Data loss (power failure or system crash)
  - Physical failure (storage device)
- Storage types
  - Volatile storage
  - Non-volatile storage
Memory Hierarchy (figure not reproduced; levels shown: cache, main memory, virtual memory, disk/file system, tertiary storage, with DBMS programs in main memory)
Disk Access Characteristics
Move data to main memory:
- Position the head on the cylinder
- Find and access the sector
Steps of reading a block:
(1) The processor and disk controller process the request
(2) Seek time: position the head over the cylinder
(3) Rotational latency: rotate the sector under the head
(4) Transfer time: the head reads the sector/block
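The read steps above add up to a simple access-time estimate. This is a minimal sketch, assuming the average rotational latency is half a revolution and that a block occupies consecutive sectors on a single track; controller overhead and inter-sector gaps are ignored, and the function name `block_read_time_ms` and all disk parameters are hypothetical.

```python
def block_read_time_ms(seek_ms: float, rpm: float,
                       sectors_per_track: int, sectors_per_block: int) -> float:
    """Estimate the time to read one block: seek + rotational latency + transfer.

    Assumptions (illustrative only): average rotational latency is half a
    revolution, the block's sectors are consecutive on one track, and
    controller overhead and gaps between sectors are ignored.
    """
    ms_per_rotation = 60_000 / rpm                # one full revolution, in ms
    rotational_latency = ms_per_rotation / 2      # on average, half a turn
    transfer = ms_per_rotation * sectors_per_block / sectors_per_track
    return seek_ms + rotational_latency + transfer


# Hypothetical disk: 10 ms seek, 6,000 RPM, 100 sectors/track, 10-sector blocks.
print(block_read_time_ms(10, 6000, 100, 10))  # 10 + 5 + 1 = 16.0 ms
```

With these made-up numbers the mechanical delays (seek and rotation) dominate the actual transfer, which is why indexing aims to cut the number of block accesses.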
Disk Access Characteristics
Steps of writing a block:
(1) Read the block into main memory
(2) Change the main-memory copy of the block
(3) Write the new content back to disk
(4) Verify the correctness of the write
How to find records efficiently?
- By primary key: sequential organization (records stored in key order)
- By some other search key: scanning the file has a high I/O cost
- Answer: INDEXING
Cost of Indexing
Where is the time spent answering a query?
- Fast: processing in memory
- Slow: fetching from secondary storage
Cost of indexing:
- Indexes on several attributes: fast retrieval but slow writes (each index structure must be maintained)
Topics
- Conventional indexes
- B-trees
- Hashing schemes (read only)
Sequential File
Sequential File: Dense Index
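A dense index holds one entry for every record in the sequential file. The following is a minimal sketch, not the course's implementation: it assumes distinct keys, models the file as a list of blocks of `(key, payload)` records, and uses hypothetical names (`FILE`, `build_dense_index`, `dense_lookup`). A missing key is detected from the index alone, without reading a data block.

```python
from bisect import bisect_left

# Hypothetical sequential file: blocks of (key, payload) records sorted on the key.
# Two records per block is an arbitrary choice for the example.
FILE = [
    [(10, "r10"), (20, "r20")],
    [(30, "r30"), (40, "r40")],
    [(50, "r50"), (60, "r60")],
]


def build_dense_index(blocks):
    """One (key, block_no, slot) entry for every record, in key order."""
    return [(key, b, s)
            for b, block in enumerate(blocks)
            for s, (key, _payload) in enumerate(block)]


def dense_lookup(index, blocks, key):
    """Return the payload for `key`, or None if no record has that key."""
    i = bisect_left(index, (key,))          # first entry whose key is >= key
    if i == len(index) or index[i][0] != key:
        return None                         # absence decided from the index alone
    _, b, s = index[i]
    return blocks[b][s][1]                  # one data-block access to fetch it
```

For example, `dense_lookup(build_dense_index(FILE), FILE, 40)` returns `"r40"`, and a lookup for 35 returns `None` without touching `FILE`.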
Sequential File: Sparse Index
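A sparse index keeps one entry per data block, typically the block's first key. The lookup finds the last entry not greater than the search key and then must read that block to decide whether the record exists. This is a sketch under the same assumptions as a dense-index example would use (distinct keys, a list of blocks of `(key, payload)` records); `build_sparse_index` and `sparse_lookup` are hypothetical names.

```python
from bisect import bisect_right

# Hypothetical sequential file, sorted on the key, two records per block.
FILE = [
    [(10, "r10"), (20, "r20")],
    [(30, "r30"), (40, "r40")],
    [(50, "r50"), (60, "r60")],
]


def build_sparse_index(blocks):
    """One entry per block: the first key stored in that block."""
    return [block[0][0] for block in blocks]


def sparse_lookup(index, blocks, key):
    """Return the payload for `key`, or None (assumes distinct keys)."""
    b = bisect_right(index, key) - 1    # last entry <= key names the only candidate block
    if b < 0:
        return None                     # smaller than every key in the file
    for k, payload in blocks[b]:        # the block must be read to know for sure
        if k == key:
            return payload
    return None
```

Here the index has 3 entries for 6 records, so a lookup for 35 has to scan block 1 before returning `None`.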
Sequential File: Sparse Index with a 2nd Level
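A second level treats the first-level index itself as a sequential file and builds a sparse index on its blocks. The sketch below is illustrative only: it assumes distinct keys and a hypothetical `ENTRIES_PER_INDEX_BLOCK` of 2 (real index blocks hold far more entries), and `build_two_level` and `two_level_lookup` are made-up names.

```python
from bisect import bisect_right

ENTRIES_PER_INDEX_BLOCK = 2   # hypothetical; chosen small so the example has 2 index blocks

# Hypothetical sequential file, sorted on the key, two records per block.
FILE = [
    [(10, "r10"), (20, "r20")],
    [(30, "r30"), (40, "r40")],
    [(50, "r50"), (60, "r60")],
    [(70, "r70"), (80, "r80")],
]


def build_two_level(blocks, per_block=ENTRIES_PER_INDEX_BLOCK):
    # Level 1: sparse index on the data file, entries (first key, data block no.).
    level1 = [(blk[0][0], b) for b, blk in enumerate(blocks)]
    # Pack the level-1 entries into index blocks.
    index_blocks = [level1[i:i + per_block] for i in range(0, len(level1), per_block)]
    # Level 2: sparse index on the level-1 index blocks (first key of each).
    level2 = [ib[0][0] for ib in index_blocks]
    return level2, index_blocks


def two_level_lookup(level2, index_blocks, blocks, key):
    i = bisect_right(level2, key) - 1                 # which level-1 index block
    if i < 0:
        return None
    ib = index_blocks[i]
    j = bisect_right([k for k, _ in ib], key) - 1     # which entry in that block
    _, b = ib[j]
    for k, payload in blocks[b]:                      # finally read the data block
        if k == key:
            return payload
    return None
```

A lookup reads one level-2 entry, one level-1 index block, and one data block; if the small level-2 index stays in memory, only the last two cost I/O.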
Sparse vs. Dense Tradeoff
- Sparse: less index space per record, so more of the index can be kept in memory
- Dense: can tell whether any record with a given key exists without accessing the file
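To make the space side of the tradeoff concrete, here is a back-of-the-envelope calculation. All sizes are hypothetical (1,000,000 records, 10 records per data block, 100 index entries per index block), and `index_blocks_needed` is an illustrative helper. A sparse index needs one entry per data block instead of one per record, and a second level on the sparse index shrinks it again.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b (avoids floating point)."""
    return -(-a // b)


def index_blocks_needed(num_records, records_per_block, entries_per_index_block):
    """Index blocks for a dense index, a sparse index, and a 2nd level on the sparse one."""
    data_blocks = ceil_div(num_records, records_per_block)
    dense = ceil_div(num_records, entries_per_index_block)    # one entry per record
    sparse = ceil_div(data_blocks, entries_per_index_block)   # one entry per data block
    second_level = ceil_div(sparse, entries_per_index_block)  # one entry per sparse-index block
    return dense, sparse, second_level


# Hypothetical sizes: 1,000,000 records, 10 records/block, 100 entries/index block.
print(index_blocks_needed(1_000_000, 10, 100))  # (10000, 1000, 10)
```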
Terms
- Index sequential file
- Search key (≠ primary key)
- Primary index (on the sequencing field)
- Secondary index
- Dense index (all search-key values in it)
- Sparse index
- Multi-level index
Next:
- Duplicate keys
- Deletion/insertion
- Secondary indexes
Duplicate keys
Duplicate keys: dense index, one way to implement?
Duplicate keys: dense index, better way?
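One common reading of the "better way" is a dense index with one entry per distinct key value, pointing at the first record with that key; the remaining duplicates are found by scanning forward in the sorted file. This is a sketch under that assumption, with records as `(key, payload)` pairs and hypothetical names (`FILE`, `build_distinct_dense_index`, `lookup_all`).

```python
from bisect import bisect_left

# Hypothetical sequential file with duplicate search keys, sorted on the key.
FILE = [
    [(10, "a"), (10, "b")],
    [(10, "c"), (20, "d")],
    [(20, "e"), (30, "f")],
]


def build_distinct_dense_index(blocks):
    """One (key, block_no, slot) entry per distinct key, pointing at its first record."""
    index = []
    for b, block in enumerate(blocks):
        for s, (key, _payload) in enumerate(block):
            if not index or index[-1][0] != key:
                index.append((key, b, s))
    return index


def lookup_all(index, blocks, key):
    """All payloads with `key`: jump to the first one, then scan forward."""
    i = bisect_left(index, (key,))
    if i == len(index) or index[i][0] != key:
        return []
    _, b, s = index[i]
    out = []
    while b < len(blocks):
        while s < len(blocks[b]):
            k, payload = blocks[b][s]
            if k != key:
                return out          # past the last duplicate
            out.append(payload)
            s += 1
        b, s = b + 1, 0             # duplicates may continue into the next block
    return out
```

The index stays as small as the number of distinct values, and the file's sort order guarantees all duplicates are contiguous.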
Duplicate keys: sparse index, one way? Careful if looking for 20 or 30!
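The warning matters because, when the index holds the first key of each block, records with key k can begin in the block before the first index entry equal to k. A sketch of a careful lookup, assuming records are `(key, payload)` pairs and using hypothetical names: it starts at the last entry strictly less than k and scans forward. Starting at the first entry equal to 20 would jump to block 2 and miss record `"d"`.

```python
from bisect import bisect_left

# Hypothetical file where 20 and 30 straddle block boundaries.
FILE = [
    [(10, "a"), (10, "b")],
    [(10, "c"), (20, "d")],
    [(20, "e"), (30, "f")],
    [(30, "g"), (30, "h")],
]


def build_sparse_index(blocks):
    """First key of each block; with duplicates, the same key can appear twice."""
    return [block[0][0] for block in blocks]


def sparse_lookup_all(index, blocks, key):
    # Start at the last block whose first key is strictly less than `key`:
    # no earlier block can contain `key`, but this one might.
    b = max(bisect_left(index, key) - 1, 0)
    out = []
    for block in blocks[b:]:
        for k, payload in block:
            if k == key:
                out.append(payload)
            elif k > key:
                return out          # sorted file: no more matches
    return out
```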
Duplicate keys: sparse index, another way? Place in the index the first new key from each block (should this entry be 40?)
Duplicate values, primary index: summary
- The index may point to the first instance of each value only
Deletion from sparse index
Deletion from sparse index: delete record 40
Deletion from sparse index: delete record 30 (the block's index entry becomes 40, its new first key)
Deletion from sparse index: delete records 30 & …
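The three deletion cases can be captured in one function: deleting a key that is not first in its block leaves the index unchanged; deleting a block's first key updates that block's entry; emptying a block removes its entry. This is a minimal sketch assuming distinct keys, records reduced to bare keys, and that an emptied block is released (one possible policy); `delete_from_sparse` is a hypothetical name.

```python
from bisect import bisect_right


def delete_from_sparse(blocks, index, key):
    """Delete `key` from a sorted file with a sparse index; True if it was found.

    blocks: list of sorted key lists, one per data block (records reduced to keys)
    index:  index[b] == blocks[b][0], the first key of each data block
    Assumptions: distinct keys; an emptied block is released.
    """
    b = bisect_right(index, key) - 1          # only block that could hold `key`
    if b < 0 or key not in blocks[b]:
        return False
    blocks[b].remove(key)
    if not blocks[b]:                         # block emptied: drop it and its entry
        del blocks[b]
        del index[b]
    else:                                     # first key may have changed
        index[b] = blocks[b][0]
    return True
```

For instance, with blocks `[[10, 20], [30, 40], [50, 60], [70, 80]]` and index `[10, 30, 50, 70]`, deleting 30 turns the second entry into 40.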
Deletion from dense index
Deletion from dense index: delete record 30 (its index entry is removed; 40 moves up)
Insertion, sparse index case
Insertion, sparse index case: insert record. Our lucky day! We have free space where we need it!
Insertion, sparse index case: insert record
- Illustrated: immediate reorganization
- Variation:
  - insert a new block (chained file)
  - update the index
Insertion, sparse index case: insert record into overflow blocks (reorganize later...)
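The "lucky day" and overflow-block cases can be sketched together. This is an illustration only, assuming distinct keys, records reduced to keys, a hypothetical `CAPACITY` of 2 records per block, and one unsorted overflow chain per primary block; `insert_sparse` and `contains` are made-up names. Immediate reorganization (shifting records into later blocks) is not shown.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # hypothetical number of records per primary block


def insert_sparse(blocks, overflow, index, key):
    """Insert `key` into a sorted file with a sparse index (distinct keys assumed).

    blocks[b]:   sorted keys in primary block b (at most CAPACITY)
    overflow[b]: keys in the overflow chain of block b, unsorted until reorganized
    index[b]:    first key of primary block b
    """
    b = max(bisect_right(index, key) - 1, 0)   # keys below index[0] go to block 0
    if len(blocks[b]) < CAPACITY:
        insort(blocks[b], key)                 # free space where we need it
        index[b] = blocks[b][0]                # changes only if key is the new minimum
    else:
        overflow[b].append(key)                # no room: overflow block, reorganize later


def contains(blocks, overflow, index, key):
    """A lookup must now check the primary block and its overflow chain."""
    b = max(bisect_right(index, key) - 1, 0)
    return key in blocks[b] or key in overflow[b]
```

The index is untouched in the overflow case, which keeps inserts cheap, but every later lookup pays for scanning the chain until the file is reorganized.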
Insertion, dense index case
- Similar
- Often more expensive (every inserted record needs its own index entry)...
Summary so far
Conventional index
- Basic ideas: sparse, dense, multi-level…
- Duplicate keys
- Deletion/insertion
- Secondary indexes
Conventional indexes
Advantages:
- Simple
- Index is a sequential file, good for scans
Disadvantages:
- Inserts are expensive, and/or
- We lose sequentiality and balance
NEXT: Another type of index
- Give up on sequentiality of the index
- Try to get “balance”
B+tree Example (figure not reproduced: a tree of order n with its root marked)
Sample non-leaf node (keys 57, 81, 95):
- to keys k < 57
- to keys 57 ≤ k < 81
- to keys 81 ≤ k < 95
- to keys k ≥ 95
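Choosing which pointer to follow in a non-leaf node amounts to counting how many keys are less than or equal to the search key. A minimal sketch using Python's `bisect_right`; `child_slot` is a hypothetical name, and pointer slots are numbered from 0.

```python
from bisect import bisect_right


def child_slot(keys, search_key):
    """Index of the pointer to follow in a non-leaf node with sorted `keys`.

    Pointer i covers keys[i-1] <= k < keys[i] (open-ended at both ends),
    so the slot equals the number of keys <= search_key.
    """
    return bisect_right(keys, search_key)


node_keys = [57, 81, 95]
print([child_slot(node_keys, k) for k in (10, 57, 80, 81, 94, 95, 200)])
# [0, 1, 1, 2, 2, 3, 3]
```

Using `bisect_right` rather than `bisect_left` sends a key equal to a separator (57, 81, 95) to the right-hand pointer, matching the ranges 57 ≤ k < 81 and so on.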
Sample leaf node (keys 57, 81, 85), reached from a non-leaf node:
- to record with key 57
- to record with key 81
- to record with key 85
- to the next leaf in sequence
Size of nodes: n + 1 pointers, n keys (fixed)
Don’t want nodes to be too empty. Use at least:
- Non-leaf: ⌈(n+1)/2⌉ pointers
- Leaf: ⌊(n+1)/2⌋ pointers to data
Full node vs. min. node, for non-leaf and leaf nodes (figure not reproduced; annotation: a pointer “counts even if null”)
B+tree rules (tree of order n)
(1) All leaves are at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”
(3) Number of pointers/keys for a B+tree

Node type            | Max ptrs | Max keys | Min ptrs (to data) | Min keys
Non-leaf (non-root)  | n+1      | n        | ⌈(n+1)/2⌉          | ⌈(n+1)/2⌉ - 1
Leaf (non-root)      | n+1      | n        | ⌊(n+1)/2⌋          | ⌊(n+1)/2⌋
Root                 | n+1      | n        | 1                  | 1
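The table's bounds are easy to get wrong by one, so here they are computed with integer arithmetic. This is a sketch that simply transcribes rule (3); `bplus_limits` and its dictionary keys are hypothetical names. It uses ⌈(n+1)/2⌉ = (n+2) // 2 and ⌊(n+1)/2⌋ = (n+1) // 2.

```python
def bplus_limits(n):
    """Max/min pointer and key counts for a B+tree of order n (rule (3))."""
    ceil_half = (n + 2) // 2     # ceil((n+1)/2)
    floor_half = (n + 1) // 2    # floor((n+1)/2)
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": ceil_half, "min_keys": ceil_half - 1},
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_data_ptrs": floor_half, "min_keys": floor_half},
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 1, "min_keys": 1},
    }
```

For n = 4, a non-root non-leaf node needs at least 3 pointers (2 keys), and a non-root leaf needs at least 2 data pointers.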
Insert into B+tree (read only)
(a) Simple case: space available in leaf
(b) Leaf overflow
(c) Non-leaf overflow
(d) New root
(a) Insert key = 32
(b) Insert key = 7
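The four insertion cases fit in one compact sketch. This is an insert-only, in-memory B+tree of order n written for illustration, not the slides' code: keys stand in for records (no data pointers), keys are assumed distinct, and the class and method names are hypothetical. A leaf that overflows to n + 1 keys keeps the first ⌈(n+1)/2⌉ and copies the first key of the new right leaf up. An overflowing non-leaf keeps ⌈n/2⌉ keys, moves its middle key up, and passes the rest to a new sibling. A split root adds a level.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []        # sorted keys (stand-ins for records in leaves)
        self.children = []    # child nodes (non-leaf only)
        self.next = None      # sequence pointer to the next leaf (leaf only)


class BPlusTree:
    """Hypothetical insert-only B+tree of order n: at most n keys per node."""

    def __init__(self, n):
        self.n = n
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:                    # (d) root split: new root, tree grows
            sep, right = split
            root = Node(leaf=False)
            root.keys, root.children = [sep], [self.root, right]
            self.root = root

    def _insert(self, node, key):
        """Insert below `node`; return (separator, new right sibling) if it split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.n:         # (a) space available in leaf
                return None
            mid = (self.n + 2) // 2              # (b) leaf overflow: keep ceil((n+1)/2)
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right          # first key of right leaf is copied up
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, new_child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, new_child)
        if len(node.keys) <= self.n:
            return None
        k = (self.n + 1) // 2                    # (c) non-leaf overflow: keep ceil(n/2) keys
        right = Node(leaf=False)
        right.keys, right.children = node.keys[k + 1:], node.children[k + 1:]
        up = node.keys[k]                        # middle key moves up (not copied)
        node.keys, node.children = node.keys[:k], node.children[:k + 1]
        return up, right

    def leaf_keys(self):
        """All keys in order, walking the leaves through their sequence pointers."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.extend(node.keys)
            node = node.next
        return out
```

For example, with `t = BPlusTree(3)`, inserting 1 through 4 splits the root leaf into `[1, 2]` and `[3, 4]` under a new root with key 3. Both halves of every split meet the minimums in rule (3), so insertions alone never leave a node underfull.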
Deletion from B+tree
(a) Simple case: no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at a non-leaf
(b) Coalesce with sibling: delete example, n = 4 (figure not reproduced; key 40 shown)
(c) Redistribute keys: delete example, n = 4 (figure not reproduced; key 35 shown)
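At the leaf level, cases (b) and (c) reduce to a choice between two siblings: if the neighbor has a key to spare, borrow one and update the parent's separator; otherwise merge the two leaves, which always fit because (⌊(n+1)/2⌋ - 1) + ⌊(n+1)/2⌋ ≤ n. This is a sketch of that one step only, assuming key lists stand in for leaves and that redistribution borrows a single key; `fix_underfull_leaf` is a hypothetical helper, and case (d), repairing the parent, is not shown.

```python
def fix_underfull_leaf(left, right, n):
    """Repair two adjacent sibling leaves after a deletion (leaf level only).

    left, right: sorted key lists of sibling leaves (left precedes right).
    Returns (left, right, separator). When the leaves are coalesced, right and
    separator are None, and the parent must drop the separator between them.
    """
    min_keys = (n + 1) // 2                 # floor((n+1)/2) for non-root leaves
    if len(left) >= min_keys and len(right) >= min_keys:
        return left, right, right[0]        # nothing to fix
    if len(left) < min_keys and len(right) > min_keys:
        # (c) redistribute: borrow the smallest key of the right sibling
        left, right = left + right[:1], right[1:]
        return left, right, right[0]
    if len(right) < min_keys and len(left) > min_keys:
        # (c) redistribute: borrow the largest key of the left sibling
        left, right = left[:-1], left[-1:] + right
        return left, right, right[0]
    # (b) neither sibling can spare a key: coalesce into one leaf
    return left + right, None, None
```

With n = 4, for example, leaves `[10, 20, 30]` and `[40]` become `[10, 20]` and `[30, 40]` with separator 30, while `[10, 20]` and `[40]` coalesce into `[10, 20, 40]`.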
B+tree deletions in practice
- Often, coalescing is not implemented
- Too hard and not worth it!