Published by Ada Powers. Modified over 8 years ago.
1
Jun-Ki Min
2
Such a multi-level index is a form of search tree.
◦ However, insertion and deletion of index entries is a severe problem because every level of the index is an ordered file.
3
FIGURE 14.8
4
Most multi-level indexes use B-tree or B+-tree data structures because of the insertion and deletion problem.
◦ This leaves space in each tree node (disk block) to allow for new index entries.
These data structures are variations of search trees that allow efficient insertion and deletion of search values.
In B-tree and B+-tree data structures, each node corresponds to a disk block.
Each node is kept between half full and completely full.
5
An insertion into a node that is not full is quite efficient.
◦ If a node is full, the insertion causes a split into two nodes.
Splitting may propagate to other tree levels.
A deletion is quite efficient if a node does not become less than half full.
If a deletion causes a node to become less than half full, it must be merged with neighboring nodes.
6
B-tree (degree = m)
◦ m-way search tree
1. Every node other than the root and the leaves has at least ⌈m/2⌉ and at most m subtrees, so it holds at least ⌈m/2⌉-1 and at most m-1 keys.
2. If the root is not a leaf, it has at least two subtrees.
3. All leaves are on the same level (balanced tree).
Note: degree is the maximum number of subtrees.
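The occupancy rules above can be turned into a small calculation. The following is a minimal sketch; the function name `btree_bounds` and the returned field names are hypothetical and chosen only for illustration.

```python
def btree_bounds(m):
    """Occupancy bounds for a non-root internal node of a B-tree of degree m.

    Assumes m >= 3; the names used here are illustrative, not from the slides.
    """
    if m < 3:
        raise ValueError("degree m must be at least 3")
    min_subtrees = (m + 1) // 2          # integer form of ceil(m / 2)
    return {
        "min_subtrees": min_subtrees,
        "max_subtrees": m,
        "min_keys": min_subtrees - 1,    # one key fewer than subtrees
        "max_keys": m - 1,
    }


bounds = btree_bounds(5)
print(bounds)
```

For degree 5, a node therefore holds between 2 and 4 keys, which is the occupancy used when checking for overflow and underflow.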
7
8
9
B-tree
◦ Random access: branch by search key
◦ Sequential access: in-order traversal
◦ Insert/delete: keep the tree balanced
split: on node overflow
merge: on node underflow
Insert
◦ Insertion is done at a leaf node
leaf has free space: simple insertion
overflow (no free space): the leaf would hold m keys
1) split the node
2) insert the ⌈m/2⌉-th key into the parent node
3) the remaining keys form the left and right nodes
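The leaf-level insertion steps can be sketched as follows. This is an illustrative sketch only: the function name `insert_into_leaf`, the list-based node representation, and the sample keys are assumptions, and propagation of the promoted key into the parent is left out.

```python
import bisect


def insert_into_leaf(keys, key, m):
    """Insert key into a sorted leaf of a degree-m B-tree.

    Returns (left, promoted, right). When the leaf has free space,
    promoted and right are None and left is the updated leaf.
    Hypothetical helper; the parent update is not shown.
    """
    new_keys = list(keys)
    bisect.insort(new_keys, key)             # keep keys in ascending order
    if len(new_keys) <= m - 1:               # leaf still fits: simple insertion
        return new_keys, None, None
    mid = (m + 1) // 2 - 1                   # 0-based index of the ceil(m/2)-th key
    return new_keys[:mid], new_keys[mid], new_keys[mid + 1:]


# Degree 5: a full leaf splits and its 3rd key moves up to the parent.
left, promoted, right = insert_into_leaf([74, 75, 76, 78], 77, 5)
print(left, promoted, right)
```

If the parent is itself full, the same function could be applied to the parent's keys with the promoted key, which is how a split propagates upward.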
10
[Figure: B-tree insertion example with keys 50, 57, 58, 59, and 60 in nodes b, f, o, o', and p; diagram not recoverable.]
11
Split caused by inserting 54
◦ 54 goes to parent node f
◦ Inserting 54 into parent node f: 58 goes to parent node b
[Figure: nodes f, f', o, o', o'', and p before and after the splits; diagram not recoverable.]
12
Parent node b, insert 58
◦ 43 goes to parent node a
Parent node a, insert 43
[Figure: nodes a, b, b', c, d, e, f, and f' before and after the splits; diagram not recoverable.]
13
Delete
◦ Deletion is done at a leaf node
◦ If the key to delete is not in a leaf node, swap it with the following key, then delete it from the leaf
◦ If # of keys < ⌈m/2⌉-1: underflow
1. Redistribution: possible when a sibling node has at least ⌈m/2⌉ keys
(parent node key → underflow node)
(sibling node key → parent node)
2. Merge: when redistribution is not possible
(sibling node + parent node key + underflow node)
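The choice between redistribution and merge can be sketched for one pair of adjacent siblings. This is a minimal sketch under assumptions: the function name `rebalance` and the list-based nodes are hypothetical, exactly one of the two nodes is assumed to be in underflow, and the update of the parent node itself is not shown.

```python
def rebalance(left, sep, right, m):
    """Repair an underflow between two adjacent siblings of a degree-m B-tree.

    left/right are the sorted key lists of the siblings and sep is the
    parent key between them. Returns (new_left, new_sep, new_right);
    new_sep is None when the nodes were merged (new_right is then empty).
    """
    min_keys = (m + 1) // 2 - 1              # ceil(m/2) - 1
    pool = left + [sep] + right              # sibling + parent key + underflow node
    if max(len(left), len(right)) > min_keys:
        # A sibling can spare a key: redistribute, and send the
        # ceil(m/2)-th key of the pool up to the parent.
        i = (m + 1) // 2 - 1
        return pool[:i], pool[i], pool[i + 1:]
    # No sibling can spare a key: merge everything into one node.
    return pool, None, []


print(rebalance([2, 7, 40], 72, [75], 5))    # redistribution
print(rebalance([75], 76, [77, 78], 5))      # merge
```

After a merge the parent loses one key, so the parent may underflow in turn and need the same treatment one level up.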
14
[Figure: B-tree deletion examples (delete 60 and delete 20); diagrams not recoverable.]
15
Insert 77 (split)
Before: root [72 84]; leaves [2 7 40] [74 75 76 78] [89 90 91]
After: root [72 76 84]; leaves [2 7 40] [74 75] [77 78] [89 90 91]
16
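Delete 84 (swap and delete)
Before: root [72 76 84]; leaves [2 7 40] [74 75] [77 78] [89 90 91]
After the swap: root [72 76 89]; leaves [2 7 40] [74 75] [77 78] [84 90 91]; 84 is then deleted from the leaf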
17
Delete 74
Before: root [72 76 89]; leaves [2 7 40] [74 75] [77 78] [90 91]
After: root [72 76 89]; leaves [2 7 40] [75] [77 78] [90 91]
Underflow occurs
18
Redistribution using an adjacent sibling whose number of keys is greater than or equal to ⌈m/2⌉
Before: root [72 76 89]; leaves [2 7 40] [75] [77 78] [90 91]
After: root [40 76 89]; leaves [2 7] [72 75] [77 78] [90 91]
{2, 7, 40, 72, 75} is redistributed; the ⌈m/2⌉-th value (that is, 40) goes to the parent node
19
Delete 40
Before: root [40 76 89]; leaves [2 7] [72 75] [77 78] [90 91]
After the swap: root [72 76 89]; leaves [2 7] [40 75] [77 78] [90 91]
Redistribution is not possible
20
Merge with the right sibling and the parent key
Before: root [72 76 89]; leaves [2 7] [75] [77 78] [90 91]
After: root [72 89]; leaves [2 7] [75 76 77 78] [90 91]
21
A B+-tree consists of an index set and a sequence set
1. Index set
◦ consists of internal nodes
◦ provides the access path to leaf nodes
◦ supports direct access
2. Sequence set
◦ consists of leaf nodes
◦ leaf nodes store all keys and support sequential access
◦ leaf nodes and internal nodes have different structures
22
B+-tree with degree m
◦ Node structure: < n, P₀, K₁, P₁, K₂, P₂, …, Pₙ₋₁, Kₙ, Pₙ >
n: # of keys (1 ≤ n < m)
P₀, …, Pₙ: pointers to subtrees
K₁, …, Kₙ: key values
◦ The root has 0 or 2 to m subtrees
◦ Except for the root and the leaves, an internal node has ⌈m/2⌉ to m subtrees
◦ All leaf nodes are on the same level
◦ Key values in a node are in ascending order
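The node layout can be mirrored in a small data structure with a check of the per-node rules. This is a sketch under assumptions: the class `IndexNode`, the function `is_valid_index_node`, and the sample keys are hypothetical, and pointers are represented by plain labels rather than disk addresses.

```python
from dataclasses import dataclass


@dataclass
class IndexNode:
    """Internal node <n, P0, K1, P1, ..., Kn, Pn>; n is len(keys)."""
    keys: list       # K1..Kn
    pointers: list   # P0..Pn, one more than the number of keys


def is_valid_index_node(node, m):
    """Check 1 <= n < m, n + 1 pointers, and ascending keys."""
    n = len(node.keys)
    ascending = all(a < b for a, b in zip(node.keys, node.keys[1:]))
    return 1 <= n < m and len(node.pointers) == n + 1 and ascending


good = IndexNode(keys=[20, 43], pointers=["d", "e", "f"])
bad = IndexNode(keys=[43, 20], pointers=["d", "e", "f"])
print(is_valid_index_node(good, 5), is_valid_index_node(bad, 5))
```

The subtree-count rules for the root and for other internal nodes depend on the node's position in the tree, so they are not checked here.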
23
In a B-tree, pointers to data records exist at all levels of the tree.
In a B+-tree, all pointers to data records exist at the leaf-level nodes.
A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree.
24
Search
◦ The B+-tree index set is an m-ary search tree
◦ The record is obtained at a leaf node
Insert
◦ Similar to the B-tree
Delete
◦ Done at a leaf
◦ When no redistribution or merge is needed, the key in the index set is not deleted, because it acts as a separator on the access path
◦ Redistribution: change the key in the index set
◦ Merge: delete the key in the index set
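The search procedure can be sketched as a descent through the index set that always ends in a leaf. This is an illustrative sketch: the dictionary-based nodes, the function name `bplus_search`, and the sample tree are assumptions, as is the convention that keys less than or equal to a separator are found in the subtree to its left.

```python
import bisect


def bplus_search(node, key):
    """Descend from the root to a leaf and report whether key is stored there.

    Assumes keys <= K_i lie in the subtree left of separator K_i.
    """
    while not node["leaf"]:
        i = bisect.bisect_left(node["keys"], key)   # number of separators < key
        node = node["children"][i]
    return key in node["keys"]


def leaf(*keys):
    return {"leaf": True, "keys": list(keys)}


# Hypothetical sample tree: root 69, internal nodes (20, 43) and (110).
b = {"leaf": False, "keys": [20, 43],
     "children": [leaf(15, 20), leaf(35, 40, 43), leaf(55, 69)]}
c = {"leaf": False, "keys": [110],
     "children": [leaf(70, 90, 110), leaf(120, 125)]}
root = {"leaf": False, "keys": [69], "children": [b, c]}

print(bplus_search(root, 43), bplus_search(root, 60))
```

Even when the search key equals a separator in the index set, the descent continues to the leaf, because only leaves point to data records.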
25
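[Figure: B+-tree example. Index set: root a (69) with internal nodes b (20, 43) and c (110). Sequence set: leaf nodes d through h holding the keys 15, 20, 35, 40, 43, 55, 69, 70, 90, 110, 120, 125.]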
26
B+-tree, delete 43
◦ The 43 in the index set is not removed
Delete 125 (underflow, redistribution)
[Figure: the tree before and after deleting 125; in the index set, key 110 is replaced by 90.]
27
Delete 55 (underflow, merge)
[Figure: the tree after the merge; the index set now holds 69, 20, and 90.]
28
Hashing for disk files is called external hashing.
The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1.
◦ Typically, a bucket corresponds to one (or a fixed number of) disk blocks.
One of the file fields is designated to be the hash key of the file.
The record with hash key value K is stored in bucket i, where i = h(K), and h is the hashing function.
Search is very efficient on the hash key.
Collisions occur when a new record hashes to a bucket that is already full.
◦ An overflow file is kept for storing such records.
◦ Overflow records that hash to each bucket can be linked together.
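Static external hashing with per-bucket overflow chains can be sketched in memory as follows. This is only an illustration: the class name `StaticHashFile`, the choice h(K) = K mod M, and the bucket capacity are assumptions, and Python lists stand in for disk blocks and the overflow file.

```python
class StaticHashFile:
    """M fixed buckets plus one overflow chain per bucket (in-memory sketch)."""

    def __init__(self, num_buckets, bucket_capacity):
        self.M = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]

    def h(self, key):
        return key % self.M                  # assumed hash function

    def insert(self, key, record):
        i = self.h(key)
        if len(self.buckets[i]) < self.capacity:
            self.buckets[i].append((key, record))
        else:                                # bucket full: collision
            self.overflow[i].append((key, record))

    def search(self, key):
        i = self.h(key)
        for k, record in self.buckets[i] + self.overflow[i]:
            if k == key:
                return record
        return None


f = StaticHashFile(num_buckets=3, bucket_capacity=2)
for k in (0, 3, 6, 4):
    f.insert(k, "record-%d" % k)
print(f.search(6), f.overflow[0])
```

A search touches only one bucket and its overflow chain, which is why uniform distribution of keys over the buckets matters for search time.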
29
There are numerous methods for collision resolution, including the following:
◦ Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
◦ Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.
◦ Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
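Of the methods listed above, open addressing is the simplest to sketch. The example below assumes a fixed-size table, integer keys, the hash function key mod M, and wrap-around at the end of the table; the function name `open_addressing_insert` is hypothetical.

```python
def open_addressing_insert(table, key):
    """Place key in the first free slot at or after its hash address.

    table is a list whose empty slots are None. Returns the slot used.
    """
    M = len(table)
    start = key % M                          # assumed hash function
    for step in range(M):
        pos = (start + step) % M             # check subsequent positions in order
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("table is full")


table = [None] * 5
positions = [open_addressing_insert(table, k) for k in (7, 12, 17, 22)]
print(positions, table)
```

All four keys hash to address 2 here, so each later key lands one position further along, wrapping to slot 0 for the last one.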
30
Hashed Files (contd.)
31
To reduce overflow records, a hash file is typically kept 70-80% full.
The hash function h should distribute the records uniformly among the buckets.
◦ Otherwise, search time will be increased because many overflow records will exist.
Main disadvantages of static external hashing:
◦ The fixed number of buckets M is a problem if the number of records in the file grows or shrinks.
◦ Ordered access on the hash key is quite inefficient (requires sorting the records).
32
Hashed Files - Overflow handling
33
Dynamic and Extendible Hashing Techniques
◦ Hashing techniques are adapted to allow the dynamic growth and shrinking of the number of file records.
◦ These techniques include the following: dynamic hashing, extendible hashing, and linear hashing.
Both dynamic and extendible hashing use the binary representation of the hash value h(K) in order to access a directory.
◦ In dynamic hashing the directory is a binary tree.
◦ In extendible hashing the directory is an array of size 2^d, where d is called the global depth.
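For extendible hashing, the directory lookup can be sketched as taking d bits of the hash value. The sketch below assumes a fixed-width hash value and uses its leading (most significant) d bits; the function name `directory_index` and the parameter `key_bits` are hypothetical.

```python
def directory_index(hash_value, key_bits, global_depth):
    """Return the directory slot for hash_value using its leading d bits.

    hash_value is assumed to fit in key_bits bits; the directory has
    2 ** global_depth entries.
    """
    return hash_value >> (key_bits - global_depth)


h = 0b10110000                               # an 8-bit hash value
print(directory_index(h, 8, 3), 2 ** 3)
```

With d = 3 the value above selects entry 101 of an 8-entry directory; raising d by one doubles the directory and uses one more bit.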
34
The directories can be stored on disk, and they expand or shrink dynamically.
◦ Directory entries point to the disk blocks that contain the stored records.
An insertion in a disk block that is full causes the block to split into two blocks, and the records are redistributed among the two blocks.
◦ The directory is updated appropriately.
Dynamic and extendible hashing do not require an overflow area.
Linear hashing does require an overflow area but does not use a directory.
◦ Blocks are split in linear order as the file expands.
35
Extendible Hashing
36
Directory (global depth 3): entries 000, 001, 010, 011, 100, 101, 110, 111
Buckets (local depth: pseudokey prefix)
◦ 3: 000···
◦ 3: 001···
◦ 2: 01···
◦ 1: 1···
37
◦ Inserting a record whose pseudokey starts with 10 causes an overflow
◦ Split the 4th bucket
◦ Change the pointers in the directory
Directory (global depth 3): entries 000, 001, 010, 011, 100, 101, 110, 111
Buckets (local depth: pseudokey prefix)
◦ 3: 000···
◦ 3: 001···
◦ 2: 01···
◦ 2: 10···
◦ 2: 11···
38
When the first bucket (000) is full
◦ Increase the bucket depth p by 1 and split the bucket
◦ All records whose pseudokey starts with 0001 move to the new bucket
◦ In this case d < (p + 1)
◦ Extend d to d + 1
Directory (global depth 4): entries 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
Buckets (local depth: pseudokey prefix)
◦ 4: 0000···
◦ 4: 0001···
◦ 3: 001···
◦ 2: 01···
◦ 2: 10···
◦ 2: 11···
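Bucket splitting and directory doubling can be sketched together. This is a simplified in-memory illustration, not a definitive implementation: the class names `Bucket` and `ExtendibleHash`, the bucket capacity, the use of the key itself as the pseudokey, and the assumption of distinct keys that fit in `key_bits` bits are all choices made for the example.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth                   # local depth p
        self.keys = []


class ExtendibleHash:
    def __init__(self, key_bits=4, capacity=2):
        self.key_bits = key_bits
        self.capacity = capacity
        self.d = 0                           # global depth
        self.directory = [Bucket(0)]         # always 2 ** d entries

    def _index(self, key):
        return key >> (self.key_bits - self.d)   # leading d bits

    def contains(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.depth == self.d:           # d < p + 1: extend d to d + 1
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.d += 1
        bucket.depth += 1                    # increase p and split the bucket
        new_bucket = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        for i, b in enumerate(self.directory):
            # Entries whose p-th leading bit is 1 now point to the new bucket.
            if b is bucket and (i >> (self.d - bucket.depth)) & 1:
                self.directory[i] = new_bucket
        for k in old_keys:                   # redistribute the old records
            self.directory[self._index(k)].keys.append(k)
        self.insert(key)                     # retry; may split again


eh = ExtendibleHash(key_bits=4, capacity=2)
for k in (0b0000, 0b0001, 0b1000, 0b0100, 0b1100, 0b1010):
    eh.insert(k)
print(eh.d, [sorted(b.keys) for b in eh.directory])
```

In this run the first two overflows double the directory, while the last one only splits a bucket whose local depth was below the global depth, so the directory keeps its size and only pointers change.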