Jun-Ki Min. Slide 14-2: Such a multi-level index is a form of search tree. However, insertion and deletion of new index entries is a severe problem.

Presentation transcript:

1 Jun-Ki Min

2 Slide 14-2
• Such a multi-level index is a form of search tree.
  ◦ However, inserting and deleting index entries is a severe problem because every level of the index is an ordered file.

3 Slide 14-3
• FIGURE 14.8

4 Slide 14-4
• Most multi-level indexes use B-tree or B+-tree data structures because of the insertion and deletion problem.
  ◦ This leaves space in each tree node (disk block) to allow for new index entries.
• These data structures are variations of search trees that allow efficient insertion and deletion of search values.
• In B-tree and B+-tree data structures, each node corresponds to a disk block.
• Each node is kept between half full and completely full.

5 Slide 14-5
• An insertion into a node that is not full is quite efficient.
  ◦ If a node is full, the insertion causes a split into two nodes.
• Splitting may propagate to other tree levels.
• A deletion is quite efficient if a node does not become less than half full.
• If a deletion causes a node to become less than half full, it must be merged with neighboring nodes.

6
• B-tree (degree = m)
  ◦ An m-way search tree in which:
    1. Every internal node other than the root has at least ⌈m/2⌉ and at most m subtrees, so it holds at least ⌈m/2⌉-1 and at most m-1 keys.
    2. If the root is not a leaf, it has at least two subtrees.
    3. All leaves are on the same level (the tree is balanced).
• Note: the degree is the maximum number of subtrees of a node.
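
The occupancy rules above can be turned into a few lines of arithmetic. The following is a minimal sketch, not part of the original slides; the function name `btree_bounds` and the dictionary keys are hypothetical names chosen for illustration.

```python
import math


def btree_bounds(m: int) -> dict:
    """Occupancy bounds for a B-tree of degree m (m = maximum number of subtrees)."""
    if m < 3:
        raise ValueError("degree m must be at least 3")
    min_children = math.ceil(m / 2)
    return {
        "min_children": min_children,  # internal nodes other than the root
        "max_children": m,
        "min_keys": min_children - 1,  # nodes other than the root
        "max_keys": m - 1,
        "root_min_children": 2,        # only when the root is not a leaf
    }


print(btree_bounds(5))
```

For m = 5 this gives between 2 and 4 keys per non-root node, which is the occupancy used in the insert and delete examples on the later slides.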

7 Slide 14- 7

8

9
• B-tree
  ◦ Random access: branch by search key
  ◦ Sequential access: in-order traversal
  ◦ Insert/delete: keep the tree balanced
    • split: on node overflow
    • merge: on node underflow
• Insert
  ◦ Insertion is done at a leaf node
    • the leaf has free space: simple insertion
    • overflow (no free space): the leaf would hold m keys
      1) split the node
      2) the ⌈m/2⌉-th key → inserted into the parent node
      3) the remaining keys → left and right nodes
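
The leaf-level part of the insertion steps above can be sketched as follows. This is an illustrative sketch only: the function name `insert_into_leaf` and its return convention are assumptions, a leaf is modelled as a sorted Python list, and propagating the promoted key into the parent (which may split in turn) is left out.

```python
import bisect
import math


def insert_into_leaf(keys, key, m):
    """Insert key into a sorted leaf of a degree-m B-tree.

    Returns (left, promoted, right). Without a split, promoted and right
    are None and left is the updated leaf.
    """
    keys = sorted(keys)           # work on a copy
    bisect.insort(keys, key)
    if len(keys) < m:             # at most m-1 keys: the leaf had free space
        return keys, None, None
    mid = math.ceil(m / 2) - 1    # 0-based position of the ceil(m/2)-th key
    return keys[:mid], keys[mid], keys[mid + 1:]


# Degree m = 5: the leaf already holds 4 keys, so inserting 77 overflows it.
print(insert_into_leaf([74, 75, 76, 78], 77, 5))  # ([74, 75], 76, [77, 78])
```

The promoted key (76 here) is the one that would be inserted into the parent node.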

10
• Insert 59, then insert 57
[figure: B-tree nodes b, f, o, o′, p with keys 50, 57, 58, 59, 60, before and after the insertions]

11
• Split caused by inserting 54
  ◦ 54 goes to parent node f
  ◦ In parent node f, insert 54; 58 goes to parent node b
[figure: nodes f, f′ and leaves o, o′, o′′, p, before and after the split]

12
• In parent node b, insert 58; 43 goes to parent node a
• In parent node a, insert 43
[figure: nodes a, b, b′, c, d, e, f, f′ with keys 19, 43, 58, 69, before and after the insertions]

13
• Delete
  ◦ Deletion is done at a leaf node.
  ◦ If the key to delete is not in a leaf node: swap it with the following key, then delete it.
  ◦ If the number of keys drops below ⌈m/2⌉-1, the node underflows:
    1. Redistribution: when a sibling node has at least ⌈m/2⌉ keys (parent node key → underflow node; sibling node key → parent node)
    2. Merge: when redistribution is not possible (sibling node + parent node key + underflow node)
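
The two underflow remedies can be sketched on a two-level tree (one parent node with leaf children). This is a simplified illustration under stated assumptions: the function name `fix_underflow`, the list-based node layout, and the order in which siblings are tried are all hypothetical, the underflowing leaf is assumed to have at least one sibling, and a parent that underflows after a merge is not handled.

```python
import math


def fix_underflow(parent, children, i, m):
    """Repair leaf children[i] after a deletion in a degree-m B-tree.

    parent   -- sorted separator keys; parent[j] sits between children[j]
                and children[j + 1]
    children -- list of sorted leaf key lists
    Mutates both lists and returns "ok", "redistribute" or "merge".
    """
    min_keys = math.ceil(m / 2) - 1
    if len(children[i]) >= min_keys:
        return "ok"
    # 1. Redistribution: a sibling with at least ceil(m/2) keys lends one
    #    key, routed through the parent.
    if i > 0 and len(children[i - 1]) > min_keys:
        children[i].insert(0, parent[i - 1])   # parent key -> underflow node
        parent[i - 1] = children[i - 1].pop()  # sibling key -> parent
        return "redistribute"
    if i + 1 < len(children) and len(children[i + 1]) > min_keys:
        children[i].append(parent[i])
        parent[i] = children[i + 1].pop(0)
        return "redistribute"
    # 2. Merge: sibling + parent key + underflow node become one node.
    j = i if i + 1 < len(children) else i - 1  # merge children[j], children[j + 1]
    children[j] = children[j] + [parent.pop(j)] + children.pop(j + 1)
    return "merge"


parent = [72, 76, 89]
leaves = [[2, 7, 40], [75], [77, 78], [90, 91]]
print(fix_underflow(parent, leaves, 1, 5), parent, leaves)
```

With degree 5, the leaf `[75]` borrows through the parent: 72 moves down and 40 moves up. If no sibling has a spare key, the same call merges the leaf with its right sibling and the separating parent key.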

14
• Delete 60
• Delete 20
[figure: nodes b, f, o, p before and after deleting 60; nodes b, e, l, m, n before and after deleting 20]

15
• Insert 77 (split)
  ◦ Before: root [72 84]; leaves [2 7 40] [74 75 76 78] [89 90 91]
  ◦ After: root [72 76 84]; leaves [2 7 40] [74 75] [77 78] [89 90 91]

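16
• Delete 84 (swap and delete)
  ◦ Before: root [72 76 84]; leaves [2 7 40] [74 75] [77 78] [89 90 91]
  ◦ After the swap: root [72 76 89]; leaves [2 7 40] [74 75] [77 78] [84 90 91]; 84 is then deleted from the leaf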

17
• Delete 74 (underflow occurs)
  ◦ Before: root [72 76 89]; leaves [2 7 40] [74 75] [77 78] [90 91]
  ◦ After: root [72 76 89]; leaves [2 7 40] [75] [77 78] [90 91]; the leaf [75] underflows

18
• Redistribution using an adjacent sibling whose number of keys is greater than or equal to ⌈m/2⌉
  ◦ Before: root [72 76 89]; leaves [2 7 40] [75] [77 78] [90 91]
  ◦ After: root [40 76 89]; leaves [2 7] [72 75] [77 78] [90 91]
  ◦ {2, 7, 40, 72, 75} is redistributed; the ⌈m/2⌉-th value (that is, 40) goes to the parent node.

19
• Delete 40 (swap; redistribution is not possible)
  ◦ Before: root [40 76 89]; leaves [2 7] [72 75] [77 78] [90 91]
  ◦ After the swap: root [72 76 89]; leaves [2 7] [40 75] [77 78] [90 91]

20
• Merge with the right sibling and the parent key
  ◦ Before: root [72 76 89]; leaves [2 7] [75] (40 deleted) [77 78] [90 91]
  ◦ After: root [72 89]; leaves [2 7] [75 76 77 78] [90 91]

21
• A B+-tree consists of an index set and a sequence set.
  1. Index set
    ◦ consists of internal nodes
    ◦ provides the access path to leaf nodes
    ◦ supports direct access
  2. Sequence set
    ◦ consists of leaf nodes
    ◦ leaf nodes store all keys, which supports sequential access
    ◦ leaf nodes and internal nodes have different structures

22
• B+-tree with degree m
  ◦ Node structure: <n, P_0, K_1, P_1, K_2, P_2, …, P_{n-1}, K_n, P_n>
    • n: number of keys (1 ≤ n < m)
    • P_0, …, P_n: pointers to subtrees
    • K_1, …, K_n: key values
  ◦ The root has 0 or 2 to m subtrees.
  ◦ Every internal node other than the root has ⌈m/2⌉ to m subtrees.
  ◦ All leaf nodes are on the same level.
  ◦ Key values within a node are in ascending order.

23 Slide 14-23
• In a B-tree, pointers to data records exist at all levels of the tree.
• In a B+-tree, all pointers to data records exist at the leaf-level nodes.
• A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree.

24
• Search
  ◦ The B+-tree index set is an m-ary search tree.
  ◦ The record is obtained at a leaf node.
• Insert
  ◦ Similar to the B-tree.
• Delete
  ◦ Done at a leaf (with redistribution or merge when needed).
    • A key in the index set is not deleted, because it acts as a separator on the access path.
  ◦ Redistribution: change the key in the index set.
  ◦ Merge: delete the key from the index set.
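
The point that an index-set key is only a separator can be illustrated with a two-level lookup. This is a minimal sketch, not the slides' algorithm: the function name `bplus_search` is hypothetical, the tree is reduced to one index node over a list of leaves, and the convention that keys less than or equal to a separator live in the left subtree is an assumption.

```python
from bisect import bisect_left


def bplus_search(index_keys, leaves, key):
    """Two-level B+-tree lookup: the index set routes, the leaf decides.

    Assumes keys <= index_keys[j] are stored in leaves[j] or further left.
    """
    leaf = leaves[bisect_left(index_keys, key)]
    return key in leaf


index_keys = [20, 43]                    # 43 stays as a separator
leaves = [[15, 20], [35, 40], [55, 69]]  # 43 was deleted from its leaf
print(bplus_search(index_keys, leaves, 40))  # True
print(bplus_search(index_keys, leaves, 43))  # False
```

Key 43 still guides the search in the index set, but the lookup fails because the record is only ever found in the sequence set.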

25
[figure: B+-tree with an index set (nodes a, b, c; keys 69, 20, 43, 110) and a sequence set (leaf nodes d to h; keys 15, 20, 35, 40, 43, 55, 69, 70, 90, 110, 120, 125)]

26
• B+-tree, delete 43
  ◦ 43 in the index set is not removed.
• Delete 125 (underflow → redistribution)
[figure: the tree before and after; after the redistribution the index set holds 69, 20, 43, 90 and the sequence set holds 15, 20, 35, 40, 55, 69, 70, 90, 110, 120]

27
• Delete 55 (underflow → merge)
[figure: after the merge the index set holds 69, 20, 90 and the sequence set holds 15, 20, 35, 40, 69, 70, 90, 110, 120]

28
• Hashing for disk files is called external hashing.
• The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1.
  ◦ Typically, a bucket corresponds to one (or a fixed number of) disk blocks.
• One of the file fields is designated to be the hash key of the file.
• The record with hash key value K is stored in bucket i, where i = h(K) and h is the hashing function.
• Search is very efficient on the hash key.
• Collisions occur when a new record hashes to a bucket that is already full.
  ◦ An overflow file is kept for storing such records.
  ◦ Overflow records that hash to each bucket can be linked together.
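
A small in-memory model may make the bucket and overflow mechanics concrete. It is a sketch under stated assumptions rather than a disk-based implementation: the class name `HashFile`, the hash function h(K) = K mod M, and the per-bucket overflow lists standing in for a linked overflow file are all illustrative choices.

```python
class HashFile:
    """Static external hashing with M buckets and per-bucket overflow chains."""

    def __init__(self, m, bucket_capacity):
        self.m = m
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(m)]
        self.overflow = [[] for _ in range(m)]  # stands in for the overflow file

    def h(self, key):
        return key % self.m                     # assumed hash function

    def insert(self, key, record):
        i = self.h(key)
        if len(self.buckets[i]) < self.capacity:
            self.buckets[i].append((key, record))
        else:                                   # collision: the bucket is full
            self.overflow[i].append((key, record))

    def search(self, key):
        i = self.h(key)
        for k, record in self.buckets[i] + self.overflow[i]:
            if k == key:
                return record
        return None


f = HashFile(m=4, bucket_capacity=2)
for k in (1, 5, 9):                             # all three hash to bucket 1
    f.insert(k, f"record-{k}")
print(f.buckets[1], f.overflow[1], f.search(9))
```

The third record does not fit into bucket 1 and lands in that bucket's overflow chain, so finding it costs an extra lookup beyond the primary bucket.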

29
• There are numerous methods for collision resolution, including the following:
  ◦ Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
  ◦ Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.
  ◦ Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
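
Open addressing, the first method above, can be sketched with linear probing. This is an illustrative sketch only: the function name `probe_insert`, the hash address key mod table size, and the wrap-around from the last position to the first are assumptions not stated on the slide.

```python
def probe_insert(table, key):
    """Insert key into table by linear probing and return the slot used.

    table is a list in which None marks an unused position.
    """
    n = len(table)
    start = key % n                  # assumed hash address
    for step in range(n):
        slot = (start + step) % n    # next position, wrapping at the end
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


table = [None] * 7
print([probe_insert(table, k) for k in (10, 17, 24)])  # [3, 4, 5]
```

All three keys hash to position 3; the second and third are placed in the next free positions.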

30 Hashed Files (contd.)

31 Slide 13-31
• To reduce overflow records, a hash file is typically kept 70-80% full.
• The hash function h should distribute the records uniformly among the buckets.
  ◦ Otherwise, search time will be increased because many overflow records will exist.
• Main disadvantages of static external hashing:
  ◦ The fixed number of buckets M is a problem if the number of records in the file grows or shrinks.
  ◦ Ordered access on the hash key is quite inefficient (requires sorting the records).

32 Hashed Files - Overflow handling

33 Slide 13-33
• Dynamic and Extendible Hashing Techniques
  ◦ Hashing techniques are adapted to allow the dynamic growth and shrinking of the number of file records.
  ◦ These techniques include the following: dynamic hashing, extendible hashing, and linear hashing.
• Both dynamic and extendible hashing use the binary representation of the hash value h(K) in order to access a directory.
  ◦ In dynamic hashing the directory is a binary tree.
  ◦ In extendible hashing the directory is an array of size 2^d, where d is called the global depth.

34 Slide 13-34
• The directories can be stored on disk, and they expand or shrink dynamically.
  ◦ Directory entries point to the disk blocks that contain the stored records.
• An insertion in a disk block that is full causes the block to split into two blocks, and the records are redistributed among the two blocks.
  ◦ The directory is updated appropriately.
• Dynamic and extendible hashing do not require an overflow area.
• Linear hashing does require an overflow area but does not use a directory.
  ◦ Blocks are split in linear order as the file expands.

35 Extendible Hashing

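36
[figure: extendible hashing directory with global depth 3 (entries 000, 001, 010, 011, 100, 101, 110, 111) pointing to four buckets]
  Bucket (pseudokey prefix)   Bucket depth
  000···                      3
  001···                      3
  01···                       2
  1···                        1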

37
  ◦ Inserting a record whose key starts with 10 causes an overflow
  ◦ → split the fourth bucket
  ◦ → change the pointers in the directory
[figure: directory with global depth 3 (entries 000 to 111) after the split]
  Bucket (pseudokey prefix)   Bucket depth
  000···                      3
  001···                      3
  01···                       2
  10···                       2
  11···                       2

38
• When the first bucket (000) is full:
  ◦ Increase the bucket depth p by 1 and split the bucket.
  ◦ All records whose keys start with 0001 move to the new bucket.
  ◦ In this case d < (p + 1).
  ◦ Extend d to d + 1.
[figure: directory with global depth 4 (entries 0000 to 1111) after the split]
  Bucket (pseudokey prefix)   Bucket depth
  0000···                     4
  0001···                     4
  001···                      3
  01···                       2
  10···                       2
  11···                       2
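
The split and directory-doubling rule can be sketched in memory as follows. This is a simplified illustration under stated assumptions, not the slides' exact structure: the class names `Bucket` and `ExtendibleHash`, the fixed pseudokey width `bits`, and the bucket `capacity` are hypothetical; keys are assumed to be distinct integers in the range 0 to 2^bits - 1 that serve directly as pseudokeys; the directory is indexed by the leading d bits, as in the figures.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # bucket depth p
        self.keys = []


class ExtendibleHash:
    def __init__(self, bits=8, capacity=2):
        self.bits = bits                 # pseudokey width (assumed)
        self.capacity = capacity         # records per bucket (assumed)
        self.d = 0                       # global depth
        self.directory = [Bucket(0)]     # always 2**d entries

    def _index(self, key):
        return key >> (self.bits - self.d)       # leading d bits of the pseudokey

    def insert(self, key):
        b = self.directory[self._index(key)]
        if len(b.keys) < self.capacity:
            b.keys.append(key)
            return
        if b.depth == self.d:                    # d < p + 1: double the directory
            self.directory = [e for e in self.directory for _ in (0, 1)]
            self.d += 1
        self._split(b)
        self.insert(key)                         # retry; may split again

    def _split(self, b):
        b.depth += 1
        new = Bucket(b.depth)
        shift = self.bits - b.depth              # position of the newly used bit
        old_keys, b.keys = b.keys, []
        for k in old_keys:
            (new if (k >> shift) & 1 else b).keys.append(k)
        for i, e in enumerate(self.directory):   # repoint half of b's entries
            if e is b and (i >> (self.d - b.depth)) & 1:
                self.directory[i] = new


h = ExtendibleHash(bits=4, capacity=2)
for k in (0b0000, 0b1000, 0b0100, 0b0010):
    h.insert(k)
print(h.d, [e.depth for e in h.directory])  # 2 [2, 2, 1, 1]
```

After the four insertions the global depth is 2, while the bucket for prefix 1 still has depth 1 and is shared by two directory entries, mirroring how buckets of different depths coexist in the figures.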

