Physical Index Structures Logically, the index is a sorted list. Physically, the sorted order is normally maintained by pointers in a table. Tree-structured.

Presentation transcript:

1 Physical Index Structures
Logically, the index is a sorted list. Physically, the sorted order is normally maintained by pointers in a table.
Tree-structured indexes:
–Binary tree
–B-tree
–B+-tree
[Diagram: Tree Structure, showing the root node, nodes, and leaf nodes. Node: branching point.]

2 Binary Tree Index
Each index entry is a node of the tree. The index is a table with four fields:
–the true index fields, key value and address,
–a left, or less-than, pointer that points to a node with a smaller key value, and
–a right, or greater-than, pointer that points to a node with a larger key value.
[Diagram: a binary tree node, with fields left pointer, key value, data pointer (i.e. data file address), right pointer.]

3 Binary Tree Index Example
Data file (cells 1 to 7): 16, 87, 13, 54, 22, 35, 39
[Diagram: the binary tree built from these records, with 16 as the root node (only key values shown).]
Index as a table (row 1 is the root node):
Row  LP  Key  Add  RP
1    3   16   1    2
2    4   87   2    -
3    -   13   3    -
4    5   54   4    -
5    -   22   5    6
6    -   35   6    7
7    -   39   7    -
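The example above can be reproduced with a short sketch. It assumes that records arrive in data file order, that row numbers equal data file addresses, and that keys are unique; the names `IndexRow` and `build_index` are illustrative and not from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexRow:
    key: int
    address: int                  # data file cell holding the record
    left: Optional[int] = None    # row number of a node with a smaller key
    right: Optional[int] = None   # row number of a node with a larger key


def build_index(keys):
    """Load keys top-down, in arrival order; key i lives in data file cell i."""
    rows = {}  # row number -> IndexRow; row 1 is the root node
    for address, key in enumerate(keys, start=1):
        rows[address] = IndexRow(key, address)
        if address == 1:
            continue  # the first key becomes the root
        current = 1
        while True:
            node = rows[current]
            side = "left" if key < node.key else "right"
            nxt = getattr(node, side)
            if nxt is None:
                setattr(node, side, address)  # attach the new row here
                break
            current = nxt
    return rows


index = build_index([16, 87, 13, 54, 22, 35, 39])
for row_no, row in index.items():
    print(row_no, row.left, row.key, row.address, row.right)
```

Each printed line corresponds to one row of the index table: row number, left pointer, key value, address, right pointer.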

4 Binary Tree Index Problems
Data pointers are dispersed throughout every level of the tree. This results in:
–Unequal access times
–Complex tree traversal programming
A binary tree is normally unbalanced:
–For the tree to be balanced (i.e. equal branch lengths), the key value at each node must be the median of the values in its sub-trees.
–This is virtually impossible, as the tree is loaded top-down, i.e. in order of arrival of key values.
–Hence the tree becomes unbalanced, and unequal access times are the result.
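To make the balance problem concrete, the following sketch measures the level at which each key ends up when a binary tree is loaded top-down. It is only an illustration under stated assumptions: unique keys, the root counted as level 1, and a hypothetical helper named `depths`.

```python
def depths(keys):
    """Return {key: level} for a binary tree loaded in arrival order."""
    children = {}  # key -> [left child key, right child key]
    levels = {}
    root = None
    for key in keys:
        children[key] = [None, None]
        if root is None:
            root = key
            levels[key] = 1
            continue
        current, level = root, 1
        while True:
            side = 0 if key < current else 1
            child = children[current][side]
            level += 1
            if child is None:
                children[current][side] = key
                levels[key] = level
                break
            current = child
    return levels


# Arrival order from the binary tree example: key 39 sinks to level 6.
arrival = depths([16, 87, 13, 54, 22, 35, 39])
# Keys arriving already sorted degenerate the tree into a chain of 7 levels.
sorted_arrival = depths([13, 16, 22, 35, 39, 54, 87])
print(arrival)
print(max(sorted_arrival.values()))
```

A perfectly balanced binary tree of seven keys would need only three levels, so both loads show the unequal access times described above.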

5 Solution to Balance Problem in Index Tree Structures
Load the tree “bottom-up”. That is, after a certain number of key values have been input, choose the median value to be promoted to a higher level so that it can point evenly to its left and right.
This leads to the concepts of:
–multi-value nodes, i.e. multiple key values stored in sequence in each index node, and
–node-splitting, i.e. division of an overfull node into two nodes, taking respectively the low-end and high-end values of the split node.

6 A B-tree Node
Multiple key values per node: K1, K2, K3, with their addresses A1, A2, A3.
K1<K2<K3, i.e. key values in sequence.
Left pointer: points to node with key values less than K1.
Right pointer: points to node whose key values are >K1 and <K2.
Pointers all point to other nodes, and therefore to ALL of the key values in those nodes.

7 B-tree Node Splitting
Existing node values: 12 23 27 38
New value to be inserted: 19
The split: 12 19 | 23 | 27 38
–12 and 19 stay in the old node.
–Key value 23 is promoted to the next higher level to point to the other two nodes.
–27 and 38 move to a new node.
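The split can be sketched in a few lines. This is a minimal illustration, assuming a node capacity of four key values (as in the slide) and showing key values only; the function name `insert_and_split` and the `capacity` parameter are hypothetical.

```python
import bisect


def insert_and_split(node_keys, new_key, capacity=4):
    """Insert new_key in sequence; split around the median if overfull.

    Returns (left_keys, promoted_key, right_keys). When no split is
    needed, promoted_key and right_keys are None.
    """
    keys = list(node_keys)
    bisect.insort(keys, new_key)      # keep key values in sequence
    if len(keys) <= capacity:
        return keys, None, None
    mid = len(keys) // 2              # position of the median value
    return keys[:mid], keys[mid], keys[mid + 1:]


left, promoted, right = insert_and_split([12, 23, 27, 38], 19)
print(left, promoted, right)
```

In a B-tree the promoted value leaves both halves, which is why the median is excluded from `left` and `right`.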

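8 B-tree Node Split Example
Data file has two records; the root node of the index is now full.
Data file: 87 in cell 1, 36 in cell 2.
Root node: 36 (address 2), 87 (address 1).
Then a new data file record of key value 27 is stored in cell 3.
The split: 27 | 36 | 87, with 36 promoted.
New root node: 36 (address 2), pointing to two nodes holding 27 (address 3) and 87 (address 1).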

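9 Current State of Index
Root node: 36 (address 2). Lower-level nodes: 27 (address 3) and 87 (address 1).
[Table: the index stored as a table with columns K1, A1, K2, A2. Node 1, the root node, holds key 36 with address 2 and points to nodes 2 and 3; node 2 holds key 27 with address 3; node 3 holds key 87 with address 1.]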

10 B-tree Pros and Cons
Balanced, i.e. every branch is the same length and descends to the same level. Therefore, the wild variation in access times observable in binary trees is avoided.
However, the key values (and associated addresses) are still dispersed throughout all levels of the structure, leading to:
–unequal path lengths to individual key values, and therefore unequal access times, and
–complex tree-traversal algorithms for logically sequential reading/unloading of the data file.

11 Solution to the Key Dispersal Problem
Prohibit storage of data file addresses at all levels above leaf level. Consequently:
–all accesses follow the same path length, resulting in equal access times, and
–logically sequential reading of the data file requires access to only the leaf level. That is, complex tree-traversal algorithms are not required.

12 Implementing the Solution
Since all key values must appear at leaf level:
–some key values appear more than once in the index,
–upper-level nodes don’t need address fields, and leaf-level nodes don’t need downward index pointers, and
–the median value to be promoted when a node split occurs must belong to one of the ‘halves’, i.e. the rightmost value of the left half (leading to less-than-or-equal pointers), or the leftmost value of the right half (greater-than-or-equal pointers).
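The two promotion conventions can be sketched as follows. This is an illustration only, assuming a sorted, overfull leaf holding key values without addresses; the name `split_leaf` and the `"le"`/`"ge"` labels are hypothetical.

```python
def split_leaf(keys, convention="le"):
    """Split an overfull, sorted leaf node.

    Unlike the B-tree split, the promoted key is copied, not moved:
    it remains in one of the halves at leaf level.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    if convention == "le":
        promoted = left[-1]    # rightmost value of the left half
    elif convention == "ge":
        promoted = right[0]    # leftmost value of the right half
    else:
        raise ValueError("convention must be 'le' or 'ge'")
    return left, promoted, right


print(split_leaf([9, 41, 56, 72], "le"))
print(split_leaf([9, 41, 56, 72], "ge"))
```

With less-than-or-equal pointers the promoted key is 41 and stays in the left leaf; with greater-than-or-equal pointers it is 56 and stays in the right leaf.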

13 The B+-tree
The data file (key values in cell order): 56, 9, 72, 41, 34
The root node: 41
Leaf level nodes (key value, data file address):
–left-hand node: 9 (2), 34 (5), 41 (4)
–right-hand node: 56 (1), 72 (3)
The left-hand node split when 41 was inserted. The high-order end went to the right-hand node. Hence, the leaf-node pointer.

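14 The B+-tree: insertion of data file record of key value 25
The data file (key values in cell order): 56, 9, 72, 41, 34, with 25 added in cell 6
The split: 9 25 | 34 41
The root node: 25, 41
Leaf level nodes (key value, data file address):
–9 (2), 25 (6)
–34 (5), 41 (4)
–56 (1), 72 (3)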
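The two B+-tree slides can be replayed with a small sketch. It assumes a leaf capacity of three entries (inferred from the slides, where a fourth entry forces a split), less-than-or-equal pointers, and a single root node that never overflows; `LEAF_CAPACITY` and `insert` are illustrative names, not part of the presentation.

```python
import bisect

LEAF_CAPACITY = 3  # assumption: a fourth entry forces a leaf split


def insert(root, leaves, key, address):
    """Insert (key, address) into a two-level B+-tree.

    root[i] is the largest key in leaves[i] (less-than-or-equal pointers);
    the last leaf has no separator key. Root overflow is not handled.
    """
    i = bisect.bisect_left(root, key)        # choose the leaf to descend to
    leaf = leaves[i]
    bisect.insort(leaf, (key, address))      # keep the leaf in key sequence
    if len(leaf) > LEAF_CAPACITY:
        mid = len(leaf) // 2
        left, right = leaf[:mid], leaf[mid:]
        leaves[i:i + 1] = [left, right]
        root.insert(i, left[-1][0])          # copy rightmost key of left half up


root, leaves = [], [[]]
for address, key in enumerate([56, 9, 72, 41, 34, 25], start=1):
    insert(root, leaves, key, address)

print(root)
print(leaves)
# Logically sequential reading needs only the leaf level.
print([key for leaf in leaves for key, _ in leaf])
```

After the fourth insertion (key 41) the root holds 41, and after the sixth (key 25) it holds 25 and 41, in line with the two slides. Because every key and address sits at leaf level, the sequential read is a plain left-to-right pass over the leaves.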

