Published by Cornelius Porter. Modified over 9 years ago.
1
File Processing: Index and Hash
2015, Spring
Pusan National University
Ki-Joune Li
2
STEMPNU
What is an index?
  Index in a book
    Index: Keyword -> Pages
    Without an index: exhaustive search, which is too expensive
  Index for a file or database
    A function or mechanism
    F_Index: S_Predicate -> B (block numbers on hard disk)
    e.g. find student records where student.GPA > 4.0
3
Data Retrieval Time
  Data retrieval on disk: two phases
    1st phase: search with a condition (predicate)
    2nd phase: data access
  [Figure: Search Condition -> 1st phase (search for block numbers) -> { Block# } -> 2nd phase (data access) -> Database on Disk]
  Data access time depends on
    File structure
    Disk placement
    Clustering, etc.
4
Blocking Factor
  B_f: blocking factor
    Number of records in a block
  Blocking factor and number of disk accesses
    N_D = N_record / B_f
    By maximizing the blocking factor, we reduce the number of disk accesses
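The relation N_D = N_record / B_f can be checked with a few lines of Python. This is a minimal sketch: the function name `disk_accesses` and the sample numbers are illustrative assumptions, and the result is rounded up because a partially filled last block still costs one access.

```python
def disk_accesses(n_records, blocking_factor):
    """Blocks read by a full scan: N_D = ceil(N_record / B_f)."""
    if blocking_factor <= 0:
        raise ValueError("blocking factor must be positive")
    # Integer ceiling division; avoids floating-point rounding.
    return -(-n_records // blocking_factor)


# Hypothetical file of one million records.
print(disk_accesses(1_000_000, 10))   # 10 records per block
print(disk_accesses(1_000_000, 100))  # 100 records per block
```

Raising the blocking factor by a factor of ten cuts the number of block reads by the same factor.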
5
How to Accelerate Phase 1?
  Phase 1 can be accelerated by an index or by a hash
  Index vs. hash
    Index: a type of data structure
      Needs additional data structures
    Hash: a type of mechanism
      May not need any additional data structure (not exactly true)
6
A Simple Idea on Index
  Mapping table from keywords to block numbers
    Inverted file
  Why is an inverted file better than nothing?
  If the table is too large to fit in main memory
    It has to be stored on disk
    Disk access for index access

  Keyword   Block#
  Romeo     B26
  Hamlet    B22
  …         …
  Carmen    B212
  Juliet
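A mapping table of this kind can be sketched as a Python dictionary. The block contents below are hypothetical (only the keyword and block-number pairs Romeo/B26, Hamlet/B22 and Carmen/B212 come from the slide), and `scan` and `build_inverted_file` are illustrative names, not part of the lecture.

```python
def scan(blocks, keyword):
    """Exhaustive search: read every block and test every record."""
    return sorted(b for b, records in blocks.items() if keyword in records)


def build_inverted_file(blocks):
    """Build the mapping table keyword -> list of block numbers."""
    index = {}
    for block_no, records in blocks.items():
        for keyword in records:
            index.setdefault(keyword, []).append(block_no)
    return index


# Hypothetical data blocks, keyed by block number.
blocks = {"B22": ["Hamlet"], "B26": ["Romeo"], "B212": ["Carmen"]}
index = build_inverted_file(blocks)

print(index["Romeo"])          # one table lookup
print(scan(blocks, "Romeo"))   # same answer, but touches every block
```

The lookup answers from the table alone, while the scan has to read every data block. That advantage holds only while the table itself is cheap to access.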
7
Searching Algorithms and Index
  A good way to accelerate searching
    Tree: O(log n)
    Reorganize the inverted file into a tree
  Binary search tree: branching factor = 2
  Tree in memory space vs. in disk space
    Memory space: number of comparisons
    Disk space: number of block accesses
  [Figure: binary search tree of (key, block number) entries: root (30, b27); children (14, b17) and (40, b26); below (40, b26): (34, b17) and (55, b26)]
8
Paged Tree: m-way search tree
  [Figure: m-way search tree; each node stores the number of delimiters, followed by (delimiter, block number) entries such as (57, b27) and (103, b28)]
  How to determine m?
    One node: one disk page
    e.g. when one disk page is 4 KB:
      4 + 4m + 8(m-1) <= 4096, so the largest m is 341
    Very fat tree
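The page-size arithmetic can be reproduced in code. The sketch below assumes one reading of the slide's formula: 4 bytes for the delimiter count, 4 bytes for each of the m child pointers, and 8 bytes for each of the m-1 (delimiter, block number) entries. The function and parameter names are illustrative assumptions.

```python
def max_fanout(page_size=4096, count_bytes=4, pointer_bytes=4, entry_bytes=8):
    """Largest m with count + m*pointer + (m-1)*entry <= page_size."""
    # Rearranged: m * (pointer + entry) <= page_size - count + entry
    return (page_size - count_bytes + entry_bytes) // (pointer_bytes + entry_bytes)


m = max_fanout()
print(m)                          # fan-out for a 4 KB page
print(4 + 4 * m + 8 * (m - 1))    # bytes actually used by such a node
```

With the default 4096-byte page this yields m = 341, matching the slide. Changing `page_size` shows how the fan-out scales with the page size.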
9
Problem of m-way Search Tree
  m-way search tree
    Search performance: determined by the height
    Not balanced
      Average: O(log n)
      Worst case: n / B_f, i.e. O(n)
  Height: determined by insertion order
    e.g. insertion in ascending order
  How to make it balanced?
    Balanced m-way search tree: B-tree
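The effect of insertion order on height can be seen in a small experiment. For brevity this sketch uses an unbalanced binary search tree (m = 2) as a stand-in for the general m-way tree; `bst_height` and the two key orders are illustrative assumptions.

```python
def bst_height(keys):
    """Insert keys into an unbalanced binary search tree; return its height in nodes."""
    root = None      # each node is a list: [key, left_child, right_child]
    height = 0
    for key in keys:
        if root is None:
            root, height = [key, None, None], 1
            continue
        node, depth = root, 1
        while True:
            side = 1 if key < node[0] else 2   # go left or right
            if node[side] is None:
                node[side] = [key, None, None]
                height = max(height, depth + 1)
                break
            node, depth = node[side], depth + 1
    return height


ascending = list(range(1, 16))
mixed = [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]
print(bst_height(ascending))  # degenerates into a chain of 15 nodes
print(bst_height(mixed))      # same keys, only 4 levels
```

The same fifteen keys give a height of 15 or 4 depending only on the order in which they arrive.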
10
B-tree
  B-tree: balanced m-way search tree
    Root node: no child node, or two or more child nodes
    Internal node: m/2 to m child nodes (block numbers)
    External node: data block numbers instead of child nodes
  Balanced
    Upward split, instead of the downward split of a binary tree
11
Upward Split
  Suppose m = 3
  Insert 10, 20: [10 20]
  Insert 30: [10 20 30] overflows
    Upward split: 20 moves up, giving root [20] with children [10] and [30]
  Insert 40: root [20] with children [10] and [30 40]
12
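Upward Split
  Insert 50: [30 40 50] overflows and 40 moves up
    root [20 40] with children [10], [30], [50]
  Insert 60: root [20 40] with children [10], [30], [50 60]
  Insert 70: [50 60 70] overflows and 60 moves up; the root [20 40 60] overflows in turn and 40 moves up
    root [40] with children [20] and [60]; external nodes [10], [30], [50], [70]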
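The insertion sequence 10, 20, ..., 70 with m = 3 can be replayed with a small in-memory sketch of B-tree insertion. This is an illustration under assumptions, not the lecture's implementation: nodes hold plain keys instead of disk block numbers, a node splits once it holds m keys, and the median key is the one promoted. `Node`, `insert` and `levels` are hypothetical names.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list for an external node


def _insert(node, key, m):
    """Insert below node; return (median, new_right_node) if node split, else None."""
    if not node.children:                        # external node: store the key
        node.keys.append(key)
        node.keys.sort()
    else:
        i = sum(k < key for k in node.keys)      # index of the child to descend into
        promoted = _insert(node.children[i], key, m)
        if promoted:
            median, right = promoted
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:                  # at most m-1 keys: no overflow
        return None
    mid = len(node.keys) // 2                    # overflow: push the median upward
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right


def insert(root, key, m):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key, m)
    if promoted:                                 # root split: the tree grows at the top
        median, right = promoted
        return Node([median], [root, right])
    return root


def levels(root):
    """Keys of every node, grouped by tree level."""
    out, frontier = [], [root]
    while frontier:
        out.append([n.keys for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out


root = Node()
for key in (10, 20, 30, 40, 50, 60, 70):
    root = insert(root, key, m=3)
print(levels(root))  # [[[40]], [[20], [60]], [[10], [30], [50], [70]]]
```

Because a new level is only ever added above the old root, every external node stays at the same depth, even for keys inserted in ascending order.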
13
Meaning of Upward Split
  Always balanced
  Not much influenced by the order of insertions
  Internal nodes: m/2 to m child nodes (block numbers)
  [Figure: example tree with root node [40], internal nodes [20] and [60], and external nodes [10], [30], [50], [70]]
14
Search by B-tree
  [Figure: searching the example tree (root [40]; internal nodes [20], [60]; external nodes [10], [30], [50], [70]) for key 45: not found]
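The failed lookup of key 45 can be traced with a short sketch. The tree is hard-coded as nested tuples to match the example figure. The representation and the `search` function are illustrative assumptions, and real nodes would be disk pages rather than tuples.

```python
from bisect import bisect_left

# node = (keys, children); children is an empty tuple for an external node
TREE = ([40], (
    ([20], (([10], ()), ([30], ()))),
    ([60], (([50], ()), ([70], ()))),
))


def search(node, key):
    """Return (found, number_of_nodes_visited)."""
    visited = 0
    while True:
        keys, children = node
        visited += 1
        i = bisect_left(keys, key)           # first position with keys[i] >= key
        if i < len(keys) and keys[i] == key:
            return True, visited
        if not children:                     # external node reached: key is absent
            return False, visited
        node = children[i]


print(search(TREE, 45))  # (False, 3)
print(search(TREE, 60))  # (True, 2)
```

The search for 45 visits [40], [60] and [50], one node per level, before reporting that the key is not found.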
15
Performance of B-tree
  Number of comparisons within a node: trivial
  Number of nodes to visit: depth
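To get a feel for how small the depth can be, the sketch below computes a lower bound: the number of levels of a completely full m-way tree, which holds m^h - 1 keys in h levels. Real B-trees are only partly full and may be one or more levels deeper. The function name and the sample sizes are illustrative assumptions.

```python
def min_levels(n_keys, m):
    """Smallest number of levels h with m**h - 1 >= n_keys (m >= 2)."""
    levels, capacity = 0, 0   # capacity: keys held by a full tree of `levels` levels
    while capacity < n_keys:
        levels += 1
        capacity = m ** levels - 1
    return levels


print(min_levels(10**6, 341))  # one million keys
print(min_levels(10**9, 341))  # one billion keys
```

With m = 341, a billion keys fit in four levels in the best case, so an exact-match search costs only a handful of page accesses.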
16
Problem of B-tree
  Types of search
    Exact match search
    Range search
      e.g. find students where 25 < student.GPA < 50
  B-tree
    Good for exact match search
    Bad for range search
  [Figure: example B-tree with root [40], internal nodes [20] and [60], and external nodes [10], [30], [50], [70]]
17
B+-tree
  A variant of B-tree
    Duplicate all elements at leaf nodes (external nodes)
    Linked list of leaf nodes
  Performance
    Exact match search and insertion: a small performance sacrifice
    Range search: much more powerful than B-tree
18
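B+-tree: Example
  [Figure: building a B+-tree from the keys 10, 20, 30, 40, 50, 60; a leaf overflow triggers a split; keys 20 and 40 appear in the upper nodes and are duplicated in the leaf nodes; the leaf nodes are connected as a linked list]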
19
Range Search with B+-tree
  Find students where GPA > 3.5
  [Figure: four steps of a range search for key 35 on the example B+-tree with keys 10, 20, 30, 40, 50, 60: locate the leaf for 35, then follow the linked list of leaf nodes]
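A range search over linked leaves can be sketched as follows. The layout is an assumption made for illustration (a single root with delimiters 20 and 40 over three leaves), not necessarily the exact tree in the figure. `Leaf`, `build_example` and `range_search` are hypothetical names.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # link to the next leaf in key order


def build_example():
    """Assumed layout: root delimiters [20, 40] over leaves [10] -> [20 30] -> [40 50 60]."""
    leaves = [Leaf([10]), Leaf([20, 30]), Leaf([40, 50, 60])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    root_keys = [20, 40]          # leaf i+1 holds the keys >= root_keys[i]
    return root_keys, leaves


def range_search(root_keys, leaves, low):
    """Return (all keys > low, number of leaf nodes read)."""
    leaf = leaves[bisect_right(root_keys, low)]   # exact-match descent to the start leaf
    result, leaves_read = [], 0
    while leaf is not None:                       # then walk the linked list only
        leaves_read += 1
        result.extend(k for k in leaf.keys if k > low)
        leaf = leaf.next
    return result, leaves_read


root_keys, leaves = build_example()
print(range_search(root_keys, leaves, 35))  # ([40, 50, 60], 2)
```

After one descent from the root, the search never goes back up the tree. It only follows the leaf links, which is what makes range queries cheap compared with a plain B-tree.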
20
Performance of B+-tree
  Performance: determined by the depth d
    Exact match search and insertion (without split): d node (page) accesses
    Range search: node accesses as a function of n_q (n_q: number of records to retrieve)