Published by Elmer Parker. Modified over 8 years ago.
1
Spatiotemporal Database Laboratory, Pusan National University
File Processing: Index and Hash
Spring 2004, Pusan National University
Ki-Joune Li
2
What Is an Index?
- Index in a book
  - Index: keyword -> pages
  - Without an index: exhaustive search, which is too expensive
- Index for a file or database
  - A function or mechanism
  - Index: predicate -> blocks (block numbers on hard disk)
  - e.g. find student records where student.GPA > 4.0
3
Data Retrieval Time
- Data retrieval on disk: two phases
  - 1st phase: search with a condition (predicate)
  - 2nd phase: data access
[Figure: search condition -> search (1st phase) -> { block# } -> data access by block number (2nd phase) -> database on disk]
- Data access time
  - File structure
  - Disk placement
  - Clustering, etc.
4
Blocking Factor
- Blocking factor $B_f$: number of records in a block
- Blocking factor and number of disk accesses: $N_D = N_{record} / B_f$
- By maximizing the blocking factor, we reduce the number of disk accesses
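A small worked example of the formula above, as a sketch only. The record count and blocking factors are made-up numbers, and the function name `disk_accesses` is hypothetical. Rounding up to a whole block is an added assumption, since a partially filled last block still costs one access.

```python
def disk_accesses(n_records: int, blocking_factor: int) -> int:
    """Blocks read when scanning n_records: N_record / B_f, rounded up."""
    if blocking_factor <= 0:
        raise ValueError("blocking factor must be positive")
    # Integer ceiling division: a partly filled last block still needs one access.
    return (n_records + blocking_factor - 1) // blocking_factor


if __name__ == "__main__":
    n_records = 10_000  # assumed file size, for illustration only
    for bf in (10, 50, 100):
        print(f"B_f = {bf:3d} -> N_D = {disk_accesses(n_records, bf)}")
```

A larger blocking factor packs more records into each block, so the same file is scanned with fewer block reads.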
5
How to Accelerate Phase 1?
- Phase 1 can be accelerated by an index or by a hash
- Index vs. hash
  - Index: a type of data structure; needs additional data structures
  - Hash: a type of mechanism; may not need any additional data structure (not exactly true)
6
A Simple Idea on Index
- Mapping table from keywords to block numbers: inverted file
- Why is an inverted file better than nothing?
- If the table is too large to fit in main memory
  - It has to be stored on disk
  - Index access then requires disk access

Keyword   Block#
Romeo     B26
Hamlet    B22
…         …
Carmen    B212
Juliet
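The mapping table can be sketched as an in-memory dictionary. This is a minimal illustration, assuming the whole table fits in main memory. The three entries are taken from the slide's table; the function name `lookup` and the missing keyword "Othello" are hypothetical.

```python
from typing import Dict, List

# Inverted file: keyword -> block numbers that hold matching records.
inverted_file: Dict[str, List[str]] = {
    "Romeo": ["B26"],
    "Hamlet": ["B22"],
    "Carmen": ["B212"],
}


def lookup(keyword: str) -> List[str]:
    """Phase 1: return the block numbers for a keyword (empty list if absent)."""
    return inverted_file.get(keyword, [])


if __name__ == "__main__":
    print(lookup("Hamlet"))   # blocks to fetch in phase 2
    print(lookup("Othello"))  # keyword not in the table: nothing to fetch
```

Only the blocks returned by `lookup` need to be read in phase 2, instead of every block in the file.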
7
Searching Algorithms and Index
- A good way to accelerate searching
  - Tree: $O(\log n)$
  - Reorganize the inverted file into a tree
- Binary search tree: branching factor = 2
- Tree in memory space vs. in disk space
  - Memory space: number of comparisons
  - Disk space: number of block accesses
[Figure: binary search tree of (key, block#) entries; root (30, b27); its children (14, b17) and (40, b26); children of (40, b26) are (34, b17) and (55, b26)]
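A minimal sketch of a binary search tree over (key, block#) entries, using the five entries from the slide's figure. The class and function names are hypothetical. The search returns the block number together with the number of nodes visited; if each node sat in its own disk block, that count would stand in for the number of block accesses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    block: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, block: str) -> Node:
    """Insert (key, block) into an unbalanced binary search tree."""
    if root is None:
        return Node(key, block)
    if key < root.key:
        root.left = insert(root.left, key, block)
    else:
        root.right = insert(root.right, key, block)
    return root


def search(root: Optional[Node], key: int) -> Tuple[Optional[str], int]:
    """Return (block number or None, number of nodes visited)."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.block, visited
        node = node.left if key < node.key else node.right
    return None, visited


# Insertion order chosen so the tree has the same shape as the figure.
root = None
for k, b in [(30, "b27"), (14, "b17"), (40, "b26"), (34, "b17"), (55, "b26")]:
    root = insert(root, k, b)

if __name__ == "__main__":
    print(search(root, 55))  # found two levels below the root
    print(search(root, 99))  # not found
```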
8
Paged Tree: m-way search tree
[Figure: example nodes of an m-way search tree; each node is labeled with the number of delimiters, the delimiters, and block numbers]
- How to determine m?
  - One node: one disk page
  - e.g. when 1 disk page is 4 Kbytes: $4 + 4m + 8(m-1) = 4096$, so $12m = 4100$ and $m = 341$ (the largest integer that fits)
  - Very fat tree
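The page-size calculation can be checked with a few lines of code. This is a sketch under assumptions: the slide gives only the equation, so reading it as a 4-byte fixed part, 4 bytes for each of the m children, and 8 bytes for each of the m - 1 delimiter entries is an interpretation, and all names below are hypothetical.

```python
def node_bytes(m: int, fixed: int = 4, per_child: int = 4, per_delimiter: int = 8) -> int:
    """Size of one node: fixed + per_child * m + per_delimiter * (m - 1)."""
    return fixed + per_child * m + per_delimiter * (m - 1)


def max_branching_factor(page_bytes: int = 4096, fixed: int = 4,
                         per_child: int = 4, per_delimiter: int = 8) -> int:
    """Largest m such that node_bytes(m) <= page_bytes."""
    # Solve fixed + per_child*m + per_delimiter*(m - 1) <= page_bytes for m.
    return (page_bytes - fixed + per_delimiter) // (per_child + per_delimiter)


if __name__ == "__main__":
    m = max_branching_factor(4096)
    print(m, node_bytes(m), node_bytes(m + 1))
```

With a 4096-byte page this gives m = 341: a node with 341 children occupies 4088 bytes, while 342 children would need 4100 bytes and no longer fit.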
9
Problem of the m-Way Search Tree
- m-way search tree
  - Search performance: determined by the height
  - Not balanced
    - Average: $O(\log n)$
    - Worst case: $n / B_f$, i.e. $O(n)$
  - Height: determined by insertion order
    - e.g. insertion in ascending order
- How to make it balanced?
  - Balanced m-way search tree: B-tree
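The effect of insertion order on height can be seen with a short experiment. As a simplifying assumption, the sketch uses a binary search tree (the m = 2 case) rather than a general m-way tree; the function name `bst_height` and the key range are hypothetical.

```python
import random
from typing import Iterable


def bst_height(keys: Iterable[int]) -> int:
    """Insert keys in the given order into an unbalanced BST; return its height in nodes."""
    root = None  # each node is a list: [key, left_child, right_child]
    height = 0
    for key in keys:
        depth = 1
        if root is None:
            root = [key, None, None]
        else:
            node = root
            while True:
                depth += 1
                side = 1 if key < node[0] else 2  # 1 = left, 2 = right
                if node[side] is None:
                    node[side] = [key, None, None]
                    break
                node = node[side]
        height = max(height, depth)
    return height


if __name__ == "__main__":
    keys = list(range(1, 1001))
    print("ascending order:", bst_height(keys))  # degenerates into a chain

    shuffled = keys[:]
    random.Random(0).shuffle(shuffled)
    print("random order:   ", bst_height(shuffled))  # far shorter than the chain
```

Ascending insertion always extends the rightmost path, so the height equals the number of keys; a balanced structure such as the B-tree avoids this dependence on insertion order.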