Indexing By: Arnold Mesa
Indexing
You can think of an index to a file as being like the catalogue of a library.
There are two kinds...
- Ordered indices: based on a sorted ordering of the values.
- Hash indices: based on a uniform distribution of values across a range of buckets. The bucket a value goes to is determined by a hash function.
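To make the hash-index idea concrete, here is a minimal sketch. The bucket count, the toy hash function (sum of character codes), and the function names are all assumptions made for illustration; a real system would use a hash function that spreads keys far more evenly.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for the example


def bucket_of(key: str) -> int:
    # Toy deterministic hash: sum of character codes modulo the bucket count.
    return sum(ord(c) for c in key) % NUM_BUCKETS


def build_hash_index(records):
    # Map each bucket number to the positions of the records that hash to it.
    buckets = defaultdict(list)
    for position, (key, _payload) in enumerate(records):
        buckets[bucket_of(key)].append(position)
    return buckets


def hash_lookup(records, buckets, key):
    # Only the one bucket the key hashes to has to be examined.
    candidates = buckets.get(bucket_of(key), [])
    return [records[p] for p in candidates if records[p][0] == key]


records = [("Hilton", "A"), ("Westin", "B"), ("Hilton", "C")]
index = build_hash_index(records)
print(hash_lookup(records, index, "Hilton"))
```

Note that the records do not need to be sorted for this to work, which is the main contrast with an ordered index.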
Key Concepts
- Access types: the types of access that are supported efficiently
- Access time: the time it takes to access a particular data item
- Insertion time: the time it takes to insert a data item
- Deletion time: the time it takes to delete a data item
- Space overhead: the additional space occupied by an index structure
There are two kinds of ordered indices
  – Dense index: an index record appears for every search-key value in the file. The index record contains the search-key value and a pointer to the first data record with that value. The remaining records with the same search-key value are stored sequentially after the first record.
  – Sparse index: an index record appears for only some of the search-key values, so there are fewer index records. Each index record contains a search-key value and a pointer to the first data record with that value, as with the dense index.
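The two definitions can be sketched in a few lines. This is only an illustration: the record layout, the alphabetical sort order, and the `every` parameter (how many keys the sparse index skips) are assumptions, not something the slides specify.

```python
# Data file: records sorted by search key; each record is (search_key, payload).
RECORDS = [
    ("Hilton", "A"), ("Hilton", "B"), ("Hilton", "C"),
    ("Marriot", "B"), ("Marriot", "C"),
    ("Ritz", "A"),
    ("Sofitel", "A"),
    ("Westin", "B"), ("Westin", "C"),
]


def build_dense_index(records):
    """One entry per distinct search key, pointing at that key's first record."""
    index = []
    for position, (key, _payload) in enumerate(records):
        if not index or index[-1][0] != key:
            index.append((key, position))
    return index


def build_sparse_index(records, every=2):
    """Keep an entry for only every `every`-th distinct search key."""
    return build_dense_index(records)[::every]


print(build_dense_index(RECORDS))
print(build_sparse_index(RECORDS))
```

With these nine records the dense index has five entries and the sparse index three, which is the space saving the sparse index trades for a short sequential scan at lookup time.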
[Figure: Dense Index. Data file with records for Hotel Sofitel, Hilton, Westin, Marriot, and The Ritz; index entries: Hotel Sofitel, Hilton, Westin, Marriot, The Ritz.]
[Figure: Sparse Index. Same data file; index entries: Hotel Sofitel, Westin, The Ritz.]
[Figure: the same sparse index (Hotel Sofitel, Westin, The Ritz) over the data file.]
Suppose we want to find the Marriot #532...
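A lookup through a sparse index can be sketched as follows: find the last index entry whose key is not greater than the search key, jump to the record it points at, and scan forward. The data here is assumed to be sorted alphabetically so that binary search applies (the order differs from the figure), and all names are hypothetical.

```python
from bisect import bisect_right

# Data file sorted by search key; each record is (search_key, payload).
RECORDS = [
    ("Hilton", "A"), ("Hilton", "B"), ("Hilton", "C"),
    ("Marriot", "B"), ("Marriot", "C"),
    ("Ritz", "A"),
    ("Sofitel", "A"),
    ("Westin", "B"), ("Westin", "C"),
]
# Sparse index: (search_key, position of first record) for some keys only.
SPARSE_INDEX = [("Hilton", 0), ("Ritz", 5), ("Westin", 7)]


def sparse_lookup(records, sparse_index, key):
    keys = [k for k, _ in sparse_index]
    # Last index entry whose key is <= the search key.
    slot = bisect_right(keys, key) - 1
    if slot < 0:
        return []  # the key sorts before every indexed key
    position = sparse_index[slot][1]
    matches = []
    # Scan the data file sequentially until we pass the search key.
    while position < len(records) and records[position][0] <= key:
        if records[position][0] == key:
            matches.append(records[position])
        position += 1
    return matches


print(sparse_lookup(RECORDS, SPARSE_INDEX, "Marriot"))
```

"Marriot" has no index entry of its own, so the search starts at the "Hilton" entry and scans past the Hilton records before reaching the Marriot ones.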
Efficiency Issues
- Even if we use a sparse index, the index itself may become too large for efficient processing
- If an index is small enough to be kept in main memory, the search time is low
- If the index is so large that it must be kept on disk, a search may require several disk block reads
How to deal...
- With a large index, we should construct a sparse index on top of the primary index.
[Figure: two-level index. Same data file; inner (primary) index entries: Hotel Sofitel, Hilton, Westin, Marriot, The Ritz; outer sparse index entries: Hotel Sofitel, Marriot.]
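A two-level lookup might look like the sketch below. It assumes the inner index is sorted alphabetically and split into fixed-size blocks, with one outer entry per block; the block size of 2 and every name here are illustrative assumptions.

```python
from bisect import bisect_right

# Inner (primary) index: (search_key, position of first record in the data file).
INNER_INDEX = [("Hilton", 0), ("Marriot", 3), ("Ritz", 5), ("Sofitel", 6), ("Westin", 7)]


def build_outer_index(inner_index, block_size=2):
    """One outer entry per block of inner entries: (first key in block, block start)."""
    return [(inner_index[i][0], i) for i in range(0, len(inner_index), block_size)]


def two_level_lookup(inner_index, outer_index, key, block_size=2):
    outer_keys = [k for k, _ in outer_index]
    # Step 1: the small outer index picks the single inner block to read.
    slot = bisect_right(outer_keys, key) - 1
    if slot < 0:
        return None
    start = outer_index[slot][1]
    # Step 2: search only that block of the inner index.
    for inner_key, record_position in inner_index[start:start + block_size]:
        if inner_key == key:
            return record_position
    return None


OUTER_INDEX = build_outer_index(INNER_INDEX)
print(OUTER_INDEX)
print(two_level_lookup(INNER_INDEX, OUTER_INDEX, "Marriot"))
```

The point of the design is that only the outer index needs to fit in main memory; each lookup then reads one block of the inner index instead of searching all of it.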
Is this looking familiar?
- Remember B+-trees
  – A B+-tree is said to be of order m, a number of the designer's choosing.
  – Each internal node other than the root has between ⌈m/2⌉ and m children.
  – All data is stored at the leaf level.
  – All leaves are at the same depth.
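As a rough sketch of how a B+-tree lookup works, the tree below is built by hand rather than by insertion, and uses integer keys. The dictionary layout and function names are assumptions for illustration. The occupancy bounds follow one common convention (between ⌈m/2⌉ and m children for a non-root internal node); definitions of "order" vary between textbooks.

```python
from bisect import bisect_right
from math import ceil


def child_bounds(m):
    """(min, max) children of a non-root internal node in an order-m B+-tree."""
    return ceil(m / 2), m


# Hand-built tree: one root above three leaves, all leaves at the same depth.
LEAF_1 = {"leaf": True, "keys": [5, 10], "values": ["a", "b"]}
LEAF_2 = {"leaf": True, "keys": [20, 30], "values": ["c", "d"]}
LEAF_3 = {"leaf": True, "keys": [40, 50], "values": ["e", "f"]}
ROOT = {"leaf": False, "keys": [20, 40], "children": [LEAF_1, LEAF_2, LEAF_3]}


def search(node, key):
    # Internal nodes only route the search; they hold no data.
    while not node["leaf"]:
        node = node["children"][bisect_right(node["keys"], key)]
    # All data lives at the leaf level.
    if key in node["keys"]:
        return node["values"][node["keys"].index(key)]
    return None


print(child_bounds(4))
print(search(ROOT, 30))
```

Each internal node acts like one level of a multilevel sparse index: its keys decide which child to follow, and the search always ends at a leaf.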
Example?