Presentation is loading. Please wait.

Presentation is loading. Please wait.

PART 4 DATA STORAGE AND QUERY. Chapter 12 Indexing and Hashing.

Similar presentations


Presentation on theme: "PART 4 DATA STORAGE AND QUERY. Chapter 12 Indexing and Hashing."— Presentation transcript:

1 PART 4 DATA STORAGE AND QUERY

2 Chapter 12 Indexing and Hashing

3 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -3 Contents in This Chapter Basic concepts about and classification of indexing ordered indices, hash indices Properties/types of ordered indices primary/clustering indices, secondary/non-clustering indices dense indices, sparse indices single-level indices, multi-level indices (e.g. B + -tree, B-tree) Hash indices hash functions static hash, dynamic hash

4 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -4 §12.1 Basic Concepts How to locate records in DB file quickly? Indexing ( 索引技术 ) mechanisms used to speed up access to desired data e.g. for relation account(account-number, branch-name, balance) shown in Fig.12.1, the index branch-name  physical address of record (i.e.tuple) in DB file account Search Key attributes or a set of attributes used to look up the records in a file e.g. branch-name

5 A-217 A-110 A-101 A-215 A-201 A-218A-102 A-222 A-305 Fig. 12.1 DB indexed file account and its index file Note: the file account is logically a sequential file, but its records may be stored non-contiguously or non-ordered on the disk logical file account physical file account index file (account-number, branch-name, balance) indexed file

6 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -6 §12.1 Basic Concepts (cont.) Indexing mapping from search-key to storage locations of the file records, i.e. search-key  storage locations of the records in disks

7 搜索键 (search key) 索引文件 (index file) 数据文件 (主文件、被索引文件 indexed file ) s1s2s3..si..sj.sns1s2s3..si..sj.sn 散列 函数 H s1s2s3..si..sj.sns1s2s3..si..sj.sn a3a3 a2a2 a1a1 … aiai … ajaj … amam … a n-1 anan b3b3 b2b2 b1b1 … bibi … bjbj … bmbm … b n-1 bnbn s3s3 s2s2 s1s1 … sisi … sjsj … smsm … s n-1 snsn … … … … … … … … … … … … d3d3 d2d2 d1d1 … didi … djdj … dmdm … d n-1 dndn R (A, B, S, ….,, D) … … … … h(s i ) h(s n ) h(s 1 ) { 索引项 } index entry s1s1 s2s2 s3s3 … sisi … smsm 图 12.0.1 索引技术 (indexing) 及其分类 ordered indiceshash indices 搜索键 (search key)

8 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -8 Two basic kinds of indices: ordered indices: the index file is used to store the index entries in which the search key of the records and the address of the records are stored in sorted order hash indices: the “hash function” is used to map the the search key of the records to the address of the records the records are stored in the “buckets”, the number of the bucket is as the address of the records and is determined by the hash function §12.1 Basic Concepts (cont.)

9 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -9 DBS file with indexing mechanism include two parts : indexed file, in which data records are stored index file, in which index entries are included e.g. Fig.12.1 The indexed file can be organized as sequential file heap file hash file clustering file §12.2 Ordered Indices

10 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -10 §12.1 Basic Concepts (cont.) Index file set of index entries of the form Index files are typically much smaller than the original file search-keylocation

11 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -11 Ordered index (in index file) the index entries in the index file are stored in some sorted order, for instance, in accordance with the order of the search key /* 索引项的排列顺序与搜索键的排列顺序一致 e.g. in Fig.12.1( ), the index file is sorted by branch- name 12.2 Ordered Indices (cont.)

12 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -12 Primary/clustering index (in index file) considering a index file and its corresponding indexed file. if the indexed file is a sequential file, and the search key in index file also specifies the sequential order of the indexed file /* 索引文件的搜索键所规定的顺序与被索引的顺序文 件中的纪录顺序一致 e.g. in Fig.12.1, (branch-name, address of records) defines the same orders of the records as that in sequential indexed file account note: also called clustering index 12.2-I Primary and Secondary Indices

13 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -13 The search key of a primary index is usually but not necessarily the primary key e.g. in Fig. 12.1, branch-name is not the primary key of account 12.2-I Primary and Secondary Indices (cont.)

14 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -14 Secondary index an index whose search key specifies an order different from the sequential order of the file. also called non-clustering index e.g. Fig.12.5, secondary index on balance field of account Secondary indices have to be dense indices Index-sequential file ( 索引顺序文件 ) ordered sequential file with a primary index on the search key e.g. Fig.12.1 12.2-I Primary and Secondary Indices (cont.)

15 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -15 Fig.12.5 Secondary Index on balance field of account 12.2-I Primary and Secondary Indices (cont.)

16 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -16 Dense index the index record of the index file appears for every search- key value in the indexed file each value of search-key in the indexed file corresponds to an index entry in the index file e.g. Fig.12.1 In a dense primary index, the index record contains the search- key value and a pointer to the first record with that search-key value the rest of the records with the same search-key value would be stored sequentially after the first record e.g. Fig.12.1, search-key value “Perryridge” corresponds to three records in the file 12.2-II Dense and Sparse Indices

17 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -17 Sparse Index index file contains index entries for only some search-key values in the indexed file e.g. Fig.12.3 With respect to the sparse index, to locate a file record with search-key value K find index entry with largest search-key value ≤ K search file sequentially starting at the record to which this index entry points 12.2-II Dense and Sparse Indices (cont.)

18 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -18 Fig.12.3 Sparse index for the file account index file indexed file 12.2-II Dense and Sparse Indices

19 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -19 The indices in Fig.12.1, Fig.12.3, and Fig.12.5 are single-level indices The index file may be very large, and cannot be entirely kept in memory To access the index file quickly, the index file is stored on disk as a sequential file and construct a sparse index on this sequential index file outer index – a sparse index of primary index file inner index – the primary index file If even outer index is too large to fit in main memory, yet another level of index can be created, and so on. 12.2-III Single-level and Multi-level Indices

20 Fig. 12.4 Two-level sparse index indexed file index file

21 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -21 B + -tree indices and B-tree indices are two types of efficient multi-level indices, and widely used in DBS. 12.2-III Single-level and Multi-level Indices Fig. An Example of B + -tree index

22 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -22 The file records are stored in a set of buckets a bucket is a unit of storage containing one or more records (a bucket is typically a disk block). Hash function h a function from the set of all search-key values K in a file to the set of all bucket addresses, i.e. the set of the addresses of file records typical hash functions perform computation on the internal binary representation of the search-key. Hash file organization obtaining the bucket of a record directly from its search-key value using a hash function §12.5 Hashing Index Files

23 Fig.12.21 Hash file organization of account file, with branch-name as key Hash function: returns the sum of the binary representations of the characters modulo 10

24 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -24 Handling of Bucket Overflows Bucket overflow ( 溢出 ) can occur because of insufficient buckets skew in distribution of records. This can occur due to two reasons: multiple records have same search-key value chosen hash function produces non-uniform distribution of key values Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.

25 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -25 Handling of Bucket Overflows (cont.) Overflow chaining – the overflow buckets of a given bucket are chained together in a linked list. Above scheme is called closed hashing. Overflows chaining Fig. 12.22 Overflows chaining in an hash structure

26 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -26 Hash Indices Hashing can be used not only for file organization, but also for index-structure creation. A hash index organizes the search keys, with their associated record pointers, into a hash file structure. Strictly speaking, hash indices are always secondary indices if the file itself is organized using hashing, a separate primary hash index on it using the same search-key is unnecessary. However, we use the term hash index to refer to both secondary index structures and hash organized files.

27 Fig. 12.23 Hash index on search key account_number of account file

28 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -28 Static Hashing vs Dynamic Hashing Static hashing hash function h cannot be modified, while being used Dynamic hashing hash function h to be modified dynamically

29 April 2008Database System Concepts - Chapter 12 Indexing and Hashing -29 Create an index create index or ) E.g.: create index b-index on branch(branch-name) To drop an index drop index §12.8 Index Definition in SQL


Download ppt "PART 4 DATA STORAGE AND QUERY. Chapter 12 Indexing and Hashing."

Similar presentations


Ads by Google