Published by Tobias French. Modified over 8 years ago.
1
Unit 12: Index in Database
Approaches to Access/Store Large Data
Dr. Wei-Pang Yang (楊維邦), Professor, Department of Information Management, National Dong Hwa University
2
6.1 Introduction
3
12-3 Wei-Pang Yang, Information Management, NDHU
The Role of Access Method/Index in DBMS
Query in SQL:
  SELECT CUSTOMER.NAME
  FROM CUSTOMER, INVOICE
  WHERE REGION = 'N.Y.' AND AMOUNT > 10000 AND CUSTOMER.C# = INVOICE.C#
Internal form: ( (S SP)
Operators:
- SCAN C using region index, create C'
- SCAN I using amount index, create I'
- SORT C' and I' on C#
- JOIN C' and I' on C#
- EXTRACT name field
Calls to access method:
- OPEN SCAN on C with region index
- GET next tuple
Calls to file system:
- GET 10th to 25th bytes from block #6 of file #5
(Figure: the DBMS layers a query passes through, from top to bottom: Language Processor, Optimizer, Operator Processor, Access Method/Index, File System, database.)
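The operator list above can be sketched in a few lines of Python, as a minimal illustration of how the operator processor drives the access-method layer. The sample rows, the dictionary and sorted-list index structures, and the function names `open_scan_region`, `open_scan_amount_gt`, and `run_plan` are all hypothetical; only the attribute names and the operator sequence come from the slide. File-system calls are not modeled.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical sample rows; only the attribute names (C#, NAME, REGION, AMOUNT) come from the slide.
CUSTOMER = [
    {"C#": 1, "NAME": "Ann", "REGION": "N.Y."},
    {"C#": 2, "NAME": "Bob", "REGION": "L.A."},
    {"C#": 3, "NAME": "Cyd", "REGION": "N.Y."},
]
INVOICE = [
    {"C#": 1, "AMOUNT": 15000},
    {"C#": 2, "AMOUNT": 20000},
    {"C#": 3, "AMOUNT": 5000},
    {"C#": 1, "AMOUNT": 12000},
]

# Access-method layer: an index on REGION (value -> row ids) and a sorted index on AMOUNT.
region_index = defaultdict(list)
for rid, row in enumerate(CUSTOMER):
    region_index[row["REGION"]].append(rid)
amount_index = sorted((row["AMOUNT"], rid) for rid, row in enumerate(INVOICE))


def open_scan_region(region):
    """OPEN SCAN on C with the region index; each next() is one 'GET next tuple'."""
    for rid in region_index.get(region, []):
        yield CUSTOMER[rid]


def open_scan_amount_gt(limit):
    """OPEN SCAN on I with the amount index, yielding tuples with AMOUNT > limit."""
    start = bisect_right(amount_index, (limit, len(INVOICE)))  # skip entries with AMOUNT <= limit
    for _, rid in amount_index[start:]:
        yield INVOICE[rid]


def run_plan():
    """Operator-processor layer: the operator sequence listed on the slide."""
    c_prime = sorted(open_scan_region("N.Y."), key=lambda r: r["C#"])    # SCAN C -> C', SORT on C#
    i_prime = sorted(open_scan_amount_gt(10000), key=lambda r: r["C#"])  # SCAN I -> I', SORT on C#
    names, j = [], 0
    for c in c_prime:                       # merge JOIN C' and I' on C# (C# is unique in C')
        while j < len(i_prime) and i_prime[j]["C#"] < c["C#"]:
            j += 1
        k = j
        while k < len(i_prime) and i_prime[k]["C#"] == c["C#"]:
            names.append(c["NAME"])         # EXTRACT name field
            k += 1
    return names


print(run_plan())  # prints: ['Ann', 'Ann']
```

Like the SQL query (which has no DISTINCT), the sketch returns one name per qualifying invoice, so a customer with two large invoices appears twice.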
4
The Internal Level
(Figure: CPU, main buffer, I/O, and disk, with an index shown both on disk and in the main buffer.)
Objectives:
- concerned with the way the data is actually stored.
- store data on direct-access media, e.g., disk.
- minimize the number of disk accesses (disk I/Os), because disk access is much slower than main-storage access.
5
The Internal Level (cont.)
Physical database design: the process of choosing an appropriate storage representation for a given database (done by the DBA), e.g., designing a B-tree index or hashing. It is a nontrivial task that requires a good understanding of how the database will be used.
Logical database design: (figure: the data modeled as the relations S, P, and SP)
6
6.2 Indexing
7
Indexing: Introduction
Consider the Supplier table, S. Suppose "Find all suppliers in city xxx" is an important query, i.e., it is frequently executed. The DBA might then choose the stored representation shown in Fig. 6.2.
Fig. 6.2: Indexing the supplier file on CITY.
City-Index (index): Athens → S5; London → S1, S4; Paris → S2, S3
S (indexed file):
  S#  SNAME  STATUS  CITY
  S1  Smith  20      London
  S2  Jones  10      Paris
  S3  Blake  30      Paris
  S4  Clark  20      London
  S5  Adams  30      Athens
8
Indexing: Introduction (cont.)
Now the DBMS has two possible strategies:
- Search S sequentially, looking for all records with city = 'xxx'.
- Search City-Index for the desired entry, then follow its pointers into S.
Advantages: speeds up retrieval.
- The index file is sorted.
- Fewer I/Os are needed, because the index file is smaller than the indexed file.
Disadvantages: slows down updates.
- When a new tuple is inserted, both the index and the indexed file must be updated.
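The trade-off above can be made concrete with a small Python sketch that stores S in fixed-size pages and counts data-page reads for each strategy. The page size of two records, the `(page#, slot#)` pointer format, and the class name `IndexedSupplierFile` are assumptions for illustration; the index itself is assumed to be in main memory, so its pages are not counted.

```python
from collections import defaultdict

PAGE_SIZE = 2  # assumption for illustration: two supplier records fit on one disk page

# Supplier file S from Fig. 6.2: (S#, SNAME, STATUS, CITY)
S_RECORDS = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]


class IndexedSupplierFile:
    """S stored in fixed-size pages, plus a City-Index of (page#, slot#) pointers."""

    def __init__(self, records):
        self.pages = []
        self.city_index = defaultdict(list)
        for rec in records:
            self.insert(rec)

    def insert(self, rec):
        # The cost of indexing: every insert writes the data page AND the index.
        if not self.pages or len(self.pages[-1]) == PAGE_SIZE:
            self.pages.append([])
        self.pages[-1].append(rec)
        self.city_index[rec[3]].append((len(self.pages) - 1, len(self.pages[-1]) - 1))

    def scan_for_city(self, city):
        """Strategy 1: read every page of S. Returns (matching records, data pages read)."""
        hits = [rec for page in self.pages for rec in page if rec[3] == city]
        return hits, len(self.pages)

    def index_lookup(self, city):
        """Strategy 2: search City-Index, then read only the pages it points to."""
        ptrs = self.city_index.get(city, [])
        hits = [self.pages[p][s] for p, s in ptrs]
        return hits, len({p for p, _ in ptrs})


f = IndexedSupplierFile(S_RECORDS)
print(f.scan_for_city("Athens")[1], f.index_lookup("Athens")[1])  # prints: 3 1
```

Both strategies return the same records; the index only changes how many data pages are touched, while `insert` shows the extra index maintenance that slows down updates.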
9
Indexing: Multiple Fields
- Primary index: an index on the primary key, e.g., S#.
- Secondary index: an index on some other field, e.g., CITY.
- A given table may have any number of indexes.
Fig. 6.3: Indexing the supplier file on both CITY and STATUS.
(Figure: a CITY-index with entries Athens, London, Paris and a Status-index with entries 10, 20, 30, both pointing into the supplier file S.)
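One way to picture "any number of indexes" is a single generic builder applied to different fields, sketched below in Python under stated assumptions: each index is a sorted list of `(value, record id)` pairs searched with `bisect`, and the helper names `build_index` and `lookup` are hypothetical. The `unique=True` flag models the primary index on S#, which must not contain duplicate key values.

```python
from bisect import bisect_left, bisect_right

FIELDS = ("S#", "SNAME", "STATUS", "CITY")
S = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]


def build_index(records, field, unique=False):
    """Sorted (value, record id) entries on `field`; unique=True models a primary-key index."""
    col = FIELDS.index(field)
    entries = sorted((rec[col], rid) for rid, rec in enumerate(records))
    values = [v for v, _ in entries]
    if unique and len(values) != len(set(values)):
        raise ValueError(f"duplicate values in primary key {field}")
    return entries


def lookup(index, value):
    """Record ids whose indexed field equals `value` (binary search on the sorted index)."""
    lo = bisect_left(index, (value,))                  # (value,) sorts before (value, any rid)
    hi = bisect_right(index, (value, float("inf")))    # and (value, inf) sorts after them
    return [rid for _, rid in index[lo:hi]]


primary = build_index(S, "S#", unique=True)   # primary index
city_idx = build_index(S, "CITY")             # secondary index
status_idx = build_index(S, "STATUS")         # a second secondary index on the same table
print([S[r][0] for r in lookup(city_idx, "Paris")])  # prints: ['S2', 'S3']
```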
10
How Are Indexes Used?
- Direct access: "Find suppliers in London."
  List query: "Find suppliers whose city is London, Paris, or N.Y."
- Sequential access: records are accessed in the sequence defined by the values of the indexed field.
- Range query: "Find the suppliers whose city begins with a letter in the range L-R."
- Existence test: "Is there any supplier in London?" Note: this can be answered from the index alone, without reading S.
(Figure: City-Index with entries Athens, London, Paris pointing to records S1-S5 of S, the indexed file.)
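The five uses listed above can all be served by one sorted, dense City-Index. The Python sketch below assumes the index is a sorted list of `(CITY, record id)` pairs and uses the standard `bisect` module; the function names are hypothetical. `range_query` assumes city names start with an uppercase letter, so "begins with L to R" becomes the key interval from "L" up to (but not including) "S".

```python
from bisect import bisect_left, bisect_right

S = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]
# Dense City-Index: one (CITY, record id) entry per record, kept in CITY order.
CITY_INDEX = sorted((rec[3], rid) for rid, rec in enumerate(S))
TOP = float("inf")  # sorts after every record id


def direct(city):
    """Direct access: 'Find suppliers in <city>'."""
    lo = bisect_left(CITY_INDEX, (city,))
    hi = bisect_right(CITY_INDEX, (city, TOP))
    return [S[rid][0] for _, rid in CITY_INDEX[lo:hi]]


def list_query(cities):
    """List query: one direct access per listed city."""
    return [sno for city in cities for sno in direct(city)]


def sequential():
    """Sequential access: records in the order of the indexed field (CITY)."""
    return [S[rid][0] for _, rid in CITY_INDEX]


def range_query(first, last):
    """Range query: cities whose first letter lies in first..last (single uppercase letters)."""
    lo = bisect_left(CITY_INDEX, (first,))
    hi = bisect_left(CITY_INDEX, (chr(ord(last) + 1),))
    return [S[rid][0] for _, rid in CITY_INDEX[lo:hi]]


def exists(city):
    """Existence test: answered from the index alone; S is never read."""
    i = bisect_left(CITY_INDEX, (city,))
    return i < len(CITY_INDEX) and CITY_INDEX[i][0] == city


print(range_query("L", "R"))  # prints: ['S1', 'S4', 'S2', 'S3']
```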
11
Indexing on Field Combinations
An index can also be constructed on the combined values of two or more fields.
City/Status-Index over the supplier file S: Athens/30 → S5; London/20 → S1, S4; Paris/10 → S2; Paris/30 → S3
Query: "Find suppliers in Paris with status 30."
- With the city/status index: a single scan of a single index.
- With two separate indexes (Fig. 6.3): two index scans, whose results must then be combined, so this is still more costly.
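The contrast drawn above can be sketched in Python: a composite index keyed on the tuple `(CITY, STATUS)` answers the query with one binary search and one contiguous slice, whereas two single-field indexes each produce a pointer set that must then be intersected. Python tuples compare lexicographically, which gives the composite key its city-then-status order. The index layouts and function names are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

S = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]
TOP = float("inf")  # sorts after every record id

# Composite City/Status-Index: one sorted list keyed on (CITY, STATUS).
CITY_STATUS_INDEX = sorted(((rec[3], rec[2]), rid) for rid, rec in enumerate(S))

# Two separate single-field indexes (as in Fig. 6.3): value -> set of record ids.
CITY_INDEX, STATUS_INDEX = defaultdict(set), defaultdict(set)
for rid, rec in enumerate(S):
    CITY_INDEX[rec[3]].add(rid)
    STATUS_INDEX[rec[2]].add(rid)


def composite_lookup(city, status):
    """One binary search plus one contiguous scan of a single index."""
    key = (city, status)
    lo = bisect_left(CITY_STATUS_INDEX, (key,))
    hi = bisect_right(CITY_STATUS_INDEX, (key, TOP))
    return [S[rid][0] for _, rid in CITY_STATUS_INDEX[lo:hi]]


def two_index_lookup(city, status):
    """Two index scans, then the two pointer sets must be intersected."""
    rids = CITY_INDEX.get(city, set()) & STATUS_INDEX.get(status, set())
    return sorted(S[rid][0] for rid in rids)


print(composite_lookup("Paris", 30), two_index_lookup("Paris", 30))  # prints: ['S3'] ['S3']
```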
12
Dense vs. Nondense Indexing
Assume the Supplier file (S) is clustered on S#.
(Figure: records S1-S6 stored two per page on page1 (S1, S2), page2 (S3, S4), and page3 (S5, S6); a dense index with one entry per record, S1-S6; a nondense index with one entry per page, S1, S3, S5; and a City_index whose entries read A, L, L, P, P.)
13
Dense vs. Nondense Indexing (cont.)
Nondense index: does not contain an entry for every record in the indexed file.
- Retrieval steps: scan the (nondense) index to get a page number, say p; then retrieve page p and scan it in main storage.
- Advantages: occupies less storage than a corresponding dense index, and is quicker to scan.
- Disadvantages: cannot perform an existence test via the index alone.
Note: at most one nondense index can be constructed for a given file. (Why?)
Clustering: logical sequence = physical sequence.
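The retrieval steps above can be sketched in Python for the clustered file of the previous slide (S1-S6, two per page). The nondense index is assumed to hold the lowest S# on each page, so a binary search for the last entry not greater than the key yields page p; the key "S35" in the usage lines is a hypothetical absent key. Keys are compared as strings, which is only valid here because there are fewer than ten suppliers.

```python
from bisect import bisect_right

# File S clustered on S#: logical (key) order = physical (page) order.
PAGES = [["S1", "S2"], ["S3", "S4"], ["S5", "S6"]]

# Dense index: one entry per record (key -> page number).
DENSE = {key: p for p, page in enumerate(PAGES) for key in page}

# Nondense index: one entry per page, the lowest key on that page, kept sorted.
# It only works because PAGES is physically ordered by S# (the clustering key).
NONDENSE_KEYS = [page[0] for page in PAGES]


def nondense_fetch(key):
    """Return (found, data pages read): scan the index for page p, then scan page p."""
    p = bisect_right(NONDENSE_KEYS, key) - 1
    if p < 0:
        return False, 0          # key sorts before every page: nothing to read
    page = PAGES[p]              # one page read into main storage
    return key in page, 1


def dense_exists(key):
    """Existence test from the dense index alone: zero data-page reads."""
    return key in DENSE


print(len(DENSE), len(NONDENSE_KEYS))  # prints: 6 3
print(nondense_fetch("S35"))           # prints: (False, 1)
```

The second call shows the disadvantage listed above: to learn that "S35" is absent, the nondense index still has to fetch page 1, while `dense_exists` answers without touching S.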
14
B-tree
Introduction:
- a particular type of multi-level (tree-structured) index.
- proposed by Bayer and McCreight in 1972.
- the most common storage structure of all in modern DBMSs.
Definition (from Horowitz, "Data Structure"): a B-tree T of order m is an m-way search tree such that
- the root node has at least 2 children;
- all nodes other than the root and the leaves have at least ⌈m/2⌉ children;
- all leaf nodes are at the same level.
Goal: maintain the balance of the index tree by dynamically restructuring the tree as updates proceed.
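The definition above can be turned into a small Python checker, together with the standard search that visits one node per level. This is a minimal sketch, not a full B-tree implementation (no insertion or splitting). Following Horowitz's convention, the empty subtrees below the bottom-level nodes are counted as children (his "failure nodes"), so a bottom node with k keys has k + 1 children and is subject to the same ⌈m/2⌉ minimum. The `Node` class and function names are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from math import ceil


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list => bottom-level node


def is_btree(root, m):
    """Check the order-m B-tree conditions on an m-way search tree."""
    leaf_depths = set()

    def ok(node, depth, lo, hi, is_root):
        keys = node.keys
        if keys != sorted(set(keys)):                      # keys strictly increasing
            return False
        if any((lo is not None and x <= lo) or (hi is not None and x >= hi) for x in keys):
            return False                                   # search-tree ordering
        # Bottom nodes: count their empty subtrees (failure nodes) as children.
        n = len(node.children) if node.children else len(keys) + 1
        if node.children and n != len(keys) + 1:
            return False
        if n > m or n < (2 if is_root else ceil(m / 2)):   # at most m, at least the minimum
            return False
        if not node.children:
            leaf_depths.add(depth)                         # failure nodes sit at depth + 1
            return True
        bounds = [lo] + keys + [hi]
        return all(ok(c, depth + 1, bounds[i], bounds[i + 1], False)
                   for i, c in enumerate(node.children))

    return ok(root, 0, None, None, True) and len(leaf_depths) == 1


def search(node, key):
    """Descend one node (one disk page) per level until the key is found or a bottom node is passed."""
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False
        node = node.children[i]


good = Node([20], [Node([10]), Node([30, 40])])                            # order 3: balanced
bad = Node([20], [Node([10]), Node([30], [Node([25]), Node([35])])])       # leaves at two levels
print(is_btree(good, 3), is_btree(bad, 3))  # prints: True False
```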
15
B+-tree (Knuth's variation)
Index set (nondense):
  root: 50 | 82
  second level: 12 | 32    58 | 70    89 | 94
Sequence set (with pointers to data records; dense or nondense), leaves linked left to right:
  [6 8 12] → [15 18 32] → [35 40 50] → [51 52 58] → [60 62 70] → [71 78 82] → [83 85 89] → [91 93 94] → [96 97 99]
- Index set: provides fast direct access to the sequence set, and thus to the data too.
- Sequence set: provides fast sequential access to the indexed data.
- Other variations: B*-tree, B'-tree,...
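The division of labor between the two parts can be sketched in Python using the exact tree from the figure. The sketch assumes each separator in the index set is the largest key of the subtree to its left (as in the figure, where 50 is the last key of the third leaf), and that leaves are chained through a `next` pointer. Class and function names are hypothetical, and record pointers in the leaves are omitted.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, keys):
        self.keys = keys      # a real sequence set would also store a record pointer per key
        self.next = None      # link to the next leaf: this chain is the sequence set


class Inner:
    def __init__(self, keys, children):
        self.keys = keys      # separator i = largest key in children[i]'s subtree
        self.children = children


def build_figure_tree():
    leaves = [Leaf(k) for k in ([6, 8, 12], [15, 18, 32], [35, 40, 50],
                                [51, 52, 58], [60, 62, 70], [71, 78, 82],
                                [83, 85, 89], [91, 93, 94], [96, 97, 99])]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    level2 = [Inner([12, 32], leaves[0:3]), Inner([58, 70], leaves[3:6]),
              Inner([89, 94], leaves[6:9])]
    return Inner([50, 82], level2), leaves[0]


def find_leaf(node, key):
    """Index set: direct access, one node per level down to the sequence set."""
    while isinstance(node, Inner):
        node = node.children[bisect_left(node.keys, key)]
    return node


def search(root, key):
    return key in find_leaf(root, key).keys


def range_scan(root, lo, hi):
    """Locate lo through the index set, then walk the sequence set left to right."""
    out, leaf = [], find_leaf(root, lo)
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


root, first_leaf = build_figure_tree()
print(range_scan(root, 50, 62))  # prints: [50, 51, 52, 58, 60, 62]
```

A range query touches the index set only once, to find the starting leaf; after that it follows the leaf chain, which is exactly the "fast sequential access" the sequence set provides.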