Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦 博士 國立東華大學 資訊管理系教授.

Slides:



Advertisements
Similar presentations
CpSc 3220 File and Database Processing Lecture 17 Indexed Files.
Advertisements

1 Lecture 8: Data structures for databases II Jose M. Peña
Chapter 8 File organization and Indices.
Data Indexing Herbert A. Evans. Purposes of Data Indexing What is Data Indexing? Why is it important?
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part A Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Efficient Storage and Retrieval of Data
1 Lecture 20: Indexes Friday, February 25, Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)
DBMS Internals: Storage February 27th, Representing Data Elements Relational database elements: A tuple is represented as a record CREATE TABLE.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
1 © Prentice Hall, 2002 Physical Database Design Dr. Bijoy Bordoloi.
Chapter 6 1 © Prentice Hall, 2002 The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited) Project Identification and Selection Project Initiation.
Nimesh Shah (nimesh.s) , Amit Bhawnani (amit.b)
Introduction to Database System Wei-Pang Yang, IM.NDHU, Final Test-1 Example: Banking Database 1. branch 2. customer 客戶 ( 存款戶, 貸款戶 ) 5. account.
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Index in Database Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦 博士 國立東華大學 資訊管理系教授.
10/3/2017 Chapter 6 Index Structures.
“ Possible representation (3) “
Data Indexing Herbert A. Evans.
Module 11: File Structure
Indexing Structures for Files and Physical Database Design
CS522 Advanced database Systems
Record Storage, File Organization, and Indexes
CS 540 Database Management Systems
Indexing Goals: Store large files Support multiple search keys
Indexing and hashing.
UNIT 11 Query Optimization
Azita Keshmiri CS 157B Ch 12 indexing and hashing
CS522 Advanced database Systems
Storage and Indexes Chapter 8 & 9
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Lecture 20: Indexing Structures
COMP 430 Intro. to Database Systems
Database Management Systems (CS 564)
CHAPTER 5: PHYSICAL DATABASE DESIGN AND PERFORMANCE
Chapter 15 QUERY EXECUTION.
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
CS222P: Principles of Data Management Notes #6 Index Overview and ISAM Tree Index Instructor: Chen Li.
國立臺北科技大學 課程:資料庫系統 fall Chapter 18
File organization and Indexing
Chapter 11: Indexing and Hashing
Lecture 12 Lecture 12: Indexing.
Database Management Systems (CS 564)
Physical Database Design
(Slides by Hector Garcia-Molina,
國立東華大學試題 系所:資訊管理學系 科目:資料庫管理 第1頁/共6頁
Indexing and Hashing Basic Concepts Ordered Indices
Lecture 21: Indexes Monday, November 13, 2000.
The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited)
Database Management System
Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.
Chapter 11 Indexing And Hashing (1)
Indexing 1.
CS222/CS122C: Principles of Data Management Notes #6 Index Overview and ISAM Tree Index Instructor: Chen Li.
Credit for some of the slides in this lecture goes to
Indexing 4/11/2019.
“ Possible representation (2) “
Introduction to Database Systems CSE 444 Lectures 19: Data Storage and Indexes May 16, 2008.
“ Possible representation (1) “
Question 1: Basic Concepts (40 %)
Example: Banking Database
Question 1: Basic Concepts (45 %)
Lecture 20: Indexes Monday, February 27, 2006.
CS4433 Database Systems Indexing.
Indexing, Access and Database System Architecture
Chapter 11: Indexing and Hashing
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #05 Index Overview and ISAM Tree Index Instructor: Chen Li.
Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦 博士 國立東華大學 資訊管理系教授.
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦 博士 國立東華大學 資訊管理系教授

6.1 Introduction

The Role of Access Method/Index in DBMS Language Processor Optimizer Operator Processor Access Method/Index File System database DBMS Query in SQL: SELECT CUSTOMER. NAME FROM CUSTOMER, INVOICE WHERE REGION = 'N.Y.' AND AMOUNT > 10000 AND CUSTOMER.C#=INVOICE.C# Internal Form: P(s (S SP) Operator: SCAN C using region index, create C SCAN I using amount index, create I SORT C?and I?on C# JOIN C?and I?on C# EXTRACT name field Calls to Access Method: OPEN SCAN on C with region index GET next tuple . Calls to file system: GET10th to 25th bytes from block #6 of file #5

The Internal Level concern the way the data is actually stored. Objectives: concern the way the data is actually stored. store data on direct access media. e.g. disk. minimize the number of disk access (disk I/O). disk access is much slower than main storage access time. Main Buffer I/O Disk index CPU

The Internal Level (cont.) Physical database design: Process of choosing an appropriate storage representation for a given database (by DBA). E.g. designing B-tree index or hashing Nontrivial task require a good understanding of how the database will be used. Logical database design: data S P SP S P-SP

6.2 Indexing

Indexing: Introduction Consider the Supplier table, S. Suppose "Find all suppliers in city xxx" is an important query. i.e. it is frequency executed. => DBA might choose the stored representation as Fig. 6.2. S1 S2 S3 S4 S5 Smith Jones Blake Clark Adams 20 10 30 London Paris Athens City-Index (index) S (indexed file) Fig. 6.2: Indexing the supplier file on CITY.

Indexing: Introduction (cont.) Now the DBMS has two possible strategies: <1> Search S, looking for all records with city = 'xxx'. <2> Search City-Index for the desired entry. Advantage: speed up retrieval. index file is sorted. fewer I/O's because index file is smaller. Disadvantages: slow down updates. both index and indexed file should be updated when we insert new tuple.

Indexing: Multiple Fields Primary index : index on primary key. <e.g> s# Secondary index: index on other field. <e.g> city A given table may have any number of indexes. Fig. 6.3: Indexing the supplier file on both CITY and STATUS. London Paris Athens S1 S2 S3 S4 S5 Smith Jones Blake Clark Adams 20 10 30 CITY-index S Status-index

How Index are used? City-Index S (indexed file) Consider: <1> Sequential access : accessed in the sequence defined by values of the indexed field. <e.g> Range query : "Find the suppliers whose city begins with a letter in the range L-R." <3> Existence test : <e.g> "Is there any supplier in London ?" Note: It can be done from the index alone. Athens London Paris City-Index S1 S2 S3 S4 S5 S (indexed file) ... <2> Direct Access : <e.g> "Find suppliers in London." <e.g> list query:"Find suppliers whose city is in London, Paris, and N.Y."

Indexing on Field Combinations To construct an index on the basis of values of two or more fields. <e.g.> City/Status-Index Athens/30 London /20 Paris /10 Paris/30 S1 S2 S3 S4 S5 Smith Jones Blake Clark Adams 20 10 30 London Paris Athens S Query: “Find suppliers in Paris with status 30.” - on city/status index: a single scan of a single index. - on two separate indexes: two index scan => still difficult. (Fig. 6.3)

Dense V.S. Nondense Indexing Assume the Supplier file (S) is clustered on S# . S1 S2 S3 S4 S5 S6 S1... S2... S3... S4... S5... S6... page1 page2 page3 Index (dense) Index (nondense) S A L L P P City_index

Dense V.S. Nondense Indexing (cont.) Nondense index: not contain an entry for every record in the indexed file. retrieval steps: <1> scan the index (nondense) to get page # , say p. <2> retrieve page p and scan it in main storage. advantages: <1> occupy less storage than a corresponding dense index <2> quicker to scan. disadvantages: can not perform existence test via index alone. Note: At most only one nondense index can be constructed. (why?) Clustering: logical sequence = physical sequence

B-tree A B-tree T of order m is an m-way search tree, such that Introduction: is a particular type of multi-level (or tree structured) index. proposed by Bayer and McCreight in 1972. the commonest storage structure of all in modern DBMS. Definition: (from Horowitz "Data Structure") A B-tree T of order m is an m-way search tree, such that <1> the root node has at least 2 children. <2> non-leaf nodes have at least [m/2] children. <3> all leave nodes are at the same level. Goal: maintain balance of index tree by dynamically restructuring the tree as updates proceed.

B+-tree (Knuth's variation) 50 82 96 97 99 91 93 94 89 94 83 85 89 71 78 82 60 62 70 51 52 58 58 70 35 40 50 15 18 32 6 8 12 12 32 + B+-tree (Knuth's variation) index set (nondense) - Sequence set (with pointers to data records) (dense or nondense) - index set: provides fast direct access to the sequential set and thus to the data too. - sequence set: provides fast sequential access to the indexed data. Other variations: B*-tree, B'-tree,...