LEARNING OBJECTIVES Index files.


LEARNING OBJECTIVES
- Index files
- Operations required to maintain an index file
- Primary keys
- Secondary keys
CPSC 231 Indexing (D.H.)

Index
An index is a tool for finding records in a file. It consists of a key field, on which the index is searched, and a reference field (a byte address or an RRN, i.e. relative record number) that tells where to find the data file record associated with a particular key.

Examples of an Index
The index to a book (usually at the end of the book) provides a way to find a topic quickly. Imagine a book without an index! The index in a library (an online catalog) allows you to locate items by author, by title, or by call number.

Index in Databases: Example
A music recording store uses an index file to keep track of its inventory. Each record in the data file consists of the following fields:
- ID number
- Title
- Composer or composers
- Artist or artists
- Label (publisher)

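recording.h

    #ifndef RECORDING_H
    #define RECORDING_H

    #include <iostream>

    class IOBuffer; // buffer class assumed to be declared elsewhere (textbook I/O classes)

    class Recording // a recording with a composite key
    {
    public:
        Recording ();
        Recording (const char * label, const char * idNum, const char * title,
                   const char * composer, const char * artist);
        char IdNum[7];
        char Title[30];
        char Composer[30];
        char Artist[30];
        char Label[7];
        char * Key () const;          // canonical key: Label followed by IdNum, e.g. DG241
        int Unpack (IOBuffer &);      // read the fields from a buffer
        int Pack (IOBuffer &) const;  // write the fields to a buffer
        void Print (std::ostream &, const char * label = 0) const;
    };

    #endif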

Primary Key: Example
The primary key in our example consists of the initials of the company label combined with the product ID. The canonical form of this key is the uppercase form of the Label field followed by the ASCII representation of the ID number, e.g. DG241.
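The canonical-form rule above can be sketched as a small helper. `CanonicalKey` is a hypothetical name, and dropping blanks (for example padding in fixed-length fields) is an assumption of this sketch, not something the slide specifies.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: builds the canonical primary key, i.e. the Label
// field in uppercase followed by the ID number as ASCII text.
// Assumption: blanks (e.g. field padding) are dropped from both parts.
std::string CanonicalKey(const std::string& label, const std::string& idNum) {
    std::string key;
    for (unsigned char c : label) {
        if (!std::isspace(c)) {
            key += static_cast<char>(std::toupper(c));  // label in uppercase
        }
    }
    for (unsigned char c : idNum) {
        if (!std::isspace(c)) {
            key += static_cast<char>(c);                 // ID copied as-is
        }
    }
    return key;
}
```

Normalizing keys this way means "dg 241" and "DG241" map to the same index entry, so searches do not depend on how the user typed the label.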

Index File
An index file is used to provide rapid keyed access to individual records in the data file. Each entry in the index file consists of the following fields:
- key (e.g. ANG3795)
- reference: the address of the corresponding record in the data file
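A minimal in-memory picture of such an index, assuming it is held as a vector of (key, reference) pairs kept sorted by key. `IndexEntry` and `Search` are hypothetical names; the lookup is an ordinary binary search via `std::lower_bound`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One entry of a simple (primary) index: key plus the byte address
// or RRN of the matching record in the data file.
struct IndexEntry {
    std::string key;
    long reference;
};

// Binary search over an index kept sorted by key.
// Returns the record's reference, or -1 if the key is not indexed.
long Search(const std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) {
        return it->reference;
    }
    return -1;
}
```

The caller would then seek to the returned reference in the data file to read the full record.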

Operations Required to Maintain an Indexed File
- Create the original empty index file and data file.
- Load the index file into memory before using it (if possible, load the whole file).
- Rewrite the index file from memory to permanent storage after modifying it.
- Add data records to the data file.
- Delete data records from the data file.
- Update records in the data file.
- Update the index to reflect changes in the data file.

Creating Files
Create two empty files: the index file and the data record file.

Loading the Index into Main Memory
This can be supported with an I/O buffer or with an array.

Rewriting the Index File from Memory
This can be supported as part of the close operation for the index file (i.e. write the buffer or the array to the disk).
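Loading and rewriting can be sketched with standard file streams, reading the whole index into an array (a `std::vector` here). The on-disk format, one "key reference" pair per line, is purely an assumption for illustration; `LoadIndex` and `RewriteIndex` are hypothetical names.

```cpp
#include <fstream>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long reference;
};

// Reads the whole index file into memory.
// Assumed format: one "key reference" pair per line, sorted by key.
bool LoadIndex(const std::string& path, std::vector<IndexEntry>& index) {
    std::ifstream in(path);
    if (!in) return false;
    index.clear();
    IndexEntry e;
    while (in >> e.key >> e.reference) {
        index.push_back(e);
    }
    return true;
}

// Writes the in-memory index back to disk, e.g. as part of closing it.
// Opening an ofstream truncates the old contents of the file.
bool RewriteIndex(const std::string& path, const std::vector<IndexEntry>& index) {
    std::ofstream out(path);
    if (!out) return false;
    for (const IndexEntry& e : index) {
        out << e.key << ' ' << e.reference << '\n';
    }
    out.close();
    return !out.fail();
}
```

If the program crashes between modifying the in-memory index and calling `RewriteIndex`, the file on disk is out of date, which is exactly the danger discussed next.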

Dangers of Losing the Index File
If the index file is outdated, corrupted, or lost, there must be some means of reconstructing it from the data file!
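One way to reconstruct the index is a full scan of the data file followed by a sort. In this sketch the data file is simulated by a vector of fixed-length records whose position is the RRN, and deleted records are assumed to be only marked, not removed; `DataRecord` and `RebuildIndex` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long reference;
};

// Stand-in for a data file of fixed-length records:
// a record's position in the vector is its RRN.
struct DataRecord {
    std::string key;  // canonical primary key, e.g. "ANG3795"
    bool deleted;     // assumption: deleted records are only marked
};

// Rebuilds the primary index from every live record, then sorts it by key.
std::vector<IndexEntry> RebuildIndex(const std::vector<DataRecord>& dataFile) {
    std::vector<IndexEntry> index;
    for (std::size_t rrn = 0; rrn < dataFile.size(); ++rrn) {
        if (!dataFile[rrn].deleted) {
            index.push_back({dataFile[rrn].key, static_cast<long>(rrn)});
        }
    }
    std::sort(index.begin(), index.end(),
              [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; });
    return index;
}
```

This works only because each data record carries its own key; the index is derived data, so the data file is the source of truth.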

Record Addition
Adding a new data record to the data file requires adding a new entry to the index file too. Since the index file is usually kept sorted, adding a new entry requires rearranging the entries in this file. (This is easy if the index is kept in main memory.)
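A sketch of adding an entry to a sorted in-memory index, assuming the new data record has already been appended and its reference is known. `Insert` is a hypothetical name; entries after the insertion point shift by one slot, which is the "rearranging" the slide mentions.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long reference;
};

// Inserts (key, reference) at its sorted position. Returns false if the
// primary key already exists, since primary keys must be unique.
bool Insert(std::vector<IndexEntry>& index, const std::string& key, long reference) {
    auto it = std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) {
        return false;
    }
    index.insert(it, IndexEntry{key, reference});  // later entries shift right
    return true;
}
```

Shifting entries is cheap in memory; the same shift on a disk-resident index would mean rewriting everything after the insertion point.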

Record Deletion
Deleting a data record requires deleting the corresponding index entry. Note that in an indexed file organization all data records are pinned. (WHY?) What are the consequences of this fact?
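A sketch of the index side of a deletion. The data records are pinned because the index stores their addresses, so moving a record would leave a dangling reference; here the index entry is removed and the old reference is returned so the caller could mark that data slot as deleted. `Remove` is a hypothetical name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long reference;
};

// Removes the entry for `key` and returns the reference it held,
// or -1 if the key is not in the index. The data record is not moved.
long Remove(std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it == index.end() || it->key != key) {
        return -1;
    }
    long reference = it->reference;
    index.erase(it);  // remaining entries shift left, still sorted
    return reference;
}
```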

Record Updating
There are two categories of updates:
- the update modifies the value of the key;
- the update does not modify the value of the key.
If the update modifies (changes) the primary key, the index file might have to be reordered. If the update does not change the primary key, it might still require rearranging records in the data file. (WHY?)
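Both update cases can be sketched against a sorted in-memory index. `ChangeKey` handles a new primary key by moving the entry to its new sorted position. `ChangeReference` covers one plausible answer to the "WHY?": with variable-length records, an update that makes a record longer may force it to a new address, so only the reference field changes. Both names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long reference;
};

using Index = std::vector<IndexEntry>;

static Index::iterator FindSlot(Index& index, const std::string& key) {
    return std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
}

// Primary key changed: remove the entry and re-insert it under the new key.
bool ChangeKey(Index& index, const std::string& oldKey, const std::string& newKey) {
    auto it = FindSlot(index, oldKey);
    if (it == index.end() || it->key != oldKey) return false;        // no such key
    if (newKey == oldKey) return true;                                // nothing to move
    auto clash = FindSlot(index, newKey);
    if (clash != index.end() && clash->key == newKey) return false;  // must stay unique
    long reference = it->reference;  // the data record stays where it is
    index.erase(it);
    index.insert(FindSlot(index, newKey), IndexEntry{newKey, reference});
    return true;
}

// Key unchanged, but the data record had to move: update its reference only.
bool ChangeReference(Index& index, const std::string& key, long newReference) {
    auto it = FindSlot(index, key);
    if (it == index.end() || it->key != key) return false;
    it->reference = newReference;
    return true;
}
```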

Indexes That Are Too Large to Hold in Memory
If the index file is too large to be kept in main memory, it has to be kept on secondary storage. Keeping an index file on disk has several disadvantages:
- searching the index file can be very time-consuming;
- rearranging the index can be time-consuming too.

Possible Alternatives to Storing Index Files
If the index file is too large to be kept in main memory, the following alternative organizations should be considered:
- a hashed organization (if access speed is very important);
- a tree-structured organization or a multilevel index, such as a B-tree.

Pros of a Simple Index File
Even if a simple index file has to be stored on disk, in some cases it can still be a useful way to organize data. Advantages of the simple index file:
- it allows binary search to obtain keyed access to a record;
- if index entries are much smaller than data records, sorting and maintaining the index is much easier than sorting and maintaining the data file;
- if the data records are pinned, the index file allows the keys to be rearranged without moving the data records.

Indexing with Multiple Key Access
Since the primary key is unique, it is often used as a search key. The primary key of the Recording class is Label + ID (e.g. ANG3795). But most of the time, someone searching for a music CD would rather provide a title, a composer, or an artist.

Secondary Key
A secondary key is a key for which multiple records may exist in the data file. Examples: the composer's name in the Recording class (a store can have a number of CDs with Beethoven's works) and the artist's name in the Recording class.

Secondary Index File
A secondary index file might be created for each of the possible secondary keys. Each entry in a secondary index file consists of the following two fields:
- the secondary key (e.g. Beethoven);
- the corresponding primary key (e.g. ANG3795).
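A sketch of a secondary index as a sorted vector of (secondary key, primary key) pairs. Ordering by secondary key and then by primary key is an assumption that keeps duplicates grouped and sorted; `SecondaryEntry`, `AddSecondary`, and `FindPrimaryKeys` are hypothetical names, and storing composers in uppercase is an assumed canonical form.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct SecondaryEntry {
    std::string secondaryKey;  // e.g. composer "BEETHOVEN" (uppercase assumed)
    std::string primaryKey;    // e.g. "ANG3795"
};

// Order by secondary key, then by primary key.
bool operator<(const SecondaryEntry& a, const SecondaryEntry& b) {
    if (a.secondaryKey != b.secondaryKey) return a.secondaryKey < b.secondaryKey;
    return a.primaryKey < b.primaryKey;
}

// Adds an entry at its sorted position (entries after it shift).
void AddSecondary(std::vector<SecondaryEntry>& index,
                  const std::string& sk, const std::string& pk) {
    SecondaryEntry e{sk, pk};
    index.insert(std::upper_bound(index.begin(), index.end(), e), e);
}

// Returns all primary keys filed under one secondary key, in sorted order.
std::vector<std::string> FindPrimaryKeys(const std::vector<SecondaryEntry>& index,
                                         const std::string& sk) {
    std::vector<std::string> result;
    auto it = std::lower_bound(index.begin(), index.end(), sk,
        [](const SecondaryEntry& e, const std::string& k) { return e.secondaryKey < k; });
    for (; it != index.end() && it->secondaryKey == sk; ++it) {
        result.push_back(it->primaryKey);
    }
    return result;
}
```

The secondary index yields primary keys, not addresses; each key is then looked up in the primary index to reach the data record. This indirection is what keeps the secondary index valid when data records move.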

Record Addition
Adding a record to the data file implies adding an entry to the secondary index file. The costs are similar to those of adding an entry to the primary index file (e.g. entries might have to be shifted).

Record Deletion
Deleting a record implies removing all references to that record in the file system. After the search on the secondary key, we search the matching entries for the primary key of the record to be deleted and remove that entry from the secondary index file.
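A sketch of removing one record's entry from a secondary index, assuming entries are sorted by secondary key and then by primary key. Under that ordering, the two searches described above (first the secondary key, then the primary key within its group) collapse into one binary search on the pair. `RemoveSecondary` is a hypothetical name; it would be called once for each secondary index file.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct SecondaryEntry {
    std::string secondaryKey;
    std::string primaryKey;
};

// Order by secondary key, then by primary key.
bool operator<(const SecondaryEntry& a, const SecondaryEntry& b) {
    if (a.secondaryKey != b.secondaryKey) return a.secondaryKey < b.secondaryKey;
    return a.primaryKey < b.primaryKey;
}

// Removes the (sk, pk) entry of a deleted record; returns false if absent.
bool RemoveSecondary(std::vector<SecondaryEntry>& index,
                     const std::string& sk, const std::string& pk) {
    SecondaryEntry target{sk, pk};
    auto it = std::lower_bound(index.begin(), index.end(), target);
    if (it == index.end() || it->secondaryKey != sk || it->primaryKey != pk) {
        return false;
    }
    index.erase(it);
    return true;
}
```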

Record Updating
There are three possible situations:
- The update changes the secondary key: the secondary key index may have to be rearranged so that it stays in sorted order.
- The update changes the primary key: this has a big impact on the primary key index, but in the secondary key index we only need to update the affected primary key field.
- The update is confined to other fields: updates that affect neither the primary nor the secondary key fields do not affect the secondary key index, even if the update is substantial.
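The first two update cases can be sketched for a secondary index that, as an assumption of this sketch, is sorted by secondary key only (duplicates within a group in any order). Under that assumption a primary key change really is an in-place field overwrite, while a secondary key change moves the entry. The third case needs no code, since the index is untouched. `ChangePrimaryKey` and `ChangeSecondaryKey` are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct SecondaryEntry {
    std::string secondaryKey;
    std::string primaryKey;
};

// First entry whose secondary key is not less than sk.
static std::vector<SecondaryEntry>::iterator FirstFor(std::vector<SecondaryEntry>& index,
                                                      const std::string& sk) {
    return std::lower_bound(index.begin(), index.end(), sk,
        [](const SecondaryEntry& e, const std::string& k) { return e.secondaryKey < k; });
}

// Primary key changed: the entry keeps its position; overwrite the field.
bool ChangePrimaryKey(std::vector<SecondaryEntry>& index, const std::string& sk,
                      const std::string& oldPk, const std::string& newPk) {
    for (auto it = FirstFor(index, sk); it != index.end() && it->secondaryKey == sk; ++it) {
        if (it->primaryKey == oldPk) {
            it->primaryKey = newPk;
            return true;
        }
    }
    return false;
}

// Secondary key changed: move the entry so the index stays sorted.
bool ChangeSecondaryKey(std::vector<SecondaryEntry>& index, const std::string& oldSk,
                        const std::string& newSk, const std::string& pk) {
    for (auto it = FirstFor(index, oldSk); it != index.end() && it->secondaryKey == oldSk; ++it) {
        if (it->primaryKey == pk) {
            index.erase(it);  // iterator not used after this point
            auto pos = std::upper_bound(index.begin(), index.end(), newSk,
                [](const std::string& k, const SecondaryEntry& e) { return k < e.secondaryKey; });
            index.insert(pos, SecondaryEntry{newSk, pk});
            return true;
        }
    }
    return false;
}
```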

Retrieving Data with Multiple Secondary Keys
Example: to find all CDs in a music store that contain Beethoven's Symphony No. 9, we search the secondary indexes using two secondary keys: composer AND title. Each search produces a list of CDs, identified by their primary keys.

Boolean AND in Searches
E.g. the search by composer could produce the list of CDs (ANG3795, DG139201, DG18807, RCA2626), and the search by title could produce the list (ANG3795, COL31809, DG18807). The CDs we are interested in must belong to both lists; in other words, we take the intersection of the two sets. WHY?
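If both lists of primary keys are sorted (as they are when read from sorted secondary indexes), the AND can be done in one linear merge pass. This sketch uses the standard `std::set_intersection`; `MatchAnd` is a hypothetical name.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Boolean AND: primary keys present in both sorted lists.
std::vector<std::string> MatchAnd(const std::vector<std::string>& a,
                                  const std::vector<std::string>& b) {
    std::vector<std::string> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

With the composer list (ANG3795, DG139201, DG18807, RCA2626) and the title list (ANG3795, COL31809, DG18807) from the slide, `MatchAnd` returns ANG3795 and DG18807. Only those two records then need to be fetched through the primary index.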

Boolean OR in Searches
To find all CDs by either Beethoven or Chopin, we use the OR operation on our secondary key searches. To obtain the list of CDs we are interested in, we combine the outcomes of both searches (i.e. take the union of the two sets). WHY?
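The OR case is the same kind of merge over two sorted primary-key lists, using the standard `std::set_union`; `MatchOr` is a hypothetical name. A key that appears in both lists (a CD containing works by both composers) is reported once.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Boolean OR: primary keys present in either sorted list, without duplicates
// (assuming each input list has no duplicates of its own).
std::vector<std::string> MatchOr(const std::vector<std::string>& a,
                                 const std::vector<std::string>& b) {
    std::vector<std::string> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```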

Cons of the Current Secondary Index Structure
- The index file has to be rearranged every time a new record is added to the data file.
- If there are duplicate secondary keys, the secondary key field is repeated for each entry.

Improvements to the Secondary Index Structure: Solution 1
Allow multiple primary keys to be associated with a single secondary key by allocating an array of primary keys for each secondary key entry. This avoids re-sorting the index each time a new entry is added (as long as its secondary key is already present). However, it suffers from internal fragmentation (WHY?), and the number of allocated entries in the array may prove too small.
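A sketch of Solution 1, assuming a fixed capacity of four primary keys per entry (an arbitrary value chosen only for illustration). Every entry reserves all slots whether it uses them or not, which is the internal fragmentation; a fifth reference simply does not fit. `FixedSecondaryEntry`, `AddReference`, `WastedSlots`, and `kMaxRefs` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical fixed capacity, for illustration only.
constexpr std::size_t kMaxRefs = 4;

struct FixedSecondaryEntry {
    std::string secondaryKey;
    std::array<std::string, kMaxRefs> primaryKeys;  // unused slots stay empty
    std::size_t count = 0;
};

// Adds a primary key to an existing entry without re-sorting the index.
// Fails when the fixed-size array is already full.
bool AddReference(FixedSecondaryEntry& entry, const std::string& pk) {
    if (entry.count == kMaxRefs) return false;
    entry.primaryKeys[entry.count++] = pk;
    return true;
}

// Slots reserved but unused: the internal fragmentation of this layout.
std::size_t WastedSlots(const FixedSecondaryEntry& entry) {
    return kMaxRefs - entry.count;
}
```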

Improvements to the Secondary Index Structure: Solution 2
Create an inverted list: have each secondary key point to a list of the primary key references associated with it. This method eliminates most of the problems associated with maintaining a secondary index file. WHY?
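A sketch of an inverted list, assuming two structures: a secondary index with one entry per distinct secondary key, and a separate reference-list "file" (simulated by a vector indexed by RRN) of linked nodes holding primary keys. Adding a record whose secondary key already exists only appends a node, so the secondary index is rearranged only when a new distinct key appears, and there is no fixed-size array to overflow. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct InvertedEntry {
    std::string secondaryKey;  // e.g. "BEETHOVEN"
    long head;                 // RRN of the first node in its list, -1 if none
};

struct RefNode {
    std::string primaryKey;    // e.g. "ANG3795"
    long next;                 // RRN of the next node, -1 ends the list
};

struct InvertedIndex {
    std::vector<InvertedEntry> index;  // sorted by secondaryKey
    std::vector<RefNode> refFile;      // appended to, never re-sorted

    static bool KeyLess(const InvertedEntry& e, const std::string& k) {
        return e.secondaryKey < k;
    }

    void Add(const std::string& sk, const std::string& pk) {
        auto it = std::lower_bound(index.begin(), index.end(), sk, KeyLess);
        if (it == index.end() || it->secondaryKey != sk) {
            it = index.insert(it, InvertedEntry{sk, -1});  // only new keys shift entries
        }
        refFile.push_back(RefNode{pk, it->head});  // new node becomes the list head
        it->head = static_cast<long>(refFile.size()) - 1;
    }

    std::vector<std::string> Find(const std::string& sk) const {
        std::vector<std::string> result;
        auto it = std::lower_bound(index.begin(), index.end(), sk, KeyLess);
        if (it == index.end() || it->secondaryKey != sk) return result;
        for (long rrn = it->head; rrn != -1; rrn = refFile[static_cast<std::size_t>(rrn)].next) {
            result.push_back(refFile[static_cast<std::size_t>(rrn)].primaryKey);
        }
        return result;
    }
};
```

In this sketch new references are linked at the head, so `Find` returns them newest first; keeping each list sorted would cost extra work on every insertion.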

Selective Index
A selective index contains keys for only a portion of the records in the data file. Such an index provides the user with a view of a specific subset of the file's records (e.g. all CDs of Beethoven's works produced in 1998).
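A sketch of building a selective index by applying a predicate during the scan. The Recording class has no year field, so `RecordInfo` and its `year` member are hypothetical, introduced only to express the 1998 example; `BuildSelectiveIndex` is also a hypothetical name.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record summary; `year` is an assumption for this example.
struct RecordInfo {
    std::string primaryKey;
    std::string composer;
    int year;
};

// Indexes only the records that satisfy `keep`, sorted by primary key.
std::vector<std::string> BuildSelectiveIndex(
        const std::vector<RecordInfo>& records,
        const std::function<bool(const RecordInfo&)>& keep) {
    std::vector<std::string> keys;
    for (const RecordInfo& r : records) {
        if (keep(r)) {
            keys.push_back(r.primaryKey);
        }
    }
    std::sort(keys.begin(), keys.end());
    return keys;
}
```

For the slide's example, the predicate would be something like `r.composer == "BEETHOVEN" && r.year == 1998`.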

Binding
Binding takes place when a key is associated with a particular physical record in the data file. This can happen either when the data file and indexes are prepared or later, during program execution.