Chapter 7: Indexing. File Structures by Folk, Zoellick, and Riccardi.

Presentation transcript:

1 Chapter 7: Indexing. File Structures by Folk, Zoellick, and Riccardi

2 Chapter Objectives
• Index files
• Operations required to maintain an index file
• Primary keys
• Secondary keys

3 Index (7.1 What Is an Index?)
• A tool for finding records in a file
• Consists of:
  - key field: the field on which the index is searched
  - reference field (address or RRN, the relative record number): tells where to find the data file record associated with a particular key

4 Examples of an Index
• Book index
  - usually at the end of the book
  - arranged alphabetically by topic
• Library index (an online catalog)
  - lets you locate items by author, by title, or by call number
• Photo thumbnails
  - each thumbnail usually links to the actual photo
  - a thumbnail is a much smaller file and loads quickly; the actual photo takes much longer to load
  - if the index held the actual photos, it would take a long time to load

5 Book Index

6 Example: Index in Databases
• A university uses an index file to keep track of its courses.
• Each record in the data file consists of the following fields: Department, Title, Professor, Student List, Room & Time

7 Example: Primary key (7.2 A Simple Index)
• Department: not specific enough
• Course Number: not unique
• Professor: not unique
• Room & Time: possible, but classes aren't identified this way
• Department + Course Number: the obvious choice?

8 Index file
• Provides rapid access to individual records in the data file via their keys
• The example index file consists of the following fields:
  - key (e.g. CIS402)
  - reference (address): the address of the corresponding record in the data file
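
As a minimal sketch of the key/reference pairing on this slide, the C++ below keeps index entries sorted by key and finds a record address with binary search. The names `IndexEntry` and `findAddress`, the use of a byte offset as the reference, and the `-1` "not found" value are assumptions made for illustration; they are not taken from the slides.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One index record: a key plus a reference to the data record.
struct IndexEntry {
    std::string key;  // e.g. "CIS402"
    long recAddr;     // assumed: byte offset of the record in the data file
};

// Binary search over entries sorted by key.
// Returns the record address, or -1 if the key is not in the index.
long findAddress(const std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) return it->recAddr;
    return -1;
}
```

Because the index entries are small and sorted, the search touches only the index; the data file is read once, at the returned address.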


Primary Index (7.1 What Is an Index?)
[Figure: an index file holding the keys k1, k2, k4, k5, k7, k9, each referencing a record in the data file (AAA, ZZZ, CCC, XXX, EEE, FFF).]

Index Class Interface (7.2 A Simple Index for Entry-Sequenced Files)

    #include <iostream>
    using std::ostream;

    class TextIndex {
    public:
        TextIndex(int maxKeys = 100, int unique = 1);
        int Insert(const char* key, int recAddr);  // add to index
        int Remove(const char* key);               // remove key from index
        int Search(const char* key) const;         // search for key, return recAddr
        void Print(ostream&) const;
    protected:
        int MaxKeys;    // maximum number of entries
        int NumKeys;    // actual number of entries
        char** Keys;    // array of key values
        int* RecAddrs;  // array of record references
        int Find(const char* key) const;
        int Init(int maxKeys, int unique);
        int Unique;     // if true, each key must be unique
    };

13 Operations on an Indexed File
• Create (when the data file is created)
• Load into memory (the whole file, if possible and prudent)
• Write the updated file to permanent storage
• Add record(s) to the data file
• Delete record(s) from the data file
• Update record(s) in the data file
• Search


15 Creating Files of Data
• Create files
  - index file
  - data record file
• Load the index
  - via buffer I/O into an array
• Write back the index file
  - can be part of the close operation for the index file
  - the close function in the index object can write the buffer/array to disk before closing the file

16 Record addition
• Adding a new data record to the data file requires adding a new record to the index file.
• If the index file is sorted:
  - adding a new record may require rearranging the records in this file
  - how much work this takes depends on how the index file is represented in memory
  - if a sort is necessary, it is easily done when the index is in main memory
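
A possible in-memory version of record addition, assuming the index is a sorted array and keys are unique: the new entry is placed at its sorted position, so entries after it shift but no full re-sort is needed. `insertEntry` is a hypothetical helper, not the deck's `TextIndex::Insert`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long recAddr;
};

// Insert a new entry at its sorted position.
// Returns false (and changes nothing) if the key is already present.
bool insertEntry(std::vector<IndexEntry>& index, const std::string& key, long recAddr) {
    auto pos = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (pos != index.end() && pos->key == key) return false;
    index.insert(pos, IndexEntry{key, recAddr});  // shifts later entries up by one
    return true;
}
```

The data record itself can simply be appended to the data file; only the small index entries move.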

17 Record deletion
• Deleting a data record requires deleting the corresponding index record.
• Can space in the data file be reclaimed?
  - Difficult: with an index file organization, all data records are pinned.
  - A pinned data record is one whose address is referenced in an index file.
• Other consequences
  - Re-sorting the data file is difficult.
  - Solution: sort the file via the indexes.
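
Deletion on the index side can be sketched as follows. The helper `removeEntry` is hypothetical; it returns the address of the deleted record so that a caller could, for example, mark that space as reusable, which is an assumption about how the caller behaves rather than something the slides specify.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long recAddr;
};

// Remove the index entry for `key`.
// Returns the address the entry referred to, or -1 if the key was not found.
long removeEntry(std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it == index.end() || it->key != key) return -1;
    long addr = it->recAddr;
    index.erase(it);  // the data record itself is not moved
    return addr;
}
```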

18 Record Updating
• Two categories of updates:
  - Modification of the key value: re-ordering of the index file might be required.
  - Modification of a non-key value: might still require reordering of records in the data file. (Why?)
    - The size of the data record might increase, so it must be moved to a space that can hold it.
    - The index reference for that record must then be reset.

19 Indexes Too Large for Memory (7.5 Indexes That Are Too Large to Hold in Memory)
• The index is kept on secondary storage.
• Disadvantage: time consumption
  - searching the index file requires disk accesses instead of just memory accesses
  - rearranging the index requires disk accesses

20 Solutions to Index Files in Secondary Storage
• If the index file is too large to be kept in main memory, then the following alternative organizations should be considered:
  - a hashed organization (if access speed is very important)
  - a tree-structured organization, or a multilevel index such as a B-tree

21 Pros of a simple index file
• Allows the use of binary search
• Sorting and maintaining an index is much easier than sorting and maintaining a data file
  - true if index entries are much smaller than data records
• If data records are pinned, keys can be rearranged without moving data records
• The same ideas apply to multiple simple indexes...

22 Indexing with Multiple Key Access (7.6 Indexing to Provide Access by Multiple Keys)
• The unique primary key is often used as a search keyword.
• Example primary key: CS215
• What if you'd like to include the professor in the search?
  - Two keys: Course & Prof
  - Could also be: Course & Time, or Location & Time (?)

23 Secondary key
• A secondary key is a key for which multiple records may exist in the data file.
• Examples:
  - Sorting an Excel sheet using two fields (e.g. name & section)
  - A professor teaches more than one class

24 Secondary Index File
• Create one for each possible secondary index.
  - secondary keys can be shared by several records
  - primary keys are unique
• Example:
  - Secondary key: Professor El-Ramly
  - Primary keys of the courses taught by this professor: CS352, CS215
  - Those courses can be accessed via the secondary key.
• What if a course has multiple sections?
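
One way to picture a secondary index in memory is a sorted multimap from secondary key to primary key. This is a sketch under that assumption; the slides describe a sorted index file, and the names `SecondaryIndex` and `primaryKeysFor` are made up for the example.

```cpp
#include <map>
#include <string>
#include <vector>

// secondary key (e.g. professor) -> primary key (e.g. course ID).
// A multimap allows the same secondary key to appear in several entries.
using SecondaryIndex = std::multimap<std::string, std::string>;

// Collect the primary keys of all records that share one secondary key.
std::vector<std::string> primaryKeysFor(const SecondaryIndex& index,
                                        const std::string& secondaryKey) {
    std::vector<std::string> result;
    auto range = index.equal_range(secondaryKey);
    for (auto it = range.first; it != range.second; ++it) {
        result.push_back(it->second);
    }
    return result;
}
```

Each returned primary key still has to be looked up in the primary index to reach the data record, which is what lets data records move without touching the secondary index.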

25 Record Addition
• Adding a record to the data file likely requires adding a record to the secondary index file.
• Costs are similar to the cost of adding a record to the primary index file:
  - records might have to be shifted
  - indexes may have to be rearranged


27 Record Deletion
• All references to the record must be removed from the file system.
• Search for the primary key in the primary index file.
  - Remove that index entry.
• Search the secondary index file for the primary key of the record to be deleted.
  - Remove that entry from the secondary index file.
• What if the secondary index entries are left in place?
  - A secondary key refers to a primary key.
  - The primary key will have been deleted from the primary index, so a lookup on it will fail.
  - If searches handle this possibility, the secondary index entry does not have to be deleted.
  - Pitfalls?
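
The "leave the secondary entries in place" option can be sketched like this: deletion touches only the primary index, and a secondary-key search skips any primary key it can no longer resolve. The container choices and the names `deleteRecord` and `lookup` are assumptions for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

using PrimaryIndex = std::map<std::string, long>;                // primary key -> record address
using SecondaryIndex = std::multimap<std::string, std::string>;  // secondary key -> primary key

// Delete a record by removing it from the primary index only.
// Secondary index entries that mention the key are left untouched.
void deleteRecord(PrimaryIndex& primary, const std::string& primaryKey) {
    primary.erase(primaryKey);
}

// Resolve a secondary key to record addresses, skipping stale references
// whose primary key no longer exists in the primary index.
std::vector<long> lookup(const PrimaryIndex& primary, const SecondaryIndex& secondary,
                         const std::string& secondaryKey) {
    std::vector<long> addrs;
    auto range = secondary.equal_range(secondaryKey);
    for (auto it = range.first; it != range.second; ++it) {
        auto p = primary.find(it->second);
        if (p != primary.end()) addrs.push_back(p->second);
    }
    return addrs;
}
```

One pitfall this makes visible: stale entries keep occupying space in the secondary index and cost an extra failed lookup on every search until the index is cleaned up.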

28 Record Updating
• There are three possible situations:
  - Secondary key altered: the secondary key index may have to be rearranged so that it stays in sorted order.
  - Primary key altered: big impact on the primary key index; in the secondary key index, only the affected primary key field needs updating.
  - Update confined to non-key fields: updates that affect neither the primary nor the secondary key fields do not affect the secondary key index, even if the update is substantial.
    - Recall that such an update can still affect the primary index, since that index refers to the record's location in the data file.


30 Retrieving Data with Multiple Secondary Keys
• Example: all courses taught by Spiegel or Gordon
  - Requires two searches.
  - Each search produces a list of courses in the form of primary keys:
    - Spiegel: CIS136, CIS235, CIS402
    - Gordon: CIS425, CIS520, CIS243

31 Boolean AND in searches
• Example: search for courses
  - taught by Spiegel
  - located in Lytle Hall
• The courses found are in the intersection of:
  - courses taught by Spiegel
  - courses offered in Lytle Hall

32 Boolean OR searches
• Example: search for courses
  - taught by Spiegel
  - located in Lytle Hall
• The courses found are in the union of:
  - courses taught by Spiegel
  - courses offered in Lytle Hall
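
The AND and OR combinations on the two slides above can be sketched with the standard set algorithms, assuming each secondary-key search returns its primary keys as a sorted list. `KeyList`, `matchAll`, and `matchAny` are names invented for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

using KeyList = std::vector<std::string>;  // primary keys, sorted ascending

// Boolean AND: keys present in both result lists (intersection).
KeyList matchAll(const KeyList& a, const KeyList& b) {
    KeyList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// Boolean OR: keys present in either result list (union, no duplicates).
KeyList matchAny(const KeyList& a, const KeyList& b) {
    KeyList out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

Both operations work purely on the lists of primary keys; the data file is read only for the keys that survive the combination.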

33 Cons of the Current Secondary Index Structure (7.8 Improving the Secondary Index Structure)
• The index file has to be rearranged every time a new record is added to the file.
• For duplicate secondary keys, the secondary key field is repeated for each entry:

  Secondary Key   Primary Key
  El-Ramly        CS215
  El-Ramly        CS352
  Khattab         CS214
  Khattab         CS

34 Cons of the Current Secondary Index Structure (7.8 Improving the Secondary Index Structure)
• Solution A: an array of references
• Solution B: a linked list of references

35 Improvements to the secondary index key structure
• Solution 1
  - Allow multiple primary keys to be associated with a single secondary key by allocating a primary key list (an STL vector is best; why?) for each secondary key entry.
  - Solves the problem of re-sorting each time a new entry is added.
  - According to the text: suffers from internal fragmentation due to the fixed size of the list, and the number of allocated entries in the array may prove too small.
• An STL (or Java) vector fixes this: how?

A. Array of References: revised composer index

  Secondary key          Set of primary key references
  BEETHOVEN              ANG3795  DG  DG18807  RCA2626
  COREA                  WAR23699
  DVORAK                 COL31809
  PROKOFIEV              LON2312
  RIMSKY-KORSAKOV        MER75016
  SPRINGSTEEN            COL38358
  SWEET HONEY IN THE R   FF245

• no need to rearrange
• limited reference array
• internal fragmentation

37 A. Array of References
• Allow multiple primary keys to be associated with a single secondary key by allocating a primary key array (an STL vector is best; why?) for each secondary key entry.
• Solves the problem of re-sorting each time a new entry is added.
• According to the text: suffers from internal fragmentation due to the fixed size of the list, and the number of allocated entries in the array may prove too small.
• An STL (or Java) vector fixes this: how?
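
One possible answer to the "how?" above, sketched in C++: if each secondary key owns a `std::vector` of primary keys, the reference list grows on demand, so there is no fixed slot count to run out of and no pre-allocated empty slots. The names `RefArrayIndex` and `addReference` are hypothetical, and an in-memory `std::map` stands in for the index file.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// secondary key -> growable array of primary key references
using RefArrayIndex = std::map<std::string, std::vector<std::string>>;

// Add a primary key reference under a secondary key.
// Only that key's own (short) reference list is touched; other entries
// of the secondary index do not move.
void addReference(RefArrayIndex& index, const std::string& secondaryKey,
                  const std::string& primaryKey) {
    std::vector<std::string>& refs = index[secondaryKey];  // creates the entry if new
    refs.insert(std::lower_bound(refs.begin(), refs.end(), primaryKey), primaryKey);
}
```

This holds for an in-memory structure; storing variable-length lists in a file of fixed-length records is a different problem, which is the one the text's fragmentation remark addresses.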

38 B. Inverted List
• Solution B
  - Create an inverted list of indexes.
  - Have each secondary key point to a list of the primary key references associated with it.
  - This method eliminates most of the problems associated with maintaining a secondary index file.
• Which solution is better?

Inverted Lists
• Guidelines for a better solution
  - no reorganization when adding
  - no limit on duplicate keys
  - no internal fragmentation
• Solution B: link the list of references
  - a list of primary key references, e.g. PROKOFIEV: ANG36193, LON2312
  - each secondary index entry holds the secondary key field and the relative record number of the first corresponding primary key reference

Linking List of References (1)
[Figure: improved revision of the composer index. The Secondary Index file lists the composers (BEETHOVEN, COREA, DVORAK, PROKOFIEV, RIMSKY-KORSAKOV, SPRINGSTEEN, SWEET HONEY IN THE R); the Label ID List file holds the primary key references (LON2312, RCA2626, ANG23699, COL38358, DG18807, MER75016, COL31809, DG, ANG36193, WAR, FF245).]

Linking List of References (2)
• The primary key references are kept in a separate, entry-sequenced file.
• Advantages
  - the secondary index is rearranged only when a secondary key changes
  - rearrangement is quick
  - less penalty for keeping the secondary index file on secondary storage (less need for sorting)
  - the Label ID List file does not need to be sorted
  - reusing the space of deleted records is easy

Linking List of References (3)
• Disadvantage
  - references for the same secondary key may not be physically grouped
  - this lack of locality could involve a large amount of seeking
  - solution: keep the list file in memory
• The same Label ID List file can hold the lists of a number of secondary index files.
• If it is too large for memory, only part of it can be loaded.
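
The linked-list organization from the three slides above can be sketched in memory as follows. The secondary index maps each secondary key to the RRN of its first reference, and the list file is an entry-sequenced array whose records chain to each other. The type names, the `-1` end marker, and the choice to link each new reference at the head of its chain are assumptions made to keep the example short.

```cpp
#include <map>
#include <string>
#include <vector>

// One record of the entry-sequenced list file.
struct ListRecord {
    std::string primaryKey;
    int next;  // RRN of the next reference for the same secondary key, -1 = end of list
};

struct InvertedIndex {
    std::map<std::string, int> heads;  // secondary key -> RRN of first reference
    std::vector<ListRecord> listFile;  // position in this array = RRN

    // Append the reference to the list file and link it in front of the
    // existing chain; nothing already in the list file moves.
    void add(const std::string& secondaryKey, const std::string& primaryKey) {
        auto it = heads.find(secondaryKey);
        int oldHead = (it == heads.end()) ? -1 : it->second;
        listFile.push_back(ListRecord{primaryKey, oldHead});
        heads[secondaryKey] = static_cast<int>(listFile.size()) - 1;
    }

    // Follow the chain for one secondary key.
    std::vector<std::string> references(const std::string& secondaryKey) const {
        std::vector<std::string> out;
        auto it = heads.find(secondaryKey);
        int rrn = (it == heads.end()) ? -1 : it->second;
        while (rrn != -1) {
            out.push_back(listFile[rrn].primaryKey);
            rrn = listFile[rrn].next;
        }
        return out;
    }
};
```

Following a chain jumps between arbitrary RRNs, which is the lack of locality noted on the slide: cheap in memory, but potentially one seek per reference if the list file stays on disk.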


Selective Indexes (7.9 Selective Indexes)
• Selective index: an index on a subset of the records
• A selective index contains only part of the entire index
  - provides a selective view
  - useful when the contents of a file fall into several categories
  - e.g. 20 < Age < 30 and $1000 < Salary
  - e.g. courses offered after 12 noon
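
A selective index can be sketched as an ordinary index that is built through a filter. The `Course` record layout, the `startHour` field, and the function name `buildSelectiveIndex` are assumptions for illustration; only the idea of indexing a subset comes from the slide.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical data record: a course ID and the hour (0-23) at which it starts.
struct Course {
    std::string id;
    int startHour;
};

// Build an index (primary key -> RRN) that covers only the records
// for which the predicate `include` returns true.
std::map<std::string, int> buildSelectiveIndex(
        const std::vector<Course>& dataFile,
        const std::function<bool(const Course&)>& include) {
    std::map<std::string, int> index;
    for (int rrn = 0; rrn < static_cast<int>(dataFile.size()); ++rrn) {
        if (include(dataFile[rrn])) index[dataFile[rrn].id] = rrn;
    }
    return index;
}
```

For the "courses offered after 12 noon" example, the predicate could be a lambda such as `[](const Course& c) { return c.startHour >= 12; }`; records that fail the test stay in the data file but are simply absent from this index.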