
Chap 7. Indexing

Chapter Objectives (1)
- Introduce concepts of indexing that have broad applications in the design of file systems
- Introduce the use of a simple linear index to provide rapid access to records in an entry-sequenced, variable-length record file
- Investigate the implications of using indexes for file maintenance
- Introduce the template features of C++ for object I/O
- Describe the object-oriented approach to indexed sequential files

Chapter Objectives (2)
- Describe the use of indexes to provide access to records by more than one key
- Introduce the idea of an inverted list, illustrating Boolean operations on lists
- Discuss when to bind an index key to an address in the data file
- Introduce and investigate the implications of self-indexing files

Contents (1)
7.1 What is an Index?
7.2 A Simple Index for Entry-Sequenced Files
7.3 Using Template Classes in C++ for Object I/O
7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects
7.5 Indexes That Are Too Large to Hold in Memory

Contents (2)
7.6 Indexing to Provide Access by Multiple Keys
7.7 Retrieval Using Combinations of Secondary Keys
7.8 Improving the Secondary Index Structure: Inverted Lists
7.9 Selective Indexes
7.10 Binding

Overview: Index (1)
- Index: a data structure that associates given key values with the corresponding record numbers
- It is usually physically separate from the file (unlike indexed sequential files, which use tight binding)
- Linear indexes (like the indexes found at the back of books)
  - index records are ordered by key value, as in an ordered relative file
  - the best algorithm for finding a record with a specific key value is binary search
  - addition requires reorganization
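The linear index described above can be sketched in a few lines. This is only an illustration, not code from the book: the names IndexEntry, lookup, and insertSorted are hypothetical, and the index is held in a std::vector rather than in a file.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One entry of a linear index: a key and the record number it refers to.
// (Hypothetical type, for illustration only.)
struct IndexEntry {
    std::string key;
    int recordNumber;
};

// Binary search over entries sorted by key.
// Returns the record number, or -1 if the key is absent.
int lookup(const std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it == index.end() || it->key != key) return -1;
    return it->recordNumber;
}

// Insert while keeping the entries sorted. Every later entry shifts one
// slot to the right, which is the reorganization cost of an addition.
void insertSorted(std::vector<IndexEntry>& index, const IndexEntry& entry) {
    auto pos = std::lower_bound(
        index.begin(), index.end(), entry.key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    index.insert(pos, entry);
}
```

Lookup costs a logarithmic number of comparisons, while an insertion may move every entry that follows the new key.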

Overview: Index (2)
[Figure: an index file holding the keys k1, k2, k4, k5, k7, k9 next to a data file holding the records AAA, ZZZ, CCC, XXX, EEE, FFF]

Overview: Index (3)
- Tree indexes (like those of indexed sequential files)
  - hierarchical in that each level, beginning with the root level, points to the next level
  - leaves point only to the data file
- Indexed sequential file
- Binary tree index
- AVL tree index
- B+ tree index

Roles of an Index
- Index: keys and reference fields
- Fast random access
- Uniform access speed
- Allows users to impose order on a file without actually rearranging the file
- Provides multiple access paths to a file
- Gives users keyed access to variable-length record files

A Simple Index (1)
- Data file
  - entry-sequenced, variable-length records
  - primary key: unique for each entry in the file
- Searching a file by key is a common need
  - binary search cannot be used on a variable-length record file (the position of the middle record is not known)
- Construct an index object for the file
  - index object: key field + byte-offset field
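A minimal sketch of the key + byte-offset idea follows. It makes several assumptions that are not in the slides: records are newline-terminated and '|'-delimited, the key is the first field concatenated with the second (as in "LON2312"), and the helper names buildIndex and readAt are hypothetical. The book's own classes are not used here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Scan the data once and remember where each record starts.
// Assumed layout: one record per line, fields separated by '|'.
std::map<std::string, std::streamoff> buildIndex(std::istream& data) {
    std::map<std::string, std::streamoff> index;
    std::string line;
    while (true) {
        std::streamoff offset = data.tellg();  // start of the next record
        if (!std::getline(data, line)) break;
        std::size_t first = line.find('|');
        if (first == std::string::npos) continue;  // skip malformed lines
        std::size_t second = line.find('|', first + 1);
        // Key = label + ID number, e.g. "LON" + "2312".
        std::string key = line.substr(0, first) +
                          line.substr(first + 1, second - first - 1);
        index[key] = offset;
    }
    return index;
}

// Jump straight to a record through its byte offset.
std::string readAt(std::istream& data, std::streamoff offset) {
    data.clear();  // reset the EOF state left by an earlier scan
    data.seekg(offset, std::ios::beg);
    std::string line;
    std::getline(data, line);
    return line;
}
```

Because the index stores byte offsets, the records can have any length and stay in entry order; only the index is kept sorted.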

A Simple Index (2)
[Figure: the index file (key + reference field, where the reference field holds the address of the record) next to the data file (the actual data records)]
Index file, sorted by key; the keys begin with the labels: ANG, COL, COL, DG, DG, FF, LON, MER, RCA, WAR
Data file, in entry order:
  LON|2312|Romeo and Juliet|Prokofiev...
  RCA|2626|Quartet in C Sharp Minor...
  WAR|23699|Touchstone|Corea...
  ANG|3795|Symphony No. 9|Beethoven...
  COL|38358|Nebraska|Springsteen...
  DG|18807|Symphony No. 9|Beethoven...
  MER|75016|Coq d'or Suite|Rimsky...
  COL|31809|Symphony No. 9|Dvorak...
  DG|139201|Violin Concerto|Beethoven...
  FF|245|Good News|Sweet Honey In The...

A Simple Index (3)
- Index file: fixed-size records, sorted
- Data file: not sorted, because it is entry-sequenced
- Record addition is quick (faster than with a sorted file)
- The index can be kept in memory, so a record is found more quickly through the index file than in a sorted file
- Class TextIndex encapsulates the index data and the index operations
- Index entry: key + reference field

Let's See Figure 7.4
    class TextIndex {
    public:
        TextIndex(int maxKeys = 100, int unique = 1);
        ~TextIndex();
        int Insert(const char* key, int recAddr); // add to index
        int Remove(const char* key);              // remove key from index
        int Search(const char* key) const;        // search for key, return recAddr
        void Print(ostream&) const;
    protected:
        int MaxKeys;    // maximum number of entries
        int NumKeys;    // actual number of entries
        char** Keys;    // array of key values
        int* RecAddrs;  // array of record references
        int Find(const char* key) const;
        int Init(int maxKeys, int unique);
        int Unique;     // if true, each key must be unique
    };

TextIndex::TextIndex
    TextIndex::TextIndex(int maxKeys, int unique)
        : NumKeys(0), Keys(0), RecAddrs(0)
    { Init(maxKeys, unique); }

    TextIndex::~TextIndex()
    { delete[] Keys; delete[] RecAddrs; } // both arrays were allocated with new[]

TextIndex::Init
    int TextIndex::Init(int maxKeys, int unique)
    {
        Unique = unique != 0;
        if (maxKeys <= 0) {
            MaxKeys = 0;
            return 0;
        }
        MaxKeys = maxKeys;
        Keys = new char*[maxKeys];
        RecAddrs = new int[maxKeys];
        return 1;
    }

TextIndex::Insert
    int TextIndex::Insert(const char* key, int recAddr)
    {
        int i;
        int index = Find(key);
        if (Unique && index >= 0) return 0;  // key already in
        if (NumKeys == MaxKeys) return 0;    // no room for another key
        // shift larger keys one slot to the right to keep the index sorted
        for (i = NumKeys - 1; i >= 0; i--) {
            if (strcmp(key, Keys[i]) > 0) break; // insert into location i+1
            Keys[i+1] = Keys[i];
            RecAddrs[i+1] = RecAddrs[i];
        }
        Keys[i+1] = strdup(key);
        RecAddrs[i+1] = recAddr;
        NumKeys++;
        return 1;
    }

TextIndex::Remove
    int TextIndex::Remove(const char* key)
    {
        int index = Find(key);
        if (index < 0) return 0; // key not in index
        // shift later entries left; stop at NumKeys-1 so that Keys[i+1]
        // never reads past the last entry in use
        for (int i = index; i < NumKeys - 1; i++) {
            Keys[i] = Keys[i+1];
            RecAddrs[i] = RecAddrs[i+1];
        }
        NumKeys--;
        return 1;
    }

TextIndex::Search
    int TextIndex::Search(const char* key) const
    {
        int index = Find(key);
        if (index < 0) return index; // not found: return -1
        return RecAddrs[index];
    }

TextIndex::Find
    int TextIndex::Find(const char* key) const
    {
        // linear scan; the keys are sorted, so stop at the first larger key
        for (int i = 0; i < NumKeys; i++)
            if (strcmp(Keys[i], key) == 0) return i;      // key found
            else if (strcmp(Keys[i], key) > 0) return -1; // not found
        return -1; // not found
    }

Index Implementation
- Pages 706~709: G.1 Recording.h, G.2 Recording.cpp, G.3 Makerec.cpp
- Pages 710~712: G.4 Textind.h, G.5 Textind.cpp

IndexRecordingFile
    int IndexRecordingFile(char* myfile, TextIndex& RecordingIndex)
    {
        Recording rec;
        int recaddr, result;
        DelimFieldBuffer Buffer; // create a buffer
        BufferFile RecordingFile(Buffer);
        result = RecordingFile.Open(myfile, ios::in);
        if (!result) {
            cout << "Unable to open file " << myfile << endl;
            return 0;
        }
        while (1) { // loop until the read fails
            recaddr = RecordingFile.Read(); // read next record
            if (recaddr < 0) break;
            rec.Unpack(Buffer);
            RecordingIndex.Insert(rec.Key(), recaddr);
            cout << recaddr << '\t' << rec << endl;
        }
        RecordingIndex.Print(cout);
        result = RetrieveRecording(rec, "LON2312", RecordingIndex, RecordingFile);
        cout << "Found record: " << rec;
        return result; // the slide omitted a return value
    }

RetrieveRecording
    int RetrieveRecording(Recording& recording, char* key,
        TextIndex& RecordingIndex, BufferFile& RecordingFile)
    // read and unpack the recording, return TRUE if it succeeds
    {
        int result;
        int recaddr = RecordingIndex.Search(key); // look the key up once
        cout << "Retrieve " << key << " at recaddr " << recaddr << endl;
        if (recaddr < 0) return FALSE; // key not in index
        result = RecordingFile.Read(recaddr);
        cout << "read result: " << result << endl;
        if (result == -1) return FALSE;
        result = recording.Unpack(RecordingFile.GetBuffer());
        return result;
    }

Template Class for I/O Object (1)
- Template class RecordFile
  - we want to make the following code possible
    - Person p; RecordFile pFile; pFile.Read(p);
    - Recording r; RecordFile rFile; rFile.Read(r);
  - it is difficult to support files for different record types without having to modify the class
  - solution: a template class derived from BufferFile
    - the actual declarations and calls
    - RecordFile<Person> pFile; pFile.Read(p);
    - RecordFile<Recording> rFile; rFile.Read(r);

Template Class for I/O Object (2)
- Template class RecordFile
    template <class RecType>
    class RecordFile : public BufferFile {
    public:
        int Read(RecType& record, int recaddr = -1);
        int Write(const RecType& record, int recaddr = -1);
        int Append(const RecType& record);
        RecordFile(IOBuffer& buffer) : BufferFile(buffer) {}
    };
    // The template parameter RecType must have the following methods:
    //   int Pack(IOBuffer&);   pack record into buffer
    //   int Unpack(IOBuffer&); unpack record from buffer
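The requirement that RecType supply Pack and Unpack can be tried out with a much simplified stand-in. The sketch below is an assumption-laden illustration, not the book's RecordFile: SimpleRecordFile is a hypothetical class that keeps packed records in a std::vector of strings instead of a BufferFile, and Pack and Unpack take a std::string instead of an IOBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for RecordFile: packed records live in memory and
// the "record address" is simply the position in the vector.
template <class RecType>
class SimpleRecordFile {
public:
    // Pack the record and store it; returns its address.
    int Append(const RecType& record) {
        std::string buffer;
        record.Pack(buffer);
        storage_.push_back(buffer);
        return static_cast<int>(storage_.size()) - 1;
    }
    // Unpack the record stored at recaddr; returns 1 on success, 0 on failure.
    int Read(RecType& record, int recaddr) const {
        if (recaddr < 0 || recaddr >= static_cast<int>(storage_.size())) return 0;
        return record.Unpack(storage_[recaddr]);
    }
private:
    std::vector<std::string> storage_;
};

// Any type that offers Pack and Unpack can be stored without touching
// the class above.
struct Person {
    std::string last, first;
    void Pack(std::string& buffer) const { buffer = last + "|" + first; }
    int Unpack(const std::string& buffer) {
        std::size_t bar = buffer.find('|');
        if (bar == std::string::npos) return 0;
        last = buffer.substr(0, bar);
        first = buffer.substr(bar + 1);
        return 1;
    }
};
```

The template is checked against each record type at compile time, so a type without Pack or Unpack is rejected when SimpleRecordFile is instantiated for it and those methods are called.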

Template Class for I/O Object (3)
- Adding I/O to an existing class with RecordFile
  - add methods Pack and Unpack to class Recording
  - create a buffer object to use in the I/O
    - DelimFieldBuffer Buffer;
  - declare an object of type RecordFile<Recording>
    - RecordFile<Recording> rFile(Buffer);
- Declaration and calls
    Recording r1, r2;
    rFile.Open("myfile");
    rFile.Read(r1);
    rFile.Write(r2);
- This directly opens a file and reads and writes objects of class Recording

Object-Oriented Approach to I/O
- Class IndexedFile
  - adds indexed access to the sequential access provided by class RecordFile
  - extends RecordFile with Update, Append, and Read methods
    - Update and Append: maintain a primary key index of the data file
    - Read: supports access to an object by key
- TextIndex + RecordFile ==> IndexedFile
- Issues of IndexedFile
  - how to make a persistent index of a file
  - how to guarantee that the index accurately reflects the contents of the data file

Basic Operations of IndexedFile (1)
- Create the original empty index and data files
- Load the index file into memory
- Rewrite the index file from memory
- Add records to the data file and index
- Delete records from the data file
- Update records in the data file
- Update the index to reflect changes in the data file
- Retrieve records

Basic Operations of TextIndexedFile (1)
- Creating the files
  - the index file and the data file are created as empty files with header records
  - implementation (makeind.cpp in Appendix G): the Create method of class BufferFile
- Loading the index into memory
  - loading and storing objects is supported by the IOBuffer classes
  - a particular buffer class must be chosen for the index file (tindbuff.cpp in Appendix G)
    - class TextIndexBuffer is defined as a derived class of FixedFieldBuffer to support reading and writing of index objects

Basic Operations of TextIndexedFile (2)
- Rewriting the index file from memory
  - part of the Close operation on an IndexedFile
  - writes the index object back to the index file
  - the index should be protected against failures
  - changes are written only when the file copy is out of date (use a status flag)
  - implementation: the Rewind and Write operations of class BufferFile
- Record addition
  - add a new record to the data file, using RecordFile::Write
  - add an entry to the index, using TextIndex::Insert
    - requires rearrangement of the index
    - if the index is in memory, no file access is needed

Basic Operations of TextIndexedFile (3)
- Record deletion
  - data file: the records need not be moved
  - index: either really delete the entry or just mark it as deleted
    - using TextIndex::Remove
- Record updating (2 categories)
  (1) the update changes the value of the key field
    - delete/add approach
    - reorder both the index and the data file
  (2) the update does not affect the key field
    - no rearrangement of the index file
    - may need to reconstruct the data file
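The two update categories can be sketched as follows. This is a simplified illustration under stated assumptions: IndexedStore and Rec are hypothetical types, the data file is a std::vector, the index is a std::map, and a replaced record is simply left behind in the data file rather than reclaimed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Rec {
    std::string key;
    std::string body;
};

// Hypothetical in-memory stand-in for a data file plus its primary index.
struct IndexedStore {
    std::vector<Rec> data;             // entry-sequenced "data file"
    std::map<std::string, int> index;  // primary key -> record address

    // Returns the new record address, or -1 if the key already exists.
    int Append(const Rec& r) {
        if (index.count(r.key)) return -1;
        data.push_back(r);
        int addr = static_cast<int>(data.size()) - 1;
        index[r.key] = addr;
        return addr;
    }

    // Update the record stored under oldKey. Returns 1 on success, 0 otherwise.
    int Update(const std::string& oldKey, const Rec& r) {
        auto it = index.find(oldKey);
        if (it == index.end()) return 0;
        if (r.key == oldKey) {             // key unchanged: index untouched
            data[it->second] = r;
            return 1;
        }
        if (index.count(r.key)) return 0;  // new key would collide
        index.erase(it);                   // key changed: delete ...
        return Append(r) >= 0 ? 1 : 0;     // ... then add
    }
};
```

When the key changes, the old record stays in the data file as unused space; only the index entry moves.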

Class TextIndexedFile (1)
- Members
  - methods: Create, Open, Close, Read (sequential and indexed), Append, and Update operations
  - protected members: ensure the correlation between the index in memory (Index), the index file (IndexFile), and the data file (DataFile)
- char* Key()
  - the template parameter RecType must have a Key method
  - used to extract the key value from the record

Class TextIndexedFile (2)
    template <class RecType>
    class TextIndexedFile {
    public:
        int Read(RecType& record);            // read next record
        int Read(char* key, RecType& record); // read by key
        int Append(const RecType& record);
        int Update(char* oldKey, const RecType& record);
        int Create(char* name, int mode = ios::in|ios::out);
        int Open(char* name, int mode = ios::in|ios::out);
        int Close();
        TextIndexedFile(IOBuffer& buffer, int keySize, int maxKeys = 100);
        ~TextIndexedFile(); // close and delete
    protected:
        TextIndex Index;
        BufferFile IndexFile;
        TextIndexBuffer IndexBuffer;
        RecordFile<RecType> DataFile;
        char* FileName; // base file name for file
        int SetFileName(char* fName, char*& dFileName, char*& IdxFName);
    };

TextIndexedFile constructor / destructor
    template <class RecType>
    TextIndexedFile<RecType>::TextIndexedFile(IOBuffer& buffer, int keySize, int maxKeys)
        : DataFile(buffer), Index(maxKeys),
          IndexBuffer(keySize, maxKeys), IndexFile(IndexBuffer)
    { FileName = 0; }

    template <class RecType>
    TextIndexedFile<RecType>::~TextIndexedFile()
    { Close(); }

TextIndexedFile::Create
    template <class RecType>
    int TextIndexedFile<RecType>::Create(char* fileName, int mode)
    // use fileName.dat and fileName.ind
    {
        int result;
        char* dataFileName, * indexFileName;
        result = SetFileName(fileName, dataFileName, indexFileName);
        cout << "file names " << dataFileName << " " << indexFileName << endl;
        if (result == -1) return 0;
        result = DataFile.Create(dataFileName, mode);
        if (!result) {
            FileName = 0; // remove connection
            return 0;
        }
        result = IndexFile.Create(indexFileName, ios::out|ios::in);
        if (!result) {
            DataFile.Close(); // close the data file
            FileName = 0;     // remove connection
            return 0;
        }
        return 1;
    }

TextIndexedFile::Open
    template <class RecType>
    int TextIndexedFile<RecType>::Open(char* fileName, int mode)
    // open data and index file and read index file
    {
        int result;
        char* dataFileName, * indexFileName;
        result = SetFileName(fileName, dataFileName, indexFileName);
        if (!result) return 0;
        // open files
        result = DataFile.Open(dataFileName, mode);
        if (!result) {
            FileName = 0;
            return 0;
        }
        result = IndexFile.Open(indexFileName, ios::out);
        if (!result) {
            DataFile.Close();
            FileName = 0;
            return 0;
        }
        // read index into memory
        result = IndexFile.Read();
        if (result != -1) {
            result = IndexBuffer.Unpack(Index);
            if (result != -1) return 1;
        }
        // reading or unpacking the index failed: undo the opens
        DataFile.Close();
        IndexFile.Close();
        FileName = 0;
        return 0;
    }

TextIndexedFile::Read
    template <class RecType>
    int TextIndexedFile<RecType>::Read(RecType& record)
    { return DataFile.Read(record, -1); } // sequential read of the next record

    template <class RecType>
    int TextIndexedFile<RecType>::Read(char* key, RecType& record)
    {
        int ref = Index.Search(key);
        if (ref < 0) return -1; // key not in index
        int result = DataFile.Read(record, ref);
        return result;
    }

TextIndexedFile::Append
    template <class RecType>
    int TextIndexedFile<RecType>::Append(const RecType& record)
    {
        char* key = record.Key();
        int ref = Index.Search(key);
        if (ref != -1) return -1; // key already in file
        ref = DataFile.Append(record);
        int result = Index.Insert(key, ref);
        return ref;
    }

TextIndexedFile::Close
    template <class RecType>
    int TextIndexedFile<RecType>::Close()
    {
        int result;
        if (!FileName) return 0; // already closed!
        DataFile.Close();
        // rewrite the index file from the index in memory
        IndexFile.Rewind();
        IndexBuffer.Pack(Index);
        result = IndexFile.Write();
        cout << "result of index write: " << result << endl;
        IndexFile.Close();
        FileName = 0;
        return 1;
    }

TextIndexBuffer
    class TextIndexBuffer : public FixedFieldBuffer {
    public:
        TextIndexBuffer(int keySize, int maxKeys = 100,
            int extraFields = 0, int extraSize = 0);
        // extraSize is included to allow derived classes to extend
        // the buffer with extra fields.
        // Required because the buffer size is exact.
        int Pack(const TextIndex&);
        int Unpack(TextIndex&);
        void Print(ostream&) const;
    protected:
        int MaxKeys;
        int KeySize;
        char* Dummy; // space for dummy in pack and unpack
    };

TextIndexBuffer::TextIndexBuffer
    TextIndexBuffer::TextIndexBuffer(int keySize, int maxKeys,
        int extraFields, int extraSpace)
        : FixedFieldBuffer(1 + 2*maxKeys + extraFields,
            sizeof(int) + maxKeys*keySize + maxKeys*sizeof(int) + extraSpace)
    // buffer fields consist of
    //   numKeys, the actual number of keys
    //   Keys[maxKeys], key fields, size = maxKeys * keySize
    //   RecAddrs[maxKeys], record address fields, size = maxKeys * sizeof(int)
    {
        MaxKeys = maxKeys;
        KeySize = keySize;
        AddField(sizeof(int));
        for (int i = 0; i < maxKeys; i++) {
            AddField(KeySize);
            AddField(sizeof(int));
        }
        Dummy = new char[keySize + 1];
    }

TextIndexBuffer::Pack
    int TextIndexBuffer::Pack(const TextIndex& index)
    {
        int result;
        Clear();
        result = FixedFieldBuffer::Pack(&index.NumKeys);
        for (int i = 0; i < index.NumKeys; i++) {
            // note: only pack the actual keys and recaddrs
            result = result && FixedFieldBuffer::Pack(index.Keys[i]);
            result = result && FixedFieldBuffer::Pack(&index.RecAddrs[i]);
        }
        for (int j = 0; j < index.MaxKeys - index.NumKeys; j++) {
            // pack dummy values for the other fields; each unused slot has
            // two fields (key and record address), as added in the constructor
            result = result && FixedFieldBuffer::Pack(Dummy);
            result = result && FixedFieldBuffer::Pack(Dummy);
        }
        return result;
    }

TextIndexBuffer::Unpack
    int TextIndexBuffer::Unpack(TextIndex& index)
    {
        int result;
        result = FixedFieldBuffer::Unpack(&index.NumKeys);
        for (int i = 0; i < index.NumKeys; i++) {
            // note: only unpack the actual keys and recaddrs
            index.Keys[i] = new char[KeySize]; // just to be safe
            result = result && FixedFieldBuffer::Unpack(index.Keys[i]);
            result = result && FixedFieldBuffer::Unpack(&index.RecAddrs[i]);
        }
        for (int j = 0; j < index.MaxKeys - index.NumKeys; j++) {
            // unpack dummy values for the other fields; each unused slot has
            // two fields (key and record address), as added in the constructor
            result = result && FixedFieldBuffer::Unpack(Dummy);
            result = result && FixedFieldBuffer::Unpack(Dummy);
        }
        return result;
    }

IndexRecordingFile
    int IndexRecordingFile(char* myfile, TextIndexedFile<Recording>& indexFile)
    {
        Recording rec;
        int recaddr, result;
        DelimFieldBuffer Buffer; // create a buffer
        BufferFile RecFile(Buffer);
        result = RecFile.Open(myfile, ios::in);
        if (!result) {
            cout << "Unable to open file " << myfile << endl;
            return 0;
        }
        while (1) { // loop until the read fails
            recaddr = RecFile.Read(); // read next record
            if (recaddr < 0) break;
            rec.Unpack(Buffer);
            indexFile.Append(rec);
        }
        Recording rec1;
        result = indexFile.Read("LON2312", rec1);
        cout << "Found record: " << rec1; // print the record just read by key
        return result;
    }

Enhancements to TextIndexedFile (1)
- Support other types of keys
  - Restriction: the key type is restricted to strings (char *)
  - Relaxation: support a template class SimpleIndex with a parameter for the key type
- Support data object class hierarchies
  - Restriction: every object in a RecordFile must be of the same type
  - Relaxation: a type hierarchy that supports virtual Pack methods

Enhancements to TextIndexedFile (2)
- Support multirecord index files
  - Restriction: the entire index must fit in a single record
  - Relaxation: add protected methods Insert, Delete, and Search to manipulate the arrays of index objects
- Optimization of operations
  - Obvious: use binary search in the Find method
  - Active: add a flag to the index object to avoid writing the index record back to the index file when it has not been changed
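Both optimizations can be sketched together. The class below is a hypothetical, simplified relative of TextIndex (std::string keys, std::vector storage); the names SortedIndex, NeedsWrite, and MarkWritten are assumptions made for this illustration, and the record addresses used with it are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class SortedIndex {
public:
    // Binary search over the sorted keys.
    // Returns the position of key, or -1 if it is absent.
    int Find(const std::string& key) const {
        int low = 0;
        int high = static_cast<int>(keys_.size()) - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            int cmp = keys_[mid].compare(key);
            if (cmp == 0) return mid;
            if (cmp < 0) low = mid + 1;
            else high = mid - 1;
        }
        return -1;
    }

    // Insert a unique key in sorted position and mark the index as changed.
    // Returns 1 on success, 0 if the key is already present.
    int Insert(const std::string& key, int recAddr) {
        if (Find(key) >= 0) return 0;
        std::size_t pos = 0;
        while (pos < keys_.size() && keys_[pos] < key) ++pos;
        keys_.insert(keys_.begin() + pos, key);
        addrs_.insert(addrs_.begin() + pos, recAddr);
        dirty_ = true;
        return 1;
    }

    // Returns the record address for key, or -1 if it is absent.
    int Search(const std::string& key) const {
        int pos = Find(key);
        return pos < 0 ? -1 : addrs_[pos];
    }

    // A Close operation would rewrite the index file only if this is true.
    bool NeedsWrite() const { return dirty_; }
    void MarkWritten() { dirty_ = false; }

private:
    std::vector<std::string> keys_;
    std::vector<int> addrs_;
    bool dirty_ = false;
};
```

A failed insertion leaves the flag untouched, so a session that only reads never triggers a rewrite of the index file.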

Where are we going?
- Plain stream file
- Persistency ==> buffer support ==> BufferFile
  - deriving BufferFile using various other classes
- Random access ==> index support ==> IndexedFile
  - deriving TextIndexedFile using RecordFile and TextIndex

Too Large Index (1)
- An index that is too large for memory is kept on secondary storage (large linear index)
- Disadvantages
  - binary searching of the index requires several seeks rather than running at memory speed
  - index rearrangement requires shifting or sorting records on secondary storage
- Alternatives (to be considered later)
  - hashed organization
  - tree-structured index (e.g. B-tree)

Too Large Index (2)
- Advantages over a data file sorted by key, even when the index is on secondary storage
  - binary search can be used
  - sorting and maintaining the index is less expensive than doing so for the data file
  - the keys can be rearranged without moving the data records, which matters if there are pinned records

Index by Multiple Keys (1)
- DB schema = (ID-No, Title, Composer, Artist, Label)
- Find the record with ID-No "COL38358" (primary key: ID-No)
- Find all the recordings of "Beethoven" (secondary key: Composer)
- Find all the recordings titled "Violin Concerto" (secondary key: Title)

Index by Multiple Keys (2)
- Most people don't want to search only by primary key
- A secondary key can be duplicated
- Secondary key index
  - secondary key --> primary key
  - retrieval consults one additional index (the primary key index)
- [Figure: secondary key index entry, e.g. BEETHOVEN --> DG18807]
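The two-step lookup (secondary key to primary key, then primary key to record address) can be sketched like this. TwoLevelIndex and FindBySecondary are hypothetical names, the standard containers stand in for the index files, and the record addresses used with it are arbitrary.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical pairing of a secondary index with the primary index.
struct TwoLevelIndex {
    std::multimap<std::string, std::string> secondary; // composer -> primary key
    std::map<std::string, int> primary;                // primary key -> record address

    // Resolve a secondary key to record addresses through the primary index.
    std::vector<int> FindBySecondary(const std::string& composer) const {
        std::vector<int> addrs;
        auto range = secondary.equal_range(composer);
        for (auto it = range.first; it != range.second; ++it) {
            auto hit = primary.find(it->second);
            // A primary key missing from the primary index is a stale
            // reference to a deleted record and is skipped.
            if (hit != primary.end()) addrs.push_back(hit->second);
        }
        return addrs;
    }
};
```

Because the secondary index holds primary keys rather than addresses, deleting a record only requires removing it from the primary index.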

Secondary Index: Basic Operations (1)
- Record addition
  - similar to adding to the primary index
  - the secondary key is stored in canonical form
    - fixed length (so it may be truncated)
    - the original name can be obtained from the data file
  - the index can contain duplicate keys
  - local ordering within the same key group

Secondary Index: Basic Operations (2)
- Record deletion (2 cases)
  (1) Secondary index references the record directly
    - delete from both the primary index and the secondary index
    - rearrange both indexes
  (2) Secondary index references the primary key
    - delete from the primary index only
    - leave the reference to the deleted record intact
    - advantage: fast
    - disadvantage: deleted records take up space

Secondary Index: Basic Operations (3)
- Record updating
  - the primary key index serves as a kind of protective buffer
  (1) Secondary index references the record directly
    - update all files containing the record's location
  (2) Secondary index references the primary key
    - the secondary index is affected only when either the primary or the secondary key is changed
  (continued)

Secondary Index: Basic Operations (4)
- (2) Secondary index references the primary key (continued)
  (a) when the update changes the secondary key
    - rearrange the secondary key index
  (b) when the update changes the primary key
    - update all reference fields
    - may require reordering the secondary index
  (c) when the update is confined to other fields
    - does not affect the secondary key index

Retrieval of Records
- Types
  - primary key access
  - secondary key access
  - a combination of the above
- Combination of keys
  - easy when using secondary key indexes
  - Boolean operations (AND, OR) on the lists of matching primary keys
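Combining secondary keys comes down to set operations on lists of primary keys. The sketch below assumes each list is already sorted, which is what std::set_intersection and std::set_union require; KeyList, matchAnd, and matchOr are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

using KeyList = std::vector<std::string>; // primary keys, sorted ascending

// AND: primary keys that appear in both lists.
KeyList matchAnd(const KeyList& a, const KeyList& b) {
    KeyList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// OR: primary keys that appear in either list, without duplicates.
KeyList matchOr(const KeyList& a, const KeyList& b) {
    KeyList out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

For example, with the recordings from the sample data file, the composer list for Beethoven {ANG3795, DG139201, DG18807, RCA2626} ANDed with the title list for "Symphony No. 9" {ANG3795, COL31809, DG18807} leaves ANG3795 and DG18807. Only those two records then need to be fetched from the data file.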

Inverted Lists (1)
- Inverted list: a secondary key leads to a set of one or more primary keys
- Disadvantages of the secondary index structure
  - the index must be rearranged whenever a record is added
  - the secondary key is repeated in every entry when keys are duplicated
- Solution A: an array of references
- Solution B: a linked list of references

Array of References
Revised composer index:
  Secondary key           Set of primary key references
  BEETHOVEN               ANG3795  DG  DG18807  RCA2626
  COREA                   WAR23699
  DVORAK                  COL31809
  PROKOFIEV               LON2312
  RIMSKY-KORSAKOV         MER75016
  SPRINGSTEEN             COL38358
  SWEET HONEY IN THE R    FF245
* no need to rearrange
* limited reference array
* internal fragmentation
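A fixed-size reference array makes both listed drawbacks easy to see. In the sketch below the capacity of four references per key is an assumption chosen for illustration, and ComposerEntry, AddReference, and UnusedSlots are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxRefs = 4; // assumed capacity of each reference array

// One record of the revised secondary index: a secondary key plus a
// fixed-size array of primary key references.
struct ComposerEntry {
    std::string secondaryKey;
    std::array<std::string, kMaxRefs> refs; // unused slots stay empty
    std::size_t count = 0;

    // Add a reference, keeping the references sorted.
    // Returns false when the array is full (limited reference array).
    bool AddReference(const std::string& primaryKey) {
        if (count == kMaxRefs) return false;
        std::size_t pos = count;
        while (pos > 0 && refs[pos - 1] > primaryKey) {
            refs[pos] = refs[pos - 1];
            --pos;
        }
        refs[pos] = primaryKey;
        ++count;
        return true;
    }

    // Slots reserved but not used (internal fragmentation).
    std::size_t UnusedSlots() const { return kMaxRefs - count; }
};
```

Adding a reference touches only one index record, so the index itself is not rearranged, but a key with a single recording still reserves all four slots and a fifth recording cannot be stored.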

Inverted Lists (2)
- Guidelines for a better solution
  - no reorganization when adding
  - no limit on duplicate keys
  - no internal fragmentation
- Solution B: linking the list of references
  - a list of primary key references
  - secondary index record: secondary key field + relative record number of the first corresponding primary key reference
  - e.g. PROKOFIEV: ANG36193, LON2312

Linking List of References (1)
[Figure: improved revision of the composer index. The secondary index file holds the composer keys (BEETHOVEN, COREA, DVORAK, PROKOFIEV, RIMSKY-KORSAKOV, SPRINGSTEEN, SWEET HONEY IN THE R). Each key points to the head of its list in the Label ID list file, which holds primary key references such as LON2312, RCA2626, COL38358, DG18807, MER75016, COL31809, ANG36193, and FF245.]

Linking List of References (2)
- The primary key references are kept in a separate, entry-sequenced file
- Advantages
  - the secondary index is rearranged only when a secondary key changes
  - rearrangement is quick
  - less penalty for keeping the secondary index file on secondary storage (less need for sorting)
  - the Label ID list file does not need to be sorted
  - reusing the space of deleted records is easy

Linking List of References (3)
- Disadvantage
  - references for the same secondary key may not be physically grouped
    - lack of locality
    - could involve a large amount of seeking
  - solution: keep the Label ID list file in memory
    - the same Label ID list file can hold the lists of a number of secondary index files
    - if it is too large for memory, only a part of it can be loaded
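The linked organization can be sketched with two in-memory structures standing in for the two files. This is an illustration under assumptions: InvertedList, ListNode, Add, and Lookup are hypothetical names, and a new reference is linked at the head of its list for brevity, whereas a fuller version would keep each list ordered.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One record of the Label ID list file.
struct ListNode {
    std::string primaryKey;
    int next; // relative record number of the next reference, -1 at the end
};

struct InvertedList {
    std::map<std::string, int> heads; // secondary key -> RRN of first reference
    std::vector<ListNode> listFile;   // entry-sequenced, never reordered

    // Append the reference to the list file and link it in at the head.
    void Add(const std::string& secondaryKey, const std::string& primaryKey) {
        auto it = heads.find(secondaryKey);
        int oldHead = (it == heads.end()) ? -1 : it->second;
        listFile.push_back({primaryKey, oldHead});
        heads[secondaryKey] = static_cast<int>(listFile.size()) - 1;
    }

    // Follow the chain of relative record numbers for one secondary key.
    std::vector<std::string> Lookup(const std::string& secondaryKey) const {
        std::vector<std::string> keys;
        auto it = heads.find(secondaryKey);
        int rrn = (it == heads.end()) ? -1 : it->second;
        while (rrn != -1) {
            keys.push_back(listFile[rrn].primaryKey);
            rrn = listFile[rrn].next;
        }
        return keys;
    }
};
```

References for one composer end up scattered through the list file, which is the lack of locality noted above: following a chain on disk could cost one seek per reference.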

Selective Indexes
- Selective index: an index on a subset of the records
  - contains only a part of the entire index
  - provides a selective view
  - useful when the contents of a file fall into several categories
    - e.g. 20 < Age < 30 and $1000 < Salary
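A selective index is an ordinary index built through a filter. The sketch below uses the age and salary condition from the slide; the Employee type, its field names, and buildSelectiveIndex are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct Employee {
    std::string id;
    int age;
    double salary;
};

// Index only the records for which the predicate holds.
// The map goes from key to the record's position in the file.
std::map<std::string, int> buildSelectiveIndex(
        const std::vector<Employee>& file,
        const std::function<bool(const Employee&)>& selected) {
    std::map<std::string, int> index;
    for (std::size_t i = 0; i < file.size(); ++i) {
        if (selected(file[i])) index[file[i].id] = static_cast<int>(i);
    }
    return index;
}
```

Records outside the selected category are still in the data file; they just cannot be reached through this particular index.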

Index Binding (1)
- When should a key in an index be bound to the physical address of its associated record?
- (1) File construction time binding (tight, in-the-data binding)
  - tight binding and faster access
  - the case for the primary key
  - when a secondary key is bound at that time
    - simpler and faster retrieval
    - reorganization of the data file results in modifications of all bound index files

Index Binding (2)
- (2) Postpone binding until a record is actually retrieved (retrieval-time binding)
  - minimal reorganization and a safe approach
  - mostly for secondary keys
- Tight, in-the-data binding is good when
  - the file is static, with little or no change
  - rapid performance during retrieval is required
  - e.g. mass-produced, read-only optical disks

Let's Review (1)
7.1 What is an Index?
7.2 A Simple Index for Entry-Sequenced Files
7.3 Using Template Classes in C++ for Object I/O
7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects
7.5 Indexes That Are Too Large to Hold in Memory

Let's Review (2)
7.6 Indexing to Provide Access by Multiple Keys
7.7 Retrieval Using Combinations of Secondary Keys
7.8 Improving the Secondary Index Structure: Inverted Lists
7.9 Selective Indexes
7.10 Binding