Indexed Files Part One - Simple Indexes
All of this material is stolen from Dr. Foster's CSCI325 Course Notes
Non-Indexed Relative Files
Usage: direct file manipulation is required when the data will not fit into memory.
Minor Problems:
  - binary searching a file is a bit more difficult than binary searching an array
  - sorting a big file is difficult and slow
Major Problems:
  - Time: disk operations take a long time!
  - Deleting a record from the middle of a file is more difficult than deleting an element from the middle of an array.
  - Adding must be done at end-of-file.
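To make the cost of searching a file directly concrete, here is a minimal Python sketch of a binary search over a sorted file of fixed-length records. The record layout (a 20-byte key plus a 4-byte quantity), the function names, and the sample quantities are assumptions made for illustration; they are not from the course notes. Note that every probe costs one seek and one read.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<20si")  # hypothetical layout: 20-byte key, 4-byte quantity

def write_sorted_file(path, rows):
    """Write (key, qty) rows as fixed-length records, sorted by key."""
    with open(path, "wb") as f:
        for key, qty in sorted(rows):
            f.write(RECORD.pack(key.encode("ascii"), qty))

def file_binary_search(path, key):
    """Binary search the file itself; returns (qty or None, number of disk reads)."""
    target = key.encode("ascii")
    reads = 0
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // RECORD.size - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD.size)            # one seek per probe
            raw_key, qty = RECORD.unpack(f.read(RECORD.size))
            reads += 1
            found = raw_key.rstrip(b"\x00")      # strip the struct padding
            if found == target:
                return qty, reads
            if found < target:
                lo = mid + 1
            else:
                hi = mid - 1
    return None, reads

# Made-up sample quantities, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "sorted.dat")
write_sorted_file(path, [("Notebook", 10), ("Pen", 20), ("Paper", 30),
                         ("Folder", 40), ("Eraser", 50)])
qty, reads = file_binary_search(path, "Pen")
```

The file must already be sorted by key for this to work, which is exactly the "difficult and slow" part for a big file.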
Indexed Files
An Indexed File is actually two separate, but related, binary files:
  - the Index File
  - the Data File
The Index File contains information on how to find specific records in the Data File.
Our primary objective is speed, above all for searching. Adding records gets easier too.
Example

Index File
Key        RRN
Eraser     4
Folders    3
Notebook   0
Paper      2
Pen        1

Data File (records in RRN order, 0 to 4)
Product ID  Description  Qty  Price
37          Notebook
            Pen
            Paper
            Folder
            Eraser

This is a simple indexed file. The index to data relationship is 1:1.
Can we do this with just one file?
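The example above can be sketched in Python as two binary files built from one list of records. This is only an illustration under stated assumptions: the fixed-length layouts, the `build` function, the use of the Description field as the key, and every sample value other than product ID 37 are hypothetical.

```python
import os
import struct
import tempfile

# Hypothetical fixed-length layouts (not from the course notes).
DATA = struct.Struct("<i20sif")   # product id, 20-byte description, qty, price
INDEX = struct.Struct("<20si")    # 20-byte key, RRN

def build(data_path, index_path, records):
    """Write records in arrival order, then a key-sorted index of (key, RRN)."""
    index = []
    with open(data_path, "wb") as f:
        for rrn, (pid, desc, qty, price) in enumerate(records):
            f.write(DATA.pack(pid, desc.encode("ascii"), qty, price))
            index.append((desc, rrn))     # RRN = position in the data file
    index.sort()                          # sorted by key for binary searching
    with open(index_path, "wb") as f:
        for key, rrn in index:
            f.write(INDEX.pack(key.encode("ascii"), rrn))
    return index

# Made-up sample values; only product ID 37 appears in the slides.
records = [(37, "Notebook", 10, 2.5), (38, "Pen", 20, 0.5), (39, "Paper", 30, 4.0),
           (40, "Folder", 40, 0.25), (41, "Eraser", 50, 0.75)]
folder = tempfile.mkdtemp()
data_path = os.path.join(folder, "products.dat")
index_path = os.path.join(folder, "products.idx")
index = build(data_path, index_path, records)
```

Each index entry carries only the key and the RRN, so the index file stays smaller than the data file even in this tiny example.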
Index File
Key field
  - uses a unique identifier (same idea as in databases)
  - arranged for fast searching, e.g., sorted by Key for binary searching
Notice that the Index File is much smaller than the Data File.
The Index File must fit into memory. What if the Index will not fit into memory?

Key        RRN
Eraser     4
Folders    3
Notebook   0
Paper      2
Pen        1
Retrieval Algorithm
1. Read the Index into an array in memory
2. Search the array for the Key
3. File Position = array[index].RRN * sizeof(data record)
4. SeekG(datafile, File Position)
5. Read record from datafile
Does step 1 need to happen for every search? What is the best search algorithm?

Index File
Key        RRN
Eraser     4
Folders    3
Notebook   0
Paper      2
Pen        1

Data File
Product ID  Description  Qty  Price
37          Notebook
            Pen
            Paper
            Folder
            Eraser       462.39
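The five retrieval steps can be sketched in Python as follows. The record layouts, the `load_index` and `retrieve` names, and the sample values are assumptions made for illustration; `seek` plays the role of SeekG, and the standard `bisect` module supplies the binary search over the in-memory index.

```python
import bisect
import os
import struct
import tempfile

DATA = struct.Struct("<i20sif")   # assumed layout: id, description, qty, price
INDEX = struct.Struct("<20si")    # assumed layout: key, RRN

def load_index(index_path):
    """Step 1: read the whole index file into parallel in-memory lists."""
    keys, rrns = [], []
    with open(index_path, "rb") as f:
        while True:
            chunk = f.read(INDEX.size)
            if len(chunk) < INDEX.size:
                break
            key, rrn = INDEX.unpack(chunk)
            keys.append(key.rstrip(b"\x00").decode("ascii"))
            rrns.append(rrn)
    return keys, rrns

def retrieve(data_file, keys, rrns, key):
    """Steps 2-5: binary search the index, seek to RRN * record size, read."""
    pos = bisect.bisect_left(keys, key)
    if pos == len(keys) or keys[pos] != key:
        return None                               # key not in the index
    data_file.seek(rrns[pos] * DATA.size)         # file position = RRN * record size
    pid, desc, qty, price = DATA.unpack(data_file.read(DATA.size))
    return pid, desc.rstrip(b"\x00").decode("ascii"), qty, price

# Made-up sample data; only product ID 37 appears in the slides.
rows = [(37, "Notebook", 10, 2.5), (38, "Pen", 20, 0.5), (39, "Paper", 30, 4.0),
        (40, "Folder", 40, 0.25), (41, "Eraser", 50, 0.75)]
folder = tempfile.mkdtemp()
data_path = os.path.join(folder, "products.dat")
index_path = os.path.join(folder, "products.idx")
with open(data_path, "wb") as f:
    for row in rows:
        f.write(DATA.pack(row[0], row[1].encode("ascii"), row[2], row[3]))
with open(index_path, "wb") as f:
    for key, rrn in sorted((row[1], i) for i, row in enumerate(rows)):
        f.write(INDEX.pack(key.encode("ascii"), rrn))

keys, rrns = load_index(index_path)               # done once, not per search
with open(data_path, "rb") as data_file:
    eraser = retrieve(data_file, keys, rrns, "Eraser")
    missing = retrieve(data_file, keys, rrns, "Stapler")
```

In this sketch the index is loaded once and reused, so each successful lookup costs a single seek and a single read on the data file.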
Add Record
1. Write the new record to the end of the data file
2. Add its Key and RRN to the end of the index array
3. Sort the index array
4. Write the index array to the index file
Does step 4 need to happen for every Add? How do you know the RRN of the new record?
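A minimal Python sketch of the Add steps follows. It assumes one possible answer to the RRN question: with fixed-length records, the new RRN is the data file's size divided by the record size before the append. The layouts, function names, and sample values are hypothetical.

```python
import os
import struct
import tempfile

DATA = struct.Struct("<i20sif")   # assumed layout: id, description, qty, price
INDEX = struct.Struct("<20si")    # assumed layout: key, RRN

def add_record(data_path, index, record):
    """Steps 1-3: append the record, add (key, RRN) to the index, re-sort."""
    pid, desc, qty, price = record
    # The new RRN equals the number of records already in the file.
    size = os.path.getsize(data_path) if os.path.exists(data_path) else 0
    rrn = size // DATA.size
    with open(data_path, "ab") as f:              # append at end-of-file
        f.write(DATA.pack(pid, desc.encode("ascii"), qty, price))
    index.append((desc, rrn))
    index.sort()
    return rrn

def save_index(index_path, index):
    """Step 4: write the in-memory index back to disk (this could be deferred)."""
    with open(index_path, "wb") as f:
        for key, rrn in index:
            f.write(INDEX.pack(key.encode("ascii"), rrn))

# Made-up sample data; only product ID 37 appears in the slides.
folder = tempfile.mkdtemp()
data_path = os.path.join(folder, "products.dat")
index_path = os.path.join(folder, "products.idx")
index = []
for row in [(37, "Notebook", 10, 2.5), (38, "Pen", 20, 0.5), (39, "Paper", 30, 4.0)]:
    add_record(data_path, index, row)
new_rrn = add_record(data_path, index, (40, "Folder", 40, 0.25))
save_index(index_path, index)                     # once, after a batch of adds
```

Here the index file is written once after several adds rather than after each one; whether that is safe depends on what should happen if the program stops before the index is saved.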
Delete Record
1. Locate the appropriate key in the index array
2. Move all subsequent array elements up one space
3. Mark the record in the Data File for deletion
4. Clean up the Data File
   a) create a new file with only non-deleted records
   b) adjust RRNs in the Index Array
5. Write the new index array into the Index File
When should the Data File cleanup (step 4) happen?
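The Delete steps might look like this in Python. The deletion marker (a product ID of -1), the record layout, and the `delete` and `compact` names are assumptions for illustration; the notes do not say how a record is marked. Cleanup is kept in a separate function so it can run later than the delete itself.

```python
import os
import struct
import tempfile

DATA = struct.Struct("<i20sif")   # assumed layout: id, description, qty, price
DELETED = -1                      # hypothetical marker stored in the id field

def delete(data_path, index, key):
    """Steps 1-3: remove the key from the index and mark the data record."""
    pos = next(i for i, (k, _) in enumerate(index) if k == key)
    _, rrn = index.pop(pos)                       # later entries shift up one slot
    with open(data_path, "r+b") as f:
        f.seek(rrn * DATA.size)
        f.write(struct.pack("<i", DELETED))       # overwrite only the id field

def compact(data_path, index):
    """Step 4: copy live records to a new file and renumber RRNs in the index."""
    new_rrn = {}
    tmp_path = data_path + ".tmp"
    with open(data_path, "rb") as src, open(tmp_path, "wb") as dst:
        old = 0
        while True:
            chunk = src.read(DATA.size)
            if len(chunk) < DATA.size:
                break
            if DATA.unpack(chunk)[0] != DELETED:
                new_rrn[old] = len(new_rrn)       # next free slot in the new file
                dst.write(chunk)
            old += 1
    os.replace(tmp_path, data_path)
    index[:] = [(k, new_rrn[r]) for k, r in index]

# Made-up sample data; only product ID 37 appears in the slides.
rows = [(37, "Notebook", 10, 2.5), (38, "Pen", 20, 0.5), (39, "Paper", 30, 4.0),
        (40, "Folder", 40, 0.25), (41, "Eraser", 50, 0.75)]
data_path = os.path.join(tempfile.mkdtemp(), "products.dat")
with open(data_path, "wb") as f:
    for row in rows:
        f.write(DATA.pack(row[0], row[1].encode("ascii"), row[2], row[3]))
index = sorted((row[1], i) for i, row in enumerate(rows))

delete(data_path, index, "Paper")
size_after_delete = os.path.getsize(data_path)    # file has not shrunk yet
compact(data_path, index)
```

Until `compact` runs, the data file keeps its original size and the remaining RRNs stay valid, which is why the cleanup can be postponed and batched.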
Analysis - Indexed v. Non-Indexed
Space:
  - indexed files use a big chunk of main memory for the index array
  - one more (small) file
Time:
  - searching an array in memory is much faster than searching a file
  - it is not the comparisons, it is the disk operations
  - deletion is time consuming, but it is a rare operation
Limitations?
  - Adding Records
  - Deleting Records
  - Searching for Records