Announcements Today –RAID –Begin Indexes Program 1 due Friday –Office Hours today 2-3 pm –I’ll have limited contact over the weekend –later today I’ll give info for turning in the program
RAID Redundant Arrays of Inexpensive Disks Goal of RAID is to even out rates of disk improvements (small) w/ those in RAM and CPU RAID use multiple physical disks to behave as a single logical disk
Data Striping Data striping stores data across multiple disks There are different granularities bit level granularity block level granularity
Naïve Striping Reduces Reliability Likelihood of failure increases w/ # of disks –Mirroring, error correcting codes are used to increase reliability at the expense of speed But is this statement correct? –(from Section ) “For an array of n disks, the likelihood of failure is n times as much as that for one disk. Hence, if the MTTF of a disk drive is 200,000 hours (22.8 years), that of a bank of 100 disk drives becomes only 2000 hours (83 days)”
RAID Organizations balance speed and reliability
Indexing Structures for Files Chapter 14
“If you don’t find it in the index, look very carefully through the whole catalog” - Sears, Roebuck and Co. consumers’ Guide, 1897
Indexes provide alternative access paths Query: Find record for student “Troy Allen” Index on “name” Step 1: query the index for the RID for the record (hopefully a few IOs) Step 2: query the buffer manager for the appropriate block (1 IO) RID = (3438, 9)“Troy Allen”
An index Is a collection of data entries Is associated with a specific file Is associated with a specific field called the indexing field (sometimes called the search or key field) Contains data so that BlkIDs (or RIDs) whose indexing fields match a given value can be found quickly
Some Considerations What is the organization of the underlying file –Eg, is it ordered on the search key? Are the values of the indexing field unique (ie, is the indexing field a key field)? How are the data entries of the index organized? –Example: make index a hashed file on index field where each record contains (value, RID) pairs
Some Definitions primary index: an index on the ordering key field of a ordered file secondary index: an index on any non-ordering field of the file clustered index: an index whose data entries are ordered in the same way as the underlying file dense index: has an index entry for every search key value (and hence every record) in the data file. sparse index: has index entries for only some of the search values
Primary Index A primary index is an ordered file of pairs A record is stored for each block in the file. The records for Blk B contains the value of the first record on that block
Cost of Maintaining a Clustered Primary Index Inserting of record in the ordered file (already expensive) may require significant updates to the index –Why is this?
Clustering Indexes Recall a clustering index is an index on a non-key ordering field of an ordered file What do we need to store in the index? –as with pri idx, pairs –but now we need a record in the index for every unique value of the indexing key –the blk field of the index gives the first block that a record for value appears on
one way to handle the “insert” problem of ordered files
Secondary Indexes An index on field that is not the ordering field of the underlying file The indexing field may or may not be a key field for the file What is the format for records in a secondary index on a key field? How many records are needed?
More Secondary Indexes What if the indexing field is not a key field? –Option 1: Keep index entry for each record, so we will have multiple index entries for each value –Option 2: Have one record / value and store a “RID list” for each value. Thus the index records are variable length records –Option 3: Mixed type of index records (next slide)
Properties of Index Types
SQL to Create an Index CREATE INDEX idxAge ON Students WITH STRUCTURE = BTREE KEY = (age)
Next time: Multilevel indexes