Today
– Review of Directory of Slots block organization
– Heap Files
– Program 1 Hints
– Ordered Files & Hash Files
– RAID
Directory of Slots Example
Heap Files
A heap file stores its records in no particular order.
–The use of “heap” here is unrelated to the “free store” used for dynamic memory allocation.
It is the simplest file organization.
Heap File Example
[Figure: (Name, ID) records stored in no particular order across block 1, block 2, ..., block N of the file]
Assuming N data blocks, R records per block, an I/O cost of D per block, and a “record processing time” of C per record, what are the costs of the following operations?
– Inserting a record: 2D + C
– Deleting a record given its RID (ignore reclaiming space): 2D + C
– Scan: N(D + RC)
– Search for a “key” with an equality selection: N(D + RC)/2 (on average, when exactly one record matches)
– Search with a range selection: N(D + RC)
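The cost answers above can be collected into a small calculator. This is a minimal sketch, and the struct name HeapCosts and the sample numbers are hypothetical. It assumes the slide's model: every block I/O costs D, every record touched costs C, and an equality search finds its single match halfway through the file on average.

```cpp
#include <cassert>

// Hypothetical helper: heap file costs under the slide's model.
// N = data blocks, R = records per block,
// D = cost of one block I/O, C = cost of processing one record.
struct HeapCosts {
    double N, R, D, C;

    // Scan: read all N blocks and process all R records in each.
    double scan() const { return N * (D + R * C); }

    // Equality search: assuming exactly one match, on average half
    // of the file is read before it is found.
    double equalitySearch() const { return N * (D + R * C) / 2; }

    // Range search: matches can be in any block, so read everything.
    double rangeSearch() const { return N * (D + R * C); }

    // Insert: read the last block, add the record, write it back.
    double insert() const { return 2 * D + C; }

    // Delete given a RID: read the block named by the RID, remove the
    // record, write the block back (space reclamation ignored).
    double deleteByRid() const { return 2 * D + C; }
};
```

With N = 1000, R = 100, D = 15 and C = 0.5 (arbitrary units), a scan costs 1000 × (15 + 50) = 65000, while an insert costs only 30.5.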
Program 1 Hints
– Check slot numbers to make sure they are valid.
– sizeof(), memcpy(), memmove(): man is your friend.
– Test patterns: make sure you handle error cases correctly.
– Error reporting
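The first two hints can be illustrated with one possible way to validate a slot number and to compact the data area after a delete. This is a sketch under several assumptions:
- Slot, validSlot(), and removeAndCompact() are hypothetical names.
- Records are packed toward the end of data[], starting at usedPtr, as HFPage::init() suggests.
- EMPTY_SLOT is given the placeholder value -1.

The shifted bytes overlap their destination, so the sketch uses memmove(). memcpy() has undefined behavior on overlapping ranges.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the assignment's definitions.
const short EMPTY_SLOT = -1;            // placeholder value
struct Slot { short offset; short length; };

// A slot number is valid only if it is in range and the slot is in use.
bool validSlot(const Slot* slots, int slotCnt, int slotNo) {
    return slotNo >= 0 && slotNo < slotCnt &&
           slots[slotNo].length != EMPTY_SLOT;
}

// Hypothetical helper: deletes record slotNo from a data area whose records
// are packed toward the end, starting at usedPtr. Records at lower offsets
// than the hole slide up by the hole's length; returns the new usedPtr.
// Assumes slotNo has already passed validSlot().
short removeAndCompact(char* data, Slot* slots, int slotCnt,
                       short usedPtr, int slotNo) {
    short off = slots[slotNo].offset;
    short len = slots[slotNo].length;
    // Source and destination overlap, so memmove is required here.
    std::memmove(data + usedPtr + len, data + usedPtr, off - usedPtr);
    // Every record that moved now starts len bytes later.
    for (int i = 0; i < slotCnt; i++)
        if (slots[i].length != EMPTY_SLOT && slots[i].offset < off)
            slots[i].offset += len;
    slots[slotNo].length = EMPTY_SLOT;
    return static_cast<short>(usedPtr + len);
}
```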
class HFPage {
    struct slot_t {
        short offset;
        short length;    // equals EMPTY_SLOT if slot is not in use
    };

    static const int DPFIXED = sizeof(slot_t)
                             + 4 * sizeof(short)
                             + 3 * sizeof(PageId);

    short  slotCnt;      // number of slots in use
    short  usedPtr;      // offset of first used byte in data[]
    short  freeSpace;    // number of bytes free in data[]
    short  type;         // an arbitrary value used by subclasses as needed
    PageId prevPage;     // backward pointer to data page
    PageId nextPage;     // forward pointer to data page
    PageId curPage;      // page number of this page
    slot_t slot[1];      // first element of slot array
    char   data[MAX_SPACE - DPFIXED];

    // methods...
};
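The array size MAX_SPACE - DPFIXED assumes the header fields pack tightly. Under that assumption, data[] begins exactly DPFIXED bytes into the page, and the whole page is exactly MAX_SPACE bytes. A compile-time check along these lines can catch padding surprises.

This sketch makes several hypothetical choices:
- MAX_SPACE = 1024 and PageId = int.
- short is 2 bytes and int is 4 bytes.
- PageLayout is an illustrative name, not part of the course code.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical values for illustration; the real course headers define
// MAX_SPACE and PageId, and they may differ.
const int MAX_SPACE = 1024;
typedef int PageId;

struct slot_t { short offset; short length; };

const int DPFIXED = sizeof(slot_t) + 4 * sizeof(short) + 3 * sizeof(PageId);

// Same data members, in the same order, as HFPage (methods omitted).
struct PageLayout {
    short  slotCnt, usedPtr, freeSpace, type;
    PageId prevPage, nextPage, curPage;
    slot_t slot[1];
    char   data[MAX_SPACE - DPFIXED];
};

// data[] must start exactly DPFIXED bytes in, and the page must be exactly
// MAX_SPACE bytes; either check fails at compile time if padding sneaks in.
static_assert(offsetof(PageLayout, data) == static_cast<std::size_t>(DPFIXED),
              "padding before data[]");
static_assert(sizeof(PageLayout) == static_cast<std::size_t>(MAX_SPACE),
              "page is not exactly MAX_SPACE bytes");
```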
// **********************************************************
// page initialization: sets up an empty page
void HFPage::init(PageId pageNo)
{
    nextPage = prevPage = INVALID_PAGE;
    slotCnt   = 0;                          // no slots in use
    curPage   = pageNo;
    usedPtr   = sizeof(data);               // offset of used space in data array
    freeSpace = sizeof(data) + sizeof(slot_t);  // amount of space available
                                                // (initially one unused slot)
}
– init()
– getNextPage(), setNextPage()
– getPrevPage(), setPrevPage()
– insertRecord(), deleteRecord()
– firstRecord(), nextRecord()
– getRecord(), returnRecord()
– available_space()
– empty()
int HFPage::available_space(void)
{
    // look for an empty slot. if one exists, then freeSpace
    // bytes are available to hold a record.
    int i;
    for (i = 0; i < slotCnt; i++) {
        if (slot[i].length == EMPTY_SLOT)
            return freeSpace;
    }

    // no empty slot exists. must reserve sizeof(slot_t) bytes
    // from freeSpace to hold new slot.
    return freeSpace - sizeof(slot_t);
}
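A simplified, hypothetical model of an HFPage-style page shows how freeSpace, usedPtr, and the slot directory interact. It is not the course's HFPage:
- MiniPage, DATA_SIZE, and the method bodies are illustrative.
- Slots are kept in a std::vector rather than at the front of data[].
- deleteRecord() compacts the data area with memmove(), which is one possible design.

It does follow the accounting in init() and available_space(). freeSpace starts at DATA_SIZE + sizeof(Slot) because slot[0] is part of the header. Each newly created slot is charged sizeof(Slot) bytes, while an empty slot is reused for free.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, simplified model of HFPage's space accounting.
const short EMPTY_SLOT = -1;          // placeholder value
const int DATA_SIZE = 1000;           // stand-in for sizeof(data)

struct Slot { short offset; short length; };
const int SLOT_SIZE = static_cast<int>(sizeof(Slot));

class MiniPage {
public:
    void init() {
        slots.clear();
        usedPtr = DATA_SIZE;                  // records grow down from the end
        freeSpace = DATA_SIZE + SLOT_SIZE;    // slot[0] sits in the header
    }

    // Same rule as HFPage::available_space(): reuse an empty slot if one
    // exists, otherwise reserve room for a new slot.
    int available_space() const {
        for (const Slot& s : slots)
            if (s.length == EMPTY_SLOT) return freeSpace;
        return freeSpace - SLOT_SIZE;
    }

    // Returns the slot number used, or -1 if the record does not fit.
    int insertRecord(const char* rec, int len) {
        if (len > available_space()) return -1;
        int slotNo = -1;
        for (int i = 0; i < slotCnt(); i++)
            if (slots[i].length == EMPTY_SLOT) { slotNo = i; break; }
        if (slotNo < 0) {                     // no empty slot: add one
            slots.push_back(Slot{0, EMPTY_SLOT});
            slotNo = slotCnt() - 1;
            freeSpace -= SLOT_SIZE;
        }
        usedPtr -= len;
        std::memcpy(data + usedPtr, rec, len);
        slots[slotNo] = Slot{static_cast<short>(usedPtr), static_cast<short>(len)};
        freeSpace -= len;
        return slotNo;
    }

    // Frees a record and compacts the data area; returns false for a bad slot.
    bool deleteRecord(int slotNo) {
        if (slotNo < 0 || slotNo >= slotCnt() || slots[slotNo].length == EMPTY_SLOT)
            return false;
        int off = slots[slotNo].offset, len = slots[slotNo].length;
        std::memmove(data + usedPtr + len, data + usedPtr, off - usedPtr);
        for (Slot& s : slots)
            if (s.length != EMPTY_SLOT && s.offset < off) s.offset += len;
        slots[slotNo].length = EMPTY_SLOT;
        usedPtr += len;
        freeSpace += len;
        return true;
    }

    std::string getRecord(int slotNo) const {
        return std::string(data + slots[slotNo].offset, slots[slotNo].length);
    }
    int slotCnt() const { return static_cast<int>(slots.size()); }
    int freeBytes() const { return freeSpace; }

private:
    std::vector<Slot> slots;
    int usedPtr = DATA_SIZE;
    int freeSpace = DATA_SIZE + SLOT_SIZE;
    char data[DATA_SIZE];
};
```

On an empty page, available_space() reports exactly DATA_SIZE bytes: the initial extra SLOT_SIZE is immediately reserved for the first slot.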
Ordered Files
Also called a sequential file.
File records are kept sorted by the values of an ordering field.
Insertion is expensive: records must be inserted in the correct order.
–It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this file is periodically merged with the main ordered file.
A binary search can be used to search for a record on its ordering field value.
–For a file of b blocks, this requires reading and searching about log2(b) blocks, an improvement over linear search, which reads b/2 blocks on average.
Reading the records in order of the ordering field is quite efficient.
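Binary search over an ordered file can be sketched by treating each block as a sorted array of keys and counting one read per block examined. The in-memory Block type and findKey() are hypothetical stand-ins for disk blocks and a real file API. The sketch assumes no block is empty.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical in-memory stand-in for a disk block: a sorted run of keys.
using Block = std::vector<int>;

// Binary search on the ordering field over the blocks of an ordered file.
// Counts one read per block examined; searching inside a block that is
// already in memory costs no extra I/O. Assumes no block is empty.
bool findKey(const std::vector<Block>& file, int key, int& blockReads) {
    int lo = 0, hi = static_cast<int>(file.size()) - 1;
    blockReads = 0;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        const Block& b = file[mid];
        ++blockReads;
        if (key < b.front())
            hi = mid - 1;           // key can only be in an earlier block
        else if (key > b.back())
            lo = mid + 1;           // key can only be in a later block
        else                        // key falls within this block's range
            return std::binary_search(b.begin(), b.end(), key);
    }
    return false;
}
```

With 16 blocks, every search examines at most 5 blocks (floor(log2 16) + 1), compared with up to 16 for a linear scan.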
File of Ordered Records
Hashed Files
Hashing for disk files is called External Hashing.
The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1.
–Typically, a bucket corresponds to one disk block (or a fixed number of blocks).
One of the file fields is designated to be the hash key of the file.
The record with hash key value K is stored in bucket i, where i = h(K), and h is the hashing function.
Search is very efficient on the hash key.
Collisions occur when a new record hashes to a bucket that is already full.
–An overflow file is kept for storing such records.
–Overflow records that hash to each bucket can be linked together.
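Static external hashing with per-bucket overflow chains can be modeled in memory. This is a sketch with several hypothetical choices:
- The names HashedFile and OverflowRec are illustrative.
- Each bucket stands for one disk block holding up to `capacity` records.
- h(K) = K mod M is an assumed hash function.
- Overflow records for the same bucket are linked through a next index.

```cpp
#include <cassert>
#include <vector>

// Hypothetical in-memory model of static external hashing.
// Each of the M buckets stands for one disk block holding up to
// `capacity` records; records that hash to a full bucket go to an
// overflow area and are linked into that bucket's overflow chain.
class HashedFile {
public:
    HashedFile(int M, int capacity)
        : buckets(M), chainHead(M, -1), capacity(capacity) {}

    // Assumed hash function: h(K) = K mod M, kept non-negative.
    int h(int key) const {
        int M = static_cast<int>(buckets.size());
        return ((key % M) + M) % M;
    }

    void insert(int key) {
        int i = h(key);
        if (static_cast<int>(buckets[i].size()) < capacity) {
            buckets[i].push_back(key);       // room in the home bucket
        } else {
            // Bucket full: store in the overflow area and link the new
            // record at the head of bucket i's chain.
            overflow.push_back(OverflowRec{key, chainHead[i]});
            chainHead[i] = static_cast<int>(overflow.size()) - 1;
        }
    }

    bool search(int key) const {
        int i = h(key);
        for (int k : buckets[i])             // search the home bucket first
            if (k == key) return true;
        for (int j = chainHead[i]; j != -1; j = overflow[j].next)
            if (overflow[j].key == key) return true;   // then its chain
        return false;
    }

    int overflowCount() const { return static_cast<int>(overflow.size()); }

private:
    struct OverflowRec { int key; int next; };   // next = -1 ends a chain
    std::vector<std::vector<int>> buckets;
    std::vector<int> chainHead;                  // -1: no overflow records
    std::vector<OverflowRec> overflow;
    int capacity;
};
```

A search on the hash key touches only one bucket plus that bucket's own overflow chain, which is why search is efficient as long as chains stay short.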
Hashed Files (contd.)
There are numerous methods for collision resolution, including the following:
–Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
–Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.
–Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
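Open addressing and multiple hashing, as described above, can be sketched on an in-memory table. The functions and both hash functions are hypothetical choices for illustration: h1(K) = K mod n and h2(K) = (K / n) mod n. Keys are assumed to be non-negative so that -1 can mark an empty position.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;   // keys are assumed to be non-negative

// Open addressing: starting at position `start`, check it and the
// subsequent positions in order (wrapping around) until an empty one
// is found. Returns the position used, or -1 if the table is full.
int probeFrom(std::vector<int>& table, int start, int key) {
    int n = static_cast<int>(table.size());
    for (int step = 0; step < n; step++) {
        int pos = (start + step) % n;
        if (table[pos] == EMPTY) { table[pos] = key; return pos; }
    }
    return -1;
}

// Hypothetical hash functions chosen for illustration.
int h1(int key, int n) { return key % n; }
int h2(int key, int n) { return (key / n) % n; }

// Open addressing from the first hash address.
int insertOpenAddressing(std::vector<int>& table, int key) {
    return probeFrom(table, h1(key, static_cast<int>(table.size())), key);
}

// Multiple hashing: on a collision at h1, apply h2; if that position is
// also occupied, fall back to open addressing from h2.
int insertMultipleHashing(std::vector<int>& table, int key) {
    int n = static_cast<int>(table.size());
    int first = h1(key, n);
    if (table[first] == EMPTY) { table[first] = key; return first; }
    return probeFrom(table, h2(key, n), key);
}
```

The two schemes place the same keys differently in a table of size 7. Inserting 3, 10, 17 with open addressing fills positions 3, 4, 5. With multiple hashing, 10 and 17 instead land at their second-hash positions 1 and 2.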
Hashed Files (contd.)
To reduce overflow records, a hash file is typically kept 70-80% full.
The hash function h should distribute the records uniformly among the buckets.
–Otherwise, search time will increase because many overflow records will exist.
Main disadvantages of static external hashing:
–A fixed number of buckets M is a problem if the number of records in the file grows or shrinks.
–Ordered access on the hash key is quite inefficient (it requires sorting the records).
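The 70-80% guideline translates directly into a bucket count. Assume r records and a bucket capacity of bfr records (both names are hypothetical here). Then the smallest M that keeps the file at or below a target fill is M = ceil(r / (bfr × fill)). The sketch computes this in integer arithmetic with the fill given in percent, which avoids floating-point rounding.

```cpp
#include <cassert>

// Smallest number of buckets M such that r records, with up to bfr records
// per bucket, fill the buckets to at most fillPercent percent:
//   M = ceil(r / (bfr * fillPercent / 100))
// Integer arithmetic avoids floating-point rounding surprises.
long bucketsNeeded(long r, long bfr, long fillPercent) {
    long target = bfr * fillPercent;          // 100 x records per bucket at target fill
    return (r * 100 + target - 1) / target;   // ceiling division
}
```

For r = 10000 and bfr = 10, a 75% target gives M = 1334 buckets and an 80% target gives M = 1250. In static hashing M is fixed, so the fill climbs above the target as records are added.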
Hashed Files - Overflow handling
Fill in This Table
Operation        | Heap | Sorted | Hashed
Scan             |      |        |
Equality search  |      |        |
Range search     |      |        |
Insert           |      |        |
Delete           |      |        |