CPSC 461 Final Review I
Hessam Zakerzadeh, Dina Said
9.1) What is the most important difference between a disk and a tape?
Tapes are sequential devices that do not support direct access to a desired page. We must essentially step through all pages in order. Disks support direct access to a desired page.
Exercise 11.4 Answer the following questions about Linear Hashing: 1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?
Linear Hashing
- No directory.
- More flexibility w.r.t. when buckets are split.
- Worse performance than Extendible Hashing if data is skewed.
- Uses a family of hash functions h_0, h_1, ... such that h_i(v) = h(v) mod (2^i * N).
  - N is the initial number of buckets.
  - If N is a power of 2, say N = 2^(d_0), then h_i consists of applying h and looking at the last d_i bits, where d_i = d_0 + i.
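As a concrete illustration of this family of hash functions, here is a minimal sketch; the base hash h(v) = v, the default N = 4, and the function names are assumptions for illustration only, not part of the slides.

```python
# Sketch of the Linear Hashing function family h_i(v) = h(v) mod (2^i * N),
# assuming the base hash h(v) = v purely for illustration.

def h(v: int) -> int:
    """Base hash function; the identity is used here only as a placeholder."""
    return v

def h_i(i: int, v: int, N: int = 4) -> int:
    """Family member h_i: with N = 2^d0, this looks at the last d0 + i bits of h(v)."""
    return h(v) % (2 ** i * N)

# Example: with N = 4 (d0 = 2), h_0 looks at the last 2 bits, h_1 at the last 3.
print(h_i(0, 44), h_i(1, 44))   # 44 = 0b101100 -> buckets 0 and 4
```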
Inserting a Data Entry in LH
- Find the bucket by applying h_Level / h_(Level+1).
- If the bucket to insert into is full:
  - Add an overflow page and insert the data entry.
  - (Maybe) split the Next bucket and increment Next.
- Else, simply insert the data entry into the bucket.
Bucket Split
- A split can be triggered by the addition of a new overflow page, or by conditions such as space utilization.
- Whenever a split is triggered, the Next bucket is split, and hash function h_(Level+1) redistributes entries between this bucket (say bucket number b) and its split image; the split image is therefore bucket number b + N_Level.
- Next = Next + 1.
A code sketch of this insert-and-split logic follows.
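Since the slides describe insertion and splitting only in prose, here is a simplified in-memory sketch, under the assumption that a split is triggered whenever an insert adds an overflow page; the class name, page capacity, and base hash are hypothetical choices, not from the slides.

```python
# Simplified in-memory sketch of Linear Hashing insertion. Assumptions: h(v) = v,
# 4 entries per page, and a split is triggered by each new overflow page.

PAGE_CAPACITY = 4                                    # data entries per page (assumed)

class LinearHashFile:
    def __init__(self, n_initial: int = 4):
        self.N0 = n_initial                          # N: initial number of buckets
        self.level = 0
        self.next = 0                                # bucket to be split next
        self.buckets = [[] for _ in range(n_initial)]  # bucket = list of data entries

    def _h(self, i: int, v: int) -> int:
        return v % (2 ** i * self.N0)                # h_i(v) = h(v) mod (2^i * N)

    def _bucket_of(self, v: int) -> int:
        b = self._h(self.level, v)
        if b < self.next:                            # bucket already split this round
            b = self._h(self.level + 1, v)
        return b

    def insert(self, v: int) -> None:
        b = self._bucket_of(v)
        pages_before = max(1, -(-len(self.buckets[b]) // PAGE_CAPACITY))
        self.buckets[b].append(v)
        pages_after = max(1, -(-len(self.buckets[b]) // PAGE_CAPACITY))
        if pages_after > pages_before:               # a new overflow page was needed
            self._split()

    def _split(self) -> None:
        old = self.next
        image = old + 2 ** self.level * self.N0      # split image = b + N_level
        self.buckets.append([])
        keep, move = [], []
        for v in self.buckets[old]:                  # redistribute with h_{level+1}
            (keep if self._h(self.level + 1, v) == old else move).append(v)
        self.buckets[old], self.buckets[image] = keep, move
        self.next += 1
        if self.next == 2 ** self.level * self.N0:   # end of round
            self.level += 1
            self.next = 0
```

For example, inserting 32, 36, 25, 5, 14, 18, 10, 30, 31, 35, 11, 7 to build the initial state and then 44, 9, and 43 mirrors the example on the following slides: 44 and 9 fit without overflow, while 43 overflows its bucket and triggers the split of bucket Next = 0.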
Example: Insert 44 (11100), 9 (01001). [Figure: Linear Hashing file with Level=0, Next=0, and N=4 primary pages; neither insert causes an overflow, so Next stays 0 and no split is triggered. The h_0 / h_1 bucket numbers shown are for illustration only.]
Example: Insert 43 (101011). [Figure: before and after the insert. 43 hashes to a full bucket, so an overflow page is added and the bucket at Next=0 is split; afterwards Level=0 and Next=1. The h_0 / h_1 bucket numbers shown are for illustration only.]
Example: End of a Round. Insert 50 (110010). [Figure: before the insert, Level=0 and Next=3; inserting 50 triggers the split of the last bucket of the round, so the round ends and afterwards Level=1, Next=0. Primary and overflow page contents shown are for illustration only.]
Exercise 11.4 Answer the following questions about Linear Hashing: 1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?
If we start with an index that has B buckets, during a round all B buckets are split in order, one after the other. A good hash function is expected to distribute the search key values uniformly across all buckets, and since splits can also be triggered by conditions such as space utilization, overflow chains stay short. Therefore the number of overflow pages per bucket is not expected to exceed one, and the average-case search cost is only slightly more than one disk I/O.
Exercise 11.4 Answer the following questions about Linear Hashing: 2. Does Linear Hashing guarantee at most one disk access to retrieve a record with a given key value? No. Overflow chains are part of the structure, so no such guarantee can be given.
Exercise 11.4 Answer the following questions about Linear Hashing: 3. If a Linear Hashing index using Alternative (1) for data entries contains N records, with P records per page and an average storage utilization of 80 percent, what is the worst-case cost for an equality search? Under what conditions would this cost be the actual search cost?
With 80 percent storage utilization, the average number of records per page is 0.8 * P. In the worst case, all keys hash to the same bucket, so that bucket's chain contains ceil(N / (0.8 * P)) pages and an equality search must read every one of them. This is the worst-case cost, and it is the actual cost exactly when the hash function maps every search key to the same bucket. A worked instance follows below.
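To make this bound concrete, here is a tiny worked instance; the values N = 100,000 records and P = 50 records per page are arbitrary assumptions, not from the exercise.

```python
import math

# Worst-case equality search: all N records hash to one bucket, each page holds
# 0.8 * P records on average, so the chain has ceil(N / (0.8 * P)) pages.
N, P = 100_000, 50            # assumed values, for illustration only
worst_case_pages = math.ceil(N / (0.8 * P))
print(worst_case_pages)       # 2500 page I/Os in the worst case
```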
Exercise 11.4 Answer the following questions about Linear Hashing: 4. If the hash function distributes data entries over the space of bucket numbers in a very skewed (non-uniform) way, what can you say about the space utilization in data pages?
Space utilization = (number of pages that actually hold data) / (total number of pages allocated).
If the data is skewed, all records are mapped to the same bucket, say bucket 0. Suppose the file starts with m primary pages. Every record goes to bucket 0, so each new overflow page on bucket 0 triggers a split, and each split adds a new primary page that stays essentially empty. After adding n overflow pages to bucket 0, we have therefore also added n new buckets:
- Pages holding data = n + 1 (bucket 0's primary page plus its n overflow pages).
- Total pages = m + n + n = m + 2n.
- Space utilization = (n + 1) / (m + 2n), which stays around 50 percent at best: very bad.
A worked instance follows below.
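To see how low the utilization gets, a tiny worked instance; m = 10 primary pages and n = 100 overflow pages are arbitrary assumptions chosen for illustration.

```python
# Skewed-data space utilization: only bucket 0's primary page plus its n overflow
# pages hold data, while m + 2n pages have been allocated in total.
m, n = 10, 100                          # assumed values, for illustration only
utilization = (n + 1) / (m + 2 * n)
print(f"{utilization:.1%}")             # ~48.1%: roughly half the space is wasted
```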
Exercise 13.4
Setup: a file of 10*10^6 pages, 320 buffer pages, average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page, page size 4 KB.
Pass 0:
- Number of runs = ceil(10*10^6 / 320) = 31,250 runs of 320 pages each.
- Read cost per run = (10 + 5 + 1 * 320) ms = 335 ms; write cost per run is the same.
- Total I/O cost of Pass 0 = No. of runs * (read cost + write cost) = 31,250 * 2 * (15 + 320) ms = 20,937,500 ms.
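A short sketch that reproduces the Pass 0 arithmetic above; the variable names are my own.

```python
import math

# Pass 0: sort 10^7 pages into runs of 320 pages; each run is read and written as
# one block costing (seek + rotation + 320 * transfer) = (10 + 5 + 320) ms = 335 ms.
PAGES, BUF = 10_000_000, 320
SEEK_ROT, TRANSFER = 15, 1                            # ms

runs = math.ceil(PAGES / BUF)                         # 31,250 runs
pass0_cost = runs * 2 * (SEEK_ROT + TRANSFER * BUF)   # read + write every run once
print(runs, pass0_cost)                               # 31250 runs, 20,937,500 ms
```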
For the subsequent merge passes (reading and writing in blocks):
- Total cost = No. of passes * (read cost + write cost).
- No. of passes = ceil(log_{No. of ways}(31,250)) = ceil(ln(31,250) / ln(No. of ways)).
- Read/write cost per pass = No. of blocks * (15 + 1 * No. of pages per block) ms.
- No. of blocks = ceil(10*10^6 / No. of pages per block).
The sketch below wraps these formulas in a small cost function.
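A minimal sketch wrapping these formulas in a reusable cost function, assuming the same 15 ms seek-plus-rotation and 1 ms/page transfer figures; merge_cost and its parameter names are my own shorthand.

```python
import math

# Cost of all merge passes after Pass 0, following the formulas above:
# passes = ceil(log_fanout(31,250)); each pass reads and writes all 10^7 pages
# in blocks, at (15 + pages_per_block) ms per block.
PAGES, RUNS_AFTER_PASS0 = 10_000_000, 31_250
SEEK_ROT, TRANSFER = 15, 1                            # ms

def merge_cost(fanout: int, read_block: int, write_block: int) -> int:
    passes = math.ceil(math.log(RUNS_AFTER_PASS0) / math.log(fanout))
    read = math.ceil(PAGES / read_block) * (SEEK_ROT + TRANSFER * read_block)
    write = math.ceil(PAGES / write_block) * (SEEK_ROT + TRANSFER * write_block)
    return passes * (read + write)                    # total merge cost in ms
```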
b) Create 256 'input' buffers of 1 page each, create an 'output' buffer of 64 pages, and do 256-way merges.
- No. of passes = ceil(ln(31,250) / ln(256)) = 2.
- Read cost per pass: input is read one page at a time, so No. of blocks = 10*10^6 and read cost = 10*10^6 * (15 + 1) ms = 16*10^7 ms.
- Write cost per pass = No. of blocks * (15 + 1 * 64) ms = 156,250 * (15 + 64) ms = 12,343,750 ms, where No. of blocks = ceil(10*10^6 / 64) = 156,250.
- Total cost for the subsequent merges = No. of passes * (read cost + write cost) = 2 * (16*10^7 + 156,250 * (15 + 64)) ms = 344,687,500 ms.
e) Create four 'input' buffers of 64 pages each, create an 'output' buffer of 64 pages, and do four-way merges.
- No. of passes = ceil(ln(31,250) / ln(4)) = 8.
- Read/write cost per pass = No. of blocks * (15 + 1 * 64) ms = 156,250 * (15 + 64) ms = 12,343,750 ms, where No. of blocks = ceil(10*10^6 / 64) = 156,250.
- Total cost for the subsequent merges = 8 * (2 * 156,250 * (15 + 64)) ms = 197,500,000 ms.
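As a cross-check, a short self-contained snippet that plugs parts (b) and (e) into the same formulas and reproduces the totals above; the variable names are my own.

```python
import math

PAGES, RUNS = 10_000_000, 31_250

# (b) 256-way merge: read in 1-page blocks, write in 64-page blocks.
passes_b = math.ceil(math.log(RUNS) / math.log(256))                       # 2 passes
cost_b = passes_b * (PAGES * (15 + 1) + math.ceil(PAGES / 64) * (15 + 64))
print(passes_b, cost_b)                               # 2, 344,687,500 ms

# (e) 4-way merge: read and write in 64-page blocks.
passes_e = math.ceil(math.log(RUNS) / math.log(4))                         # 8 passes
cost_e = passes_e * 2 * math.ceil(PAGES / 64) * (15 + 64)
print(passes_e, cost_e)                               # 8, 197,500,000 ms
```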