CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #12
[Figure: DBMS architecture. The database administrator uses the DDL language and the database programmer uses the DML (query) language. DBMS components: DDL compiler, DML compiler, query execution engine, index/file manager, buffer manager (main memory buffers), file manager, transaction manager, concurrency control (lock table), logging & recovery. Data is kept on secondary storage (disks) in tables (relations).]
The Main Purpose of Index Structures
- Speed up the search process.
- For a selection such as σ_{a=6}(R), the index lets us quickly figure out which disk blocks contain the desired tuples; otherwise we have to scan the entire R.
- Example: B+ trees
Hash Tables
- A hash function h maps a search key k to a bucket number h(k).
- A bucket is typically a disk block (possibly with overflow blocks).
- h(k), with 0 ≤ h(k) ≤ b-1 for b buckets, gives an easy way to compute the bucket address (direct: the address is computed from h(k); indirect: h(k) is the index in a directory).
- Ideally, (# tuples in R)/b = 60% ~ 70% of the bucket size.
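The sizing rule above can be turned into a small calculation. The following is only a sketch: the function names, the 65% default fill target, and the use of `key % b` as a stand-in hash function for integer keys are assumptions made for illustration, not part of the notes.

```python
import math

def buckets_needed(num_tuples: int, bucket_capacity: int,
                   target_fill: float = 0.65) -> int:
    """Pick b so that (num_tuples / b) is about target_fill of a bucket.

    target_fill = 0.65 is an assumed midpoint of the 60% ~ 70% range.
    """
    return math.ceil(num_tuples / (target_fill * bucket_capacity))

def bucket_of(key: int, b: int) -> int:
    """Direct addressing: the hash value itself names the bucket.

    key % b is a hypothetical stand-in for h(k); it yields 0..b-1
    for non-negative integer keys.
    """
    return key % b

b = buckets_needed(num_tuples=1000, bucket_capacity=10)
print(b, bucket_of(42, b))
```

With indirect addressing, the value returned by `bucket_of` would instead be used as a position in a directory that stores the actual block addresses.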
How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
  - Extensible
  - Linear
Linear hashing
Another dynamic hashing scheme. Two ideas:
(a) Use the i low-order bits of the b-bit hash value; i grows over time.
(b) The file grows linearly.
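Linear Hashing: General framework
[Figure: a key k is hashed to h(k), a value of b bits; h(k)[i] denotes its last i bits. Buckets are numbered 00…00, 00…01, … and grow linearly up to the bound n; the buckets 1x…xx at or beyond n hold no tuples.]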
Example: b = 4 bits, 2 keys/bucket
[Figure: the used buckets, followed by the future growth buckets.]
- n = 10 in binary (1 + the largest index of the used buckets)
- i = 2 (# used bits = # bits of n)
Rules:
- If h(k)[i] < n, then look at bucket h(k)[i].
- If h(k)[i] ≥ n, then look at bucket h(k)[i] − 2^{i-1} (i.e., replace the leading bit 1 of h(k)[i] by 0).
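The two rules can be sketched as a single function on integers. The name `bucket_address` and the choice to pass the hash value directly (rather than a key plus a hash function) are assumptions made for this illustration.

```python
def bucket_address(hash_value: int, i: int, n: int) -> int:
    """Return the bucket to look at, given the current i and n.

    hash_value plays the role of h(k); n is 1 + the largest used
    bucket index; i is the number of bits of n.
    """
    m = hash_value & ((1 << i) - 1)   # h(k)[i]: the last i bits of h(k)
    if m >= n:
        m -= 1 << (i - 1)             # clear the leading 1 bit of h(k)[i]
    return m

# The slide's setting: n = 10 in binary (two buckets, 00 and 01), i = 2.
for h in (0b1101, 0b1010, 0b1111):
    print(format(h, "04b"), "->", format(bucket_address(h, i=2, n=0b10), "02b"))
```

In this setting 1101 ends in 01, which is below n, so it maps to bucket 01, while 1010 and 1111 end in 10 and 11 and are redirected to buckets 00 and 01.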
Insertion: b = 4 bits, 2 keys/bucket, n = 10, i = 2
[Figure: insert 1101 into the used buckets; the future growth buckets follow.]
- Buckets can have overflow chains!
Rules:
- If h(k)[i] < n, then look at bucket h(k)[i].
- If h(k)[i] ≥ n, then look at bucket h(k)[i] − 2^{i-1} (i.e., replace the leading bit 1 of h(k)[i] by 0).
Increase size n: b = 4 bits
[Figure sequence: the file grows one bucket at a time into the future growth buckets, from n = 10 to n = 11 (where 1101 is inserted), and then to n = 100.]
When do we expand the file?
- Keep track of U = (# used slots) / (total # of slots).
- If U > threshold, then increase n (and maybe i).
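The expansion test can be written as a short check. This is a sketch under stated assumptions: the notes do not give a threshold value, so the default of 0.8 is made up for illustration, and the total number of slots is taken to count primary buckets only (overflow blocks excluded).

```python
def utilization(used_slots: int, num_buckets: int, slots_per_bucket: int) -> float:
    """U = (# used slots) / (total # of slots), counting primary buckets only."""
    return used_slots / (num_buckets * slots_per_bucket)

def should_expand(used_slots: int, num_buckets: int, slots_per_bucket: int,
                  threshold: float = 0.8) -> bool:
    """True when U exceeds the (assumed) threshold, i.e. the bound should be increased."""
    return utilization(used_slots, num_buckets, slots_per_bucket) > threshold

# Two buckets of two slots each: three tuples give U = 0.75, four give U = 1.0.
print(should_expand(3, 2, 2), should_expand(4, 2, 2))
```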
Summary A. Linear hashing: Searching
input: a search key k
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. m = the last i bits of h(k);
2. IF m ≥ n THEN m = m − 2^{i-1};
3. read in the disk block B with the address m
\\ If k is not in B, you may have to read the
\\ overflow blocks.
Summary B. Linear hashing: Insertion
input: a tuple t with search key k
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. m = the last i bits of h(k);
2. IF m ≥ n THEN m = m − 2^{i-1};
3. read in the disk block B with the address m; insert t
\\ If B is full, you need to use an overflow block.
Summary C. Linear hashing: Increasing hash table size
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. read in the disk block B of address n − 2^{i-1};
2. split (properly) the tuples in B and put them in the block B and the block B' of address n;
3. n = n + 1;
4. IF # bits of n is increased THEN i = i + 1.
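The searching, insertion, and growth procedures can be combined into one small in-memory model. This is a sketch, not the notes' implementation: the class name `LinearHashFile`, the identity hash on integer keys, the 0.8 threshold, and the use of a Python list per bucket (a list longer than `capacity` stands for a bucket with overflow blocks) are all assumptions. "Split properly" is read here as: a tuple moves to the new block exactly when its last i bits equal the new block's address n.

```python
class LinearHashFile:
    """In-memory model of a linear hash file (illustrative only)."""

    def __init__(self, capacity: int = 2, threshold: float = 0.8):
        self.capacity = capacity      # keys per primary bucket
        self.threshold = threshold    # assumed utilization threshold
        self.n = 0b10                 # 1 + largest used bucket index
        self.i = 2                    # number of bits of n
        self.buckets = [[], []]       # buckets 00 and 01
        self.count = 0                # number of stored keys

    def _address(self, key: int) -> int:
        m = key & ((1 << self.i) - 1)     # last i bits of h(k); h is the identity here
        if m >= self.n:
            m -= 1 << (self.i - 1)
        return m

    def search(self, key: int) -> bool:
        # Scanning the list models reading the block and its overflow chain.
        return key in self.buckets[self._address(key)]

    def insert(self, key: int) -> None:
        self.buckets[self._address(key)].append(key)
        self.count += 1
        if self.count / (self.n * self.capacity) > self.threshold:
            self.grow()

    def grow(self) -> None:
        old = self.n - (1 << (self.i - 1))    # block B to split
        new = self.n                          # address of the new block B'
        mask = (1 << self.i) - 1
        tuples = self.buckets[old]
        self.buckets[old] = [k for k in tuples if (k & mask) != new]
        self.buckets.append([k for k in tuples if (k & mask) == new])
        self.n += 1
        if self.n.bit_length() > self.i:      # "# bits of n is increased"
            self.i += 1

f = LinearHashFile()
for key in (0b0000, 0b1010, 0b1111, 0b0101, 0b1101):
    f.insert(key)
print(f.n, f.i, f.buckets)
```

In this run the fourth insertion pushes the utilization above the threshold and bucket 00 is split into 00 and 10; the fifth insertion (1101) triggers the split of bucket 01 into 01 and 11, at which point n reaches 100 in binary and i becomes 3.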
Summary D. Linear hashing: Decreasing hash table size
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. n = n − 1;
2. IF # bits of n is decreased THEN i = i − 1;
3. move the tuples in the block of address n to the block of address n − 2^{i-1}.
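The shrinking steps can be sketched on a plain list of buckets. The function name `shrink`, the list-of-lists representation, and the guard against removing the last bucket are assumptions added for this illustration.

```python
def shrink(buckets: list, n: int, i: int) -> tuple:
    """Remove the last bucket and merge its tuples back; return the new (n, i).

    buckets[j] is the list of keys in the block of address j, and
    len(buckets) == n on entry.
    """
    if n <= 1:
        raise ValueError("cannot shrink below one bucket")  # assumed guard
    n -= 1                               # step 1
    if n.bit_length() < i:               # step 2: "# bits of n is decreased"
        i -= 1
    target = n - (1 << (i - 1))          # step 3: block n merges into block n - 2^(i-1)
    buckets[target].extend(buckets.pop())  # buckets[n] is the last list
    return n, i

buckets = [[0b0000], [0b0101, 0b1101], [0b1010], [0b1111]]
n, i = shrink(buckets, n=0b100, i=3)
print(n, i, buckets)
```

Here the block of address 11 is merged back into block 01, which is the reverse of the split that created it.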
Summary E. Linear Hashing
+ Can handle growing files
  - with less wasted space
  - with no full reorganizations
+ No indirection like extensible hashing
− Can still have overflow chains
Example: BAD CASE
[Figure: the buckets at one end of the file are very full while those at the other end are very empty. Need to move n here… Would waste space…]
Summary
Hashing
- How it works
- Dynamic hashing
  - Extensible
  - Linear
Next
Algorithms implementing the relational algebraic operations:
- Projection and selection
- Set and bag operations
- Join operations
- Grouping, duplicate elimination, sorting