CPSC-608 Database Systems, Fall 2011. Instructor: Jianer Chen. Office: HRBB 315C. Phone: 845-4259. Notes #12.


1 CPSC-608 Database Systems, Fall 2011. Instructor: Jianer Chen. Office: HRBB 315C. Phone: 845-4259. Email: chen@cse.tamu.edu. Notes #12

2 DBMS architecture diagram. Users: database administrator (DDL language), database programmer (DML (query) language). DBMS components: DDL compiler, DML compiler, query execution engine, index/file manager, buffer manager, main memory buffers, file manager, transaction manager, concurrency control, lock table, logging & recovery. Storage: secondary storage (disks), holding data in tables (relations). Label: graduate database.

3 The Main Purpose of Index Structures: speed up the search process. Example: to evaluate σ_{a=6}(R), an index lets us quickly figure out which disk blocks contain the desired tuples; otherwise we have to scan the entire R. Example of an index structure: B+ trees.

4 Hash Tables. A hash function h maps a search key k to a bucket number h(k). A bucket is typically a disk block (possibly with overflow blocks). h(k), with 0 ≤ h(k) ≤ b-1, gives an easy way to compute the bucket address (direct: the address is computed from h(k); indirect: h(k) is the index in a directory).

5 Hash Tables (same content as slide 4, plus:) Ideally, (# tuples in R) / b = 60% ~ 70% of the bucket size, where b is the number of buckets.
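The sizing rule on slide 5 can be turned into a small calculation. The sketch below is only an illustration: the function names, the 0.65 fill target (picked from inside the 60% ~ 70% range), and the modulo hash for integer keys are all assumptions, not part of the notes.

```python
import math


def buckets_needed(num_tuples: int, bucket_size: int, fill: float = 0.65) -> int:
    """Number of buckets b so that (num_tuples / b) is about fill * bucket_size.

    fill = 0.65 is an assumed target inside the 60% ~ 70% range.
    """
    return math.ceil(num_tuples / (fill * bucket_size))


def bucket_of(key: int, b: int) -> int:
    """Hypothetical hash function for integer keys: a value in 0 .. b-1."""
    return key % b


b = buckets_needed(10_000, 50)       # 10,000 tuples, 50 tuples per bucket
average_fill = (10_000 / b) / 50     # fraction of a bucket used on average
print(b, average_fill)
```

With direct addressing, bucket_of(k, b) would be used as the block address itself; with indirect addressing it would index a directory of block addresses.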

6 How do we cope with growth? Overflows and reorganizations. Dynamic hashing: extensible, linear.

7 Linear hashing: another dynamic hashing scheme. Two ideas: (a) use the i low-order bits of the hash value (e.g., of the b-bit value 01110101), where i grows over time; (b) the file grows linearly.

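8 Linear Hashing: General framework. A search key k is hashed by h to the b-bit value h(k); its last i bits, h(k)[i], select a bucket. Buckets are addressed with i bits (00…00, 00…01, ...) and grow linearly; n = 1x…xx is the first address not yet in use, so the buckets from n on hold no tuples.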

9 Example: b = 4 bits, 2 keys/bucket. Bucket 00 holds 0000 and 1010; bucket 01 holds 0101 and 1111. Buckets 10 and 11 are future growth buckets.

10 Example (continued): n = 10 in binary (1 + the largest index of the used buckets).

11 Example (continued): i = 2 (# used bits = # bits of n).

12 Example (continued). Rule: if h(k)[i] < n, then look at bucket h(k)[i].

13 Example (continued). Rules: if h(k)[i] < n, then look at bucket h(k)[i]; if h(k)[i] ≥ n, then look at bucket h(k)[i] - 2^(i-1) (i.e., replace the leading bit 1 of h(k)[i] by 0). This is why 1010 is stored in bucket 00 and 1111 in bucket 01.
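The two lookup rules can be sketched as a single function. This is a minimal illustration, assuming hash values are plain Python integers; the name bucket_address is hypothetical.

```python
def bucket_address(hk: int, i: int, n: int) -> int:
    """Return the bucket to look at for the hash value hk.

    i is the number of low-order bits in use; n is 1 + the largest
    index of the used buckets.
    """
    m = hk & ((1 << i) - 1)    # h(k)[i]: the last i bits of h(k)
    if m >= n:                 # bucket m is not in use yet
        m -= 1 << (i - 1)      # replace the leading bit 1 by 0
    return m


# State of the example: n = 10 (binary), i = 2.
for hk in (0b0000, 0b1010, 0b0101, 0b1111):
    print(f"{hk:04b} -> bucket {bucket_address(hk, 2, 0b10):02b}")
```

Under these assumptions the loop sends 0000 and 1010 to bucket 00, and 0101 and 1111 to bucket 01, matching the example.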

14 Insertion: b = 4 bits, 2 keys/bucket, n = 10, i = 2. Insert 1101: h(k)[i] = 01 < n, so look at bucket 01. That bucket is already full (0101, 1111), so 1101 goes into an overflow block: buckets can have overflow chains! Rules: if h(k)[i] < n, then look at bucket h(k)[i]; if h(k)[i] ≥ n, then look at bucket h(k)[i] - 2^(i-1) (i.e., replace the leading bit 1 of h(k)[i] by 0).

15 Increase size n: b = 4 bits, n = 10, i = 2. Starting from bucket 00 = {0000, 1010} and bucket 01 = {0101, 1111}, n is increased from 10 to 11, which brings bucket 10 into use.

16 Increase size n: b = 4 bits, n = 11, i = 2. Bucket 00 is split: 1010 moves to the new bucket 10, and 0000 stays in bucket 00.

17 Insert: b = 4 bits, n = 11, i = 2. Insert 1101: h(k)[i] = 01 < n, so it goes to bucket 01, which already holds 0101 and 1111.

18 Increase size n: b = 4 bits, n = 11, i = 2. n is increased from 11 to 100.

19 Increase size n: b = 4 bits, n = 100, i = 3. Since n now has 3 bits, i becomes 3.

20 Increase size n: b = 4 bits, n = 100, i = 3. Bucket 11 comes into use.

21 Increase size n: b = 4 bits, n = 100, i = 3. Bucket 01 is split: 1111 moves to bucket 11.

22 Increase size n: b = 4 bits, n = 100, i = 3. Final state: bucket 00 = {0000}, bucket 01 = {0101, 1101}, bucket 10 = {1010}, bucket 11 = {1111}.

23 When do we expand the file? Keep track of U = (# used slots) / (total # of slots). If U > threshold, then increase n (and maybe i).
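The expansion test can be sketched in a few lines. The notes give no threshold value, so the 0.8 below is an assumption, as are the function names and the choice to count only the slots of the primary blocks in the total.

```python
def utilization(used_slots: int, total_slots: int) -> float:
    """U = (# used slots) / (total # of slots)."""
    return used_slots / total_slots


def should_expand(used_slots: int, total_slots: int, threshold: float = 0.8) -> bool:
    """True when the file should grow; threshold = 0.8 is an assumed value."""
    return utilization(used_slots, total_slots) > threshold


# Three buckets of 2 slots each holding 5 keys, as after slide 17's insertion.
print(should_expand(5, 6))
```

With this assumed threshold, U = 5/6 exceeds 0.8, so the file would grow, which agrees with the increase of n on slide 18.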

24 Summary A. Linear hashing: Searching
input: a search key k
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. m = the last i bits of h(k);
2. IF m ≥ n THEN m = m - 2^(i-1);
3. read in the disk block B with the address m.
\\ If k is not in B, you may have to read the
\\ overflow blocks.
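A minimal in-memory sketch of the search steps, under stated assumptions: the file is modeled as a Python list in which buckets[m] is a chain of blocks (primary block first, then overflow blocks), each block is a list of keys, and a key serves as its own hash value, as in the 4-bit example. None of these modeling choices come from the notes.

```python
def search(buckets: list, key: int, i: int, n: int) -> bool:
    """Return True if key is stored in the linear hash file."""
    m = key & ((1 << i) - 1)      # step 1: the last i bits of h(k)
    if m >= n:                    # step 2: bucket m is not in use yet
        m -= 1 << (i - 1)
    for block in buckets[m]:      # step 3: primary block, then overflow blocks
        if key in block:
            return True
    return False


# State after slide 14: 1101 sits in an overflow block of bucket 01.
example = [
    [[0b0000, 0b1010]],            # bucket 00
    [[0b0101, 0b1111], [0b1101]],  # bucket 01 with one overflow block
]
print(search(example, 0b1101, 2, 0b10), search(example, 0b0110, 2, 0b10))
```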

25 Summary B. Linear hashing: Insertion
input: a tuple t with search key k
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. m = the last i bits of h(k);
2. IF m ≥ n THEN m = m - 2^(i-1);
3. read in the disk block B with the address m; insert t.
\\ If B is full, you need to use an overflow block.
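The insertion steps can be sketched the same way. As an assumption for illustration, buckets[m] is a chain of blocks (lists of keys), BLOCK_CAPACITY is 2 as in the example, and a key is its own hash value; the function name is hypothetical.

```python
BLOCK_CAPACITY = 2  # keys per block, as in the 4-bit example


def insert(buckets: list, key: int, i: int, n: int) -> int:
    """Insert key and return the address of the bucket that received it."""
    m = key & ((1 << i) - 1)      # step 1: the last i bits of h(k)
    if m >= n:                    # step 2: bucket m is not in use yet
        m -= 1 << (i - 1)
    chain = buckets[m]            # step 3: primary block plus overflow blocks
    for block in chain:
        if len(block) < BLOCK_CAPACITY:
            block.append(key)
            return m
    chain.append([key])           # every block is full: add an overflow block
    return m


# Slide 14: inserting 1101 with n = 10, i = 2 creates an overflow block.
example = [[[0b0000, 0b1010]], [[0b0101, 0b1111]]]
insert(example, 0b1101, 2, 0b10)
print(example)
```

A fuller version would also update the utilization U after each insertion and trigger a size increase when U passes the threshold.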

26 Summary C. Linear hashing: Increasing hash table size
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. read in the disk block B of address n - 2^(i-1);
2. split (properly) the tuples in B and put them in the block B and the block B' of address n;
3. n = n + 1;
4. IF # bits of n is increased THEN i = i + 1.
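One way to read "split (properly)" is that a tuple moves to the new block exactly when the last i bits of its hash value equal n. The sketch below takes that reading and replays slides 15 to 22; the list-of-blocks model, the helper pack, and using keys as their own hash values are assumptions made for illustration.

```python
def pack(keys: list, capacity: int) -> list:
    """Pack keys into a chain of blocks; an empty bucket keeps one empty block."""
    return [keys[j:j + capacity] for j in range(0, len(keys), capacity)] or [[]]


def grow(buckets: list, i: int, n: int, capacity: int = 2) -> tuple:
    """Bring bucket n into use by splitting bucket n - 2^(i-1); return (i, n)."""
    old = n - (1 << (i - 1))                              # step 1
    keys = [k for block in buckets[old] for k in block]
    mask = (1 << i) - 1
    stay = [k for k in keys if (k & mask) != n]           # step 2: split
    move = [k for k in keys if (k & mask) == n]
    buckets[old] = pack(stay, capacity)
    buckets.append(pack(move, capacity))                  # the new bucket n
    n += 1                                                # step 3
    if n.bit_length() > i:                                # step 4
        i += 1
    return i, n


example = [[[0b0000, 0b1010]], [[0b0101, 0b1111]]]        # slide 15
i, n = grow(example, 2, 0b10)                             # slide 16
example[1].append([0b1101])                               # slide 17 (overflow)
i, n = grow(example, i, n)                                # slides 18 to 22
print(i, n, example)
```

Repacking the remaining keys also shortens the overflow chain of the split bucket, which is why 1101 ends up in the primary block of bucket 01.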

27 Summary D. Linear hashing: Decreasing hash table size
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. n = n - 1;
2. IF # bits of n is decreased THEN i = i - 1;
3. move the tuples in the block of address n to the block of address n - 2^(i-1).
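The shrinking steps can be sketched as the inverse of a split. As before, this is an illustration under assumptions: buckets[m] is a chain of blocks holding keys, the list has exactly n buckets, and the guard against removing the last bucket is an addition not stated in the notes.

```python
def shrink(buckets: list, i: int, n: int, capacity: int = 2) -> tuple:
    """Remove the last bucket and merge it into its partner; return (i, n)."""
    if n <= 1:
        raise ValueError("cannot shrink below one bucket")  # assumed guard
    n -= 1                                                  # step 1
    if n.bit_length() < i:                                  # step 2
        i -= 1
    target = n - (1 << (i - 1))                             # step 3
    removed = buckets.pop()                                 # the block chain at address n
    keys = [k for block in buckets[target] + removed for k in block]
    # Repack into blocks of the given capacity; keep one empty block if no keys.
    buckets[target] = [keys[j:j + capacity]
                       for j in range(0, len(keys), capacity)] or [[]]
    return i, n


# Undo the last split of the example: n goes from 100 back to 11, i from 3 to 2.
example = [[[0b0000]], [[0b0101, 0b1101]], [[0b1010]], [[0b1111]]]
print(shrink(example, 3, 0b100), example)
```

Note that merging can recreate an overflow block, here for bucket 01, which again holds three keys.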

28 Summary E. Linear Hashing. (+) Can handle growing files, with less wasted space and with no full reorganizations. (+) No indirection, unlike extensible hashing. (-) Can still have overflow chains.

29 Example: BAD CASE. Some buckets are very full while others are very empty. To split the very full buckets we would need to move n up to them, which would waste space.

30 Summary. Hashing: how it works; dynamic hashing (extensible, linear).

31 DBMS architecture diagram (same as slide 2).

32 Next: algorithms implementing the relational algebraic operations: projection and selection; set and bag operations; join operations; grouping, duplicate elimination, sorting.

33 DBMS architecture diagram (same as slide 2).

