CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: 845-4259 1 Notes #9.

1 CPSC-608 Database Systems, Fall 2008. Instructor: Jianer Chen. Office: HRBB 309B. Phone: 845-4259. Email: chen@cs.tamu.edu. Notes #9

2 Linear hashing: another dynamic hashing scheme. Two ideas: (a) use the i low-order bits of the b-bit hash value (e.g., 01110101), where i grows as the file grows; (b) the file grows linearly.

3-4 Example: b = 4 bits, i = 2, 2 keys/bucket. Buckets 00 and 01 are in use, so m = 01 (max used block); buckets 10 and 11 are future growth buckets. Bucket 00 holds 0000 and 1010; bucket 01 holds 0101 and 1111.

5-6 Rule: if h(k)[i] ≤ m, then look at bucket h(k)[i]; else, look at bucket h(k)[i] - 2^(i-1). Here h(k)[i] is the i low-order bits of h(k).

7 Insert 0101: its low-order bits are 01, and bucket 01 is already full, so the key goes to an overflow block. Linear hashing can have overflow chains!
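The lookup rule can be sketched in a few lines of Python. This is only an illustration: the function name `bucket_for` is hypothetical, and the sketch assumes that each key is already its own hash value, as in the slide example.

```python
def bucket_for(hash_value: int, i: int, m: int) -> int:
    """Return the bucket to search, following the rule on the slides.

    i is the number of low-order bits in use and m is the largest
    bucket number currently in use (both as plain integers).
    """
    low_bits = hash_value & ((1 << i) - 1)  # h(k)[i]: the i low-order bits
    if low_bits <= m:
        return low_bits
    # The bucket does not exist yet, so clear the leading bit of the i-bit value.
    return low_bits - (1 << (i - 1))


# The four keys from the example, with i = 2 and m = 01.
for key in (0b0000, 0b1010, 0b0101, 0b1111):
    print(f"{key:04b} -> bucket {bucket_for(key, i=2, m=0b01):02b}")
```

Keys 1010 and 1111 have low-order bits 10 and 11, which are larger than m, so the rule redirects them to buckets 00 and 01.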

8 Note: the textbook uses n instead of m, where n = m + 1. In the example, m = 01, so n = 10.

9-10 Example, continued (b = 4 bits, i = 2, 2 keys/bucket): the file grows by one bucket. m becomes 10, bucket 10 is added, and key 1010 moves from bucket 00 to bucket 10.

11-13 Insert 0101, then grow again: m becomes 11, bucket 11 is added, and key 1111 moves from bucket 01 to bucket 11. That frees room in bucket 01 for the inserted 0101.

14-18 Example, continued: how to grow beyond this? With m = 11 (max used block) and i = 2, all four buckets 00, 01, 10, 11 are in use, holding 0000, 0101, 1010 and 1111 respectively. Increase i to 3: the existing buckets become 000, 001, 010, 011, and buckets 100, 101, 110, 111 become available. Then m advances to 100, and next to 101, at which point key 0101 moves from bucket 001 to bucket 101.

19 When do we expand the file? Keep track of U = (# used slots) / (total # of slots). If U > threshold, then increase m (and maybe i).
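The growth policy can be sketched as a small in-memory structure. This is an illustration under stated assumptions, not the course's implementation: keys are treated as their own hash values, each Python list stands in for a bucket plus its overflow chain, and the threshold of 0.8 is a hypothetical value (the slides do not give one). The class and method names are invented for the example.

```python
class LinearHashFile:
    """Toy linear-hash file: grows by one bucket whenever U exceeds the threshold."""

    def __init__(self, keys_per_bucket=2, threshold=0.8):
        self.i = 1                # number of low-order bits in use
        self.m = 1                # largest bucket number in use
        self.buckets = [[], []]   # each list models a bucket and its overflow chain
        self.keys_per_bucket = keys_per_bucket
        self.threshold = threshold
        self.count = 0            # number of used slots

    def _address(self, h):
        low = h & ((1 << self.i) - 1)
        return low if low <= self.m else low - (1 << (self.i - 1))

    def utilization(self):
        # U = used slots / total slots (overflow keys count as used slots here)
        return self.count / (len(self.buckets) * self.keys_per_bucket)

    def insert(self, h):
        self.buckets[self._address(h)].append(h)
        self.count += 1
        while self.utilization() > self.threshold:
            self._grow()

    def _grow(self):
        self.m += 1
        if self.m >= (1 << self.i):   # m no longer fits in i bits
            self.i += 1
        self.buckets.append([])
        # Only the bucket that was standing in for the new one needs re-addressing.
        old = self.m - (1 << (self.i - 1))
        keys, self.buckets[old] = self.buckets[old], []
        for k in keys:
            self.buckets[self._address(k)].append(k)

    def contains(self, h):
        return h in self.buckets[self._address(h)]


f = LinearHashFile()
for key in (0b0000, 0b1010, 0b0101, 0b1111, 0b0101):
    f.insert(key)
print(f.i, f.m, [[f"{k:04b}" for k in b] for b in f.buckets])
```

Each growth step touches a single existing bucket, which is why no full reorganization is ever needed.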

20 Summary: Linear hashing. (+) Can handle growing files, with less wasted space and with no full reorganizations. (+) No indirection like extensible hashing. (-) Can still have overflow chains.

21 Example: BAD CASE. Some buckets are very full while others are very empty. We would need to move m up to the very full buckets, which would waste space.

22 Summary: Hashing. How it works; dynamic hashing (extensible, linear).

23 Next: indexing vs. hashing; index definition in SQL; multiple key access.

24 Indexing vs. Hashing: hashing is good for probes given a key, e.g., SELECT … FROM R WHERE R.A = 5

25 Indexing vs. Hashing: indexing (including B-trees) is good for range searches, e.g., SELECT … FROM R WHERE R.A > 5
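The contrast can be illustrated with two in-memory structures. This is a rough analogy only: a Python dict stands in for a hash index and a sorted list searched with `bisect` stands in for an ordered index such as a B-tree. The rows and function names are hypothetical.

```python
import bisect

# Hypothetical (A, payload) rows of relation R.
rows = [(3, "c"), (5, "e"), (7, "g"), (9, "i")]

# Hash-style access: an exact key leads straight to its entries.
hash_index = {}
for a, payload in rows:
    hash_index.setdefault(a, []).append(payload)

# Ordered access: keys are kept sorted, so a range is one contiguous slice.
sorted_rows = sorted(rows)
sorted_keys = [a for a, _ in sorted_rows]


def probe(a):
    """Rows with R.A = a, answered from the hash-style index."""
    return hash_index.get(a, [])


def range_greater_than(a):
    """Rows with R.A > a, answered from the ordered index."""
    start = bisect.bisect_right(sorted_keys, a)
    return [payload for _, payload in sorted_rows[start:]]


print(probe(5))
print(range_greater_than(5))
```

The dict cannot answer the range query without examining every key, because hashing scatters neighboring key values.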

26 Index definition in SQL: CREATE INDEX name ON rel (attr); CREATE UNIQUE INDEX name ON rel (attr), which defines a candidate key; DROP INDEX name.

27 Note: you cannot specify the type of index (e.g., B-tree, hashing, …) or its parameters (e.g., load factor, size of hash, …), at least not in SQL.

28 Note: an attribute list gives a multikey index (next), e.g., CREATE INDEX foo ON R(A,B,C)
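These statements can be tried with Python's built-in `sqlite3` module. The sketch below assumes SQLite's dialect; the table layout and the index names `a_idx` and `b_idx` are hypothetical, while `foo` follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B TEXT, C TEXT)")

conn.execute("CREATE INDEX a_idx ON R (A)")          # ordinary index
conn.execute("CREATE UNIQUE INDEX b_idx ON R (B)")   # B becomes a candidate key
conn.execute("CREATE INDEX foo ON R (A, B, C)")      # attribute list: multikey index

conn.execute("INSERT INTO R VALUES (5, 'x', 'p')")
try:
    # Same B value again: the unique index rejects the row.
    conn.execute("INSERT INTO R VALUES (6, 'x', 'q')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("DROP INDEX a_idx")
remaining = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")]
print(duplicate_rejected, remaining)
```

None of the statements names an index type or a load factor, which matches the note that such choices are not expressed in SQL itself.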

29 Multi-key index. Motivation: find records where DEPT = “Toy” AND SAL > 50k.

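30 Strategy I: use one index, say Dept. Get all Dept = “Toy” records and check their salary.

31 Strategy II: use 2 indexes and manipulate pointers: intersect the pointers for Dept = “Toy” with the pointers for Sal > 50k.

32 Strategy III: multiple key index. One idea: an index I1 on one attribute whose entries point to second-level indexes on the other attribute.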

33 Example: a Dept index (Art, Sales, Toy) whose entries point to Salary indexes (values shown: 10k, 15k, 17k, 21k and 12k, 15k, 19k), which in turn point to records such as Name=Joe, DEPT=Sales, SAL=15k.

34 For which queries is this index good? (a) Find RECs with Dept = “Sales” AND SAL = 20k. (b) Find RECs with Dept = “Sales” AND SAL > 20k. (c) Find RECs with Dept = “Sales”. (d) Find RECs with SAL = 20k.
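A small sketch of a two-level index helps answer the question. It assumes a dict for the Dept level and a sorted salary list per department for the second level. Only the record for Joe comes from the slides; the other records, the function names, and the choice of inclusive bounds are hypothetical.

```python
import bisect

# Salaries are in thousands. Only Joe's record appears on the slides.
records = [
    {"name": "Joe", "dept": "Sales", "sal": 15},
    {"name": "Ann", "dept": "Sales", "sal": 21},
    {"name": "Bob", "dept": "Toy", "sal": 10},
    {"name": "Cy", "dept": "Toy", "sal": 17},
    {"name": "Di", "dept": "Art", "sal": 12},
]


def build(recs):
    """Dept index whose entries point to a salary-sorted sub-index."""
    index = {}
    for rec in sorted(recs, key=lambda r: r["sal"]):
        sals, names = index.setdefault(rec["dept"], ([], []))
        sals.append(rec["sal"])
        names.append(rec["name"])
    return index


def dept_salary_range(index, dept, low=None, high=None):
    """Names in dept with low <= sal <= high (a bound of None is open)."""
    sals, names = index.get(dept, ([], []))
    start = 0 if low is None else bisect.bisect_left(sals, low)
    stop = len(sals) if high is None else bisect.bisect_right(sals, high)
    return names[start:stop]


def salary_only(index, sal):
    """No Dept given: every second-level index has to be searched."""
    found = []
    for dept in index:
        found.extend(dept_salary_range(index, dept, sal, sal))
    return found


index = build(records)
print(dept_salary_range(index, "Sales", 15, 15))
print(dept_salary_range(index, "Sales", low=16))
print(dept_salary_range(index, "Sales"))
print(salary_only(index, 17))
```

The first three query shapes start with one Dept lookup; a salary-only query gets no help from the first level and must visit every department's sub-index.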

35 Interesting application: geographic data. DATA: x, y, ...

36 Queries: What city is at a given location? What is within 5 miles of a given location? Which is the closest point to a given location?

37-42 Example [figure: points a through o in the x-y plane, partitioned by the split values 10, 20, 25, 15, 35, 20, 40, 30, 20, 10, 5, 15 and listed as h i a b c d e f g n o m l j k]. Searches shown: search points near f; search points near b.

43 Queries: find points with Yi > 20; find points with Xi < 5; find points “close” to point i; find points “close” to point b.

44 Many types of geographic index structures have been suggested: quad trees, R trees.

45 Two more types of multi-key indexes: grid, partitioned hash.

46 Grid index: one axis is divided by Key 1 values V1, V2, …, Vn and the other by Key 2 values X1, X2, …, Xn; each cell leads to the matching records, e.g., to records with key1 = V3, key2 = X2.

47 CLAIM: can quickly find records with key1 = Vi AND key2 = Xj; with key1 = Vi; with key2 = Xj. And also ranges, e.g., key1 ≥ Vi AND key2 < Xj.

48-50 But there is a catch with grid indexes! How is the grid index stored on disk? Like an array: the X1, X2, X3, X4 entries for V1, then those for V2, then those for V3. Problem: need regularity so we can compute the position of an entry.

51 Solution: use indirection. The grid (rows V1 to V4, columns X1 to X3) contains only pointers to buckets; the records live in the buckets.

52 With indirection: the grid can be regular without wasting space, but we do pay the price of indirection.

53 Can also index the grid on value ranges, with a linear scale per dimension. Dept: Toy = 1, Sales = 2, Personnel = 3. Salary: 0-20K = 1, 20K-50K = 2, 50K-∞ = 3.

54 Grid files. (+) Good for multiple-key search. (-) Space, management overhead (nothing is free). (-) Need partitioning ranges that evenly split keys.
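A grid file with linear scales can be sketched as follows. This is an in-memory illustration: the scales mirror the Dept and Salary ranges from the slides but are numbered from 0 instead of 1, each cell holds a Python list in place of a pointer to a disk bucket, and placing a salary of exactly 20K in the second range is an assumption. The records and function names are hypothetical.

```python
import bisect

DEPT_SCALE = {"Toy": 0, "Sales": 1, "Personnel": 2}
SALARY_BOUNDS = [20_000, 50_000]   # stripes: 0-20K, 20K-50K, 50K and up


def salary_stripe(salary):
    """Map a salary to its stripe number via the linear scale."""
    return bisect.bisect_right(SALARY_BOUNDS, salary)


# Regular 3 x 3 grid; each cell refers to a bucket of records.
grid = [[[] for _ in range(len(SALARY_BOUNDS) + 1)] for _ in DEPT_SCALE]


def insert(name, dept, salary):
    grid[DEPT_SCALE[dept]][salary_stripe(salary)].append((name, dept, salary))


def find_exact(dept, salary):
    """Both keys given: exactly one cell has to be inspected."""
    bucket = grid[DEPT_SCALE[dept]][salary_stripe(salary)]
    return [name for name, d, s in bucket if d == dept and s == salary]


def find_dept_salary_at_least(dept, low):
    """Range on salary: inspect the cells of one row from low's stripe upward."""
    row = grid[DEPT_SCALE[dept]]
    return [name for bucket in row[salary_stripe(low):]
            for name, _, s in bucket if s >= low]


insert("Ann", "Toy", 15_000)
insert("Bob", "Toy", 60_000)
insert("Cy", "Sales", 30_000)
insert("Di", "Sales", 55_000)
print(find_exact("Toy", 15_000))
print(find_dept_salary_at_least("Sales", 25_000))
```

Because every row has the same number of cells, a cell's position can be computed directly from the two stripe numbers, which is the regularity the slides ask for.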

55 Partitioned hash function. Idea: hash Key 1 with h1 and Key 2 with h2, then concatenate the two bit strings, e.g., h1 gives 010110 and h2 gives 1110010.

56-57 EX: buckets are numbered 000 through 111. h1(toy) = 0, h1(sales) = 1, h1(art) = 1. h2(10k) = 01, h2(20k) = 11, h2(30k) = 01, h2(40k) = 00. Insert: a record goes to the bucket numbered h1(dept) followed by h2(sal).

58 Find Emp. with Dept. = Sales AND Sal = 40k: h1(sales) = 1 and h2(40k) = 00, so look in bucket 100.

59 Find Emp. with Sal = 30k: h2(30k) = 01, so look here: buckets 001 and 101.

60 Find Emp. with Dept. = Sales: h1(sales) = 1, so look here: buckets 100, 101, 110 and 111.
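The bucket selection can be sketched with the hash values written as lookup tables. The tables `H1` and `H2` copy the values from the example; the function names are hypothetical, and a real system would compute the bits with hash functions instead of tables.

```python
from itertools import product

# Hash values from the example: 1 bit from Dept, 2 bits from Salary.
H1 = {"toy": "0", "sales": "1", "art": "1"}
H2 = {"10k": "01", "20k": "11", "30k": "01", "40k": "00"}


def bucket(dept, sal):
    """Bucket number for a record: h1(dept) followed by h2(sal)."""
    return H1[dept] + H2[sal]


def candidate_buckets(dept=None, sal=None):
    """Buckets to search when one or both keys are known."""
    first = [H1[dept]] if dept is not None else ["0", "1"]
    if sal is not None:
        second = [H2[sal]]
    else:
        second = ["".join(bits) for bits in product("01", repeat=2)]
    return [a + b for a in first for b in second]


print(candidate_buckets(dept="sales", sal="40k"))
print(candidate_buckets(sal="30k"))
print(candidate_buckets(dept="sales"))
```

Knowing one key fixes only its share of the bits, so the search narrows to a subset of the buckets and not to a single one.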

61 Summary: Hashing discussion. Indexing vs. hashing; SQL index definition; multiple key access (multi-key index; variations: grid, geo data; partitioned hash).

