CS CS4432: Database Systems II
Index definition in SQL: CREATE INDEX name ON rel (attr); DROP INDEX name. (Check online for the full index-definition syntax of your SQL system.)
Attribute list: multikey index, e.g., CREATE INDEX foo ON R(A, B, C)
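A quick way to try these statements is Python's built-in sqlite3 module with an in-memory database. This is a sketch that assumes SQLite's dialect; other systems' syntax may differ. The table R and its integer columns are hypothetical. EXPLAIN QUERY PLAN is SQLite-specific, and the wording of its output varies between versions, so the plan lines are printed without an expected value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER, C INTEGER)")  # hypothetical relation

# Multikey index over the attribute list (A, B, C)
conn.execute("CREATE INDEX foo ON R(A, B, C)")

def index_names(conn):
    """Names of all indexes currently defined in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    return [name for (name,) in rows]

print(index_names(conn))  # ['foo']

# Ask SQLite how it would answer a query that constrains the leading attribute A
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM R WHERE A = 1 AND B > 5"
).fetchall()
for row in plan:
    print(row[-1])  # last column is the human-readable plan step

conn.execute("DROP INDEX foo")
print(index_names(conn))  # []
```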
Multi-key index. Motivation: find records where DEPT = “Toy” AND SAL > 50k.
Strategy I: Use one index, say the one on Dept. Get all records with Dept = “Toy” and check each one's salary.
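Strategy II: Use two indexes and manipulate pointers: get the pointers for Dept = “Toy” from one index and the pointers for SAL > 50k from the other, intersect the two pointer sets, and fetch only the records that remain.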
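Strategy II can be sketched with in-memory stand-ins: a dict from Dept value to a set of record ids plays the Dept index, and a sorted list of (salary, id) pairs plays the salary index. Record ids stand in for disk pointers. Only Joe's record comes from the slides; the other records are hypothetical.

```python
import bisect
from collections import defaultdict

# Hypothetical records: record id -> (name, dept, sal)
records = {
    1: ("Joe", "Sales", 15_000),
    2: ("Ann", "Toy", 60_000),
    3: ("Bob", "Toy", 40_000),
    4: ("Sue", "Art", 70_000),
    5: ("Tim", "Toy", 55_000),
}

# Index on Dept: value -> set of record pointers
dept_index = defaultdict(set)
for rid, (_, dept, _) in records.items():
    dept_index[dept].add(rid)

# Index on Sal: (sal, pointer) pairs in salary order, so a range is a contiguous slice
sal_index = sorted((sal, rid) for rid, (_, _, sal) in records.items())

def sal_greater_than(x):
    # (x, inf) sorts after every (x, rid), so the slice starts at the first sal > x
    pos = bisect.bisect_right(sal_index, (x, float("inf")))
    return {rid for _, rid in sal_index[pos:]}

# Manipulate pointers only; no record is read until the intersection is known
matches = dept_index["Toy"] & sal_greater_than(50_000)
print(sorted(matches))  # [2, 5]
```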
Strategy III: One idea: a multiple-key index, i.e., an index I1 on one attribute whose entries point to indexes (I2, I3, …) on the other attribute.
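Example: a record with Name = Joe, DEPT = Sales, SAL = 15k. The Dept index has entries Art, Sales, Toy; each entry points to a Salary index for that department (salary entries shown: 10k, 15k, 17k, 21k, 12k, 15k, 19k).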
For which queries is this index good?
Find records with Dept = “Sales” AND SAL = 20k
Find records with Dept = “Sales” AND SAL > 20k
Find records with Dept = “Sales”
Find records with SAL = 20k
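A minimal sketch of the Strategy III structure, assuming a dict from Dept to a sorted list of (salary, record id) pairs as that department's salary index. The records are hypothetical, chosen so each query above has an answer. The first three queries each touch a single salary index; the last one, with no Dept value, has to probe every department's salary index.

```python
import bisect

# Hypothetical records: record id -> (name, dept, sal)
records = {
    1: ("Joe", "Sales", 15_000),
    2: ("Ann", "Sales", 20_000),
    3: ("Bob", "Sales", 25_000),
    4: ("Sue", "Toy", 20_000),
    5: ("Tim", "Art", 12_000),
}

# First level: Dept -> second-level salary index (sorted (sal, rid) pairs)
multikey = {}
for rid, (_, dept, sal) in records.items():
    bisect.insort(multikey.setdefault(dept, []), (sal, rid))

def dept_sal_eq(dept, sal):
    entries = multikey.get(dept, [])
    lo = bisect.bisect_left(entries, (sal, float("-inf")))
    hi = bisect.bisect_right(entries, (sal, float("inf")))
    return [rid for _, rid in entries[lo:hi]]

def dept_sal_gt(dept, sal):
    entries = multikey.get(dept, [])
    lo = bisect.bisect_right(entries, (sal, float("inf")))
    return [rid for _, rid in entries[lo:]]

def dept_only(dept):
    return [rid for _, rid in multikey.get(dept, [])]

def sal_only(sal):
    # No Dept value: every second-level index must be searched
    return sorted(rid for d in multikey for rid in dept_sal_eq(d, sal))

print(dept_sal_eq("Sales", 20_000))  # [2]
print(dept_sal_gt("Sales", 20_000))  # [3]
print(dept_only("Sales"))            # [1, 2, 3]
print(sal_only(20_000))              # [2, 4]
```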
There are many alternative methods for indexing.
Hashing: key → h(key) → bucket. Buckets are typically one disk block each.
One example hash function: the key is ‘x1 x2 … xn’, an n-byte character string, and there are b buckets. Hash function h: add the bytes, modulo b: $h(K) = (x_1 + x_2 + \cdots + x_n) \bmod b$.
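The byte-sum function above, written out as a sketch. Encoding the key as UTF-8 to obtain its bytes is an assumption; for plain ASCII keys each character is one byte.

```python
def h(key: str, b: int) -> int:
    """Slide's example hash: add the bytes x1..xn of the key, modulo b."""
    return sum(key.encode("utf-8")) % b  # iterating over bytes yields ints 0..255

print(h("ab", 10))  # (97 + 98) % 10 = 5
```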
This may not be the best function. Read Knuth, Vol. 3, if you really need to select a good one. Good hash function: the expected number of keys per bucket is the same for all buckets.
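One reason the byte-sum function may not be best: it ignores byte order, so anagrams always land in the same bucket. The sketch contrasts it with a polynomial (position-sensitive) hash, a common alternative that is not from the slides; the base 31 and b = 101 are arbitrary illustration choices. `bucket_loads` counts keys per bucket, the quantity a good hash function keeps even in expectation.

```python
from collections import Counter

def sum_hash(key: str, b: int) -> int:
    # Slide's function: order-insensitive sum of bytes, modulo b
    return sum(key.encode("utf-8")) % b

def poly_hash(key: str, b: int, base: int = 31) -> int:
    # Alternative (not from the slides): each byte's weight depends on its position
    value = 0
    for byte in key.encode("utf-8"):
        value = (value * base + byte) % b
    return value

def bucket_loads(keys, b, hash_fn):
    """Number of keys landing in each bucket (empty buckets are omitted)."""
    return Counter(hash_fn(k, b) for k in keys)

anagrams = ["stop", "pots", "tops", "spot", "opts", "post"]
print(bucket_loads(anagrams, 101, sum_hash))   # all six keys in one bucket
print(bucket_loads(anagrams, 101, poly_hash))
```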
Within a bucket, do we keep keys sorted? Yes, if CPU time is critical and inserts/deletes are not too frequent.
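A bucket that keeps its keys sorted can be sketched with Python's bisect module (the class name is hypothetical). Lookups become binary searches, but every insert or delete shifts the keys after the affected position, which is why sorting pays off mainly when updates are infrequent.

```python
import bisect

class SortedBucket:
    """One bucket whose keys are kept in sorted order."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        bisect.insort(self.keys, key)  # binary search, then shift later keys right

    def contains(self, key):
        pos = bisect.bisect_left(self.keys, key)  # O(log n) comparisons
        return pos < len(self.keys) and self.keys[pos] == key

    def delete(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            del self.keys[pos]  # shifts later keys left

bucket = SortedBucket()
for k in ["c", "a", "b"]:
    bucket.insert(k)
print(bucket.keys)  # ['a', 'b', 'c']
```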
Next: an example to illustrate inserts, overflows, and deletes with h(K).
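Example: 2 records/bucket. Insert: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0. Bucket 0 holds d, bucket 1 holds a and c, bucket 2 holds b. Then insert e with h(e) = 1: bucket 1 is full, so e goes to an overflow block.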
Example: deletion. Delete e, then f. After a deletion frees a slot, maybe move a record from an overflow block (here “g”) up into it.
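The insert and delete examples can be sketched as a static hash file with 2 records per block and a chain of overflow blocks per bucket. The hash values for a to e are the ones from the insert example; using 4 buckets is an assumption. The deletion shown (deleting a) is illustrative rather than the slide's, and repacking the whole chain after a delete is just one simple way to "move records up", chosen here for brevity.

```python
CAPACITY = 2  # records per block, as in the slide example

class StaticHashFile:
    def __init__(self, num_buckets, hash_fn):
        self.hash_fn = hash_fn
        # Each bucket is a chain of blocks: the primary block, then overflow blocks
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.hash_fn(key)]
        for block in chain:
            if len(block) < CAPACITY:
                block.append(key)
                return
        chain.append([key])  # every block is full: link a new overflow block

    def delete(self, key):
        chain = self.buckets[self.hash_fn(key)]
        for block in chain:
            if key in block:
                block.remove(key)
                break
        else:
            return False  # key not present
        # Repack the chain: later records move up, empty overflow blocks disappear
        keys = [k for block in chain for k in block]
        chain[:] = [keys[i:i + CAPACITY] for i in range(0, len(keys), CAPACITY)] or [[]]
        return True

SLIDE_H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}  # hash values from the insert example
f = StaticHashFile(4, SLIDE_H.__getitem__)
for key in "abcde":
    f.insert(key)
print(f.buckets)     # [[['d']], [['a', 'c'], ['e']], [['b']], [[]]]
f.delete("a")
print(f.buckets[1])  # [['c', 'e']]: e moved up, overflow block freed
```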
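Rule of thumb: try to keep space utilization between 50% and 80%, where $\text{utilization} = \frac{\#\,\text{keys used}}{\text{total}\ \#\,\text{keys that fit}}$. If it is below 50%, we are wasting space; if it is above 80%, overflows become significant. How significant depends on how good the hash function is and on the number of keys per bucket.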
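The utilization formula and the rule-of-thumb bands, as a small sketch. Counting only the primary buckets' capacity in the denominator is an assumption, and the 5 keys / 4 buckets / 2 keys per bucket figures are hypothetical. Note that a file in the "ok" band can still overflow if the hash values are not spread evenly.

```python
def utilization(keys_used: int, num_buckets: int, keys_per_bucket: int) -> float:
    """# keys used / total # keys that fit (primary buckets only)."""
    return keys_used / (num_buckets * keys_per_bucket)

def assess(u: float) -> str:
    # Rule-of-thumb bands from the slide
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "ok"

u = utilization(5, 4, 2)  # 5 keys in 4 buckets of 2 keys each
print(u, assess(u))       # 0.625 ok
```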
How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing: extensible hashing, others …
Extensible hashing, idea 1: (a) Use i of the b bits output by the hash function h(K); i grows over time. Note: this enables future doubling of space!
Extensible hashing, idea 2: (b) Hash to a directory of pointers to buckets (instead of to the buckets directly): h(K)[i], the first i bits of h(K), selects a directory entry, which points to a bucket. Note: double the space by doubling the directory!
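Ideas 1 and 2 together, as a sketch: the directory is indexed by the high-order i bits of a 4-bit hash value, and doubling the directory copies each pointer into two adjacent slots without touching any bucket. The bucket names and the i = 2 starting directory are hypothetical.

```python
B = 4  # bits produced by the hash function (the slide example uses 4-bit hashes)

def prefix(h: int, i: int) -> int:
    """h(K)[i]: the first (high-order) i bits of a B-bit hash value."""
    return h >> (B - i)

# Directory with i = 2: 4 pointers; two of them share one bucket
directory = ["bucket X", "bucket X", "bucket Y", "bucket Z"]

def double(directory):
    # Each entry is copied into two adjacent slots; no bucket is read or split
    return [ptr for ptr in directory for _ in range(2)]

bigger = double(directory)  # i = 3, 8 pointers
print(prefix(0b1010, 2), directory[prefix(0b1010, 2)])  # 2 bucket Y
print(prefix(0b1010, 3), bigger[prefix(0b1010, 3)])     # 5 bucket Y
```

Every hash value still reaches the same bucket after doubling, which is why the directory can grow without reorganizing the file.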
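Example: h(k) is 4 bits; 2 keys/bucket. An insert into a full bucket splits it; when needed, a new directory with a larger i is created.
Example continued: more inserts; after inserting 1001, i = 3 (the two buckets produced by the split are each labeled 3).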
Extensible hashing: deletion. Merge blocks and cut the directory in half if possible (the reverse of the insert procedure).
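The insert and delete procedures can be sketched as follows, under these assumptions: keys are given directly as their 4-bit hash values, the directory uses the high-order bits, each bucket records how many bits it uses (its depth), and the structure starts with i = 1. When a bucket of identical hash values overflows with no bits left, this sketch raises an error instead of chaining overflow blocks. The key sequence is hypothetical.

```python
B = 4         # hash values have 4 bits, as in the slide example
CAPACITY = 2  # keys per bucket, as in the slide example

class Bucket:
    def __init__(self, depth):
        self.depth = depth  # how many leading hash bits this bucket's keys share
        self.keys = []

class ExtensibleHash:
    def __init__(self):
        self.i = 1                         # directory uses the first i bits
        self.dir = [Bucket(1), Bucket(1)]

    def _slot(self, h):
        return h >> (B - self.i)           # h(K)[i]

    def lookup(self, h):
        return h in self.dir[self._slot(h)].keys

    def insert(self, h):
        bucket = self.dir[self._slot(h)]
        if len(bucket.keys) < CAPACITY:
            bucket.keys.append(h)
            return
        if bucket.depth == self.i:         # only one pointer to it: double directory
            if self.i == B:
                raise OverflowError("out of hash bits; would need overflow blocks")
            self.dir = [b for b in self.dir for _ in range(2)]
            self.i += 1
        # Split: the bucket uses one more bit; pointers whose next bit is 1 move
        bucket.depth += 1
        new = Bucket(bucket.depth)
        shift = self.i - bucket.depth
        for idx, b in enumerate(self.dir):
            if b is bucket and (idx >> shift) & 1:
                self.dir[idx] = new
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:                 # redistribute between the two halves
            self.dir[self._slot(k)].keys.append(k)
        self.insert(h)                     # retry; may split again

    def delete(self, h):
        bucket = self.dir[self._slot(h)]
        bucket.keys.remove(h)              # raises ValueError if h is absent
        # Merge with the buddy bucket while both use the same bits and fit in one block
        while bucket.depth > 1:
            buddy_slot = self.dir.index(bucket) ^ (1 << (self.i - bucket.depth))
            buddy = self.dir[buddy_slot]
            if buddy.depth != bucket.depth or len(bucket.keys) + len(buddy.keys) > CAPACITY:
                break
            bucket.keys += buddy.keys
            bucket.depth -= 1
            self.dir = [bucket if b is buddy else b for b in self.dir]
        # Cut the directory in half while no bucket needs all i bits
        while self.i > 1 and all(b.depth < self.i for b in self.dir):
            self.dir = self.dir[::2]
            self.i -= 1

e = ExtensibleHash()
for h in (0b0001, 0b1001, 0b1100, 0b1010, 0b1011):
    e.insert(h)
print(e.i)  # 3
e.delete(0b1011)
print(e.i)  # 2: two buckets merge, then the directory is cut in half
```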
Extensible hashing: if the directory fits into main memory, the access cost is 1 IO; otherwise it is 2 IOs.
Summary:
+ Can handle growing files, with less wasted space and with no full reorganizations
- Indirection (not bad if the directory is in memory)
- Directory doubles in size (now it fits in memory, now it does not)
Use what when? Indexing (tree structures) vs. hashing
Indexing vs. hashing: hashing is good for probes given a key, e.g., SELECT … FROM R WHERE R.A = 5
Indexing vs. hashing: indexing (including B-trees) is good for range searches, e.g., SELECT … FROM R WHERE R.A > 5
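The contrast can be sketched with Python stand-ins: a dict plays the hash index on R.A, and a sorted list searched with bisect plays the tree index. The rows are hypothetical. The dict answers R.A = 5 with one probe but keeps no order, so R.A > 5 would mean scanning every bucket; the sorted list finds the start of the range and reads forward.

```python
import bisect

rows = [(7, "r1"), (3, "r2"), (5, "r3"), (9, "r4"), (5, "r5")]  # hypothetical (A value, row)

hash_index = {}  # A value -> rows with that value
for a, row in rows:
    hash_index.setdefault(a, []).append(row)

tree_like = sorted(rows)  # ordered by A, like a B-tree's leaf level
tree_keys = [a for a, _ in tree_like]

def where_a_equals(v):
    return hash_index.get(v, [])  # one probe, no ordering needed

def where_a_greater(v):
    pos = bisect.bisect_right(tree_keys, v)  # first entry with A > v
    return [row for _, row in tree_like[pos:]]

print(where_a_equals(5))   # ['r3', 'r5']
print(where_a_greater(5))  # ['r1', 'r4']
```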
Reading: Chapter 14. Read – and
The BIG picture…
Chapters 11 & 12: storage, records, blocks, ...
Chapters 13 & 14: access mechanisms (indexes, B-trees, hashing, multi-key)
Chapters 15 & 16: query processing (NEXT)