Chapter 11: Indexing and Hashing

Slides:



Advertisements
Similar presentations
1 Yet More on Indexes Hash Tables Source: our textbook, slides by Hector Garcia-Molina.
Advertisements

External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.
Hashing and Indexing John Ortiz.
Chapter 11 Indexing and Hashing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 11: Indexing.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
CST203-2 Database Management Systems Lecture 7. Disadvantages on index structure: We must access an index structure to locate data, or must use binary.
Index tuning Hash Index. overview Introduction Hash-based indexes are best for equality selections. –Can efficiently support index nested joins –Cannot.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
1 Indexing and Hashing Indexing and Hashing Basic Concepts Dense and Sparse Indices B+Trees, B-trees Dynamic Hashing Comparison of Ordered Indexing and.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
Quick Review of Apr 15 material Overflow –definition, why it happens –solutions: chaining, double hashing Hash file performance –loading factor –search.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #11.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #8.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
CS 277 – Spring 2002Notes 51 CS 277: Database System Implementation Arthur Keller Notes 5: Hashing and More.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #12.
CS CS4432: Database Systems II. CS Index definition in SQL Create index name on rel (attr) (Check online for index definitions in SQL) Drop.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
Computing & Information Sciences Kansas State University Friday, 24 Oct 2008CIS 560: Database System Concepts Lecture 23 of 42 Friday, 24 October 2008.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11 Modified by Donghui Zhang Jan 30, 2006.
1 Lecture 21: Hash Tables Wednesday, November 17, 2004.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module D: Hashing.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 111 Database Systems II Index Structures.
1 Ullman et al. : Database System Principles Notes 5: Hashing and More.
CPSC 8620Notes 61 CPSC 8620: Database Management System Design Notes 6: Hashing and More.
Access Structures COMP3211 Advanced Databases Dr Nicholas Gibbins
COMP3017 Advanced Databases
CS 245: Database System Principles
Relational Database Systems 2
Dynamic Hashing (Chapter 12)
CS232A: Database System Principles INDEXING
Lecture 21: Hash Tables Monday, February 28, 2005.
COMP 430 Intro. to Database Systems
Are they better or worse than a B+Tree?
Chapter 11: Indexing and Hashing
CPSC-608 Database Systems
Database Management Systems (CS 564)
CS 245: Database System Principles
External Memory Hashing
Hashing Chapter 11.
Chapter 11: Indexing and Hashing
Introduction to Database Systems
Yan Huang - CSCI5330 Database Implementation – Access Methods
External Memory Hashing
Hash-Based Indexes Chapter 10
External Memory Hashing
CS 245: Database System Principles
Index tuning Hash Index.
Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.
Database Design and Programming
2018, Spring Pusan National University Ki-Joune Li
CPSC-608 Database Systems
CPSC-608 Database Systems
Chapter 11 Instructor: Xin Zhang
Chapter 11: Indexing and Hashing
CS4432: Database Systems II
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files Hashing Static Dynamic Hashing More: bitmap indexing

Hashing Static hashing Dynamic hashing

Hashing key  h(key) <key> Buckets (typically 1 disk block) .

Example hash function Key = ‘x1 x2 … xn’ n byte character string Have b buckets h: add x1 + x2 + ….. xn compute sum modulo b

 This may not be best function … Good hash  Expected number of function: keys/bucket is the same for all buckets

Within a bucket: Do we keep keys sorted? Yes, if CPU time critical & Inserts/Deletes not too frequent

Next: example to illustrate inserts, overflows, deletes h(K)

EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 1 2 3 d a c b e h(e) = 1

Delete: e f c EXAMPLE: deletion a b d c d e f g 1 2 3 maybe move 1 2 3 a d b d c c e maybe move “g” up f g

Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket

How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible Linear

Extensible hashing: two ideas (a) Use i of b bits output by hash function b h(K)  use i  grows over time…. 00110101

(b) Use directory h(K)[i ] to bucket . .

Example: h(k) is 4 bits; 2 keys/bucket New directory 2 00 01 10 11 i = 1 i = 0001 1 1 1001 1 1100 1010 1100 Insert 1010

Example continued 2 0000 0111 0001 i = 2 00 01 10 11 1 0001 0111 2 1001 1010 Insert: 0111 0000 2 1100

Example continued i = 3 2 0000 0001 i = 2 2 0111 1001 1010 2 1001 1010 110 111 3 i = 0000 2 i = 0001 2 00 01 10 11 0111 2 1001 1010 2 1001 1010 Insert: 1001 2 1100

Extensible hashing: deletion No merging of blocks Merge blocks and cut directory if possible (Reverse insert procedure)

Deletion example: Run thru insert example in reverse!

Summary Extensible hashing Can handle growing files - with less wasted space - with no full reorganizations + Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) -

Advanced indexing Multiple attributes Bitmap indexing

Multiple-Key Access Use multiple indices for certain types of queries. Example: select account-number from account where branch-name = “Perryridge” and balance = 1000 Possible strategies?

Indices on Multiple Attributes where branch-name = “PP” and balance = 1000 Suppose we have an index on combined search-key (branch-name, balance). BB,1000 CC,200 PP,800 PP,1500 AB,200 AA,2000 AA,2300 AA,2500 CC,200 DD,200 DD,300 PP,1000 PP,800 PP,1300 PP,1500 PP,1560 AB,200 AC,200 CC,200 PP,300

where branch-name = “PP” and balance < 1000 Suppose we have an index on combined search-key (branch-name, balance). where branch-name = “PP” and balance < 1000 search pp,0 BB,1000 CC,200 PP,800 PP,1500 AB,200 AA,2000 AA,2300 AA,2500 CC,200 DD,200 DD,300 PP,1000 PP,800 PP,1300 PP,1500 PP,1560 AB,200 AC,200 CC,200 PP,300 search pp,1000

where branch-name < “PP” and balance = 1000? Suppose we have an index on combined search-key (branch-name, balance). NO! where branch-name < “PP” and balance = 1000? BB,1000 CC,200 PP,800 PP,1500 AB,200 AA,2000 AA,2300 AA,2500 CC,200 DD,200 DD,300 PP,1000 PP,800 PP,1300 PP,1500 PP,1560 AB,200 AC,200 CC,200 PP,300

Bitmap Indices An index designed for multiple valued search keys

Bitmap Indices (Cont.) The income-level value of record 3 is L1 Bitmap(size = table size) Unique values of gender Unique values of income-level

Bitmap Indices (Cont.) Some properties of bitmap indices Number of bitmaps for each attribute? Size of each bitmap? When is the bitmap matrix sparse and what attributes are good for bitmap indices?

Bitmap Indices (Cont.) Bitmap indices generally very small compared with relation size E.g. if record is 100 bytes, space for a single bitmap is 1/800 of space used by relation. If number of distinct attribute values is 8, bitmap is only 1% of relation size What about insertion? Deletion?

Bitmap Indices Queries Sample query: Males with income level L1 10010 AND 10100 = 10000 even faster! What about the number of males with income level L1?

Bitmap Indices Queries Queries are answered using bitmap operations Intersection (and) Union (or) Complementation (not)