CS4432: Database Systems II

Slides:



Advertisements
Similar presentations
1 Yet More on Indexes Hash Tables Source: our textbook, slides by Hector Garcia-Molina.
Advertisements

External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key K exists, then it must be.
DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CST203-2 Database Management Systems Lecture 7. Disadvantages on index structure: We must access an index structure to locate data, or must use binary.
Index tuning Hash Index. overview Introduction Hash-based indexes are best for equality selections. –Can efficiently support index nested joins –Cannot.
1 Advanced Database Technology Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Spring 2004 March 4, 2004 INDEXING II Lecture based on [GUW,
1 Hash-Based Indexes Yanlei Diao UMass Amherst Feb 22, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1 Indexing and Hashing Indexing and Hashing Basic Concepts Dense and Sparse Indices B+Trees, B-trees Dynamic Hashing Comparison of Ordered Indexing and.
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #11.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #8.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
CS 277 – Spring 2002Notes 51 CS 277: Database System Implementation Arthur Keller Notes 5: Hashing and More.
CS 4432lecture #71 CS4432: Database Systems II Lecture #7 Professor Elke A. Rundensteiner.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #12.
CS CS4432: Database Systems II. CS Index definition in SQL Create index name on rel (attr) (Check online for index definitions in SQL) Drop.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11 Modified by Donghui Zhang Jan 30, 2006.
1.1 CS220 Database Systems Indexing: Hashing Slides courtesy G. Kollios Boston University via UC Berkeley.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 10.
1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 111 Database Systems II Index Structures.
1 Ullman et al. : Database System Principles Notes 5: Hashing and More.
CPSC 8620Notes 61 CPSC 8620: Database Management System Design Notes 6: Hashing and More.
Access Structures COMP3211 Advanced Databases Dr Nicholas Gibbins
COMP3017 Advanced Databases
CS 245: Database System Principles
Relational Database Systems 2
CS232A: Database System Principles INDEXING
Lecture 21: Hash Tables Monday, February 28, 2005.
Hash-Based Indexes Chapter 11
CPSC-608 Database Systems
CS 245: Database System Principles
External Memory Hashing
Introduction to Database Systems
Yan Huang - CSCI5330 Database Implementation – Access Methods
External Memory Hashing
CS222: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
Hash-Based Indexes Chapter 10
External Memory Hashing
CS222P: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
CS 245: Database System Principles
Hash-Based Indexes Chapter 11
Index tuning Hash Index.
Database Systems (資料庫系統)
Database Design and Programming
CPS216: Advanced Database Systems
Module 12a: Dynamic Hashing
Chapter 11: Indexing and Hashing
CPSC-608 Database Systems
CPSC-608 Database Systems
Hash-Based Indexes Chapter 11
Chapter 11 Instructor: Xin Zhang
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #07 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

CS4432: Database Systems II Lecture #12 Professor Elke A. Rundensteiner CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing key  h(key) <key> Buckets (typically 1 disk block) . CS 4432 lecture #11 - indexing & hashing

One example hash function Key = ‘x1 x2 … xn’ n-byte character string Have b buckets Example hash function : h: add (x1 + x2 + ….. Xn) modulo b CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing  This may not be best function …  Read Knuth Vol. 3 if you really need to select a good function. Good hash  Expected number of function: keys/bucket is the same for all buckets CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing Within a bucket: Do we keep keys sorted? Yes, but only if CPU time critical & Inserts/Deletes not too frequent CS 4432 lecture #11 - indexing & hashing

Next: example to illustrate inserts, overflows, deletes h(K) CS 4432 lecture #11 - indexing & hashing

EXAMPLE 2 records/bucket 1 2 3 INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 d a c b e h(e) = 1 CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing EXAMPLE: deletion Delete: e f 1 2 3 a d b d c c e maybe move “g” up f g CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket CS 4432 lecture #11 - indexing & hashing

How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible hashing Others … CS 4432 lecture #11 - indexing & hashing

Extensible hashing : idea 1 (a) Use i of b bits output by hash function b h(K)  use i  grows over time…. Note: enables future doubling of space ! 00110101 CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing Extensible hashing : idea 2 (b) Hash to directory of pointers to buckets (instead of buckets directly) h(K)[i ] to bucket (c) Assume directory of pointers is contiguous Note : Double space by doubling the directory ! . . CS 4432 lecture #11 - indexing & hashing

Example: h(k) is 4 bits; 2 keys/bucket New directory 2 00 01 10 11 i = 1 i = 0001 1 1 1 1001 1 1100 1010 1100 Insert 1010 CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing Example continued 2 0000 0111 0001 i = 2 00 01 10 11 1 0001 0111 2 1001 1010 Insert: 0111 0000 2 1100 CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing Example continued 000 001 010 011 100 101 110 111 3 i = 0000 2 i = 0001 2 00 01 10 11 0111 2 1001 1010 2 1001 1010 Insert: 1001 2 1100 CS 4432 lecture #11 - indexing & hashing

Extensible hashing: deletion Merge blocks and cut directory if possible (Reverse insert procedure) CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing Extensible hashing Summary If directory fits into main memory, then access cost is 1 IO, otherwise 2 IOs Can handle growing files - with less wasted space - with no full reorganizations + + Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing Use what when : Indexing : Tree-Structures vs Hashing CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing Indexing vs Hashing Hashing good for equality probes given key: e.g., SELECT … FROM R WHERE R.A = 5 CS 4432 lecture #11 - indexing & hashing

lecture #11 - indexing & hashing Indexing vs Hashing Tree indexing (Including B Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5 CS 4432 lecture #11 - indexing & hashing