Chapter 13.4 Hash Tables Steve Ikeoka ID: 113 CS 257 – Spring 2008.



Topics Covered Previously
- Secondary-Storage Hash Tables
- Insertion into a Hash Table
- Hash-Table Deletion
- Efficiency of Hash Table Indexes
- Extensible Hash Tables

Insertion into Extensible Hash Tables
- To insert a record with search key K:
  - Compute h(K)
  - Take the first i bits of this bit sequence
  - Go to the bucket-array entry indexed by those i bits and follow its pointer to a block B
- If block B has room, put the record in block B
- Otherwise there are two possibilities, depending on j, the number of bits of h(K) used to determine membership in block B

Insertion into Extensible Hash Tables (cont’d)
- If j < i:
  - Split block B into two blocks
  - Distribute the records based on the (j+1)st bit of their hash values
  - Put j+1 in each block’s “nub”
  - Adjust the pointers in the bucket array to point to B or to the new block, depending on the (j+1)st bit
  - If all records of B land in one of the two blocks, repeat the process with the next higher j on the overfull block

Insertion into Extensible Hash Tables (cont’d)
- If j = i:
  - Increment i by 1
  - Double the bucket array, so that it has 2^(i+1) entries, where i is the value before the increment
  - If w is a sequence of i bits that indexed an entry of the old bucket array, the entries indexed by w0 and w1 in the new array each point to the block that the w entry pointed to
  - Membership in each block is still determined by the same number of bits as before
  - Now j < i, so split block B as in the previous case

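Example
- Start with the table on the right
- Insert a record with h(K) = 1010
- The first bit is 1, so the record belongs in the 2nd block
- That block is full, so split it
- Since j = i for that block, double the bucket array and set i=2
- Set j=2 in the two blocks produced by the split
- 1001 and 1010 go in block 10; the other record, whose hash value begins with 11, goes in block 11
[Figure: the hash table before and after the insertion]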

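Example (cont’d)
- Insert records with hash values 0000 and 0111
- Both belong in the first block, which overflows
- Since j=1 < i=2, just split the block; the bucket array does not need to grow
- Both blocks now have j=2
- 0000 and 0001 stay in the first block
- 0111 goes to the new block
- The entry for 01 in the bucket array points to the new block
[Figure: the hash table after the insertions]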

Example (cont’d)
- Insert a record with h(K) = 1000
- The block for 10 overflows
- That block has j = i = 2, so double the bucket array and set i=3
- Block 10 splits into blocks for 100 and 101, each with j=3
- The other blocks keep j=2
[Figure: the hash table after the insertion]
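The insertion steps and the example above can be sketched in Python. This is a minimal illustration under stated assumptions, not the textbook's implementation: hash values are passed in directly as 4-bit strings, each block holds 2 records, and the class and method names are hypothetical. The three starting records are assumed values chosen to be consistent with the example.

```python
class Block:
    def __init__(self, j):
        self.j = j          # the "nub": number of hash bits deciding membership
        self.records = []   # hash values kept as bit strings, e.g. "1010"


class ExtensibleHashTable:
    def __init__(self, capacity=2):
        self.capacity = capacity              # assumed records per block
        self.i = 1                            # bits used to index the bucket array
        self.buckets = [Block(1), Block(1)]   # 2**i pointers to blocks

    def insert(self, h):
        # Assumes len(h) is large enough that splitting eventually separates
        # the records; identical hash values beyond capacity are not handled.
        while True:
            block = self.buckets[int(h[:self.i], 2)]
            if len(block.records) < self.capacity:
                block.records.append(h)
                return
            if block.j == self.i:
                # Case j = i: double the array; entries w0 and w1 both
                # point to the block that entry w pointed to.
                self.buckets = [b for b in self.buckets for _ in (0, 1)]
                self.i += 1
            self._split(block)                # now j < i for this block

    def _split(self, block):
        block.j += 1
        new = Block(block.j)
        old_records, block.records = block.records, []
        for r in old_records:
            # Distribute on the newly significant bit (0-based index j-1).
            (new if r[block.j - 1] == "1" else block).records.append(r)
        for idx, b in enumerate(self.buckets):
            # Redirect entries whose newly significant bit is 1.
            if b is block and (idx >> (self.i - block.j)) & 1:
                self.buckets[idx] = new


table = ExtensibleHashTable()
for h in ["0001", "1001", "1100"]:            # assumed starting contents
    table.insert(h)
for h in ["1010", "0000", "0111", "1000"]:    # insertions from the example
    table.insert(h)
print(table.i, [b.records for b in table.buckets])
```

After the last insertion the sketch has i=3, with separate blocks for 100 and 101 and shared blocks (j=2) for the remaining prefixes, which matches the slides.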

Problems with Extensible Hash Tables
- Doubling the bucket array is a substantial amount of work
- Once doubled, the bucket array may no longer fit in main memory
- If the number of records per block is small, a block is likely to split well before the table as a whole needs it
  - Example: with 2 records per block, three records whose hash values agree in their first 20 bits would force i=20 and a bucket array of about a million entries
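The "million-bucket array" figure is simple arithmetic, shown here as a short check. The 8-byte pointer size is an assumption made only to give a sense of scale.

```python
i = 20                        # bits needed to separate the three records
entries = 2 ** i              # bucket-array entries
pointer_bytes = 8             # assumed size of one bucket-array pointer
array_mib = entries * pointer_bytes / 2 ** 20

print(entries)                # 1048576
print(array_mib, "MiB")       # 8.0 MiB
```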

Linear Hash Tables
- The number of buckets n is chosen so that the average number of records per bucket is a fixed fraction, say 80%, of the number of records that fill one block
- Overflow blocks are permitted
- The average number of overflow blocks per bucket will be less than 1
- With n buckets, use the ⌈log₂ n⌉ rightmost bits of the bit sequence produced by the hash function

Linear Hash Tables (cont’d)
- Suppose i bits of the hash value are being used
- Let m be the i-bit binary integer a1 a2 … ai formed by the last i bits of h(K)
- If m < n, bucket m exists, so place the record in that bucket
- If n ≤ m < 2^i, bucket m does not exist yet, so place the record in bucket m − 2^(i−1) (that is, change a1, which must be 1, to 0)
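The bucket-selection rule above is compact enough to write out directly. The following is a small sketch, assuming hash values are given as non-negative integers and n ≥ 2; the function names are hypothetical.

```python
def bits_needed(n):
    """Return ceil(log2(n)) for n >= 2, the number of hash bits in use."""
    return (n - 1).bit_length()


def bucket_for(h, i, n):
    """Return the bucket number for hash value h with i bits and n buckets."""
    m = h & ((1 << i) - 1)        # the i rightmost bits of h
    if m < n:
        return m                  # bucket m exists
    return m - (1 << (i - 1))     # bucket m does not exist: clear the leading bit


print(bits_needed(3))             # 2
print(bucket_for(0b1010, 2, 3))   # 2, since bucket 10 exists
print(bucket_for(0b1011, 2, 3))   # 1, since 11 is redirected to 01
```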

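Example
- Number of buckets n = 2
- The hash function h produces 4 bits
- Only 1 bit is in use (i=1)
- Records go in the 1st bucket if their hash value ends in 0
- Records go in the 2nd bucket if their hash value ends in 1
- Choose n so that r ≤ 1.7n, where r is the number of records
- With 2 records per block, average occupancy then does not exceed 85% of capacity
[Figure: linear hash table with i=1, n=2]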

Insertion into Linear Hash Tables
- Compute h(K), where K is the key of the record
- Use the i bits at the end of h(K) as the bucket number m
- If m < n, put the record in bucket m
- If m ≥ n, put the record in bucket m − 2^(i−1)
- If there is no room in the bucket, create an overflow block
- After each insert, compare the ratio r/n, where r is the number of records, with the threshold
- Add the next bucket to the table if the ratio is too high
- If n exceeds 2^i, increment i by 1

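Example
- Insert a record with h(K) = 0101
- Place the record in the 2nd bucket
- The ratio r/n now exceeds the threshold
- Raise n to 3, so i = ⌈log₂ 3⌉ = 2
- Split bucket 00 (formerly bucket 0) between itself and the new bucket 10
- Records whose hash values end in 00 stay in the 1st bucket
- Records whose hash values end in 10 go to the new bucket
[Figure: linear hash table with i=2, n=3]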

Example (cont’d)
- Insert a record with h(K) = 0001
- Place it in bucket 01, which exists
- The block is full, so add an overflow block
- 5/3 < 1.7, so no new bucket is needed
[Figure: linear hash table with i=2, n=3]

Example (cont’d)
- Insert a record with h(K) = 0111
- Bucket 11 does not exist yet
- Redirect to bucket 01 (change the leading bit to 0)
- The record fits in the bucket’s overflow block
- The ratio 6/3 > 1.7, so create a new bucket, 11
- Split the records in bucket 01:
  - Records ending in 01 stay in bucket 01
  - Records ending in 11 go to bucket 11
- The overflow block can now be deleted

Example (cont’d)
- The next insert would exceed the 1.7 ratio
- It would raise n to 5
- Since 5 > 2^2, i would become 3
[Figure: linear hash table with i=2, n=4]
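The linear-hashing insertion procedure and the example above can be replayed with a short Python sketch. It makes several simplifying assumptions: hash values are given directly as integers, each bucket is a single list (any records beyond 2 per bucket stand for overflow blocks), and the class name and the three starting records are hypothetical values chosen to be consistent with the example.

```python
class LinearHashTable:
    def __init__(self, max_ratio=1.7):
        self.i = 1                 # hash bits in use
        self.n = 2                 # number of buckets
        self.r = 0                 # number of records
        self.max_ratio = max_ratio
        self.buckets = [[], []]    # records past the block size model overflow

    def _bucket_number(self, h):
        m = h & ((1 << self.i) - 1)            # the i rightmost bits
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, h):
        self.buckets[self._bucket_number(h)].append(h)
        self.r += 1
        if self.r / self.n > self.max_ratio:   # ratio too high: add a bucket
            self._add_bucket()

    def _add_bucket(self):
        new = self.n                           # new bucket, binary 1 a2 ... ai
        self.n += 1
        if self.n > (1 << self.i):             # n exceeds 2^i
            self.i += 1
        old = new - (1 << (self.i - 1))        # bucket to split, 0 a2 ... ai
        records, self.buckets[old] = self.buckets[old], []
        self.buckets.append([])
        for h in records:                      # redistribute on the leading bit
            self.buckets[self._bucket_number(h)].append(h)


table = LinearHashTable()
for h in [0b0000, 0b1010, 0b1111]:             # assumed starting contents
    table.insert(h)
for h in [0b0101, 0b0001, 0b0111]:             # insertions from the example
    table.insert(h)
print(table.i, table.n, table.r)
print([[format(h, "04b") for h in b] for b in table.buckets])
```

The sketch ends with i=2, n=4, and r=6, the state shown on the last example slide. Splitting always targets the bucket whose number differs from the new bucket only in the leading bit, which is why no other bucket has to be touched.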

Lookup in a Linear Hash Table
- Lookup follows the same procedure as selecting the bucket for insertion
- Look up a record with h(K) = 1010:
  - The last two bits are 10, so m = 2
  - m < n, so bucket 10 exists
  - The bucket holds a record with h(K) = 1010
  - We still need to examine the complete key of that record to be sure it is the one we want
[Figure: linear hash table with i=2, n=3]

Lookup in a Linear Hash Table (cont’d)
- Look up a record with h(K) = 1011:
  - The last two bits are 11, so m = 3
  - m ≥ n, so bucket 11 does not exist
  - Redirect to bucket 01 by changing the leading 1 to 0
  - Bucket 01 has no record with h(K) = 1011
  - So the desired record is not in the hash table
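The two lookups above can be sketched as follows. This is an illustration only: records are modeled as (hash value, key) pairs, the keys and bucket contents are made-up placeholders for a table with i=2 and n=3, and the function name is hypothetical. The comparison on the full key reflects the point that a matching hash value alone does not identify the record.

```python
def lookup(buckets, i, hash_value, key):
    """Return (bucket number searched, matching record or None)."""
    n = len(buckets)
    m = hash_value & ((1 << i) - 1)   # the i rightmost bits
    if m >= n:                        # bucket m does not exist yet
        m -= 1 << (i - 1)             # change the leading 1 to 0
    for h, k in buckets[m]:
        if k == key:                  # compare the complete key, not just the hash
            return m, (h, k)
    return m, None


# Assumed contents for a table with i=2, n=3; the keys are placeholders.
buckets = [
    [(0b0000, "a")],                  # bucket 00
    [(0b1111, "b"), (0b0101, "c")],   # bucket 01
    [(0b1010, "d")],                  # bucket 10
]

print(lookup(buckets, 2, 0b1010, "d"))   # found in bucket 2
print(lookup(buckets, 2, 0b1011, "e"))   # redirected to bucket 1, not found
```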