Lecture 21: Hash Tables Monday, February 28, 2005.

Slides:



Advertisements
Similar presentations
External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
Advertisements

CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
CPSC 335 Dr. Marina Gavrilova Computer Science University of Calgary Canada.
Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key K exists, then it must be.
DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
Additional notes on Hashing And notes on HW4. Selected Answers to the Last Assignment The records will hash to the following buckets: K h(K) (bucket number)
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Index tuning Hash Index. overview Introduction Hash-based indexes are best for equality selections. –Can efficiently support index nested joins –Cannot.
Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key K exists, then it must be.
Hash Table indexing and Secondary Storage Hashing.
External Memory Hashing. Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key.
Chapter 13 Hash Tables Section 13.4 CS 257 Dr. T.Y.Lin Abhishek Pandya ID
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #11.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
HASH TABLES Malathi Mansanpally CS_257 ID-220. Agenda: Extensible Hash Tables Insertion Into Extensible Hash Tables Linear Hash Tables Insertion Into.
1 Lecture 19: B-trees and Hash Tables Wednesday, November 12, 2003.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
FALL 2005 CENG 351 Data Management and File Structures 1 Hashing.
1 Lecture 21: Hash Tables Wednesday, November 17, 2004.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module D: Hashing.
CENG Hashing for files. CENG 3512 Introduction Idea: to reference items in a table directly by doing arithmetic operations to transform keys into.
DS.H.1 Hashing Chapter 5 Overview The General Idea Hash Functions Separate Chaining Open Addressing Rehashing Extendible Hashing Application Example: Geometric.
Indexing Goals: Store large files Support multiple search keys
Dynamic Hashing (Chapter 12)
Hash-Based Indexes Chapter 11
Hashing CENG 351.
CPSC-608 Database Systems
Database Management Systems (CS 564)
Dynamic Hashing.
External Memory Hashing
Extendible Indexing Dina Said
Introduction to Database Systems
External Memory Hashing
CS222: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
Hash-Based Indexes Chapter 10
External Memory Hashing
Extendible Hashing Primarily used for storage of files on disk
CS222P: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
Hashing.
CSE 544: Lectures 13 and 14 Storing Data, Indexes
Hash-Based Indexes Chapter 11
Index tuning Hash Index.
Lecture 6: Data Storage and Indexes
Database Systems (資料庫系統)
Database Design and Programming
DATABASE IMPLEMENTATION ISSUES
2018, Spring Pusan National University Ki-Joune Li
CPS216: Advanced Database Systems
Module 12a: Dynamic Hashing
CSE 326: Data Structures: Sorting
Chapter 11: Indexing and Hashing
CPSC-608 Database Systems
CPSC-608 Database Systems
Monday, 5/13/2002 Hash table indexes, query optimization
Wednesday, 5/8/2002 Hash table indexes, physical operators
Hash-Based Indexes Chapter 11
Database Implementation Issues
CMSC 341 Extensible Hashing.
Chapter 11 Instructor: Xin Zhang
Lecture 20: Indexes Monday, February 27, 2006.
CS4433 Database Systems Indexing.
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #07 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
CS4432: Database Systems II
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

Lecture 21: Hash Tables Monday, February 28, 2005

Outline Hash-tables (13.4) Note: Used as indexes (far less than B+ trees) Used during query processing (a lot !!)

Hash Tables Secondary storage hash tables are much like main memory ones Recall basics: There are n buckets A hash function f(k) maps a key k to {0, 1, …, n-1} Store in bucket f(k) a pointer to record with key k Secondary storage: bucket = block, use overflow blocks when needed

Hash Table Example Assume 1 bucket (block) stores 2 keys + pointers h(e)=0 h(b)=h(f)=1 h(g)=2 h(a)=h(c)=3 e b f g a c 1 2 3 Here: h(x) = x mod 4

Searching in a Hash Table Search for a: Compute h(a)=3 Read bucket 3 1 disk access e b f g a c 1 2 3

Insertion in Hash Table Place in right bucket, if space E.g. h(d)=2 e b f g d a c 1 2 3

Insertion in Hash Table Create overflow block, if no space E.g. h(k)=1 More over- flow blocks may be needed e b f g d a c k 1 2 3

Hash Table Performance Excellent, if no overflow blocks Degrades considerably when number of keys exceeds the number of buckets (I.e. many overflow blocks).

Extensible Hash Table Allows has table to grow, to avoid performance degradation Assume a hash function h that returns numbers in {0, …, 2k – 1} Start with n = 2i << 2k , only look at first i most significant bits

Extensible Hash Table E.g. i=1, n=2i=2, k=4 Note: we only look at the first bit (0 or 1) i=1 0(010) 1 1 1(011) 1

Insertion in Extensible Hash Table 0(010) 1 1 1(011) 1(110) 1

Insertion in Extensible Hash Table Now insert 1010 Need to extend table, split blocks i becomes 2 i=1 0(010) 1 1 1(011) 1(110), 1(010) 1

Insertion in Extensible Hash Table 0(010) 1 00 01 10(11) 10(10) 2 10 11 11(10) 2

Insertion in Extensible Hash Table Now insert 0000, then 0101 Need to split block i=2 0(010) 0(000), 0(101) 1 00 01 10(11) 10(10) 2 10 11 11(10) 2

Insertion in Extensible Hash Table After splitting the block 00(10) 00(00) 2 i=2 01(01) 2 00 01 10(11) 10(10) 2 10 11 11(10) 2

Extensible Hash Table How many buckets (blocks) do we need to touch after an insertion ? How many entries in the hash table do we need to touch after an insertion ?

Performance Extensible Hash Table No overflow blocks: access always one read BUT: Extensions can be costly and disruptive After an extension table may no longer fit in memory

Linear Hash Table Idea: extend only one entry at a time Problem: n= no longer a power of 2 Let i be such that 2i <= n < 2i+1 After computing h(k), use last i bits: If last i bits represent a number > n, change msb from 1 to 0 (get a number <= n)

Linear Hash Table Example (01)00 (11)00 i=2 (01)11 BIT FLIP 00 01 (10)10 10

Linear Hash Table Example Insert 1000: overflow blocks… (01)00 (11)00 (10)00 i=2 (01)11 00 01 (10)10 10

Linear Hash Tables Extension: independent on overflow blocks Extend n:=n+1 when average number of records per block exceeds (say) 80%

Linear Hash Table Extension From n=3 to n=4 Only need to touch one block (which one ?) (01)00 (11)00 (01)00 (11)00 i=2 (01)11 00 (01)11 i=2 01 (10)10 10 (10)10 00 01 (01)11 10 n=11

Linear Hash Table Extension From n=3 to n=4 finished Extension from n=4 to n=5 (new bit) Need to touch every single block (why ?) (01)00 (11)00 i=2 (10)10 00 01 (01)11 10 11