Module 12a: Dynamic Hashing

Slides:



Advertisements
Similar presentations
External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
Advertisements

©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Hashing Dashiell Fryer CS 157B Dr. Lee. Contents Static Hashing Static Hashing File OrganizationFile Organization Properties of the Hash FunctionProperties.
Hash-based Indexes CS 186, Spring 2006 Lecture 7 R &G Chapter 11 HASH, x. There is no definition for this word -- nobody knows what hash is. Ambrose Bierce,
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that grows and shrinks in size Allows the hash function.
Indexing (Cont.) These slides are a modified version of the slides of the book “Database System Concepts” (Chapter 12), 5th Ed., McGraw-Hill,McGraw-Hill.
Chapter 11 Indexing and Hashing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 11: Indexing.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
Arch (A) Tented Arch (T) Whorl (W) Loop (U or R) The four main classes of fingerprints Loop (60%) Arch/Tented Arch (6%) Whorl (34%) Other (Less than 1%)
B+-tree and Hashing.
1 Indexing and Hashing Indexing and Hashing Basic Concepts Dense and Sparse Indices B+Trees, B-trees Dynamic Hashing Comparison of Ordered Indexing and.
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Computing & Information Sciences Kansas State University Friday, 24 Oct 2008CIS 560: Database System Concepts Lecture 23 of 42 Friday, 24 October 2008.
Chapter 12: Indexing and Hashing
Chapter 11 Indexing & Hashing. 2 n Sophisticated database access methods n Basic concerns: access/insertion/deletion time, space overhead n Indexing 
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
1.1 CS220 Database Systems Indexing: Hashing Slides courtesy G. Kollios Boston University via UC Berkeley.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module D: Hashing.
Chapter 5 Record Storage and Primary File Organizations
Em Spatiotemporal Database Laboratory Pusan National University File Processing : Hash 2004, Spring Pusan National University Ki-Joune Li.
Indexing and hashing.
Dynamic Hashing (Chapter 12)
Hashing CENG 351.
Chapter 12: Indexing and Hashing
Chapter 11: Indexing and Hashing
CPSC-608 Database Systems
Database Management Systems (CS 564)
Module 12: Indexing.
Dynamic Hashing.
External Memory Hashing
Chapter 11: Indexing and Hashing
Extendible Indexing Dina Said
Chapter 11: Indexing and Hashing
Chapter 12: Indexing and Hashing
Introduction to Database Systems
External Memory Hashing
CS222: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
External Memory Hashing
Chapter 12: Indexing and Hashing
Indexing and Hashing Basic Concepts Ordered Indices
CS222P: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
Hashing.
Chapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing
Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.
Chapter 24: Advanced Indexing
Database Systems (資料庫系統)
LINEAR HASHING E0 261 Jayant Haritsa Computer Science and Automation
2018, Spring Pusan National University Ki-Joune Li
Chapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing
CPSC-608 Database Systems
Chapter 12: Indexing and Hashing
Hash-Based Indexes Chapter 11
Chapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing
Chapter 11 Instructor: Xin Zhang
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #07 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
Presentation transcript:

Module 12a: Dynamic Hashing

Dynamic Hashing Allows the hash function to be modified dynamically Extendable hashing – one form of dynamic hashing Hash function generates values over a large range — typically b-bit integers, with b = 32. At any time only a prefix of the hash function is uded to index into a table of bucket addresses. Let the length of the prefix be i bits, 0  i  31. Bucket address table size = 2i. Initially i = 0 Value of i grows and shrinks as the size of the database grows and shrinks. Multiple entries in the bucket address table may point to the same bucket. Thus, actual number of buckets is < 2i The number of buckets also changes dynamically due to coalescing and splitting of buckets.

General Extendable Hash Structure In this structure each bucket hold 2 entries. i2 = i3 = i i1 = i – 1 For entry 0 in ther bucket address table, number of pointers is i1 = i

Use of Extendable Hash Structure Each bucket j is associated with a value ij All the entries in the bucket address table that point to the same bucket have the same values on the first ij bits. To locate the bucket containing search-key K: 1. Compute h(K) = X 2. Use the first i high order bits of X as a displacement into bucket address table, and follow the pointer to appropriate bucket To insert a record with search-key value K Follow same procedure as look-up and locate the bucket, say j. If there is room in the bucket j insert record in the bucket. Else, the bucket must be split and insertion re-attempted (see next slide.) Overflow buckets used in some cases (will see shortly)

Use of Extendable Hash Structure Each bucket j is associated with a value ij All the entries in the bucket address table that point to the same bucket have the same values on the first ij bits. To locate the bucket containing search-key Kj: 1. Compute h(Kj) = X 2. Use the first i high order bits of X as a displacement into bucket address table, and follow the pointer to appropriate bucket To insert a record with search-key value Kj Follow same procedure as look-up and locate the bucket, say j. If there is room in the bucket j insert record in the bucket. Else the bucket must be split and insertion re-attempted (see next slide.) Overflow buckets used in some cases (will see shortly)

Insertion in Extendable Hash Structure Splitting a bucket j when inserting record with search-key value K: If i > ij (more than one pointer to bucket j) Allocate a new bucket z, and set ij = iz = (ij + 1) Update the second half of the bucket address table entries originally pointing to j, to point to z Remove each record in bucket j and reinsert (in j or z) Recompute new bucket for K and insert record in the bucket (further splitting is required if the bucket is still full) If i = ij (only one pointer to bucket j) If i reaches some limit b, or too many splits have happened in this insertion, create an overflow bucket Else Increment i and double the size of the bucket address table. Replace each entry in the table by two entries that point to the same bucket. Recompute new bucket address table entry for K Now i > ij so use the first case above.

Deletion in Extendable Hash Structure To delete a key value, Locate it in its bucket and remove it. The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table). Coalescing of buckets can be done (can coalesce only with a “buddy” bucket having same value of ij and same ij –1 prefix, if it is present) Decreasing bucket address table size is also possible Note: decreasing bucket address table size is an expensive operation and should be done only if number of buckets becomes much smaller than the size of the table

Use of Extendable Hash Structure: Example

Example (Cont.) Initial hash structure; bucket size = 2

Example (Cont.) Hash structure after insertion of the records “Mozart”, “Srinivasan”, and “Wu”

Use of Extendable Hash Structure: Example

Example (Cont.) Hash structure after insertion of Einstein record

Example (Cont.) Hash structure after insertion of Gold and El Said records

Example (Cont.) Hash structure after insertion of Katz record

Example (Cont.) And after insertion of eleven records

Use of Extendable Hash Structure: Example

Example (Cont.) And after insertion of Kim record in previous hash structure

Extendable Hashing vs. Other Schemes Benefits of extendable hashing: Hash performance does not degrade with growth of file Minimal space overhead Disadvantages of extendable hashing Extra level of indirection to find desired record Bucket address table may itself become very big (larger than memory) Cannot allocate very large contiguous areas on disk either Solution: B+-tree structure to locate desired record in bucket address table Changing size of bucket address table is an expensive operation Linear hashing is an alternative mechanism Allows incremental growth of its directory (equivalent to bucket address table) At the cost of more bucket overflows

End of Module 12a