Hashing Dashiell Fryer CS 157B Dr. Lee. Contents Static Hashing Static Hashing File OrganizationFile Organization Properties of the Hash FunctionProperties.

Slides:



Advertisements
Similar presentations
CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
Advertisements

©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that grows and shrinks in size Allows the hash function.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
Indexing (Cont.) These slides are a modified version of the slides of the book “Database System Concepts” (Chapter 12), 5th Ed., McGraw-Hill,McGraw-Hill.
Chapter 11 Indexing and Hashing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 11: Indexing.
Chapter 11 (3 rd Edition) Hash-Based Indexes Xuemin COMP9315: Database Systems Implementation.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
CST203-2 Database Management Systems Lecture 7. Disadvantages on index structure: We must access an index structure to locate data, or must use binary.
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
Arch (A) Tented Arch (T) Whorl (W) Loop (U or R) The four main classes of fingerprints Loop (60%) Arch/Tented Arch (6%) Whorl (34%) Other (Less than 1%)
1 Hash-Based Indexes Yanlei Diao UMass Amherst Feb 22, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
B+-tree and Hashing.
B+-tree and Hash Indexes
Data Indexing Herbert A. Evans. Purposes of Data Indexing What is Data Indexing? Why is it important?
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
HASH TABLES Malathi Mansanpally CS_257 ID-220. Agenda: Extensible Hash Tables Insertion Into Extensible Hash Tables Linear Hash Tables Insertion Into.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Indexing and Hashing.
B+ - Tree & B - Tree By Phi Thong Ho.
E.G.M. PetrakisHashing1 Hashing on the Disk  Keys are stored in “disk pages” (“buckets”)  several records fit within one page  Retrieval:  find address.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Indexing and Hashing.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Hashing.
Computing & Information Sciences Kansas State University Friday, 24 Oct 2008CIS 560: Database System Concepts Lecture 23 of 42 Friday, 24 October 2008.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 17 Disk Storage, Basic File Structures, and Hashing.
Chapter 11 Indexing & Hashing. 2 n Sophisticated database access methods n Basic concerns: access/insertion/deletion time, space overhead n Indexing 
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
Basic Concepts Indexing mechanisms used to speed up access to desired data. E.g., author catalog in library Search Key - attribute to set of attributes.
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Computing & Information Sciences Kansas State University Wednesday, 22 Oct 2008CIS 560: Database System Concepts Lecture 22 of 42 Wednesday, 22 October.
Indexing and Hashing By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING COLLEGE TIRUVANNAMALAI.
Database Management 7. course. Reminder Disk and RAM RAID Levels Disk space management Buffering Heap files Page formats Record formats.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11 Modified by Donghui Zhang Jan 30, 2006.
1.1 CS220 Database Systems Indexing: Hashing Slides courtesy G. Kollios Boston University via UC Berkeley.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Indexed Sequential Access Method.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 10.
Hashing by Rafael Jaffarove CS157b. Motivation  Fast data access  Search  Insertion  Deletion  Ideal seek time is O(1)
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module D: Hashing.
Chapter 5 Record Storage and Primary File Organizations
Em Spatiotemporal Database Laboratory Pusan National University File Processing : Hash 2004, Spring Pusan National University Ki-Joune Li.
Data Indexing Herbert A. Evans.
Azita Keshmiri CS 157B Ch 12 indexing and hashing
Dynamic Hashing (Chapter 12)
Hashing CENG 351.
Database Management Systems (CS 564)
Dynamic Hashing.
Introduction to Database Systems
CS222: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
External Memory Hashing
Indexing and Hashing Basic Concepts Ordered Indices
Hash-Based Indexes Chapter 11
Advance Database System
Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.
Database Systems (資料庫系統)
LINEAR HASHING E0 261 Jayant Haritsa Computer Science and Automation
2018, Spring Pusan National University Ki-Joune Li
Module 12a: Dynamic Hashing
Hash-Based Indexes Chapter 11
Presentation transcript:

Hashing Dashiell Fryer CS 157B Dr. Lee

Contents Static Hashing Static Hashing File OrganizationFile Organization Properties of the Hash FunctionProperties of the Hash Function Bucket OverflowBucket Overflow IndicesIndices Dynamic Hashing Dynamic Hashing Underlying Data StructureUnderlying Data Structure Querying and UpdatingQuerying and Updating Comparisons Comparisons Other types of hashingOther types of hashing Ordered Indexing vs. HashingOrdered Indexing vs. Hashing

Static Hashing Hashing provides a means for accessing data without the use of an index structure. Hashing provides a means for accessing data without the use of an index structure. Data is addressed on disk by computing a function on a search key instead. Data is addressed on disk by computing a function on a search key instead.

Organization A bucket in a hash file is unit of storage (typically a disk block) that can hold one or more records. A bucket in a hash file is unit of storage (typically a disk block) that can hold one or more records. The hash function, h, is a function from the set of all search-keys, K, to the set of all bucket addresses, B. The hash function, h, is a function from the set of all search-keys, K, to the set of all bucket addresses, B. Insertion, deletion, and lookup are done in constant time. Insertion, deletion, and lookup are done in constant time.

Querying and Updates To insert a record into the structure compute the hash value h(K i ), and place the record in the bucket address returned. To insert a record into the structure compute the hash value h(K i ), and place the record in the bucket address returned. For lookup operations, compute the hash value as above and search each record in the bucket for the specific record. For lookup operations, compute the hash value as above and search each record in the bucket for the specific record. To delete simply lookup and remove. To delete simply lookup and remove.

Properties of the Hash Function The distribution should be uniform. The distribution should be uniform. An ideal hash function should assign the same number of records in each bucket.An ideal hash function should assign the same number of records in each bucket. The distribution should be random. The distribution should be random. Regardless of the actual search-keys, the each bucket has the same number of records on averageRegardless of the actual search-keys, the each bucket has the same number of records on average Hash values should not depend on any ordering or the search-keysHash values should not depend on any ordering or the search-keys

Bucket Overflow How does bucket overflow occur? How does bucket overflow occur? Not enough buckets to handle dataNot enough buckets to handle data A few buckets have considerably more records then others. This is referred to as skew.A few buckets have considerably more records then others. This is referred to as skew. Multiple records have the same hash value Multiple records have the same hash value Non-uniform hash function distribution. Non-uniform hash function distribution.

Solutions Provide more buckets then are needed. Provide more buckets then are needed. Overflow chaining Overflow chaining If a bucket is full, link another bucket to it. Repeat as necessary.If a bucket is full, link another bucket to it. Repeat as necessary. The system must then check overflow buckets for querying and updates. This is known as closed hashing.The system must then check overflow buckets for querying and updates. This is known as closed hashing.

Alternatives Open hashing Open hashing The number of buckets is fixedThe number of buckets is fixed Overflow is handled by using the next bucket in cyclic order that has space.Overflow is handled by using the next bucket in cyclic order that has space. This is known as linear probing. This is known as linear probing. Compute more hash functions. Compute more hash functions. Note: Closed hashing is preferred in database systems.

Indices A hash index organizes the search keys, with their pointers, into a hash file. A hash index organizes the search keys, with their pointers, into a hash file. Hash indices never primary even though they provide direct access. Hash indices never primary even though they provide direct access.

Example of Hash Index

Dynamic Hashing More effective then static hashing when the database grows or shrinks More effective then static hashing when the database grows or shrinks Extendable hashing splits and coalesces buckets appropriately with the database size. Extendable hashing splits and coalesces buckets appropriately with the database size. i.e. buckets are added and deleted on demand.i.e. buckets are added and deleted on demand.

The Hash Function Typically produces a large number of values, uniformly and randomly. Typically produces a large number of values, uniformly and randomly. Only part of the value is used depending on the size of the database. Only part of the value is used depending on the size of the database.

Data Structure Hash indices are typically a prefix of the entire hash value. Hash indices are typically a prefix of the entire hash value. More then one consecutive index can point to the same bucket. More then one consecutive index can point to the same bucket. The indices have the same hash prefix which can be shorter then the length of the index.The indices have the same hash prefix which can be shorter then the length of the index.

General Extendable Hash Structure In this structure, i 2 = i 3 = i, whereas i 1 = i – 1

Queries and Updates Lookup Lookup Take the first i bits of the hash value.Take the first i bits of the hash value. Following the corresponding entry in the bucket address table.Following the corresponding entry in the bucket address table. Look in the bucket.Look in the bucket.

Queries and Updates (Cont’d) Insertion Insertion Follow lookup procedureFollow lookup procedure If the bucket has space, add the record.If the bucket has space, add the record. If not…If not…

Insertion (Cont’d) Case 1: i = i j Case 1: i = i j Use an additional bit in the hash valueUse an additional bit in the hash value This doubles the size of the bucket address table. This doubles the size of the bucket address table. Makes two entries in the table point to the full bucket. Makes two entries in the table point to the full bucket. Allocate a new bucket, z.Allocate a new bucket, z. Set i j and i z to i Set i j and i z to i Point the second entry to the new bucket Point the second entry to the new bucket Rehash the old bucket Rehash the old bucket Repeat insertion attemptRepeat insertion attempt

Insertion (Cont’d) Case 2: i > i j Case 2: i > i j Allocate a new bucket, zAllocate a new bucket, z Add 1 to i j, set i j and i z to this new valueAdd 1 to i j, set i j and i z to this new value Put half of the entries in the first bucket and half in the otherPut half of the entries in the first bucket and half in the other Rehash records in bucket jRehash records in bucket j Reattempt insertionReattempt insertion

Insertion (Finally) If all the records in the bucket have the same search value, simply use overflow buckets as seen in static hashing. If all the records in the bucket have the same search value, simply use overflow buckets as seen in static hashing.

Use of Extendable Hash Structure: Example Initial Hash structure, bucket size = 2

Example (Cont.) Hash structure after insertion of one Brighton and two Downtown records Hash structure after insertion of one Brighton and two Downtown records

Example (Cont.) Hash structure after insertion of Mianus record

Example (Cont.) Hash structure after insertion of three Perryridge records

Example (Cont.) Hash structure after insertion of Redwood and Round Hill records Hash structure after insertion of Redwood and Round Hill records

Comparison to Other Hashing Methods Advantage: performance does not decrease as the database size increases Advantage: performance does not decrease as the database size increases Space is conserved by adding and removing as necessarySpace is conserved by adding and removing as necessary Disadvantage: additional level of indirection for operations Disadvantage: additional level of indirection for operations Complex implementationComplex implementation

Ordered Indexing vs. Hashing Hashing is less efficient if queries to the database include ranges as opposed to specific values. Hashing is less efficient if queries to the database include ranges as opposed to specific values. In cases where ranges are infrequent hashing provides faster insertion, deletion, and lookup then ordered indexing. In cases where ranges are infrequent hashing provides faster insertion, deletion, and lookup then ordered indexing.