CST203-2 Database Management Systems Lecture 7. Disadvantages on index structure: We must access an index structure to locate data, or must use binary.

Slides:



Advertisements
Similar presentations
Hashing.
Advertisements

External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Hashing Dashiell Fryer CS 157B Dr. Lee. Contents Static Hashing Static Hashing File OrganizationFile Organization Properties of the Hash FunctionProperties.
Hashing. CENG 3512 Motivation The primary goal is to locate the desired record in a single access of disk. – Sequential search: O(N) – B+ trees: O(log.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Chapter 11 Indexing and Hashing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 11 – Hash-based Indexing.
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 11: Indexing.
CM20145 Indexing and Hashing
CIS552Indexing and Hashing1 Cost estimation Basic Concepts Ordered Indices B + - Tree Index Files B - Tree Index Files Static Hashing Dynamic Hashing Comparison.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Indexing and.
INDEXING AND HASHING.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Indexing and.
Index tuning Hash Index. overview Introduction Hash-based indexes are best for equality selections. –Can efficiently support index nested joins –Cannot.
Slides adapted from A. Silberschatz et al. Database System Concepts, 5th Ed. Indexing and Hashing Database Management Systems I Alex Coman, Winter 2006.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part A Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Database Management Systems I Alex Coman, Winter 2006
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Quick Review of material covered Apr 8 B+-Tree Overview and some definitions –balanced tree –multi-level –reorganizes itself on insertion and deletion.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #8.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Indexing and Hashing.
E.G.M. PetrakisHashing1 Hashing on the Disk  Keys are stored in “disk pages” (“buckets”)  several records fit within one page  Retrieval:  find address.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Ch12: Indexing and Hashing  Basic Concepts  Ordered Indices B+-Tree Index Files B+-Tree Index Files B-Tree Index Files B-Tree Index Files  Hashing Static.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 5, 6 of Elmasri “ How index-learning turns no student.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Hashing.
Computing & Information Sciences Kansas State University Friday, 24 Oct 2008CIS 560: Database System Concepts Lecture 23 of 42 Friday, 24 October 2008.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 17 Disk Storage, Basic File Structures, and Hashing.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
Chapter 11 Indexing & Hashing. 2 n Sophisticated database access methods n Basic concerns: access/insertion/deletion time, space overhead n Indexing 
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
Basic Concepts Indexing mechanisms used to speed up access to desired data. E.g., author catalog in library Search Key - attribute to set of attributes.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Computing & Information Sciences Kansas State University Wednesday, 22 Oct 2008CIS 560: Database System Concepts Lecture 22 of 42 Wednesday, 22 October.
Indexing and Hashing By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING COLLEGE TIRUVANNAMALAI.
March 23 & 28, Csci 2111: Data and File Structures Week 10, Lectures 1 & 2 Hashing.
March 23 & 28, Hashing. 2 What is Hashing? A Hash function is a function h(K) which transforms a key K into an address. Hashing is like indexing.
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Hashing by Rafael Jaffarove CS157b. Motivation  Fast data access  Search  Insertion  Deletion  Ideal seek time is O(1)
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module D: Hashing.
Computing & Information Sciences Kansas State University Monday, 31 Mar 2008CIS 560: Database System Concepts Lecture 25 of 42 Monday, 31 March 2008 William.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Indexing and.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 11: Indexing.
Chapter 5 Record Storage and Primary File Organizations
PART 4 DATA STORAGE AND QUERY. Chapter 12 Indexing and Hashing.
Chapter 11 Indexing And Hashing (1) Yonsei University 1 st Semester, 2016 Sanghyun Park.
Em Spatiotemporal Database Laboratory Pusan National University File Processing : Hash 2004, Spring Pusan National University Ki-Joune Li.
Azita Keshmiri CS 157B Ch 12 indexing and hashing
Dynamic Hashing (Chapter 12)
Chapter 12: Indexing and Hashing
Chapter 11: Indexing and Hashing
Database Management Systems (CS 564)
Indexing and Hashing Basic Concepts Ordered Indices
Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.
Database Design and Programming
2018, Spring Pusan National University Ki-Joune Li
CS4432: Database Systems II
Presentation transcript:

CST203-2 Database Management Systems Lecture 7

Disadvantages on index structure: We must access an index structure to locate data, or must use binary search, and that results in more I/O operations Space taken by index structure Hashing allows us to avoid accessing an index structure

Hash File Organization We obtain the address of the disk block containing a desired record directly by computing a function on the search-key value of the record Bucket is a unit of storage that can store one or more records A bucket is typically a disk block, but could be chosen to be smaller or larger than a disk block.

A hash function h is a function from K to B. h – hash function K i - search key B i = h(Ki) If h(K5) = h(K7) bucket h(K 5 ) contains records with search-key values K 5 and records with search-key values K 7

Deletion Equally straightforward If the search-key value of the record to be deleted is Ki, we compute h(Ki), then search the corresponding bucket for that record, and delete the record from the bucket

Hash Functions The worst possible hash function maps all search-key values to the same bucket An ideal hash function distributes the stored keys uniformly across all the buckets, so that every bucket has the same number of records We want to choose a hash function that assigns search- key values to buckets

The distribution is uniform. The hash function assigns each bucket the same number of search-key values from the set of all possible search-key values The distribution is random Each bucket will have nearly the same number of values assigned to it, regardless of the actual distribution of search- key values. The hash value will not be correlated to any externally visible ordering on the search-key values, such as alphabetic ordering or ordering by the length of the search keys

Example of Hash File Organization (Cont.) Assume…….. There are 26 buckets hash function maps names beginning with the ith letter of the alphabet to the ith bucket Simple but no uniform distribution Assume……… there are 10 buckets, The binary representation of the ith character is assumed to be the integer i. The hash function returns the sum of the binary representations of the characters modulo 10 E.g. h(Perryridge) = 5 h(Round Hill) = 3 h(Brighton) = 3 Hash file organization of account file, using branch-name as key

int h = 0; for (int i = 0; i < n; i++) { h = h + s.charAt(i); } h = h % 10;

Example of Hash File Organization Hash file organization of account file, using branch-name as key

Handling of Bucket Overflows We have assumed that, when a record is inserted, the bucket to which it is mapped has space to store the record If the bucket does not have enough space, a bucket overflow is said to occur Reasons: Insufficient buckets. Skew 1. Multiple records may have the same search key. 2. The chosen hash function may result in non-uniform distribution of search keys.

We handle bucket overflow by using overflow buckets. If a record must be inserted into a bucket b, and b is already full, the system provides an overflow bucket for b, and inserts the record into the overflow bucket. If the overflow bucket is also full, the system provides another overflow bucket, and so on All the overflow buckets of a given bucket are chained together in a linked list(overflow chaining)

`

Hash Indices A hash index organizes the search keys, with their associated pointers, into a hash file structure We apply a hash function on a search key to identify a bucket, and store the key and its associated pointers in the bucket (or in overflow buckets)

The term hash index is used to denote hash file structures as well as hash indices Hash indices are only secondary index structures But, hash file organization provides the same direct access to records that indexing provides, we pretend that a file organized by hashing also has a primary hash index on it

Dynamic Hashing Most databases grow larger over time For static hashing, Choose a hash function based on the current file size. This option will result in performance degradation as the database grows Choose a hash function based on the anticipated size of the file at some point in the future. Although performance degradation is avoided, a significant amount of space may be wasted initially Periodically reorganize the hash structure in response to file growth. Such a reorganization involves choosing a new hash function, recomputing the hash function on every record in the file, and generating new bucket assignments. This reorganization is a massive, time-consuming operation. Furthermore, it is necessary to forbid access to the file during reorganization.