1 CMSC 341 Extensible Hashing Chapter 5, Section 6 (pp. 200 – 203)

Slides:



Advertisements
Similar presentations
External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
Advertisements

CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
Hash-Based Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.
Hash-based Indexes CS 186, Spring 2006 Lecture 7 R &G Chapter 11 HASH, x. There is no definition for this word -- nobody knows what hash is. Ambrose Bierce,
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Hashing. CENG 3512 Motivation The primary goal is to locate the desired record in a single access of disk. – Sequential search: O(N) – B+ trees: O(log.
Hash-Based Indexes The slides for this text are organized into chapters. This lecture covers Chapter 10. Chapter 1: Introduction to Database Systems Chapter.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that grows and shrinks in size Allows the hash function.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
CPSC 404, Laks V.S. Lakshmanan1 Hash-Based Indexes Chapter 11 Ramakrishnan & Gehrke (Sections )
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
Chapter 11 (3 rd Edition) Hash-Based Indexes Xuemin COMP9315: Database Systems Implementation.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Hash Table indexing and Secondary Storage Hashing.
B+-tree and Hashing.
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
File Organizations and Indexes ISYS 464. Disk Devices Disk drive: Read/write head and access arm. Single-sided, double-sided, disk pack Track, sector,
Chapter 13.4 Hash Tables Steve Ikeoka ID: 113 CS 257 – Spring 2008.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
1 Lecture 19: B-trees and Hash Tables Wednesday, November 12, 2003.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
CS4432: Database Systems II
Chapter 61 Chapter 6 Index Structures for Files. Chapter 62 Indexes Indexes are additional auxiliary access structures with typically provide either faster.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 17 Disk Storage, Basic File Structures, and Hashing.
CSE AU B-Trees1 B-Trees CSE 373 Data Structures.
1 B-Trees & (a,b)-Trees CS 6310: Advanced Data Structures Western Michigan University Presented by: Lawrence Kalisz.
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
CSE 326: Data Structures Lecture #16 Hashing HUGE Data Sets (and two presents from the Database Fiancée) Steve Wolfman Winter Quarter 2000.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:
Hash Tables - Motivation
1 CPS216: Data-intensive Computing Systems Operators for Data Access (contd.) Shivnath Babu.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11 Modified by Donghui Zhang Jan 30, 2006.
Introduction to Database, Fall 2004/Melikyan1 Hash-Based Indexes Chapter 10.
Static Hashing (using overflow for collision managment e.g., h(key) mod M h key Primary bucket pages 1 0 M-1 Overflow pages(as separate link list) Overflow.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Indexed Sequential Access Method.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 10.
Chapter 5: Hashing Collision Resolution: Open Addressing Extendible Hashing Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova,
1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.
1 Lecture 21: Hash Tables Wednesday, November 17, 2004.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
Hashing by Rafael Jaffarove CS157b. Motivation  Fast data access  Search  Insertion  Deletion  Ideal seek time is O(1)
Exam 3 Review Data structures covered: –Hashing and Extensible hashing –Priority queues and binary heaps –Skip lists –B-Tree –Disjoint sets For each of.
Chapter 5 Record Storage and Primary File Organizations
CS4432: Database Systems II
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
Em Spatiotemporal Database Laboratory Pusan National University File Processing : Hash 2004, Spring Pusan National University Ki-Joune Li.
DS.H.1 Hashing Chapter 5 Overview The General Idea Hash Functions Separate Chaining Open Addressing Rehashing Extendible Hashing Application Example: Geometric.
CS422 Principles of Database Systems Indexes Chengyu Sun California State University, Los Angeles.
Dynamic Hashing (Chapter 12)
Lecture 21: Hash Tables Monday, February 28, 2005.
Hashing CENG 351.
Database Management Systems (CS 564)
B-Trees CSE 373 Data Structures CSE AU B-Trees.
External Memory Hashing
B- Trees D. Frey with apologies to Tom Anastasio
Indexing and Hashing Basic Concepts Ordered Indices
B- Trees D. Frey with apologies to Tom Anastasio
B- Trees D. Frey with apologies to Tom Anastasio
Module 12a: Dynamic Hashing
Index tuning Hash Index.
CMSC 341 Extensible Hashing.
Presentation transcript:

1 CMSC 341 Extensible Hashing Chapter 5, Section 6 (pp. 200 – 203)

2 Motivations Another way to handle data that is too large to be stored in the primary memory –Most records have to be stored in disk –Disk read/write operations much more expensive than operations in main memory e.g., disk access = 200,000 instructions (p.165) –Regular hash tables need to exam several disk blocks when collisions occur –want to limit number of disk accesses for find/insert operations

3 Basic Ideas Basic ideas: –Hash the key of each record into a reasonably long integer to avoid collision adding 0’s to the left so they have the same length –Build a directory The directory is stored in the primary memory Each entry in the directory points to a leaf Directory is extensible –Each leaf contains M records, Stored in one disk block Share the same D leading digits

4 Directory and Leaves Directory: also called “root”, stored in the main memory –D: the number of bits for each entry in the directory –Size of the directory: 2^D Leaf: Each leaf stores up to M elements –M = block size /record size –d L : number of leading digits in common for all elements in leaf L. –d L < = D. (This will become clear shortly)

5 Example N = 12, each key is an integer of 6 bits M = (2) (2) (2) (2) directory D = 2, size of directory 2^D = 4. all records in a leaf have the same 2 leading digits

6 find operation 1.Use the first D digits of the key to find the entry in the directory; 2.Fin the address of the leaf 3.Read the leaf Time performance: –O(1) disk access –Time for searching the record in the leaf in the main memory is negligible.

7 insert operation 1.Find and read the leaf 2.If the leaf has room, insert the record, write back; 3.Else –split the leaf into two; –update the directory if necessary; –write back the leaf or leaves

8 insert operation insert , split (in the middle) the 3rd leaf extend the directory (now D = 3, with8 entries) Other leaves are pointed by two adjacent directory entries (2) (2) (3) (2) (3) records in these leaves share 3 leading digits

9 insert operation insert , split the 1st leaf the directory is not extended since the original leaf is pointed by two directory entries (3) (2) (3) (2) (3) records in these leaves Now share 3 leading digits (3)

10 Some Issues Some time, more than one directory split may be needed when inserting one record. E.g, insert , , then into the original example. (2) (2) (4) (4) original leaf after inserting and , No split after inserting 11100, 2 splits, now D = 4 because 4 digits are needed to distinguish the 5 keys

11 Some Issues Duplicate keys (different original keys hashed to the same integer keys: collision) –Ok if fewer than M duplicates –Doesn’t work if more than M duplicates (one directory entry cannot point to more than one page) Time performance –Expected # of leaves: (N/M)log 2 e. Average leaf is ln 2 = 0.69 full (same as B-trees). –Expected size of the directory (2 D ): O(N 1+1/M /M). May be large if M is small (i.e., records are large) Let leaves store pointers to the records, not records themselves – adding a second disk access