
Chapter 11 Indexing & Hashing

2
- Sophisticated database access methods
- Basic concerns: access/insertion/deletion time, space overhead
- Indexing
  - An index is specified on one or more fields of the record, called the search-key field(s); the search key is not necessarily unique
  - Different index structures are associated with different search keys
  - Allows fast random access to records
  - An index record forms an access path to the data record and is of the form (search-key value, pointer)

3 Indexing
- Dense index: there is an index record for every unique search-key value
- Sparse index: index records are created for only some search-key values
  - A sparse index is slower to search, but requires less space and maintenance overhead
- Primary index:
  - Defined on an ordered data file; the file is ordered on the search-key field, which is usually the primary key
  - A sequentially ordered file with a primary index is called an index-sequential file
  - A binary search on the index yields a pointer to the record
  - The index value is the search-key value of the first data record in the block
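The difference between the two index types can be sketched in a few lines of Python. This is only an illustration: the block layout, the names `BLOCKS`, `dense_lookup`, and `sparse_lookup`, and the sample keys are assumptions, not part of the slides.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file: blocks of (search_key, payload) records, sorted by key.
BLOCKS = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]

# Dense index: one entry per search-key value, pointing at (block, slot).
dense_keys = [key for block in BLOCKS for key, _ in block]
dense_ptrs = [(b, s) for b, block in enumerate(BLOCKS) for s in range(len(block))]

# Sparse index: one entry per block, holding the key of its first record.
sparse_keys = [block[0][0] for block in BLOCKS]


def dense_lookup(key):
    """Binary-search the dense index and follow the pointer directly."""
    i = bisect_left(dense_keys, key)
    if i == len(dense_keys) or dense_keys[i] != key:
        return None
    b, s = dense_ptrs[i]
    return BLOCKS[b][s][1]


def sparse_lookup(key):
    """Find the last index entry <= key, then scan that block sequentially."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, payload in BLOCKS[b]:
        if k == key:
            return payload
    return None
```

The sparse index holds half as many entries here, at the price of the extra scan inside the block.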

4 Figure. Dense index Figure. Sparse index

5 Figure. Primary index on the ordering key field of a file

6 Primary Index
- Index deletion
  - (Dense index) Delete the search-key value of the deleted record from the index file if the deleted record was the last record with that search-key value
  - (Sparse index) If the deleted record was the last record with search-key value v, and v exists in the index file F, replace v by w, where w is the next search-key value in order; if w is already in F, simply delete v
- Index insertion
  - (Dense index) If the search-key value v of the new record does not exist in the index file, insert v
  - (Sparse index) If a new data block B is created, insert the first search-key value of B into the index file
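A minimal sketch of the sparse-index deletion rule, assuming the index is a sorted Python list of search-key values. The function name `sparse_index_delete` is hypothetical, and the handling of a missing successor (`w is None`) is an assumption the slide does not cover.

```python
from bisect import bisect_left


def sparse_index_delete(index_keys, v, w):
    """Update a sorted sparse index after the last record with key v is deleted.

    w is the next search-key value still present in the data file, or None
    when v was the largest key (that case is an assumption, not from the slide).
    """
    i = bisect_left(index_keys, v)
    if i == len(index_keys) or index_keys[i] != v:
        return index_keys          # v was never indexed: nothing to change
    if w is None or w in index_keys:
        del index_keys[i]          # w is already indexed: simply delete v
    else:
        index_keys[i] = w          # otherwise w takes the place of v
    return index_keys
```

For example, deleting key 30 from the index `[10, 30, 50]` when the next key in the file is 40 gives `[10, 40, 50]`; when the next key is 50, it gives `[10, 50]`.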

7 Multi-Level Indices
- First-level index: the original index file
- Second-level index: a primary index on the original index file
- (Rare) Third-level index: the top-level index (fits in one disk block)
- The levels form a search tree, such as a B-tree or B+-tree structure
- Inserting or deleting index entries is not trivial in indexed files
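A two-level lookup can be sketched as two binary searches, one per level. The fan-out of 3 entries per index block, the sample keys, and the name `find_data_block` are assumptions chosen for illustration.

```python
from bisect import bisect_right

# First-level (sparse) index: first search-key value of each data block.
first_level = [10, 20, 30, 40, 50, 60, 70, 80]

# Assume each index block holds FANOUT first-level entries.
FANOUT = 3
index_blocks = [first_level[k:k + FANOUT] for k in range(0, len(first_level), FANOUT)]

# Second-level index: first key of each first-level index block.
second_level = [blk[0] for blk in index_blocks]


def find_data_block(key):
    """Return the number of the data block that may hold key, or None."""
    outer = bisect_right(second_level, key) - 1      # pick the index block
    if outer < 0:
        return None                                  # key precedes every record
    inner = bisect_right(index_blocks[outer], key) - 1
    return outer * FANOUT + inner                    # position in first_level
```

With this layout, a search touches one second-level block and one first-level block before reading the data block.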

8 Figure. A two-level primary index

9 Secondary Indices
- Defined on an unordered data file, i.e., a file not ordered on the indexed field (the index can be defined on a candidate key or a non-key field)
- Each pointer often points to a bucket, which consists of pointers to the records with the same search-key value. The bucket structure can be eliminated if the index is dense and the search-key values form a primary key, i.e., are unique
- Advantages:
  i. Improve the performance of queries that use candidate keys
  ii. Eliminate extra pointers within the records
  iii. Eliminate the need for scanning records sequentially
- Disadvantages: space overhead and extra work on modification
- Types of secondary indices:
  - Dense: the pointers in a bucket point to records with the same search-key value
  - Sparse: a pointer in a bucket points to records with search-key values in the appropriate range
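The bucket level of indirection can be sketched as follows. The sample records, the record ids, and the names `build_secondary_index` and `secondary_lookup` are hypothetical.

```python
from collections import defaultdict

# Hypothetical unordered data file: record id -> (account number, branch name).
records = {
    0: ("A-217", "Brighton"),
    1: ("A-101", "Downtown"),
    2: ("A-110", "Downtown"),
    3: ("A-215", "Mianus"),
}


def build_secondary_index(data, field):
    """Map each search-key value to a bucket of pointers (record ids)."""
    buckets = defaultdict(list)
    for rid, rec in data.items():
        buckets[rec[field]].append(rid)
    # Keep index entries in search-key order, as in an ordered index.
    return dict(sorted(buckets.items()))


branch_index = build_secondary_index(records, field=1)


def secondary_lookup(index, key):
    """Follow every pointer in the bucket for key."""
    return [records[rid] for rid in index.get(key, [])]
```

Because branch name is a non-key field, the bucket for "Downtown" holds two pointers; on a unique field each bucket would hold exactly one pointer and could be dropped.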

10 Figure. A secondary index on a key field of a file.

11 Figure. A secondary index on a non-key field implemented using a level of indirection

12 B+-Tree (Multi-level) Indices
- Frequently used index structure in DB
- Allows efficient insertion/deletion of new/existing search-key values
- A balanced tree structure: all leaf nodes are at the same level (the leaf level may form a dense index)
- Each node, corresponding to a disk block, has the format
    $P_1\ K_1\ P_2\ \dots\ P_{n-1}\ K_{n-1}\ P_n$
  where $P_i$, $1 \le i \le n$, is a pointer, and $K_i$, $1 \le i \le n-1$, is a search-key value with $K_i < K_j$ for $i < j$, i.e., search-key values are in order
- In each leaf node, $P_i$ points to either (i) a data record with search-key value $K_i$ or (ii) a bucket of pointers, each of which points to a data record with search-key value $K_i$
- Figure. In a node $P_1\ K_1\ \dots\ K_{i-1}\ P_i\ K_i\ \dots\ K_{n-1}\ P_n$, the pointers lead to search-key values $X$ in these ranges: $P_1$: $X < K_1$; $P_i$: $K_{i-1} \le X < K_i$; $P_n$: $K_{n-1} \le X$

13 B+-Tree (Multi-level) Indices
- Each leaf node is kept between half full and completely full, i.e., it holds between $\lceil (n-1)/2 \rceil$ and $n-1$ search-key values
- Non-leaf nodes form a sparse index
- Each non-leaf node (except the root) must have between $\lceil n/2 \rceil$ and $n$ pointers
- The number of block accesses required to search for a search-key value is about $\log_{\lceil n/2 \rceil}(K)$, where $K$ = number of unique search-key values and $n$ = maximum number of pointers per node
- Insertion into a full node causes a split into two nodes, which may propagate to higher tree levels
  Note: if there are $n$ search-key values to be split, put the first $\lceil (n-1)/2 \rceil$ in the existing node and the remaining ones in a new node
- A node left less than half full by a deletion must be merged with neighboring nodes
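The occupancy bounds and the search cost can be checked numerically. The sketch below rounds the logarithm up to a whole number of levels, which is an interpretation of the slide's formula as a worst-case bound; the function names are hypothetical.

```python
import math


def leaf_key_bounds(n):
    """Minimum and maximum number of search-key values in a leaf."""
    return math.ceil((n - 1) / 2), n - 1


def internal_pointer_bounds(n):
    """Minimum and maximum number of pointers in a non-root internal node."""
    return math.ceil(n / 2), n


def max_search_path(n, num_keys):
    """Smallest number of levels L with ceil(n/2)**L >= num_keys.

    Computed with integers to avoid floating-point error in the logarithm.
    """
    fanout = math.ceil(n / 2)
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        levels += 1
    return levels
```

For instance, with n = 100 pointers per node and one million distinct search-key values, `max_search_path(100, 1_000_000)` gives 4, so a lookup reads at most about four index blocks.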

14 B+-Tree Algorithms
- Algorithm 1. Searching for a record with search-key value K, using a B+-Tree.

begin
  n ← block containing root node of B+-Tree;
  read block n;
  while (n is not a leaf node of the B+-Tree) do
  begin
    q ← number of tree pointers in node n;
    if K < n.K_1           /* n.K_i refers to the i-th search-key value in node n */
      then n ← n.P_1       /* n.P_i refers to the i-th pointer in node n */
    else if K ≥ n.K_{q-1}
      then n ← n.P_q
    else begin
      search node n for an entry i such that n.K_{i-1} ≤ K < n.K_i;
      n ← n.P_i;
    end; /* ELSE */
    read block n;
  end; /* WHILE */
  search block n for entry K_i with K = K_i; /* search leaf node */
  if found
    then read data file block with address P_i and retrieve record
    else record with search-key value K is not in the data file;
end. /* Algorithm 1 */
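Algorithm 1 can be sketched in Python with an in-memory tree. The `Node` class, the `search` function, and the sample tree are assumptions for illustration; a real implementation would read disk blocks instead of following object references.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Node:
    keys: list              # K_1 .. K_{q-1}
    pointers: list          # child nodes (internal) or record addresses (leaf)
    is_leaf: bool = False


def search(root, key):
    """Return the record address stored for key, or None if it is absent."""
    node = root
    while not node.is_leaf:
        # bisect_right selects P_1 when key < K_1, P_q when key >= K_{q-1},
        # and otherwise P_i with K_{i-1} <= key < K_i.
        node = node.pointers[bisect_right(node.keys, key)]
    for k, ptr in zip(node.keys, node.pointers):    # search the leaf node
        if k == key:
            return ptr
    return None


# Small hand-built tree: one root with three leaves.
leaf1 = Node([5, 10], ["r5", "r10"], True)
leaf2 = Node([20, 30], ["r20", "r30"], True)
leaf3 = Node([40, 50], ["r40", "r50"], True)
root = Node([20, 40], [leaf1, leaf2, leaf3])
```

The single `bisect_right` call replaces the three-way branch of the pseudocode, since it counts how many keys in the node are less than or equal to the search key.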

15 B+-Tree Algorithms
- Algorithm 2. Inserting a record with search-key value K in a B+-Tree of order p.
  /* A B+-Tree of order p contains at most p-1 values and p pointers */

begin
  n ← block containing root node of B+-Tree;
  read block n;
  set stack S to empty;
  while (n is not a leaf node of the B+-Tree) do
  begin
    push address of n on stack S; /* S holds parent nodes that are needed in case of split */
    q ← number of tree pointers in node n;
    if K < n.K_1           /* n.K_i refers to the i-th search-key value in node n */
      then n ← n.P_1       /* n.P_i refers to the i-th pointer in node n */
    else if K ≥ n.K_{q-1}
      then n ← n.P_q
    else begin
      search node n for an entry i such that n.K_{i-1} ≤ K < n.K_i;
      n ← n.P_i;
    end; /* ELSE */
    read block n;
  end; /* WHILE */
  search block n for entry K_i with K = K_i; /* search leaf node */

16 Algorithm 2 (continued)
  if found
    then return /* record already in index file - no insertion is needed */
  else begin /* insert entry in B+-Tree to point to record */
    create entry (P, K), where P points to file block containing new record;
    if leaf node n is not full
      then insert entry (P, K) in correct position in leaf node n
    else begin /* leaf node n is full - split */
      copy n to temp; /* temp is an oversize leaf node to hold extra entry */
      insert entry (P, K) in temp in correct position; /* temp now holds p entries of the form (P_i, K_i) */
      new ← a new empty leaf node for the tree;
      j ← ⌈p/2⌉;
      n ← first j entries in temp (up to entry (P_j, K_j));
      n.P_next ← new; /* P_next points to the next leaf node */
      new ← remaining entries in temp;
      K ← K_{j+1};
      /* Now we must move (K, new) and insert in parent internal node. However, if parent is full, split may propagate */
      finished ← false;

17 Algorithm 2 (continued)
      repeat
        if stack S is empty then /* no parent node */
        begin /* new root node is created for the B+-Tree */
          root ← a new empty internal node for the tree;
          root ← (n, K, new); /* set P_1 to n & P_2 to new */
          finished ← true;
        end
        else begin
          n ← pop stack S;
          if internal node n is not full then
          begin /* parent node not full - no split */
            insert (K, new) in correct position in internal node n;
            finished ← true
          end
          else

18 Algorithm 2 (continued)
          begin /* internal node n is full with p tree pointers - split */
            copy n to temp; /* temp is an oversize internal node */
            insert (K, new) in temp in correct position; /* temp has p+1 tree pointers */
            new ← a new empty internal node for the tree;
            j ← ⌈(p + 1)/2⌉;
            n ← entries up to tree pointer P_j in temp; /* n contains P_1, K_1, ..., K_{j-1}, P_j */
            new ← entries from tree pointer P_{j+1} in temp; /* new contains P_{j+1}, K_{j+1}, ..., K_p, P_{p+1} */
            K ← K_j; /* now we must move (K, new) and insert in parent internal node */
          end
        end; /* ELSE */
      until finished
    end; /* ELSE */
  end; /* ELSE */
end. /* Algorithm 2 */
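The insertion algorithm can be condensed into a short in-memory sketch. It is written under assumptions: the `Node` class, `insert`, and `leaf_keys` are hypothetical names, the split points are taken as ⌈p/2⌉ for leaves and ⌈(p+1)/2⌉ for internal nodes, and nodes are Python objects rather than disk blocks.

```python
import math
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, leaf, keys=None, ptrs=None):
        self.leaf = leaf
        self.keys = keys or []
        self.ptrs = ptrs or []   # child nodes, or record addresses in a leaf
        self.next = None         # P_next: right sibling of a leaf


def insert(root, key, addr, p):
    """Insert (key, addr) into a tree of order p; return the (possibly new) root."""
    node, stack = root, []
    while not node.leaf:                       # descend, remembering parents
        stack.append(node)
        node = node.ptrs[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return root                            # already indexed: nothing to do
    node.keys.insert(i, key)
    node.ptrs.insert(i, addr)
    if len(node.keys) <= p - 1:
        return root                            # the leaf had room
    j = math.ceil(p / 2)                       # leaf split: keep first j entries
    new = Node(True, node.keys[j:], node.ptrs[j:])
    node.keys, node.ptrs = node.keys[:j], node.ptrs[:j]
    new.next, node.next = node.next, new
    up = new.keys[0]                           # K_{j+1} is copied into the parent
    while True:
        if not stack:                          # split reached the root
            return Node(False, [up], [node, new])
        parent = stack.pop()
        i = bisect_right(parent.keys, up)
        parent.keys.insert(i, up)
        parent.ptrs.insert(i + 1, new)
        if len(parent.ptrs) <= p:
            return root                        # the parent had room
        j = math.ceil((p + 1) / 2)             # internal split: p + 1 pointers
        up = parent.keys[j - 1]                # K_j moves up (it is not copied)
        new = Node(False, parent.keys[j:], parent.ptrs[j:])
        parent.keys, parent.ptrs = parent.keys[:j - 1], parent.ptrs[:j]
        node = parent


def leaf_keys(root):
    """Walk the leaf chain left to right and collect every key."""
    node = root
    while not node.leaf:
        node = node.ptrs[0]
    out = []
    while node is not None:
        out.extend(node.keys)
        node = node.next
    return out
```

Inserting 10, 20, ..., 70 into an empty tree of order 3 splits leaves three times and the root once, leaving a root that holds the single key 50.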

19 Hashing
- Uses dense index
- Avoids accessing an index structure to locate data
- Allocates search-key values to different buckets
- (Static hash function) Given a search-key value v, a hash function h computes the address of the desired bucket (which contains a pointer to the record) for v
    h: K → B
  where K is the set of search-key values and B is the set of (fixed) bucket addresses
- Lookup: the hash function maps a search-key value to a bucket b, and every record in b is then searched (linearly)
- An ideal hash function gives
  - A uniform distribution of search-key values, i.e., the same number of search-key values in each bucket
  - A random distribution of search-key values, i.e., each search-key value is equally likely to be assigned to any bucket
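A minimal static-hashing sketch, assuming four fixed buckets and a toy hash function (sum of character codes modulo the bucket count). Both choices, and the names `static_insert` and `static_lookup`, are illustrative only; the toy function is not claimed to be uniform or random.

```python
NUM_BUCKETS = 4   # B is fixed in advance and never changes


def h(key):
    """Toy hash function h: K -> B (an assumption, not from the slides)."""
    return sum(ord(c) for c in key) % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]


def static_insert(key, record):
    """Place the entry in the bucket chosen by the hash function."""
    buckets[h(key)].append((key, record))


def static_lookup(key):
    """Hash to one bucket, then search every entry in it linearly."""
    return [rec for k, rec in buckets[h(key)] if k == key]
```

Different search-key values may land in the same bucket, which is why the lookup still compares keys inside the bucket.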

20 Dynamic (Extendable) Hash Function (EHF)
- Resolves the problems of static hashing
  - Allows the hash function to be modified dynamically, accommodating changes in DB size (no buckets reserved for future growth)
  - Minimizes space overhead, i.e., the bucket address table (b-a-t) is small
- Allows buckets to be split or combined to maintain space efficiency
- Buckets are created on demand, as records are inserted
  - Result: low performance overhead (reorganization involves one bucket at a time)

21 Dynamic (Extendable) Hash Function (EHF)
- EHF uses i bits, where i grows and shrinks with DB size, as an offset into the b-a-t
- i bits of h(K) (i changes as the file grows) are required to determine the correct bucket for K
- All entries of the i-bit b-a-t that point to the same bucket j have a common hash prefix (chp), and bucket j is associated with an integer $i_j$ that denotes the length of the chp
    No. of entries of b-a-t that point to bucket j = $2^{(i - i_j)}$
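The two quantities on this slide are easy to compute directly. The sketch assumes 32-bit hash values and uses the high-order bits as the table offset; `bat_slot` and `entries_pointing_to` are hypothetical names.

```python
HASH_BITS = 32   # assumed width of h(K)


def bat_slot(hash_value, i):
    """Offset into the b-a-t: the i high-order bits of a 32-bit hash value."""
    return hash_value >> (HASH_BITS - i)


def entries_pointing_to(i, i_j):
    """Number of b-a-t entries that share bucket j's common hash prefix."""
    return 2 ** (i - i_j)
```

With i = 3 and a bucket whose prefix length is 1, four of the eight table entries point to that bucket; a bucket with prefix length 3 is pointed to by exactly one entry.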

22 Figure. General extendable hash structure

23 Dynamic (Extendable) Hash Function (EHF)
- Lookup K, a search-key value: locate the bucket pointed to by the b-a-t entry that is determined by the first i high-order bits of h(K)
- Insert a record r with search-key value K
  1. Lookup K and locate bucket j
  2. If j is not full, insert the info of K in j and r in the file
  3. If j is full, create a new bucket z. There are two cases to be considered:

24 Dynamic (Extendable) Hash Function (EHF)
(a) Case $i = i_j$ (only one entry in b-a-t points to j):
  1. Increase i by 1, i.e., double the size of the b-a-t. Each entry is replaced by 2 entries that contain the same pointer as the original entry
  2. (For the b-a-t entry that causes the split) Set the 2nd entry created from the entry for j to point to z
  3. Set $i_j$ = i(new) and $i_z$ = i(new)
  4. Rehash the records in j based on the (new) i and redistribute the records in j and r
  5. Re-attempt to insert r, and repeat the whole process if r and all records in j have the same hash prefix

25 Figure. Sample deposit file Figure. Hash function for branch-name Figure. Initial extendable hash structure (Each bucket can hold up to 2 records) Figure. Hash structure after 3 insertions (Downtown, Round Hill, Perryridge)

26 Figure. Sample deposit file Figure. Hash function for branch-name Figure. Hash structure after four insertions

27 Hashing
- Insert a record r with search-key value K
  (b) Case $i > i_j$ (more than one entry in b-a-t points to j):
    1. Set $i_z = i_j + 1$ and $i_j = i_j + 1$
    2. Adjust the entries in b-a-t that point to j: set the first half of the entries to point to j and the remaining ones to point to z
    3. Rehash and allocate the records in j
    4. Re-attempt to insert r, and repeat the whole process (of insertion) if r and all records in j have the same hash prefix
- Delete a record r with search-key value K:
  1. Lookup K and locate bucket j
  2. Remove K from j and r from the file. Remove j if j becomes empty
  3. Adjust b-a-t if necessary
- Disadvantages
  - Lookup involves an additional level of indirection (must access b-a-t)
  - Additional complexity in implementation
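Lookup and both insertion cases can be sketched together. The sketch makes several assumptions: `zlib.crc32` stands in for the hash function h, each bucket holds up to 2 entries, search keys are unique strings, and the class names `Bucket` and `ExtendableHash` are hypothetical. Deletion and overflow buckets are left out.

```python
import zlib

HASH_BITS = 32     # zlib.crc32 returns an unsigned 32-bit value
BUCKET_SIZE = 2    # each bucket holds up to 2 entries


class Bucket:
    def __init__(self, depth):
        self.depth = depth    # i_j: length of the common hash prefix
        self.items = {}       # search key -> record


class ExtendableHash:
    def __init__(self):
        self.i = 0                    # number of hash bits in use
        self.table = [Bucket(0)]      # b-a-t with 2**i entries

    def _slot(self, key):
        """b-a-t offset: the i high-order bits of h(key)."""
        return zlib.crc32(key.encode()) >> (HASH_BITS - self.i)

    def lookup(self, key):
        return self.table[self._slot(key)].items.get(key)

    def insert(self, key, record):
        while True:                   # re-attempt after every split
            j = self.table[self._slot(key)]
            if key in j.items or len(j.items) < BUCKET_SIZE:
                j.items[key] = record
                return
            if j.depth == self.i:     # case (a): double the b-a-t first
                if self.i == HASH_BITS:
                    raise RuntimeError("hash bits exhausted; overflow bucket needed")
                self.table = [b for b in self.table for _ in (0, 1)]
                self.i += 1
            # case (b), also reached after doubling: split j into j and z
            j.depth += 1
            z = Bucket(j.depth)
            slots = [s for s, b in enumerate(self.table) if b is j]
            for s in slots[len(slots) // 2:]:
                self.table[s] = z     # second half of j's entries points to z
            old, j.items = j.items, {}
            for k, r in old.items():  # rehash the records that were in j
                self.table[self._slot(k)].items[k] = r
```

After any sequence of insertions, the table has 2**i entries and each bucket with prefix length i_j is referenced by 2**(i - i_j) of them.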

28 Figure. Sample deposit file Figure. Hash function for branch-name Figure. Extendable hash structure for the deposit file

29 Figure. Sample account file Figure. Hash function for branch-name Figure. Initial extendable hash structure.

30 Figure Hash structure after four insertions

31 Figure Hash structure after seven insertions Figure Hash structure after nine insertions

32 Figure Extendable hash structure for the account file

33 Indexing & Hashing
- The expected type of query is critical to the choice between indexing and hashing
- Comparison
  - For a query with an equality comparison on an attribute, hashing is preferable
  - For a query that specifies a range of values, indexing is preferable
  - Most DB systems use indexing: it is difficult to find a good hash function that preserves order to support range queries
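The trade-off can be illustrated with a Python dict standing in for a hash index and a sorted list standing in for an ordered index. The sample balances and the names `equality_query` and `range_query` are assumptions.

```python
from bisect import bisect_left, bisect_right

# Hypothetical account balances keyed by account number.
balances = {"A-101": 500, "A-110": 600, "A-215": 700, "A-217": 750}

# Hash-style index on balance: good for equality, keeps no order.
hash_index = {}
for acct, bal in balances.items():
    hash_index.setdefault(bal, []).append(acct)

# Ordered index on balance: (balance, account) pairs sorted by balance.
ordered_index = sorted((bal, acct) for acct, bal in balances.items())
ordered_keys = [bal for bal, _ in ordered_index]


def equality_query(value):
    """One hash probe answers balance = value."""
    return hash_index.get(value, [])


def range_query(lo, hi):
    """Two binary searches bound the run of entries with lo <= balance <= hi."""
    start = bisect_left(ordered_keys, lo)
    end = bisect_right(ordered_keys, hi)
    return [acct for _, acct in ordered_index[start:end]]
```

Answering the range query from `hash_index` alone would require probing every possible balance value or scanning all buckets, since neighboring values hash to unrelated buckets.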