
Indexing Techniques

Advanced Databases: Indexing Techniques

The Problem
What can we introduce to make search more efficient?
– Indices!
What is an index?

Definitions
Index: an auxiliary data structure that speeds up record retrieval
Search key: the field or fields of a table that are indexed
Storage: index files that contain index records
– Each entry stores one of:
  the actual data record, or
  search key value k and a record ID, or
  search key value k and a list of record IDs
Types: ordered indices and unordered (hash) indices
[Figure: index entries Paul, Anna, Tim pointing into Page i and Page i+1]

Types of Ordered Indices (1/3)
Assuming ordered data files
Depending on which field is indexed:
– Primary index: the search key is the ordering key field; one pointer for each page
– Secondary index: the search key is a non-ordering field
[Figure: a primary and a secondary index over records Paul, Anna, Matt, Tim, Carol, Rob]

Types of Ordered Indices (2/3)
Depending on the density of index records:
– Dense index: an index record for each distinct search key value, i.e. for every record
– Sparse index: index records for only some search key values
  each index record holds the search key value of the first record in a page and a pointer to that page
[Figure: a sparse and a dense index over records Paul, Anna, Matt, Tim, Carol, Rob]
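A sparse index lookup can be sketched as follows. This is a minimal illustration, not part of the original slides: the in-memory `pages` list, the `sparse_index` layout and the function name `sparse_lookup` are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical ordered data file: each page is a sorted list of names.
pages = [["Anna", "Carol"], ["Matt", "Paul"], ["Rob", "Tim"]]

# Sparse index: one (first key in page, page number) entry per page.
sparse_index = [(page[0], page_no) for page_no, page in enumerate(pages)]
index_keys = [key for key, _ in sparse_index]


def sparse_lookup(key):
    """Return the number of the page holding key, or None if key is absent."""
    # Find the last index entry whose key is <= the search key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first record in the file
    page_no = sparse_index[pos][1]
    # A sparse index only narrows the search to one page; scan that page.
    return page_no if key in pages[page_no] else None
```

A dense index would instead hold one entry per record, so the final page scan would not be needed to decide whether the key exists.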

Types of Ordered Indices (3/3)
The ordering field is a nonkey field (it may have duplicates)
– Clustered index
– Unclustered index
[Figure: a clustered and an unclustered index over records Paul, Anna, Matt, Tim, Carol, Rob]

Indices Exercise
$2^{15}$ records, 128 bytes/record, $2^{10}$ bytes/page
Ordered file, unspanned organization
Cost of an equality search on the ordering field:
– without an index
– with a primary index on a field of size 12 bytes (assume pointers are 4 bytes long)
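One way to work through the arithmetic is sketched below. The slides give no solution, so the cost model is an assumption: the file and the index are both searched by binary search over pages, and the primary index is sparse with one entry per data page. A dense variant with one entry per record is shown for comparison.

```python
from math import ceil, log2

records = 2**15
record_size = 128            # bytes
page_size = 2**10            # bytes
key_size, pointer_size = 12, 4

# Unspanned organization: a record never crosses a page boundary.
records_per_page = page_size // record_size            # 8
data_pages = ceil(records / records_per_page)          # 2^12 = 4096

# Without an index: binary search over the data pages.
no_index_cost = ceil(log2(data_pages))                 # 12 page reads

# With an index: each entry is a key plus a pointer.
entry_size = key_size + pointer_size                   # 16 bytes
entries_per_page = page_size // entry_size             # 2^6 = 64

# Sparse primary index (assumed): one entry per data page.
sparse_index_pages = ceil(data_pages / entries_per_page)   # 64
sparse_cost = ceil(log2(sparse_index_pages)) + 1           # 6 + 1 data page

# Dense index (for comparison): one entry per record.
dense_index_pages = ceil(records / entries_per_page)       # 512
dense_cost = ceil(log2(dense_index_pages)) + 1             # 9 + 1 data page

print(no_index_cost, sparse_cost, dense_cost)
```

Under these assumptions the search cost drops from 12 page reads to 7 with the sparse index, or 10 with the dense one.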

Multi-level Indices (1/2)
If access using the first-level index is still expensive, build a sparse index on the first-level index
– the result is a multi-level index
Fan-out: the index blocking factor
[Figure: first-level index and second-level index over records Paul, Anna, Matt, Tim, Carol, Rob]

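Multi-level Indices (2/2)
$2^{6}$ index records/page (fan-out), $2^{15}$ index records
1st level
– $2^{9}$ pages
2nd level
– $2^{9}$ index records
– $2^{3}$ pages
3rd level
– $2^{3}$ index records
– 1 page
The number of levels t is the smallest t with $2^{15} / (2^{6})^{t} \le 1$, so $t = \operatorname{ceil}(\log_{2^{6}} 2^{15}) = 3$
In general: $t = \operatorname{ceil}(\log_{fo}(\#\text{index-records}))$, where fo is the fan-out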
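The level count can be checked with a short sketch that builds the levels bottom-up. The function name `pages_per_level` is an assumption for illustration; integer ceiling division is used so no floating-point logarithm is involved.

```python
def pages_per_level(index_records, fan_out):
    """Return the page count of each index level, first level first.

    Assumes fan_out >= 2; each higher level holds one entry per page
    of the level below it, until a single page (the top level) remains.
    """
    levels = []
    entries = index_records
    while True:
        pages = -(-entries // fan_out)   # ceiling division
        levels.append(pages)
        if pages == 1:
            return levels
        entries = pages


levels = pages_per_level(2**15, 2**6)
print(levels, len(levels))
```

For the numbers on the slide this yields 512, 8 and 1 pages, i.e. three levels, matching the formula.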

Dynamic multi-level indices
So far we have assumed that indices are physically ordered files
– expensive insertions and deletions
Dynamic multi-level indices:
– B trees
– B+ trees

Tree-structured Indices
Node layout: $\langle P_1, K_1, \ldots, K_{i-1}, P_i, K_i, \ldots, K_{q-1}, P_q \rangle$
For each node: $K_1 < K_2 < \ldots < K_{q-1}$
For each value X in the subtree pointed to by $P_i$:
– $K_{i-1} < X < K_i$ for $1 < i < q$
– $X < K_i$ for $i = 1$
– $K_{i-1} < X$ for $i = q$

B tree
Problems with tree-structured indices: empty nodes, unbalanced trees
– solution: B trees

B tree: Definition
Each node: $\langle P_1, \langle K_1, Pr_1 \rangle, P_2, \langle K_2, Pr_2 \rangle, \ldots, \langle K_{q-1}, Pr_{q-1} \rangle, P_q \rangle$
– $P_i$: tree pointer, $K_i$: search value, $Pr_i$: data pointer
For each node: $K_1 < K_2 < \ldots < K_{q-1}$
For each value X in the subtree pointed to by $P_i$:
– $K_{i-1} < X < K_i$ for $1 < i < q$
– $X < K_i$ for $i = 1$
– $K_{i-1} < X$ for $i = q$
Each node has at most q tree pointers
– the B tree is of order q
Each node has at least ceil(q/2) tree pointers
– except for the root
An internal node with p pointers has p-1 values
All leaves are at the same level
– balanced tree

B tree: Example
[Figure: B tree with root values 5, 8 and leaves (1, 3), (6, 7), (9, 12); legend: tree pointer, data pointer, ø null pointer]

B+ tree
Most implementations of the B tree are B+ trees
Data pointers are stored only in the leaves
– more entries per internal node than in a regular B tree
– fewer internal nodes
– fewer levels
– faster access

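B+ tree: Definition
Internal nodes: $\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$
Leaf nodes: $\langle \langle K_1, Pr_1 \rangle, \langle K_2, Pr_2 \rangle, \ldots, \langle K_{q-1}, Pr_{q-1} \rangle, P_{next} \rangle$
– $Pr_i$ points to a data record or to a block of pointers to such records
– $P_{next}$ points to the next leaf node
Leaf nodes have their own order (the leaf order)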

B+ tree: Search
At each level, find the smallest $K_i$ larger than the search key
Follow the associated pointer $P_i$
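The search rule can be sketched with a small in-memory tree. The `Leaf` and `Internal` classes, the record IDs such as `"r30"` and the sample tree are assumptions for illustration; when no key in a node is larger than the search key, the last pointer $P_q$ is followed.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, rids):
        self.keys = keys          # sorted search key values
        self.rids = rids          # rids[i] is the record pointer for keys[i]


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # K_1 < K_2 < ... < K_{q-1}
        self.children = children  # P_1 ... P_q, one more than keys


def search(node, key):
    """Return the record pointer stored for key, or None if it is absent."""
    while isinstance(node, Internal):
        # Index of the smallest K_i larger than key (or q if there is none).
        i = bisect_right(node.keys, key)
        node = node.children[i]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.rids[i]
    return None


# Hypothetical two-level tree.
root = Internal(
    [30, 100],
    [
        Leaf([10, 20], ["r10", "r20"]),
        Leaf([30, 40], ["r30", "r40"]),
        Leaf([100, 120], ["r100", "r120"]),
    ],
)
print(search(root, 30), search(root, 25))
```

Each loop iteration corresponds to one page read, so the cost of a lookup equals the height of the tree.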

B+ tree: Insert
Nodes may overflow or underflow
Ignoring overflow and underflow, inserting a data record with search key value k:
– find the leaf node
– if k is found:
  add the record to the file
  create an indirect block if there isn't one
  add the record pointer to the indirect block
– if k is not found:
  add the data record to the file
  insert the record pointer in the leaf node (keeping all search keys in order)

B+ tree: Delete
Ignoring overflow and underflow:
Find the leaf node with search key value k
Find the data record pointer and delete the record
Delete the index record
– and the indirect block, if there is one and it is now empty

B+ tree: Simple Insert
[Figure: B+ tree, insert into a leaf]

B+ tree: Leaf Overflow (1/2)
[Figure: B+ tree before the insert; edge label k < 100]

B+ tree: Leaf Overflow (2/2)
The first ceil(n/2) entries stay in the existing node, the rest go to a new leaf node
Here n = 3 + 1 = 4
[Figure: B+ tree after the leaf split]
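The leaf split rule can be sketched as below. The function name `insert_into_leaf`, the default capacity of 3 and the sample keys are assumptions for illustration, and record pointers are left out for brevity. Returning the smallest key of the new leaf as the separator for the parent is the usual B+ tree convention, not something the slide text states.

```python
from bisect import insort
from math import ceil


def insert_into_leaf(keys, key, capacity=3):
    """Insert key into a sorted leaf and split the leaf if it overflows.

    Returns (left_keys, right_keys, separator). right_keys and separator
    are None when the leaf had room and no split was needed.
    """
    keys = list(keys)
    insort(keys, key)                 # keep the search keys in order
    if len(keys) <= capacity:
        return keys, None, None
    n = len(keys)                     # capacity + 1 entries
    keep = ceil(n / 2)                # first ceil(n/2) stay in place
    left, right = keys[:keep], keys[keep:]
    return left, right, right[0]      # separator to copy into the parent


print(insert_into_leaf([100, 120, 150], 130))
```

With a full leaf of three keys, n = 4, so two entries stay and two move to the new leaf.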

B+ tree: Internal Node Overflow (1/3)
Insert 210, insert …
[Figure: B+ tree before the inserts]

B+ tree: Internal Node Overflow (2/3)
Leaf split
[Figure: B+ tree after the leaf split; key 930 shown]

B+ tree: Internal Node Overflow (3/3)
[Figure: B+ tree after the internal node split; key 930 shown]

B+ tree: New Root (1/2)
Insert 210, insert …
[Figure: B+ tree before the inserts]

B+ tree: New Root (2/2)
[Figure: B+ tree with the new root]

Index Insert Exercise
Insert 8, 7, …

B+ tree: Delete
Simple delete case
Underflow case:
– redistribute records
– coalesce with siblings
– update parents
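The three delete outcomes can be sketched on a single leaf and its right sibling. This is a simplified illustration under stated assumptions: the function name `delete_from_leaf`, the minimum occupancy `min_keys=2` and borrowing exactly one key from the right sibling are all choices made for the example; a real implementation would also consider the left sibling and propagate changes up the tree.

```python
def delete_from_leaf(leaf, key, right_sibling, min_keys=2):
    """Delete key from a sorted leaf and repair an underflow if needed.

    Returns (case, leaf, right_sibling, new_separator):
    - "simple": the leaf still holds enough keys, the parent is unchanged
    - "redistribute": one key is borrowed from the sibling, and the parent
      separator must be updated to new_separator
    - "coalesce": both leaves are merged into one, and the parent must
      drop the separator and the pointer to the sibling
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= min_keys:
        return "simple", leaf, right_sibling, None
    if len(right_sibling) > min_keys:
        # The sibling can spare a key: move its smallest key over.
        leaf = leaf + [right_sibling[0]]
        right_sibling = right_sibling[1:]
        return "redistribute", leaf, right_sibling, right_sibling[0]
    # The sibling is at minimum occupancy too: merge the two leaves.
    return "coalesce", leaf + right_sibling, None, None


print(delete_from_leaf([10, 20], 20, [40, 50, 60]))
```

Redistribution is tried first because it only touches one separator in the parent, whereas coalescing removes an entry from the parent and may cause the parent to underflow in turn.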

B+ tree: Simple Delete (1/2)
[Figure: B+ tree before the delete]

B+ tree: Simple Delete (2/2)
Leaf updated
[Figure: B+ tree after the delete]

B+ tree: Delete Redistribution (1/2)
[Figure: B+ tree before the delete]

B+ tree: Delete Redistribution (2/2)
Redistribute entries
– with the left or right sibling

B+ tree: Delete Coalesce (1/4)
[Figure: B+ tree before the delete]

B+ tree: Delete Coalesce (2/4)
Leaf updated
No redistribution
– coalesce with sibling

B+ tree: Delete Coalesce (3/4)
Leaf updated
No redistribution
– coalesce with sibling

B+ tree: Delete Coalesce (4/4)
Redistribution

Hashing Techniques

Static Hashing (1/2)
Store records in buckets with overflow chains
Allocate a fixed number of buckets M
Problems:
– small M: long overflow chains, slow search, delete and insert
[Figure: hash function h mapping keys to buckets with overflow chains ending in null]

Static Hashing (2/2)
Problems:
– large M: wasted space, slow scan
[Figure: hash function h mapping keys to mostly empty buckets]
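The trade-off in choosing M can be seen in a small sketch of static hashing with overflow chains. The class name `StaticHashFile`, the page capacity of 2 and the hash function `key % M` on integer keys are assumptions for illustration; `search` reports how many pages of the chain had to be read.

```python
class StaticHashFile:
    def __init__(self, num_buckets, page_capacity=2):
        self.M = num_buckets
        self.page_capacity = page_capacity
        # Each bucket is a chain of pages; page 0 is the primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[key % self.M]   # h(key) = key mod M

    def insert(self, key):
        chain = self._chain(key)
        if len(chain[-1]) == self.page_capacity:
            chain.append([])                # allocate an overflow page
        chain[-1].append(key)

    def search(self, key):
        """Return the number of pages read to find key, or None if absent."""
        for pages_read, page in enumerate(self._chain(key), start=1):
            if key in page:
                return pages_read
        return None


small, large = StaticHashFile(2), StaticHashFile(8)
for k in (0, 2, 4, 6, 8):
    small.insert(k)
    large.insert(k)
print(small.search(8), large.search(8))
```

With M = 2 the key 8 sits at the end of a three-page chain, while with M = 8 it is found in the primary page, at the price of several buckets that stay empty.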

Dynamic Hashing
Split and coalesce buckets as the database grows and shrinks
One scheme: Extendible Hashing
The hash function generates large values, e.g. 32 bits
– use i bits, and change i as the database size changes
If a bucket overflows, double the number of buckets
– use i+1 bits of the hash function
– but this is expensive: read all M pages and distribute the records over 2*M pages
Solution: use a directory and double the size of the directory
– only split the bucket that overflowed

Extendible Hashing (1/4)
[Figure: directory pointing to buckets A, B, C, D; h(18) shown]

Extendible Hashing (2/4)
[Figure: directory and buckets A, B, C, D; h(4) shown]

Extendible Hashing (3/4)
[Figure: directory and buckets A, B, C, D and A1]

Extendible Hashing (4/4)
[Figure: directory and buckets A, B, C, D with global depth and local depth marked]
If a bucket is full:
– split the bucket
– increment its local depth (LD)
If GD = LD (the global depth equals the bucket's local depth):
– increment GD
– double the directory
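The split and doubling rules can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the class names, the bucket capacity of 2, distinct integer keys and the use of the low-order bits (`key % 2**GD`) as the directory slot are all choices made for the example, not details taken from the figures.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity          # keys per bucket (assumed)
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        return key % (2 ** self.global_depth)   # low-order GD bits

    def search(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # GD = LD: double the directory; slot j + 2^GD mirrors slot j.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split only the bucket that overflowed and increment its LD.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for slot, b in enumerate(self.directory):
            if b is bucket and slot & high_bit:
                self.directory[slot] = image
        # Redistribute the old keys between the bucket and its split image.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        self.insert(key)                  # retry; may trigger another split


table = ExtendibleHash()
for k in (2, 6, 10, 1, 5):
    table.insert(k)
print(table.global_depth, len(table.directory))
```

In this run the keys 2, 6 and 10 collide on their low-order bits, so the directory doubles twice before they separate, while the bucket holding 1 and 5 is never split and keeps its local depth of 1.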

Extendible Hashing: Delete
If a deletion makes a bucket empty
– merge it with its split image
If every directory pointer points to the same bucket as its split image
– the directory is halved

Extendible Hashing: Summary
Avoids overflow pages
The directory can get large
A key search requires just 2 page reads
Space utilization fluctuates
– 59-90% for uniformly distributed records

Extendible Hashing: Exercise
Initially GD = LD = 1
M = 2 buckets
Hash function: $h(k) = k \bmod 2^{i}$
Inserts: 14, 18, 22, 3, 9
Deletes: 9, 22, …