12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections 12.1-12.4, 12.6-12.8, 12.10 Problems 12.1-12.4, 12.7, 12.8, 12.13, 12.15, 12.18.

Presentation transcript:

12.1 Chapter 12: Indexing and Hashing
Spring 2009
Sections 12.1-12.4, 12.6-12.8, 12.10
Problems 12.1-12.4, 12.7, 12.8, 12.13, 12.15, 12.18

12.2 Basic Concepts
Indexing: used to speed up access to data
Search key: attribute or set of attributes used to look up records in a file
Index file: records of the form (search-key, pointer)
Two kinds of indices
• ordered: search keys are stored in some order
• hash: search keys are distributed uniformly across “buckets” using a hash function
Evaluation criteria
• access types supported efficiently, e.g.,
  – records with a specified value in the attribute
  – records with a value falling in a specified range of values
• record access, insertion, and deletion times
• index overhead

12.3 Ordered Indices
Index entries are sorted on the search-key value
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file
• its search key is often the primary key
Secondary index: an index whose search key specifies an order different from the file's sequential order
Dense index:

12.4 A Sparse Index
How do we insert/delete records when there is an index?
• E.g., insert a record for Othertown?
• E.g., delete the A-110 record? The A-215 record?
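A minimal sketch of a lookup through a sparse index, assuming one index entry per block and a file sorted on the search key. The block contents, the account numbers, and the names `blocks`, `sparse_index`, and `lookup` are hypothetical and only for illustration; the sketch also assumes that all records for a key start in the block whose index entry is chosen.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of (branch_name, account_number) records,
# stored in search-key order (sample values, not taken from the slides' figure).
blocks = [
    [("Brighton", "A-217"), ("Downtown", "A-101"), ("Downtown", "A-110")],
    [("Mianus", "A-215"), ("Perryridge", "A-102"), ("Perryridge", "A-201")],
    [("Redwood", "A-222"), ("Round Hill", "A-305")],
]

# Sparse index: one (search-key, block number) entry per block,
# holding the first search key stored in that block.
sparse_index = [(block[0][0], i) for i, block in enumerate(blocks)]
index_keys = [key for key, _ in sparse_index]


def lookup(key):
    """Return the account numbers of all records whose search key equals key."""
    # Find the index entry with the largest search key <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return []  # key sorts before every key in the file
    results = []
    # Scan the file sequentially from the block that entry points to.
    for block in blocks[sparse_index[pos][1]:]:
        for record_key, account in block:
            if record_key == key:
                results.append(account)
            elif record_key > key:
                return results  # passed the key; the file is sorted
    return results
```

Calling `lookup("Othertown")` returns an empty list: the index entry for Mianus is chosen, and the scan stops at Perryridge, which is also where a new Othertown record would have to go.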

12.5 Multilevel Index
If the primary index does not fit in memory, access becomes expensive
Use a sparse index on top of the (dense) primary index to reduce the number of disk accesses
• outer index: a sparse index of the primary index
• inner index: the primary index file
Store the outer index in main memory
Insertion/deletion?

12.6 Secondary Indices
Used to search on some attribute other than the primary key
• E.g., the balance field of account
Secondary indices have to be dense (the file is not sorted on their search key)

12.7 B+-Tree Index Files
Problems with indexed-sequential files:
• performance degrades as the file grows (many overflow blocks)
• periodic reorganization of the entire file is required
Typical node (size n): P_1 K_1 P_2 K_2 ... P_{n–1} K_{n–1} P_n
• K_i: search-key values (ordered within a node)
• P_i: pointers to children (for non-leaf nodes) or to buckets of records (for leaf nodes)

12.8 B+-Tree Index Files
Properties
• all paths from the root to a leaf are of the same length
• the root node has between 2 and n children
• non-leaf nodes other than the root have between ⌈n/2⌉ and n children (pointers)
• leaf nodes have between ⌈(n–1)/2⌉ and n–1 values
• insertions and deletions are done in logarithmic time
Automatic reorganization with small, local changes
Note: ⌈n/2⌉ is the smallest integer ≥ n/2

12.9 Non-Leaf Nodes in B+-Trees
Non-leaf nodes form a multi-level sparse index on the leaf nodes
Properties:
• all the search keys in the subtree to which P_1 points are less than K_1
• for 2 ≤ i ≤ n – 1, all the search keys in the subtree to which P_i points have values greater than or equal to K_{i–1} and less than K_i
• P_n points to search keys with values ≥ K_{n–1}
E.g., for n = 3 the components are P_1 K_1 P_2 K_2 P_3
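The pointer-selection rule for a non-leaf node can be sketched as a binary search over the node's keys. This is only an illustration: the function name `child_index` and the sample keys are assumptions, and pointers are numbered from 0 here, so a result of 0 means P_1.

```python
from bisect import bisect_right


def child_index(keys, search_key):
    """Pick which pointer of a non-leaf node to follow.

    keys holds K_1 .. K_{m-1} in sorted order. The result is the 0-based
    position of the pointer: 0 stands for P_1, 1 for P_2, and so on.
    """
    # Keys strictly less than K_1 go to P_1; keys in [K_{i-1}, K_i) go to P_i;
    # keys >= the last key go to the last pointer. Counting how many node keys
    # are <= search_key gives exactly that position.
    return bisect_right(keys, search_key)


# Hypothetical non-leaf node with n = 3: P_1 "Mianus" P_2 "Redwood" P_3
node_keys = ["Mianus", "Redwood"]
```

With these keys, a search for Brighton follows P_1, Mianus and Perryridge follow P_2, and Redwood or Round Hill follow P_3.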

12.10 Leaf Nodes in B+-Trees
For i = 1, 2, ..., n–1, pointer P_i points to either
• a file record with search-key value K_i, or
• a bucket of pointers to file records, each record having search-key value K_i (the bucket structure is needed only if the search key is not a primary key)

12.11 B+-Tree
B+-tree for account (n = 3)
The root has at least 2 children
Other non-leaf nodes have between 2 and 3 children (⌈n/2⌉ and n)
Leaf nodes have between 1 and 2 values (⌈(n–1)/2⌉ and n–1)
Queries: how would you find Downtown and Round Hill?

12.12 B+-Tree with n = 5
Leaf nodes have between 2 and 4 values (⌈(n–1)/2⌉ and n–1, with n = 5)
Non-leaf nodes other than the root have between 3 and 5 children (⌈n/2⌉ and n, with n = 5)
The root has at least 2 children
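The occupancy bounds for a given n can be checked with a few lines of arithmetic. A small sketch, where the function name `occupancy_bounds` and the dictionary layout are assumptions made for illustration:

```python
import math


def occupancy_bounds(n):
    """Return (min, max) occupancy for the node types of a B+-tree with node size n."""
    return {
        # non-leaf nodes other than the root: ceil(n/2) .. n children
        "non_leaf_children": (math.ceil(n / 2), n),
        # leaf nodes: ceil((n-1)/2) .. n-1 search-key values
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        # root (when it is not a leaf): 2 .. n children
        "root_children": (2, n),
    }


bounds_n3 = occupancy_bounds(3)
bounds_n5 = occupancy_bounds(5)
```

For n = 3 this gives 2 to 3 children per non-leaf node and 1 to 2 values per leaf; for n = 5 it gives 3 to 5 children and 2 to 4 values.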

12.13 Efficiency of Queries on B+-Trees
Processing a query: traverse a path from the root to a leaf node
• with K search-key values in the file, the path is no longer than ⌈log_⌈n/2⌉(K)⌉
A node is generally the same size as a disk block
With 1 million search-key values and n = 100, at most ⌈log_50(1,000,000)⌉ = 4 nodes are accessed in a lookup
Balanced binary tree from CS 132: about 20 nodes are accessed in a lookup
• the difference is significant, since every node access may need a disk I/O
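The two path lengths quoted above can be reproduced with integer arithmetic. A small sketch, assuming the worst case in which every node has only the minimum fan-out; the function name `levels_needed` is made up for this example:

```python
def levels_needed(num_keys, fanout):
    """Smallest h with fanout**h >= num_keys, i.e. ceil(log_fanout(num_keys)).

    Computed with integers to avoid floating-point rounding in the logarithm.
    """
    height, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        height += 1
    return height


n = 100
min_fanout = -(-n // 2)  # ceil(n/2) = 50, the minimum number of children per node

bplus_tree_path = levels_needed(1_000_000, min_fanout)  # B+-tree with n = 100
binary_tree_path = levels_needed(1_000_000, 2)          # balanced binary tree
```

Here `bplus_tree_path` is 4 (50^3 = 125,000 is too small, 50^4 = 6,250,000 suffices) and `binary_tree_path` is 20 (2^20 = 1,048,576).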

12.14 Insertion in B+-Trees
A record for Perryridge?
• follow the tree and add the record to the bucket
A record for Othertown?
• put it to the right of Mianus and add the record to the database
A record for Clearview?
• we need to add a new node

12.15 Insertion in B+-Trees
Splitting a node:
• take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first ⌈n/2⌉ in the original node, and the rest in a new node.
• let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split. If the parent is full, split it and propagate the split further up.
The splitting proceeds upwards until a node that is not full is found
In the worst case the root node is split, increasing the height of the tree by 1
Figure: result of inserting Clearview into the node containing Brighton and Downtown. Now there must be an entry for Downtown in the next level up
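The leaf-split step can be sketched as follows. This covers only the split of a single leaf, not the propagation into the parent; the function name `split_leaf` and the record pointers `"r1"`, `"r2"`, `"r3"` are hypothetical.

```python
import math


def split_leaf(pairs, new_pair, n):
    """Split a full leaf while inserting new_pair.

    pairs holds the n-1 (search-key value, pointer) pairs already in the leaf.
    Returns (original_node, new_node, k), where k is the least key in the new
    node and (k, new_node) is the entry to insert into the parent.
    """
    all_pairs = sorted(pairs + [new_pair], key=lambda pair: pair[0])
    if len(all_pairs) != n:
        raise ValueError("expected a full leaf (n-1 pairs) plus the new pair")
    cut = math.ceil(n / 2)          # the first ceil(n/2) pairs stay in place
    original_node = all_pairs[:cut]
    new_node = all_pairs[cut:]      # the rest move to the new node
    return original_node, new_node, new_node[0][0]


# Inserting Clearview into the leaf holding Brighton and Downtown (n = 3)
left, right, k = split_leaf(
    [("Brighton", "r1"), ("Downtown", "r2")], ("Clearview", "r3"), 3
)
```

The original node keeps Brighton and Clearview, the new node holds Downtown, and `k` is Downtown, which is the key that has to appear one level up.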

12.16 Insertion in B+-Trees
Figure: B+-tree before and after inserting “Clearview”
Now try: “Dashfield”

12.17 Deletion in B+-Trees
Find the record to be deleted and remove it from the main file and from the bucket (if present)
Remove (search-key value, pointer) from the leaf node if there is no bucket or if the bucket has become empty
If the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then
• insert all the search-key values in the two nodes into a single node (the one on the left), and delete the other node
• delete the pair (K_{i–1}, P_i), where P_i is the pointer to the deleted node, from its parent, recursively using the above procedure
Otherwise, if the node has too few entries due to the removal, and the entries in the node and a sibling do not fit into a single node, then
• redistribute the pointers between the node and a sibling
• update the corresponding search-key value in the node's parent
Deletions cascade upwards until a node with ⌈n/2⌉ or more pointers is found
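The choice between merging and redistributing after a removal depends only on whether the two nodes' entries fit into one node. A minimal sketch of that decision, with hypothetical names (`rebalance_action` and its parameters) and entry counts passed in directly rather than read from real nodes:

```python
def rebalance_action(node_entries, sibling_entries, min_entries, max_entries):
    """Decide what to do with a node after one of its entries was removed.

    Counts are pointers for non-leaf nodes and values for leaf nodes;
    min_entries and max_entries are the occupancy bounds for that node type.
    """
    if node_entries >= min_entries:
        return "none"          # node is still sufficiently full
    if node_entries + sibling_entries <= max_entries:
        return "merge"         # both nodes fit into one: coalesce them
    return "redistribute"      # too many to merge: borrow from the sibling


# Non-leaf nodes with n = 3 must hold between 2 and 3 pointers.
merge_case = rebalance_action(1, 2, min_entries=2, max_entries=3)
borrow_case = rebalance_action(1, 3, min_entries=2, max_entries=3)
```

Here `merge_case` is `"merge"` (1 + 2 pointers fit in one node) and `borrow_case` is `"redistribute"` (1 + 3 pointers do not), so a merge can remove an entry from the parent and cascade upwards, while a redistribution only changes a key in the parent.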

12.18 Examples of B+-Tree Deletion
Figure: B+-tree before and after deleting “Downtown”
Removing the leaf node containing “Downtown” did not leave its parent with too few pointers, so the cascaded deletions did not go beyond the parent.

12.19 Examples of B+-Tree Deletion (Cont.)
Delete “Perryridge”
The node with “Perryridge” becomes underfull (empty) and is merged with its sibling
As a result, the “Perryridge” node's parent became underfull and was merged with its sibling (and an entry was deleted from their parent)
The root node then had only one child and was deleted

12.20 Example of B+-Tree Deletion (Cont.)
Delete “Perryridge” from the earlier example
The parent of the leaf containing Perryridge became underfull and borrowed a pointer from its left sibling
The search-key value in the parent's parent changes as a result

12.21 B+-Tree File Organization
Index file degradation is addressed using B+-tree indices
Data file degradation is addressed using B+-tree file organization
Leaf nodes in a B+-tree file organization store records, instead of pointers
Records use more space than pointers
Try to keep at least entries in each sibling (data) node

12.22 B-Tree Index File
Similar to a B+-tree, but search-key values appear only once
Figure: B-tree, and the B+-tree on the same data

12.23 B-Tree Index Files (Cont.)
Advantages:
• fewer tree nodes
• may find a search-key value before reaching a leaf node
Disadvantages:
• only a small fraction of all search-key values are found early
• non-leaf nodes are larger, so n is smaller and the B-tree is deeper
• insertion and deletion are more complicated
• implementation is harder
Typically, the advantages of B-trees do not outweigh the disadvantages

12.24 Static Hashing
Bucket: unit of storage containing one or more records (typically a disk block)
Hash file organization: obtain the bucket of a record directly from its search-key value using a hash function
Hash function: h(K) = B
• K is a search-key value, B is a bucket address
Used to locate records for access, insertion, and deletion
If records with different search-key values are mapped to the same bucket, search the bucket sequentially to locate a record

12.25 Examples of Hash File Organization
Assume 10 buckets
Let a → 1, b → 2, ...
Method 1: h(k) returns the representation of the first letter in k, mod 10
• E.g., h(Perryridge) = 6, h(Brighton) = 2
• Is this a good hash function?
Method 2: h(k) returns the sum of the representations of the characters in k, mod 10
• E.g., h(Perryridge) = 5, h(Brighton) = 3 (taking each letter mod 10: B → 2, r → 8, i → 9, g → 7, h → 8, t → 0, o → 5, n → 4; the sum is 43, and 43 mod 10 = 3)
An ideal hash function is
• uniform: each bucket is assigned the same number of search-key values from the set of all possible values
• random: each bucket has about the same number of records, irrespective of the actual distribution of search-key values
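Both methods are easy to try out. A short sketch, assuming ASCII letters only and ignoring any non-letter characters such as the space in “Round Hill”; the function names are made up for this example:

```python
def letter_value(ch):
    """Map a -> 1, b -> 2, ..., z -> 26 (case-insensitive, ASCII letters only)."""
    return ord(ch.lower()) - ord("a") + 1


def h_first_letter(key, buckets=10):
    """Method 1: value of the first letter, mod the number of buckets."""
    return letter_value(key[0]) % buckets


def h_letter_sum(key, buckets=10):
    """Method 2: sum of the letter values, mod the number of buckets."""
    return sum(letter_value(ch) for ch in key if ch.isalpha()) % buckets


method1 = {name: h_first_letter(name) for name in ("Perryridge", "Brighton")}
method2 = {name: h_letter_sum(name) for name in ("Perryridge", "Brighton")}
```

The results agree with the slide: method 1 gives 6 and 2, method 2 gives 5 and 3. Method 1 sends every key with the same first letter to the same bucket, which is one reason to doubt that it is a good hash function.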

12.26 Example of Hash File Organization
Figure: hash file for account, using branch-name as the key and method 2

12.27 Handling Bucket Overflows
Overflow chaining: the overflow buckets of a given bucket are chained together in a linked list
This scheme is called closed hashing
• An alternative, open hashing (the data indexed by the hash goes in the next available slot), is not suitable for databases
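Overflow chaining can be sketched with a bucket class that links to an overflow bucket once it is full. This is an in-memory illustration only; the class name `Bucket`, its capacity parameter, and the sample records are assumptions, and a real system would chain disk blocks instead.

```python
class Bucket:
    """A fixed-capacity bucket with a linked chain of overflow buckets."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []      # (search-key, data) pairs stored in this bucket
        self.overflow = None   # next overflow bucket in the chain, if any

    def insert(self, record):
        bucket = self
        # Walk the chain to the first bucket with free space,
        # appending a new overflow bucket when the chain is exhausted.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(bucket.capacity)
            bucket = bucket.overflow
        bucket.records.append(record)

    def search(self, key):
        # The whole chain has to be searched sequentially.
        found, bucket = [], self
        while bucket is not None:
            found.extend(data for k, data in bucket.records if k == key)
            bucket = bucket.overflow
        return found

    def chain_length(self):
        length, bucket = 0, self
        while bucket is not None:
            length += 1
            bucket = bucket.overflow
        return length


bucket = Bucket(capacity=2)
for record in [("Perryridge", "A-102"), ("Perryridge", "A-201"),
               ("Perryridge", "A-218"), ("Mianus", "A-215"), ("Redwood", "A-222")]:
    bucket.insert(record)
```

Five records in a bucket of capacity 2 produce a chain of three buckets, and a search for Mianus has to walk into the first overflow bucket, which is how long chains degrade performance.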

12.28 Hash Indices
Hashing can be used both for file organization and to create an index
This is a secondary index (not on the primary key)

12.29 Deficiencies of Static Hashing
Hash function h maps search keys to a fixed set of bucket addresses
• databases grow with time; if the initial number of buckets is too small, performance will degrade due to overflows
• if the file size at some point in the future is anticipated and the number of buckets is allocated accordingly, a significant amount of space will be wasted initially
• if the database shrinks, space will be wasted
Expensive option: periodic reorganization of the file with a new hash function
There are also techniques that allow a dynamic number of buckets
• good for databases that grow and shrink in size (we will skip these)
Ordered Indexing versus Hashing
Hashing is usually better at retrieving records with a specified key value
Ordered indices are preferred if range queries are common
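The difference between the two kinds of index shows up as soon as a range query is asked. A small in-memory sketch, using a Python dict as a stand-in for a hash index and a sorted list as a stand-in for an ordered index; the account data and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (account_number, balance) records
accounts = [("A-101", 500), ("A-102", 400), ("A-110", 600),
            ("A-215", 700), ("A-217", 750)]

# Hash-style index on balance: good for "balance = v"
hash_index = {}
for account, balance in accounts:
    hash_index.setdefault(balance, []).append(account)

# Ordered index on balance: (balance, account) entries kept in sorted order
ordered_index = sorted((balance, account) for account, balance in accounts)
ordered_keys = [balance for balance, _ in ordered_index]


def point_lookup(balance):
    """Equality lookup: one probe into the hash index."""
    return hash_index.get(balance, [])


def range_lookup(low, high):
    """Range lookup: locate both ends in the ordered index, then read the run."""
    start = bisect_left(ordered_keys, low)
    end = bisect_right(ordered_keys, high)
    return [account for _, account in ordered_index[start:end]]
```

`point_lookup(600)` needs a single probe, while `range_lookup(500, 700)` reads one contiguous run of the ordered index. Answering the same range from the hash index would mean probing every possible balance value or scanning all buckets, since hashing scatters neighbouring keys.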

12.30 Index Definition in SQL
Create an index
  create index <index-name> on <relation-name> (<attribute-list>)
E.g.
  create index b-index on branch(branch-name)
  create index b-index using btree on branch(branch-name)
  create index b-index using hash on branch(branch-name)
To drop an index
  drop index <index-name>
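The create and drop statements can be tried against an in-memory SQLite database from Python's standard library. This is a sketch under a few assumptions: the branch table's columns are made up for the example, hyphens in names are replaced by underscores because unquoted SQL identifiers cannot contain hyphens, and the `using btree` / `using hash` clauses are left out because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical branch relation
conn.execute(
    "create table branch (branch_name text, branch_city text, assets integer)"
)

# create index <index-name> on <relation-name> (<attribute-list>)
conn.execute("create index b_index on branch(branch_name)")


def index_names(connection):
    """List the indices recorded in SQLite's catalog table."""
    rows = connection.execute(
        "select name from sqlite_master where type = 'index'"
    )
    return [row[0] for row in rows]


created = index_names(conn)

# drop index <index-name>
conn.execute("drop index b_index")
remaining = index_names(conn)
```

After the create statement the catalog lists `b_index`; after the drop statement the list is empty again.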