Yan Huang - CSCI5330 Database Implementation – Access Methods


This is a modified version of Prof. Hector Garcia-Molina's slides. All copyrights belong to the original author.
1/14/2005, Yan Huang - CSCI5330 Database Implementation – Access Methods

Basic Concepts
- Search key: the set of attributes used to look up records in a file.
- An index maps a search-key value to a pointer, and the pointer leads to the record.
[Figure: value -> (search key, pointer) -> record]

Index Evaluation Metrics
- Access types supported efficiently, e.g.:
  - Point query: find "Tom"
  - Range query: find students whose age is between 20 and 40
- Access time
- Update time
- Space overhead

Ordered Indices
- In an ordered index, index entries are stored sorted on the search-key value.
- E.g., the author catalog in a library.

Primary index
- Also called a clustering index: the file is stored in the same order as the index's search key.
- The search key of a primary index is usually, but not necessarily, the primary key.
[Figure: index entries 10, 30, 50, 70, 90, ... pointing into a sequential file sorted on the search key (10, 20, 30, ..., 100).]

Secondary index
- Also called a non-clustering index: the file is stored in a different order from the index's search key.
[Figure: index entries 10, 20, 30, 40, ... pointing into a file whose records are not sorted on that key.]

Dense Index
- A dense index contains an index record for every search-key value.
[Figure: dense index entries 10, 20, 30, 40, ... each pointing to the matching record of a sequential file 10, 20, ..., 100.]

Sparse Index
- A sparse index contains index records for only some search-key values.
- Applicable when records are sequentially ordered on the search key.
[Figure: sparse index entries 10, 30, 50, 70, 90, ... each pointing to the first record of a block of the sequential file 10, 20, ..., 100.]
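A sparse-index lookup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file is modelled as a list of two-record blocks matching the figure, and the names `blocks`, `sparse_index`, and `lookup` are hypothetical, not part of the original slides.

```python
from bisect import bisect_right

# Sequential file: blocks of records sorted on the search key (assumed layout).
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [blk[0] for blk in blocks]

def lookup(key):
    """Return the block number holding key, or None if the key is absent."""
    # Find the last index entry whose key is <= the search key.
    pos = bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None              # key is smaller than every key in the file
    # Only this one block has to be scanned.
    return pos if key in blocks[pos] else None
```

The lookup works only because the file is sorted on the search key; without that order, the block chosen by the index would say nothing about where the record is.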

Secondary indexes
- A sparse index on a secondary field does not make sense! The file is ordered on its sequence field, not on the secondary key, so records that have no index entry cannot be located.
[Figure: a sparse index (20, 30, 80, 100, 90, ...) over a file that is not sorted on the indexed field.]

Multilevel Index
- A sparse second-level index is built on top of the first-level index.
[Figure: sparse 2nd level (10, 90, 170, 250, 330, 410, 490, 570) -> 1st level (10, 30, 50, 70, 90, 110, ...) -> sequential file (10, 20, ..., 100).]

Multilevel Index: Secondary indexes
- The lowest level is dense.
- The other levels are sparse.
[Figure: sparse high level (10, 50, 90, ...) -> dense lowest level (10, 20, 30, 40, ...) -> file ordered on its sequence field, not on the indexed key.]

Conventional indexes
Advantages:
- Simple
- The index is a sequential file, which is good for scans
Disadvantage:
- Inserts are expensive

Outline:
- Conventional indexes
- B+-Tree (NEXT)

NEXT: Another type of index
- Give up on sequentiality of the index
- Try to get "balance"

B+Tree Example, n=4
[Figure: root with key 100; non-leaf nodes with keys 30 and 120, 150, 180; leaves 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200.]

Sample non-leaf node with keys 57, 81, 95
- 1st pointer: to keys k < 57
- 2nd pointer: to keys 57 <= k < 81
- 3rd pointer: to keys 81 <= k < 95
- 4th pointer: to keys k >= 95
A key is moved (not copied) from a lower-level non-leaf node to an upper-level non-leaf node.
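The routing rule of a non-leaf node can be sketched with a binary search. This is a minimal illustration, assuming the node's keys are held in a sorted Python list; the function name `child_index` is hypothetical.

```python
from bisect import bisect_right

def child_index(keys, k):
    """Return which pointer (0-based) of a non-leaf node to follow for key k.

    Pointer j leads to keys in [keys[j-1], keys[j]); pointer 0 leads to keys
    below keys[0], and the last pointer leads to keys >= keys[-1].
    """
    return bisect_right(keys, k)

node_keys = [57, 81, 95]
```

With `node_keys` as above, a search for 80 follows pointer 1 and a search for 95 follows pointer 3, matching the four ranges listed on the slide.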

Sample leaf node with keys 57, 81, 95
- Reached from a non-leaf node
- One pointer to the record with key 57
- One pointer to the record with key 81
- One pointer to the record with key 95
- Last pointer: to the next leaf in sequence
A key is copied (not moved) from a leaf node to a non-leaf node.

[Figure: drawing notation for nodes, showing a leaf with keys 30, 35 and a non-leaf with key 30.]

Size of nodes:
- n pointers
- n-1 keys

Don't want nodes to be too empty
Use at least:
- Root: 2 pointers
- Non-leaf: ⌈n/2⌉ pointers
- Leaf: ⌈(n-1)/2⌉ keys

Full and minimum nodes (n=4)

            Full node      Min. node
Non-leaf    120 150 180    30
Leaf        3 5 11         30 35

In a leaf, the sequence pointer counts as a pointer even if it is null.

B+tree rules (tree of order n)
(1) All leaves are at the same lowest level (balanced tree).
(2) Pointers in leaves point to records, except for the "sequence pointer".

(3) Number of pointers/keys for a B+tree

                      Max ptrs   Max keys   Min ptrs->data   Min keys
Non-leaf (non-root)   n          n-1        ⌈n/2⌉            ⌈n/2⌉-1
Leaf (non-root)       n          n-1        ⌈(n-1)/2⌉        ⌈(n-1)/2⌉
Root                  n          n-1        2                1
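The occupancy table can be checked numerically. The sketch below assumes the minimums are rounded up (ceilings), which agrees with the n=4 full/min node example; the function name `bplus_limits` and the dictionary layout are illustrative only.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def bplus_limits(n):
    """Pointer/key limits for a B+tree whose nodes hold n pointers and n-1 keys."""
    return {
        "non_leaf": {"max_ptrs": n, "max_keys": n - 1,
                     "min_ptrs": ceil_div(n, 2), "min_keys": ceil_div(n, 2) - 1},
        # For a leaf, min_ptrs counts pointers to data only (not the sequence pointer).
        "leaf": {"max_ptrs": n, "max_keys": n - 1,
                 "min_ptrs": ceil_div(n - 1, 2), "min_keys": ceil_div(n - 1, 2)},
        "root": {"max_ptrs": n, "max_keys": n - 1, "min_ptrs": 2, "min_keys": 1},
    }

limits = bplus_limits(4)
```

For n=4 this gives a minimum of 1 key in a non-leaf node and 2 keys in a leaf, the same as the "min. node" drawings (30 and 30 35).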

Insert into B+tree
(a) Simple case: space available in leaf
(b) Leaf overflow
(c) Non-leaf overflow
(d) New root

(a) Insert key = 32, n=4
[Figure: root 100; non-leaf 30; leaves 3 5 11 | 30 31. The leaf 30 31 has room, so 32 is added to it: 30 31 32.]

(b) Insert key = 7, n=4
[Figure: root 100; non-leaf 30; leaves 3 5 11 | 30 31. The leaf 3 5 11 is full, so it splits into 3 5 and 7 11, and key 7 is copied into the parent, which becomes 7 30.]
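Cases (a) and (b) can be sketched as a single leaf-level routine. This is a simplified illustration, not the full insertion algorithm: it handles only one leaf, represents the leaf as a sorted list of keys, and leaves updating the parent to the caller. The name `insert_into_leaf` is hypothetical.

```python
from bisect import insort

def insert_into_leaf(keys, key, n):
    """Insert key into a leaf that can hold at most n-1 keys.

    Returns (left, right, separator). When no split is needed, right and
    separator are None. On overflow the leaf is split in two and the first
    key of the right half is the separator to copy into the parent.
    """
    keys = list(keys)
    insort(keys, key)                 # keep the leaf sorted
    if len(keys) <= n - 1:
        return keys, None, None       # case (a): space available
    mid = (len(keys) + 1) // 2        # left half keeps the extra key, if any
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]      # case (b): separator is copied, not moved

simple = insert_into_leaf([30, 31], 32, n=4)
overflow = insert_into_leaf([3, 5, 11], 7, n=4)
```

Here `overflow` reproduces example (b): the leaf becomes 3 5 and 7 11, and 7 is the key handed to the parent.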

(c) Insert key = 160, n=4
[Figure: root 100; non-leaf 120 150 180; leaves 150 156 179 | 180 200. The leaf 150 156 179 is full and splits into 150 156 and 160 179. The non-leaf 120 150 180 is also full, so it splits into 120 150 and 180, and key 160 moves up to the root, which becomes 100 160.]

(d) New root, insert 45, n=4
[Figure: root 10 20 30; leaves 1 2 3 | 10 12 | 20 25 | 30 32 40. The leaf 30 32 40 splits into 30 32 and 40 45. The old root overflows and splits into 10 20 and 40, and key 30 moves up into a new root.]

Deletion from B+tree
(a) Simple case (no example)
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

(b) Coalesce with sibling: delete 50, n=5
[Figure: parent 10 40 100; leaves 10 20 30 | 40 50. After 50 is deleted, the leaf holding only 40 is too empty, so 40 is merged into the sibling (10 20 30 40) and key 40 is removed from the parent.]

(c) Redistribute keys: delete 50, n=5
[Figure: parent 10 40 100; leaves 10 20 30 35 | 40 50. After 50 is deleted, 35 is moved from the full sibling to the underfull leaf (10 20 30 | 35 40), and the parent key 40 is replaced by 35.]

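(d) Non-leaf coalesce: delete 37, n=5
[Figure: root 25; non-leaf nodes 10 20 and 30 40; leaves 1 3 | 10 14 | 20 22 | 25 26 | 30 37 | 40 45. After 37 is deleted, the underfull leaf is coalesced with a sibling. The parent non-leaf then becomes too empty and is coalesced with its sibling, pulling key 25 down from the old root; the merged node becomes the new root.]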

B+tree deletions in practice
- Often, coalescing is not implemented
- Too hard and not worth it!

Index Definition in SQL
Create an index:
    create index <index-name> on <relation-name> (<attribute-list>)
    E.g.: create index gindex on country(gdp);
Drop an index:
    drop index <index-name>
    E.g.: drop index gindex;
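The two statements can be tried out with Python's built-in `sqlite3` module. This is a small sketch under assumptions: the slides do not name a DBMS, SQLite is used only because it needs no setup, and the column types of `country` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; only the relation and attribute names come from the slide.
conn.execute("create table country (name text, gdp real)")

def index_names(connection):
    """List the names of user-created indexes recorded in SQLite's catalog."""
    rows = connection.execute(
        "select name from sqlite_master where type = 'index'"
    ).fetchall()
    return [row[0] for row in rows]

conn.execute("create index gindex on country(gdp)")
after_create = index_names(conn)

conn.execute("drop index gindex")
after_drop = index_names(conn)
```

Other systems accept the same basic syntax but differ in options such as index type, so the exact form should be checked against the documentation of the DBMS in use.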

Multi-key Index
Motivation: find records where DEPT = "Toy" AND SAL > 50k

Strategy I:
- Use one index, say on Dept.
- Get all Dept = "Toy" records and check their salary.
[Figure: a single index I1.]

Strategy II:
- Use 2 indexes; manipulate pointers.
[Figure: the pointers found for Dept = "Toy" are intersected with the pointers found for Sal > 50k.]

Strategy III: Multiple Key Index
One idea:
[Figure: a first-level index I1 whose entries point to second-level indexes I2, I3, ...]

Example
[Figure: a Dept index with entries Art, Sales, Toy; each entry points to a Salary index for that department (e.g. 10k 15k 17k 21k and 12k 15k 15k 19k). Example record: Name=Joe, DEPT=Sales, SAL=15k.]

For which queries is this index good?
- Find RECs with Dept = "Sales" AND SAL = 20k
- Find RECs with Dept = "Sales" AND SAL > 20k
- Find RECs with Dept = "Sales"
- Find RECs with SAL = 20k

Interesting application: Geographic Data
DATA: <X1, Y1, Attributes>, <X2, Y2, Attributes>, ...
[Figure: points plotted in the x-y plane.]

Queries:
- What city is at <Xi, Yi>?
- What is within 5 miles of <Xi, Yi>?
- Which is the closest point to <Xi, Yi>?

Example
[Figure: points a through o in the x-y plane, partitioned by splits on x and y values, with a tree index whose leaves hold the points.]
- Search points near f
- Search points near b

Queries
- Find points with Yi > 20
- Find points with Xi < 5
- Find points "close" to i = <12, 38>
- Find points "close" to b = <7, 24>

Many types of geographic index structures have been suggested:
- Quad Trees
- R Trees

Two more types of multi-key indexes
- Grid
- Bitmap index

Grid Index
[Figure: a two-dimensional grid with rows V1, V2, ..., Vn for key 1 and columns X1, X2, ..., Xn for key 2. Each cell leads to the records with that pair of values, e.g. key1=V3, key2=X2.]

CLAIM
Can quickly find records with:
- key 1 = Vi AND key 2 = Xj
- key 1 = Vi
- key 2 = Xj
And also ranges, e.g. key 1 >= Vi AND key 2 < Xj

But there is a catch with Grid Indexes!
- How is a Grid Index stored on disk? Like an array: the row for V1 (X1 X2 X3 X4), then the row for V2, then V3, ...
- Problem: we need regularity so we can compute the position of the <Vi, Xj> entry.

Solution: Use Indirection
- The grid contains only pointers to buckets.
[Figure: a grid with rows V1 to V4 and columns X1 to X3; each cell points to a bucket of records.]

With indirection:
- The grid can be regular without wasting space.
- We do pay the price of indirection.

Can also index grid on value ranges

Linear scale for Salary:
    0-20K      1
    20K-50K    2
    50K-∞      3
Linear scale for Dept:
    Toy        1
    Sales      2
    Personnel  3
[Figure: a grid with one row per Dept value and one column per Salary range.]
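A grid over value ranges with bucket indirection can be sketched as follows. This is a toy in-memory illustration, assuming three salary ranges and three departments as on the slide; the 0-based cell numbers, the record tuples, and the names `insert` and `find` are hypothetical, and "Ann" and "Bob" are made-up records.

```python
from bisect import bisect_right

DEPT_SCALE = {"Toy": 0, "Sales": 1, "Personnel": 2}   # linear scale for key 1
SALARY_BOUNDS = [20_000, 50_000]                       # 0-20K, 20K-50K, 50K and up

def salary_cell(salary):
    """Map a salary to its column using the linear scale."""
    return bisect_right(SALARY_BOUNDS, salary)

# The grid holds only references to buckets; each bucket is a plain list here.
grid = [[[] for _ in range(len(SALARY_BOUNDS) + 1)] for _ in DEPT_SCALE]

def insert(name, dept, salary):
    grid[DEPT_SCALE[dept]][salary_cell(salary)].append((name, dept, salary))

def find(dept, min_salary):
    """Records with DEPT = dept AND SAL > min_salary."""
    row = grid[DEPT_SCALE[dept]]
    hits = []
    # Only the cells whose range can contain a qualifying salary are visited.
    for bucket in row[salary_cell(min_salary):]:
        hits.extend(rec for rec in bucket if rec[2] > min_salary)
    return hits

insert("Joe", "Sales", 15_000)
insert("Ann", "Toy", 60_000)
insert("Bob", "Toy", 30_000)
toy_over_50k = find("Toy", 50_000)
```

Because both scales are fixed, the cell for any (Dept, Salary) pair is computed directly, which is the regularity the array layout needs.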

Grid files
+ Good for multiple-key search
- Space, management overhead (nothing is free)
- Need partitioning ranges that evenly split keys

Example Grid File for account
- Two attributes (branch-name and balance) form the search key.
- Divide branch-name into non-uniform intervals.
- Divide balance into non-uniform intervals.
- Which cell holds branch-name < Central and 10k <= balance < 50k?
- What about Central <= branch-name < Townsend and 50k <= balance?

Example Grid File for account (continued)
[Figure: grid cells pointing to buckets Bj and Bk.]

Grid Files (Cont.)
- Linear scales must be chosen to distribute records uniformly across cells. Otherwise there will be too many overflow buckets.
- Periodic re-organization to increase the grid size will help, but reorganization can be very expensive.
- The space overhead of the grid array can be high.

Bitmap Indices
- Another type of index that can be used for multi-attribute search keys.

Bitmap Indices (Cont.)
[Figure: a relation with gender and income-level columns. There is one bitmap per unique value of gender and one per unique value of income-level. Each bitmap has one bit per record (size = table size). E.g., the income-level value of record 3 is L1.]

Bitmap Indices (Cont.)
Some properties of bitmap indices:
- Number of bitmaps for each attribute?
- Size of each bitmap?
- When is the bitmap matrix sparse, and what attributes are good for bitmap indices?

Bitmap Indices (Cont.)
- Bitmap indices are generally very small compared with the relation size.
- E.g., if a record is 100 bytes (800 bits) and a bitmap uses 1 bit per record, a single bitmap takes 1/800 of the space used by the relation.
- If the number of distinct attribute values is 8, the 8 bitmaps together take 8/800 = 1% of the relation size.
- What about insertion? Deletion?

Bitmap Indices Queries
- Sample query: males with income level L1
  10010 AND 10100 = 10000
- What about the number of males with income level L1? Even faster: count the 1 bits in the result, without fetching any records.

Bitmap Indices Queries
Queries are answered using bitmap operations:
- Intersection (and)
- Union (or)
- Complementation (not)
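The bitmap operations map directly onto integer bit operations. The sketch below is an assumption-laden illustration: the five-record columns are made up so that they yield the bitmaps 10010 and 10100 used in the sample query, and `build_bitmaps` is a hypothetical helper.

```python
def build_bitmaps(values):
    """Return {value: bitmap}, where the leftmost bit stands for the first record."""
    n = len(values)
    bitmaps = {}
    for pos, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << (n - 1 - pos))
    return bitmaps

# Hypothetical column values for five records.
gender = ["m", "f", "f", "m", "f"]
income = ["L1", "L2", "L1", "L4", "L3"]

g = build_bitmaps(gender)
inc = build_bitmaps(income)
all_records = (1 << len(gender)) - 1          # mask 11111

males_l1 = g["m"] & inc["L1"]                 # intersection (and)
male_or_l2 = g["m"] | inc["L2"]               # union (or)
not_male = ~g["m"] & all_records              # complementation (not), masked to 5 bits
count_males_l1 = bin(males_l1).count("1")     # answered from the bitmap alone

print(format(males_l1, "05b"))                # 10000
```

The mask in the complement matters: without it, Python's `~` would produce a negative number rather than a five-bit bitmap.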

Hashing
- key -> h(key) -> bucket
- Buckets are typically 1 disk block.

Two alternatives
(1) key -> h(key) -> bucket holding the records themselves

Two alternatives
(2) key -> h(key) -> index entry (key, pointer) -> record
- Alternative (2) is for a "secondary" search key.

Example hash function
- Key = 'x1 x2 ... xn', an n-byte character string
- Have b buckets
- h: add x1 + x2 + ... + xn, then compute the sum modulo b
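The example hash function is short enough to write out. This sketch assumes the key is a text string encoded as UTF-8 bytes; the function name `h` follows the slide, while the encoding choice is an assumption.

```python
def h(key: str, b: int) -> int:
    """Add the byte values of the key and take the sum modulo b buckets."""
    return sum(key.encode("utf-8")) % b

bucket_of_ab = h("ab", 4)   # (97 + 98) % 4
```

One visible weakness: the sum ignores byte order, so keys that are permutations of each other (such as "abc" and "cba") always land in the same bucket.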

This may not be the best function ...
- Good hash function: the expected number of keys/bucket is the same for all buckets.

Within a bucket:
- Do we keep keys sorted?
- Yes, if CPU time is critical and inserts/deletes are not too frequent.

Next: example to illustrate inserts, overflows, deletes, using a hash function h(K)

EXAMPLE: 2 records/bucket
INSERT:
- h(a) = 1
- h(b) = 2
- h(c) = 1
- h(d) = 0
- h(e) = 1
[Figure: buckets 0 to 3. Bucket 0 holds d, bucket 1 holds a and c, bucket 2 holds b. Bucket 1 is full, so e goes to an overflow block.]

EXAMPLE: deletion
Delete: e, f
[Figure: four buckets (0 to 3) holding keys a through g, one of them with an overflow block. After f is deleted, maybe move "g" up.]

Rule of thumb:
- Try to keep space utilization between 50% and 80%.
- Utilization = (# keys used) / (total # keys that fit)
- If < 50%, wasting space.
- If > 80%, overflows are significant; how significant depends on how good the hash function is and on # keys/bucket.

How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
  - Extensible
  - Linear

Extensible hashing: two ideas
(a) Use i of the b bits output by the hash function
- E.g., h(K) = 00110101 (b bits); use the first i bits.
- i grows over time.

(b) Use a directory
- h(K)[i] selects a directory entry, which points to a bucket.

Example: h(k) is 4 bits; 2 keys/bucket
[Figure: with i = 1, entry 0 points to bucket {0001} and entry 1 points to bucket {1001, 1100}. Insert 1010: bucket 1 overflows and splits into {1001, 1010} and {1100}, each using 2 bits. The new directory has i = 2 with entries 00, 01, 10, 11.]

Example continued
Insert: 0111, 0000
[Figure: i = 2 with entries 00, 01, 10, 11. The bucket {0001, 0111}, shared by entries 00 and 01, overflows when 0000 arrives and splits into {0000, 0001} and {0111}, each using 2 bits. Buckets {1001, 1010} and {1100} are unchanged, and the directory does not grow.]

Example continued
Insert: 1001
[Figure: bucket {1001, 1010} already uses 2 bits and is full, so the directory doubles to i = 3 with entries 000 to 111. The bucket splits into {1001, 1001} and {1010}, each using 3 bits. Buckets {0000, 0001}, {0111} and {1100} still use 2 bits.]
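The whole insert example can be replayed with a small in-memory sketch of extensible hashing. It is a simplified illustration, not a disk-based implementation: hash values are given directly as 4-bit integers, the directory uses the leading i bits as in the slides, and the names `Bucket`, `ExtensibleHash`, and `depth` are hypothetical.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of leading hash bits this bucket distinguishes
        self.keys = []

class ExtensibleHash:
    def __init__(self, bits=4, capacity=2):
        self.bits = bits             # b: bits produced by the hash function
        self.capacity = capacity     # keys per bucket
        self.i = 1                   # bits currently used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, h):
        return h >> (self.bits - self.i)      # leading i bits of the hash value

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(h)
            return
        if bucket.depth == self.i:
            if self.i == self.bits:
                raise OverflowError("no hash bits left; an overflow block is needed")
            # Double the directory: every old entry becomes two adjacent entries.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.i += 1
        # Split the full bucket on one more bit.
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        pending, bucket.keys = bucket.keys + [h], []
        for slot, b in enumerate(self.directory):
            if b is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = sibling
        for key in pending:          # redistribute; may trigger a further split
            self.insert(key)

table = ExtensibleHash()
for value in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001):
    table.insert(value)
```

After these inserts the directory has 8 entries (i = 3), and only the bucket that actually overflowed at depth 2 was split to depth 3; the other buckets are shared by two directory entries each.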

Extensible hashing: deletion
Two options:
- No merging of blocks
- Merge blocks and cut the directory if possible (reverse of the insert procedure)

Deletion example:
- Run through the insert example in reverse!

Extensible hashing Summary
+ Can handle growing files
  - with less wasted space
  - with no full reorganizations
- Indirection (not bad if the directory is in memory)
- Directory doubles in size (now it fits, now it does not)

Linear hashing
Another dynamic hashing scheme. Two ideas:
(a) Use the i low-order bits of the hash
- E.g., h(K) = 01110101 (b bits); the i bits used are the rightmost ones, and i grows.
(b) File grows linearly

Example: b = 4 bits, i = 2, 2 keys/bucket
[Figure: buckets 00 and 01 are in use; 10 and 11 are future growth buckets. Bucket 00 holds 0000 and 1010; bucket 01 holds 0101 and 1111. m = 01 (max used block).]
- Insert 0101: bucket 01 is full, so the key goes to an overflow block. Can have overflow chains!
Rule:
- If h(k)[i] <= m, then look at bucket h(k)[i]
- else, look at bucket h(k)[i] - 2^(i-1)
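The addressing rule can be written as a small function. This is a sketch assuming hash values are plain integers and that h(k)[i] means the i low-order bits; the name `bucket_address` is hypothetical.

```python
def bucket_address(h, i, m):
    """Bucket for hash value h, using i low-order bits; m is the highest used bucket."""
    a = h & ((1 << i) - 1)        # h(k)[i]: the i low-order bits
    if a <= m:
        return a                  # that bucket already exists
    return a - (1 << (i - 1))     # not created yet: clear the top one of the i bits

# With i = 2 and m = 01, as in the example:
where_1010 = bucket_address(0b1010, 2, 0b01)   # low bits 10 > m, so bucket 00
where_1111 = bucket_address(0b1111, 2, 0b01)   # low bits 11 > m, so bucket 01
```

Once m advances to 10, the same call sends 1010 to bucket 10, which is why growing the file by one bucket requires moving only the keys of the single bucket being split.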

Example: b = 4 bits, i = 2, 2 keys/bucket (continued)
[Figure: after inserting 0101, the file grows one bucket at a time. When m advances from 01 to 10, bucket 10 is created and 1010 moves there from bucket 00. When m advances to 11, bucket 11 is created and 1111 moves there from bucket 01, leaving the 0101 keys in bucket 01.]

Example continued: how to grow beyond this?
[Figure: with m = 11 (max used block), all buckets for i = 2 are in use. i is increased to 3, the existing buckets 00, 01, 10, 11 are read as 000, 001, 010, 011, and new buckets 100, 101, 110, 111 are added one at a time; e.g. the 0101 keys move to bucket 101 when it is created.]

When do we expand the file?
- Keep track of: U = (# used slots) / (total # of slots)
- If U > threshold, then increase m (and maybe i)

Linear Hashing Summary
+ Can handle growing files
  - with less wasted space
  - with no full reorganizations
+ No indirection like extensible hashing
- Can still have overflow chains

Example: BAD CASE
[Figure: the buckets near the start of the file are very empty and a bucket far ahead is very full. We would need to move m all the way to the full bucket, which would waste space.]

Summary
Hashing
- How it works
- Dynamic hashing
  - Extensible
  - Linear

Indexing vs Hashing
- Hashing is good for probes given a key, e.g.:
    SELECT ... FROM R WHERE R.A = 5

Indexing vs Hashing
- Indexing is good for range searches, e.g.:
    SELECT ... FROM R WHERE R.A > 5