Index Structures Chapter 13 of GUW September 16, 2019

Slides:



Advertisements
Similar presentations
CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
Advertisements

DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.
Hashing and Indexing John Ortiz.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Index tuning Hash Index. overview Introduction Hash-based indexes are best for equality selections. –Can efficiently support index nested joins –Cannot.
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
1 Advanced Database Technology Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Spring 2004 March 4, 2004 INDEXING II Lecture based on [GUW,
CS CS4432: Database Systems II Basic indexing.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
CS 277 – Spring 2002Notes 41 CS 277: Database System Implementation Notes 4: Indexing Arthur Keller.
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #11.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #8.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
CS 277 – Spring 2002Notes 51 CS 277: Database System Implementation Arthur Keller Notes 5: Hashing and More.
CS 4432lecture #71 CS4432: Database Systems II Lecture #7 Professor Elke A. Rundensteiner.
CS 245Notes 41 CS 245: Database System Principles Notes 4: Indexing Hector Garcia-Molina.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #12.
CS CS4432: Database Systems II. CS Index definition in SQL Create index name on rel (attr) (Check online for index definitions in SQL) Drop.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Index Tuning Conventional index Secondary index To speed up queries on attributes not within primary key Primary index –Determine.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 111 Database Systems II Index Structures.
1 CSCE 520 Test 2 Info Indexing Modified from slides of Hector Garcia-Molina and Jeff Ullman.
CS4432: Database Systems II
1 Ullman et al. : Database System Principles Notes 5: Hashing and More.
1 Query Processing Part 3: B+Trees. 2 Dense and Sparse Indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Insertions.
CPSC 8620Notes 61 CPSC 8620: Database Management System Design Notes 6: Hashing and More.
1 Ullman et al. : Database System Principles Notes 4: Indexing.
Access Structures COMP3211 Advanced Databases Dr Nicholas Gibbins
COMP3017 Advanced Databases
CS 245: Database System Principles
CPS216: Data-intensive Computing Systems
CS 245: Database System Principles Notes 4: Indexing
CS 245: Database System Principles Notes 4: Indexing
CS232A: Database System Principles INDEXING
Hash-Based Indexes Chapter 11
CPSC-608 Database Systems
CS 245: Database System Principles
Introduction to Database Systems
Yan Huang - CSCI5330 Database Implementation – Access Methods
(Slides by Hector Garcia-Molina,
CS222: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
Hash-Based Indexes Chapter 10
CS222P: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
CS 245: Database System Principles
Hash-Based Indexes Chapter 11
Index tuning Hash Index.
CS 245: Database System Principles Notes 4: Indexing
B+Tree Example n=3 Root
Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.
Database Systems (資料庫系統)
Database Design and Programming
(a) Insert key = 32 n=
Chapter 11: Indexing and Hashing
CPSC-608 Database Systems
CPSC-608 Database Systems
Hash-Based Indexes Chapter 11
Chapter 11 Instructor: Xin Zhang
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #07 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
CS4432: Database Systems II
Presentation transcript:

Index Structures Chapter 13 of GUW September 16, 2019 ICS 541: Index Structures

Objectives Different ways of organizing blocks What is the best way to organize record in blocks to minimize: Query cost Exact match Partial match Range Join Insertion cost Deletion cost Update cost Storage cost September 16, 2019 ICS 541: Index Structures

- Lecture outline Basic Concepts Index on Sequential files Secondary Indexes B-Trees Hash Tables September 16, 2019 ICS 541: Index Structures

- Basic Concepts … Disk Index blocks Data blocks Empty blocks September 16, 2019 ICS 541: Index Structures

… - Basic Concepts … Index blocks result Query Data blocks September 16, 2019 ICS 541: Index Structures

… - Basic Concepts Block pointer Record pointer Record pointers take more space than block pointers. Why? September 16, 2019 ICS 541: Index Structures

- Indexes Sequential Files Dense Index Sparse Index Multiple level of Index Index with Duplicate Search Keys Managing Indexes During Data Modification September 16, 2019 ICS 541: Index Structures

-- Sequential Files Sequential File 10 20 30 40 50 60 70 80 90 100 September 16, 2019 ICS 541: Index Structures

-- Dense Index Sequential File Dense Index 10 20 30 40 50 60 70 80 90 Search key 50 60 70 80 60 50 80 70 90 100 110 120 100 90 September 16, 2019 ICS 541: Index Structures

-- Sparse Index Sequential File Sparse Index 10 20 30 40 50 60 70 80 90 110 130 150 60 50 80 70 170 190 210 230 100 90 September 16, 2019 ICS 541: Index Structures

-- Dense Vs Sparse Index: Example Relation Relation R with 1,000,000 tuples A block of size 4096 bytes (4k) 10 R tuples per block Data space required => 1000000/10 * 4k = 400MB. Dense index record size: 30 Bytes for search key + 8 bytes for record pointer Can fit 100 index records per block Dense index space = 1000000/100 * 4k = 40MB. Binary search cost log2(10000) = 14 disk accesses at most Keeping blocks (1/2, ¼, ¾, 1/8, …) in memory can lower disk access. Sparse index 1000 index blocks = 4MB Binary search cost log2(1000) = 10 disk accesses at most September 16, 2019 ICS 541: Index Structures

-- Multiple level of Index Sequential File Sparse 2nd level 20 10 10 90 170 250 10 30 50 70 40 30 90 110 130 150 330 410 490 570 60 50 80 70 170 190 210 230 100 90 September 16, 2019 ICS 541: Index Structures

-- Contiguous sequential file R1 K1 say: 1024 B per block if we want K3 block: get it at offset (3-1)1024 = 2048 bytes R2 K2 K3 R3 K4 R4 September 16, 2019 ICS 541: Index Structures

-- Sparse vs. Dense Tradeoff Less index space per record can keep more of index in memory Dense Can tell if any record exists without accessing file Must be used for secondary index September 16, 2019 ICS 541: Index Structures

--Terms Index sequential file Search key (  primary key) Primary index (on Sequencing field) Secondary index Dense index (all Search Key values in) Sparse index Multi-level index September 16, 2019 ICS 541: Index Structures

-- Duplicate keys … 10 10 20 20 30 30 40 45 September 16, 2019 ICS 541: Index Structures

Dense index, one way to implement? -- Duplicate keys … Dense index, one way to implement? 10 10 10 10 10 10 10 10 20 10 20 10 20 20 30 20 30 20 20 20 30 30 30 30 30 30 30 30 45 40 45 40 September 16, 2019 ICS 541: Index Structures

… -- Duplicate keys … 10 10 20 20 30 30 40 45 Dense index, better way? September 16, 2019 ICS 541: Index Structures

careful if looking for 20 or 30! … -- Duplicate keys …. 10 10 20 20 30 Sparse index, one way? careful if looking for 20 or 30! 10 10 10 20 20 10 30 30 20 30 45 40 September 16, 2019 ICS 541: Index Structures

place first new key from block … -- Duplicate keys … place first new key from block 10 should this be 40? 10 20 30 20 10 30 30 20 30 45 40 Sparse index, another way? September 16, 2019 ICS 541: Index Structures

a a a b … -- Duplicate keys Incase of primary index may point to first instance of each value only File Index a a a . b September 16, 2019 ICS 541: Index Structures

-- Managing Indexes During Data Modification Deletion from Sparse Index Insertion into Sparse Index September 16, 2019 ICS 541: Index Structures

--- Deletion from sparse index … 20 10 10 30 50 40 30 70 60 50 90 110 130 80 70 150 September 16, 2019 ICS 541: Index Structures

… --- Deletion from sparse index … delete record 40 20 10 10 30 50 40 30 70 60 50 90 110 130 80 70 150 September 16, 2019 ICS 541: Index Structures

… --- Deletion from sparse index … delete record 30 20 10 10 40 30 50 40 30 70 60 50 90 110 130 80 70 150 September 16, 2019 ICS 541: Index Structures

… ---- Deletion from sparse index … delete records 30 & 40 20 10 10 50 70 30 50 40 30 70 60 50 90 110 130 80 70 150 September 16, 2019 ICS 541: Index Structures

… --- Deletion from dense index … 20 10 10 20 30 40 30 40 60 50 50 60 70 80 70 80 September 16, 2019 ICS 541: Index Structures

… --- Deletion from dense index delete record 30 20 10 10 20 40 40 30 30 40 40 60 50 50 60 70 80 70 80 September 16, 2019 ICS 541: Index Structures

--- Insertion, sparse index case … 20 10 10 30 40 30 60 50 40 60 September 16, 2019 ICS 541: Index Structures

… --- Insertion, sparse index case … insert record 34 20 10 10 30 40 30 34 our lucky day! we have free space where we need it! 60 50 40 60 September 16, 2019 ICS 541: Index Structures

… --- Insertion, sparse index case … insert record 15 20 10 15 20 30 10 30 40 30 60 50 40 Illustrated: Immediate reorganization Variation: insert new block (chained file) update index 60 September 16, 2019 ICS 541: Index Structures

… --- Insertion, sparse index case insert record 25 20 10 25 overflow blocks (reorganize later...) 10 30 40 30 60 50 40 60 September 16, 2019 ICS 541: Index Structures

Similar Often more expensive . . . --- Insertion, dense index case September 16, 2019 ICS 541: Index Structures

- Secondary Indexes Design of Secondary Indexes Duplicate Values and Secondary Indexes Applications of Secondary Indexes Indirection in Secondary Indexes September 16, 2019 ICS 541: Index Structures

-- Design of Secondary Indexes … Sequence field 50 30 70 20 40 80 10 100 60 90 September 16, 2019 ICS 541: Index Structures

does not make sense! … -- Design of Secondary Indexes … 30 50 20 70 80 Sequence field Sparse index does not make sense! 50 30 30 20 80 100 70 20 90 ... 40 80 10 100 60 90 September 16, 2019 ICS 541: Index Structures

… -- Design of Secondary Indexes … Sequence field Dense index 10 20 30 40 50 60 70 ... 50 30 10 50 90 ... sparse high level 70 20 40 80 10 100 60 90 September 16, 2019 ICS 541: Index Structures

… -- Design of Secondary Indexes Lowest level is dense Record pointers Other levels are sparse Block pointers September 16, 2019 ICS 541: Index Structures

-- Duplicate Values and Secondary Indexes … 10 20 40 20 40 10 40 10 40 30 September 16, 2019 ICS 541: Index Structures

-- Duplicate Values and Secondary Indexes … one option... 10 20 10 20 40 20 Problem: excess overhead! disk space search time 20 30 40 40 10 40 10 40 ... 40 30 September 16, 2019 ICS 541: Index Structures

-- Duplicate Values and Secondary Indexes … another option... 10 20 10 40 20 20 Problem: variable size records in index! 40 10 30 40 40 10 40 30 September 16, 2019 ICS 541: Index Structures

-- Duplicate Values and Secondary Indexes … 10 20 10 20 30 40 40 20 50 60 ... 40 10 40 10 40 30 buckets September 16, 2019 ICS 541: Index Structures

---Why “bucket” idea is useful … Indexes Records Name: primary EMP (name,dept,floor,...) Dept: secondary Floor: secondary September 16, 2019 ICS 541: Index Structures

Query: Get employees in (Toy Dept) ^ (2nd floor) … ---Why “bucket” idea is useful … Query: Get employees in (Toy Dept) ^ (2nd floor) Dept. index EMP Floor index Toy 2nd Intersect toy bucket and 2nd Floor bucket to get set of matching EMP’s. Used for text retrieval September 16, 2019 ICS 541: Index Structures

This idea used in text information retrieval. … ---Why “bucket” idea is useful This idea used in text information retrieval. Documents Inverted lists cat dog ...the cat is fat ... ...was raining cats and dogs... ...Fido the dog ... September 16, 2019 ICS 541: Index Structures

-- Conventional indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance September 16, 2019 ICS 541: Index Structures

--- Example Index (sequential) 10 39 31 35 36 32 38 34 33 20 30 40 50 continuous free space 10 39 31 35 36 32 38 34 33 overflow area (not sequential) 20 30 40 50 60 70 80 90 September 16, 2019 ICS 541: Index Structures

-- Next Another type of index Btree Has Schemes Give up on sequentiality of index Try to get “balance” Btree Has Schemes September 16, 2019 ICS 541: Index Structures

B+Tree Example n=3 Root 100 120 150 180 30 3 5 11 120 130 180 200 30 35 100 101 110 150 156 179 September 16, 2019 ICS 541: Index Structures

-- Sample non-leaf 57 81 95 To keys 57> and <=81 To keys >95 To keys <57 September 16, 2019 ICS 541: Index Structures

-- Sample leaf node: 57 81 95 with key 57 with key 81 To record From non-leaf node to next leaf in sequence 57 81 95 with key 57 with key 81 To record with key 85 September 16, 2019 ICS 541: Index Structures

Non-leaf: (n+1)/2 pointers Leaf: (n+1)/2 pointers to data -- Size of Nodes n keys n + 1 Pointers Use at least Non-leaf: (n+1)/2 pointers Leaf: (n+1)/2 pointers to data September 16, 2019 ICS 541: Index Structures

--- Example: n = 3 Full node min. node Non-leaf Leaf 120 150 180 30 3 11 30 35 counts even if null September 16, 2019 ICS 541: Index Structures

-- B+tree rules tree of order n All leaves at same lowest level (balanced tree) Pointers in leaves point to records except for “sequence pointer” September 16, 2019 ICS 541: Index Structures

-- Insert into B+tree (a) simple case (b) leaf overflow space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root September 16, 2019 ICS 541: Index Structures

--- Insert key = 32 n=3 100 30 3 5 11 30 31 32 September 16, 2019 ICS 541: Index Structures

n=3 --- Insert key = 7 100 30 7 3 5 11 30 31 3 5 7 September 16, 2019 ICS 541: Index Structures

n=3 --- Insert key = 160 100 160 120 150 180 180 150 156 179 180 200 160 179 September 16, 2019 ICS 541: Index Structures

--- New root, insert 45 n=3 30 new root 10 20 30 40 1 2 3 10 12 20 25 32 40 40 45 September 16, 2019 ICS 541: Index Structures

-- Deletion from B+tree Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf September 16, 2019 ICS 541: Index Structures

n=4 --- Coalesce with sibling: Delete 50 10 40 100 40 10 20 30 40 50 September 16, 2019 ICS 541: Index Structures

n=4 --- Redistribute key: Delete 50 10 40 100 35 10 20 30 35 40 50 September 16, 2019 ICS 541: Index Structures

n=4 --- Non-leaf coalesce: Delete 37 new root 25 25 10 20 30 40 40 30 26 1 3 10 14 20 22 30 37 40 45 September 16, 2019 ICS 541: Index Structures

--- B+tree deletions in practice Often, coalescing is not implemented Too hard and not worth it! September 16, 2019 ICS 541: Index Structures

- Hashing key  h(key) <key> Buckets (typically 1 disk block) . September 16, 2019 ICS 541: Index Structures

. . (1) key  h(key) -- Two alternatives … records September 16, 2019 ICS 541: Index Structures

(2) key  h(key) … -- Two alternatives record Index Alt (2) for “secondary” search key September 16, 2019 ICS 541: Index Structures

-- Example hash function … Key = ‘x1 x2 … xn’ n byte character string Have b buckets h: add x1 + x2 + ….. Xn compute sum modulo b September 16, 2019 ICS 541: Index Structures

… -- Example hash function This may not be best function … Read Knuth Vol. 3 if you really need to select a good function. Good hash  Expected number of function: keys/bucket is the same for all buckets September 16, 2019 ICS 541: Index Structures

-- Within a bucket Do we keep keys sorted? Yes if CPU time critical, and Inserts/Deletes not too frequent September 16, 2019 ICS 541: Index Structures

-- Example to illustrate inserts, overflows, deletes h(K) September 16, 2019 ICS 541: Index Structures

… -- Example to illustrate insert … 1 2 3 INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 d a c b e h(e) = 1 September 16, 2019 ICS 541: Index Structures

Delete: e f c … -- Example to illustrate delete … a b d c d e f g 1 2 1 2 3 a d b d c c e maybe move “g” up f g September 16, 2019 ICS 541: Index Structures

--- Rule of thumb Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket September 16, 2019 ICS 541: Index Structures

-- How do we cope with growth? Static hashing Overflows and reorganizations Very expensive Solution Dynamic hashing Extensible Linear September 16, 2019 ICS 541: Index Structures

-- Extensible hashing: two ideas .. (a) Use i of b bits output by hash function b h(K)  use i  grows over time…. 00110101 September 16, 2019 ICS 541: Index Structures

… -- Extensible hashing: two ideas (b) Use directory h(K)[i ] to bucket . . September 16, 2019 ICS 541: Index Structures

--- Example: h(k) is 4 bits; 2 keys/bucket … New directory 2 00 01 10 11 i = 1 i = 0001 1 1 1001 1 1100 1010 1100 Insert 1010 September 16, 2019 ICS 541: Index Structures

… --- Example: h(k) is 4 bits; 2 keys/bucket … 0000 0111 0001 i = 2 00 01 10 11 1 0001 0111 2 1001 1010 Insert: 0111 0000 2 1100 September 16, 2019 ICS 541: Index Structures

… --- Example: h(k) is 4 bits; 2 keys/bucket 000 001 010 011 100 101 110 111 3 i = 0000 2 i = 0001 2 00 01 10 11 0111 2 1001 1010 2 1001 1010 Insert: 1001 2 1100 September 16, 2019 ICS 541: Index Structures

--- Extensible hashing: deletion Two option: No merging of blocks Merge blocks and cut directory if possible September 16, 2019 ICS 541: Index Structures

---- Deletion example Run thru insert example in reverse! September 16, 2019 ICS 541: Index Structures

-- Summary: Extensible hashing Can handle growing files - No full reorganizations + Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - September 16, 2019 ICS 541: Index Structures

-- Summary: Extensible hashing Advantage No reorganization is needed One disk access per record Disadvantage Doubling bucket array is expensive The size of the bucket array may no longer fit into memory The number of bucket may be much bigger than the blocks Example: Splitting records can only be done in higher bits. September 16, 2019 ICS 541: Index Structures

-- Linear hashing 01110101 Another dynamic hashing scheme Two ideas: (a) Use i low order bits of hash 01110101 grows b i (b) File grows linearly September 16, 2019 ICS 541: Index Structures

Example b=4 bits, i =2, 2 keys/bucket 0101 can have overflow chains! insert 0101 Future growth buckets 0000 0101 1010 1111 00 01 10 11 m = 01 (max used block) If h(k)[i ]  m, then look at bucket h(k)[i ] else, look at bucket h(k)[i ] - 2i -1 Rule September 16, 2019 ICS 541: Index Structures

Example b=4 bits, i =2, 2 keys/bucket 0101 insert 0101 1111 0101 Future growth buckets 11 0000 1010 0101 10 1010 1111 00 01 10 11 m = 01 (max used block) September 16, 2019 ICS 541: Index Structures

Example Continued: How to grow beyond this? 100 101 110 111 3 i = 2 0000 100 0101 101 0101 1010 1111 0101 00 01 10 11 . . . m = 11 (max used block) September 16, 2019 ICS 541: Index Structures

--- When do we expand file? Keep track of: # used slots total # of slots = U If U > threshold then increase m (and maybe i ) September 16, 2019 ICS 541: Index Structures

-- Summary: Linear Hashing Can handle growing files - with less wasted space - with no full reorganizations No indirection like extensible hashing + + Can still have overflow chains - September 16, 2019 ICS 541: Index Structures

-- Hashing Summary Hashing - How it works - Dynamic hashing - Extensible - Linear September 16, 2019 ICS 541: Index Structures

-- Indexing Vs Hashing … Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5 September 16, 2019 ICS 541: Index Structures

… -- Indexing vs Hashing INDEXING (Including B Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5 September 16, 2019 ICS 541: Index Structures

- Reading list Chapter 13 of GUW B-tree (WebCt) September 16, 2019 ICS 541: Index Structures

END September 16, 2019 ICS 541: Index Structures