CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 1 Notes #12.


[Figure: DBMS architecture. Components shown: database administrator (DDL language) and DDL compiler; database programmer (DML/query language) and DML compiler; query execution engine; index/file manager; buffer manager with main-memory buffers; file manager; transaction manager with concurrency control (lock table) and logging & recovery; tables (relations) on secondary storage (disks). Label: graduate database.]

The Main Purpose of Index Structures
Speed up the search process. For a selection such as $\sigma_{a=6}(R)$, an index lets us quickly figure out which disk blocks contain the desired tuples; otherwise we would have to scan the entire relation R. Example: B+ trees.

Hash Tables
A hash function h maps a search key k to a bucket number h(k), with 0 ≤ h(k) ≤ b − 1, where b is the number of buckets. A bucket is typically a disk block (possibly with overflow blocks). h(k) gives an easy way to compute the bucket address: directly (the address is computed from h(k)) or indirectly (h(k) is an index into a directory that holds the address).
Ideally, (# tuples in R) / b should be about 60% to 70% of the bucket capacity.
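The sizing guideline above can be turned into a quick calculation of how many buckets to allocate. This is a minimal sketch, assuming a target fill of 65% (the midpoint of the 60% to 70% range) and a bucket capacity measured in tuples per block; `buckets_needed` is a hypothetical helper, not part of any DBMS.

```python
import math


def buckets_needed(num_tuples: int, bucket_capacity: int, target_fill: float = 0.65) -> int:
    """Smallest bucket count b with num_tuples / b <= target_fill * bucket_capacity."""
    if bucket_capacity <= 0 or not 0 < target_fill <= 1:
        raise ValueError("need bucket_capacity > 0 and 0 < target_fill <= 1")
    # Average tuples per bucket must not exceed the target fraction of a full bucket.
    return math.ceil(num_tuples / (target_fill * bucket_capacity))


# Hypothetical relation: 10,000 tuples, 20 tuples per block, aiming for 65% full.
print(buckets_needed(10_000, 20))  # 770
```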

How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
  - Extensible
  - Linear

Linear hashing: another dynamic hashing scheme. Two ideas:
(a) Use the i low-order bits of the b-bit hash value h(k); i grows as the file grows.
(b) The file grows linearly, one bucket at a time.
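Idea (a) relies on extracting the i low-order bits of a hash value, written h(k)[i] in these notes. A small sketch, assuming hash values are plain non-negative Python integers; `low_bits` is a hypothetical helper name.

```python
def low_bits(hash_value: int, i: int) -> int:
    """h(k)[i]: the i low-order bits of a hash value."""
    return hash_value & ((1 << i) - 1)  # mask with i one-bits


h = 0b1101  # a 4-bit hash value (b = 4)
for i in range(1, 5):
    # As i grows, more of the hash value is used to pick a bucket.
    print(i, format(low_bits(h, i), f"0{i}b"))
```

Using a bit mask rather than `h % 2**i` gives the same result for non-negative values and mirrors the "last i bits" wording.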

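Linear Hashing: General framework
A search key k is hashed to h(k), and h(k)[i] denotes the last i bits of h(k). The buckets in use are numbered 00…00, 00…01, and so on; n, an i-bit number of the form 1x…xx, is one more than the largest bucket address in use. Buckets from address n onward hold no tuples yet; the table grows linearly into them.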

Example: b = 4 bits, 2 keys/bucket
n = 10 (binary; 1 + the largest index of the buckets in use, so buckets 00 and 01 are in use and the rest are future growth buckets)
i = 2 (# bits used = # bits of n)
Rules:
If $h(k)[i] < n$, then look at bucket $h(k)[i]$.
If $h(k)[i] \ge n$, then look at bucket $h(k)[i] - 2^{i-1}$ (i.e., replace the leading bit 1 of h(k)[i] by 0).
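The two lookup rules can be written as one small function. A sketch, assuming hash values and n are Python integers (binary literals are used to match the notes); `bucket_address` is a hypothetical name.

```python
def bucket_address(h: int, i: int, n: int) -> int:
    """Bucket for hash value h, given i bits in use and upper bound n."""
    m = h & ((1 << i) - 1)      # h(k)[i]: last i bits of h(k)
    if m >= n:                  # bucket m has not been created yet
        m -= 1 << (i - 1)       # replace the leading 1 of m by 0
    return m


# Running example: b = 4 bits, n = 10 (binary), i = 2.
for h in (0b1101, 0b1110, 0b0111, 0b1000):
    print(format(h, "04b"), "->", format(bucket_address(h, 2, 0b10), "02b"))
```

Because n has exactly i bits, any m ≥ n must start with a 1, so the subtraction always lands on a bucket that already exists.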

Insertion: b = 4 bits, 2 keys/bucket, n = 10, i = 2. Insert 1101: its last 2 bits, 01, are less than n = 10, so it goes to bucket 01. If bucket 01 is already full, 1101 is placed in an overflow block: linear hashing can have overflow chains!

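Increase size n: b = 4 bits, i = 2. n goes from 10 to 11, so bucket 10 is taken from the future growth buckets. Bucket 00 (= 10 − $2^{i-1}$) is split: keys whose last 2 bits are 10 move to the new bucket 10, and the rest stay in bucket 00. Since 11 still has 2 bits, i stays 2.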
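The split of bucket 00 can be sketched as a partition of its hash values. This assumes records are represented by their hash values, and that the number of bits compared equals the bit length of the new bucket's address (which is i at split time); the bucket contents below are hypothetical, not the keys from the original figure.

```python
from typing import List, Tuple


def split_bucket(hashes: List[int], new_address: int) -> Tuple[List[int], List[int]]:
    """Partition the hash values of the bucket being split.

    A value moves to the new bucket exactly when its low-order bits
    (as many as new_address has) equal new_address.
    """
    mask = (1 << new_address.bit_length()) - 1
    stay = [h for h in hashes if h & mask != new_address]
    move = [h for h in hashes if h & mask == new_address]
    return stay, move


# Hypothetical contents of bucket 00 before n goes from 10 to 11.
stay, move = split_bucket([0b0000, 0b1010, 0b0100], 0b10)
print([format(h, "04b") for h in stay], [format(h, "04b") for h in move])
```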

Insert: b = 4 bits, n = 11, i = 2. Insert 1101: its last 2 bits, 01, are less than n = 11, so it goes to bucket 01.

Increase size n: b = 4 bits. n goes from 11 to 100. Bucket 01 (= 11 − $2^{i-1}$ with i = 2) is split: keys whose last 2 bits are 11 move to the new bucket 11. Now n = 100 has 3 bits, so i becomes 3 and bucket addresses are read as 3-bit numbers (000, 001, 010, 011); the next increase will create bucket 100 by splitting bucket 000.
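The sequence of expansions in this example can be traced mechanically: each step splits bucket n − $2^{i-1}$, creates bucket n, increments n, and increments i when n gains a bit. A sketch that only tracks addresses (no records); `growth_steps` is a hypothetical helper.

```python
from typing import List, Tuple


def growth_steps(n: int, i: int, steps: int) -> List[Tuple[int, int, int, int]]:
    """Trace (split bucket, new bucket, new n, new i) for successive expansions."""
    trace = []
    for _ in range(steps):
        split_from = n - (1 << (i - 1))   # bucket whose tuples are divided
        new_bucket = n                    # next future-growth bucket
        n += 1
        if n.bit_length() > i:            # n gained a bit, e.g. 11 -> 100
            i += 1
        trace.append((split_from, new_bucket, n, i))
    return trace


for split_from, new_bucket, n, i in growth_steps(0b10, 2, 3):
    print(f"split {split_from:0{i}b} -> new bucket {new_bucket:0{i}b}; n = {n:b}, i = {i}")
```

Starting from n = 10, i = 2, the trace splits 00, then 01 (with i becoming 3), then 000, matching the steps above.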

When do we expand the file? Keep track of
U = (# used slots) / (total # of slots).
If U > threshold, then increase n (and maybe i).
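The expansion policy can be sketched as a loop that adds one bucket at a time while U exceeds the threshold. Assumptions to note: the threshold value 0.8 is hypothetical (the notes leave it open), and "total # of slots" is approximated as n × (keys per bucket), ignoring slots in overflow blocks.

```python
def utilization(num_records: int, num_buckets: int, capacity: int) -> float:
    """U = used slots / total slots, counting only primary blocks."""
    return num_records / (num_buckets * capacity)


def grow_until_below(num_records: int, n: int, capacity: int, threshold: float = 0.8) -> int:
    """Add buckets one at a time (n = n + 1) while U > threshold; return the final n."""
    while utilization(num_records, n, capacity) > threshold:
        n += 1
    return n


# 2 keys/bucket as in the running example; 10 records; start with n = 2 buckets.
print(grow_until_below(10, 2, 2))  # 7
```

In a real table each increment of n would also perform the split described in the previous slides; this sketch only counts buckets.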

Linear hashing: Searching (Summary A)
input: a search key k
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. m = the last i bits of h(k);
2. IF m ≥ n THEN m = m − $2^{i-1}$;
3. read in the disk block B with address m
   \\ If k is not in B, you may have to read the
   \\ overflow blocks.
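Summary A can be sketched over an in-memory model of the table. Assumptions: each bucket is a list of blocks (primary block first, then its overflow chain), and records are represented by their hash values, whereas a real system stores tuples and compares search keys. The table contents are hypothetical.

```python
from typing import List

Block = List[int]     # one disk block; records represented by their hash values
Bucket = List[Block]  # primary block followed by its overflow blocks


def search(buckets: List[Bucket], h: int, i: int, n: int) -> bool:
    m = h & ((1 << i) - 1)      # 1. m = last i bits of h(k)
    if m >= n:                  # 2. bucket m not created yet
        m -= 1 << (i - 1)
    for block in buckets[m]:    # 3. read block m, then its overflow chain
        if h in block:
            return True
    return False


# Hypothetical table with n = 10 (binary), i = 2; bucket 01 has one overflow block.
table = [[[0b0000, 0b1010]], [[0b0101, 0b1111], [0b1101]]]
print(search(table, 0b1101, 2, 0b10), search(table, 0b0111, 2, 0b10))  # True False
```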

Linear hashing: Insertion (Summary B)
input: a tuple t with search key k
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. m = the last i bits of h(k);
2. IF m ≥ n THEN m = m − $2^{i-1}$;
3. read in the disk block B with address m; insert t
   \\ If B is full, you need to use an overflow block.
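Summary B in the same in-memory model: a bucket is a list of blocks, each holding at most `BLOCK_CAPACITY` hash values (2, as in the running example). This sketch appends only to the last block of the chain, which assumes no deletions have left free space in earlier blocks; the table contents are hypothetical.

```python
from typing import List

BLOCK_CAPACITY = 2  # 2 keys per block, as in the running example


def insert(buckets: List[List[List[int]]], h: int, i: int, n: int) -> int:
    """Insert hash value h; returns the bucket address used."""
    m = h & ((1 << i) - 1)          # 1. last i bits of h(k)
    if m >= n:                      # 2. fold back onto an existing bucket
        m -= 1 << (i - 1)
    chain = buckets[m]              # 3. block m plus its overflow blocks
    if len(chain[-1]) >= BLOCK_CAPACITY:
        chain.append([])            # last block full: start an overflow block
    chain[-1].append(h)
    return m


table = [[[0b0000, 0b1010]], [[0b0101, 0b1111]]]   # hypothetical, n = 10, i = 2
insert(table, 0b1101, 2, 0b10)
print(table[1])  # [[5, 15], [13]]
```

This reproduces the insertion of 1101 from the example: bucket 01 is full, so an overflow chain starts.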

Linear hashing: Increasing hash table size (Summary C)
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. read in the disk block B with address n − $2^{i-1}$;
2. split the tuples in B between block B and a new block B' with address n (a tuple moves to B' exactly when the last i bits of its hash value equal n);
3. n = n + 1;
4. IF # bits of n is increased THEN i = i + 1.
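Summary C as a sketch over the same model (bucket = list of blocks of at most 2 hash values). Assumptions: the table list holds exactly buckets 0 to n − 1, so the new bucket n is appended at the end, and the split repacks each side into fresh blocks, which also removes overflow chains where possible. `pack` and `grow` are hypothetical helpers; the data is hypothetical.

```python
from typing import List, Tuple

BLOCK_CAPACITY = 2
Table = List[List[List[int]]]   # bucket -> blocks (primary + overflow) -> hash values


def pack(records: List[int]) -> List[List[int]]:
    """Lay records out in blocks of BLOCK_CAPACITY; keep at least one block."""
    blocks = [records[j:j + BLOCK_CAPACITY] for j in range(0, len(records), BLOCK_CAPACITY)]
    return blocks or [[]]


def grow(table: Table, i: int, n: int) -> Tuple[int, int]:
    """Summary C: create bucket n by splitting bucket n - 2^(i-1); return new (i, n)."""
    assert len(table) == n, "buckets 0 .. n-1 must exist"
    old = n - (1 << (i - 1))                   # 1. block B to split
    mask = (1 << i) - 1                        # i = # bits of n
    records = [h for block in table[old] for h in block]
    table[old] = pack([h for h in records if h & mask != n])   # 2. stays in B
    table.append(pack([h for h in records if h & mask == n]))  #    moves to B' = n
    n += 1                                     # 3.
    if n.bit_length() > i:                     # 4. n gained a bit
        i += 1
    return i, n


table = [[[0b0000, 0b1010]], [[0b0101, 0b1111], [0b1101]]]  # hypothetical, n = 10, i = 2
i, n = grow(table, 2, 0b10)   # n: 10 -> 11, split bucket 00
i, n = grow(table, i, n)      # n: 11 -> 100, split bucket 01, i becomes 3
print(i, format(n, "b"), table)
```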

Linear hashing: Decreasing hash table size (Summary D)
\\ h is the hash function, i is the current bit number,
\\ n is the current upper bound.
1. n = n − 1;
2. IF # bits of n is decreased THEN i = i − 1;
3. move the tuples in the block with address n to the block with address n − $2^{i-1}$.
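Summary D reverses the split, following the three steps in order. A sketch in the same in-memory model, assuming the table list holds exactly buckets 0 to n − 1 (so bucket n − 1 is the last element) and keeping at least one bucket; `pack` and `shrink` are hypothetical helpers and the data is hypothetical.

```python
from typing import List, Tuple

BLOCK_CAPACITY = 2
Table = List[List[List[int]]]   # bucket -> blocks (primary + overflow) -> hash values


def pack(records: List[int]) -> List[List[int]]:
    """Lay records out in blocks of BLOCK_CAPACITY; keep at least one block."""
    blocks = [records[j:j + BLOCK_CAPACITY] for j in range(0, len(records), BLOCK_CAPACITY)]
    return blocks or [[]]


def shrink(table: Table, i: int, n: int) -> Tuple[int, int]:
    """Summary D: remove bucket n-1, merging its tuples back; return new (i, n)."""
    assert len(table) == n and n >= 2, "need buckets 0 .. n-1 and at least two of them"
    n -= 1                                   # 1.
    if n.bit_length() < i:                   # 2. n lost a bit
        i -= 1
    target = n - (1 << (i - 1))              # 3. block that receives bucket n's tuples
    moved = [h for block in table.pop() for h in block]
    kept = [h for block in table[target] for h in block]
    table[target] = pack(kept + moved)       # may create an overflow block
    return i, n


table = [[[0b0000]], [[0b0101, 0b1101]], [[0b1010]], [[0b1111]]]  # hypothetical, n = 100, i = 3
print(shrink(table, 3, 0b100), table)
```

Merging 1111 back into bucket 01 overflows its 2-key block, one way shrinking can reintroduce overflow chains.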

Linear Hashing (Summary E)
+ Can handle growing files, with less wasted space and with no full reorganizations
+ No directory indirection (unlike extensible hashing)
- Can still have overflow chains

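Example: BAD CASE. Some buckets are very full while others are very empty. To split the full buckets we would need to move n up to them, which would waste space on the empty buckets along the way.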

Summary
Hashing
- How it works
- Dynamic hashing
  - Extensible
  - Linear


Next
Algorithms implementing the relational algebra operations:
- Projection and selection
- Set and bag operations
- Join operations
- Grouping, duplicate elimination, sorting
