CPSC-608 Database Systems, Fall 2008. Instructor: Jianer Chen. Office: HRBB 309B. Phone: 845-4259. Notes #9.


Linear hashing: another dynamic hashing scheme. Two ideas:
(a) Use the i low-order bits of the b-bit hash value; i grows over time.
(b) The file grows linearly.

Example: b = 4 bits, i = 2, 2 keys/bucket, m = 01 (max used block). With i = 2, buckets 10 and 11 are the future growth buckets.

Rule: if h(k)[i] <= m, then look at bucket h(k)[i]; else, look at bucket h(k)[i] - 2^(i-1).

Insert 0101: its low-order bits are 01 <= m, so it goes to bucket 01. Can have overflow chains!
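The lookup rule can be written as a few lines of Python. This is a minimal sketch, not code from the course: the function name bucket_for is hypothetical, and hash values are passed in as plain integers.

```python
def bucket_for(hash_bits: int, i: int, m: int) -> int:
    """Return the bucket to probe, following the rule on the slide.

    hash_bits is h(k) as an integer, i is the number of low-order bits
    in use, and m is the max used bucket number.
    """
    low = hash_bits & ((1 << i) - 1)   # h(k)[i]: the i low-order bits
    if low <= m:
        return low                     # that bucket already exists
    return low - (1 << (i - 1))        # bucket not created yet: drop the top bit


# Slide setting: i = 2, m = 01
print(bucket_for(0b0101, 2, 0b01))  # low bits 01 are <= m
print(bucket_for(0b1111, 2, 0b01))  # low bits 11 are >  m, so subtract 2^(i-1)
```

Both calls land in bucket 01: the first directly, the second because bucket 11 does not exist yet.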

Note: the textbook uses n instead of m, where n = m + 1 is the number of buckets in use. Here m = 01 (max used block), so n = 10.

Example (continued): b = 4 bits, i = 2, 2 keys/bucket, starting from m = 01 (max used block). [Figure sequence: further inserts bring the future growth buckets into use, and m advances from 01 to 10 and then to 11.]

Example continued: how to grow beyond this? Here m = 11 (max used block) and i = 2, so all four 2-bit bucket addresses are in use; growing further means increasing i.

When do we expand the file? Keep track of U = (# used slots) / (total # of slots). If U > threshold, then increase m (and maybe i).
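Putting the lookup rule and the expansion test together gives a small in-memory sketch of linear hashing. Everything here is an assumption made for illustration: the class name LinearHashFile, the starting state (i = 1, two buckets), the threshold of 0.8, and the use of a Python list to stand in for a disk block plus its overflow chain.

```python
class LinearHashFile:
    """In-memory sketch of linear hashing (illustrative, not the course's code)."""

    def __init__(self, keys_per_bucket=2, threshold=0.8):
        self.i = 1                    # number of low-order hash bits in use
        self.m = 1                    # max used bucket number (buckets 0..m exist)
        self.cap = keys_per_bucket    # slots per bucket
        self.threshold = threshold
        self.buckets = [[], []]       # each list = primary block + overflow chain
        self.count = 0                # number of used slots

    def _addr(self, h):
        low = h & ((1 << self.i) - 1)
        return low if low <= self.m else low - (1 << (self.i - 1))

    def lookup(self, key):
        return key in self.buckets[self._addr(hash(key))]

    def insert(self, key):
        self.buckets[self._addr(hash(key))].append(key)
        self.count += 1
        # U = used slots / total slots
        if self.count / (self.cap * (self.m + 1)) > self.threshold:
            self._grow()

    def _grow(self):
        self.m += 1                   # the file grows by exactly one bucket
        if self.m >= (1 << self.i):   # ran out of i-bit addresses
            self.i += 1
        self.buckets.append([])
        # Keys that belong in the new bucket m currently sit in the bucket
        # whose number differs from m only in the top bit.
        old = self.m - (1 << (self.i - 1))
        keys, self.buckets[old] = self.buckets[old], []
        for k in keys:
            self.buckets[self._addr(hash(k))].append(k)


f = LinearHashFile()
for k in range(100):
    f.insert(k)
print(f.i, f.m, len(f.buckets))
```

Only one bucket is split per expansion, so there is never a full reorganization, but a bucket that is not next in line to be split may still carry an overflow chain.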

Summary: Linear Hashing
+ Can handle growing files (with less wasted space, with no full reorganizations)
+ No indirection like extensible hashing
- Can still have overflow chains

Example: BAD CASE. One part of the file is very full and another is very empty. Relieving the full part means moving m up to it, which would waste space.

Summary: Hashing
- How it works
- Dynamic hashing: extensible, linear

Next:
- Indexing vs. hashing
- Index definition in SQL
- Multiple key access

Indexing vs. Hashing: hashing is good for probes given a key, e.g., SELECT … FROM R WHERE R.A = 5

Indexing vs. Hashing: INDEXING (including B-trees) is good for range searches, e.g., SELECT … FROM R WHERE R.A > 5
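The contrast can be seen in a toy setting. In this sketch a Python dict plays the role of the hash index and a sorted list searched with bisect plays the role of an ordered index such as a B-tree; the rows and the names probe_eq and probe_gt are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical tuples (A, payload) of relation R
rows = [(3, "c"), (5, "e"), (7, "g"), (9, "i")]

# Hash index on A: value -> row positions. Good for R.A = 5.
hash_index = {}
for pos, (a, _) in enumerate(rows):
    hash_index.setdefault(a, []).append(pos)

# Ordered index on A: sorted (value, position) pairs. Good for R.A > 5.
ordered_index = sorted((a, pos) for pos, (a, _) in enumerate(rows))
ordered_keys = [a for a, _ in ordered_index]


def probe_eq(value):
    """R.A = value: one hash probe."""
    return [rows[p] for p in hash_index.get(value, [])]


def probe_gt(value):
    """R.A > value: find the start point, then read entries in key order."""
    start = bisect_right(ordered_keys, value)
    return [rows[p] for _, p in ordered_index[start:]]


print(probe_eq(5))
print(probe_gt(5))
```

The hash index has no useful answer to R.A > 5 short of probing every possible value, because hashing scatters neighbouring keys across buckets.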

Index definition in SQL:
- CREATE INDEX name ON rel (attr)
- CREATE UNIQUE INDEX name ON rel (attr), which defines a candidate key
- DROP INDEX name

Note: CANNOT SPECIFY THE TYPE OF INDEX (e.g., B-tree, hashing, …) OR PARAMETERS (e.g., load factor, size of hash, …) … at least not in SQL …

Note: an ATTRIBUTE LIST gives a MULTIKEY INDEX (next), e.g., CREATE INDEX foo ON R(A,B,C)
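These statements can be tried with Python's built-in sqlite3 module. This is only a sketch against SQLite, so other systems may behave differently; the table R(A, B, C) and the index name foo follow the slide, while r_a_unique and the inserted rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER, C INTEGER)")

# Multi-key index from the slide, plus a unique index on A
conn.execute("CREATE INDEX foo ON R(A, B, C)")
conn.execute("CREATE UNIQUE INDEX r_a_unique ON R(A)")

conn.execute("INSERT INTO R VALUES (1, 2, 3)")
try:
    # Same A value again: the unique index treats A as a candidate key
    conn.execute("INSERT INTO R VALUES (1, 9, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("DROP INDEX foo")
index_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")]
print(duplicate_rejected, index_names)
```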

Multi-key Index. Motivation: find records where DEPT = “Toy” AND SAL > 50k

Strategy I: Use one index, say Dept. Get all Dept = “Toy” records and check their salary.

Strategy II: Use 2 indexes; manipulate pointers. Intersect the pointers for Dept = “Toy” with the pointers for Sal > 50k.

Strategy III: Multiple key index. One idea: a first-level index I1 on one attribute whose entries lead to separate second-level indexes on the other attribute.
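Strategies I and II can be compared on a handful of records. This is a sketch under stated assumptions: the employee tuples are invented, record ids are list positions, a dict of sets stands in for the Dept index, and a sorted list stands in for the salary index.

```python
from bisect import bisect_right

# Hypothetical records (name, dept, salary in thousands); record id = position
emps = [("Joe", "Toy", 60), ("Ann", "Toy", 40), ("Bob", "Sales", 70), ("Eve", "Toy", 55)]

dept_index = {}                                   # dept -> set of record ids
for rid, (_, dept, _) in enumerate(emps):
    dept_index.setdefault(dept, set()).add(rid)

sal_index = sorted((sal, rid) for rid, (_, _, sal) in enumerate(emps))
sal_keys = [sal for sal, _ in sal_index]


def strategy_one(dept, min_sal):
    """One index: fetch every record of the dept, then check its salary."""
    return sorted(rid for rid in dept_index.get(dept, ()) if emps[rid][2] > min_sal)


def strategy_two(dept, min_sal):
    """Two indexes: intersect pointer sets before touching any record."""
    start = bisect_right(sal_keys, min_sal)
    by_sal = {rid for _, rid in sal_index[start:]}
    return sorted(dept_index.get(dept, set()) & by_sal)


print(strategy_one("Toy", 50), strategy_two("Toy", 50))
```

Strategy I reads every Toy record, including those that fail the salary test. Strategy II reads only the records in the intersection, at the cost of building two pointer sets first.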

Example. Record: Name=Joe, DEPT=Sales, SAL=15k. Dept index: Art, Sales, Toy. Salary indexes shown: (10k, 15k, 17k, 21k) and (12k, 15k, 19k).

For which queries is this index good?
- Find RECs with Dept = “Sales” AND SAL = 20k
- Find RECs with Dept = “Sales” AND SAL > 20k
- Find RECs with Dept = “Sales”
- Find RECs with SAL = 20k
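One way to explore the four queries is a two-level index: a dict on Dept whose entries are sorted salary lists. This is a sketch, not the structure from the slides; the records and function names are hypothetical, and salaries are whole thousands.

```python
from bisect import bisect_left

# Hypothetical records (name, dept, salary in thousands)
records = [("Joe", "Sales", 15), ("Sue", "Sales", 20), ("Tom", "Sales", 25), ("Kim", "Toy", 20)]

# Level 1: dept -> level 2: sorted list of (salary, name)
index = {}
for name, dept, sal in records:
    index.setdefault(dept, []).append((sal, name))
for entries in index.values():
    entries.sort()


def dept_and_sal_eq(dept, sal):
    entries = index.get(dept, [])
    lo = bisect_left(entries, (sal,))        # first entry with salary >= sal
    hi = bisect_left(entries, (sal + 1,))    # first entry with salary >  sal
    return [n for _, n in entries[lo:hi]]


def dept_and_sal_gt(dept, sal):
    entries = index.get(dept, [])
    return [n for _, n in entries[bisect_left(entries, (sal + 1,)):]]


def dept_only(dept):
    return [n for _, n in index.get(dept, [])]


def sal_only(sal):
    # No help from level 1: every dept's salary index must be searched.
    return [n for dept in index for n in dept_and_sal_eq(dept, sal)]


print(dept_and_sal_eq("Sales", 20), dept_and_sal_gt("Sales", 20))
print(dept_only("Sales"), sorted(sal_only(20)))
```

The first three queries each follow a single path through the Dept level. The salary-only query has to visit every second-level index, so this index is a poor fit for it.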

Interesting application: geographic data. DATA: records of the form (x, y, ...)

Queries:
- What city is at (x, y)?
- What is within 5 miles of (x, y)?
- Which is the closest point to (x, y)?

Example. [Figure sequence: points a through o in the plane are partitioned step by step into regions, and a tree is built over the regions with the points at its leaves in the order h i a b c d e f g n o m l j k. Two searches are shown: search points near f, and search points near b.]

Queries:
- Find points with Yi > 20
- Find points with Xi < 5
- Find points “close” to point i
- Find points “close” to point b

Many types of geographic index structures have been suggested:
- Quad trees
- R trees
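As an illustration of the first structure named, here is a small point quadtree sketch that answers rectangle queries such as "Yi >= 20" or "Xi <= 5". The slides give no code, so everything is assumed: the class name QuadTree, a leaf capacity of 2, the depth limit, and the sample points.

```python
MAX_DEPTH = 16  # stop splitting eventually, e.g. when many points coincide


class QuadTree:
    def __init__(self, x0, y0, x1, y1, capacity=2, depth=0):
        self.box = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.points = []
        self.children = None          # None while this node is a leaf

    def _mid(self):
        x0, y0, x1, y1 = self.box
        return (x0 + x1) / 2, (y0 + y1) / 2

    def _child_for(self, x, y):
        mx, my = self._mid()
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def insert(self, x, y):
        if self.children is not None:
            self._child_for(x, y).insert(x, y)
            return
        self.points.append((x, y))
        if len(self.points) > self.capacity and self.depth < MAX_DEPTH:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = self._mid()
        d, c = self.depth + 1, self.capacity
        # Child order matches _child_for: low/low, high/low, low/high, high/high
        self.children = [QuadTree(x0, y0, mx, my, c, d), QuadTree(mx, y0, x1, my, c, d),
                         QuadTree(x0, my, mx, y1, c, d), QuadTree(mx, my, x1, y1, c, d)]
        pts, self.points = self.points, []
        for px, py in pts:
            self._child_for(px, py).insert(px, py)

    def query(self, qx0, qy0, qx1, qy1):
        """Return points inside the closed rectangle [qx0, qx1] x [qy0, qy1]."""
        x0, y0, x1, y1 = self.box
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return []                 # query misses this region: prune it
        if self.children is None:
            return [(x, y) for x, y in self.points
                    if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        return [p for child in self.children for p in child.query(qx0, qy0, qx1, qy1)]


points = [(1, 30), (4, 10), (7, 24), (12, 38), (50, 50), (80, 5), (3, 90)]
tree = QuadTree(0, 0, 100, 100)
for x, y in points:
    tree.insert(x, y)
print(sorted(tree.query(0, 20, 100, 100)))   # points with y >= 20
print(sorted(tree.query(0, 0, 5, 100)))      # points with x <= 5
```

A "close to" query can be approximated by a rectangle query around the target point, followed by a distance check on the points returned.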

Two more types of multi-key indexes:
- Grid
- Partitioned hash

Grid Index. [Figure: a grid with Key 1 values V1, V2, …, Vn on one axis and Key 2 values X1, X2, …, Xn on the other; the cell for (V3, X2) leads to the records with key1 = V3, key2 = X2.]

CLAIM: can quickly find records with
- key1 = Vi AND key2 = Xj
- key1 = Vi
- key2 = Xj
And also ranges, e.g., key1 >= Vi AND key2 < Xj

But there is a catch with grid indexes! How is a grid index stored on disk? Like an array: the entries X1 X2 X3 X4 for V1, then X1 X2 X3 X4 for V2, then X1 X2 X3 X4 for V3. Problem: need regularity so we can compute the position of an entry.

Solution: use indirection. The grid contains only pointers to buckets; the records live in the buckets. [Figure: a grid with rows V1 to V4 and columns X1 to X3, each cell pointing to a bucket.]

With indirection:
- Grid can be regular without wasting space
- We do have the price of indirection
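The array layout and the indirection can be sketched together. The grid below is a flat array of bucket ids, so the position of any cell is computed directly, while several sparse cells share one bucket. The names cell_position, grid, and buckets, and the three sample records, are illustrative assumptions.

```python
ROWS = ["V1", "V2", "V3"]            # key 1 values
COLS = ["X1", "X2", "X3", "X4"]      # key 2 values


def cell_position(v, x):
    """Row-major position of cell (v, x): computable because the grid is regular."""
    return ROWS.index(v) * len(COLS) + COLS.index(x)


# Indirection: each grid cell holds only a bucket id, not the records.
buckets = {0: [], 1: []}
grid = [0] * (len(ROWS) * len(COLS))         # sparse cells all share bucket 0
grid[cell_position("V3", "X2")] = 1          # a busy cell gets its own bucket


def insert(v, x, record):
    buckets[grid[cell_position(v, x)]].append((v, x, record))


def find(v, x):
    # Price of indirection: one extra hop, then filter the shared bucket.
    bucket = buckets[grid[cell_position(v, x)]]
    return [rec for (rv, rx, rec) in bucket if (rv, rx) == (v, x)]


insert("V3", "X2", "r1")
insert("V1", "X1", "r2")
insert("V2", "X4", "r3")
print(cell_position("V3", "X2"), find("V3", "X2"), find("V1", "X1"))
```

Twelve grid cells are served by two buckets here; without indirection each cell would need its own fixed-size slot whether or not it held any records.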

Can also index the grid on value ranges, using a linear scale per key:
Dept linear scale: Toy = 1, Sales = 2, Personnel = 3
Salary linear scale: 0-20K = 1, 20K-50K = 2, 50K …

Grid files:
+ Good for multiple-key search
- Space, management overhead (nothing is free)
- Need partitioning ranges that evenly split keys
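A linear scale is just a mapping from a key value to a stripe number. The sketch below assumes that salaries of 50K and above fall in stripe 3 and that a boundary value belongs to the upper stripe; neither detail is stated in the slides. The employee records and function names are also made up.

```python
from bisect import bisect_right

DEPT_SCALE = {"Toy": 1, "Sales": 2, "Personnel": 3}
SALARY_BOUNDS = [20_000, 50_000]     # stripe 1: below 20K, 2: 20K to below 50K, 3: assumed 50K and up


def salary_stripe(salary):
    return bisect_right(SALARY_BOUNDS, salary) + 1


grid = {}                            # (dept stripe, salary stripe) -> records


def insert(name, dept, salary):
    cell = (DEPT_SCALE[dept], salary_stripe(salary))
    grid.setdefault(cell, []).append((name, dept, salary))


def find_dept_salary_over(dept, min_salary):
    """Dept = dept AND salary > min_salary: visit only the stripes that can match."""
    row = DEPT_SCALE[dept]
    hits = []
    for stripe in range(salary_stripe(min_salary), len(SALARY_BOUNDS) + 2):
        hits += [r for r in grid.get((row, stripe), []) if r[2] > min_salary]
    return hits


insert("Joe", "Toy", 15_000)
insert("Ann", "Toy", 30_000)
insert("Eve", "Toy", 60_000)
insert("Bob", "Sales", 70_000)
print(find_dept_salary_over("Toy", 25_000))
```

If most salaries fell into one stripe, that stripe's cells would overflow while the others sat empty, which is why the ranges need to split the keys evenly.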

Partitioned hash function. Idea: hash Key 1 with h1 and Key 2 with h2, and concatenate the two results to form the bucket address.

Example: h1(toy)=0, h1(sales)=1, h1(art)=…; h2(10k)=01, h2(20k)=11, h2(30k)=01, h2(40k)=00. The buckets are numbered 000, 001, …, 111: the h1 bit followed by the two h2 bits.

EX: Insert. Each record goes to the bucket addressed by h1(Dept) followed by h2(Sal).

Find Emp. with Dept. = Sales AND Sal = 40k: a single bucket, 1 00.

Find Emp. with Sal = 30k: look here! Buckets 0 01 and 1 01.

Find Emp. with Dept. = Sales: look here! Buckets 1 00, 1 01, 1 10, and 1 11.
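The three lookups can be traced in code. This sketch reads the slide's hash values as h1(toy)=0, h1(sales)=1, h2(10k)=01, h2(20k)=11, h2(30k)=01, h2(40k)=00, and assumes the bucket address is the h1 bit followed by the two h2 bits; h1(art) is left out because its value is missing. The employee records are invented.

```python
H1 = {"toy": 0b0, "sales": 0b1}                               # 1 bit from Dept
H2 = {"10k": 0b01, "20k": 0b11, "30k": 0b01, "40k": 0b00}     # 2 bits from Sal
H1_BITS, H2_BITS = 1, 2


def address(dept, sal):
    """Concatenate the h1 bit and the h2 bits into one bucket number."""
    return (H1[dept] << H2_BITS) | H2[sal]


buckets = {a: [] for a in range(1 << (H1_BITS + H2_BITS))}


def insert(name, dept, sal):
    buckets[address(dept, sal)].append((name, dept, sal))


def buckets_for_dept(dept):
    """Dept known, Sal unknown: try every possible h2 value."""
    return [(H1[dept] << H2_BITS) | low for low in range(1 << H2_BITS)]


def buckets_for_sal(sal):
    """Sal known, Dept unknown: try every possible h1 value."""
    return [(high << H2_BITS) | H2[sal] for high in range(1 << H1_BITS)]


# Hypothetical employees
insert("Fred", "toy", "10k")
insert("Joe", "sales", "10k")
insert("Sally", "sales", "30k")
insert("Jan", "sales", "40k")

print(format(address("sales", "40k"), "03b"))
print([format(b, "03b") for b in buckets_for_sal("30k")])
print([format(b, "03b") for b in buckets_for_dept("sales")])
```

With both keys known, exactly one bucket is read. With only one key known, the search is still narrowed to the buckets that share that key's bits.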

Summary. Hashing discussion:
- Indexing vs. hashing
- SQL index definition
- Multiple key access
- Multi-key index (variations: grid, geo data)
- Partitioned hash