1 Lecture 19: B-trees and Hash Tables Wednesday, November 12, 2003.

Presentation transcript:

1 Lecture 19: B-trees and Hash Tables Wednesday, November 12, 2003

2 Outline B-trees (13.3) Hash-tables (13.4)

3 B+ Trees Search trees. Idea in B Trees: –make 1 node = 1 block. Idea in B+ Trees: –make leaves into a linked list (range queries are easier).

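4 B+ Trees Basics Parameter d = the degree. Each node has >= d and <= 2d keys (except root). Each leaf has >= d and <= 2d keys. [Figure: an interior node whose pointers lead to subtrees with keys k < 30, keys 30 <= k < 120, keys 120 <= k < 240, and keys 240 <= k; each leaf also points to the next leaf.]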

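5 B+ Tree Example d = 2. Find the key 40. [Figure: search path from the root down to a leaf, comparing against 40 at each level; the node contents were lost in extraction.]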

6 B+ Tree Design How large d ? Example: –Key size = 4 bytes –Pointer size = 8 bytes –Block size = 4096 bytes. A node holds up to 2d keys and 2d+1 pointers, so 2d x 4 + (2d+1) x 8 <= 4096, which gives d = 170.
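
The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name `max_degree` and its parameter names are illustrative and not part of the lecture.

```python
def max_degree(key_size: int, pointer_size: int, block_size: int) -> int:
    """Largest d such that 2d keys and 2d+1 pointers fit in one block."""
    # 2d*key_size + (2d+1)*pointer_size <= block_size
    # rearranges to 2d*(key_size + pointer_size) <= block_size - pointer_size
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


d = max_degree(key_size=4, pointer_size=8, block_size=4096)
print(d)  # 170
```

With d = 170 the node uses 4088 of the 4096 bytes; d = 171 would need 4112 bytes and no longer fits.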

7 Searching a B+ Tree Exact key values: –Start at the root –Proceed down, to the leaf. Example: Select name From people Where age = 25. Range queries: –As above –Then sequential traversal. Example: Select name From people Where 20 <= age and age <= 30
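
The two kinds of search can be sketched in Python as follows. This is an illustrative in-memory model, not the lecture's code: the class names `Leaf` and `Inner`, the functions `lookup` and `range_query`, and the sample tree are all assumptions, and keys are assumed to be unique. Child i of an inner node holds the keys k with keys[i-1] <= k < keys[i], matching the "B+ Trees Basics" slide.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    values: List[str]
    next: Optional["Leaf"] = None  # link to the next leaf, used by range queries


@dataclass
class Inner:
    keys: List[int]   # separator keys
    children: list    # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Start at the root and proceed down to the leaf that may hold key."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    """Exact-match search: returns the value stored for key, or None."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_query(root, lo, hi):
    """Find the leaf for lo, then traverse the leaf list until a key exceeds hi."""
    leaf = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append(v)
        leaf = leaf.next
    return out


# A tiny hypothetical tree indexing people by age.
l3 = Leaf([50, 60], ["eve", "fay"])
l2 = Leaf([30, 40], ["cat", "dan"], l3)
l1 = Leaf([20, 25], ["ann", "bob"], l2)
root = Inner([30, 50], [l1, l2, l3])

print(lookup(root, 25))           # bob
print(range_query(root, 20, 30))  # ['ann', 'bob', 'cat']
```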

8 B+ Trees in Practice Typical order: 100. Typical fill-factor: 67%. –average fanout = 133 Typical capacities: –Height 4: 133^4 = 312,900,721 records –Height 3: 133^3 = 2,352,637 records Can often hold top levels in buffer pool: –Level 1 = 1 page = 8 Kbytes –Level 2 = 133 pages = 1 Mbyte –Level 3 = 17,689 pages = 138 MBytes
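
These capacities follow directly from the fanout. The sketch below recomputes them exactly, taking the average fanout of 133 and the 8 Kbyte page size from the slide as given; the function names are illustrative. Because it computes exact powers and sizes, small differences from rounded figures are expected.

```python
FANOUT = 133   # average fanout, taken from the slide
PAGE_KB = 8    # page size in Kbytes, taken from the slide


def records_at_height(height: int) -> int:
    """Number of records reachable through a tree of the given height."""
    return FANOUT ** height


def pages_at_level(level: int) -> int:
    """Number of pages on a level, with the root as level 1."""
    return FANOUT ** (level - 1)


for height in (3, 4):
    print(f"height {height}: {records_at_height(height):,} records")

for level in (1, 2, 3):
    pages = pages_at_level(level)
    print(f"level {level}: {pages:,} pages = {pages * PAGE_KB / 1024:.1f} MB")
```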

9 Insertion in a B+ Tree Insert (K, P). Find leaf where K belongs, insert. If no overflow (2d keys or less), halt. If overflow (2d+1 keys), split node, insert in parent. If leaf, keep K3 too in right node. When root splits, new root has 1 key only. [Figure: a node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5 splits into a left node with K1 K2 (P0 P1 P2) and a right node with K4 K5 (P3 P4 P5); K3 goes to the parent.]
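
The split step can be sketched as two small functions, one for leaves and one for interior nodes. This is an illustration under simplifying assumptions (keys and pointers held in plain Python lists, a leaf modelled by its keys only), and the function names are not from the lecture.

```python
def split_leaf(keys, d):
    """Split an overflowing leaf holding 2d+1 sorted keys.

    The middle key is copied to the parent and also kept in the right leaf.
    Returns (left_keys, right_keys, key_for_parent).
    """
    assert len(keys) == 2 * d + 1
    left, right = keys[:d], keys[d:]
    return left, right, right[0]


def split_inner(keys, pointers, d):
    """Split an overflowing interior node with 2d+1 keys and 2d+2 pointers.

    The middle key moves up to the parent and is not kept in either half.
    Returns ((left_keys, left_ptrs), (right_keys, right_ptrs), key_for_parent).
    """
    assert len(keys) == 2 * d + 1 and len(pointers) == 2 * d + 2
    middle = keys[d]
    left = (keys[:d], pointers[:d + 1])
    right = (keys[d + 1:], pointers[d + 1:])
    return left, right, middle


keys = ["K1", "K2", "K3", "K4", "K5"]
ptrs = ["P0", "P1", "P2", "P3", "P4", "P5"]
print(split_inner(keys, ptrs, d=2))
# ((['K1', 'K2'], ['P0', 'P1', 'P2']), (['K4', 'K5'], ['P3', 'P4', 'P5']), 'K3')
print(split_leaf(keys, d=2))
# (['K1', 'K2'], ['K3', 'K4', 'K5'], 'K3')
```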

10 Insertion in a B+ Tree Insert K=19

11 Insertion in a B+ Tree After insertion

12 Insertion in a B+ Tree Now insert 25

13 Insertion in a B+ Tree After insertion

14 Insertion in a B+ Tree But now have to split !

15 Insertion in a B+ Tree After the split

16 Deletion from a B+ Tree Delete

17 Deletion from a B+ Tree After deleting May change to 40, or not

18 Deletion from a B+ Tree Now delete 25

19 Deletion from a B+ Tree After deleting 25. Need to rebalance: rotate

20 Deletion from a B+ Tree Now delete 40

21 Deletion from a B+ Tree After deleting 40. Rotation not possible. Need to merge nodes

22 Deletion from a B+ Tree Final tree

23 In Class Suppose the B+ tree has depth 4 and degree d=200. How many records does the relation have (maximum) ? How many index blocks do we need to read and/or write during: –A key lookup –An insertion –A deletion

24 Hash Tables Secondary storage hash tables are much like main memory ones Recall basics: –There are n buckets –A hash function f(k) maps a key k to {0, 1, …, n-1} –Store in bucket f(k) a pointer to record with key k Secondary storage: bucket = block, use overflow blocks when needed

25 Hash Table Example Assume 1 bucket (block) stores 2 keys + pointers. h(e)=0, h(b)=h(f)=1, h(g)=2, h(a)=h(c)=3. [Figure: bucket 0 holds e; bucket 1 holds b, f; bucket 2 holds g; bucket 3 holds a, c.]

26 Searching in a Hash Table Search for a: compute h(a)=3, read bucket 3. 1 disk access. [Figure: same table as in the example.]

27 Insertion in Hash Table Place in right bucket, if space. E.g. h(d)=2. [Figure: bucket 2 now holds g, d.]

28 Insertion in Hash Table Create overflow block, if no space. E.g. h(k)=1. More overflow blocks may be needed. [Figure: bucket 1 holds b, f, with k in an overflow block.]
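
The search and insertion rules from the last few slides can be sketched as a small Python class. This is an in-memory illustration only: each bucket is modelled as a list of blocks (the first is the primary block, the rest are overflow blocks), the class name `HashTable` is made up, and the hash function is a lookup table `H` that reproduces the hash values given on the slides.

```python
BLOCK_CAPACITY = 2  # keys per block, as assumed on the slides

# Hash values copied from the slides; a real table would compute them.
H = {"e": 0, "b": 1, "f": 1, "g": 2, "a": 3, "c": 3, "d": 2, "k": 1}


class HashTable:
    def __init__(self, n_buckets, h):
        self.h = h
        # One chain of blocks per bucket; chain[0] is the primary block.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        if len(chain[-1]) == BLOCK_CAPACITY:
            chain.append([])  # no space: create an overflow block
        chain[-1].append(key)

    def search(self, key):
        """Return (found, number_of_blocks_read)."""
        reads = 0
        for block in self.buckets[self.h(key)]:
            reads += 1
            if key in block:
                return True, reads
        return False, reads


table = HashTable(4, H.__getitem__)
for key in "ebfgac":
    table.insert(key)
table.insert("d")  # fits in bucket 2
table.insert("k")  # bucket 1 is full, goes to an overflow block

print(table.buckets)
# [[['e']], [['b', 'f'], ['k']], [['g', 'd']], [['a', 'c']]]
print(table.search("a"))  # (True, 1)
print(table.search("k"))  # (True, 2)
```

The second search shows why overflow blocks hurt: finding k costs two block reads instead of one.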

29 Hash Table Performance Excellent, if no overflow blocks. Degrades considerably when the number of keys exceeds the number of buckets (i.e. many overflow blocks).

30 Extensible Hash Table Allows hash table to grow, to avoid performance degradation. Assume a hash function h that returns numbers in {0, …, 2^k – 1}. Start with n = 2^i << 2^k, only look at first i most significant bits

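31 Extensible Hash Table E.g. i=1, n=2^i=2, k=4. Note: we only look at the first bit (0 or 1). [Figure: i=1; bucket 0 holds 0(010); bucket 1 holds 1(011).]

32 Insertion in Extensible Hash Table Insert 1110. [Figure: i=1; bucket 0 holds 0(010); bucket 1 holds 1(011), 1(110).]

33 Insertion in Extensible Hash Table Now insert 1010. Need to extend table, split blocks; i becomes 2. [Figure: i=1; bucket 0 holds 0(010); bucket 1 holds 1(011), 1(110) and has no room for 1(010).]

34 Insertion in Extensible Hash Table [Figure: i=2; one block holds 0(010) and uses 1 bit; one block holds 10(11), 10(10) and uses 2 bits; one block holds 11(10) and uses 2 bits.]

35 Insertion in Extensible Hash Table Now insert 0000, then 0101. Need to split block. [Figure: i=2; the block holding 0(010), 0(000) has no room for 0(101); the other blocks hold 10(11), 10(10) and 11(10).]

36 Insertion in Extensible Hash Table After splitting the block. [Figure: i=2; four blocks, each using 2 bits: 00(10), 00(00); 01(01); 10(11), 10(10); 11(10).]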
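
The insertion sequence on these slides can be replayed with a short Python sketch of an extensible hash table. It is an illustration under stated assumptions, not the lecture's implementation: keys are taken to be their own k-bit hash values (k = 4), blocks hold 2 keys, hash values are assumed distinct, and the names `Block`, `ExtensibleHashTable` and `dump` are made up.

```python
K = 4         # hash values are K-bit integers
CAPACITY = 2  # keys per block


class Block:
    def __init__(self, depth):
        self.depth = depth  # number of leading bits this block distinguishes
        self.keys = []


class ExtensibleHashTable:
    def __init__(self):
        self.i = 1                              # bits used by the directory
        self.directory = [Block(1), Block(1)]   # n = 2**i entries

    def _slot(self, h):
        return h >> (K - self.i)  # first i most significant bits

    def insert(self, h):
        block = self.directory[self._slot(h)]
        if len(block.keys) < CAPACITY:
            block.keys.append(h)
            return
        if block.depth == self.i:
            # Extend the table: double the directory, i becomes i + 1.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.i += 1
        # Split the full block on its next bit.
        block.depth += 1
        sibling = Block(block.depth)
        old_keys, block.keys = block.keys, []
        for j, b in enumerate(self.directory):
            if b is block and (j >> (self.i - block.depth)) & 1:
                self.directory[j] = sibling
        for key in old_keys:
            self.directory[self._slot(key)].keys.append(key)
        self.insert(h)  # retry; may split again if the block is still full

    def dump(self):
        return [[format(k, f"0{K}b") for k in b.keys] for b in self.directory]


t = ExtensibleHashTable()
for h in (0b0010, 0b1011, 0b1110, 0b1010, 0b0000, 0b0101):
    t.insert(h)
print(t.i)       # 2
print(t.dump())  # [['0010', '0000'], ['0101'], ['1011', '1010'], ['1110']]
```

Note that inserting 0101 splits a block without doubling the directory, because that block was still shared by two directory entries.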

37 Extensible Hash Table How many buckets (blocks) do we need to touch after an insertion ? How many entries in the hash table do we need to touch after an insertion ?

38 Performance Extensible Hash Table No overflow blocks: access always one read BUT: –Extensions can be costly and disruptive –After an extension table may no longer fit in memory

39 Linear Hash Table Idea: extend only one entry at a time. Problem: n is no longer a power of 2. Let i be such that 2^(i-1) < n <= 2^i. After computing h(k), use last i bits: –If the last i bits represent a number >= n, change msb from 1 to 0 (get a number < n)
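
The bucket computation can be sketched in Python. The sketch assumes that buckets are numbered 0 to n-1 and that i is the number of bits needed to write n-1, which is the convention that matches the n=3 example, where two bits are used. The function name `bucket_index` is illustrative.

```python
def bucket_index(h: int, n: int) -> int:
    """Map hash value h to one of n buckets, numbered 0 .. n-1."""
    i = max(1, (n - 1).bit_length())  # smallest i >= 1 with n <= 2**i
    m = h & ((1 << i) - 1)            # last i bits of h
    if m >= n:                        # that bucket does not exist yet
        m -= 1 << (i - 1)             # bit flip: change the msb from 1 to 0
    return m


# With n = 3 buckets, i = 2 bits are used.
for h in (0b0100, 0b1100, 0b1010, 0b0111):
    print(format(h, "04b"), "-> bucket", format(bucket_index(h, 3), "02b"))
# 0100 -> bucket 00
# 1100 -> bucket 00
# 1010 -> bucket 10
# 0111 -> bucket 01
```

The last key ends in 11, but bucket 11 does not exist when n = 3, so the bit flip sends it to bucket 01.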

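40 Linear Hash Table Example n=3. [Figure: i=2; bucket 00 holds (01)00, (11)00; bucket 01 holds (01)11, placed there by a BIT FLIP; bucket 10 holds (10)10.]

41 Linear Hash Table Example Insert 1000: overflow blocks… [Figure: bucket 00 holds (01)00, (11)00, with (10)00 in an overflow block; bucket 01 holds (01)11; bucket 10 holds (10)10.]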

42 Linear Hash Tables Extension: independent of overflow blocks. Extend n:=n+1 when the average number of records per block exceeds (say) 80% of the block capacity

43 Linear Hash Table Extension From n=3 to n=4. Only need to touch one block (which one ?). [Figure: the table before and after bucket 11 is added.]
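
The step from n=3 to n=4 can be replayed with a small Python sketch of a linear hash table. This is an illustration under assumptions, not the lecture's code: buckets are numbered 0 to n-1, each bucket is a single Python list standing for its primary block plus any overflow blocks, blocks hold 2 records, the table is extended when the fill ratio exceeds 80%, and all names are made up. Only the n=3 to n=4 step is exercised here.

```python
CAPACITY = 2     # records per block
THRESHOLD = 0.8  # extend when records / (n * CAPACITY) exceeds this


class LinearHashTable:
    def __init__(self, n):
        self.buckets = [[] for _ in range(n)]
        self.count = 0

    def _index(self, h):
        n = len(self.buckets)
        i = max(1, (n - 1).bit_length())  # bits in use for n buckets
        m = h & ((1 << i) - 1)
        return m - (1 << (i - 1)) if m >= n else m  # bit flip if needed

    def insert(self, h):
        """Insert h; return the index of the bucket split, or None."""
        self.buckets[self._index(h)].append(h)
        self.count += 1
        if self.count / (len(self.buckets) * CAPACITY) > THRESHOLD:
            return self.extend()
        return None

    def extend(self):
        """Add bucket number n and redistribute the one bucket it splits."""
        new = len(self.buckets)
        i = max(1, new.bit_length())   # bits needed once bucket `new` exists
        source = new - (1 << (i - 1))  # `new` with its msb changed to 0
        old = self.buckets[source]
        self.buckets[source] = []
        self.buckets.append([])
        for h in old:
            self.buckets[self._index(h)].append(h)
        return source


t = LinearHashTable(3)
for h in (0b0100, 0b1100, 0b1010, 0b0111):
    t.insert(h)
touched = t.insert(0b1000)  # 5 records in 3 blocks of 2: above 80%
print(touched)              # 1
print([[format(h, "04b") for h in b] for b in t.buckets])
# [['0100', '1100', '1000'], [], ['1010'], ['0111']]
```

In this sketch the only bucket touched is 01, whose key ending in 11 had been placed there by the bit flip and now moves to the new bucket 11. Bucket 00 still holds three keys, which illustrates that the extension is triggered by the fill ratio and not by the overflow itself.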

44 Linear Hash Table Extension From n=3 to n=4 finished. Extension from n=4 to n=5 (new bit). Need to touch every single block (why ?). [Figure: the table with n=4, holding (01)00, (11)00, (10)10, (01)11.]