
1 B+-trees and Hashing

2 Model of Computation
- Data stored on disk(s)
- Minimum transfer unit: page = b bytes or B records (also called a block)
- If r is the size of a record, then B = b/r
- N records -> N/B pages
- I/O complexity: measured in the number of pages transferred
(figure: CPU - Memory - Disk hierarchy)
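To make the page arithmetic concrete (the numbers here are illustrative, not from the slides): with b = 8192-byte pages and r = 100-byte records, B = floor(8192/100) = 81 records per page, so N = 1,000,000 records occupy about N/B ≈ 12,346 pages.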

3 I/O complexity
- An ideal index has space O(N/B), update overhead O(1) or O(log_B(N/B)), and search complexity O(a/B) or O(log_B(N/B) + a/B), where a is the number of records in the answer
- But sometimes CPU performance is also important: minimize cache misses -> don't waste CPU cycles

4 Buffer Manager - Context
(figure: the layers of a DBMS, top to bottom)
- Query Optimization and Execution
- Relational Operators
- Files and Access Methods
- Buffer Management
- Disk Space Management
- DB

5 Buffer Management
- Keep pages in a part of memory (the buffer) and read them directly from there
- What happens if you need to bring a new page into the buffer and the buffer is full? You have to evict one page
- Replacement policies:
  - LRU: Least Recently Used (approximated by CLOCK)
  - MRU: Most Recently Used
  - Toss-immediate: remove a page if you know that you will not need it again
- Pinning (needed in recovery, index-based processing, etc.)
- Other DB-specific replacement policies: DBMIN, LRU-k, 2Q

6 Buffer Management in a DBMS
- Data must be in RAM for the DBMS to operate on it!
- The buffer manager hides the fact that not all data is in RAM
(figure: page requests from higher levels go to a BUFFER POOL in main memory; the buffer manager reads disk pages into free frames, with the choice of frame dictated by the replacement policy)

7 When a Page is Requested...
- The buffer pool information table records, for each frame, the page it holds, its pin count, and its dirty bit
- If the requested page is not in the pool and there is no free frame:
  - Choose a frame for replacement. Only "un-pinned" pages are candidates!
  - If the frame is "dirty", write it to disk
  - Read the requested page into the chosen frame
- Pin the page and return its address
- If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several at a time!

8 More on Buffer Management
- The requestor of a page must eventually unpin it and indicate whether the page has been modified: a dirty bit is used for this
- A page in the pool may be requested many times, so a pin count is used. To pin a page: pin_count++ (see the sketch below)
- A page is a candidate for replacement iff pin_count == 0 ("unpinned")
- Concurrency control & recovery may entail additional I/O when a frame is chosen for replacement (Write-Ahead Log protocol; more later!)
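A minimal sketch of this pin-count/dirty-bit bookkeeping in C (the Frame struct and function names are illustrative, not taken from any particular DBMS):

    typedef struct {
        int page_id;    /* disk page held by this frame (-1 if free) */
        int pin_count;  /* number of current requestors of the page */
        int dirty;      /* set when a requestor modified the page */
    } Frame;

    void pin(Frame *f) { f->pin_count++; }

    void unpin(Frame *f, int modified) {
        if (modified) f->dirty = 1;  /* must write back before eviction */
        f->pin_count--;              /* candidate for replacement at 0 */
    }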

9 Buffer Replacement Policy
- A frame is chosen for replacement by a replacement policy: Least Recently Used (LRU), MRU, Clock, etc.
- The policy can have a big impact on the number of I/Os; it depends on the access pattern

10 LRU Replacement Policy
- Least Recently Used (LRU): for each page in the buffer pool, keep track of the time when it was last unpinned
- Replace the frame which has the oldest (earliest) time
- Very common policy: intuitive and simple; works well for repeated accesses to popular pages
- Problems?

11 "Clock" Replacement Policy
- An approximation of LRU
- Arrange the frames into a cycle and store one reference bit per frame (think of it as the "second chance" bit)
- When a frame's pin count drops to 0, turn on its reference bit
- When replacement is necessary, sweep around the cycle:

    do {
        advance to the next frame in the cycle;
        if (pin_count == 0 && ref bit is on)
            turn off ref bit;              /* give it a second chance */
        else if (pin_count == 0 && ref bit is off)
            choose this frame for replacement;
    } until a frame is chosen;

(figure: four frames A-D arranged in a circle with their reference bits)
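A runnable rendering of that sweep in C (CFrame, clock_victim, and the hand variable are illustrative names, building on the same pin-count idea as the Frame sketch above):

    typedef struct {
        int pin_count;  /* > 0 means the frame cannot be evicted */
        int ref_bit;    /* "second chance" bit, set when pin count hits 0 */
    } CFrame;

    /* Returns the index of the frame to evict, or -1 if all are pinned.
       *hand keeps the clock position across calls. */
    int clock_victim(CFrame *frames, int n, int *hand) {
        for (int scanned = 0; scanned < 2 * n; scanned++) { /* <= 2 sweeps */
            int idx = *hand;
            CFrame *f = &frames[idx];
            *hand = (idx + 1) % n;            /* advance the clock hand */
            if (f->pin_count > 0) continue;   /* pinned: skip */
            if (f->ref_bit) { f->ref_bit = 0; continue; } /* 2nd chance */
            return idx;                       /* unpinned and bit off */
        }
        return -1;                            /* every frame is pinned */
    }

Two full sweeps suffice: the first clears the reference bits of unpinned frames, so the second must find a victim unless every frame is pinned.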

12 B+-tree
- Records must be ordered over an attribute: SSN, Name, etc.
- Queries: exact-match and range queries over the indexed attribute: "find the name of the student with ID=087-34-7892" or "find all students with gpa between 3.00 and 3.5"

13 B+-tree: properties
- Insert/delete at log_F(N/B) cost; the tree is kept height-balanced (F = fanout)
- Minimum 50% occupancy (except for the root): each node contains d <= m <= 2d entries/pointers
- Two types of nodes: index nodes and data nodes; each node is 1 page (disk-based method)

14 Example
(figure: a B+-tree; the root holds keys 100, 120, 150, 180; lower-level nodes hold keys such as 30 and 40; leaves hold key groups such as [3 5 11], [30 35], [41 55], [100 101 110], [120 130], [150 156 179], [180 200])

15 Index node
(figure: an index node with keys 57, 81, 95 and four child pointers: to keys k < 57; to keys 57 <= k < 81; to keys 81 <= k < 95; to keys 95 <= k)

16 Data node
(figure: a leaf with keys 57, 81, 95 and pointers to the records with those keys, plus an incoming pointer from the non-leaf level and a pointer to the next leaf in sequence)

A leaf entry, as corrected C:

    typedef struct {
        float key;     /* search-key value ("Key real" on the slide) */
        long *ptr;     /* pointer to the record with that key */
    } entry;

    entry node[B];     /* a data node holds up to B entries */

17 Insertion
- Find the correct leaf L; put the data entry onto L
- If L has enough space, done!
- Else, split L (into L and a new node L2):
  - Redistribute entries evenly, copy up the middle key
  - Insert an index entry pointing to L2 into the parent of L
  - This can happen recursively: to split an index node, redistribute entries evenly, but push up the middle key (contrast with leaf splits)
- Splits "grow" the tree; a root split increases the height
- Tree growth: the tree gets wider, or one level taller at the top (see the split sketch below)
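A compilable sketch of the leaf-split step, for a toy leaf of fixed capacity (Leaf, MAXKEYS, and the int-key simplification are illustrative assumptions, not from the slides):

    #include <stdio.h>
    #include <string.h>

    #define MAXKEYS 4                 /* 2d with d = 2 */

    typedef struct Leaf {
        int nkeys;
        int keys[MAXKEYS + 1];        /* one spare slot: insert, then split */
        struct Leaf *next;            /* leaf-chain pointer for range scans */
    } Leaf;

    /* Insert key into l; on overflow, split into l2 and return 1, with the
       key to be *copied up* into the parent stored in *up. */
    int leaf_insert(Leaf *l, Leaf *l2, int key, int *up) {
        int i = l->nkeys;
        while (i > 0 && l->keys[i - 1] > key) {  /* shift larger keys right */
            l->keys[i] = l->keys[i - 1];
            i--;
        }
        l->keys[i] = key;
        if (++l->nkeys <= MAXKEYS) return 0;     /* still fits: no split */

        int mid = l->nkeys / 2;                  /* redistribute evenly */
        l2->nkeys = l->nkeys - mid;
        memcpy(l2->keys, &l->keys[mid], (size_t)l2->nkeys * sizeof(int));
        l->nkeys = mid;
        l2->next = l->next;                      /* keep the leaf chain */
        l->next = l2;
        *up = l2->keys[0];                       /* copy up l2's smallest key */
        return 1;
    }

    int main(void) {
        Leaf l = { 4, {3, 5, 11, 30}, NULL }, l2 = { 0 };
        int up;
        if (leaf_insert(&l, &l2, 8, &up))
            printf("split; copy up key %d\n", up);   /* prints 8 */
        return 0;
    }

Inserting 8 into the full leaf {3, 5, 11, 30} splits it into {3, 5} and {8, 11, 30}, and 8 is copied up into the parent.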

18 Deletion
- Start at the root and find the leaf L where the entry belongs; remove the entry
- If L is at least half-full, done!
- If L has only d-1 entries:
  - Try to re-distribute, borrowing from a sibling (an adjacent node with the same parent as L)
  - If re-distribution fails, merge L and the sibling
- If a merge occurred, the entry (pointing to L or the sibling) must be deleted from the parent of L
- A merge can propagate to the root, decreasing the height (see the merge sketch below)
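A sketch of the leaf-merge step, reusing the illustrative Leaf type from the split sketch above (the caller is assumed to then delete the parent's entry for s and recycle s):

    #include <string.h>

    /* Merge right sibling s into l after a deletion left l under-full;
       the combined key count fits because both nodes are under-full. */
    void leaf_merge(Leaf *l, Leaf *s) {
        memcpy(&l->keys[l->nkeys], s->keys, (size_t)s->nkeys * sizeof(int));
        l->nkeys += s->nkeys;
        l->next = s->next;     /* unlink s from the leaf chain */
        s->nkeys = 0;          /* s can now be freed by the caller */
    }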

19 Characteristics
- Optimal method for 1-d range queries: space O(N/B), updates O(log_B(N/B)), query O(log_B(N/B) + a/B)
- Space utilization: about 67% for random input
- B-tree variation: index nodes also store pointers to records
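To give these bounds a feel (illustrative numbers): with B = 100 entries per page and N = 10^8 records, a search costs about log_100(10^8 / 100) = log_100(10^6) = 3 page reads, plus roughly a/B pages to report the answer.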

20 Example
(figure: the range query [32, 160] on the example B+-tree with root keys 100, 120, 150, 180: descend to the leaf containing 30, 35, then follow the leaf chain rightward until a key exceeds 160)

21 Other issues
- Internal node architecture [Lomet01]: reduce the overhead of tree traversal
- Prefix compression: in index nodes, store only the prefixes that differentiate consecutive sub-trees; the fanout is increased (see the example below)
- Cache-sensitive B+-tree:
  - Place keys in a way that reduces cache faults during the binary search in each node
  - Eliminate pointers so a cache line contains more keys for comparison
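An illustrative example of prefix compression (the keys here are invented): if two consecutive separator keys in an index node are "Smith" and "Snyder", storing just "Sm" and "Sn" still routes every search correctly, so each node holds more separators and the fanout grows.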

22 Exact Match Queries
- The B+-tree is perfect, but... to answer a selection query (ssn=10) it needs to traverse a full path: in practice, 3-4 block accesses (depending on the height of the tree and on buffering)
- Any better approach?
- Yes! Hashing: static hashing and dynamic hashing

23 Hashing
- Hash-based indexes are best for equality selections
- They cannot support range searches
- Static and dynamic hashing techniques exist

24 Static Hashing
- The number of primary pages is fixed; they are allocated sequentially and never de-allocated; overflow pages are used if needed
- h(k) MOD N = the bucket to which the data entry with key k belongs (N = number of buckets)
(figure: h(key) mod N maps a key to one of the primary bucket pages 0 .. N-1, each with a chain of overflow pages)

25 Static Hashing (Contd.)
- Buckets contain data entries
- The hash function works on the search key field of record r; use its value MOD N to distribute values over the range 0 ... N-1
- h(key) = (a * key + b) usually works well; a and b are constants, and a lot is known about how to tune h (see the sketch below)
- Long overflow chains can develop and degrade performance
- Extendible and Linear Hashing: dynamic techniques that fix this problem
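A minimal sketch of that hash family in C (the constants a and b are arbitrary illustrative choices, not tuned values):

    /* h(key) = a*key + b, then MOD N to pick a bucket in 0 .. N-1 */
    unsigned bucket_of(unsigned key, unsigned N) {
        const unsigned a = 2654435761u;   /* illustrative multiplier */
        const unsigned b = 12345u;        /* illustrative offset */
        return (a * key + b) % N;
    }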

26 Linear Hashing
- A dynamic hashing scheme that handles the problem of long overflow chains without using a directory
- LH avoids a directory by using temporary overflow pages and choosing the bucket to split in a round-robin fashion
- When any bucket overflows, split the bucket that is currently pointed to by the "Next" pointer, and then increment that pointer to the next bucket

27 Linear Hashing – The Main Idea
- Use a family of hash functions h_0, h_1, h_2, ...
- h_L(key) = h(key) mod (2^L * N)
  - N = initial number of buckets
  - h is some hash function
  - L is the level; it starts at 0
- At any time we use at most two hash functions: h_L(key) and h_{L+1}(key)
- h_{L+1} doubles the range of h_L
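As a one-line sketch in C, taking h to be the identity as in the slides' own examples (the name h_level and parameter N0 are illustrative):

    unsigned h_level(unsigned key, unsigned level, unsigned N0) {
        return key % (N0 << level);   /* h_L(key) = h(key) mod (2^L * N0) */
    }

With N0 = 4 this yields h_0(x) = x mod 4 and h_1(x) = x mod 8, matching the examples on slides 31-32.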

28 Linear Hashing (Contd.)
- The algorithm proceeds in "rounds"; the current round number is L
- There are N_L (= N * 2^L) buckets at the beginning of a round
- Buckets 0 to Next-1 have been split; buckets Next to N_L - 1 have not been split yet this round
- The round ends when all initial buckets have been split (i.e., Next = N_L)
- To start the next round: L++; Next = 0;

29 Linear Hashing - Insert
- Find the appropriate bucket
- If the bucket to insert into is full:
  - Add an overflow page and insert the data entry
  - Split the Next bucket and increment Next (note: this is likely NOT the bucket being inserted into!)
- To split a bucket, create a new bucket and use h_{L+1} to re-distribute its entries (see the sketch below)
- Since buckets are split round-robin, long overflow chains don't develop!
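A sketch of the round-robin split bookkeeping in C (bucket storage and entry re-distribution are elided; LHTable and lh_split are illustrative names):

    typedef struct {
        unsigned N0;     /* initial number of buckets N */
        unsigned level;  /* current round L */
        unsigned next;   /* next bucket to split */
    } LHTable;

    void lh_split(LHTable *t) {
        unsigned nL = t->N0 << t->level;   /* N_L = N * 2^L this round */
        /* allocate image bucket t->next + nL, then rehash every entry of
           bucket t->next with h_{L+1}: each stays put or moves to the image */
        t->next++;
        if (t->next == nL) {               /* all N_L buckets split: end round */
            t->level++;
            t->next = 0;
        }
    }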


31 Example: Insert 43
- h_0(x) = x mod 4, h_1(x) = x mod 8
- Before (Level=0, Next=0), primary pages: bucket 0: 32, 44, 36; bucket 1: 9, 25, 5; bucket 2: 14, 18, 10, 30; bucket 3: 31, 35, 11, 7
- Insert 43: h_0(43) = 3 and bucket 3 is full, so 43 goes to an overflow page of bucket 3, and bucket Next=0 is split with h_1: 32 stays in bucket 0; 44 and 36 move to the new bucket 4
- After: Level=0, Next=1

32 Example: End of a Round
- Before (Level=0, Next=3): bucket 0: 32; bucket 1: 9, 25; bucket 2: 66, 18, 10, 34; bucket 3: 31, 35, 7, 11 (+ overflow: 43); bucket 4: 44, 36; bucket 5: 5, 37, 29; bucket 6: 14, 30, 22
- Insert 50: h_0(50) = 2 < Next, so apply h_1(50) = 2; bucket 2 is full, so 50 goes to an overflow page of bucket 2, and bucket Next=3 is split with h_1: 35, 11, 43 stay in bucket 3, while 31 and 7 move to the new bucket 7
- Next reaches N_0 = 4, so the round ends: Level=1, Next=0
- After (Level=1, Next=0): bucket 0: 32; bucket 1: 9, 25; bucket 2: 66, 18, 10, 34 (+ overflow: 50); bucket 3: 35, 11, 43; bucket 4: 44, 36; bucket 5: 5, 37, 29; bucket 6: 14, 30, 22; bucket 7: 31, 7

33 LH Search Algorithm
- To find the bucket for data entry r, compute h_L(r):
- If h_L(r) >= Next (i.e., h_L(r) is a bucket that hasn't been involved in a split this round), then r belongs in that bucket for sure
- Else, r could belong to bucket h_L(r) or to bucket h_L(r) + N_L: apply h_{L+1}(r) to find out (see the sketch below)
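A self-contained sketch of this rule in C, again taking h to be the identity as in the slides' examples (names are illustrative):

    unsigned lh_bucket(unsigned key, unsigned level, unsigned next, unsigned N0) {
        unsigned b = key % (N0 << level);        /* h_L(key) */
        if (b < next)                            /* bucket already split */
            b = key % (N0 << (level + 1));       /* use h_{L+1}(key) */
        return b;
    }

For instance, in the slide 32 state (level=0, next=3, N0=4), lh_bucket(50, 0, 3, 4) computes h_0(50) = 2 < 3, so it applies h_1(50) = 2 and returns bucket 2.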

