Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Similar presentations


Presentation on theme: "1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:"— Presentation transcript:

1 1 Indexing

2 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query: Find all locations that sell beer for 2.75. Select addr From Sells, Bars Where (Sells.bar = Bars.bar) and (Sells.price = 2.75)

3 3 Sells Bars JOIN Sells.bar = Bars.bar PROJECT Bars.addr SELECT Sells.price = 2.75 Sells(bar,beer,price ) Bars(bar,addr ) Joe’sBud2.50 Joe’sMaple St. Joe’sMiller2.75 Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00

4 4 Sells Bars JOIN Sells.bar = Bars.bar PROJECT Bars.addr SELECT Sells.price = 2.75 Can speed up execution if we have an index on price Sells(bar,beer,price ) Bars(bar,addr ) Joe’sBud2.50 Joe’sMaple St. Joe’sMiller2.75 Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00

5 5 Example of Using an Index

6 6 Index should be on disk; here’s a disk-based hash index example

7 The whole table could be organized as an index; here’s a table organized as a hash 7

8 Summary Indexes help searching for information far more efficiently –speed up query execution Hash indexes –have to be disk based –can be outside the table, pointing into records in table –or the whole table can be organized as a hash index Great for equality search Not so great for range search 8

9 9 Range Search Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query: Find all locations that sell beer between 2.75 and 3 Select addr From Sells, Bars Where (Sells.bar = Bars.bar) and (Sells.price >= 2.75) and (Sells.price <= 3)

10 Hashes are not well suited for range search 10

11 A better idea: organize table as a sorted file 11

12 We need more than just a sorted file: B+ trees 12

13 The B+ tree index can be outside the table 13

14 14 A larger B+ tree example; here leaves point to where the data records reside 80 2060100120140 101518203040506065808590 101518203040506065808590 Note how the leaves are chained together, to help range search

15 15 Parameter d = the degree Each node has >= d and <= 2d keys (except root) –if a node has n keys, then it has (n+1) pointers to lower-level nodes Each leaf has >=d and <= 2d keys –if a leaf has n keys, then it has n pointers to the data records with those keys –and also a pointer to the next leaf (to facilitate range query answering) B+ Trees Basics

16 16 B+ Tree Example 80 2060100120140 101518203040506065808590 101518203040506065808590 d = 2 20 <= k < 6060 <= k pointer to data records each non-root node must have at least 2 keys k < 20 data records

17 17 B+ Tree Design How large is d ? Example: –Key size = 4 bytes –Pointer size = 8 bytes –Block size = 4096 byes 2d x 4 + (2d+1) x 8 <= 4096 d = 170

18 18 Searching a B+ Tree Equality queries (i.e., given exact key values): –Start at the root –Proceed down, to the leaf Range queries: –As above –Then sequential traversal Select name From people Where age = 18 Select name From people Where 20 <= age and age <= 65

19 19 Searching a B+ Tree 80 2060100120140 101518203040506065808590 101518203040506065808590

20 20

21 21 B+ Trees in Practice Typical order: 100. Typical fill-factor: 67%. –so average fanout = 200*0.67% = 133 Typical capacities: –Height 4: 133 4 = 312,900,700 records –Height 3: 133 3 = 2,352,637 records Can often hold top levels in buffer pool: –Level 1 = 1 page = 8 Kbytes –Level 2 = 133 pages = 1 Mbyte –Level 3 = 17,689 pages = 133 MBytes

22 22

23 Sample Exam Question Assume a file which has 950 thousands (that is, 950000) records. Assume also that we are indexing this file using a B+ tree. In this particular B+ tree, the average page has an occupancy of 100 pointers (that is, the tree's average branching factor is 100). Assume further that the amount of memory set aside for storing the index is 150 blocks, and that all 950000 records of the above file reside on disk. Assume no duplicate search key exists, given a search key K, compute the minimal number of disk I/O needed to retrieve the record with that search key. Explain your answer. 23

24 Solution Assume each leaf node points to data records –so leaf nodes do not contain data records We have 950,000 records, so need 950,000 pointers from leaf nodes Each leaf node can point to 100 records, so need 9,500 leaf nodes So tree has three levels: root at level 0, 100 nodes at level 1, 10000 nodes at level 2 Have 150 memory pages for index, so can store root node + all of level 1 and some of level 2 So minimal cost is 1 24

25 Some Math Assume tree has fanout of F (that is, each non- leaf node has F pointers to lower-level nodes) Assume tree has N leaf pages Then the number of levels in tree is log_F(N) So each insert/delete/lookup costs roughly log_F(N) –assuming the entire tree is on disk This assumes that tree will stay height-balanced Insert/delete must ensure that tree stay height- balanced 25

26 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke26 To explain insert/delete, we will use the following B+ tree, with a slightly different notation Root 1724 30 2* 3*5* 7*14*16* 19*20*22*24*27* 29*33*34* 38* 39* 13 2* means a pair of (key, pointer to data record) note how pointers are specified using a different notation here

27 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke27 Inserting a Data Entry into a B+ Tree  Find correct leaf L.  Put data entry onto L.  If L has enough space, done !  Else, must split L (into L and a new node L2) Redistribute entries: d in L, (d+1) in L2, copy up middle key. Insert index entry pointing to L2 into parent of L.  Then if parent of L needs to be split up, do so; this can happen all the way to the root  To split a non-leaf node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)  Splits “grow” tree; root split increases height.  Tree growth: gets wider or one level taller at top.

28 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke28 Insert 8* Root 1724 30 2* 3*5* 7*14*16* 19*20*22*24*27* 29*33*34* 38* 39* 13 first insert 8* in here see that node is full; so have to split 2* 3*5* 7* 8* 5 Entry to be inserted in parent node. (Note that 5 is continues to appear in the leaf.) s copied up and

29 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke29 2*3* 14*16* 19*20*22*24*27* 29*33*34* 38* 39* 7*5*8* 1724 30 13 insert 5 in here, but now this node is full 5 appears once in the index. Contrast 52430 17 13 Entry to be inserted in parent node. (Note that 17 is pushed up and only this with a leaf split.)

30 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke30 Example B+ Tree After Inserting 8*  Notice that root was split, leading to increase in height. 2*3* Root 17 24 30 14*16* 19*20*22*24*27* 29*33*34* 38* 39* 135 7*5*8*

31 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke31 Note on Inserting 8* into Example B+ Tree  Observe how minimum occupancy is guaranteed in both leaf and index pg splits.  Note difference between copy- up and push-up ; be sure you understand the reasons for this. 2* 3*5* 7* 8* 5 Entry to be inserted in parent node. (Note that 5 is continues to appear in the leaf.) s copied up and appears once in the index. Contrast 52430 17 13 Entry to be inserted in parent node. (Note that 17 is pushed up and only this with a leaf split.)

32 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke32 Deleting a Data Entry from a B+ Tree  Start at root, find leaf L where entry belongs.  Remove the entry.  If L is at least half-full, done!  If L has only d-1 entries, Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). will have to change parent a bit (see example) If re-distribution fails, merge L and sibling.  If merge occurred, must delete entry (pointing to L or sibling) from parent of L.  Merge could propagate to root, decreasing height.

33 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke33 Deleting 19* 2*3* Root 17 24 30 14*16* 20*22*24*27* 29*33*34* 38* 39* 135 7*5*8* Just remove node, doesn’t change tree 2*3* Root 17 24 30 14*16* 19*20*22*24*27* 29*33*34* 38* 39* 135 7*5*8*

34 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke34 Now Deleting 20* 2*3* 17 24 30 14*16* 20*22*24*27* 29*33*34* 38* 39* 135 7*5*8* Must borrow from a sibling if possible 2*3* 17 30 14*16* 33*34* 38* 39* 135 7*5*8*22*24* 27 27*29* Notice how the middle key is copied up

35 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke35 Now Deleting 24* 2*3* 17 30 14*16* 33*34* 38* 39* 135 7*5*8*22*24* 27 27*29* Can’t borrow from any sibling; must merge 2* 3* 7* 14*16* 22* 27* 29* 33*34* 38* 39* 5*8* 30 135 17

36 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke36... Deleting 24*  Must merge.  Observe ` toss ’ of index entry (on right), and ` pull down ’ of index entry (below). 30 22*27* 29*33*34* 38* 39* 2* 3* 7* 14*16* 22* 27* 29* 33*34* 38* 39* 5*8* Root 30 135 17

37 Comparison Hash B+ tree 37

38 Types of Indexes clustered vs. non-clustered primary vs. secondary 38

39 39 A non-clustered index

40 A clustered index 40

41 A clustered index 41

42 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke42 Indexes over composite search keys  Composite Search Keys : Search on a combination of fields.  Equality query: Every field value is equal to a constant value. E.g. wrt index: age=20 and sal =75  Range query: Some field value is not a constant. E.g.: age =20; or age=20 and sal > 10 sue1375 bob cal joe12 10 20 8011 12 nameagesal 12,20 12,10 11,80 13,75 20,12 10,12 75,13 80,11 11 12 13 10 20 75 80 Data records sorted by name Data entries in index sorted by Data entries sorted by Examples of composite key indexes using lexicographic order.


Download ppt "1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:"

Similar presentations


Ads by Google