Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Lecture 20: Indexes Friday, February 25, 2005. 2 Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)

Similar presentations


Presentation on theme: "1 Lecture 20: Indexes Friday, February 25, 2005. 2 Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)"— Presentation transcript:

1 1 Lecture 20: Indexes Friday, February 25, 2005

2 2 Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)

3 3 Representing Data Elements Note: will cover VERY LITTLE in class READING ASSIGNMENT: Chapter 12. Disk Blocks = fixed size Table record = variable size Issue: how to group records into blocks

4 4 Representing Data Elements A record is always smaller than a block –Use BLOBs when needed Pack multiple records into blocks –But initially don’t fill (why ?) Cross block bounderies ? –Advantage: better space utilization –Disadvantage: slower access

5 5 BLOB Binary large objects Supported by modern database systems E.g. images, sounds, etc. Storage: attempt to cluster blocks together CLOB = character large objec Supports only restricted operations

6 Indexes An index on a file speeds up selections on the search key fields for the index. –Any subset of the fields of a relation can be the search key for an index on the relation. –Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation). An index contains a collection of data entries, and supports efficient retrieval of all data entries with a given key value k.

7 7 Index Classification Primary/secondary –Primary = may reorder data according to index –Secondary = cannot reorder data Clustered/unclustered –Clustered = records close in the index are close in the data –Unclustered = records close in the index may be far in the data Dense/sparse –Dense = every key in the data appears in the index –Sparse = the index contains only some keys B+ tree / Hash table / …

8 8 Primary Index File is sorted on the index attribute Dense index: sequence of (key,pointer) pairs 10 20 30 40 50 60 70 80 10 20 30 40 50 60 70 80

9 9 Primary Index Sparse index 10 30 50 70 90 110 130 150 10 20 30 40 50 60 70 80

10 10 Primary Index with Duplicate Keys Dense index: 10 20 30 40 50 60 70 80 10 20 30 40

11 11 Primary Index with Duplicate Keys Sparse index: pointer to lowest search key in each block: Search for 20 10 20 30 10 20 30 40 20 is here......but need to search here too

12 12 Better: pointer to lowest new search key in each block: Search for 20 Search for 15 ? 35 ? Primary Index with Duplicate Keys 10 20 30 40 50 60 70 80 10 20 30 40 50 20 is here......ok to search from here 30

13 13 Secondary Indexes To index other attributes than primary key Always dense (why ?) 10 20 30 20 30 20 10 20 10 30

14 14 Clustered/Unclustered Primary indexes = usually clustered Secondary indexes = usually unclustered

15 Clustered vs. Unclustered Index Data entries ( Index File ) ( Data file ) Data Records Data entries Data Records CLUSTERED UNCLUSTERED

16 16 Secondary Indexes Applications: –index other attributes than primary key –index unsorted files (heap files) –index clustered data

17 17 B+ Trees Search trees Idea in B Trees: –make 1 node = 1 block Idea in B+ Trees: –Make leaves into a linked list (range queries are easier)

18 18 Parameter d = the degree Each node has >= d and <= 2d keys (except root) Each leaf has >=d and <= 2d keys: B+ Trees Basics 30120240 Keys k < 30 Keys 30<=k<120 Keys 120<=k<240Keys 240<=k 405060 40 5060 Next leaf

19 19 B+ Tree Example 80 2060100120140 101518203040506065808590 101518203040506065808590 d = 2 Find the key 40 40  80 20 < 40  60 30 < 40  40

20 20 B+ Tree Design How large d ? Example: –Key size = 4 bytes –Pointer size = 8 bytes –Block size = 4096 byes 2d x 4 + (2d+1) x 8 <= 4096 d = 170

21 21 Searching a B+ Tree Exact key values: –Start at the root –Proceed down, to the leaf Range queries: –As above –Then sequential traversal Select name From people Where age = 25 Select name From people Where age = 25 Select name From people Where 20 <= age and age <= 30 Select name From people Where 20 <= age and age <= 30

22 B+ Trees in Practice Typical order: 100. Typical fill-factor: 67%. –average fanout = 133 Typical capacities: –Height 4: 133 4 = 312,900,700 records –Height 3: 133 3 = 2,352,637 records Can often hold top levels in buffer pool: –Level 1 = 1 page = 8 Kbytes –Level 2 = 133 pages = 1 Mbyte –Level 3 = 17,689 pages = 133 MBytes

23 23 Insertion in a B+ Tree Insert (K, P) Find leaf where K belongs, insert If no overflow (2d keys or less), halt If overflow (2d+1 keys), split node, insert in parent: If leaf, keep K3 too in right node When root splits, new root has 1 key only K1K2K3K4K5 P0P1P2P3P4p5 K1K2 P0P1P2 K4K5 P3P4p5 parent K3 parent

24 24 Insertion in a B+ Tree 80 2060100120140 101518203040506065808590 101518203040506065808590 Insert K=19

25 25 Insertion in a B+ Tree 80 2060100120140 10151819203040506065808590 10151820304050606580859019 After insertion

26 26 Insertion in a B+ Tree 80 2060100120140 10151819203040506065808590 10151820304050606580859019 Now insert 25

27 27 Insertion in a B+ Tree 80 2060100120140 1015181920253040506065808590 10151820253040606580859019 After insertion 50

28 28 Insertion in a B+ Tree 80 2060100120140 1015181920253040506065808590 10151820253040606580859019 But now have to split ! 50

29 29 Insertion in a B+ Tree 80 203060100120140 1015181920256065808590 10151820253040606580859019 After the split 50 304050

30 30 Deletion from a B+ Tree 80 203060100120140 1015181920256065808590 10151820253040606580859019 Delete 30 50 304050

31 31 Deletion from a B+ Tree 80 203060100120140 1015181920256065808590 101518202540606580859019 After deleting 30 50 4050 May change to 40, or not

32 32 Deletion from a B+ Tree 80 203060100120140 1015181920256065808590 101518202540606580859019 Now delete 25 50 4050

33 33 Deletion from a B+ Tree 80 203060100120140 10151819206065808590 1015182040606580859019 After deleting 25 Need to rebalance Rotate 50 4050

34 34 Deletion from a B+ Tree 80 193060100120140 10151819206065808590 1015182040606580859019 Now delete 40 50 4050

35 35 Deletion from a B+ Tree 80 193060100120140 10151819206065808590 10151820606580859019 After deleting 40 Rotation not possible Need to merge nodes 50

36 36 Deletion from a B+ Tree 80 1960100120140 1015181920506065808590 10151820606580859019 Final tree 50


Download ppt "1 Lecture 20: Indexes Friday, February 25, 2005. 2 Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)"

Similar presentations


Ads by Google