1 Lecture 18: Indexes Monday, November 10, 2003. 2 Midterm Problem 1a: select student.sname, avg(takes.grade) from student, takes where student.sid =

1 Lecture 18: Indexes Monday, November 10, 2003

2 Midterm Problem 1a: select student.sname, avg(takes.grade) from student, takes where student.sid = takes.sid group by student.sid, student.sname

3 Midterm Problem 1b: Another solution: use “not in” select student.sname from student where 3.0 < all (select takes.sid from takes where student.sid = takes.sid)

4 Midterm Problem 1c: Another solution: “student.dept = ALL…” Most people got SQL right, or almost right select name from student where sid not in (select takes.sid from takes, courses where takes.cid = courses.cid and courses.dept <> student.dept)

5 Midterm Problem 2: Very few people got this detail right isa River Sea endsIn Water isa name

6 Midterm Problem 3a: Customer  Department Problem 3b: AC, BC, BD Most people got it right !

7 Midterm Problem 4a /db/website[document/author/text()="John Smith"]/url

8 Midterm Problem 4b This was the hard question: very few got it right Notice: the question had a second interpretation for $r in /db let $a = distinct-values( for $w in $r/website, $x in $w/document/auhtor/text() where count(distinct-values($w/document [author/text()=$x]/word/text())) > 5000 return $x ) for $z in $a return $z

9 Midterm Problem 4c This was easy, yet few got it right –Out of time ? Website(url ) Document(docID, url, author) Word(docID, word) Href(docID, href) Website(url ) Document(docID, url, author) Word(docID, word) Href(docID, href)

10 Outline Index structures (13.1, 13.2) B-trees (13.3) Administrative: Phase 2 of the project due Monday, 11/17 HW3 due on Friday, 11/21

Indexes An index on a file speeds up selections on the search key fields for the index. –Any subset of the fields of a relation can be the search key for an index on the relation. –Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation). An index contains a collection of data entries, and supports efficient retrieval of all data entries with a given key value k.

12 Index Classification Primary/secondary –Primary = may reorder data according to index –Secondary = cannot reorder data Clustered/unclustered –Clustered = records close in the index are close in the data –Unclustered = records close in the index may be far in the data Dense/sparse –Dense = every key in the data appears in the index –Sparse = the index contains only some keys B+ tree / Hash table / …

13 Primary Index File is sorted on the index attribute Dense index: sequence of (key,pointer) pairs 10 20 30 40 50 60 70 80 10 20 30 40 50 60 70 80

14 Primary Index Sparse index 10 30 50 70 90 110 130 150 10 20 30 40 50 60 70 80

15 Primary Index with Duplicate Keys Dense index: 10 20 30 40 50 60 70 80 10 20 30 40

16 Primary Index with Duplicate Keys Sparse index: pointer to lowest search key in each block: Search for 20 10 20 30 10 20 30 40 20 is here......but need to search here too

17 Better: pointer to lowest new search key in each block: Search for 20 Search for 15 ? 35 ? Primary Index with Duplicate Keys 10 20 30 40 50 60 70 80 10 20 30 40 50 20 is here......ok to search from here 30

18 Secondary Indexes To index other attributes than primary key Always dense (why ?) 10 20 30 20 30 20 10 20 10 30

19 Clustered/Unclustered Primary indexes = usually clustered Secondary indexes = usually unclustered

Clustered vs. Unclustered Index Data entries ( Index File ) ( Data file ) Data Records Data entries Data Records CLUSTERED UNCLUSTERED

21 Secondary Indexes Applications: –index other attributes than primary key –index unsorted files (heap files) –index clustered data

22 Applications of Secondary Indexes Secondary indexes needed for heap files Also for Clustered data: Company(name, city), Product(pid, maker) Select city From Company, Product Where name=maker and pid=“p045” Select city From Company, Product Where name=maker and pid=“p045” Select pid From Company, Product Where name=maker and city=“Seattle” Select pid From Company, Product Where name=maker and city=“Seattle” Company 1Company 2Company 3 Products of company 1 Products of company 2Products of company 3

Composite Search Keys Composite Search Keys: Search on a combination of fields. –Equality query: Every field value is equal to a constant value. E.g. wrt index: age=20 and sal =75 –Range query: Some field value is not a constant. E.g.: age =20; or age=20 and sal > 10 sue1375 bob cal joe12 10 20 8011 12 nameagesal 12,20 12,10 11,80 13,75 20,12 10,12 75,13 80,11 11 12 13 10 20 75 80 Data records sorted by name Data entries in index sorted by Data entries sorted by Examples of composite key indexes using lexicographic order.

24 B+ Trees Search trees Idea in B Trees: –make 1 node = 1 block Idea in B+ Trees: –Make leaves into a linked list (range queries are easier)

25 Parameter d = the degree Each node has >= d and <= 2d keys (except root) Each leaf has >=d and <= 2d keys: B+ Trees Basics 30120240 Keys k < 30 Keys 30<=k<120 Keys 120<=k<240Keys 240<=k 405060 40 5060 Next leaf

26 B+ Tree Example 80 2060100120140 101518203040506065808590 101518203040506065808590 d = 2 Find the key 40 40  80 20 < 40  60 30 < 40  40

27 B+ Tree Design How large d ? Example: –Key size = 4 bytes –Pointer size = 8 bytes –Block size = 4096 byes 2d x 4 + (2d+1) x 8 <= 4096 d = 170

28 Searching a B+ Tree Exact key values: –Start at the root –Proceed down, to the leaf Range queries: –As above –Then sequential traversal Select name From people Where age = 25 Select name From people Where age = 25 Select name From people Where 20 <= age and age <= 30 Select name From people Where 20 <= age and age <= 30

B+ Trees in Practice Typical order: 100. Typical fill-factor: 67%. –average fanout = 133 Typical capacities: –Height 4: 133 4 = 312,900,700 records –Height 3: 133 3 = 2,352,637 records Can often hold top levels in buffer pool: –Level 1 = 1 page = 8 Kbytes –Level 2 = 133 pages = 1 Mbyte –Level 3 = 17,689 pages = 133 MBytes

30 Insertion in a B+ Tree Insert (K, P) Find leaf where K belongs, insert If no overflow (2d keys or less), halt If overflow (2d+1 keys), split node, insert in parent: If leaf, keep K3 too in right node When root splits, new root has 1 key only K1K2K3K4K5 P0P1P2P3P4p5 K1K2 P0P1P2 K4K5 P3P4p5 parent K3 parent

31 Insertion in a B+ Tree 80 2060100120140 101518203040506065808590 101518203040506065808590 Insert K=19

32 Insertion in a B+ Tree 80 2060100120140 10151819203040506065808590 10151820304050606580859019 After insertion

33 Insertion in a B+ Tree 80 2060100120140 10151819203040506065808590 10151820304050606580859019 Now insert 25

34 Insertion in a B+ Tree 80 2060100120140 1015181920253040506065808590 10151820253040606580859019 After insertion 50

35 Insertion in a B+ Tree 80 2060100120140 1015181920253040506065808590 10151820253040606580859019 But now have to split ! 50

36 Insertion in a B+ Tree 80 203060100120140 1015181920256065808590 10151820253040606580859019 After the split 50 304050

37 Deletion from a B+ Tree 80 203060100120140 1015181920256065808590 10151820253040606580859019 Delete 30 50 304050

38 Deletion from a B+ Tree 80 203060100120140 1015181920256065808590 101518202540606580859019 After deleting 30 50 4050 May change to 40, or not

39 Deletion from a B+ Tree 80 203060100120140 1015181920256065808590 101518202540606580859019 Now delete 25 50 4050

40 Deletion from a B+ Tree 80 203060100120140 10151819206065808590 1015182040606580859019 After deleting 25 Need to rebalance Rotate 50 4050

41 Deletion from a B+ Tree 80 193060100120140 10151819206065808590 1015182040606580859019 Now delete 40 50 4050

42 Deletion from a B+ Tree 80 193060100120140 10151819206065808590 10151820606580859019 After deleting 40 Rotation not possible Need to merge nodes 50

43 Deletion from a B+ Tree 80 1960100120140 1015181920506065808590 10151820606580859019 Final tree 50

1 Lecture 18: Indexes Monday, November 10, 2003. 2 Midterm Problem 1a: select student.sname, avg(takes.grade) from student, takes where student.sid =

Similar presentations

Presentation on theme: "1 Lecture 18: Indexes Monday, November 10, 2003. 2 Midterm Problem 1a: select student.sname, avg(takes.grade) from student, takes where student.sid ="— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Lecture 18: Indexes Monday, November 10, 2003. 2 Midterm Problem 1a: select student.sname, avg(takes.grade) from student, takes where student.sid =

Similar presentations

Presentation on theme: "1 Lecture 18: Indexes Monday, November 10, 2003. 2 Midterm Problem 1a: select student.sname, avg(takes.grade) from student, takes where student.sid ="— Presentation transcript:

Similar presentations

About project

Feedback