Lecture 21: Indexes Monday, November 13, 2000.

Lecture 21: Indexes Monday, November 13, 2000

Outline Addresses (3.3) Index Structures (4.1, 4.2, 4.3)

Physical Addresses Each block and each record have a physical address that consists of: The host The disk The cylinder number The track number The block within the track For records: an offset in the block sometimes this is in the block’s header

Logical Addresses Logical address: a string of bytes (10-16)
More flexible: can blocks/records around But need translation table: Logical address Physical address L1 P1 L2 P2 L3 P3

Main Memory Address When the block is read in main memory, it receives a main memory address Need another translation table Memory address Logical address M1 L1 M2 L2 M3 L3

Optimization: Pointer Swizzling
= the process of replacing a physical/logical pointer with a main memory pointer Still need translation table, but subsequent references are faster

Pointer Swizzling Block 2 Block 1 Disk read in memory swizzled Memory
unswizzled

Pointer Swizzling Automatic: when block is read in main memory, swizzle all pointers in the block On demand: swizzle only when user requests No swizzling: always use translation table

Pointer Swizzling When blocks return to disk: pointers need unswizzled
Danger: someone else may point to this block Pinned blocks: we don’t allow it to return to disk Keep a list of references to this block

Indexes An index on a file speeds up selections on the search key fields for the index. Any subset of the fields of a relation can be the search key for an index on the relation. Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation). An index contains a collection of data entries, and supports efficient retrieval of all data entries with a given key value k. 7

Index Classification Primary/secondary Clustered/unclustered
Dense/sparse B+ tree / Hash table / …

Primary Index File is sorted on the index attribute
Dense index: sequence of (key,pointer) pairs 10 20 10 20 30 40 30 40 50 60 70 80 50 60 70 80

Primary Index Sparse index 10 20 30 40 50 60 70 80 10 30 50 90 70 110
130 150 50 60 70 80

Primary Index with Duplicate Keys
Dense index: 10 10 20 30 40 10 20 50 60 70 80 20 30 40

Sparse index: pointer to lowest search key in each block: Search for 20 10 10 20 30 10 20 20 30 40

Better: pointer to lowest new search key in each block: Search for 20 10 10 20 30 40 10 20 50 60 70 80 30 40 50

Secondary Indexes To index other attributes than primary key
Always dense (why ?) 20 30 10 20 30 20 20 30 10 20 10 30

Clustered/Unclustered
Primary indexes = usually clustered Secondary indexes = usually unclustered

Clustered vs. Unclustered Index
Data entries Data entries (Index File) (Data file) Data Records Data Records CLUSTERED UNCLUSTERED 12

Secondary Indexes Applications:
index other attributes than primary key index unsorted files (heap files) index clustered data

Applications of Secondary Indexes
Clustered data Company(name, city), Product(pid, maker) Select city From Company, Product Where name=maker and pid=“p045” Select pid From Company, Product Where name=maker and city=“Seattle” Products of company 2 Products of company 3 Products of company 1 Company 1 Company 2 Company 3

Composite Search Keys Composite Search Keys: Search on a combination of fields. Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index: age=20 and sal =75 Range query: Some field value is not a constant. E.g.: age =20; or age=20 and sal > 10 Examples of composite key indexes using lexicographic order. 11,80 11 12,10 12 name age sal 12,20 12 13,75 bob 12 10 13 <age, sal> cal 11 80 <age> joe 12 20 10,12 sue 13 75 10 20,12 Data records sorted by name 20 75,13 75 80,11 80 <sal, age> <sal> Data entries in index sorted by <sal,age> Data entries sorted by <sal> 13

B+ Trees Search trees Idea in B Trees: Idea in B+ Trees:
make 1 node = 1 block Idea in B+ Trees: Make leaves into a linked list (range queries are easier)

B+ Trees Basics Parameter d
Each node has >= d and <= 2d keys (except root) Each leaf has >=d and <= 2d keys: 30 120 240 Keys k < 30 Keys 30<k<=120 Keys 120<k<=240 Keys 240<k 40 50 60 Next leaf 40 50 60

B+ Tree Example d = 2 80 20 60 100 120 140 10 15 18 20 30 40 50 60 65 80 85 90 10 15 18 20 30 40 50 60 65 80 85 90

B+ Tree Design How large d ? Example: 2d x 4 + (2d+1) x 8 <= 4096
Key size = 4 bytes Pointer size = 8 bytes Block size = 4096 byes 2d x 4 + (2d+1) x 8 <= 4096 d = 170

Searching a B+ Tree Exact key values: Range queries: Start at the root
Proceed down, to the leaf Range queries: As above Then sequential traversal Select name From people Where age = 25 Select name From people Where 20 <= age and age <= 30

B+ Trees in Practice Typical order: 100. Typical fill-factor: 67%.
average fanout = 133 Typical capacities: Height 4: 1334 = 312,900,700 records Height 3: 1333 = 2,352,637 records Can often hold top levels in buffer pool: Level 1 = page = Kbytes Level 2 = pages = Mbyte Level 3 = 17,689 pages = 133 MBytes

Insertion in a B+ Tree Insert (K, P) Find leaf where K belongs, insert
If no overflow (2d keys or less), halt If overflow (2d+1 keys), split node, insert in parent: If leaf, keep K3 too in right node When root splits, new root has 1 key only (K3, ) to parent K1 K2 K3 K4 K5 P0 P1 P2 P3 P4 p5 K1 K2 P0 P1 P2 K4 K5 P3 P4 p5

Insertion in a B+ Tree Insert K=19 80 20 60 100 120 140 10 15 18 20 30
50 60 65 80 85 90 10 15 18 20 30 40 50 60 65 80 85 90

Insertion in a B+ Tree After insertion 80 20 60 100 120 140 10 15 18
19 20 30 40 50 60 65 80 85 90 10 15 18 19 20 30 40 50 60 65 80 85 90

Insertion in a B+ Tree Now insert 25 80 20 60 100 120 140 10 15 18 19
30 40 50 60 65 80 85 90 10 15 18 19 20 30 40 50 60 65 80 85 90

Insertion in a B+ Tree After insertion 80 20 60 100 120 140 10 15 18
19 20 25 30 40 50 60 65 80 85 90 10 15 18 19 20 25 30 40 50 60 65 80 85 90

Insertion in a B+ Tree But now have to split ! 80 20 60 100 120 140 10
15 18 19 20 25 30 40 50 60 65 80 85 90 10 15 18 19 20 25 30 40 50 60 65 80 85 90

Insertion in a B+ Tree After the split 80 20 30 60 100 120 140 10 15
18 19 20 25 30 40 50 60 65 80 85 90 10 15 18 19 20 25 30 40 50 60 65 80 85 90

Lecture 21: Indexes Monday, November 13, 2000.

Similar presentations

Presentation on theme: "Lecture 21: Indexes Monday, November 13, 2000."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Lecture 21: Indexes Monday, November 13, 2000.

Similar presentations

Presentation on theme: "Lecture 21: Indexes Monday, November 13, 2000."— Presentation transcript:

Similar presentations

About project

Feedback