COMP 451/651 B-Trees Size and Lookup Chapter 1.


B-Trees (I)
Suppose:
Blocks can hold either 10 records, or 99 keys and 100 pointers.
B-tree nodes are 70% full: 69 keys and 70 pointers.
There are 1,000,000 records.
For each structure described below, determine:
The total number of blocks.
The average number of I/Os for a lookup given the search key.
Case (I): Records are sorted on the search key, with 10 records per block. The B-tree is a dense index.
1,000,000 / 10 = 100,000 blocks hold the records.
For the B-tree we reason as follows:
We need 1,000,000 pointers at the leaves, one for each record. They can be packed into 1,000,000 / 70 ≈ 14,286 leaf blocks.
We need 14,286 pointers in the level above. They are packed into 14,286 / 70 ≈ 204 blocks.
We need 204 pointers in the level above that. They are packed into 204 / 70 ≈ 3 blocks.
The root is 1 more block.
Total: 100,000 + 14,286 + 204 + 3 + 1 (root) = 114,494 blocks.
Since the tree has 4 levels, a lookup needs 4 I/Os for the index plus 1 I/O for the record: 5 I/Os.
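The level-by-level reasoning above can be sketched in a few lines of Python. This is only an illustration: the function name `index_levels` is made up here, and it rounds every level up with ceiling division, whereas the slide rounds 14,286 / 70 down to 204. The total therefore comes out one block higher than the slide's 114,494.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return (a + b - 1) // b

def index_levels(leaf_pointers: int, fanout: int = 70) -> list[int]:
    """Blocks per B-tree level, leaves first and root last."""
    levels = [ceil_div(leaf_pointers, fanout)]
    while levels[-1] > 1:
        levels.append(ceil_div(levels[-1], fanout))
    return levels

records = 1_000_000
data_blocks = ceil_div(records, 10)       # 10 records per block
levels = index_levels(records)            # dense index: one leaf pointer per record
total_blocks = data_blocks + sum(levels)
lookup_ios = len(levels) + 1              # one I/O per index level, plus one for the record
print(levels, total_blocks, lookup_ios)
```

Calling `index_levels(100_000)` instead gives the level sizes for a sparse index with one pointer per data block.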

B-Trees (II)
Same as (I), but the records are not sorted. Records are still packed 10 to a block.
A dense index has one pointer per record whether or not the records are sorted, so nothing changes.
Total: same as (I), 114,494 blocks.
Lookup: same as (I), 5 I/Os.

B-Trees (III)
Same as (I), but the B-tree is a sparse index.
1,000,000 / 10 = 100,000 blocks hold the records.
For the sparse B-tree:
We need 100,000 pointers at the leaves, one for each data block. They can be packed into 100,000 / 70 ≈ 1,429 leaf blocks.
We need 1,429 pointers in the level above. They are packed into 1,429 / 70 ≈ 21 blocks.
We need 21 pointers in the level above that. They fit into 1 block (the root).
Total: 100,000 + 1,429 + 21 + 1 = 101,451 blocks.
Since the tree has 3 levels, a lookup needs 3 I/Os for the index plus 1 I/O for the data block: 4 I/Os.

B-Trees (IV)
Instead of the B-tree leaves having pointers to data records, the B-tree leaves hold the records themselves. A block can hold 10 records, but a leaf block is in fact 70% full, i.e. there are 7 records per leaf.
1,000,000 / 7 ≈ 142,857 blocks hold the records. These blocks are the leaves of the B-tree.
We need 142,857 pointers at the next level. They can be packed into 142,857 / 70 ≈ 2,040 blocks.
We need 2,040 pointers in the level above. They are packed into 2,040 / 70 ≈ 30 blocks.
We need 30 pointers in the level above that. They fit into 1 block (the root).
Total: 142,857 + 2,040 + 30 + 1 = 144,928 blocks.
Since the tree has 4 levels and the records are in the leaves, a lookup needs 4 I/Os.

B-Trees Range Queries
Repeat the exercise for the case where the query is a range query that is matched by 1000 records.

B-Trees (I), range query
Records are sorted on the search key, with 10 records per block. The B-tree is a dense index.
Since the tree has 4 levels, we need 4 I/Os to reach the leaf where the pointer to the start of the range is located.
Then, by following the sibling pointers, we retrieve all the leaves holding the pointers to the matching records. 1000 pointers are packed into 1000 / 70 ≈ 14 blocks.
Now, by following each of the 1000 pointers, we read the 1000 records. Since the records are sorted, the 1000 records in the range occupy (almost) as few blocks as possible, i.e. 1000 / 10 = 100 blocks.
In total, we need 4 + 14 + 100 = 118 I/Os.

B-Trees (II), range query
Same as (I), but the records are not sorted. Records are still packed 10 to a block.
As in (I), we need 4 + 14 I/Os to locate the 1000 pointers.
However, since the records are not sorted, the 1000 records might be located in 1000 different blocks. So, in the worst case we read 1000 blocks.
In total, we need up to 4 + 14 + 1000 = 1018 I/Os.

B-Trees (III), range query
Same as (I), but the B-tree is a sparse index.
Since the tree has 3 levels, we need 3 I/Os to reach the leaf where the pointer to the start of the range is located.
How many pointers to data blocks do we need to follow? 1000 records are packed into 1000 / 10 = 100 blocks. So, we need to follow 100 pointers.
How many leaves are needed to pack 100 pointers? 2 leaves.
Total: 3 + 2 + 100 = 105 I/Os.

B-Trees (IV), range query
Instead of the B-tree leaves having pointers to data records, the B-tree leaves hold the records themselves. A block can hold 10 records, but a leaf block is in fact 70% full, i.e. there are 7 records per leaf.
Since the tree has 4 levels, we need 4 I/Os to reach the leaf where the first record of the range is located.
By following the sibling pointers, we retrieve as many leaves as are needed to hold 1000 records. 1000 records are packed into 1000 / 7 ≈ 143 leaves.
Total: 4 + 143 = 147 I/Os.
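The four range-query estimates can be put side by side in a small sketch. The function name `range_query_ios` is hypothetical, and all divisions are rounded up, so cases (I) and (II) come out one I/O higher than on the slides, which round 1000 / 70 down to 14.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return (a + b - 1) // b

def range_query_ios(matches: int = 1000) -> dict[str, int]:
    """Estimated I/Os for a range query matching `matches` records."""
    pointer_leaves = ceil_div(matches, 70)   # dense index: leaves holding the matching pointers
    sorted_blocks = ceil_div(matches, 10)    # data blocks when matching records are contiguous
    return {
        "I": 4 + pointer_leaves + sorted_blocks,
        "II": 4 + pointer_leaves + matches,  # worst case: every record in a different block
        "III": 3 + ceil_div(sorted_blocks, 70) + sorted_blocks,
        "IV": 4 + ceil_div(matches, 7),      # records live in leaves that are 70% full
    }

costs = range_query_ios()
print(costs)
```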

Extensible Hash Tables

Dynamic Hashing Framework
The hash function h produces a sequence of k bits.
Only some of the bits are used at any time to determine the placement of keys in buckets.
Extensible Hashing (buckets may share blocks!)
Keep a parameter i = the number of bits from the beginning of h(K) that determine the bucket.
The bucket array now holds pointers to buckets. A block can serve as several buckets.
For each block, a parameter $j \le i$ tells how many bits of h(K) determine membership in the block.
I.e., a block represents $2^{i-j}$ buckets that share the first j bits of their number.

Example
An extensible hash table when i = 1. (Figure omitted.)

Extensible Hash-table Insert
If the record with key K fits in the block pointed to by h(K), put it there.
If not, let this block B represent j bits.
(1) j < i: Split block B into two and distribute the records of B according to the (j+1)st bit; set j := j+1; fix the pointers in the bucket array, so that entries that formerly pointed to B now point either to B or to the new block, depending on the (j+1)st bit.
(2) j = i: Double the bucket array, so that it has $2^{i+1}$ entries (in terms of the old i), and set i := i+1; then proceed as in (1). Let w be an old array entry. Both of the new entries w0 and w1 point to the same block that w used to point to.
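The insertion procedure can be sketched as follows. This is a minimal illustration, not a reference implementation: the class and attribute names are invented here, hash values are given directly as k-bit integers, and a block whose j already equals k raises an error instead of using overflow blocks.

```python
class Block:
    def __init__(self, j: int):
        self.j = j                  # number of leading hash bits this block represents
        self.keys: list[int] = []

class ExtensibleHashTable:
    def __init__(self, k: int = 4, capacity: int = 3):
        self.k = k                  # number of bits produced by h
        self.capacity = capacity    # records per block
        self.i = 1                  # leading bits used by the bucket array
        self.array = [Block(1), Block(1)]

    def insert(self, h: int) -> None:
        while True:
            block = self.array[h >> (self.k - self.i)]   # first i bits pick the entry
            if len(block.keys) < self.capacity:
                block.keys.append(h)
                return
            if block.j == self.k:
                raise OverflowError("block cannot be split further")
            if block.j == self.i:
                # Case j = i: double the array; entries w0 and w1 both point where w did.
                self.array = [b for b in self.array for _ in range(2)]
                self.i += 1
            self._split(block)      # case j < i, then retry the insertion

    def _split(self, block: Block) -> None:
        block.j += 1
        new = Block(block.j)
        shift = self.k - block.j    # position of the (old j + 1)st bit in a hash value
        old_keys, block.keys = block.keys, []
        for key in old_keys:
            (new if (key >> shift) & 1 else block).keys.append(key)
        for idx, b in enumerate(self.array):
            # Entries whose (old j + 1)st bit is 1 now point to the new block.
            if b is block and (idx >> (self.i - block.j)) & 1:
                self.array[idx] = new

table = ExtensibleHashTable(k=4, capacity=3)
for h in range(16):                 # hash values 0000 ... 1111
    table.insert(h)
print(table.i, [(b.j, b.keys) for b in table.array])
```

Inserting the sixteen 4-bit values in order with three records per block ends with i = 3 and eight blocks of two keys each, every block having j = 3.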

Example
Insert a record with h(K) = 1010. (Figures omitted: the table before the insertion, and the table after the insertion.)

Example: Next
Starting from the table after the insertions so far, next insert records with h(K) = 0000 and h(K) = 0111.
The bucket for 0... gets split, but i stays at 2.
Then insert a record with h(K) = 1000.
It overflows the bucket for 10..., so raise i to 3.

Exercise
Suppose we want to insert keys with hash values 0000…1111 into an extensible hash table. Assume that a block can hold three records.

(Figures omitted: successive states of the table as the keys 0000 through 1111 are inserted. Captions: "Insertion of 0011. No room", shown when the block for 0 holds 0000, 0001, 0010 with i = 1; "This is the new block", shown after i is raised to 2 and again after i is raised to 3; "Still no room for 1011", shown after the block holding 1000, 1001, 1010 is split once. In the final state i = 3 and each of the eight blocks has j = 3 and holds two keys: 0000 0001, 0010 0011, 0100 0101, 0110 0111, 1000 1001, 1010 1011, 1100 1101, 1110 1111.)

Linear Hash Tables

Linear Hashing
Use i bits from the right (low-order) end of h(K).
Buckets are numbered [0…n-1], where $2^{i-1} < n \le 2^i$.
Let the last i bits of h(K) be m = (a1, a2, …, ai).
If m < n, then the record belongs in bucket m.
If $n \le m < 2^i$, then the record belongs in bucket $m - 2^{i-1}$, that is, the bucket we would get if we changed a1 (which must be 1) to 0.
(Figure omitted: a table with i = 1, n = 2 buckets, r = 3 records. Annotation on i, n, r: "This is also part of the structure.")

Linear Hash-Table Insert
Pick an upper limit on capacity, e.g., 85% (1.7 records/bucket in our example).
If an insertion exceeds the capacity limit, set n := n + 1.
If the new n is $2^i + 1$, set i := i + 1. No change in bucket numbers is needed; just imagine a leading 0.
There is now a bucket numbered (old) n, so we need to split bucket (old) n $- 2^{i-1}$, whose records may belong in the new bucket.
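A minimal sketch of the bucket rule and the insertion procedure follows. The names are invented for illustration, hash values are passed in directly as integers, and each bucket is a plain Python list, so entries beyond the block capacity stand in for overflow blocks. The defaults (three records per block, 100% threshold, starting from n = 1 and i = 1) are assumptions taken from the exercise in this deck rather than from the 85% example.

```python
class LinearHashTable:
    def __init__(self, block_capacity: int = 3, threshold: float = 1.0):
        self.i, self.n, self.r = 1, 1, 0      # bits in use, buckets, records
        self.block_capacity = block_capacity
        self.threshold = threshold            # allowed ratio r / (n * block_capacity)
        self.buckets: list[list[int]] = [[]]

    def bucket_of(self, h: int) -> int:
        m = h & ((1 << self.i) - 1)           # last i bits of h(K)
        # If bucket m does not exist yet, clear the leading bit a1.
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, h: int) -> None:
        self.buckets[self.bucket_of(h)].append(h)
        self.r += 1
        if self.r > self.threshold * self.n * self.block_capacity:
            self._add_bucket()

    def _add_bucket(self) -> None:
        self.n += 1
        if self.n > (1 << self.i):            # new n is 2^i + 1
            self.i += 1
        new = self.n - 1                      # the bucket numbered (old) n
        old = new - (1 << (self.i - 1))       # the bucket that has to be split
        self.buckets.append([])
        keys, self.buckets[old] = self.buckets[old], []
        for h in keys:                        # redistribute between old and new
            self.buckets[self.bucket_of(h)].append(h)

    def lookup(self, h: int) -> bool:
        return h in self.buckets[self.bucket_of(h)]

table = LinearHashTable()
for h in range(7):                            # hash values 0000 ... 0110
    table.insert(h)
print(table.i, table.n, table.r, table.buckets)
```

Lookup uses the same `bucket_of` rule as insertion, which is why a record that is not in that bucket cannot be anywhere else.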

Example
Insert a record with h(K) = 0101.
Capacity limit exceeded; increment n.
(Figures omitted. Before: i = 1, n = 2 buckets, r = 3 records. After: i = 2, n = 3 buckets, r = 4 records.)

Example
Insert a record with h(K) = 0001.
Capacity limit not exceeded.
But the bucket is full; add an overflow bucket.
(Figure omitted: i = 2, n = 3, r = 5.)

Example
Insert a record with h(K) = 1100.
Capacity exceeded; set n = 4, add bucket 11.
Split bucket 01.
(Figure omitted, labeled r = 7, n = 4, i = 2.)

Lookup in Linear Hash Table
For record(s) with search key K, compute h(K); search the corresponding bucket according to the procedure described for insertion.
If the record we wish to look up isn't there, it can't be anywhere else.
E.g., look up a key which hashes to 1010, and then a key which hashes to 1011.
(Figure omitted: i = 2, n = 3, r = 4.)

Exercise
Suppose we want to insert keys with hash values 0000…1111 into a linear hash table with a 100% capacity threshold. Assume that a block can hold three records.

r=1, n=1, i=1. Bucket 0: 0000
r=2, n=1, i=1. Bucket 0: 0000 0001
r=3, n=1, i=1. Bucket 0: 0000 0001 0010
r=4, n=2, i=1. Bucket 0: 0000 0010. Bucket 1: 0001 0011
r=5, n=2, i=1. Bucket 0: 0000 0010 0100. Bucket 1: 0001 0011
r=6, n=2, i=1. Bucket 0: 0000 0010 0100. Bucket 1: 0001 0011 0101
r=7, n=3, i=2. Bucket 00: 0000 0100. Bucket 01: 0001 0011 0101. Bucket 10: 0010 0110
Continue at home…

Multidimensional Indexes

Grid files (hash-like structure)
Divide the data into stripes in each dimension.
Each rectangle in the grid points to a bucket.
Example: database records (age, salary) for people who buy gold jewelry.
Data: (25,60) (45,60) (50,75) (50,100) (50,120) (70,110) (85,140) (30,260) (25,400) (45,350) (50,275) (60,260)

Grid file (figure omitted)

Operations
Lookup: Find the coordinates of the point in each dimension; this gives you a bucket to search.
Nearest Neighbor: Look up point P. Consider the points in that bucket. Problem: there could be points in adjacent buckets that are closer. Example: NN of (45, 200). Problem: there could be no points at all in the bucket; widen the search?
Range Queries: Ranges define a region of buckets. Buckets on the border may contain points not in the range. Example: 35 < age <= 45; 50 < salary <= 100.
Queries Specifying Only One Attribute: Problem: we must search a whole row or column of buckets.
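Lookup and range queries on a grid file can be sketched as below, using the gold-jewelry data. The stripe boundaries (age at 40 and 55, salary at 90 and 225) are assumptions, since the figure is not part of this transcript, and so is the convention that a stripe includes its upper boundary. All function names are illustrative.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

AGE_LINES = [40, 55]        # assumed stripe boundaries for age
SALARY_LINES = [90, 225]    # assumed stripe boundaries for salary

POINTS = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
          (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]

def cell(age, salary):
    """Grid coordinates of the bucket for a point; stripes are (low, high]."""
    return bisect_left(AGE_LINES, age), bisect_left(SALARY_LINES, salary)

grid = defaultdict(list)
for p in POINTS:
    grid[cell(*p)].append(p)

def lookup(age, salary):
    """Search only the single bucket the point maps to."""
    return (age, salary) in grid.get(cell(age, salary), [])

def range_query(age_lo, age_hi, sal_lo, sal_hi):
    """Points with age_lo < age <= age_hi and sal_lo < salary <= sal_hi."""
    a0, a1 = bisect_right(AGE_LINES, age_lo), bisect_left(AGE_LINES, age_hi)
    s0, s1 = bisect_right(SALARY_LINES, sal_lo), bisect_left(SALARY_LINES, sal_hi)
    hits = []
    for a in range(a0, a1 + 1):
        for s in range(s0, s1 + 1):
            # Border buckets may hold points outside the range, so filter them.
            hits += [p for p in grid.get((a, s), [])
                     if age_lo < p[0] <= age_hi and sal_lo < p[1] <= sal_hi]
    return hits

print(cell(45, 60), lookup(50, 120), range_query(35, 45, 50, 100))
```

With these boundaries the example range 35 < age <= 45, 50 < salary <= 100 touches four buckets but only one point falls inside the range.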

Insertion
Use overflow buckets, or split stripes in one or more dimensions.
Insert (52,200). Split the central bucket, for instance by splitting the central salary stripe.
The blocks of 3 buckets have to be processed.
In general, the blocks of n buckets have to be processed during a split, where n is the number of buckets in the chosen direction.
Very expensive.

Partitioned hashing
Example: Gold jewelry, with first bit = age mod 2 and bits 2 and 3 = salary mod 4.
Works well for: partial match (i.e. just one attribute specified).
Bad for: range queries and nearest-neighbor queries.
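The bucket numbering in this example, and the way a partial-match query narrows the buckets to search, can be sketched as follows. The function names are invented for illustration.

```python
from itertools import product

def bucket(age: int, salary: int) -> str:
    """3-bit bucket number: first bit = age mod 2, bits 2 and 3 = salary mod 4."""
    return f"{age % 2:01b}{salary % 4:02b}"

def candidate_buckets(age=None, salary=None) -> list[str]:
    """Buckets to search when only some of the attributes are specified."""
    age_parts = [age % 2] if age is not None else [0, 1]
    salary_parts = [salary % 4] if salary is not None else [0, 1, 2, 3]
    return [f"{a:01b}{s:02b}" for a, s in product(age_parts, salary_parts)]

print(bucket(25, 60))                 # age 25 is odd, 60 mod 4 = 0
print(candidate_buckets(age=50))      # half of the eight buckets
print(candidate_buckets(salary=60))   # a quarter of the eight buckets
```

Since neighboring ages or salaries land in unrelated buckets, a range query gets no such narrowing, which matches the "bad for" line above.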

KD-Trees
Generalizes binary search trees, but the search attributes rotate among the dimensions.
Levels rotate among the dimensions, partitioning the points by comparison with a value for that dimension.
Leaves are blocks.

Geometrically…
Remember we didn't want the stripes in grid files to continue all along the vertical or horizontal direction? Here they don't. (Figure omitted.)

Operations
Lookup in KD-Trees: Find the appropriate leaf by binary search. Is the record there?
Insert Into KD-Trees: Look up the record to be inserted, reaching the appropriate leaf. If there is room, put the record in that block. If not, find a suitable value for the appropriate dimension and split the leaf block.
Example: Someone 35 years old with a salary of $500K buys gold jewelry. The record belongs in the leaf with (25, 400) and (45, 350). Too full: split on age.
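A compact sketch of lookup and insertion is given below. It makes several assumptions that the slides do not state: a leaf block holds two points, the tree starts as a single leaf so that the first split is on age (dimension 0), the split value is the median of the overfull leaf, and values below the split value go left. The class and function names are illustrative.

```python
CAPACITY = 2    # assumed number of points per leaf block

class Leaf:
    def __init__(self, points):
        self.points = points

class Node:
    def __init__(self, dim, value, left, right):
        self.dim, self.value = dim, value       # split dimension and split value
        self.left, self.right = left, right     # left: < value, right: >= value

def lookup(tree, point) -> bool:
    """Descend by comparing on each node's dimension, then check the leaf."""
    while isinstance(tree, Node):
        tree = tree.left if point[tree.dim] < tree.value else tree.right
    return point in tree.points

def insert(tree, point, depth=0):
    """Insert a point and return the (possibly new) root of this subtree."""
    if isinstance(tree, Leaf):
        points = tree.points + [point]
        if len(points) <= CAPACITY:
            return Leaf(points)
        dim = depth % len(point)                # dimensions rotate by level
        value = sorted(p[dim] for p in points)[len(points) // 2]   # median
        return Node(dim, value,
                    Leaf([p for p in points if p[dim] < value]),
                    Leaf([p for p in points if p[dim] >= value]))
    if point[tree.dim] < tree.value:
        tree.left = insert(tree.left, point, depth + 1)
    else:
        tree.right = insert(tree.right, point, depth + 1)
    return tree

tree = Leaf([])
for p in [(25, 400), (45, 350), (35, 500)]:     # (age, salary in $K)
    tree = insert(tree, p)
print(tree.dim, tree.value, tree.left.points, tree.right.points)
```

One limitation of this sketch: if every point in an overfull leaf has the same value in the split dimension, the split leaves one side empty, and a real implementation would need to pick a different dimension or use overflow blocks.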

(Figure omitted: the KD-tree after the insertion.) The split is at age 35 because 35 is the median.

Queries
Partial match queries: When we don't know the value of the attribute at the node, we must explore both of its children. E.g., find points with age = 50.
Range Queries: Sometimes a range will allow us to move to only one child of a node. But if the range straddles the splitting value, then we must explore both children.
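Both kinds of query can be handled by one recursive function in which an unspecified attribute is an unbounded range. The tree below is hypothetical (its split values and layout are made up for illustration, using some of the gold-jewelry points), and the tuple encoding of nodes is just a convenience for this sketch.

```python
# A node is ("node", dim, value, left, right) with left holding x < value;
# a leaf is ("leaf", [points]). Points are (age, salary).
TREE = ("node", 1, 150,
        ("node", 0, 60,
         ("leaf", [(50, 100), (50, 120)]),
         ("leaf", [(70, 110)])),
        ("node", 0, 47,
         ("leaf", [(25, 400), (45, 350)]),
         ("leaf", [(50, 275), (60, 260)])))

def range_query(tree, lo, hi):
    """Points p with lo[d] <= p[d] <= hi[d] in every dimension; None = unbounded."""
    if tree[0] == "leaf":
        return [p for p in tree[1]
                if all((l is None or l <= x) and (h is None or x <= h)
                       for x, l, h in zip(p, lo, hi))]
    _, dim, value, left, right = tree
    hits = []
    if lo[dim] is None or lo[dim] < value:      # range reaches below the split
        hits += range_query(left, lo, hi)
    if hi[dim] is None or hi[dim] >= value:     # range reaches the split or above
        hits += range_query(right, lo, hi)
    return hits

# Partial match: age = 50, salary unspecified, so salary nodes explore both children.
print(range_query(TREE, (50, None), (50, None)))
# Range query: 40 <= age <= 55 and 300 <= salary <= 450.
print(range_query(TREE, (40, 300), (55, 450)))
```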

R-Tree Lookup (Where am I)
We start at the root, with which the entire region is associated.
We examine the subregions at the root and determine which children correspond to interior regions that contain point P.
If there are zero such regions, we are done; P is not in any data region.
If there are one or more such subregions, we must recursively search each of those children, until we reach the leaves of the tree.
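The recursive search can be sketched as follows. The two interior rectangles are the ones used in the example of this deck, but the coordinates of the individual data regions are hypothetical, invented only so that the sketch runs. A node is modeled as a list of (rectangle, child) pairs, where the child of a leaf entry is simply the region's name.

```python
def contains(rect, p) -> bool:
    """True if point p lies inside rect = ((x1, y1), (x2, y2)), borders included."""
    (x1, y1), (x2, y2) = rect
    return x1 <= p[0] <= x2 and y1 <= p[1] <= y2

def where_am_i(node, p) -> list[str]:
    """Names of all data regions that contain point p."""
    hits = []
    for rect, child in node:
        if contains(rect, p):
            if isinstance(child, str):          # leaf entry: a data region
                hits.append(child)
            else:                               # interior entry: search that child too
                hits += where_am_i(child, p)
    return hits

# Data-region coordinates below are made up for illustration.
LEAF_1 = [(((0, 40), (60, 50)), "Road1"),
          (((40, 0), (60, 40)), "Road2"),
          (((10, 10), (20, 20)), "House1")]
LEAF_2 = [(((20, 25), (40, 45)), "School"),
          (((70, 40), (80, 50)), "House2"),
          (((20, 20), (100, 22)), "Pipeline"),
          (((40, 60), (80, 80)), "Pop")]
ROOT = [(((0, 0), (60, 50)), LEAF_1),
        (((20, 20), (100, 80)), LEAF_2)]

print(where_am_i(ROOT, (30, 30)))   # inside both interior regions; both leaves are searched
print(where_am_i(ROOT, (90, 10)))   # inside no interior region
```

Because interior regions may overlap, the point (30, 30) forces a search of both children even though only one of them holds a matching data region.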

R-Tree Insertion
We start at the root and try to find a subregion into which R fits.
If there is more than one, we pick just one, and repeat the process there.
If there is no such region, we have to expand one, and we want to expand as little as possible. So, we pick the child that will be expanded as little as possible.
Eventually we reach a leaf, where we insert the region R.
However, if there is no room, we have to split the leaf. We split the leaf in such a way as to have the smallest subregions.

Example
Suppose that the leaves have room for six regions.
Further suppose that the six regions are together on one leaf, whose region is represented by the outer solid rectangle.
Now suppose that another region, POP, is added.

Example (Cont'd)
(Figure omitted: the leaf has been split in two. Region ((0,0),(60,50)) holds Road1, Road2, House1. Region ((20,20),(100,80)) holds School, House2, Pipeline, Pop.)

Example (Cont'd)
Suppose now that House3 ((70,5),(80,15)) gets added.
We do have space in the leaves, but we need to expand one of the regions at the parent.
We choose to expand the one which needs to be expanded the least.

Which one should we expand?
Option 1: expand the first region to ((0,0),(80,50)), holding Road1, Road2, House1, House3; the second region stays ((20,20),(100,80)), holding School, House2, Pipeline, Pop.
Option 2: the first region stays ((0,0),(60,50)), holding Road1, Road2, House1; expand the second region to ((20,5),(100,80)), holding School, House2, Pipeline, Pop, House3.
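One way to answer the question is to compare how much area each region would gain. Measuring "expanded the least" by added area is an assumption of this sketch, and the function names are illustrative.

```python
def area(rect) -> int:
    (x1, y1), (x2, y2) = rect
    return (x2 - x1) * (y2 - y1)

def expand(rect, new):
    """Smallest rectangle that covers both rect and new."""
    (ax1, ay1), (ax2, ay2) = rect
    (bx1, by1), (bx2, by2) = new
    return ((min(ax1, bx1), min(ay1, by1)), (max(ax2, bx2), max(ay2, by2)))

def enlargement(rect, new) -> int:
    """Area added to rect when it is expanded to cover new."""
    return area(expand(rect, new)) - area(rect)

def choose_region(regions, new):
    """Pick the region that needs the least added area."""
    return min(regions, key=lambda r: enlargement(r, new))

regions = [((0, 0), (60, 50)), ((20, 20), (100, 80))]
house3 = ((70, 5), (80, 15))
for r in regions:
    print(r, "->", expand(r, house3), "adds", enlargement(r, house3))
print("expand:", choose_region(regions, house3))
```

Under this measure the first region gains 1000 square units and the second 1200, so House3 goes with Road1, Road2 and House1.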

Bitmap Indexes
Suppose we have n tuples.
A bitmap index for a field F is a collection of bit vectors of length n, one for each possible value that may appear in the field F.
The vector for value v has 1 in position i if the i-th record has v in field F, and it has 0 there if not.
Records: (30, foo) (30, bar) (40, baz) (50, foo) (40, bar) (30, baz)
foo 100100
bar 0…
baz …
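The definition translates directly into code. In this sketch the function name `bitmap_index` is made up, records are tuples, and each bit vector is represented as a string of 0s and 1s for readability.

```python
def bitmap_index(records, field: int) -> dict:
    """One bit vector per distinct value of the given field position."""
    values = sorted({rec[field] for rec in records})
    return {v: "".join("1" if rec[field] == v else "0" for rec in records)
            for v in values}

records = [(30, "foo"), (30, "bar"), (40, "baz"),
           (50, "foo"), (40, "bar"), (30, "baz")]

by_name = bitmap_index(records, 1)
by_number = bitmap_index(records, 0)
for value, bits in by_name.items():
    print(value, bits)
```

Every position has a 1 in exactly one vector of an index, since each record has exactly one value in the field.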

Motivation for Bitmap Indexes
They allow very fast evaluation of partial match queries.
SELECT title FROM Movie WHERE studioName='Disney' AND year=1995;
If there are bitmap indexes on both studioName and year, we can intersect the vectors for the Disney value and the 1995 value.
We should have another index to retrieve the tuples by number.
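The intersection step can be sketched as below. The movie rows are placeholder data invented for illustration, and the list position of a row stands in for the separate index that retrieves tuples by number.

```python
def bit_vector(rows, column: int, value) -> list[bool]:
    """Bit vector for one value: True where the row has that value."""
    return [row[column] == value for row in rows]

# Hypothetical rows: (title, studioName, year).
movies = [("Movie A", "Disney", 1995),
          ("Movie B", "Fox", 1995),
          ("Movie C", "Disney", 1994),
          ("Movie D", "Disney", 1995)]

disney = bit_vector(movies, 1, "Disney")
year_1995 = bit_vector(movies, 2, 1995)

# Bitwise AND of the two vectors marks the tuples that satisfy both conditions.
both = [a and b for a, b in zip(disney, year_1995)]
titles = [movies[i][0] for i, hit in enumerate(both) if hit]
print(titles)
```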