Published by Teresa Parker. Modified over 8 years ago.
1
R++-tree: an efficient spatial access method for highly redundant point data
Martin Šumák, Peter Gurský, University of P. J. Šafárik in Košice
2
Research motivation
Besides kNN and range queries, an R-tree-like index can also be used to compute top-k queries (find the best k objects according to user preferences), with preferences combined as $h(x_1, x_2) = f_1(x_1) + f_2(x_2)$.
Martin Šumák, Peter Gurský at ADBIS 2013
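To make the top-k query concrete, here is a minimal Python sketch that evaluates h as a sum of per-attribute preference functions and returns the k best objects. It is a plain linear scan, not the index-based evaluation the talk is about. The names `top_k`, `f_price`, `f_area` and the sample flats are illustrative assumptions, not taken from the slides.

```python
import heapq

def top_k(points, prefs, k):
    """Return the k points with the highest h(x) = f_1(x_1) + ... + f_m(x_m)."""
    def h(point):
        return sum(f(x) for f, x in zip(prefs, point))
    # nlargest returns the results sorted from best to worst score
    return heapq.nlargest(k, points, key=h)

# Hypothetical user preferences over (price in thousands, area in m^2):
# cheaper is better, larger is better.
f_price = lambda price: 200 - price
f_area = lambda area: 2 * area

flats = [(120, 60), (90, 40), (150, 95), (180, 70)]
best = top_k(flats, [f_price, f_area], k=2)
print(best)
```

An index helps because whole subtrees whose rectangles cannot reach the current k-th best score can be skipped, which a linear scan cannot do.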
3
Why highly redundant point data
Our data consists of flats with the following attributes: price, area, floor, maximum floor of the building, year of approval, and number of rooms. Each flat is represented by a point in 6-dimensional space.
4
R+-tree fundamentals
The R+-tree is an R-tree-like index with the following special properties: there is zero overlap between nodes at the same level; the rectangles of the child nodes together cover the whole of the parent's rectangle; it is suitable for point data and point queries.
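The two geometric properties above (no overlap between siblings, full coverage of the parent) can be checked mechanically. The following Python sketch does so for axis-aligned boxes; the representation of a rectangle as a `(low corner, high corner)` pair and the function names are assumptions made for illustration.

```python
from itertools import combinations
from math import isclose, prod

# A rectangle is a pair (low corner, high corner) of equal-length tuples.

def volume(r):
    lo, hi = r
    return prod(h - l for l, h in zip(lo, hi))

def interiors_overlap(a, b):
    # Boxes share interior volume only if they overlap with positive length on every axis
    return all(max(al, bl) < min(ah, bh)
               for al, bl, ah, bh in zip(a[0], b[0], a[1], b[1]))

def contains(outer, inner):
    return all(ol <= il and ih <= oh
               for ol, il, ih, oh in zip(outer[0], inner[0], inner[1], outer[1]))

def is_partition(parent, children):
    """True if the children lie inside the parent, do not overlap, and cover it."""
    if not all(contains(parent, c) for c in children):
        return False
    if any(interiors_overlap(a, b) for a, b in combinations(children, 2)):
        return False
    # Non-overlapping boxes inside the parent cover it exactly when the volumes add up
    return isclose(sum(volume(c) for c in children), volume(parent))

parent = ((0, 0), (4, 4))
covering = [((0, 0), (2, 4)), ((2, 0), (4, 2)), ((2, 2), (4, 4))]
tight = [((0, 0), (1, 3)), ((2, 0), (4, 2))]        # MBR-like boxes leave gaps
overlapping = [((0, 0), (3, 4)), ((2, 0), (4, 4))]

print(is_partition(parent, covering), is_partition(parent, tight),
      is_partition(parent, overlapping))
```

Boxes that merely touch along a face are treated as non-overlapping here, which matches sibling rectangles that share a boundary.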
5
R+-tree fundamentals
[Figure labels: desired state; zero overlaps; minimum bounding rect.]
The R+-tree avoids overlaps at the cost of rectangle size.
6
The R++-tree idea
[Figure labels: desired state; zero overlaps; minimum bounding rect.]
R++-tree inner nodes keep two rectangles for each child node: the minimum bounding one and the parent-covering one.
7
The R++-tree idea
[Figure labels: desired state; zero overlaps; minimum bounding rect.]
R++-tree inner nodes keep two rectangles for each child node: the minimum bounding one and the parent-covering one. Leaf nodes are left unchanged.
8
Nodes of R++-tree
Leaf nodes: exactly the same as leaf nodes of the R+-tree; contain the ID and coordinates of each object; take one disk page each.
Inner nodes: contain a pointer and two rectangles for each child node; take two disk pages each.
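The node layout described above could be modelled in memory roughly as follows. This is a sketch under assumptions: the class and field names (`LeafNode`, `InnerNode`, `ChildEntry`, `mbr`, `cover`) are invented for illustration, and disk paging is ignored.

```python
from dataclasses import dataclass, field
from typing import Union

Point = tuple[float, ...]

@dataclass
class Rect:
    lo: Point
    hi: Point

    def contains_rect(self, other: "Rect") -> bool:
        return (all(a <= b for a, b in zip(self.lo, other.lo)) and
                all(b <= a for a, b in zip(self.hi, other.hi)))

@dataclass
class LeafNode:
    # (object id, coordinates) pairs, the same content as an R+-tree leaf
    entries: list[tuple[int, Point]] = field(default_factory=list)

@dataclass
class ChildEntry:
    child: Union["LeafNode", "InnerNode"]  # pointer to the child node
    mbr: Rect    # minimum bounding rectangle of the child's data
    cover: Rect  # parent-covering rectangle, as in an R+-tree

@dataclass
class InnerNode:
    entries: list[ChildEntry] = field(default_factory=list)

def mbr_of(leaf: LeafNode) -> Rect:
    """Smallest box enclosing all points stored in a non-empty leaf."""
    coords = [p for _, p in leaf.entries]
    return Rect(tuple(map(min, zip(*coords))), tuple(map(max, zip(*coords))))

leaf = LeafNode([(1, (1.0, 2.0)), (2, (3.0, 1.0))])
entry = ChildEntry(child=leaf, mbr=mbr_of(leaf), cover=Rect((0.0, 0.0), (5.0, 10.0)))
root = InnerNode([entry])
print(entry.mbr, entry.cover.contains_rect(entry.mbr))
```

In this model the minimum bounding rectangle always lies inside the covering rectangle of the same child.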
9
Use of the two rectangles in inner nodes
Searching: only the minimum bounding rectangles are necessary.
Inserting new objects: both the minimum bounding and the parent-covering rectangles need to be used (read/updated).
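The division of labour between the two rectangles can be sketched on a single inner node with leaf children. The slides do not spell out the insertion algorithm, so the routing rule below (choose the child whose covering rectangle contains the point, then enlarge its minimum bounding rectangle) is an assumption, and node splitting is left out entirely. All names are hypothetical.

```python
def intersects(a, b):
    return all(al <= bh and bl <= ah
               for al, bl, ah, bh in zip(a[0], b[0], a[1], b[1]))

def contains_point(r, p):
    return all(l <= x <= h for l, x, h in zip(r[0], p, r[1]))

def enlarge(r, p):
    # Smallest box that contains both r and the point p
    return (tuple(min(l, x) for l, x in zip(r[0], p)),
            tuple(max(h, x) for h, x in zip(r[1], p)))

class Entry:
    """One child of an inner node: covering rectangle, MBR, and leaf points."""
    def __init__(self, cover, points):
        self.cover = cover
        self.points = list(points)
        self.mbr = (self.points[0], self.points[0])
        for p in self.points[1:]:
            self.mbr = enlarge(self.mbr, p)

def range_search(entries, query):
    hits, leaves_read = [], 0
    for e in entries:
        if intersects(e.mbr, query):      # searching consults only the MBR
            leaves_read += 1
            hits.extend(p for p in e.points if contains_point(query, p))
    return hits, leaves_read

def insert(entries, p):
    for e in entries:
        if contains_point(e.cover, p):    # covering rectangle routes the point
            e.points.append(p)
            e.mbr = enlarge(e.mbr, p)     # MBR is updated to stay tight
            return e
    raise ValueError("point lies outside the parent rectangle")

entries = [Entry(((0, 0), (5, 10)), [(1, 1), (2, 3)]),
           Entry(((5, 0), (10, 10)), [(6, 6), (9, 8)])]
query = ((3, 0), (7, 5))
before = range_search(entries, query)
insert(entries, (4, 2))
after = range_search(entries, query)
print(before, after)
```

Before the insert the query touches both covering rectangles but neither minimum bounding rectangle, so no leaf is read; that pruning is what the second rectangle buys.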
10
Implementation of inner nodes
The first page contains the minimum bounding rectangles. The second page contains the parent-covering rectangles.
11
Advantages and drawbacks of the two-page idea
Advantages: searching requires reading only one page per node involved; the ratio between page size and node capacity is the same as in the R+-tree.
Drawbacks: when updating, two pages per inner node need to be processed. The real impact on the whole index size is relatively low.
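A rough back-of-the-envelope calculation illustrates both claims. The numbers below are assumptions, not figures from the talk: 4096-byte pages, 8-byte coordinates and pointers, no page headers, pointers stored next to the minimum bounding rectangles, and completely full nodes.

```python
PAGE_BYTES = 4096   # assumed page size
COORD_BYTES = 8     # assumed size of one coordinate
POINTER_BYTES = 8   # assumed size of one child pointer

def rect_bytes(dims):
    return 2 * dims * COORD_BYTES          # low corner + high corner

def capacity_two_pages(dims):
    # The first page holds pointer + MBR per child; covering rectangles go to page two
    return PAGE_BYTES // (rect_bytes(dims) + POINTER_BYTES)

def capacity_single_page(dims):
    # Alternative for comparison: both rectangles squeezed into one page
    return PAGE_BYTES // (2 * rect_bytes(dims) + POINTER_BYTES)

def inner_node_count(leaves, fanout):
    """Number of inner nodes above `leaves` leaf nodes when every node is full."""
    total, level = 0, leaves
    while level > 1:
        level = -(-level // fanout)        # ceiling division
        total += level
    return total

dims, leaves = 6, 10_000
fanout = capacity_two_pages(dims)
inner = inner_node_count(leaves, fanout)
rplus_pages = leaves + inner               # one page per node
rplusplus_pages = leaves + 2 * inner       # inner nodes take two pages
print(fanout, capacity_single_page(dims), inner, rplusplus_pages / rplus_pages)
```

Under these assumptions the fanout matches that of an R+-tree with the same page size, and the extra pages amount to a few percent of the index because inner nodes are few compared with leaves.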
12
Experiments - data
Artificial data (range, kNN and top-k queries): 100 000 random points in 2- to 10-dimensional space, with decimal values within [0; 1], integer values from 1 to 100, or integer values from 1 to 10.
Pseudo-real data (top-k query): 6-dimensional points (data of flats for sale); 550 000 flats (20-multiple set) and 2 700 000 flats (100-multiple set).
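The three artificial data sets differ mainly in how often coordinate values repeat. A small Python sketch shows this; the uniform distribution, the seed, the function names, and the reduced point count are assumptions, since the slides only say "random points".

```python
import random

def make_points(n, dims, kind, seed=0):
    """Generate n random points; `kind` selects the value domain."""
    rng = random.Random(seed)
    if kind == "decimal":
        draw = rng.random                    # floats in [0, 1)
    elif kind == "int100":
        draw = lambda: rng.randint(1, 100)   # integers 1..100 inclusive
    elif kind == "int10":
        draw = lambda: rng.randint(1, 10)    # integers 1..10 inclusive
    else:
        raise ValueError(f"unknown kind: {kind}")
    return [tuple(draw() for _ in range(dims)) for _ in range(n)]

def distinct_per_dim(points):
    """Number of distinct coordinate values in each dimension."""
    return [len({p[i] for p in points}) for i in range(len(points[0]))]

n, dims = 10_000, 4   # smaller than the 100 000 points of the experiments
for kind in ("decimal", "int100", "int10"):
    print(kind, distinct_per_dim(make_points(n, dims, kind)))
```

With integer coordinates from 1 to 10, each dimension can hold at most 10 distinct values, so many points share coordinates; this is the kind of redundancy the title refers to.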
13
Experiments - measures
300 random queries for each data set and query type. Measured: average time per query and average number of I/Os per query. One I/O corresponds to reading one page, i.e. processing one node.
14
Artificial data: 100 000 random points with decimal values within [0; 1]
15
Artificial data: 100 000 random points with decimal values within [0; 1]
16
Artificial data: 100 000 random points with decimal values within [0; 1]
17
Artificial data: 100 000 random points with integer values from 1 to 100
18
Artificial data: 100 000 random points with integer values from 1 to 100
19
Artificial data: 100 000 random points with integer values from 1 to 100
20
Artificial data: 100 000 random points with integer values from 1 to 10
21
Artificial data: 100 000 random points with integer values from 1 to 10
22
Artificial data: 100 000 random points with integer values from 1 to 10
23
Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)
24
Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)
25
Pseudo-real data: 2 700 000 flats (i.e. 6-dimensional points)
26
Thank you for your attention