R++-tree: an efficient spatial access method for highly redundant point data
Martin Šumák, Peter Gurský
University of P. J. Šafárik in Košice


Research motivation
- Besides kNN and range queries, an R-tree-like index can be used to compute top-k queries (find the best k objects according to user preferences)
- h(x_1, x_2) = f_1(x_1) + f_2(x_2)
Martin Šumák, Peter Gurský at ADBIS 2013
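A top-k query with an additive preference function of this shape can be sketched by brute force, without any index. The local preference functions `f1` and `f2` and the sample flats below are hypothetical, chosen only for illustration. An R-tree-like index would be used to avoid scoring every object.

```python
import heapq

# Hypothetical local preferences (not from the talk):
# cheaper is better, larger is better.
def f1(price):
    return 1.0 - price / 200_000

def f2(area):
    return area / 100

def h(x1, x2):
    # Additive aggregation, as on the slide: h(x1, x2) = f1(x1) + f2(x2)
    return f1(x1) + f2(x2)

def top_k(objects, k):
    # Brute-force baseline: score every object and keep the k best.
    return heapq.nlargest(k, objects, key=lambda o: h(*o))

# Hypothetical (price, area) pairs.
flats = [(100_000, 50), (150_000, 80), (80_000, 40), (120_000, 90)]
best = top_k(flats, 2)
print(best)  # [(120000, 90), (150000, 80)]
```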

Why highly redundant point data
- Our data consists of flats with the following attributes:
  - price
  - area
  - floor
  - max floor of building
  - year of approval
  - number of rooms
- Each flat is represented by a point in 6-dimensional space

R+-tree fundamentals
- The R+-tree is an R-tree-like index with the following special properties:
  - zero overlap between nodes at the same level
  - the rectangles of the child nodes cover the whole parent rectangle
  - suitable for point data and point queries
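The two structural properties above can be checked mechanically for the children of one node. The sketch below is an illustration only: rectangles are represented as hypothetical `(lows, highs)` tuples, and the helper names are not from the talk.

```python
import math
from itertools import combinations

def interiors_overlap(a, b):
    # Two rectangles overlap if their interiors intersect in every dimension;
    # sharing only a border does not count.
    (alo, ahi), (blo, bhi) = a, b
    return all(al < bh and bl < ah for al, ah, bl, bh in zip(alo, ahi, blo, bhi))

def zero_overlap(rects):
    return not any(interiors_overlap(a, b) for a, b in combinations(rects, 2))

def volume(r):
    lo, hi = r
    return math.prod(h - l for l, h in zip(lo, hi))

def contains(outer, inner):
    (olo, ohi), (ilo, ihi) = outer, inner
    return all(ol <= il and ih <= oh for ol, oh, il, ih in zip(olo, ohi, ilo, ihi))

def tiles_parent(parent, children):
    # Non-overlapping children that lie inside the parent cover it
    # exactly when their volumes sum to the parent's volume.
    return (zero_overlap(children)
            and all(contains(parent, c) for c in children)
            and math.isclose(sum(volume(c) for c in children), volume(parent)))

parent = ((0, 0), (10, 10))
left, right = ((0, 0), (5, 10)), ((5, 0), (10, 10))
print(zero_overlap([left, right]))                    # True: shared border only
print(tiles_parent(parent, [left, right]))            # True
print(zero_overlap([left, right, ((4, 4), (6, 6))]))  # False
```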

R+-tree fundamentals
- Desired state
  - zero overlaps
  - minimum bounding rectangles
- R+-tree
  - avoids overlaps at the cost of rectangle size


The R++-tree idea
- Desired state
  - zero overlaps
  - minimum bounding rectangles
- R++-tree
  - inner nodes keep two rectangles for each child node: the minimum bounding rectangle and the parent covering rectangle
  - leaf nodes are left unchanged

Nodes of R++-tree
- Leaf nodes
  - exactly the same as the leaf nodes of the R+-tree
  - contain the Id and coordinates of each object
  - take one disk page each
- Inner nodes
  - contain a pointer and two rectangles for each child node
  - take two disk pages each
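To illustrate why an inner node might keep two rectangles per child, the sketch below compares a covering rectangle with the minimum bounding rectangle of the points stored under it. The rectangles, points and query are hypothetical values, and `mbr` and `intersects` are illustrative helpers rather than code from the talk.

```python
def mbr(points):
    # Smallest axis-aligned rectangle containing all points.
    lows = tuple(min(c) for c in zip(*points))
    highs = tuple(max(c) for c in zip(*points))
    return lows, highs

def intersects(a, b):
    # Closed rectangles intersect if their extents meet in every dimension.
    (alo, ahi), (blo, bhi) = a, b
    return all(al <= bh and bl <= ah for al, ah, bl, bh in zip(alo, ahi, blo, bhi))

cover = ((0, 0), (5, 10))            # hypothetical parent covering rectangle
points = [(1, 2), (3, 2), (3, 7)]    # hypothetical points stored under this child
tight = mbr(points)                  # ((1, 2), (3, 7))
query = ((4, 0), (5, 1))

print(intersects(cover, query))      # True: the covering rectangle cannot rule the child out
print(intersects(tight, query))      # False: the minimum bounding rectangle prunes it
```

In this example a search guided by the covering rectangle would have to read the child, while the minimum bounding rectangle shows that no stored point can match.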

Use of the two rectangles in inner nodes
- Searching
  - only the minimum bounding rectangles are necessary
- Inserting new objects
  - both the minimum bounding and the parent covering rectangles need to be used (read/updated)
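The division of labour between the two rectangles can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the authors' implementation: the class names `Leaf`, `ChildEntry` and `Inner` are hypothetical, node splitting on overflow is omitted, and the search also counts visited nodes as a stand-in for I/Os.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    objects: list   # (object_id, coordinates) pairs

@dataclass
class ChildEntry:
    child: object   # Leaf or Inner
    mbr: tuple      # minimum bounding rectangle (lows, highs)
    cover: tuple    # parent covering rectangle (lows, highs)

@dataclass
class Inner:
    entries: list

def intersects(a, b):
    (alo, ahi), (blo, bhi) = a, b
    return all(al <= bh and bl <= ah for al, ah, bl, bh in zip(alo, ahi, blo, bhi))

def inside(point, rect):
    lo, hi = rect
    return all(l <= x <= h for l, x, h in zip(lo, point, hi))

def range_search(node, query):
    """Return (matching ids, nodes visited); only MBRs are consulted."""
    if isinstance(node, Leaf):
        return [oid for oid, p in node.objects if inside(p, query)], 1
    hits, visited = [], 1
    for e in node.entries:
        if intersects(e.mbr, query):
            ids, n = range_search(e.child, query)
            hits += ids
            visited += n
    return hits, visited

def insert(node, oid, point):
    """Descend by covering rectangle and enlarge the MBR (splits omitted)."""
    if isinstance(node, Leaf):
        node.objects.append((oid, point))
        return
    # Assumes the point lies inside the root's area, so some cover contains it.
    e = next(e for e in node.entries if inside(point, e.cover))
    lo, hi = e.mbr
    e.mbr = (tuple(map(min, lo, point)), tuple(map(max, hi, point)))
    insert(e.child, oid, point)

left = Leaf([(1, (1, 2)), (2, (3, 7))])
right = Leaf([(3, (6, 1)), (4, (9, 9))])
root = Inner([
    ChildEntry(left, ((1, 2), (3, 7)), ((0, 0), (5, 10))),
    ChildEntry(right, ((6, 1), (9, 9)), ((5, 0), (10, 10))),
])
query = ((4, 0), (7, 2))
before = range_search(root, query)   # ([3], 2): the left leaf is pruned
insert(root, 5, (4, 1))
after = range_search(root, query)    # ([5, 3], 3): the left MBR now reaches the query
```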

Implementation of inner nodes
- The first page contains the minimum bounding rectangles
- The second page contains the parent covering rectangles

Advantages and drawbacks of the two-page idea
- Advantages
  - searching requires reading only one page per node involved
  - the ratio between page size and node capacity is the same as in the R+-tree
- Drawbacks
  - when updating, two pages per inner node need to be processed
  - the real impact on the whole index size is relatively low
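The capacity argument can be made concrete with a little arithmetic. All numbers below are assumptions for illustration, not figures from the talk: a 4096-byte page, 8-byte coordinates, 8-byte child pointers, 6 dimensions, and child pointers stored on the first page next to the minimum bounding rectangles.

```python
def rect_bytes(dims, coord_bytes=8):
    # A rectangle stores a low and a high coordinate per dimension.
    return 2 * dims * coord_bytes

def capacity(page_bytes, entry_bytes):
    return page_bytes // entry_bytes

PAGE, DIMS, PTR = 4096, 6, 8   # assumed values

# First page holds pointer + minimum bounding rectangle per child,
# which is the same entry size as in an R+-tree inner node.
two_pages = capacity(PAGE, PTR + rect_bytes(DIMS))

# Alternative layout: both rectangles squeezed into a single page.
one_page = capacity(PAGE, PTR + 2 * rect_bytes(DIMS))

print(two_pages, one_page)   # 39 20
```

Under these assumptions, splitting the rectangles over two pages keeps the fan-out at 39 children per inner node, while a single-page layout would roughly halve it.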

Experiments - data
- Artificial data (range, kNN and top-k queries)
  - 100 000 random points in 2- to 10-dimensional space
  - decimal values within [0; 1]
  - integer values from 1 to 100
  - integer values from 1 to 10
- Pseudo-real data (top-k query)
  - 6-dimensional points: data of flats for sale
  - 550 000 flats (20-multiple set)
  - 2 700 000 flats (100-multiple set)
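The integer data sets are where the redundancy comes from: with values from 1 to 10, each coordinate can take at most 10 distinct values, so many points share coordinates. The sketch below generates such a set, assuming uniformly distributed values and a fixed seed. The slides only say "random", so the distribution and the function names are assumptions.

```python
import random

def random_points(n, dims, low, high, seed=0):
    # Uniform integer coordinates in [low, high]; the distribution is an assumption.
    rng = random.Random(seed)
    return [tuple(rng.randint(low, high) for _ in range(dims)) for _ in range(n)]

def distinct_per_dimension(points):
    return [len(set(column)) for column in zip(*points)]

points = random_points(100_000, 2, 1, 10)
print(distinct_per_dimension(points))   # at most 10 distinct values per dimension
print(len(set(points)))                 # at most 100 distinct points in 2 dimensions
```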

Experiments - measures
- 300 random queries for each data set and query type
- Average time per query
- Average number of I/Os per query
- One I/O corresponds to reading one page, i.e. processing one node

Artificial data: 100 000 random points with decimal values within [0; 1]

Artificial data: 100 000 random points with integer values from 1 to 100

Artificial data: 100 000 random points with integer values from 1 to 10

Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)

Pseudo-real data: 2 700 000 flats (i.e. 6-dimensional points)

Thank you for your attention
