Published by Cuthbert Mills. Modified over 9 years ago.
1
Preferential top-k search over local data
Dissertation thesis
RNDr. Martin Šumák
Supervisor: doc. RNDr. Stanislav Krajči, PhD.
Consultant: RNDr. Peter Gurský, PhD.
2
Outline
Top-k search
  – motivation and example
  – restrictions and assumptions
R-tree-based solution
  – normalization of data
  – R++-tree
Grid file-based solution
Experiments
  – comparison with the B+-tree-based solution, table scan, etc.
2013-08-05, Preferential top-k search over local data, Dissertation thesis, RNDr. Martin Šumák
3
Top-k search
Example
  – find the top 20 apartments with 3 or 4 rooms, not on the first floor, priced at about 60000 euro and not exceeding 70000 euro
  – moreover, price is the most important attribute and floor is the least important attribute
4
Top-k query
k = 20
Preferences for attribute values
  – fuzzy functions
Importance of attributes
  – weights: $w_{price} = 3$, $w_{rooms} = 2$, $w_{floor} = 1$
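The fuzzy preference functions for the apartment example could look roughly like the sketch below. The trapezoid helper and all breakpoints (40000, 55000, 65000, 70000) are illustrative assumptions; the slides only say that price should be about 60000 and must not exceed 70000, that 3 or 4 rooms are wanted, and that the first floor is not.

```python
def trapezoid(a, b, c, d):
    """Return a fuzzy function that is 1 on [b, c], 0 outside (a, d),
    and linear on the two slopes in between."""
    def f(x):
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return f

# Hypothetical shapes; the breakpoints are assumptions, not taken from the thesis.
f_price = trapezoid(40000, 55000, 65000, 70000)  # "about 60000, not exceeding 70000"
f_rooms = lambda rooms: 1.0 if rooms in (3, 4) else 0.0
f_floor = lambda floor: 0.0 if floor == 1 else 1.0

# Weights as given on the slide.
weights = {"price": 3, "rooms": 2, "floor": 1}
```

Each function maps an attribute value to a degree of preference in [0, 1], so the weights alone express how much each attribute matters.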
5
Top-k query
Overall value of object O is
$3 \cdot f_{price}(O_{price}) + 2 \cdot f_{rooms}(O_{rooms}) + 1 \cdot f_{floor}(O_{floor})$
In general
$c(f_{price}(O_{price}), f_{rooms}(O_{rooms}), f_{floor}(O_{floor}))$
Function c has to be monotone!
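As a small worked example of the weighted sum, the sketch below combines already computed local preference values. The two sample objects `a` and `b` are made up for illustration. Because the weights are non-negative, an object that is at least as good in every attribute can never get a lower overall value, which is what monotonicity of c means here.

```python
def overall_value(local_values, weights):
    """Weighted sum c(v_1, ..., v_n) = sum of w_i * v_i over all attributes."""
    return sum(weights[name] * local_values[name] for name in weights)

weights = {"price": 3, "rooms": 2, "floor": 1}

# Hypothetical local preference values f_i(O_i) of two objects.
a = {"price": 1.0, "rooms": 1.0, "floor": 0.0}
b = {"price": 0.5, "rooms": 1.0, "floor": 0.0}

print(overall_value(a, weights))  # 5.0
print(overall_value(b, weights))  # 3.5
```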
6
The goal of top-k search
To find the top-k objects efficiently
  – by processing the minimum amount of data
Restrictions and assumptions
  – all the data is accessible locally
  – all attributes are numerical
7
R-tree-based solution
Object
  – a vector of n numbers
  – a point of n-dimensional space
  – R-tree, R*-tree, R+-tree, R++-tree
8
From kNN to top-k search
k nearest neighbours (kNN)
  – known incremental algorithm
  – distance from the "query point Z" is the measure of "closeness"
9
From kNN to top-k search
Top-k search
  – overall value (h) is the measure of "goodness"
  – by replacing distance with overall value and reversing the order, we change the result from kNN to top-k
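The idea of swapping distance for overall value can be sketched as a best-first traversal of a hierarchy of bounding rectangles. This is a simplified illustration and not the algorithm from the thesis: the dictionary-based nodes, the triangular `peak_function`, and all sample data are assumptions. Each rectangle gets an upper bound on the overall value of any point inside it, and a priority queue always expands the entry with the highest bound, the reverse of the kNN ordering by smallest distance.

```python
import heapq
from itertools import count

def peak_function(peak, width):
    """Triangular preference: 1 at `peak`, falling linearly to 0 at distance `width`."""
    return lambda x: max(0.0, 1.0 - abs(x - peak) / width)

def make_point(p):
    """A data object is a degenerate rectangle without children."""
    return {"low": tuple(p), "high": tuple(p), "children": None}

def make_node(children):
    """Inner node whose rectangle is the minimum bounding box of its children."""
    lows = [c["low"] for c in children]
    highs = [c["high"] for c in children]
    return {"low": tuple(map(min, zip(*lows))),
            "high": tuple(map(max, zip(*highs))),
            "children": children}

def upper_bound(entry, peaks, funcs, weights):
    """Best overall value any point inside the rectangle could reach.
    For a unimodal f, the maximum over [lo, hi] lies at the point closest to the peak."""
    total = 0.0
    for lo, hi, peak, f, w in zip(entry["low"], entry["high"], peaks, funcs, weights):
        total += w * f(min(max(peak, lo), hi))
    return total

def top_k(root, k, peaks, funcs, weights):
    tie = count()  # tie-breaker so the heap never compares dictionaries
    heap = [(-upper_bound(root, peaks, funcs, weights), next(tie), root)]
    result = []
    while heap and len(result) < k:
        neg_value, _, entry = heapq.heappop(heap)
        if entry["children"] is None:
            # For a point the bound is its exact value, and no queued entry can beat it.
            result.append((entry["low"], -neg_value))
        else:
            for child in entry["children"]:
                bound = upper_bound(child, peaks, funcs, weights)
                heapq.heappush(heap, (-bound, next(tie), child))
    return result

# Hypothetical two-attribute data set grouped into three leaf nodes.
points = [(1, 1), (2, 5), (4, 4), (5, 5), (8, 2), (9, 9)]
leaves = [make_node([make_point(p) for p in points[i:i + 2]]) for i in range(0, 6, 2)]
root = make_node(leaves)
peaks = (5, 5)
funcs = (peak_function(5, 10), peak_function(5, 10))
weights = (3, 1)
best = top_k(root, 3, peaks, funcs, weights)
```

The search stops as soon as k objects have been popped, so subtrees whose bounds never reach the front of the queue are not read at all.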
10
Analogy of kNN and top-k search
Correctness
Efficiency
[Figure panels: top-k, kNN]
11
Disproportion of attribute values
Floor, area, price
  – very different ranges
  – solution: normalization
  – linear transformation of attribute values to the interval [0; 1]
Another disproportion comes from weights
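A minimal sketch of the linear transformation to [0; 1], assuming the usual min-max form (the slides do not give the exact formula); the sample floors and prices are made up:

```python
def normalize(values):
    """Linearly map values to [0, 1]: the minimum becomes 0, the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal; avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

floors = [1, 3, 5, 9]
prices = [40000, 55000, 70000, 100000]

print(normalize(floors))  # [0.0, 0.25, 0.5, 1.0]
print(normalize(prices))  # [0.0, 0.25, 0.5, 1.0]
```

After normalization both attributes span the same range, so neither dominates the shape of the bounding rectangles merely because of its units.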
12
Normalization applicability
Useful for
  – R*-tree
Meaningless for
  – R-tree (proven for the quadratic split method)
  – R+-tree, R++-tree
  – Grid file
13
Why the R++-tree
Zero overlap combined with minimum bounding rectangles may cause a problem when adding a new object
R+-tree avoids overlaps at the price of rectangle size
14
The R++-tree idea
Zero overlap combined with minimum bounding rectangles may cause a problem when adding a new object
R++-tree keeps two rectangles for each node
  – the minimum one and the parent covering one
15
The R++-tree properties
Height-balanced
Zero overlaps
Overflow nodes at leaf level only
Minimum node occupancy is 1
For the purposes of top-k search, attribute values can be strings or any other comparable values (not just numbers)
16
Top-k search over Grid file
Grid file is a spatial index for point data
We used a static Grid file without an extra directory
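One way to picture top-k search over a static grid is the sketch below. It is an illustration under strong assumptions, not the thesis algorithm: split points are fixed in advance, cells are addressed directly from coordinates, attribute values are already normalized to [0, 1], and the overall value is a plain weighted sum where higher values are better, so the upper corner of a cell bounds every point in it. All function names and data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
import heapq

def build_grid(points, scales):
    """scales[d] is the sorted list of split points of dimension d (fixed in advance)."""
    grid = defaultdict(list)
    for p in points:
        cell = tuple(bisect_right(s, x) for s, x in zip(scales, p))
        grid[cell].append(p)
    return grid

def cell_upper_corner(cell, scales, domain_max):
    """Upper corner of a cell; the last cell of a dimension ends at domain_max."""
    return tuple(s[i] if i < len(s) else m for i, s, m in zip(cell, scales, domain_max))

def top_k(grid, scales, domain_max, weights, k):
    value = lambda p: sum(w * x for w, x in zip(weights, p))
    bound = lambda c: value(cell_upper_corner(c, scales, domain_max))
    best = []  # min-heap holding the k best (value, point) pairs seen so far
    # Visit cells in descending order of the best value any of their points could have.
    for cell in sorted(grid, key=bound, reverse=True):
        if len(best) == k and best[0][0] >= bound(cell):
            break  # no remaining cell can improve the result
        for p in grid[cell]:
            heapq.heappush(best, (value(p), p))
            if len(best) > k:
                heapq.heappop(best)
    return sorted(best, reverse=True)

# Hypothetical normalized data on a 2 x 2 grid.
points = [(0.1, 0.9), (0.8, 0.2), (0.6, 0.7), (0.95, 0.9), (0.3, 0.3), (0.55, 0.1)]
scales = [[0.5], [0.5]]
domain_max = (1.0, 1.0)
weights = (3, 1)
grid = build_grid(points, scales)
result = top_k(grid, scales, domain_max, weights, 2)
```

In this data set the search stops after reading two of the four occupied cells, because the second-best value found so far already reaches the bound of the next cell.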
17
Top-k search over Grid file
We have proven correctness and efficiency as well