
1 Preferential top-k search over local data
Dissertation thesis
RNDr. Martin Šumák
Supervisor: doc. RNDr. Stanislav Krajči, PhD.
Consultant: RNDr. Peter Gurský, PhD.

2 Outline
Top-k search
– motivation and example
– restrictions and assumptions
R-tree-based solution
– normalization of data
– R++-tree
Grid file-based solution
Experiments
– comparison with the B+-tree-based solution, table scan, etc.
2013-08-05, Preferential top-k search over local data, dissertation thesis, RNDr. Martin Šumák

3 Top-k search
Example
– find the top 20 apartments with 3 or 4 rooms, not on the first floor, with a price of about 60,000 euro and not exceeding 70,000 euro
– moreover, price is the most important attribute and floor is the least important attribute

4 Top-k query
k = 20
Preferences for attribute values – fuzzy functions
Importance of attributes – weights
w_price = 3, w_rooms = 2, w_floor = 1

5 Top-k query
The overall value of object O is
3*f_price(O_price) + 2*f_rooms(O_rooms) + 1*f_floor(O_floor)
In general:
c(f_price(O_price), f_rooms(O_rooms), f_floor(O_floor))
Function c has to be monotone!
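The weighted overall value can be illustrated with a minimal Python sketch. The weights 3, 2, 1 and k = 20 come from the slides; the shapes of the fuzzy functions are not given there, so the trapezoids below (including the lower price bound of 40,000 and the convention that the first floor is numbered 1) are assumptions made only for illustration. The helper names `trapezoid`, `overall` and `top_k_scan` are hypothetical.

```python
import heapq

def trapezoid(a, b, c, d):
    """Fuzzy preference: 0 outside (a, d), 1 on [b, c], linear in between."""
    def f(x):
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return f

# Assumed shapes: best price is 60000, nothing above 70000 is acceptable.
f_price = trapezoid(40000, 60000, 60000, 70000)
# 3 or 4 rooms are fully preferred, 2 or 5 rooms not at all.
f_rooms = trapezoid(2, 3, 4, 5)

def f_floor(floor):
    # "Not on the first floor", assuming the first floor is numbered 1.
    return 0.0 if floor == 1 else 1.0

PREFS = {"price": f_price, "rooms": f_rooms, "floor": f_floor}
WEIGHTS = {"price": 3, "rooms": 2, "floor": 1}  # weights from the slides

def overall(apartment):
    """Weighted sum of preference degrees (a monotone combination function)."""
    return sum(WEIGHTS[a] * PREFS[a](apartment[a]) for a in WEIGHTS)

def top_k_scan(apartments, k):
    """Baseline: a full table scan that scores every object."""
    return heapq.nlargest(k, apartments, key=overall)

apartments = [
    {"price": 60000, "rooms": 3, "floor": 2},
    {"price": 65000, "rooms": 4, "floor": 1},
    {"price": 80000, "rooms": 3, "floor": 5},
]
best = top_k_scan(apartments, 2)
```

The table scan touches every object; the index-based solutions in the following slides aim to return the same top-k objects while reading less data.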

6 The goal of top-k search
To find the top-k objects efficiently
– by processing the minimum amount of data
Restrictions and assumptions
– all the data is accessible locally
– all attributes are numerical

7 R-tree-based solution
Object
– a vector of n numbers
– a point in n-dimensional space
– R-tree, R*-tree, R+-tree, R++-tree

8 From kNN to top-k search
k nearest neighbours (kNN)
– a known incremental algorithm
– distance from the “query point Z” is the measure of “closeness”

9 From kNN to top-k search
Top-k search
– overall value (h) is the measure of “goodness”
– by replacing distance with overall value and reversing the order, we change the result from kNN to top-k
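The step from kNN to top-k can be sketched as a best-first traversal with a priority queue, in the spirit of the incremental kNN algorithm. This is a minimal sketch under stated assumptions, not the implementation from the thesis: the tree is a hand-built hierarchy of bounding rectangles stored in plain dictionaries, the preference functions are piecewise linear (so their maximum over an interval is attained at an interval end or at a breakpoint), and the combination function is a weighted sum with non-negative weights. The names `Pref`, `make_leaf`, `make_inner` and `top_k` are hypothetical.

```python
import heapq
from itertools import count

class Pref:
    """Piecewise-linear fuzzy function given by (x, y) breakpoints with distinct x."""
    def __init__(self, points):
        self.points = sorted(points)

    def __call__(self, x):
        pts = self.points
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    def max_on(self, lo, hi):
        """Maximum on [lo, hi]: check both ends and the breakpoints inside."""
        xs = [lo, hi] + [x for x, _ in self.points if lo < x < hi]
        return max(self(x) for x in xs)

def make_leaf(points):
    dims = range(len(points[0]))
    return {"lo": [min(p[d] for p in points) for d in dims],
            "hi": [max(p[d] for p in points) for d in dims],
            "points": points}

def make_inner(children):
    dims = range(len(children[0]["lo"]))
    return {"lo": [min(c["lo"][d] for c in children) for d in dims],
            "hi": [max(c["hi"][d] for c in children) for d in dims],
            "children": children}

def top_k(root, prefs, weights, k):
    def value(p):   # overall value of a point
        return sum(w * f(x) for w, f, x in zip(weights, prefs, p))

    def bound(n):   # upper bound of the overall value inside a rectangle
        return sum(w * f.max_on(lo, hi)
                   for w, f, lo, hi in zip(weights, prefs, n["lo"], n["hi"]))

    tie = count()   # unique tie-breaker so heap entries never compare items
    heap = [(-bound(root), next(tie), root, False)]
    result = []
    while heap and len(result) < k:
        neg, _, item, is_point = heapq.heappop(heap)
        if is_point:
            result.append((-neg, item))   # nothing left can be better
        elif "points" in item:
            for p in item["points"]:
                heapq.heappush(heap, (-value(p), next(tie), p, True))
        else:
            for c in item["children"]:
                heapq.heappush(heap, (-bound(c), next(tie), c, False))
    return result

prefs = [Pref([(40000, 0), (60000, 1), (70000, 0)]),   # price (assumed shape)
         Pref([(2, 0), (3, 1), (4, 1), (5, 0)])]       # rooms (assumed shape)
weights = [3, 2]
root = make_inner([make_leaf([(45000, 2), (52000, 3)]),
                   make_leaf([(59000, 4), (66000, 3)]),
                   make_leaf([(75000, 5), (90000, 4)])])
best = top_k(root, prefs, weights, 2)
```

Negated values turn Python's min-heap into a max-heap, which is the "reversed order" mentioned on the slide. In this example the third leaf is never opened, because its upper bound is lower than the value of the second-best point found.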

10 Analogy of kNN and top-k search
Correctness
Efficiency
(Figure panels: top-k, kNN)

11 Disproportion of attribute values
floor, area, price
– very different ranges
– solution: normalization
– linear transformation of attribute values to the interval [0; 1]
Another disproportion comes from the weights
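The linear transformation to [0; 1] can be written as a short min-max normalization. This is a sketch only; the function name `normalize` and the sample rows (floor, area, price) are hypothetical.

```python
def normalize(rows):
    """Linearly map every attribute of every row to the interval [0, 1]."""
    dims = range(len(rows[0]))
    lo = [min(r[d] for r in rows) for d in dims]
    hi = [max(r[d] for r in rows) for d in dims]

    def scale(r):
        # A constant attribute (hi == lo) is mapped to 0.0 to avoid division by zero.
        return tuple((r[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
                     for d in dims)

    return [scale(r) for r in rows], lo, hi

# (floor, area in square metres, price in euro): very different ranges
rows = [(1, 40.0, 50000), (5, 80.0, 90000), (3, 60.0, 70000)]
normalized, lo, hi = normalize(rows)
```

The minima and maxima are returned as well, so that query values expressed in the original units can be mapped into the same normalized space.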

12 Normalization applicability
Useful for
– R*-tree
Meaningless for
– R-tree (proven for the quadratic split method)
– R+-tree, R++-tree
– Grid file

13 Why the R++-tree
Zero overlaps & minimum bounding rectangles may cause a problem when adding a new object
The R+-tree avoids overlaps at the price of rectangle size

14 The R++-tree idea
Zero overlaps & minimum bounding rectangles may cause a problem when adding a new object
The R++-tree keeps two rectangles for each node – the minimum one and the parent covering one
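One way to picture the two rectangles is the following Python sketch. It is an interpretation, not the structure from the thesis: it assumes that the covering rectangles of sibling nodes partition the parent's region (here with half-open intervals), that insertion is routed by the covering rectangle, and that the minimum rectangle is kept only as a tight bound for search. Node splitting and overflow handling are left out. The names `Node`, `in_cover`, `grow` and `insert` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    cover: tuple                 # (lo, hi) region assigned by the parent; siblings do not overlap
    mbr: tuple = None            # (lo, hi) minimum bounding rectangle of stored points
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)

def in_cover(rect, p):
    """Half-open containment, so a border point belongs to exactly one sibling."""
    lo, hi = rect
    return all(l <= x < h for l, x, h in zip(lo, p, hi))

def grow(mbr, p):
    """Smallest rectangle containing the old MBR and the point p."""
    if mbr is None:
        return (tuple(p), tuple(p))
    lo, hi = mbr
    return (tuple(map(min, lo, p)), tuple(map(max, hi, p)))

def insert(node, p):
    node.mbr = grow(node.mbr, p)
    if not node.children:
        node.points.append(p)    # no overflow handling in this sketch
        return node
    # Exactly one child covers p, so the MBR grows without creating overlaps.
    child = next(c for c in node.children if in_cover(c.cover, p))
    return insert(child, p)

left = Node(cover=((0, 0), (5, 10)))
right = Node(cover=((5, 0), (10, 10)))
root = Node(cover=((0, 0), (10, 10)), children=[left, right])
for point in [(1, 1), (2, 2), (8, 8)]:
    insert(root, point)
target = insert(root, (4, 9))    # outside both current MBRs, inside the left cover
```

Under these assumptions, a new point that lies outside every minimum rectangle still has a unique destination, and enlarging that node's minimum rectangle cannot reach into a sibling's region.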

15 The R++-tree properties
Height-balanced
Zero overlaps
Overflow nodes at leaf level only
Minimum node occupancy is 1
For the purposes of top-k search, attribute values can be strings or any other comparable values (not just numbers)

16 Top-k search over Grid file
The Grid file is a spatial index for point data
We used a static Grid file without an extra directory

17 Top-k search over Grid file
We have proven correctness and efficiency as well

