
Preferential top-k search over local data dissertation thesis RNDr. Martin Šumák supervisor: doc. RNDr. Stanislav Krajči, PhD. consultant: RNDr. Peter Gurský, PhD.

Outline
– Top-k search: motivation and example; restrictions and assumptions
– R-tree-based solution: normalization of data; the R++-tree
– Grid file-based solution
– Experiments: comparison with a B+-tree-based solution, table scan, etc.

Top-k search: example
– find the top 20 apartments with 3 or 4 rooms, not on the first floor, with a price of about … and not exceeding … euro
– moreover, price is the most important attribute and floor is the least important one

Top-k query
– k = 20
– preferences over attribute values: fuzzy functions
– importance of attributes: weights $w_{price} = 3$, $w_{rooms} = 2$, $w_{floor} = 1$

Top-k query
The overall value of an object O is

$$3 \cdot f_{price}(O_{price}) + 2 \cdot f_{rooms}(O_{rooms}) + 1 \cdot f_{floor}(O_{floor})$$

In general, it is

$$c\bigl(f_{price}(O_{price}),\ f_{rooms}(O_{rooms}),\ f_{floor}(O_{floor})\bigr)$$

The combining function c has to be monotone: only then does combining upper bounds of the individual preference values give an upper bound of the overall value.
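A minimal Python sketch of the overall value for the apartment example. The weights 3, 2 and 1 come from the slides; the shapes of the fuzzy functions and the price thresholds `PRICE_IDEAL` and `PRICE_MAX` are hypothetical placeholders, since the transcript does not give the concrete figures. A weighted sum with non-negative weights is one monotone choice of c.

```python
# Hypothetical fuzzy preference functions for the apartment example.
# PRICE_IDEAL and PRICE_MAX are placeholders: the real figures are missing.
PRICE_IDEAL = 100_000
PRICE_MAX = 120_000

WEIGHTS = {"price": 3, "rooms": 2, "floor": 1}  # weights from the slides


def f_price(price: float) -> float:
    """Fully satisfied up to PRICE_IDEAL, then linearly down to 0 at PRICE_MAX."""
    if price <= PRICE_IDEAL:
        return 1.0
    if price >= PRICE_MAX:
        return 0.0
    return (PRICE_MAX - price) / (PRICE_MAX - PRICE_IDEAL)


def f_rooms(rooms: int) -> float:
    """3 or 4 rooms are wanted, anything else is not."""
    return 1.0 if rooms in (3, 4) else 0.0


def f_floor(floor: int) -> float:
    """Any floor except the first one is acceptable."""
    return 0.0 if floor == 1 else 1.0


def c(grades: dict) -> float:
    """Weighted sum: monotone because every weight is non-negative."""
    return sum(WEIGHTS[a] * g for a, g in grades.items())


def overall_value(apt: dict) -> float:
    """Overall value of one apartment: c applied to its preference grades."""
    return c({
        "price": f_price(apt["price"]),
        "rooms": f_rooms(apt["rooms"]),
        "floor": f_floor(apt["floor"]),
    })


print(overall_value({"price": 110_000, "rooms": 3, "floor": 2}))  # 4.5
```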

The goal of top-k search
– to find the top-k objects efficiently, i.e., by processing the minimum amount of data
Restrictions and assumptions
– all the data is accessible locally
– all attributes are numerical
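For contrast with the index-based methods, here is a minimal sketch of a table-scan baseline (one of the methods the outline lists among the experiments). It reads and scores every object, which is exactly the cost an index should avoid. The function name and the `score` argument are illustrative, not taken from the thesis.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Obj = Tuple[float, ...]


def table_scan_top_k(objects: Iterable[Obj], score: Callable[[Obj], float],
                     k: int) -> List[Tuple[float, Obj]]:
    """Baseline: score every object and keep the k best in a min-heap of size k."""
    heap: List[Tuple[float, int, Obj]] = []  # the index i breaks ties between equal scores
    for i, obj in enumerate(objects):
        s = score(obj)
        if len(heap) < k:
            heapq.heappush(heap, (s, i, obj))
        elif s > heap[0][0]:  # better than the current k-th best
            heapq.heapreplace(heap, (s, i, obj))
    return [(s, obj) for s, _, obj in sorted(heap, reverse=True)]


print(table_scan_top_k([(2, 9), (8, 1), (5, 5)], lambda o: 3 * o[0] + o[1], 2))
# [(25, (8, 1)), (20, (5, 5))]
```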

R-tree-based solution
– an object is a vector of n numbers, i.e., a point of an n-dimensional space
– candidate indexes: R-tree, R*-tree, R+-tree, R++-tree

From kNN to top-k search
k nearest neighbours (kNN)
– a known incremental algorithm exists
– the distance from the query point Z is the measure of "closeness"
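A well-known incremental kNN algorithm over R-trees is the best-first "distance browsing" search of Hjaltason and Samet. Below is a minimal sketch of that idea, assuming a tiny hand-built tree: a priority queue holds nodes keyed by their minimum distance to Z and points keyed by their exact distance, and a point can be reported as soon as it reaches the front of the queue. The `Node` class and the function names are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass, field
from itertools import count, islice
from typing import Iterator, List, Tuple, Union

Point = Tuple[float, ...]


@dataclass
class Node:
    """Hypothetical minimal R-tree node: children are Nodes or data points."""
    children: List[Union["Node", Point]]
    low: Point = field(init=False)
    high: Point = field(init=False)

    def __post_init__(self) -> None:
        # The MBR of the node is the union of its children's boxes.
        boxes = [(ch.low, ch.high) if isinstance(ch, Node) else (ch, ch)
                 for ch in self.children]
        dims = range(len(boxes[0][0]))
        self.low = tuple(min(b[0][d] for b in boxes) for d in dims)
        self.high = tuple(max(b[1][d] for b in boxes) for d in dims)


def mindist(z: Point, low: Point, high: Point) -> float:
    """Smallest Euclidean distance from z to any point of the box [low, high]."""
    return math.sqrt(sum(max(lo - q, 0.0, q - hi) ** 2
                         for q, lo, hi in zip(z, low, high)))


def incremental_nn(root: Node, z: Point) -> Iterator[Tuple[float, Point]]:
    """Yield (distance, point) pairs in non-decreasing distance from z."""
    tie = count()  # tie-breaker so the heap never compares Nodes
    heap = [(0.0, next(tie), root)]
    while heap:
        key, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for ch in item.children:
                d = (mindist(z, ch.low, ch.high) if isinstance(ch, Node)
                     else math.dist(z, ch))
                heapq.heappush(heap, (d, next(tie), ch))
        else:
            # Every key left in the heap is a lower bound >= key,
            # so no unseen point can be closer than this one.
            yield key, item


def knn(root: Node, z: Point, k: int) -> List[Tuple[float, Point]]:
    """The first k results of the incremental search."""
    return list(islice(incremental_nn(root, z), k))


root = Node([Node([(1.0, 1.0), (2.0, 1.0)]), Node([(8.0, 8.0), (9.0, 7.0)])])
print([p for _, p in knn(root, (0.0, 0.0), 3)])
# [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
```

Because the search is incremental, k does not have to be known in advance: the caller simply stops consuming the generator.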

From kNN to top-k search
Top-k search
– the overall value (h) is the measure of "goodness"
– by replacing the distance with the overall value and reversing the order, we turn the kNN result into the top-k result
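A minimal sketch of the same best-first search turned into top-k search: the minimum distance of a node is replaced by an upper bound of the overall value h over the node's rectangle, and the queue is ordered by decreasing key. To compute the per-attribute maximum over an interval, the sketch assumes (as a simplification, not necessarily the thesis's setting) that every fuzzy function is unimodal with a known peak, and it uses a weighted sum with non-negative weights as the monotone c. All names and example data are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count, islice
from typing import Callable, Iterator, List, Sequence, Tuple, Union

Point = Tuple[float, ...]


@dataclass
class Node:
    """Hypothetical minimal R-tree node: children are Nodes or data points."""
    children: List[Union["Node", Point]]
    low: Point = field(init=False)
    high: Point = field(init=False)

    def __post_init__(self) -> None:
        boxes = [(ch.low, ch.high) if isinstance(ch, Node) else (ch, ch)
                 for ch in self.children]
        dims = range(len(boxes[0][0]))
        self.low = tuple(min(b[0][d] for b in boxes) for d in dims)
        self.high = tuple(max(b[1][d] for b in boxes) for d in dims)


@dataclass
class Preference:
    """Hypothetical unimodal fuzzy function: non-decreasing up to `peak`,
    non-increasing after it."""
    f: Callable[[float], float]
    peak: float

    def max_on(self, low: float, high: float) -> float:
        # A unimodal function is maximal at the point of [low, high]
        # that is closest to its peak.
        return self.f(min(max(self.peak, low), high))


def incremental_top_k(root: Node, prefs: Sequence[Preference],
                      weights: Sequence[float]) -> Iterator[Tuple[float, Point]]:
    """Yield (overall value, point) pairs in non-increasing overall value."""
    def h(p: Point) -> float:  # overall value: weighted sum, weights >= 0
        return sum(w * pr.f(x) for w, pr, x in zip(weights, prefs, p))

    def bound(n: Node) -> float:  # upper bound of h over the node's MBR
        return sum(w * pr.max_on(lo, hi)
                   for w, pr, lo, hi in zip(weights, prefs, n.low, n.high))

    tie = count()
    heap = [(-bound(root), next(tie), root)]  # negated keys turn heapq into a max-heap
    while heap:
        neg, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for ch in item.children:
                key = bound(ch) if isinstance(ch, Node) else h(ch)
                heapq.heappush(heap, (-key, next(tie), ch))
        else:
            yield -neg, item


prefs = [Preference(lambda x: x, 1.0),                               # higher is better
         Preference(lambda x: max(0.0, 1 - 2 * abs(x - 0.5)), 0.5)]  # about 0.5
root = Node([Node([(0.9, 0.5), (0.2, 0.5)]), Node([(0.6, 0.0), (0.7, 0.4)])])
top3 = list(islice(incremental_top_k(root, prefs, (3, 1)), 3))
print([p for _, p in top3])  # [(0.9, 0.5), (0.7, 0.4), (0.6, 0.0)]
```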

Analogy of kNN and top-k search
– correctness
– efficiency
(Figure: panels "top-k" and "kNN")
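One standard way to state the correctness requirement for best-first search (this formulation is ours and may differ from the one in the thesis): the key of every region R must bound the keys of all objects O inside it, in the direction of the search.

$$\text{kNN:}\quad \forall R\ \forall O \in R:\ \operatorname{mindist}(Z, R) \le \operatorname{dist}(Z, O)$$

$$\text{top-}k\text{:}\quad \forall R\ \forall O \in R:\ \operatorname{bound}(R) \ge h(O)$$

For a region $R = [l_1, u_1] \times \dots \times [l_n, u_n]$ and a monotone c, the following bound satisfies the top-k condition, because each argument on the left is at least the corresponding argument on the right:

$$\operatorname{bound}(R) = c\Bigl(\max_{x \in [l_1, u_1]} f_1(x), \dots, \max_{x \in [l_n, u_n]} f_n(x)\Bigr) \ge c\bigl(f_1(O_1), \dots, f_n(O_n)\bigr) = h(O)$$

On the efficiency side, if $O_k$ is the k-th result, best-first search only ever expands regions with $\operatorname{bound}(R) \ge h(O_k)$, since $O_k$ leaves the queue before any entry with a smaller key; tighter bounds therefore mean fewer node accesses.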

Disproportion of attribute values
– floor, area, price: very different ranges
– solution: normalization, i.e., a linear transformation of attribute values to the interval [0; 1]
– another disproportion comes from the weights
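A minimal sketch of the linear (min-max) normalization, assuming the observed minimum and maximum of each attribute define the mapping; mapping a constant attribute to 0.0 is an arbitrary choice of this sketch. The apartment values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Map every attribute linearly onto [0, 1] using its observed min and max.
    A constant attribute is mapped to 0.0 (an arbitrary choice)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple(0.0 if hi == lo else (x - lo) / (hi - lo)
              for x, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


apartments = [(1, 50, 90_000), (4, 80, 120_000), (7, 65, 105_000)]  # (floor, area, price), made-up values
print(min_max_normalize(apartments))
# [(0.0, 0.0, 0.0), (0.5, 1.0, 1.0), (1.0, 0.5, 0.5)]
```

Normalization only removes the disproportion between attribute ranges; the disproportion introduced by the weights remains.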

Normalization applicability
– useful for: R*-tree
– meaningless for: R-tree (proven for the quadratic split method), R+-tree, R++-tree, Grid file
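A short argument, offered as a sketch rather than as the proof from the thesis, for why normalization cannot change an R-tree built with Guttman's quadratic split but can change an R*-tree. Normalization is a map $T(x) = (a_1 x_1 + b_1, \dots, a_n x_n + b_n)$ with all $a_i > 0$. It maps a box with side lengths $s_1, \dots, s_n$ to a box with side lengths $a_i s_i$, and the bounding box of two boxes to the bounding box of their images, so

$$\operatorname{area}(T(R)) = \prod_{i=1}^{n} a_i s_i = \Bigl(\prod_{i=1}^{n} a_i\Bigr) \operatorname{area}(R)$$

Every quantity compared by insertion and by the quadratic split is a signed combination of such areas: the waste $d = \operatorname{area}(J) - \operatorname{area}(E_1) - \operatorname{area}(E_2)$ in PickSeeds, and the area enlargements in PickNext and ChooseLeaf. All of them are multiplied by the same positive constant $\prod_i a_i$, so every comparison, and hence every decision, comes out the same. The R*-tree split, in contrast, also compares margins:

$$\operatorname{margin}(T(R)) = 2^{n-1} \sum_{i=1}^{n} a_i s_i$$

which is not a common multiple of $\operatorname{margin}(R)$ unless all $a_i$ are equal, so normalization can change the chosen split axis.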

Why the R++-tree
– zero overlaps together with minimum bounding rectangles may cause a problem when a new object is added
– the R+-tree avoids overlaps at the cost of larger rectangles

The R++-tree idea
– zero overlaps together with minimum bounding rectangles may cause a problem when a new object is added
– the R++-tree keeps two rectangles for each node: the minimum one and the parent-covering one
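A hypothetical illustration of the two-rectangle idea (not the data structure from the thesis): each node keeps a covering `region`, its non-overlapping share of the parent's space, used to route insertions, and a minimum bounding rectangle `mbr` of the points actually stored, used for tight search bounds. How the regions are created and how nodes split is omitted; the class and helper names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (low corner, high corner)


def contains(box: Box, p: Point) -> bool:
    return all(lo <= x <= hi for x, lo, hi in zip(p, box[0], box[1]))


def extend(box: Optional[Box], p: Point) -> Box:
    """Smallest box covering `box` and the point p."""
    if box is None:
        return (p, p)
    return (tuple(min(a, x) for a, x in zip(box[0], p)),
            tuple(max(b, x) for b, x in zip(box[1], p)))


@dataclass
class RppNode:
    """Hypothetical node keeping two rectangles:
    region: the node's non-overlapping share of the parent's space (routes inserts),
    mbr:    the minimum bounding rectangle of the stored points (search bounds)."""
    region: Box
    mbr: Optional[Box] = None
    children: List["RppNode"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)

    def insert(self, p: Point) -> None:
        if not contains(self.region, p):
            raise ValueError("point outside the node region")
        self.mbr = extend(self.mbr, p)  # the MBR stays minimal
        if not self.children:
            self.points.append(p)  # leaf; node splitting is omitted here
            return
        # Sibling regions do not overlap, so the point is routed without enlarging
        # any rectangle; a point on a shared border goes to the first matching child.
        for ch in self.children:
            if contains(ch.region, p):
                ch.insert(p)
                return
        raise ValueError("child regions do not cover the node region")


left = RppNode(((0, 0), (5, 10)))
right = RppNode(((5, 0), (10, 10)))
root = RppNode(((0, 0), (10, 10)), children=[left, right])
for p in [(1, 2), (3, 8), (7, 1)]:
    root.insert(p)
print(root.mbr, left.mbr, right.mbr)
# ((1, 1), (7, 8)) ((1, 2), (3, 8)) ((7, 1), (7, 1))
```

The covering regions never change on insertion, so zero overlap is preserved, while the MBRs stay as tight as possible for computing top-k bounds.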

The R++-tree properties
– height-balanced
– zero overlaps
– overflow nodes at the leaf level only
– minimum node occupancy is 1
– for top-k search purposes, attribute values can be strings or any other comparable values (not just numbers)

Top-k search over Grid file
– the Grid file is a spatial index for point data
– we used a static Grid file without an extra directory
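A minimal sketch of top-k search over a static grid, assuming fixed split points per dimension, one bucket per cell addressed directly by its cell index, the same kind of unimodal preferences as in the R-tree sketch, and a weighted sum as c. Cells are read in order of decreasing upper bound, and reading stops as soon as no remaining cell can beat the current k-th result. The class, function names and data are hypothetical, not the thesis's implementation.

```python
import bisect
import heapq
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]
Pref = Tuple[Callable[[float], float], float]  # (unimodal fuzzy function, its peak)


class StaticGrid:
    """Hypothetical static grid file: fixed split points per dimension and
    one bucket per cell addressed directly by its index (no extra directory)."""

    def __init__(self, splits: Sequence[Sequence[float]], lows: Point, highs: Point):
        self.splits = [list(s) for s in splits]
        self.lows, self.highs = lows, highs
        self.cells: Dict[Cell, List[Point]] = defaultdict(list)

    def cell_of(self, p: Point) -> Cell:
        return tuple(bisect.bisect_right(s, x) for s, x in zip(self.splits, p))

    def insert(self, p: Point) -> None:
        self.cells[self.cell_of(p)].append(p)

    def cell_box(self, cell: Cell) -> Tuple[List[float], List[float]]:
        low, high = [], []
        for d, i in enumerate(cell):
            edges = [self.lows[d]] + self.splits[d] + [self.highs[d]]
            low.append(edges[i])
            high.append(edges[i + 1])
        return low, high


def grid_top_k(grid: StaticGrid, prefs: Sequence[Pref], weights: Sequence[float], k: int):
    """Return (top-k list of (value, point), number of cells read)."""
    def value(p: Point) -> float:
        return sum(w * f(x) for w, (f, _), x in zip(weights, prefs, p))

    def bound(cell: Cell) -> float:
        # Upper bound of `value` over the cell: each unimodal f is maximal at
        # the point of the cell's interval closest to its peak.
        low, high = grid.cell_box(cell)
        return sum(w * f(min(max(peak, lo), hi))
                   for w, (f, peak), lo, hi in zip(weights, prefs, low, high))

    best: List[Tuple[float, Point]] = []  # min-heap holding at most k results
    read = 0
    for cell in sorted(grid.cells, key=bound, reverse=True):  # non-empty cells only
        if len(best) == k and bound(cell) <= best[0][0]:
            break  # no remaining cell can beat the current k-th result
        read += 1
        for p in grid.cells[cell]:
            v = value(p)
            if len(best) < k:
                heapq.heappush(best, (v, p))
            elif v > best[0][0]:
                heapq.heapreplace(best, (v, p))
    return sorted(best, reverse=True), read


grid = StaticGrid(splits=[[0.5], [0.5]], lows=(0.0, 0.0), highs=(1.0, 1.0))
for p in [(0.9, 0.5), (0.7, 0.4), (0.6, 0.0), (0.2, 0.5), (0.1, 0.1)]:
    grid.insert(p)
prefs = [(lambda x: x, 1.0), (lambda x: max(0.0, 1 - 2 * abs(x - 0.5)), 0.5)]
top, read = grid_top_k(grid, prefs, (3, 1), k=2)
print([p for _, p in top], read)  # [(0.9, 0.5), (0.7, 0.4)] 2
```

Unlike the incremental R-tree search, this variant needs k in advance; in the example only two of the four non-empty cells are read.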

Top-k search over Grid file
– we have proven its correctness and efficiency as well