Trees for spatial indexing

Trees for spatial indexing, Part 2: SAMs (Spatial Access Methods)

SAMs: R-Tree, R*-Tree, X-Tree, TV-Tree

Answering a question: the kd-trie is similar to the kd-tree; in the article, it was used for the kd-tree. The split is not placed in the middle of the cell but at the median point. Because we work with points, we have no problem separating the elements.
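
To illustrate the median split mentioned above, here is a minimal kd-tree construction sketch in Python (the slides give no code). Cycling through the axes level by level and taking the upper median when the number of points is even are our assumptions; `KdNode` and `build_kdtree` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KdNode:
    point: tuple
    axis: int
    left: Optional["KdNode"]
    right: Optional["KdNode"]


def build_kdtree(points, depth=0):
    """Build a kd-tree whose split at each level is the median point on one axis."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the axes
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                   # median (upper median for even counts)
    return KdNode(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


root = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(root.point)  # (7, 2): the median of the x-coordinates
```

Because the split element is an actual data point, every other point falls on exactly one side of it.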

UB-Tree range queries: the algorithm finds all regions that intersect the query box q. If a region is a page, all objects in it that intersect q belong to the answer. We then look for the last subcube in this region and search its sibling; if the sibling intersects q, we repeat the same loop on it. After that, we go to the parent of B and search again.
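
The UB-tree orders points along the Z-curve (bit interleaving) and stores them in a B+-tree. The sketch below is a simplified stand-in, not the UB-tree's page-level algorithm: it recursively visits aligned subcubes, prunes those disjoint from q, reports the whole Z-interval of any subcube lying inside q, and otherwise descends into the four children in Z-order. It assumes 2-D integer coordinates in [0, 2^bits), and a sorted Python list plays the role of the B+-tree leaves; all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def z_value(x, y, bits):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def build(points, bits):
    """Sort points by Z-value, as the B+-tree leaves of a UB-tree would be."""
    entries = sorted((z_value(x, y, bits), (x, y)) for x, y in points)
    return [z for z, _ in entries], [p for _, p in entries]


def range_query(zs, pts, q, bits):
    """Return all points inside q = (xlo, ylo, xhi, yhi), bounds inclusive."""
    xlo, ylo, xhi, yhi = q
    out = []

    def visit(cx, cy, size):
        # Aligned subcube [cx, cx + size) x [cy, cy + size).
        if cx > xhi or cx + size - 1 < xlo or cy > yhi or cy + size - 1 < ylo:
            return  # disjoint from q: prune
        if xlo <= cx and cx + size - 1 <= xhi and ylo <= cy and cy + size - 1 <= yhi:
            # Fully inside q: its points form one contiguous Z-interval.
            zlo = z_value(cx, cy, bits)
            i = bisect_left(zs, zlo)
            j = bisect_right(zs, zlo + size * size - 1)
            out.extend(pts[i:j])
            return
        half = size // 2
        # Partial overlap: visit the four children in Z-order.
        for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
            visit(cx + dx, cy + dy, half)

    visit(0, 0, 1 << bits)
    return out
```

A subcube of side 1 is either inside q or disjoint from it, so the recursion always terminates.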

R-Tree: a special B+-tree for spatial indexing. The performance of the R*-tree decreases as dimensionality increases; the R-tree access method becomes prohibitively slow for dimensions higher than 5.

Problems of (R-tree-based) index structures: it has been shown that overlap increases with dimensionality. Intuitively, there is overlap when some point queries must follow multiple paths in the tree.

Definition of overlap: intuitively, overlap is the percentage of the volume that is covered by more than one directory hyperrectangle. This intuitive definition is directly correlated with query performance, because overlapping volume implies multiple search paths.

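Definition of the overlap (2), where ||·|| denotes volume and R_1, …, R_n are the directory MBRs:

$$\mathrm{Overlap} = \frac{\left\| \bigcup_{i,j \in \{1,\dots,n\},\, i \neq j} (R_i \cap R_j) \right\|}{\left\| \bigcup_{i \in \{1,\dots,n\}} R_i \right\|}$$

We take the volume of the union of all pairwise MBR intersections and divide it by the volume of the union of all MBRs. However, overlap in highly populated areas is much more critical than overlap in sparsely populated ones, so the weighted overlap counts data points p instead of volume (|·| denotes the number of points):

$$\mathrm{WeightedOverlap} = \frac{\left| \{\, p \mid p \in \bigcup_{i,j \in \{1,\dots,n\},\, i \neq j} (R_i \cap R_j) \,\} \right|}{\left| \{\, p \mid p \in \bigcup_{i \in \{1,\dots,n\}} R_i \,\} \right|}$$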

Example (figure): Overlap = (1/4)/2 = 1/8 = 12.5 %; WeightedOverlap = 2/6 = 1/3 ≈ 33 %.

Overlap vs. WeightedOverlap: depending on the kind of data, the two measures can differ. For uniformly distributed data points, the overlap measure is adequate; real data is often clustered, so WeightedOverlap is more accurate.
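
Both measures can be computed directly for a small 2-D directory. The sketch below assumes axis-parallel rectangles given as (xlo, ylo, xhi, yhi) with closed boundaries. It computes the volumes exactly by cutting the plane along all rectangle edges into elementary cells, and it computes the weighted version by counting data points; the function names are hypothetical.

```python
from itertools import product


def _coverage(rects, x, y):
    """Number of rectangles (xlo, ylo, xhi, yhi) that contain the point (x, y)."""
    return sum(1 for xl, yl, xh, yh in rects if xl <= x <= xh and yl <= y <= yh)


def overlap(rects):
    """Volume of the union of pairwise intersections / volume of the union."""
    xs = sorted({v for r in rects for v in (r[0], r[2])})
    ys = sorted({v for r in rects for v in (r[1], r[3])})
    multi = union = 0.0
    # The rectangle edges cut the plane into elementary cells; each cell lies
    # fully inside or fully outside every rectangle, so testing its midpoint
    # gives the coverage of the whole cell.
    for (x0, x1), (y0, y1) in product(zip(xs, xs[1:]), zip(ys, ys[1:])):
        c = _coverage(rects, (x0 + x1) / 2, (y0 + y1) / 2)
        area = (x1 - x0) * (y1 - y0)
        if c >= 1:
            union += area
        if c >= 2:
            multi += area
    return multi / union if union else 0.0


def weighted_overlap(rects, points):
    """Share of covered data points that lie in more than one rectangle."""
    counts = [_coverage(rects, x, y) for x, y in points]
    in_union = sum(1 for c in counts if c >= 1)
    in_multi = sum(1 for c in counts if c >= 2)
    return in_multi / in_union if in_union else 0.0


rects = [(0, 0, 1, 1), (0.5, 0.5, 1.5, 1.5)]
print(overlap(rects))  # 0.25 / 1.75, that is 1/7
```

With clustered data, most points can fall into a small overlapping area, so the weighted value can be much larger than the volume-based one.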

X-Tree: avoids overlap in the directory. The X-tree is a hybrid of a linear, array-like directory and a hierarchical, R-tree-like directory. In low dimensions, the most efficient directory organization is hierarchical; in high dimensions, a linear organization is more efficient.

X-Tree: the X-tree has three types of nodes: data nodes, normal directory nodes, and supernodes. Supernodes avoid splits in the directory, which makes searching faster. This is not the same as an R*-tree with larger blocks, because the X-tree creates larger blocks only when necessary.

X-Tree structure (figure): supernode, normal directory nodes, data nodes.

Creation of supernodes: supernodes are created only when there is no other way to avoid overlap during insertion.
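
The decision "split or supernode" can be sketched as follows. This is a simplified illustration, not the X-tree's actual split algorithm (which, as we understand it, first tries a topological R*-tree-style split and then an overlap-minimal split based on the split history). Here we only try a median split along each axis and extend the node by one block when even the best split exceeds an overlap threshold. `BLOCK`, `MAX_OVERLAP`, and `handle_overflow` are hypothetical names and values.

```python
BLOCK = 4           # hypothetical: number of entries per disk block
MAX_OVERLAP = 0.2   # hypothetical: maximum tolerated overlap ratio for a split


def mbr(rects):
    """Minimum bounding rectangle of rects given as (lows, highs) tuples."""
    dims = range(len(rects[0][0]))
    return (tuple(min(r[0][k] for r in rects) for k in dims),
            tuple(max(r[1][k] for r in rects) for k in dims))


def volume(r):
    v = 1.0
    for lo, hi in zip(r[0], r[1]):
        v *= max(0.0, hi - lo)   # empty extent -> zero volume
    return v


def overlap_ratio(a, b):
    """Intersection volume divided by union volume of two rectangles."""
    inter = (tuple(map(max, a[0], b[0])), tuple(map(min, a[1], b[1])))
    vi = volume(inter)
    union = volume(a) + volume(b) - vi
    return vi / union if union > 0 else 0.0


def handle_overflow(entries, capacity):
    """Return ("split", group1, group2) or ("supernode", new_capacity)."""
    best = None
    for k in range(len(entries[0][0])):
        # Median split along axis k, ordering entries by their centers.
        ordered = sorted(entries, key=lambda r: r[0][k] + r[1][k])
        half = len(ordered) // 2
        g1, g2 = ordered[:half], ordered[half:]
        ratio = overlap_ratio(mbr(g1), mbr(g2))
        if best is None or ratio < best[0]:
            best = (ratio, g1, g2)
    if best[0] <= MAX_OVERLAP:
        return ("split", best[1], best[2])
    # No acceptable split: avoid overlap by growing the node by one block.
    return ("supernode", capacity + BLOCK)
```

Well-separated entries lead to an ordinary split, while heavily overlapping entries (typical in high dimensions) turn the node into a supernode.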

TV-Tree (Telescopic-Vector tree): the basic idea of the TV-tree is to use feature vectors that contract and extend dynamically (as in classification).

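TV-Tree: an m-contraction of a vector x is the vector A_m x, where A_m is an m × n contraction matrix (m ≤ n). A natural choice of A_m keeps the first m components of x:

$$A_m = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} I_m & 0 \end{pmatrix}$$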
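
As a quick check of the definition, the sketch below builds the natural m × n contraction matrix and applies it as a plain matrix-vector product; the result is just the first m components of x. Plain Python lists are used to stay dependency-free, and the function names are hypothetical.

```python
def contraction_matrix(m, n):
    """The m x n matrix [I_m | 0]: identity on the first m columns, zeros elsewhere."""
    return [[1.0 if j == i else 0.0 for j in range(n)] for i in range(m)]


def contract(a, x):
    """Compute the m-contraction A_m x as a matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


x = [3.0, -1.0, 4.0, 1.5, 9.0]
print(contract(contraction_matrix(2, len(x)), x))  # [3.0, -1.0]
```

In practice one would simply truncate the vector; the matrix form only shows that truncation is one instance of A_m x.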

Multiple shapes: we can use, for example, a sphere, because it is described by only a center and a radius r; it represents the set of points within Euclidean distance r of the center. The Euclidean distance is the special case p = 2 of the L_p metrics. For the L1 metric (Manhattan distance), the "sphere" is a diamond. The TV-tree works with any L_p-sphere.
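
The different shapes come only from the distance function. In the sketch below (hypothetical function names), the same center and radius describe a Euclidean sphere for p = 2 and a diamond for p = 1.

```python
def lp_distance(a, b, p):
    """L_p distance: p = 1 is Manhattan, p = 2 is Euclidean."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


def in_lp_sphere(point, center, radius, p):
    """True if point lies in the L_p-sphere of the given center and radius."""
    return lp_distance(point, center, p) <= radius


print(in_lp_sphere((1, 1), (0, 0), 1.5, p=2))  # True: distance is about 1.414
print(in_lp_sphere((1, 1), (0, 0), 1.5, p=1))  # False: distance is 2
```

The point (1, 1) lies inside the Euclidean sphere of radius 1.5 around the origin but outside the L1 diamond of the same radius.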

TV-Tree principle: the TV-tree treats the attributes asymmetrically, favoring the first few features over the rest. It can use any type of MBR (minimum bounding region), such as a rectangle, cube, or sphere, and in particular any L_p-sphere.

TV-Tree node structure: each node represents the MBR of all its descendants (say, an L_p-sphere). Each region is represented by a center, which is a telescopic vector, and a radius; we therefore speak of a TMBR (telescopic minimum bounding region).
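
A TMBR can be sketched as a small class. The split of the center into leading inactive dimensions (values shared exactly by all descendants) and a few active dimensions (where the radius applies) is our reading of the TV-tree and is stated here as an assumption; the class and field names are hypothetical, and dimensions beyond the active ones are ignored.

```python
from dataclasses import dataclass


@dataclass
class TMBR:
    """Hypothetical telescopic MBR: center vector, radius, and dimension counts.

    Assumption: dims [0, inactive) are inactive (all descendants share the
    center's values there); dims [inactive, inactive + active) are active.
    """
    center: tuple
    radius: float
    inactive: int
    active: int

    def contains(self, point, p=2):
        # Inactive dimensions must match the center exactly.
        if tuple(point[:self.inactive]) != tuple(self.center[:self.inactive]):
            return False
        lo, hi = self.inactive, self.inactive + self.active
        # L_p distance restricted to the active dimensions.
        d = sum(abs(x - c) ** p
                for x, c in zip(point[lo:hi], self.center[lo:hi])) ** (1.0 / p)
        return d <= self.radius


t = TMBR(center=(5.0, 1.0, 2.0), radius=1.0, inactive=1, active=2)
print(t.contains((5.0, 1.5, 2.5, 99.0)))  # True: only dims 1 and 2 are compared
```

Keeping only a few active dimensions per node keeps the fan-out high, which is the point of the telescopic representation.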

TV-1-Tree example

TV-2-Tree example

TMBR example (figure): regions with active dimensions {y}, {x, z}, {z}, and {x, y}.

What is the best number of active dimensions? The authors found that the best number of active dimensions was two.

TV-Tree conclusion: we accept overlap, and therefore multiple search paths. The branch chosen for a new point is selected using the following criteria: