On Reinsertions in M-tree
Jakub Lokoč, Tomáš Skopal
Charles University in Prague, Department of Software Engineering, Czech Republic



Presentation Outline
- M-tree
  - the original structure
- Forced reinserting (in M-tree)
  - motivation
  - algorithm outline
- Experimental results

M-tree (metric tree)
- dynamic, balanced, and paged tree structure (like, e.g., the B+-tree or the R-tree)
- the leaves are clusters of indexed objects O_j (ground objects)
- routing entries in the inner nodes represent hyper-spherical metric regions (O_i, r_Oi), recursively bounding the object clusters in the leaves
- the triangle inequality allows irrelevant M-tree branches (i.e., metric regions) to be discarded during query evaluation
[Figure: M-tree regions in Euclidean 2D space with a range query Q]
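The pruning rule behind the last bullet can be written down in a few lines. This is a minimal sketch, not the authors' code: the function names, the sample regions, and the use of Euclidean distance are assumptions made for illustration. A region (O_i, r_Oi) can be skipped for a range query (Q, r_Q) whenever d(Q, O_i) > r_Q + r_Oi.

```python
import math


def can_discard(query, query_radius, center, region_radius, dist=math.dist):
    """Return True if the region ball cannot intersect the query ball.

    By the triangle inequality, every object o inside the region satisfies
    d(query, o) >= d(query, center) - region_radius. If that lower bound
    exceeds query_radius, no object in the region can be a result.
    """
    return dist(query, center) > query_radius + region_radius


# Hypothetical routing entries: (center O_i, covering radius r_Oi)
regions = [((0.0, 0.0), 1.0), ((5.0, 5.0), 1.5), ((2.0, 0.0), 0.5)]
query, r_q = (1.0, 0.0), 1.0

# Only the surviving regions need to be descended into.
survivors = [r for r in regions if not can_discard(query, r_q, *r)]
```

The test needs one distance computation per routing entry, and each successful test saves the whole subtree below that entry.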

Motivation
- the compactness of the metric region hierarchy in M-tree heavily depends on the order in which new objects are inserted
  - newly created regions may be more suitable for previously inserted objects (but these reside in the old ones)
  - unnecessarily big “volumes” and overlaps between regions
  - higher probability of intersection with the query region
  - less efficient search
- reducing the metric region “volume” should lead to more effective discarding of irrelevant subtrees
- how to rearrange objects to get a more compact M-tree hierarchy?

Reinsertions in general
- Batch construction/rearrangements
  - bulk loading algorithms: static
  - post-processing, like the slim-down algorithm: very expensive
- Dynamic insertion
  - non-deterministic (sublinear) leaf determination: looking for the best leaf
  - deterministic (logarithmic) leaf determination: looking for a suboptimal leaf; only one path in the M-tree is traversed
- Our goal
  - to perform local rearrangements/hierarchy optimization during dynamic insertion
  - keeping the costs low, i.e., sublinear in the case of non-deterministic leaf determination and logarithmic in the deterministic case
  - the way: forced reinsertions, i.e., redistribution of some objects in a leaf that is about to split (avoiding the split)
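The deterministic (single-path) leaf determination can be sketched as follows. The slides do not spell out the choice made at each level, so the heuristic used here is an assumption: prefer a region that already covers the object and has the nearest center, otherwise take the region needing the smallest radius enlargement. The `Node` class and the sample tree are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    center: tuple                                  # routing object O_i
    radius: float                                  # covering radius r_Oi
    children: list = field(default_factory=list)   # empty list means leaf


def choose_leaf(root, obj, dist=math.dist):
    """Follow exactly one root-to-leaf path (deterministic variant)."""
    node, path = root, [root]
    while node.children:
        covering = [c for c in node.children if dist(obj, c.center) <= c.radius]
        if covering:
            # Assumed heuristic: nearest center among covering regions.
            node = min(covering, key=lambda c: dist(obj, c.center))
        else:
            # Assumed heuristic: smallest required radius enlargement.
            node = min(node.children, key=lambda c: dist(obj, c.center) - c.radius)
        path.append(node)
    return node, path


# Hypothetical two-level tree with two leaf regions A and B.
leaf_a = Node((0.0, 0.0), 1.0)
leaf_b = Node((10.0, 0.0), 2.0)
root = Node((5.0, 0.0), 7.0, [leaf_a, leaf_b])

inside_a, _ = choose_leaf(root, (0.5, 0.0))    # covered by A
outside, path = choose_leaf(root, (7.0, 0.0))  # covered by neither leaf
```

Because one child is chosen per level, the number of visited nodes equals the tree height, which is where the logarithmic cost comes from. The non-deterministic variant would instead explore several candidate subtrees.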

Forced reinsertions in M-tree
Modified splitting of an M-tree leaf:
1. Remove the most distant objects (4 strategies), i.e., remove objects close to the region’s border, reducing the radius.
2. Save them temporarily in a global in-memory stack.
3. Insert the objects from the stack into the M-tree one by one (regular dynamic insertion, possibly leading to further split attempts).
4. If a new split appears, repeat the process.
5. When the user-defined limit of reinsertions (recursion depth) is reached, insert the remaining objects from the stack in the usual way (without reinsertions).
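The five steps above can be sketched on a deliberately simplified structure. This is an illustration under stated assumptions, not the authors' implementation: the index is a flat list of leaves with no inner nodes, the leaf choice and the split policy are placeholder heuristics, and all names (`Leaf`, `ForcedReinsertIndex`, `capacity`, `k`, `max_depth`) are hypothetical.

```python
import math
import random
from itertools import combinations

dist = math.dist  # any metric would do; Euclidean is only an example


class Leaf:
    def __init__(self, center, objects):
        self.center = center      # routing object of the region
        self.objects = objects    # ground objects stored in the leaf

    @property
    def radius(self):
        return max((dist(o, self.center) for o in self.objects), default=0.0)


class ForcedReinsertIndex:
    def __init__(self, capacity, k, max_depth):
        assert 1 <= k <= capacity
        self.capacity, self.k, self.max_depth = capacity, k, max_depth
        self.leaves = []
        self.stack = []           # global stack of removed objects (step 2)

    def insert(self, obj):
        self._insert(obj, depth=0)

    def _choose_leaf(self, obj):
        # Assumed heuristic: covering region with nearest center,
        # otherwise the region needing the smallest radius enlargement.
        covering = [l for l in self.leaves if dist(obj, l.center) <= l.radius]
        if covering:
            return min(covering, key=lambda l: dist(obj, l.center))
        return min(self.leaves, key=lambda l: dist(obj, l.center) - l.radius)

    def _insert(self, obj, depth):
        if not self.leaves:
            self.leaves.append(Leaf(obj, [obj]))
            return
        leaf = self._choose_leaf(obj)
        leaf.objects.append(obj)
        if len(leaf.objects) <= self.capacity:
            return
        if depth >= self.max_depth:
            self._split(leaf)     # step 5: limit reached, split as usual
            return
        # Step 1: remove the k most distant objects, shrinking the radius.
        leaf.objects.sort(key=lambda o: dist(o, leaf.center), reverse=True)
        removed = leaf.objects[:self.k]
        del leaf.objects[:self.k]
        self.stack.extend(removed)                      # step 2
        for _ in removed:                               # steps 3 and 4
            self._insert(self.stack.pop(), depth + 1)

    def _split(self, leaf):
        # Placeholder split: the two mutually farthest objects become centers.
        objs = leaf.objects
        i, j = max(combinations(range(len(objs)), 2),
                   key=lambda p: dist(objs[p[0]], objs[p[1]]))
        a, b = Leaf(objs[i], [objs[i]]), Leaf(objs[j], [objs[j]])
        for n, o in enumerate(objs):
            if n not in (i, j):
                nearer = a if dist(o, a.center) <= dist(o, b.center) else b
                nearer.objects.append(o)
        self.leaves.remove(leaf)
        self.leaves += [a, b]


random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
index = ForcedReinsertIndex(capacity=8, k=3, max_depth=2)
for p in points:
    index.insert(p)
```

A removed object lands in a different leaf only if, after the radius has shrunk, another region covers it or needs less enlargement; otherwise it returns to the original leaf and the overflow is handled one level deeper. Each overflow may trigger up to k nested insertions, so the work per insert grows quickly with the recursion depth, which matches the construction-cost concern raised later in the slides.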

Reinserting example
1. Insert new object O11.
2. Remove O8 and O6 and push them onto the stack.
3. Decrease the region’s radius (to O11).
4. Insert O6 from the stack.
5. Remove O2 and push it onto the stack.
6. Decrease the region’s radius (to O6).
7. Insert O2 from the stack.
8. Insert O8 from the stack.
[Figure: leaf regions with objects O1 to O11 and the reinsertion stack]

Removing strategies (moving objects to the stack)
When reinserting, the k most distant objects in the leaf are removed (and pushed onto the stack). We distinguish 4 removing strategies:
(a) Pessimistic
- removing in descending order from the most distant object
- the removing stops early if the new (last inserted) object is reached
(b) Optimistic
- removing in descending order from the most distant object
(c) Reverse Pessimistic
- removing in ascending order from the (at most) k-th most distant object
- if the new object is within the k most distant, the removing considers just the farther ones
(d) Reverse Optimistic
- removing in ascending order from the k-th most distant object
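One possible reading of the four strategies, expressed as a function that returns the objects in the order they are pushed onto the stack. The function name, the strategy labels, and the sample distances are assumptions for illustration; in particular, treating the optimistic variants as free to remove the new object itself is an interpretation of the slide, not something it states.

```python
def select_for_removal(distances, new_obj, k, strategy):
    """Return object ids in push order (the last one ends up on the stack top).

    distances maps each object id in the leaf to its distance from the
    leaf's routing object.
    """
    valid = ("pessimistic", "optimistic",
             "reverse_pessimistic", "reverse_optimistic")
    if strategy not in valid:
        raise ValueError(f"unknown strategy: {strategy}")

    # The k most distant objects, most distant first.
    ranked = sorted(distances, key=distances.get, reverse=True)[:k]

    # Pessimistic variants never touch the new object or anything closer.
    if strategy.endswith("pessimistic") and new_obj in ranked:
        ranked = ranked[:ranked.index(new_obj)]

    # Reverse variants push in ascending order of distance.
    if strategy.startswith("reverse"):
        ranked.reverse()
    return ranked


# Hypothetical distances to the routing object; O11 is the new object.
leaf = {"O8": 9.0, "O6": 8.0, "O11": 7.0, "O2": 6.0, "O1": 3.0}

pess = select_for_removal(leaf, "O11", 3, "pessimistic")
opt = select_for_removal(leaf, "O11", 3, "optimistic")
rev_pess = select_for_removal(leaf, "O11", 3, "reverse_pessimistic")
rev_opt = select_for_removal(leaf, "O11", 3, "reverse_optimistic")
```

With these made-up distances the pessimistic strategy pushes O8 and then O6, so O6 is reinserted first. The push order matters because it decides which object is reinserted first and can therefore reshape the regions before the others arrive.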

Open questions
- How many entries should be removed from the node?
- How should the recursion depth be selected?
- Generally, a greater recursion depth and/or a larger number of removed entries gives better query costs but higher construction costs (and the querying improves much less than the construction gets more expensive).
- Empirically, we set the number of removed entries to k = 5 and the recursion depth to 10, which gives the best trade-off between construction and query costs.

Experimental results
2 datasets
- Corel features: 68, dimensional vectors (color histograms), L2 distance
- Polygons (synthetic): 250,000 2D polygons, each with 10 to 15 vertices, Hausdorff distance
Several M-tree building methods
- CLASSIC: deterministic with O(m^2) splitting
- SAMPLING: deterministic with O(km) splitting
- MW: non-deterministic with O(m^2) splitting
- GSD: generalized slim-down algorithm (post-processing after CLASSIC)
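For reference, the two distance functions named above can be sketched as follows. The slides do not say how the Hausdorff distance was evaluated on polygons; computing it over the vertex sets only is an assumption here, and the sample polygons are made up.

```python
import math


def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.dist(a, b)


def hausdorff(A, B, dist=math.dist):
    """Hausdorff distance between two finite point sets.

    Assumption: polygons are compared through their vertex sets only.
    """
    def directed(X, Y):
        # Largest distance from a point of X to its nearest point in Y.
        return max(min(dist(x, y) for y in Y) for x in X)

    return max(directed(A, B), directed(B, A))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(x + 2, y) for x, y in square]   # same square moved 2 units right

d = hausdorff(square, shifted)   # 2.0
```

Each evaluation costs a number of point distances proportional to the product of the two vertex counts, so reducing the number of distance computations during construction and querying matters more for the polygon dataset than for plain vectors.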

Experimental results

Thank you for your attention!
References:
[1] Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997.
[2] Tomáš Skopal, Jaroslav Pokorný, Michal Krátký, Václav Snášel: Revisiting M-tree Building Principles. ADBIS 2003.
[3] Caetano Traina Jr., Agma Traina, Bernhard Seeger, Christos Faloutsos: Slim-trees: High Performance Metric Trees Minimizing Overlap Between Nodes. EDBT 2000.