Similarity Search for Adaptive Ellipsoid Queries Using Spatial Transformation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa (Nara.

Slides:



Advertisements
Similar presentations
The A-tree: An Index Structure for High-dimensional Spaces Using Relative Approximation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa.
Advertisements

Aggregating local image descriptors into compact codes
Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Similarity Search on Bregman Divergence, Towards Non- Metric Indexing Zhenjie Zhang, Beng Chi Ooi, Srinivasan Parthasarathy, Anthony K. H. Tung.
Searching on Multi-Dimensional Data
3D Shape Histograms for Similarity Search and Classification in Spatial Databases. Mihael Ankerst,Gabi Kastenmuller, Hans-Peter-Kriegel,Thomas Seidl Univ.
1 NNH: Improving Performance of Nearest- Neighbor Searches Using Histograms Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research) Chen Li (UC Irvine)
Multimedia DBs. Multimedia dbs A multimedia database stores text, strings and images Similarity queries (content based retrieval) Given an image find.
Indexing Network Voronoi Diagrams*
Optimization of ICP Using K-D Tree
2-dimensional indexing structure
One-Shot Multi-Set Non-rigid Feature-Spatial Matching
Quantile-Based KNN over Multi- Valued Objects Wenjie Zhang Xuemin Lin, Muhammad Aamir Cheema, Ying Zhang, Wei Wang The University of New South Wales, Australia.
Multimedia DBs.
Liang Jin (UC Irvine) Nick Koudas (AT&T) Chen Li (UC Irvine)
Dimensionality Reduction
High-Dimensional Similarity Search using Data-Sensitive Space Partitioning ┼ Sachin Kulkarni 1 and Ratko Orlandic 2 1 Illinois Institute of Technology,
Face Recognition Jeremy Wyatt.
Multimedia DBs. Time Series Data
The Terms that You Have to Know! Basis, Linear independent, Orthogonal Column space, Row space, Rank Linear combination Linear transformation Inner product.
Three Algorithms for Nonlinear Dimensionality Reduction Haixuan Yang Group Meeting Jan. 011, 2005.
Efficient Nearest-Neighbor Search in Large Sets of Protein Conformations Fabian Schwarzer Itay Lotan.
Euripides G.M. PetrakisIR'2001 Oulu, Sept Indexing Images with Multiple Regions Euripides G.M. Petrakis Dept.
San Diego, 06/12/03 San Diego, 06/12/03 Martin Pfeifle, Database Group, University of Munich Using Sets of Feature Vectors for Similarity Search on Voxelized.
R-Trees 2-dimensional indexing structure. R-trees 2-dimensional version of the B-tree: B-tree of maximum degree 8; degree between 3 and 8 Internal nodes.
E.G.M. PetrakisDimensionality Reduction1  Given N vectors in n dims, find the k most important axes to project them  k is user defined (k < n)  Applications:
Dimensionality Reduction
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
Color Transfer in Correlated Color Space Xuezhong Xiao, Computer Science & Engineering Department, Shanghai Jiao Tong University Lizhuang Ma., Computer.
Dimensionality Reduction. Multimedia DBs Many multimedia applications require efficient indexing in high-dimensions (time-series, images and videos, etc)
9 1 Performance Optimization. 9 2 Basic Optimization Algorithm p k - Search Direction  k - Learning Rate or.
Fast Subsequence Matching in Time-Series Databases Christos Faloutsos M. Ranganathan Yannis Manolopoulos Department of Computer Science and ISR University.
Manifold learning: Locally Linear Embedding Jieping Ye Department of Computer Science and Engineering Arizona State University
Summarized by Soo-Jin Kim
Chapter 2 Dimensionality Reduction. Linear Methods
Multimedia and Time-series Data
Wavelet-Based Multiresolution Matching for Content-Based Image Retrieval Presented by Tienwei Tsai Department of Computer Science and Engineering Tatung.
Join-Queries between two Spatial Datasets Indexed by a Single R*-tree Join-Queries between two Spatial Datasets Indexed by a Single R*-tree Michael Vassilakopoulos.
Additive Data Perturbation: data reconstruction attacks.
The X-Tree An Index Structure for High Dimensional Data Stefan Berchtold, Daniel A Keim, Hans Peter Kriegel Institute of Computer Science Munich, Germany.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
COLOR HISTOGRAM AND DISCRETE COSINE TRANSFORM FOR COLOR IMAGE RETRIEVAL Presented by 2006/8.
PMLAB Finding Similar Image Quickly Using Object Shapes Heng Tao Shen Dept. of Computer Science National University of Singapore Presented by Chin-Yi Tsai.
Introduction to The NSP-Tree: A Space-Partitioning Based Indexing Method Gang Qian University of Central Oklahoma November 2006.
A Query Adaptive Data Structure for Efficient Indexing of Time Series Databases Presented by Stavros Papadopoulos.
Classification Course web page: vision.cis.udel.edu/~cv May 12, 2003  Lecture 33.
Conformational Space.  Conformation of a molecule: specification of the relative positions of all atoms in 3D-space,  Typical parameterizations:  List.
Identifying Patterns in Time Series Data Daniel Lewis 04/06/06.
Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction / 16 I9 CHAIR OF COMPUTER SCIENCE 9 DATA MANAGEMENT.
The Curse of Dimensionality Richard Jang Oct. 29, 2003.
2005/12/021 Fast Image Retrieval Using Low Frequency DCT Coefficients Dept. of Computer Engineering Tatung University Presenter: Yo-Ping Huang ( 黃有評 )
Exact indexing of Dynamic Time Warping
Optimal Dimensionality of Metric Space for kNN Classification Wei Zhang, Xiangyang Xue, Zichen Sun Yuefei Guo, and Hong Lu Dept. of Computer Science &
Stream Monitoring under the Time Warping Distance Yasushi Sakurai (NTT Cyber Space Labs) Christos Faloutsos (Carnegie Mellon Univ.) Masashi Yamamuro (NTT.
CS848 Similarity Search in Multimedia Databases Dr. Gisli Hjaltason Content-based Retrieval Using Local Descriptors: Problems and Issues from Databases.
Euripides G.M. PetrakisIR'2001 Oulu, Sept Indexing Images with Multiple Regions Euripides G.M. Petrakis Dept. of Electronic.
MindReader: Querying databases through multiple examples Yoshiharu Ishikawa (Nara Institute of Science and Technology, Japan) Ravishankar Subramanya (Pittsburgh.
Continual Neighborhood Tracking for Moving Objects Yoshiharu Ishikawa Hiroyuki Kitagawa Tooru Kawashima University of Tsukuba, Japan
Chapter 13 (Prototype Methods and Nearest-Neighbors )
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
Spatial Range Querying for Gaussian-Based Imprecise Query Objects Yoshiharu Ishikawa, Yuichi Iijima Nagoya University Jeffrey Xu Yu The Chinese University.
Color Image Segmentation Mentor : Dr. Rajeev Srivastava Students: Achit Kumar Ojha Aseem Kumar Akshay Tyagi.
Keogh, E. , Chakrabarti, K. , Pazzani, M. & Mehrotra, S. (2001)
An Image Database Retrieval Scheme Based Upon Multivariate Analysis and Data Mining Presented by C.C. Chang Dept. of Computer Science and Information.
Research in Computational Molecular Biology , Vol (2008)
Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research)
Donghui Zhang, Tian Xia Northeastern University
Presentation transcript:

Similarity Search for Adaptive Ellipsoid Queries Using Spatial Transformation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa (Nara Institute of Science and Technology) Ryoji Kataoka (NTT Cyber Space Laboratories) Shunsuke Uemura (Nara Institute of Science and Technology)

Outline n Introduction n STT (spatial transformation technique) –Definition of spatial transformation –Spatial transformation of rectangles –Search algorithm n MSTT (multiple STT) –Index structure construction –Query processing –Dissimilarity of matrices n Performance test n Conclusion

n Ellipsoid query –Search processing is performed by using quadratic form distance functions –Distance of p and q for a query matrix M : –represents correlations between dimensions Introduction Euclidean circles for isosurfaces weighted Euclidean iso-oriented ellipsoids quadratic form Ellipsoids (Not necessarily aligned to the coordinate axis)

n An application of a quadratic form distance function –represent the similarity between colors i and j Introduction color histograms Euclidean distance p q Dim d color histograms Quadratic form distance p q

n Spatial indices –e.g. R-tree family (R*-tree, X-tree, SR-tree, A-tree) –Based on the Euclidean distance function Cannot be applied to ellipsoid queries n Efficient search methods for user-adaptive ellipsoid queries –Query matrix M is variable Introduction

n Search method based on the steepest descent method –Works on spatial indices of R-tree family –Calculates the exact distance of a query point and an MBR in an index structure –…but requires high CPU cost which exceeds disk access cost R1 q p M p’ CPU time O(  d 2 )  …number of iterations d…dimensionality Moves p’ toward p iteratively Related Work : Seidl and Kriegel, VLDB97

n Technique that uses the MBB and MBS distance functions to reduce CPU time –MBB and MBS distance functions Related Work : Ankerst et al., VLDB98 q M q M MBB(M)MBS(M)

n Approximation technique by using the MBB and MBS distance functions –approximation distance : uses either MBB or MBS distance for better approximation quality –Calculates the exact distances only if data objects or MBRs cannot be filtered by their approximation distances –Saves CPU time by reducing the number of exact distance calculations –…but cannot reduce the number of exact distance calculations if its approximation quality is low Related Work : Ankerst et al., VLDB98

Our Contributions n STT (Spatial Transformation Technique) –Ellipsoid queries incur a high CPU cost –The efficiency depends on approximation quality –STT efficiently processes ellipsoid queries because of high approximation quality n MSTT (Multiple Spatial Transformation Technique) –Does not use only the Euclidean distance function to make index structures –Ellipsoid queries give various distance functions –In MSTT, various index structures are created; the search algorithm utilizes a structure well suited to a query matrix

Outline n Introduction n STT (spatial transformation technique) –Definition of spatial transformation –Spatial transformation of rectangles –Search algorithm n MSTT (multiple STT) –Index structure construction –Query processing –Dissimilarity of matrices n Performance test n Conclusion

n High approximation quality –STT consumes less CPU time n Spatial transformation –MBRs in a quadratic form distance space are transformed into rectangles in the Euclidean distance space Spatial Transformation Technique (STT) q (2, 2) P O P’ R S S’S’

n Definition of spatial transformation –p : a point in the quadratic form distance space S –p’ : a point in the Euclidean distance space S ’ –The distance between q and p in S is equal to the distance between p’ and O in S ’ –Spatial transformation of p into p’ Spatial Transformation q (2, 2) p (4, 2) p’ (-2, 1) O S S’S’

n Definition of spatial transformation –d M 2 (p, q) : the distance of p and q in S –E M : the eigenvector of M,  M : the eigenvalues of M –Spatial transformation of p into p’ Spatial Transformation

1. P in S is transformed into P’ in S ’ The calculation of distance between the origin and polygons in high-dimensional spaces incurs a high CPU cost 2. P’ is approximated by R 3. d 2 (R, O) is used instead of d 2 M (P, q) Approximation Rectangles q (2, 2) papa P pcpc pbpb pdpd pc’pc’ O P’ pb’pb’ pd’pd’ pa’pa’ R rara rbrb S S’S’ low CPU cost

1. Calculates p a : lower endpoint of the major diagonal of P 2. Creates the two matrices from the components a ij of A M 3.Calculates the approximation rectangle R of P’ l i : the edge length of P for the i -th dimension 4. R can be used for search since R totally contains P’, that is Approximation Rectangles

Search Algorithm qp S 1. Calculates the transformation matrix of M 2. Searches for similarity objects by using an index [ Data nodes ] –Calculates d MBB-MBS(M) (p, q)

Search Algorithm S qp 1. Calculates the transformation matrix of M 2. Searches for similarity objects by using an index [ Data nodes ] –Calculates d MBB-MBS(M) (p, q)

Search Algorithm S 1. Calculates the transformation matrix of M 2. Searches for similarity objects by using an index [ Data nodes ] –Calculates d MBB-MBS(M) (p, q) –Calculates d M (P, q) if d MBB-MBS(M) (p, q) d (M) (k-NN, q) qp

Search Algorithm S P q 1. Calculates the transformation matrix of M 2. Searches for similarity objects by using an index [ Directory nodes ] –Calculates d MBB-MBS(M) (P, q)

Search Algorithm S P q 1. Calculates the transformation matrix of M 2. Searches for similarity objects by using an index [ Directory nodes ] –Calculates d MBB-MBS(M) (P, q)

Search Algorithm 1. Calculates the transformation matrix of M 2. Searches for similarity objects by using an index [ Directory nodes ] –Calculates d MBB-MBS(M) (P, q) –Calculates d(R, O) if d MBB-MBS(M) (P, q) d (M) (k-NN, q) O P’ R S’S’

Search Algorithm 1. Calculates the transformation matrix of M 2. Searches for similarity objects by using an index [ Directory nodes ] –Calculates d MBB-MBS(M) (P, q) –Calculates d(R, O) if d MBB-MBS(M) (P, q) d (M) (k-NN, q) –Calculates d M (P, q) if d(R, O) d (M) (k-NN, q) S q P

Outline n Introduction n STT (spatial transformation technique) –Definition of spatial transformation –Spatial transformation of rectangles –Search algorithm n MSTT (multiple STT) –Index structure construction –Query processing –Dissimilarity of matrices n Performance test n Conclusion

n Node access problem –If a query matrix is NOT similar to the unit matrix, it causes a large number of node accesses –Index structures are constructed by the Euclidean distance function n Constructs various index structures by using quadratic form distance functions –Chooses a structure that gives sufficient search performance in query processing –Reduces both CPU time and number of page accesses for ellipsoid queries Multiple Spatial Transformation Technique (MSTT)

n Similarity of matrices –High search performance can be expected when the query matrix and the matrix of selected index are similar. Basic Idea Indices based on X i Matrices X i X1X1 XjXj XX

n Similarity of matrices –High search performance can be expected when the query matrix and the matrix of selected index are similar. Basic Idea Indices based on X i Matrices X i X1X1 X similar XX query (q, M) M

n Similarity of matrices –High search performance can be expected when the query matrix and the matrix of selected index are similar. Basic Idea X similar query (q, M) M’ M

n Index structure construction – C : the matrix for constructing the index I C –Transformation matrix –All data points in a data set are transformed – I C is constructed using transformed data points Indexing and Retrieval Mechanism

n Query processing 1. Calculates the transformed query point 2. Calculates the query matrix 3. Performs search processing by using I C, M’, q’ The query of M can be processed by using I C

n Flatness of a query matrix –The variance   M of the eigenvalues of M is called the flatness of M : : the i -th dimensional eigenvalue : the average of the eigenvalues of M –The flatness of the unit matrix is 0 (search of the Euclidean space). Similarity of Matrices

Dissimilarity of M and I C –MSTT employs   M’ as the measure of the dissimilarity between M and I C –   M’ : the flatness of M’ –The effectiveness of I c relative to M improves as   M’ decreases Similarity of Matrices

Outline n Introduction n STT (spatial transformation technique) –Definition of spatial transformation –Spatial transformation of rectangles –Search algorithm n MSTT (multiple STT) –Index structure construction –Query processing –Dissimilarity of matrices n Performance test n Conclusion

Performance Test n Data sets: real data set (rgb histogram of images) n Data size: 100,000 n Dimensionality: 8 and 27 n Page size: 8 KB n 20-nearest neighbor queries n Evaluation is based on the average for 100 query points n Index structure : A-tree (Sakurai et al., VLDB2000) n CPU: SUN UltraSPARC-II 450MHz

Performance Test n Query matrices for experiments –[HSE + 95] : the components of M  : positive constant, d w (c i,c j ) : the weighted Euclidean distance between the color c i and c j, w=(w r, w g, w b ) : the weightings of the red, green and blue components in RGB color space –  =10, w g =w b =1 – w r was varied from 1 to 1,000 –The flatness of M increases as w r becomes large

Performance of STT n Comparison of STT and MBB-MBS (8D) –Both methods require the same number of page accesses since they utilize exact distance functions –Low CPU cost : STT increases approximation quality, and reduces the number of exact calculations –The effectiveness of STT increases with matrix flatness CPU time (d = 8)Number of page accesses (d = 8)

Performance of STT CPU time (d = 27)Number of page accesses (d = 27) n Comparison of STT and MBB-MBS (27D) –The effectiveness of STT increases as either dimensionality or matrix flatness grows –STT achieves a 74% reduction in CPU cost for high dimensionality and matrix flatness

Performance of MSTT n Three structures –structure constructed by the unit matrix (Unit) –structure constructed by the matrix w r =10 –structure constructed by the matrix w r =1000 n Performance of MSTT –Dissimilarity : the cost of search using a structure chosen by the dissimilarity function –Dissimilarity is not optimal, but provides good performance CPU time (d = 8)Number of page accesses (d = 8)

n Search methods for user-adaptive ellipsoid queries n STT (Spatial Transformation Technique) –Spatial transformation : MBRs in the quadratic form distance space are transformed into rectangles in the Euclidean distance space –STT performs ellipsoid queries efficiently even when dimensionality or matrix flatness is high n MSTT (Multiple Spatial Transformation Technique) –MSTT creates various index structures; the search algorithm utilizes a structure well suited to a query matrix –MSTT reduces both CPU time and the number of page accesses Conclusions

Dimensionality Reduction n Eigenvalues of a query matrix –Dimensions corresponding to small eigenvalues contribute less to approximation quality –These dimensions are eliminated to save on CPU costs –Calculation time for the spatial transformation of rectangles is reduced to n/d n : the number of dimensions used The effect of D.R. grows as matrix flatness increases

Performance of STT (2) n Percentage of filtered exact distance calculations –The efficiency of MBB-MBS decreases as matrix flatness grows –STT effectively filters exact distance calculations for all queries Rate of filtered exact calculations d = 8d = 27

Performance of MSTT CPU time (d = 27)Number of page accesses (d = 27) n Low search cost –Compared with the structure by the Euclidean distance function, MSTT reduces both CPU time and the number of page accesses –MSTT constructs various structures –Dissimilarity function chooses structures well suited to the query matrix.