Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction


Marc Wichterich, Ira Assent, Philipp Kranen, Thomas Seidl
Chair of Computer Science 9 (I9), Data Management and Exploration
SIGMOD '08, June 10th 2008, Vancouver, Canada

Outline
- Introduction
  - Similarity Search
  - The Earth Mover's Distance
  - Dimensionality Reduction
- Dimensionality Reduction for the EMD
  - Reduction Matrices
  - Data-independent Reduction
  - Data-dependent Reduction
- Experimental Results
- Conclusion & Outlook

Introduction: Similarity Search
- Objective: find similar objects in a database
- Applications: medical images, edutainment, engineering, etc.
- Requires:
  - Object feature extraction (here: feature histograms)
  - Similarity measure (here: Earth Mover's Distance)
  - Efficient retrieval technique for similar objects

Introduction: The Earth Mover's Distance [1]
- Transform the features of one object to match those of the other object
- EMD: the minimum total "cost x flow" of such a transformation
[Figure: histogram x, histogram y, and the flows between them]
[1] Rubner, Tomasi, Perceptual Metrics for Image Database Navigation, Kluwer.
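For reference, the "minimum cost x flow" idea is usually written as a transportation problem. The form below is a sketch under assumptions that are not stated on the slide: histograms x and y of dimensionality d with equal total mass m, a ground-distance cost matrix C = [c_ab], and a flow matrix F = [f_ab].

```latex
\mathrm{EMD}_C(x, y) \;=\; \min_{F}
\left\{ \sum_{a=1}^{d} \sum_{b=1}^{d} \frac{c_{ab}}{m}\, f_{ab}
\;\middle|\;
f_{ab} \ge 0,\quad
\sum_{b=1}^{d} f_{ab} = x_a,\quad
\sum_{a=1}^{d} f_{ab} = y_b
\right\}
```

The row constraints say that all mass of bin a in x is moved somewhere; the column constraints say that bin b in y receives exactly its mass.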

Introduction: Dimensionality Reduction
- Challenge for similarity search: high computational complexity at high dimensionalities
- Approach:
  - Reduce the dimensionality of query and DB
  - Filter the DB using the lower dimensionality
  - Refine using the original dimensionality
- Filter quality criteria:
  - Selectivity (few refinements)
  - No false dismissals (lower bound property)
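The filter-and-refine approach can be sketched as a range query. This is an illustrative sketch only: the function names are hypothetical, the Manhattan distance stands in for the EMD, and summing adjacent pairs of bins stands in for the reduction. The stand-in filter is a valid lower bound of the stand-in exact distance, which is the property that rules out false dismissals.

```python
def filter_and_refine(query, database, reduce, lower_bound, exact, epsilon):
    """Range query: return ([(index, distance)], number of refinements)."""
    reduced_query = reduce(query)
    results = []
    refinements = 0
    for index, obj in enumerate(database):
        # Filter step: cheap distance in the reduced space.
        # (A real system would precompute reduce(obj) for the whole DB.)
        if lower_bound(reduced_query, reduce(obj)) > epsilon:
            continue  # safe to dismiss: exact >= lower bound > epsilon
        # Refine step: expensive distance in the original space.
        refinements += 1
        distance = exact(query, obj)
        if distance <= epsilon:
            results.append((index, distance))
    return results, refinements


# Toy stand-ins (assumptions, not the EMD or the reductions from the talk).
def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def sum_pairs(x):
    # Aggregate bins (0,1), (2,3), ... into one reduced bin each.
    return [x[i] + x[i + 1] for i in range(0, len(x), 2)]


database = [[1, 0, 0, 3], [0, 1, 3, 0], [4, 0, 0, 0], [2, 0, 0, 2]]
query = [1, 0, 0, 3]
results, refinements = filter_and_refine(
    query, database, sum_pairs, manhattan, manhattan, epsilon=2)
print(results, refinements)
```

In this toy run the filter dismisses one of the four objects, so only three exact distance computations are needed; a more selective filter would dismiss more.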

Dimensionality Reduction for the EMD
- Both the feature vectors and the cost matrix have to be reduced
- General linear dimensionality reduction techniques (PCA, ICA, etc.) fail the quality criteria for the EMD
  - Discarding dimensions destroys the lower bound (LB) property
  - Splitting dimensions causes poor selectivity
- Aggregating dimensionality reductions can work well
  - Original dimensions are not split up
  - Each reduced dimension consists of a set of original dimensions

Reduction Matrices
- Aggregating dimensionality reductions are characterized by a reduction matrix R = [r_ab] ∈ {0,1}^(d x d') in which each original dimension is assigned to exactly one reduced dimension
- Example: x' = x · R = (6 9) (the entries of R and x are not preserved in this transcript)
- Lower-bounding reduced cost matrix C' = [c'_a'b'] given R, as given by [2]
- There is no larger lower bound (see paper)
- Main question: which dimensions to aggregate?
[2] Ljosa, Bhattacharya, Singh, Indexing Spatially Sensitive Distance Measures using Multi-Resolution Lower Bounds, EDBT 2006.
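A small sketch of how an aggregating reduction acts on a histogram and on the cost matrix. The example values are hypothetical, chosen only so that the reduced histogram comes out as (6 9); the slide's own R and x are not preserved. The rule used for C' (minimum original cost between the two groups of aggregated dimensions) is an assumption consistent with the lower-bound requirement, since the slide's formula did not survive in the transcript.

```python
def reduction_matrix(assignment, d_reduced):
    """Build R in {0,1}^(d x d') from assignment[a] = reduced dimension of a."""
    return [[1 if assignment[a] == a_red else 0 for a_red in range(d_reduced)]
            for a in range(len(assignment))]


def reduce_histogram(x, R):
    """x' = x * R: each reduced bin sums the original bins assigned to it."""
    d_reduced = len(R[0])
    return [sum(x[a] * R[a][a_red] for a in range(len(x)))
            for a_red in range(d_reduced)]


def reduce_cost_matrix(C, R):
    """Assumed rule: c'[a'][b'] is the smallest original cost between groups.

    Requires every reduced dimension to contain at least one original one.
    """
    d, d_reduced = len(R), len(R[0])
    groups = [[a for a in range(d) if R[a][a_red] == 1]
              for a_red in range(d_reduced)]
    return [[min(C[a][b] for a in groups[i] for b in groups[j])
             for j in range(d_reduced)]
            for i in range(d_reduced)]


# Hypothetical example: 5 original bins aggregated into 2 reduced bins.
assignment = [0, 0, 0, 1, 1]
R = reduction_matrix(assignment, 2)
x = [1, 2, 3, 4, 5]
x_reduced = reduce_histogram(x, R)
C = [[abs(a - b) for b in range(5)] for a in range(5)]  # toy ground distance
C_reduced = reduce_cost_matrix(C, R)
print(x_reduced, C_reduced)
```

Taking the minimum keeps every reduced cost at or below the original costs it replaces, which is what a lower bound needs; the price is that cost information inside each group is lost.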

Data-Independent Reduction
- Goal: tight lower bound (large reduced EMD values)
  - Large cost between reduced dimensions
  - Small loss of cost for each reduced dimension
- Matches the clustering goal: low intra-cluster dissimilarity, high inter-cluster dissimilarity
- k-medoid clustering based on the cost matrix
[Figure: example C, R, and C', with the lost cost information highlighted]
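Clustering the original dimensions with the cost matrix as the dissimilarity can be sketched as follows. The slide does not say which k-medoid variant was used, so this is a simple alternating variant (assign to the closest medoid, then re-pick medoids) with a naive deterministic start; both choices are assumptions.

```python
def k_medoids(C, k, max_iterations=100):
    """Cluster dimensions 0..d-1 using cost matrix C as the dissimilarity.

    Returns (medoids, assignment), where assignment[a] is the cluster
    (reduced dimension) of original dimension a.
    """
    d = len(C)
    medoids = list(range(k))  # naive start: the first k dimensions
    assignment = None
    for _ in range(max_iterations):
        # Assign every dimension to its closest medoid.
        new_assignment = [min(range(k), key=lambda j: C[a][medoids[j]])
                          for a in range(d)]
        # Re-pick each medoid as the member with the smallest total cost
        # to the other members of its cluster.
        for j in range(k):
            members = [a for a in range(d) if new_assignment[a] == j]
            if members:
                medoids[j] = min(
                    members, key=lambda m: sum(C[m][a] for a in members))
        if new_assignment == assignment:
            break  # converged: assignment did not change
        assignment = new_assignment
    return medoids, assignment


# Hypothetical example: bins located at these positions on a line.
positions = [0, 1, 2, 10, 11]
C = [[abs(p - q) for q in positions] for p in positions]
medoids, assignment = k_medoids(C, k=2)
print(medoids, assignment)
```

The resulting assignment can be used directly as an aggregating reduction: nearby bins (low cost) end up in the same reduced dimension, so little cost is lost, while distant groups keep a large cost between them.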

Data-Dependent Reduction Based on Flows
- Idea: incorporate knowledge about the data for a better reduction
- In data-independent reduction, only C is used
- Problem: ensuring a large c'_a'b' is pointless if f'_a'b' is small
- Now: also include information on F

Data-Dependent Reduction: Algorithm
- Add a preprocessing step that analyzes the data
- Collect information about flows in the unreduced EMD
- Use this information to improve the initial / intermediate reduction matrix
- Iterate until no improvement is made

Data-Dependent Reduction: Preprocessing
- Calculate the average flow matrix F̄ = [f̄_ab] for a sample S of the DB
- Approximate the flows F' in the reduced EMD with F̃' = R^T F̄ R
- Maximize the approximate average reduced EMD
[Figure: example R, average flows F̄, approximate average reduced flows F̃', and the approximate average reduced EMD]
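The preprocessing quantities can be sketched in a few lines. The flow matrices and the reduced cost matrix below are hypothetical inputs; in practice the flows would come from EMD computations on the sample. The score is taken to be the sum of reduced cost times approximate reduced flow, which is an assumed reading of "approximate average reduced EMD".

```python
def average_flows(flow_matrices):
    """Element-wise mean of the flow matrices of the sampled EMD computations."""
    n, d = len(flow_matrices), len(flow_matrices[0])
    return [[sum(F[a][b] for F in flow_matrices) / n for b in range(d)]
            for a in range(d)]


def reduce_flows(F, assignment, d_reduced):
    """R^T F R, computed from assignment[a] = reduced dimension of a."""
    F_red = [[0.0] * d_reduced for _ in range(d_reduced)]
    for a, row in enumerate(F):
        for b, flow in enumerate(row):
            # Flow from a to b becomes flow between their reduced dimensions.
            F_red[assignment[a]][assignment[b]] += flow
    return F_red


def approx_avg_reduced_emd(C_red, F_red):
    """Sum over all reduced pairs of (reduced cost * approximate flow)."""
    return sum(c * f
               for c_row, f_row in zip(C_red, F_red)
               for c, f in zip(c_row, f_row))


# Hypothetical flows from two sampled EMD computations on 4 bins.
sample_flows = [
    [[1, 0, 0, 1], [0, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    [[1, 0, 0, 3], [0, 0, 2, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
]
assignment = [0, 0, 1, 1]      # bins 0,1 -> reduced 0; bins 2,3 -> reduced 1
C_reduced = [[0, 1], [1, 0]]   # hypothetical reduced cost matrix

F_avg = average_flows(sample_flows)
F_reduced = reduce_flows(F_avg, assignment, 2)
score = approx_avg_reduced_emd(C_reduced, F_reduced)
print(F_reduced, score)
```

Flows that stay inside one reduced dimension contribute nothing to the score, which is why the choice of aggregation should depend on where the data actually moves mass.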

Data-Dependent Reduction: Optimization
- Global optimization of the approximate average reduced EMD requires assessment of all possible reduction matrices
- Find a local optimum via reassignment of dimensions
  - FB-All: choose the best reassignment in each iteration
  - FB-Mod: choose the first profitable reassignment in each iteration
- Initial reduction matrices
  - Base: assign all original dimensions to the first reduced dimension
  - KMed: reduction matrix from the data-independent reduction
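The reassignment search can be sketched as a hill climb over assignments of original to reduced dimensions. This is a loose illustration, not the authors' implementation: the objective assumes the minimum-cost rule for the reduced cost matrix and sums reduced cost times aggregated average flow, and the flag `first_improvement` is a hypothetical name that switches between an FB-Mod-like and an FB-All-like strategy.

```python
def objective(assignment, d_reduced, C, F_avg):
    """Approximate average reduced EMD for an assignment (assumed rules)."""
    d = len(assignment)
    groups = [[a for a in range(d) if assignment[a] == g]
              for g in range(d_reduced)]
    total = 0.0
    for gi in groups:
        for gj in groups:
            if not gi or not gj:
                continue  # empty reduced dimensions carry no flow
            cost = min(C[a][b] for a in gi for b in gj)
            flow = sum(F_avg[a][b] for a in gi for b in gj)
            total += cost * flow
    return total


def local_search(assignment, d_reduced, C, F_avg, first_improvement=True):
    """Reassign single dimensions until no reassignment is profitable."""
    assignment = list(assignment)
    current = objective(assignment, d_reduced, C, F_avg)
    while True:
        best_gain, best_move = 0.0, None
        for a in range(len(assignment)):
            old = assignment[a]
            for g in range(d_reduced):
                if g == old:
                    continue
                assignment[a] = g  # try the move
                gain = objective(assignment, d_reduced, C, F_avg) - current
                assignment[a] = old  # undo the move
                if gain > best_gain:
                    best_gain, best_move = gain, (a, g)
                    if first_improvement:
                        break
            if best_move is not None and first_improvement:
                break
        if best_move is None:
            return assignment, current  # local optimum reached
        a, g = best_move
        assignment[a] = g
        current = objective(assignment, d_reduced, C, F_avg)


# Hypothetical inputs: 4 bins on a line and an average flow matrix.
C = [[abs(a - b) for b in range(4)] for a in range(4)]
F_avg = [[1, 0, 0, 2], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
base = [0, 0, 0, 0]  # "Base" start: everything in the first reduced dimension

fb_mod = local_search(base, 2, C, F_avg, first_improvement=True)
fb_all = local_search(base, 2, C, F_avg, first_improvement=False)
print(fb_mod, fb_all)
```

Each accepted move strictly increases the objective and there are finitely many assignments, so the loop terminates. On this toy input both strategies reach the same local optimum; in general they can differ, and the first-improvement variant evaluates fewer candidate moves per iteration.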

Experimental Results
- Data-independent vs. data-dependent aggregation
[Figure: sample image [2]; data-independent aggregation (kMedoid); data-dependent aggregation (FB-All-Mod); costliest flows]

Experimental Results
- Efficiency vs. reduced dimensionality (Retina DB)

Experimental Results
- Efficiency vs. reduced dimensionality (IRMA DB)

Experimental Results
- Filter and refinement times and filter selectivity (IRMA DB)

Conclusion & Outlook
- Conclusion
  - Earth Mover's Distance as a similarity measure
    - High quality, but computationally expensive in high dimensions
  - Dimensionality reduction for the EMD
    - Data-independent reduction: clustering in feature space
    - Data-dependent reduction: analyze flow information
- Outlook
  - Local reductions
  - Different reductions for query and DB
  - Index reduced histograms using [3]
[3] Assent, Wichterich, Meisen, Seidl, Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases, ICDE 2008.