Published by Maximilian Summers. Modified over 9 years ago.
1
Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction
Marc Wichterich, Ira Assent, Philipp Kranen, Thomas Seidl
Chair of Computer Science 9, Data Management and Exploration
SIGMOD '08, June 10th, 2008, Vancouver, Canada
2
Outline
Introduction
  Similarity Search
  The Earth Mover's Distance
  Dimensionality Reduction
Dimensionality Reduction for the EMD
  Reduction Matrices
  Data-independent Reduction
  Data-dependent Reduction
Experimental Results
Conclusion & Outlook
3
Introduction – Similarity Search
Objective: find similar objects in a database
Applications: medical images, edutainment, engineering, etc.
Requires:
  Object feature extraction (here: feature histograms)
  Similarity measure (here: Earth Mover's Distance)
  Efficient retrieval technique for similar objects
(Figure label: "similar?")
4
Introduction – The Earth Mover's Distance [1]
Transform the features of one object so that they match those of the other object
EMD: the minimum total "cost × flow" of such a transformation
(Figure: histogram x, histogram y, and the flows between them)
[1] Rubner, Tomasi, Perceptual Metrics for Image Database Navigation, Kluwer, 2001.
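The "minimum cost × flow" idea is usually written as a linear program. The block below is a sketch of that formulation, not the slide's own formula. It assumes histograms x and y with d bins each and equal total mass m, and a cost matrix C = [c_ab]; the symbols f_ab, m and d are notation chosen here, and normalization conventions differ between sources.

```latex
\mathrm{EMD}_C(x, y) \;=\; \min_{F = [f_{ab}]} \; \frac{1}{m} \sum_{a=1}^{d} \sum_{b=1}^{d} c_{ab}\, f_{ab}
\quad \text{subject to} \quad
f_{ab} \ge 0, \qquad
\sum_{b=1}^{d} f_{ab} = x_a, \qquad
\sum_{a=1}^{d} f_{ab} = y_b .
```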
5
Introduction – Dimensionality Reduction
Challenge for similarity search: high computational complexity for high dimensionalities
Approach:
  Reduce the dimensionality of query and DB
  Filter the DB using the lower dimensionality
  Refine using the original dimensionality
Filter quality criteria:
  Selectivity (few refinements)
  No false dismissals (lower-bound property)
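The filter-and-refine approach can be sketched as a range query in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `range_query`, `l1`, `aggregate` and `l1_reduced` are hypothetical names, the L1 distance stands in for the expensive exact distance, and the L1 distance on pairwise-aggregated vectors stands in for the reduced distance (it is a lower bound by the triangle inequality).

```python
def range_query(query, database, eps, exact_dist, lower_bound):
    """Return (indices of objects within eps of the query, number of refinements)."""
    results = []
    refinements = 0
    for idx, obj in enumerate(database):
        # Filter step: if the cheap lower bound already exceeds eps,
        # the exact distance must exceed eps too, so no false dismissal occurs.
        if lower_bound(query, obj) > eps:
            continue
        # Refinement step: only surviving candidates pay for the exact distance.
        refinements += 1
        if exact_dist(query, obj) <= eps:
            results.append(idx)
    return results, refinements


def l1(x, y):
    """Stand-in for the expensive exact distance (assumption, not the EMD)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def aggregate(x):
    """Reduce a 4-dimensional vector to 2 dimensions by summing neighbouring pairs."""
    return (x[0] + x[1], x[2] + x[3])


def l1_reduced(x, y):
    """Stand-in for the reduced distance; lower-bounds l1 by the triangle inequality."""
    return l1(aggregate(x), aggregate(y))


database = [(2, 4, 3, 6), (4, 2, 6, 3), (9, 0, 0, 6), (3, 4, 3, 5)]
query = (2, 4, 3, 6)
results, refinements = range_query(query, database, 2, l1, l1_reduced)
print(results, refinements)  # [0, 3] 3
```

Here one of the four objects is pruned by the filter alone, and one false candidate survives to refinement; a tighter lower bound would raise selectivity by pruning more.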
6
Dimensionality Reduction for the EMD
Both the feature vectors and the cost matrix have to be reduced
General linear dimensionality reduction techniques (PCA, ICA, etc.) fail the quality criteria for the EMD
  Discarding dimensions destroys the lower-bound (LB) property
  Splitting dimensions causes poor selectivity
Aggregating dimensionality reductions can work well
  Original dimensions are not split up
  Each reduced dimension consists of a set of original dimensions
7
Reduction Matrices
Aggregating dimensionality reductions are characterized by a reduction matrix R = [r_ab] ∈ {0,1}^(d × d') in which each original dimension is assigned to exactly one reduced dimension
Example:
  x = ( 2 4 3 6 )

  R = | 1 0 |
      | 1 0 |
      | 0 1 |
      | 0 1 |

  x' = x R = ( 2+4  3+6 ) = ( 6 9 )
Lower-bounding reduced cost matrix C' = [c'_{a'b'}] for a given R, as defined in [2]
For a given R, there is no larger lower bound (see paper)
Main question: which dimensions to aggregate?
[2] Ljosa, Bhattacharya, Singh, Indexing Spatially Sensitive Distance Measures Using Multi-Resolution Lower Bounds, EDBT 2006.
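The reduction of a histogram and of the cost matrix can be sketched in a few lines of Python. The formula for C' did not survive in this transcript; the sketch assumes the rule that c'_{a'b'} is the smallest original cost between any dimension aggregated into a' and any dimension aggregated into b', which agrees with the C and C' example on the Data-Independent Reduction slide. The function names and the `assignment` list (reduced dimension per original dimension, equivalent to R) are choices made here, and every reduced dimension is assumed to be non-empty.

```python
def reduce_histogram(x, assignment, d_reduced):
    """Compute x' = x R, where assignment[a] is the reduced dimension of original dimension a."""
    x_red = [0] * d_reduced
    for a, value in enumerate(x):
        x_red[assignment[a]] += value
    return x_red


def reduce_cost_matrix(C, assignment, d_reduced):
    """c'[a'][b'] = minimum original cost between the members of a' and the members of b'.

    Assumes every reduced dimension has at least one original dimension.
    """
    groups = [[a for a, g in enumerate(assignment) if g == r] for r in range(d_reduced)]
    return [[min(C[a][b] for a in ga for b in gb) for gb in groups] for ga in groups]


assignment = [0, 0, 1, 1]  # same aggregation as the example matrix R
x = [2, 4, 3, 6]
C = [[0, 1, 3, 4],
     [1, 0, 2, 3],
     [3, 2, 0, 1],
     [4, 3, 1, 0]]

x_red = reduce_histogram(x, assignment, 2)
C_red = reduce_cost_matrix(C, assignment, 2)
print(x_red)  # [6, 9]
print(C_red)  # [[0, 2], [2, 0]]
```

Taking the minimum keeps every reduced cost at or below the original costs it replaces, which is what makes the reduced EMD a lower bound.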
8
Data-Independent Reduction
Goal: tight lower bound (large reduced EMD values)
  Large cost between reduced dimensions
  Small loss of cost for each reduced dimension
Matches the clustering goal: low intra-cluster dissimilarity, high inter-cluster dissimilarity
kMedoid clustering based on the cost matrix
Example (figure annotation: "lost cost information"):
  C = | 0 1 3 4 |
      | 1 0 2 3 |
      | 3 2 0 1 |
      | 4 3 1 0 |

  R = | 1 0 |
      | 1 0 |
      | 0 1 |
      | 0 1 |

  C' = | 0 2 |
       | 2 0 |
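A clustering of the original dimensions on the cost matrix can be sketched as follows. This is a simple alternating k-medoid variant written for illustration; the talk does not say which k-medoid algorithm or initialization was used, so the deterministic start with the first k dimensions and the name `k_medoids` are assumptions.

```python
def k_medoids(C, k, max_iter=100):
    """Cluster the d original dimensions into k groups using cost matrix C as dissimilarity."""
    d = len(C)
    medoids = list(range(k))  # assumption: start with the first k dimensions as medoids
    assignment = [0] * d
    for _ in range(max_iter):
        # Assignment step: each dimension joins the medoid it is cheapest to reach.
        assignment = [min(range(k), key=lambda r: C[a][medoids[r]]) for a in range(d)]
        # Update step: the new medoid minimizes the summed cost to its cluster members.
        new_medoids = []
        for r in range(k):
            members = [a for a in range(d) if assignment[a] == r]
            if members:
                new_medoids.append(min(members, key=lambda m: sum(C[m][a] for a in members)))
            else:
                new_medoids.append(medoids[r])  # keep the old medoid for an empty cluster
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assignment, medoids


C = [[0, 1, 3, 4],
     [1, 0, 2, 3],
     [3, 2, 0, 1],
     [4, 3, 1, 0]]

assignment, medoids = k_medoids(C, 2)
# Turn the cluster assignment into a d x d' reduction matrix R.
R = [[1 if assignment[a] == r else 0 for r in range(2)] for a in range(len(C))]
print(assignment)  # [0, 0, 1, 1]
print(R)           # [[1, 0], [1, 0], [0, 1], [0, 1]]
```

On the example cost matrix this groups the two cheap pairs (cost 1 each), so only those small costs are lost and the cost between the reduced dimensions stays at 2.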
9
Data-Dependent Reduction Based on Flows
Idea: incorporate knowledge about the data for a better reduction
In data-independent reduction, only C is used
Problem: ensuring a large c'_{a'b'} is pointless if the flow f'_{a'b'} is small
Now: also include information on the flows F
10
Data-Dependent Reduction: Algorithm
Add a preprocessing step that analyzes the data
  Collect information about the flows in the unreduced EMD
Use this information to improve the initial / intermediate reduction matrix
Iterate until no improvement is made
(Figure: flowchart with a yes/no decision)
11
Data-Dependent Reduction: Preprocessing
Calculate the average flow matrix F̄ = [f̄_ab] for a sample S of the DB
Approximate the flows F' in the reduced EMD with F̃' = R^T F̄ R
Maximize the approximate average reduced EMD
Example:
  F̄ = | 2 1 2 3 |     (average flows)
      | 0 1 2 1 |
      | 3 2 3 1 |
      | 1 3 0 1 |

  R = | 1 0 |
      | 1 0 |
      | 0 1 |
      | 0 1 |

  F̃' = | 4 8 |     (approximate average reduced flows)
       | 9 5 |
12
Data-Dependent Reduction: Optimization
Global optimization of the approximate average reduced EMD requires assessing all possible reduction matrices
Find a local optimum via reassignment of dimensions
  FB-All: choose the best reassignment in each iteration
  FB-Mod: choose the first profitable reassignment in each iteration
Initial reduction matrices
  Base: assign all original dimensions to the first reduced dimension
  KMed: reduction matrix from the data-independent reduction
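The reassignment search can be sketched as a hill climber over single-dimension moves. This is an illustration of the two strategies as described on the slide, not the authors' algorithm: the objective (sum of minimum reduced cost times summed average flow, unnormalized), the treatment of empty reduced dimensions, and the names `objective` and `local_search` are assumptions. The flag `first_profitable` switches between an FB-All-like and an FB-Mod-like choice.

```python
def objective(C, F, assignment, k):
    """Assumed objective: sum over reduced pairs of (minimum cost) * (summed flow)."""
    groups = [[a for a, g in enumerate(assignment) if g == r] for r in range(k)]
    total = 0
    for ga in groups:
        for gb in groups:
            if ga and gb:  # an empty reduced dimension carries no flow
                cost = min(C[a][b] for a in ga for b in gb)
                flow = sum(F[a][b] for a in ga for b in gb)
                total += cost * flow
    return total


def local_search(C, F, k, assignment, first_profitable=False):
    """Reassign single dimensions until no reassignment increases the objective."""
    assignment = list(assignment)
    current = objective(C, F, assignment, k)
    while True:
        best_gain, best_move = 0, None
        for a in range(len(assignment)):
            old = assignment[a]
            for r in range(k):
                if r == old:
                    continue
                assignment[a] = r  # try the move, then undo it
                gain = objective(C, F, assignment, k) - current
                assignment[a] = old
                if gain > best_gain:
                    best_gain, best_move = gain, (a, r)
                    if first_profitable:
                        break
            if first_profitable and best_move is not None:
                break
        if best_move is None:
            return assignment, current  # local optimum reached
        assignment[best_move[0]] = best_move[1]
        current += best_gain


C = [[0, 1, 3, 4], [1, 0, 2, 3], [3, 2, 0, 1], [4, 3, 1, 0]]
F = [[2, 1, 2, 3], [0, 1, 2, 1], [3, 2, 3, 1], [1, 3, 0, 1]]
base = [0, 0, 0, 0]  # "Base" start: everything in the first reduced dimension

fb_all = local_search(C, F, 2, base, first_profitable=False)
fb_mod = local_search(C, F, 2, base, first_profitable=True)
print(fb_all, fb_mod)
```

Choosing the best move costs a full scan of all reassignments per iteration, while taking the first profitable move makes iterations cheaper at the risk of needing more of them; both stop at a local optimum that may differ from the global one.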
13
Experimental Results
Data-independent vs. data-dependent aggregation
(Figure panels: sample image [2]; data-independent (kMedoid); data-dependent (FB-All-Mod); costliest flows)
14
Experimental Results
Efficiency vs. reduced dimensionality (Retina DB)
15
Experimental Results
Efficiency vs. reduced dimensionality (IRMA DB)
16
Experimental Results
Filter and refinement times and filter selectivity (IRMA DB)
17
Conclusion & Outlook
Conclusion
  Earth Mover's Distance as a similarity measure
    High quality, but computationally expensive in high dimensions
  Dimensionality reduction for the EMD
    Data-independent reduction: clustering in feature space
    Data-dependent reduction: analyze flow information
Outlook
  Local reductions
  Different reductions for query and DB
  Index reduced histograms using [3]
[3] Assent, Wichterich, Meisen, Seidl, Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases, ICDE 2008.