Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction


1 Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction
Marc Wichterich, Ira Assent, Philipp Kranen, Thomas Seidl
Chair of Computer Science 9 (I9), Data Management and Exploration
SIGMOD 08, June 10th 2008, Vancouver, Canada

2 Outline
- Introduction
  - Similarity Search
  - The Earth Mover’s Distance
  - Dimensionality Reduction
- Dimensionality Reduction for the EMD
  - Reduction Matrices
  - Data-independent Reduction
  - Data-dependent Reduction
- Experimental Results
- Conclusion & Outlook

3 Introduction – Similarity Search
- Objective: Find similar objects in database
- Applications: Medical images, edutainment, engineering, etc.
- Requires:
  - Object feature extraction (here: feature histograms)
  - Similarity measure (here: Earth Mover’s Distance)
  - Efficient retrieval technique for similar objects
[Figure: similar?]

4 Introduction – The Earth Mover’s Distance [1]
- Transform object features to match those of other object
- Minimum “cost x flow” for transformation: EMD
[Figure: flows between histogram x and histogram y]
[1] Rubner, Tomasi, Perceptual Metrics for Image Database Navigation, Kluwer, 2001.
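The "minimum cost x flow" idea can be written as a transportation problem. The formulation below is a sketch under the assumption that both histograms x and y have d bins and the same total mass (for example, normalized to 1); c_ab denotes the ground cost of moving mass from bin a to bin b, and f_ab the amount moved.

```latex
\mathrm{EMD}_C(x, y) \;=\; \min_{F = [f_{ab}]}
  \left\{ \sum_{a=1}^{d} \sum_{b=1}^{d} c_{ab}\, f_{ab}
  \;\middle|\;
  f_{ab} \ge 0,\;\;
  \sum_{b=1}^{d} f_{ab} = x_a,\;\;
  \sum_{a=1}^{d} f_{ab} = y_b
  \right\}
```

The number of flow variables grows quadratically with d, which is why the dimensionality of the histograms dominates the cost of each distance computation.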

5 Introduction – Dimensionality Reduction
- Challenge for Similarity Search: high computational complexity for high dimensionalities
- Approach:
  - Reduce dimensionality of query & DB
  - Filter DB using lower dimensionality
  - Refine using orig. dimensionality
- Filter quality criteria
  - Selectivity (few refinements)
  - No false dismissals (lower bound property)
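The filter-and-refine approach can be sketched for a range query as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the toy distances in the usage part (L1 distance as the "exact" distance, L1 on pairwise-aggregated bins as the filter) merely stand in for the EMD and its reduced version. The only property the sketch relies on is that the filter distance never exceeds the exact distance.

```python
def filter_and_refine_range_query(query, database, epsilon,
                                  lower_bound, exact_distance):
    """Return ([(index, distance), ...], number_of_refinements).

    lower_bound(q, o) must never exceed exact_distance(q, o);
    otherwise objects could be dismissed incorrectly.
    """
    results = []
    refinements = 0
    for idx, obj in enumerate(database):
        # Filter step: cheap lower bound on the exact distance.
        if lower_bound(query, obj) > epsilon:
            continue  # exact distance is also > epsilon: no false dismissal
        # Refinement step: expensive exact distance, only for candidates.
        refinements += 1
        dist = exact_distance(query, obj)
        if dist <= epsilon:
            results.append((idx, dist))
    return results, refinements


# Toy stand-ins (assumptions for illustration only, not the EMD):
def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def aggregate_pairs(x):
    # Merge neighbouring bins: 4 dimensions -> 2 dimensions.
    return [x[0] + x[1], x[2] + x[3]]


def l1_reduced(x, y):
    # L1 on aggregated bins lower-bounds L1 on the original bins.
    return l1(aggregate_pairs(x), aggregate_pairs(y))


query = (2, 4, 3, 6)
database = [(2, 4, 3, 6), (3, 3, 3, 6), (6, 0, 0, 9), (0, 0, 7, 8)]
results, refinements = filter_and_refine_range_query(
    query, database, 2, l1_reduced, l1)
print(results, refinements)  # [(0, 0), (1, 2)] 3
```

In this toy run the filter discards one of four objects, and one of the three refined candidates turns out to be a false positive. A more selective filter would reduce the number of refinements, which is the selectivity criterion named on the slide.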

6 Dimensionality Reduction for the EMD
- Both the feature vectors and the cost matrix have to be reduced
- General linear dimensionality reduction techniques (PCA, ICA, etc.) fail quality criteria for EMD
  - Discarding dimensions destroys LB property
  - Splitting dimensions causes poor selectivity
- Aggregating dimensionality reductions can work well
  - Original dimensions are not split up
  - Each reduced dimension consists of a set of orig. dimensions

7 Reduction Matrices
- Aggregating dimensionality reductions are characterized by a reduction matrix R = [r_{ab}] ∈ {0,1}^{d x d’} in which each original dimension is assigned to exactly one reduced dimension
- Example: x = ( 2 4 3 6 ), x' = x · R = ( 2+4 3+6 ) = ( 6 9 ) with
  R =
    1 0
    1 0
    0 1
    0 1
- Lower-bounding reduced cost matrix C’ = [c’_{a’b’}] given R, as given by [2]
- There is no larger lower bound (see paper)
- Main question: Which dimensions to aggregate?
[2] Ljosa, Bhattacharya, Singh, Indexing Spatially Sensitive Distance Measures using Multi-Resolution Lower Bounds, EDBT 2006.
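The two reductions can be sketched in a few lines. The histogram reduction is the product x · R from the example. For the cost matrix, the sketch assumes that the lower-bounding entry between two reduced dimensions is the minimum original cost over all pairs of original dimensions assigned to them; this is my reading of the construction cited as [2], and it reproduces the C' shown on the data-independent reduction slide. Function names are hypothetical, and every reduced dimension is assumed to be non-empty.

```python
def reduce_histogram(x, R):
    """x' = x . R: sum the original bins assigned to each reduced bin."""
    d, d_red = len(R), len(R[0])
    return [sum(x[a] * R[a][r] for a in range(d)) for r in range(d_red)]


def reduce_cost_matrix(C, R):
    """Assumed rule: c'[r][s] = min cost between any original dimension
    in group r and any original dimension in group s."""
    d, d_red = len(R), len(R[0])
    # groups[r] lists the original dimensions aggregated into reduced dim r
    groups = [[a for a in range(d) if R[a][r] == 1] for r in range(d_red)]
    return [[min(C[a][b] for a in groups[r] for b in groups[s])
             for s in range(d_red)]
            for r in range(d_red)]


# Reduction matrix and histogram from the example on this slide
R = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]
x = [2, 4, 3, 6]

# Cost matrix from the data-independent reduction slide
C = [[0, 1, 3, 4],
     [1, 0, 2, 3],
     [3, 2, 0, 1],
     [4, 3, 1, 0]]

print(reduce_histogram(x, R))    # [6, 9]
print(reduce_cost_matrix(C, R))  # [[0, 2], [2, 0]]
```

Taking the minimum keeps the reduced EMD from overestimating the original one, at the price of discarding the larger costs within each pair of groups.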

8 Data-Independent Reduction
- Goal: Tight lower bound (large reduced EMD values)
  - Large cost between reduced dimensions
  - Small loss of cost for each reduced dimension
- Matches clustering goal: low intra-cluster dissimilarity / high inter-cluster dissimilarity
- kMedoid clustering based on the cost matrix
- Example (cost information is lost in the reduction):
  C =
    0 1 3 4
    1 0 2 3
    3 2 0 1
    4 3 1 0
  R =
    1 0
    1 0
    0 1
    0 1
  C' =
    0 2
    2 0
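A k-medoid clustering of the original dimensions, using the cost matrix as the dissimilarity, might look like the sketch below. It is a simple alternating assign/update variant with a naive initialization (the first k dimensions serve as initial medoids); both choices are assumptions made for illustration and are not taken from the slides. Function names are hypothetical.

```python
def k_medoids_assignment(C, k, max_iter=100):
    """Cluster dimensions 0..d-1 into k groups using cost matrix C.

    Returns a list mapping each original dimension to a cluster index.
    """
    d = len(C)
    medoids = list(range(k))  # naive initialization (assumption)
    assignment = [0] * d
    for _ in range(max_iter):
        # Assign every dimension to its cheapest medoid.
        assignment = [min(range(k), key=lambda j: C[a][medoids[j]])
                      for a in range(d)]
        # Move each medoid to the member with the lowest total cost
        # to the other members of its cluster.
        new_medoids = []
        for j in range(k):
            members = [a for a in range(d) if assignment[a] == j]
            if not members:
                new_medoids.append(medoids[j])  # keep medoid of empty cluster
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(C[m][a] for a in members)))
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return assignment


def assignment_to_reduction_matrix(assignment, k):
    """Turn a cluster assignment into a 0/1 reduction matrix R (d x k)."""
    return [[1 if g == j else 0 for j in range(k)] for g in assignment]


C = [[0, 1, 3, 4],
     [1, 0, 2, 3],
     [3, 2, 0, 1],
     [4, 3, 1, 0]]

assignment = k_medoids_assignment(C, 2)
print(assignment)                                     # [0, 0, 1, 1]
print(assignment_to_reduction_matrix(assignment, 2))  # [[1, 0], [1, 0], [0, 1], [0, 1]]
```

On the example cost matrix the clustering groups the two pairs of cheaply connected dimensions, which matches the R shown on the slide.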

9 Data-Dependent Reduction based on flows
- Idea: Incorporate knowledge on data for better reduction
- In data-independent reduction, only C is used
- Problem: Ensuring large c’_{a’b’} pointless if f’_{a’b’} is small
- Now: Also include information on F

10 Data-Dependent Reduction: Algorithm
- Add preprocessing step analyzing the data
- Collect information about flows in unreduced EMD
- Use information to improve initial / intermediate reduction matrix
- Iterate until no improvement made
[Figure: flowchart with a yes/no decision]

11 Data-Dependent Reduction: Preprocessing
- Calculate average flow matrix F̄ = [f̄_{ab}] for sample S of DB
- Approximate the flows F’ in reduced EMD with F̃’ = R^T F̄ R
- Maximize approximate average reduced EMD
- Example:
  R =
    1 0
    1 0
    0 1
    0 1
  F̄ (average flows) =
    2 1 2 3
    0 1 2 1
    3 2 3 1
    1 3 0 1
  F̃’ (approximate average reduced flows) =
    4 8
    9 5
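The flow approximation R^T F R simply adds up the average flows between the groups of original dimensions. The sketch below computes it for the example on this slide and then evaluates an approximate average reduced EMD as the sum of reduced cost times approximate reduced flow. That objective is my reading of the slide's "approximate average reduced EMD" (the formula itself did not survive in the transcript), and pairing the flows with the reduced cost matrix from the data-independent reduction slide is done purely for illustration. Function names are hypothetical, and the average flow matrix is assumed to be given.

```python
def reduce_flows(F, R):
    """Approximate reduced flows: F' = R^T F R."""
    d, d_red = len(R), len(R[0])
    return [[sum(R[a][r] * F[a][b] * R[b][s]
                 for a in range(d) for b in range(d))
             for s in range(d_red)]
            for r in range(d_red)]


def approx_avg_reduced_emd(C_reduced, F_reduced):
    """Assumed objective: sum over all entries of c'[r][s] * f'[r][s]."""
    return sum(c * f
               for c_row, f_row in zip(C_reduced, F_reduced)
               for c, f in zip(c_row, f_row))


R = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]

# Average flow matrix from the example on this slide
F_avg = [[2, 1, 2, 3],
         [0, 1, 2, 1],
         [3, 2, 3, 1],
         [1, 3, 0, 1]]

# Reduced cost matrix from the data-independent reduction slide
C_reduced = [[0, 2],
             [2, 0]]

F_reduced = reduce_flows(F_avg, R)
print(F_reduced)                                     # [[4, 8], [9, 5]]
print(approx_avg_reduced_emd(C_reduced, F_reduced))  # 34
```

Flows that stay inside one group (the diagonal of F') contribute nothing here because the reduced cost within a group is zero, so a good reduction keeps heavy flows between groups that are far apart.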

12 Data-Dependent Reduction: Optimization
- Global optimization of the approximate average reduced EMD requires assessment of all possible reduction matrices
- Find local optimum via reassignment of dimensions
  - FB-All: Choose best reassignment in each iteration
  - FB-Mod: Choose first profitable reassignment in each iteration
- Initial reduction matrices
  - Base: assign all original dimensions to first reduced dimension
  - KMed: reduction matrix from data-independent reduction
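The reassignment-based local search can be sketched as below. This is an illustration under stated assumptions rather than the authors' algorithm: the objective is taken to be the sum of reduced cost times approximate reduced flow, reduced costs use the minimum rule, pairs involving an empty reduced dimension are skipped, and a reassignment counts as profitable when it strictly increases the objective. The `first_profitable` flag switches between an FB-Mod-like and an FB-All-like strategy; all names are hypothetical.

```python
def objective(C, F, assignment, k):
    """Approximate average reduced EMD for a dimension-to-group assignment."""
    groups = [[a for a, g in enumerate(assignment) if g == j]
              for j in range(k)]
    total = 0
    for ga in groups:
        for gb in groups:
            if not ga or not gb:
                continue  # empty reduced dimension carries no flow
            c_min = min(C[a][b] for a in ga for b in gb)
            flow = sum(F[a][b] for a in ga for b in gb)
            total += c_min * flow
    return total


def local_search(C, F, assignment, k, first_profitable):
    """Reassign one dimension per iteration until nothing improves."""
    assignment = list(assignment)
    best = objective(C, F, assignment, k)
    while True:
        move = None  # (dimension, new group, objective value)
        for a in range(len(assignment)):
            old = assignment[a]
            for j in range(k):
                if j == old:
                    continue
                assignment[a] = j
                value = objective(C, F, assignment, k)
                assignment[a] = old  # undo the trial move
                if value > (move[2] if move else best):
                    move = (a, j, value)
                    if first_profitable:
                        break
            if move and first_profitable:
                break
        if move is None:
            return assignment, best  # local optimum reached
        assignment[move[0]] = move[1]
        best = move[2]


C = [[0, 1, 3, 4], [1, 0, 2, 3], [3, 2, 0, 1], [4, 3, 1, 0]]
F_avg = [[2, 1, 2, 3], [0, 1, 2, 1], [3, 2, 3, 1], [1, 3, 0, 1]]
base = [0, 0, 0, 0]  # "Base": everything in the first reduced dimension

print(local_search(C, F_avg, base, 2, first_profitable=True))   # ([1, 1, 0, 0], 34)
print(local_search(C, F_avg, base, 2, first_profitable=False))  # ([1, 1, 0, 0], 34)
```

On this small example both strategies end in the same grouping as the k-medoid reduction (with the group labels swapped). The first-profitable variant evaluates fewer candidate moves per iteration, while the best-move variant scans every reassignment before committing to one.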

13 Experimental Results
- Data-independent vs. data-dependent aggregation
[Figure: sample image [2]; panels: data independent (kMedoid), data dependent (FB-All-Mod), costliest flows]

14 Experimental Results
- Efficiency vs. reduced dimensionality (Retina DB)

15 Experimental Results
- Efficiency vs. reduced dimensionality (IRMA DB)

16 Experimental Results
- Filter & Refinement times and filter selectivity (IRMA DB)

17 Conclusion & Outlook
- Conclusion
  - Earth Mover’s Distance as a similarity measure
    - High quality, but computationally expensive in high dimensions
  - Dimensionality reduction for the EMD
    - Data-independent reduction: Clustering in feature space
    - Data-dependent reduction: Analyze flow information
- Outlook
  - Local reductions
  - Different reduction for query and DB
  - Index reduced histograms using [3]
[3] Assent, Wichterich, Meisen, Seidl, Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases, ICDE 2008.

