Metric Inverted - An efficient inverted indexing method for metric spaces
Benjamin Sznajder, Jonathan Mamou, Yosi Mass, Michal Shmueli-Scheuer
IBM Research - Haifa
Presented by: Shai Erera
Outline: Motivation, Problem Definition, Metric Inverted Index, Retrieval, Experiments, Conclusions
Motivation
Web 2.0 enables mass multimedia production; still, search is limited to manually added metadata.
State-of-the-art solutions for CBIR (Content-Based Image Retrieval) do not scale
– their cost grows linearly with collection size due to the large number of distance computations.
Can we use text IR methods to scale up CBIR?
Problem Definition
Low-level image features can be generalized to metric spaces.
Metric Space: an ordered pair (S, d), where S is a domain and d a distance function d: S × S → ℝ such that
– d satisfies non-negativity, reflexivity, symmetry and the triangle inequality
The best-k results for a query in a metric space are the k objects with the smallest distance to the query.
– Convert distances to scores in [0,1] (small distance maps to high score)
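A minimal sketch of the distance-to-score conversion. The slides do not fix a conversion formula, so 1 / (1 + d) is used here only as one common monotone choice that maps small distances to high scores in [0, 1]; the Euclidean distance stands in for any metric.

```python
import math

def euclidean(x, y):
    """Euclidean distance: a valid metric (non-negative, reflexive, symmetric, triangle inequality)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_to_score(d):
    """Map a distance to a score in [0, 1]: small distance, high score.
    The slides do not specify the formula; 1 / (1 + d) is one common monotone choice."""
    return 1.0 / (1.0 + d)
```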
Problem Definition
Top-K Problem:
– Assume m metric spaces, a query Q, an aggregate function f and score functions sd_i()
– Retrieve the best k objects D with the highest f(sd_1(Q,D), sd_2(Q,D), …, sd_m(Q,D))
(Figure: a query point q and its k = 5 best results.)
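A naive baseline sketch of the top-k problem: score every object in every metric space and aggregate with f. The dict layout, the 1/(1+d) score and sum as the aggregate are illustrative assumptions; this linear scan is exactly the cost the inverted index is designed to avoid.

```python
import heapq

def top_k(query, collection, dists, k=5, aggregate=sum):
    """Exact (brute-force) top-k over m metric spaces.
    `query` and each object in `collection` are dicts {feature: value};
    `dists` maps feature -> distance function."""
    def agg_score(obj):
        # distance -> score in (0, 1], then aggregate across the m features with f
        return aggregate(1.0 / (1.0 + dists[f](query[f], obj[f])) for f in query)
    return heapq.nlargest(k, collection, key=agg_score)
```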
Metric Inverted Index
Assume a collection of objects, each having m features
– Object D = {F_1:v_1, F_2:v_2, …, F_m:v_m}
– m metric spaces
Indexing steps
– Lexicon creation (select candidates)
– Invert objects (canonization to lexicon terms)
Metric Inverted Indexing – Lexicon Creation
The number of distinct feature values is too large; we need to select candidates.
– Naïve solution: a lexicon of fixed size l
Randomly select l/m documents and extract their features; these l feature values form the lexicon.
– Improvement
Replace the random choice by clustering (K-Means, etc.)
Keep the lexicon in an M-Tree structure
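A sketch of the clustering-based lexicon creation, assuming scikit-learn's KMeans as one possible implementation for one feature's metric space (K-Means assumes a Euclidean-like space and is only one of the clustering options the slide mentions). The paper keeps the resulting terms in an M-Tree for fast lookup; that structure is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_lexicon(feature_values, lexicon_size):
    """Cluster observed feature values and use the cluster centers as lexicon terms.
    `feature_values`: (N, dim) array of values sampled from one feature's metric space.
    A real implementation would insert the terms into an M-Tree; this sketch returns an array."""
    feature_values = np.asarray(feature_values, dtype=float)
    km = KMeans(n_clusters=lexicon_size, n_init=10, random_state=0).fit(feature_values)
    return km.cluster_centers_
```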
Metric Inverted Indexing – Invert Objects
Given object D = {F_1:v_1, F_2:v_2, …, F_m:v_m}
Canonization – map each feature (F_i:v_i) to lexicon entries
– For each feature, select the n nearest lexicon terms
– D' = {F_1:v_11, F_1:v_12, …, F_1:v_1n, F_2:v_21, F_2:v_22, …, F_2:v_2n, …, F_m:v_m1, F_m:v_m2, …, F_m:v_mn}
Index D' in the relevant posting lists
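A sketch of the inversion step. The `nearest_terms` helper does a brute-force scan where the paper uses the M-Tree lexicon; all names and the posting-list layout are illustrative.

```python
from collections import defaultdict

def nearest_terms(value, lexicon, dist, n):
    """Ids of the n lexicon terms closest to a feature value.
    (The paper does this lookup with an M-Tree; a linear scan keeps the sketch short.)"""
    return sorted(range(len(lexicon)), key=lambda t: dist(value, lexicon[t]))[:n]

def index_object(doc_id, features, lexicons, dists, n, postings):
    """Canonize object D into D' and append doc_id to the matching posting lists.
    `features` is {feature: value}; `postings` maps (feature, term_id) -> [doc ids]."""
    for f, value in features.items():
        for t in nearest_terms(value, lexicons[f], dists[f], n):
            postings[(f, t)].append(doc_id)

# the inverted index: one posting list per (feature, lexicon term)
postings = defaultdict(list)
```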
Retrieval Stage – Term Selection
Given query Q = {F_1:qv_1, F_2:qv_2, …, F_m:qv_m}
Canonization
– For each feature, select the n nearest lexicon terms
– Q' = {F_1:qv_11, F_1:qv_12, …, F_1:qv_1n, F_2:qv_21, F_2:qv_22, …, F_2:qv_2n, …, F_m:qv_m1, F_m:qv_m2, …, F_m:qv_mn}
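The query-side canonization mirrors the indexing step; this short sketch reuses the hypothetical `nearest_terms` helper from the indexing sketch above.

```python
def canonize_query(query, lexicons, dists, n):
    """Q -> Q': map each query feature value to its n nearest lexicon terms."""
    return {f: nearest_terms(qv, lexicons[f], dists[f], n) for f, qv in query.items()}
```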
Retrieval Stage – Boolean Filtering
These m·n posting lists are queried via a Boolean query.
Two possible modes:
– Strict-query-mode
– Fuzzy-query-mode
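The transcript omits the formulas for the two modes, so the sketch below encodes only a plausible reading, offered as an assumption: strict mode requires a document to match at least one canonized term of every feature, while fuzzy mode accepts any document that matches at least one term anywhere.

```python
def boolean_filter(canonized_query, postings, strict=True):
    """Candidate selection over the m*n posting lists.
    Assumed semantics (the formulas are missing from this transcript):
      strict mode - a document must appear in a posting list of *every* feature;
      fuzzy mode  - a document appearing in *any* posting list is a candidate."""
    per_feature = []
    for f, term_ids in canonized_query.items():
        hits = set()
        for t in term_ids:
            hits.update(postings.get((f, t), []))
        per_feature.append(hits)
    if not per_feature:
        return set()
    return set.intersection(*per_feature) if strict else set.union(*per_feature)
```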
Retrieval Stage – Scoring
Documents retrieved by the Boolean query are fully scored.
Return the best k objects with the highest aggregate score f(sd_1(Q,D), sd_2(Q,D), …, sd_m(Q,D))
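A sketch of the final scoring pass over the filtered candidates only, again assuming the 1/(1+d) score and sum as the aggregate f; only the surviving candidates pay the full distance-computation cost.

```python
import heapq

def score_candidates(query, candidate_ids, objects, dists, k, aggregate=sum):
    """Fully score only the documents that passed the Boolean filter and return
    the k with the highest aggregate score f(sd_1, ..., sd_m)."""
    def agg_score(doc_id):
        obj = objects[doc_id]
        return aggregate(1.0 / (1.0 + dists[f](query[f], obj[f])) for f in query)
    return heapq.nlargest(k, candidate_ids, key=agg_score)
```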
Experiments
Focus on:
– Efficiency
– Effectiveness
Collection of 160,000 images from Flickr
3 features are extracted from each image
– EdgeHistogram, ScalableColor and ColorLayout
180 queries
– Fuzzy-query-mode
– Sampled from the collection of images
Compared to the M-Tree data structure
Experiments – Measures Used
Effectiveness: MAP is a natural candidate for measurement
– Problem: in image retrieval, no document is strictly irrelevant
– Solution: we define as relevant the k highest-scored documents in the collection (according to the exact M-Tree computation)
– MAP@K: MAP computed on relevant and retrieved lists of size k
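A sketch of MAP@K as described: the relevant set for each query is the k best documents under the exact (M-Tree) computation, and average precision is computed over the top-k retrieved list. The exact AP variant used in the paper is not specified; this follows the standard cut-off definition.

```python
def average_precision_at_k(retrieved, relevant, k):
    """AP@k: average of precision@i over ranks i <= k where a relevant doc appears,
    normalized by min(k, |relevant|)."""
    hits, total = 0, 0.0
    for i, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i
    return total / min(k, len(relevant)) if relevant else 0.0

def map_at_k(runs, k):
    """MAP@k over (retrieved_list, relevant_set) pairs, one pair per query."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)
```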
Experiments – Measures Used (cont'd)
Efficiency: we count the number of computations per query
– A computation unit (cu) is one distance-computation call between two feature values
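A trivial, purely illustrative way to instrument the distance functions so that each call is counted as one computation unit (cu); the names are assumptions.

```python
def counted(dist, counter):
    """Wrap a distance function so every call adds one computation unit (cu)."""
    def wrapper(x, y):
        counter[0] += 1  # one cu per distance call
        return dist(x, y)
    return wrapper

cu = [0]
# usage sketch: dists = {f: counted(d, cu) for f, d in dists.items()}; run a query; read cu[0]
```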
Effectiveness: MAP vs. number of nearest terms (lexicon size = 12,000)
Effectiveness: MAP vs. lexicon size (number of nearest terms = 30)
Effectiveness vs. Efficiency: MAP vs. number of comparisons (number of nearest terms = 30)
M-Tree vs. Metric Inverted: number of comparisons vs. top-k (number of nearest terms = 30)
Conclusions
We reduce the gap between text IR and multimedia retrieval.
Our method achieves a very good approximation (MAP = 98%).
Our method drastically improves efficiency (by 90%) over state-of-the-art methods.