Published by Alexander Powers. Modified over 9 years ago.
1
Inference Network Approach to Image Retrieval
Don Metzler, R. Manmatha
Center for Intelligent Information Retrieval
University of Massachusetts, Amherst
2
Motivation
- Most image retrieval systems assume:
  - Implicit “AND” between query terms
  - Equal weight to all query terms
  - Query made up of a single representation (keywords or image)
- “tiger grass” => “find images of tigers AND grass where each is equally important”
- How can we search with queries made up of both keywords and images?
- How do we perform the following queries?
  - “swimmers OR jets”
  - “tiger AND grass, with more emphasis on tigers than grass”
  - “find me images of birds that are similar to this image”
3
Related Work
- Inference networks
- Semantic image retrieval
- Kernel methods
4
Inference Networks
- Inference Network Framework [Turtle and Croft ’89]
  - Formal information retrieval framework
  - INQUERY search engine
  - Allows structured queries
    - phrases, term weighting, synonyms, etc.
    - #wsum( 2.0 #phrase( image retrieval ) 1.0 model )
  - Handles multiple document representations (full text, abstracts, etc.)
- MIRROR [de Vries ’98]
  - General multimedia retrieval framework based on the inference network framework
  - Probabilities based on clustering of metadata + feature vectors
5
Image Retrieval / Annotation
- Co-occurrence model [Mori, et al]
- Translation model [Duygulu, et al]
- Correspondence LDA [Blei and Jordan]
- Relevance model-based approaches
  - Cross-Media Relevance Models (CMRM) [Jeon, et al]
  - Continuous Relevance Models (CRM) [Lavrenko, et al]
6
Goals
- Input
  - Set of annotated training images
  - User’s information need
    - Terms
    - Images
    - “Soft” Boolean operators (AND, OR, NOT)
    - Weights
  - Set of test images with no annotations
- Output
  - Ranked list of test images relevant to user’s information need
7
Data
- Corel data set †
  - 4500 training images (annotated)
  - 500 test images
  - 374 word vocabulary
- Each image automatically segmented using normalized cuts
- Each image represented as a set of representation vectors
  - 36 geometric, color, and texture features
  - Same features used in similar past work
† Available at: http://vision.cs.arizona.edu/kobus/research/data/eccv_2002/
8
Features
- Geometric (6)
  - area
  - position (2)
  - boundary/area
  - convexity
  - moment of inertia
- Color (18)
  - avg. RGB x 2 (6)
  - std. dev. of RGB (3)
  - avg. L*a*b x 2 (6)
  - std. dev. of L*a*b (3)
- Texture (12)
  - mean oriented energy, 30 deg. increments (12)
9
Image representation
- Example annotation: cat, grass, tiger, water
- Annotation vector (binary, same for each segment)
- Representation vector (real, 1 per image segment)
10
Image Inference Network
- J: representation vectors for image (continuous, observed)
- q_w: word w appears in annotation (binary, hidden)
- q_r: representation vector r describes image (binary, hidden)
- q_op: query operator satisfied (binary, hidden)
- I: user’s information need is satisfied (binary, hidden)
[Figure: J at the top feeds the nodes q_r1 … q_rk and q_w1 … q_wk; these feed the operator nodes q_op1, q_op2, which feed I. The “Image Network” (J and the q_r, q_w nodes) is fixed, based on the image; the “Query Network” (the q_op nodes and I) is dynamic, based on the query.]
11
Example Instantiation
[Figure: an #or node above an #and node, whose parents are the term nodes tiger and grass]
12
What needs to be estimated?
- P(q_w | J)
- P(q_r | J)
- P(q_op | J)
- P(I | J)
13
P(q_w | J) [e.g. P(tiger | image)]
- Probability that term w appears in the annotation given image J
- Apply Bayes’ rule and use non-parametric density estimation
- Assumes representation vectors are conditionally independent given that term w annotates the image
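The Bayes' rule step named on this slide can be sketched as follows. The slide's own equation was not preserved, so the notation here is an assumption: r_i ranges over the representation vectors of image J, and the product follows from the stated conditional-independence assumption.

```latex
P(q_w \mid J) \;=\; \frac{P(q_w)\, P(J \mid q_w)}{P(J)}
\;\propto\; P(q_w) \prod_{r_i \in J} P(r_i \mid q_w)
```

Under this reading, the only quantity left to estimate from training data is the per-vector likelihood P(r_i | q_w).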
14
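How can we compute P(r_i | q_w)?
[Figure: the training set representation vectors, with the representation vectors associated with images annotated by w highlighted. Regions dense in highlighted vectors are areas of high likelihood; regions with few of them are areas of low likelihood.]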
15
P(q_w | J) [final form]
- Σ assumed to be diagonal, estimated from training data
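A minimal sketch of how such an estimate could be computed, assuming a Gaussian kernel with diagonal covariance placed on every training vector from images annotated with w. The function names, the toy data, the uniform prior, and the normalization over the vocabulary are illustrative assumptions, not taken from the slides.

```python
import math

def gaussian_diag(x, mean, var):
    """Density at x of a Gaussian centred on `mean` with diagonal covariance `var`."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2.0 * math.pi * vi) - (xi - mi) ** 2 / (2.0 * vi)
    return math.exp(log_p)

def kde(r, training_vectors, var):
    """Kernel density estimate of P(r | q_w): average of kernels on the training vectors."""
    return sum(gaussian_diag(r, t, var) for t in training_vectors) / len(training_vectors)

def term_posteriors(image_vectors, vectors_by_term, prior, var):
    """P(q_w | J) for each term, normalised over the vocabulary (an assumption)."""
    log_scores = {}
    for w, train in vectors_by_term.items():
        score = math.log(prior[w])
        for r in image_vectors:
            score += math.log(kde(r, train, var) + 1e-300)  # guard against log(0)
        log_scores[w] = score
    top = max(log_scores.values())
    unnorm = {w: math.exp(s - top) for w, s in log_scores.items()}
    total = sum(unnorm.values())
    return {w: u / total for w, u in unnorm.items()}

# Toy 2-D data with made-up numbers (the real features are 36-dimensional).
vectors_by_term = {
    "tiger": [(0.9, 0.8), (1.0, 1.1), (1.1, 0.9)],
    "water": [(-1.0, -0.9), (-1.1, -1.0), (-0.8, -1.2)],
}
prior = {"tiger": 0.5, "water": 0.5}
var = (0.25, 0.25)                  # diagonal of Sigma
image_J = [(1.0, 1.0), (0.8, 0.9)]  # two segments of a test image
posteriors = term_posteriors(image_J, vectors_by_term, prior, var)
```

Working in log space keeps the product over segments from underflowing when an image has many representation vectors.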
16
Regularized estimates…
- P(q_w | J) are good, but not comparable across images
Image 1:
  term    P(q_w | J)
  cat     0.45
  grass   0.35
  tiger   0.15
  water   0.05
Image 2:
  term    P(q_w | J)
  cat     0.90
  grass   0.05
  tiger   0.01
  water   0.03
- Is the 2nd image really 2x more “cat-like”?
- Probabilities are relative per image
17
Regularized estimates…
- Impact transformations
  - Used in information retrieval
  - “Rank is more important than value” [Anh and Moffat]
- Idea:
  - rank each term according to P(q_w | J)
  - give higher probabilities to higher ranked terms
  - P(q_w | J) ≈ 1/rank(q_w)
- Zipfian assumption on relevant words
  - a few words are very relevant
  - a medium number of words are somewhat relevant
  - many words are not relevant
18
Regularized estimates…
Image 1:
  term    P(q_w | J)   1/rank
  cat     0.45         0.48
  grass   0.35         0.24
  tiger   0.15         0.16
  water   0.05         0.12
Image 2:
  term    P(q_w | J)   1/rank
  cat     0.90         0.48
  grass   0.05         0.24
  tiger   0.01         0.12
  water   0.03         0.16
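The 1/rank column can be reproduced with a short sketch. It assumes the 1/rank scores are normalized to sum to one over the listed terms, which matches the slide's numbers; how the full system normalizes over the whole vocabulary is not stated. The function name is illustrative.

```python
def rank_regularize(probs):
    """Replace each P(q_w | J) by a normalised 1/rank score (rank 1 = most probable)."""
    ordered = sorted(probs, key=probs.get, reverse=True)
    inv_rank = {w: 1.0 / (i + 1) for i, w in enumerate(ordered)}
    total = sum(inv_rank.values())
    return {w: v / total for w, v in inv_rank.items()}

image_1 = {"cat": 0.45, "grass": 0.35, "tiger": 0.15, "water": 0.05}
image_2 = {"cat": 0.90, "grass": 0.05, "tiger": 0.01, "water": 0.03}

reg_1 = {w: round(p, 2) for w, p in rank_regularize(image_1).items()}
reg_2 = {w: round(p, 2) for w, p in rank_regularize(image_2).items()}
```

After the transformation both images give "cat" the same score, so the regularized values depend only on the ordering of terms within each image.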
19
What needs to be estimated?
- P(q_w | J)
- P(q_r | J)
- P(q_op | J)
- P(I | J)
20
P(q_r | J)
- Probability that representation vector r is observed given J
- Use non-parametric density estimation again
  - Impose a density over J’s representation vectors, just as we did in the previous case
- Estimates may be poor
  - Based on a small sample (~10 representation vectors)
- Naïve and simple, yet somewhat effective
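One way to write the density described here, as a sketch only: the Gaussian kernel and the reuse of the diagonal Σ from the term estimate are assumptions, since the slide does not show the formula. Here |J| is the number of representation vectors of image J (roughly 10, per the slide).

```latex
P(q_r \mid J) \;\approx\; \frac{1}{|J|} \sum_{r_i \in J} \mathcal{N}\!\left(r;\; r_i,\; \Sigma\right)
```

With so few kernels per image the estimate is coarse, which is consistent with the slide's caveat that the estimates may be poor.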
21
Model Comparison Relevance modeling-based CMRM, CRM General form: Fully non-parametric Model used here General form:
22
What needs to be estimated?
- P(q_w | J)
- P(q_r | J)
- P(q_op | J)
- P(I | J)
23
Query Operators
- “Soft” Boolean operators
  - #and / #wand (weighted and)
  - #or
  - #not
- One node added to query network for each operator present in query
- Many others possible
  - #max, #sum, #wsum
  - #syn, #odn, #uwn, #phrase, etc.
24
#or( #and( tiger grass ) )
[Figure: an #or node above an #and node, whose parents are the term nodes tiger and grass]
25
Operator Nodes
- Combine probabilities from term and image nodes
- Closed forms derived from corresponding link matrices
- Allows efficient inference within network
- Par(q) = set of q’s parent nodes
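The slide's own formulas were not preserved. As a sketch, the closed forms commonly given for these operators in the inference network literature can be written as below, where each operator takes the beliefs of its parent nodes Par(q). The function names and example beliefs are hypothetical.

```python
import math

def op_and(beliefs):
    """#and: product of parent beliefs (parents assumed independent)."""
    return math.prod(beliefs)

def op_or(beliefs):
    """#or: one minus the probability that every parent is false."""
    return 1.0 - math.prod(1.0 - p for p in beliefs)

def op_not(belief):
    """#not: complement of the parent belief."""
    return 1.0 - belief

def op_wand(beliefs, weights):
    """#wand: weighted product, with weights normalised to sum to one."""
    total = sum(weights)
    return math.prod(p ** (w / total) for p, w in zip(beliefs, weights))

# Hypothetical node beliefs for one image J.
p_tiger, p_grass, p_water = 0.6, 0.5, 0.2

plain = op_and([p_tiger, p_grass])                  # tiger AND grass
weighted = op_wand([p_tiger, p_grass], [2.0, 1.0])  # more emphasis on tiger
either = op_or([plain, p_water])                    # #or( #and( tiger grass ) water )
```

Because each operator is a simple function of its parents' beliefs, scoring an image needs one pass up the query network rather than a sum over all parent configurations.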
26
… but where do they come from?
[Figure: parent nodes A and B with child node Q]
  P(Q=true | a, b)   A       B
  0                  false   false
  0                  false   true
  0                  true    false
  1                  true    true
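To make the link between the table and a closed form concrete, the sketch below marginalizes a link matrix over both parents, assuming A and B are independent. With the matrix above (1 only when both parents are true) the sum collapses to P(A) · P(B). The second matrix, for an OR node, and the belief values are added for illustration.

```python
from itertools import product

# P(Q = true | A = a, B = b) for an AND node, as in the table above.
AND_LINK = {
    (False, False): 0.0,
    (False, True): 0.0,
    (True, False): 0.0,
    (True, True): 1.0,
}

# Illustrative link matrix for an OR node (not shown on the slide).
OR_LINK = {
    (False, False): 0.0,
    (False, True): 1.0,
    (True, False): 1.0,
    (True, True): 1.0,
}

def belief(link, p_a, p_b):
    """P(Q = true): sum the link matrix over all parent states, weighted by their probability."""
    total = 0.0
    for a, b in product([False, True], repeat=2):
        weight = (p_a if a else 1.0 - p_a) * (p_b if b else 1.0 - p_b)
        total += link[(a, b)] * weight
    return total

p_a, p_b = 0.6, 0.5  # hypothetical parent beliefs
p_and = belief(AND_LINK, p_a, p_b)
p_or = belief(OR_LINK, p_a, p_b)
```

The explicit sum has 2^n terms for n parents, which is why having a closed form per operator matters for efficient inference.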
27
Results - Annotation
  Model                      Translation   CMRM   CRM    InfNet
  # words with recall > 0    49            66     107    117
  Results on full vocabulary:
  Mean per-word recall       0.04          0.09   0.19   0.24
  Mean per-word precision    0.06          0.10   0.16   0.17
  F-measure                  0.05          0.09   0.17   0.20
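The F-measure row is consistent with the harmonic mean of the mean per-word recall and precision reported above it. A short check, using only the numbers on this slide:

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision."""
    return 2.0 * recall * precision / (recall + precision)

# (mean per-word recall, mean per-word precision) from the slide.
results = {
    "Translation": (0.04, 0.06),
    "CMRM": (0.09, 0.10),
    "CRM": (0.19, 0.16),
    "InfNet": (0.24, 0.17),
}

f_scores = {model: round(f_measure(r, p), 2) for model, (r, p) in results.items()}
```

Rounded to two decimals, the computed values match the slide's F-measure row for all four models.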
28
foals (0.46), mare (0.33), horses (0.20), field (1.9E-5), grass (4.9E-6)
railroad (0.67), train (0.27), smoke (0.04), locomotive (0.01), ruins (1.7E-5)
sphinx (0.99), polar (5.0E-3), stone (1.0E-3), bear (9.7E-4), sculpture (6.0E-4)
29
Results - Retrieval
Precision @ 5 retrieved images
  Model        1 word    2 word    3 word
  CMRM         0.1989    0.1306    0.1494
  CRM          0.2480    0.1902    0.1888
  InfNet       0.2525    0.1672    0.1727
  InfNet-reg   0.2547    0.1964    0.2170
Mean Average Precision
  Model        1 word    2 word    3 word
  CMRM         0.1697    0.1642    0.2030
  CRM          0.2353    0.2534    0.3152
  InfNet       0.2484    0.2155    0.2478
  InfNet-reg   0.2633    0.2649    0.3238
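For reference, the two metrics in these tables can be sketched using their standard definitions. The ranked list and relevance judgments below are made up for illustration, and the slides do not say how ties or queries with no relevant images were handled.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k retrieved images that are relevant."""
    return sum(1 for img in ranked[:k] if img in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant image appears."""
    hits, total = 0, 0.0
    for i, img in enumerate(ranked, start=1):
        if img in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Average of per-query average precision; `runs` is a list of (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical query result: image ids in ranked order, and the relevant set.
ranked = ["a", "b", "c", "d", "e", "f"]
relevant = {"a", "c", "f"}

p_at_5 = precision_at_k(ranked, relevant)   # 2 of the top 5 are relevant
ap = average_precision(ranked, relevant)    # (1/1 + 2/3 + 3/6) / 3
```

Precision @ 5 looks only at the top of the ranking, while average precision rewards placing every relevant image early, so the two tables can order the models differently.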
32
Future Work
- Use rectangular segmentation and improved features
- Different probability estimates
  - Better methods for estimating P(q_r | J)
  - Use CRM to estimate P(q_w | J)
- Apply to documents with both text and images
- Develop a method/testbed for evaluating more “interesting” queries
33
Conclusions
- General, robust model based on inference network framework
- Departure from implied “AND” between query terms
- Unique non-parametric method for estimating network probabilities
- Pros
  - Retrieval (inference) is fast
  - Makes no assumptions about distribution of data
- Cons
  - Estimation of term probabilities is slow
  - Requires sufficient data to get a good estimate