Using Probabilistic Models for Multimedia Retrieval Arjen P. de Vries (Joint research with Thijs Westerveld) Centrum voor Wiskunde en Informatica E-BioSci/ORIEL Annual Workshop, Sep 3-5, 2003

Introduction [example images: Eiffel tower vs. scary/spooky Eiffel tower]

Outline Generative Models –Generative Model –Probabilistic retrieval –Language models, GMMs Experiments –Corel experiments –TREC Video benchmark Conclusions

What is a Generative Model? A statistical model for generating data –Probability distribution over samples in a given ‘language’ M P ( | M )= P ( | M ) P ( | M, ) © Victor Lavrenko, Aug. 2002

Generative Models video of Bayesian model to that present the disclosure can a on for retrieval in have is probabilistic still of for of using this In that is to only queries queries visual combines visual information look search video the retrieval based search. Both get decision (a visual generic results (a difficult We visual we still needs, search. talk what that to do this for with retrieval still specific retrieval information a as model still LM abstract

Unigram and higher-order models Unigram Models N-gram Models Other Models –Grammar-based models, etc. –Mixture models = P ( ) P ( | ) P ( ) P ( ) P ( ) P ( ) P ( | ) P ( | ) P ( | ) © Victor Lavrenko, Aug. 2002
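A minimal toy sketch (not from the talk) to make the unigram vs. n-gram distinction concrete; the vocabulary and all probabilities below are invented for illustration.

```python
# Toy sketch: probability of a word sequence under a unigram model vs. a
# bigram model. All numbers are made up for illustration only.
from functools import reduce

unigram = {"video": 0.4, "retrieval": 0.3, "model": 0.2, "query": 0.1}

bigram = {  # P(next word | previous word)
    ("video", "retrieval"): 0.5,
    ("retrieval", "model"): 0.4,
    ("model", "query"): 0.2,
}

def p_unigram(words):
    # Independence assumption: P(w1..wn) = product of P(wi)
    return reduce(lambda p, w: p * unigram[w], words, 1.0)

def p_bigram(words):
    # First word from the unigram model, the rest conditioned on the previous word
    p = unigram[words[0]]
    for prev, cur in zip(words, words[1:]):
        p *= bigram[(prev, cur)]
    return p

seq = ["video", "retrieval", "model", "query"]
print(p_unigram(seq))  # 0.4 * 0.3 * 0.2 * 0.1
print(p_bigram(seq))   # 0.4 * 0.5 * 0.4 * 0.2
```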

The fundamental problem Usually we don’t know the model M –But have a sample representative of that model First estimate a model from a sample Then compute the observation probability P ( | M ( ) ) M © Victor Lavrenko, Aug. 2002

Indexing: determine models Indexing –Estimate Gaussian Mixture Models from images using EM –Based on feature vector with colour, texture and position information from pixel blocks –Fixed number of components DocsModels
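As a rough sketch of this indexing step, the following uses scikit-learn's GaussianMixture as a stand-in for the EM estimation described above; the feature-extraction helper and the choice of 8 diagonal-covariance components are assumptions (the tuning experiments later vary the number of components).

```python
# Minimal sketch: estimate a fixed-size Gaussian Mixture Model per image with
# EM, as a stand-in for the indexing step on this slide. Assumes
# extract_block_features(image) returns an (n_blocks, n_features) array of
# colour/texture/position features (see the key-frame sketch further below).
import numpy as np
from sklearn.mixture import GaussianMixture

N_COMPONENTS = 8  # fixed number of components, as in the tuning experiments

def build_image_model(block_features: np.ndarray) -> GaussianMixture:
    gmm = GaussianMixture(n_components=N_COMPONENTS, covariance_type="diag",
                          max_iter=100, random_state=0)
    gmm.fit(block_features)  # EM estimation of weights, means, covariances
    return gmm

# index = {doc_id: build_image_model(extract_block_features(img))
#          for doc_id, img in collection.items()}
```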

Retrieval: use query likelihood Query: Which of the models is most likely to generate these 24 samples?

Probabilistic Image Retrieval ?

Query: rank by P(Q|M): P(Q|M1), P(Q|M4), P(Q|M3), P(Q|M2)

Probabilistic Retrieval Model Text –Rank using probability of drawing query terms from document models Images –Rank using probability of drawing query blocks from document models Multi-modal –Rank using joint probability of drawing query samples from document models
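A hedged sketch of this ranking step for the visual side: each query block is scored under a document's GMM and the log-likelihoods are summed; the `index` structure and identifiers are illustrative.

```python
# Sketch: rank indexed document models by the log-likelihood of the query
# samples. For images, each query block is scored under the document's GMM.
import numpy as np

def log_likelihood(gmm, query_blocks: np.ndarray) -> float:
    # score_samples returns log p(block | GMM) per query block; the sum is the
    # log of the joint probability of drawing all query blocks independently
    return float(np.sum(gmm.score_samples(query_blocks)))

def rank_documents(index: dict, query_blocks: np.ndarray):
    scores = {doc_id: log_likelihood(gmm, query_blocks)
              for doc_id, gmm in index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```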

Text Models Unigram Language Models (LM) –Urn metaphor: P(query) ~ P(•) P(•) P(•) P(•) = 4/9 * 2/9 * 4/9 * 3/9 © Victor Lavrenko, Aug. 2002

Generative Models and IR Rank models (documents) by the probability of generating the query Q; for four example document models: P(Q|M) = 4/9 * 2/9 * 4/9 * 3/9 = 96/9^4, 3/9 * 3/9 * 3/9 * 3/9 = 81/9^4, 2/9 * 3/9 * 2/9 * 4/9 = 48/9^4, 2/9 * 5/9 * 2/9 * 2/9 = 40/9^4

The Zero-frequency Problem Suppose some event not in our example –Model will assign zero probability to that event –And to any set of events involving the unseen event Happens frequently with language It is incorrect to infer zero probabilities –Especially when dealing with incomplete samples ?

Smoothing Idea: shift part of probability mass to unseen events Interpolation with background (General English): P(t) = λ P(t|document) + (1−λ) P(t|background) –Reflects expected frequency of events –Plays role of IDF
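A minimal sketch of such interpolation smoothing for the textual models, assuming a simple term-count representation; the value of λ is an illustrative choice, not the one used in the experiments.

```python
# Sketch of interpolation smoothing for the textual language models:
# P(t | D) is mixed with a background ("general English") model so unseen
# query terms no longer get zero probability. LAMBDA is a free parameter.
import math
from collections import Counter

LAMBDA = 0.85  # weight of the document model (illustrative value)

def smoothed_query_loglik(query_terms, doc_terms, collection_counts, collection_size):
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_doc = doc_counts[t] / doc_len if doc_len else 0.0
        p_bg = collection_counts.get(t, 0) / collection_size
        p = LAMBDA * p_doc + (1 - LAMBDA) * p_bg
        if p > 0:  # terms absent from the whole collection are skipped
            # the background term plays a role similar to IDF: rare terms that
            # do occur in the document contribute more to the score
            score += math.log(p)
    return score
```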

Image Models Urn metaphor not useful –Drawing pixels useless: pixels carry no semantics –Drawing pixel blocks not effective: chances of drawing the exact query blocks from a document are slim Use Gaussian Mixture Models (GMM) –Fixed number of Gaussian components/clusters/concepts

Image Models Expectation-Maximisation (EM) algorithm –Iteratively: estimate component assignments (E step), then re-estimate component parameters (M step)
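To make the two steps concrete, here is a toy sketch of a single EM iteration for a one-dimensional Gaussian mixture; the actual models are multi-dimensional and estimated over the block features described on the following slides.

```python
# Toy sketch of one EM iteration for a 1-D Gaussian mixture: the E step
# computes soft component assignments, the M step re-estimates parameters.
import numpy as np
from scipy.stats import norm

def em_step(x, weights, means, stds):
    """One EM iteration; x is a 1-D numpy array of samples."""
    # E step: responsibility of each component for each sample
    resp = np.vstack([w * norm.pdf(x, m, s)
                      for w, m, s in zip(weights, means, stds)])
    resp /= resp.sum(axis=0, keepdims=True)
    # M step: re-estimate the component parameters from the soft assignments
    nk = resp.sum(axis=1)
    weights = nk / len(x)
    means = (resp @ x) / nk
    stds = np.sqrt(np.sum(resp * (x[None, :] - means[:, None]) ** 2, axis=1) / nk)
    return weights, means, stds
```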

Expectation Maximization [illustration: Component 1, Component 2, Component 3; alternating E and M steps]

[Animation: EM iterations for Component 1, Component 2, Component 3]

Key-frame representation: split colour channels (Y, Cb, Cr), take samples (pixel blocks), compute DCT coefficients and position per sample, run the EM algorithm to estimate the query/document model
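A sketch of this feature-extraction pipeline under stated assumptions: 8x8 pixel blocks, 10 DCT coefficients from the Y channel and one from each chroma channel, plus the normalised block position (the tuning slides vary exactly these settings; the helper names are hypothetical).

```python
# Sketch of the key-frame feature extraction on this slide: YCbCr channels,
# 8x8 pixel blocks, a few DCT coefficients per channel, plus block position.
# Block size and the numbers of kept coefficients (NY, NCBCR) are assumptions.
import numpy as np
from scipy.fftpack import dct

BLOCK, NY, NCBCR = 8, 10, 1  # 8x8 blocks, 10 Y coefficients, 1 per chroma channel

def dct2(block):
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def lowfreq_coeffs(block, n):
    # keep the first n coefficients in diagonal (zigzag-like) order
    c = dct2(block)
    order = sorted(((i, j) for i in range(BLOCK) for j in range(BLOCK)),
                   key=lambda ij: (ij[0] + ij[1], ij[0]))
    return [c[i, j] for i, j in order[:n]]

def extract_block_features(ycbcr_image):
    """ycbcr_image: (H, W, 3) array with Y, Cb, Cr channels."""
    h, w, _ = ycbcr_image.shape
    feats = []
    for y in range(0, h - BLOCK + 1, BLOCK):
        for x in range(0, w - BLOCK + 1, BLOCK):
            blk = ycbcr_image[y:y + BLOCK, x:x + BLOCK, :].astype(float)
            f = lowfreq_coeffs(blk[:, :, 0], NY)        # intensity/texture (Y)
            f += lowfreq_coeffs(blk[:, :, 1], NCBCR)    # colour (Cb)
            f += lowfreq_coeffs(blk[:, :, 2], NCBCR)    # colour (Cr)
            f += [x / w, y / h]                         # normalised position
            feats.append(f)
    return np.array(feats)
```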

Scary Formulas

Probabilistic Retrieval Model Find document(s) D* with highest probability given query Q (MAP, maximum a posteriori) Equal priors -> maximum likelihood (ML) Approximated by minimum Kullback-Leibler divergence
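The formulas themselves appear only as images in the slides; the standard formulation consistent with the text above is:

```latex
% Rank by the posterior; with equal document priors this reduces to the
% query likelihood, which can be approximated by minimising the KL
% divergence from the (empirical) query model to the document model.
D^{*} = \arg\max_{D} P(D \mid Q)
      = \arg\max_{D} \frac{P(Q \mid D)\,P(D)}{P(Q)}
      = \arg\max_{D} P(Q \mid D) \quad \text{(equal priors: ML)}

D^{*} \approx \arg\min_{D} \mathrm{KL}\bigl(P(\cdot \mid Q)\,\|\,P(\cdot \mid D)\bigr)
      = \arg\min_{D} \sum_{t} P(t \mid Q)\,\log\frac{P(t \mid Q)}{P(t \mid D)}
```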

Query –Bag of textual terms –Bag of visual blocks Query model –empirical query distribution KL distance Query Models

Corel Experiments

Testing the Model on Corel 39 classes, ~100 images each Build models from all images Use each image as query –Rank full collection –Compute MAP (mean average precision) AP = average of precision values after each relevant image is retrieved; MAP is the mean of AP over multiple queries –Relevant = images from the query class
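A small sketch of these measures as defined above; here relevant images that are never retrieved contribute zero precision, the usual TREC-style convention.

```python
# Sketch of the evaluation measures: AP averages the precision values at the
# rank of each retrieved relevant item; MAP is the mean of AP over queries.
def average_precision(ranked_ids, relevant_ids):
    relevant_ids = set(relevant_ids)
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    # dividing by the total number of relevant items makes missed relevant
    # items count as zero precision
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    aps = [average_precision(ranked, rel) for ranked, rel in runs]
    return sum(aps) / len(aps)
```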

Example results Query: Top 5:

MAP per Class (mean: .12) English Pub Signs .36, English Country Gardens .33, Arabian Horses .31, Dawn & Dusk .21, Tropical Plants .19, Land of the Pyramids .19, Canadian Rockies .18, Lost Tribes .17, Elephants .17, Tigers .16, Tropical Sea Life .16, Exotic Tropical Flowers .16, Lions .15, Indigenous People .15, Nesting Birds .13, … Sweden .07, Ireland .07, Wildlife of the Galapagos .07, Hawaii .07, Rural France .07, Zimbabwe .07, Images of Death Valley .07, Nepal .07, Foxes & Coyotes .06, North American Deer .06, California Coasts .06, North American Wildlife .06, Peru .05, Alaskan Wildlife .05, Namibia .05

Class confusion Query from class A Relevant  from class B Queries retrieve images from own class Interesting mix-ups –Beaches – Greek islands –Indigenous people – Lost tribes –English country gardens – Tropical plants – Arabian Horses Similar backgrounds

Tuning the Models Yet another subset of Corel data –39 classes, 10 images each –Index as before and calculate MAP Vary model parameters –NY: Number of DCT coefficients from Y channel (1,3,6,10,15,21) –NCbCr: Number of DCT coefficients from Cb and Cr channels (0,1,NY) –Xypos: Do/do not use position of samples –C: number of components in GMM (1,2,4,8,16,32)

Example Image

Example models + samples Varying C, NY=10, NCbCr=1, Xypos=1: C=4, C=8, C=32

Example models + samples Varying NCbCr, NY=10, Xypos=1, C=8: NCbCr=0, NCbCr=1, NCbCr=10

MAP with different parameters [table: NCbCr and Xypos settings vs. number of components C = 1, 2, 4, 8, 16, 32]

Statistical Significance Mixture better than single Gauss (C>1) Small differences between settings –Yet, small differences might be significant Wilcoxon signed-rank test (significance level 5%) [worked example table with per-query scores of systems A and B, their differences, ranks and signed ranks; Z+ = 9, Z- = 6]
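A sketch of running this test on paired per-query average-precision values with SciPy; the AP arrays below are placeholders, not values from the experiments.

```python
# Sketch: Wilcoxon signed-rank test on paired per-query AP values of two
# settings A and B, at a 5% significance level.
from scipy.stats import wilcoxon

ap_a = [0.36, 0.33, 0.31, 0.21, 0.19]  # per-query AP for setting A (placeholder)
ap_b = [0.30, 0.35, 0.28, 0.25, 0.15]  # per-query AP for setting B (placeholder)

stat, p_value = wilcoxon(ap_a, ap_b)
print(f"W={stat:.1f}, p={p_value:.3f}",
      "-> significant at 5%" if p_value < 0.05 else "-> not significant")
```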

Statistical Significance Results –Optimal number of components at C=8 Fewer components -> insufficient resolution More components -> overfitting –Colour information is important (NCbCr >0) More is better if enough components –Position information undecided although using it never harms

Background Matching Query: Top 5:

Background Matching Query: Top 5:

TREC Experiments

TREC Video Track Goal: Promote progress in content-based video retrieval via metric based evaluation 25 Topics –Multimedia descriptions of an information need; 22 had video examples (avg. 2.7 each), 8 had image (avg. 1.9 each) Task is to return up to 100 best shots –NIST assessors judged top 50 shots from each submitted result set; subsequent full judgements showed only minor variations in performance

Video Data Used mainly Internet Archive –advertising, educational, industrial, amateur films –Noisy, strange color, but real archive data –73.3 hours, partitioned as follows:

Video Representation Video as sequence of shots (all TREC) –Common ground truth shot set used in evaluation; 14,524 shots Shot = image + text (CWI specific) : –Key-frame (middle frame of shot) –ASR Speech Transcript (LIMSI)

Search Topics Requesting shots with specific or generic: – People, Things, Locations, Activities George Washington / Football players

Search Topics Requesting shots with specific or generic: –People, Things, Locations, Activities Golden Gate Bridge / Sailboats

Search Topics Requesting shots with specific or generic: –People, Things, Locations, Activities Overhead views of cities

Search Topics Requesting shots with specific or generic: –People, Things, Locations, Activities Rocket taking off

Search Topics Summary Requested shots with specific/generic: –Combinations of the above: People spending leisure time at the beach Locomotive approaching the viewer Microscopic views of living cells

Experiments …with official TREC measures –Query representation –Textual/Visual/Combined runs …without measures; inspecting visual similarity –Selecting components –Colour vs. texture –EM initialisation

Measures Precision –fraction of retrieved documents that is relevant Recall –fraction of relevant documents that is retrieved Average Precision –precision averaged over different levels of recall Mean Average Precision (MAP) –mean of average precision over all queries

Textual and Visual runs Textual –Short Queries (Topic description) –Long Queries (Topic description + transcripts from video examples) Visual –All examples –Best examples Combined –Simply add textual and visual log-likelihood scores (joint probability of seeing both query terms and query blocks)
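A minimal sketch of the combined run: adding per-shot log-likelihoods corresponds to the joint probability of drawing both the query terms and the query blocks, assuming the two modalities are independent; the score dictionaries are illustrative.

```python
# Sketch of the combined run: sum the textual and visual log-likelihoods per
# shot (joint probability under an independence assumption), then re-rank.
def combine_runs(text_scores: dict, visual_scores: dict):
    shots = text_scores.keys() & visual_scores.keys()
    combined = {s: text_scores[s] + visual_scores[s] for s in shots}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```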

Textual and Visual runs Textual > Visual Tlong > Tshort Combining overall not useful If both visual and textual runs are good, combining improves

Visual runs Scores for purely visual runs low (MAP .037) Drop further when video examples are removed from relevance judgements

Observation Content-based retrieval (CBR) successful under two conditions: –the query example is derived from the same source as the target objects –a domain-specific detector is at hand

vt076: Find shots with James H. Chandler Top 10:

Retrieval Results Non-interactive results disappointing –MAP across all participants/systems .056 –Ignoring ASR runs, MAP drops to .044 Only known-item retrieval possible –MAP for queries with examples from collection .094 –MAP without these .026 (-40% from average) No significant differences between variants

Selecting Query Images Find shots of the Golden Gate Bridge Full topic –use all examples Best example –compute results for individual examples and find best Manual example –manually select good example from ones available in topic

Selecting Query Images In general Best > Full (MAP full: , best: 0.444) Sometimes Full > Best

Selecting Components Query articulation can improve retrieval effectiveness, but requires enormous user effort [lowlands2001] Document models (GMMs) allow for easy selection of important regions [LL10]

Selecting Components For each topic we manually selected meaningful components No improvement in MAP Perhaps useful for more general queries (feature detection?) –Further investigation necessary

Component Search

[Result key-frames at ranks 1-3 and rank 18]

Being lucky… 1-3: visually similar by chance Rel.: visually NOT similar; key-frame does not represent shot

Informal Results Analysis Forget about MAP scores Investigate two aspects of experimental results –How is image similarity captured? Look at top 10 results –How do visual results contribute to (MAP) scores? Look at key-frames from relevant shots in top 100 Qualitative observations

Some Observations Colour dominates texture Homogeneous Queries –Semantically similar results –…or at least visually similar Heterogeneous queries –Results dominated by subset of query

Some Observations Colour dominates texture

Some Observations Colour dominates texture Homogeneous queries give intuitive results –Semantically similar –... or at least visually

Homogeneous query with semantics

Homogeneous query: no semantics, but visual similarity Top 5 audience / Top 5 grass [columns: full query, audience component, grass component]

Some Observations Colour dominates texture Homogeneous queries give intuitive results –Semantically similar –... or at least visually Results for heterogeneous queries often dominated by part of samples

Heterogeneous query – full query [result key-frames with their models]

Heterogeneous query – grass samples only [result key-frames with their models]

Heterogeneous query Possible explanations for the domination by sky samples: –no document in the collection explains the grass samples well –sky samples are well explained by any document (i.e. background probability is high) Smoothing with background probabilities might help

Heterogeneous queries with smoothing [result key-frames with their models] Smoothing seems to help somewhat, but the problem is not solved Looking for a model that favors documents with balanced individual sample scores

Controlled Experiments What determines visual similarity in the generative probabilistic model? Small special-purpose collections created from the large TREC video collection: 1. Emphasis on colour information 2. Role of initialisation of the mixture models

Colour Experiments Collection with 2 copies of each frame –Original colour image –Greyscale version Build models –Models can describe colour and texture Search using colour and greyscale queries

Colour Experiments Build paired models M_1A/M_1B … M_NA/M_NB for the colour and greyscale versions of each frame; for each pair, compare P(query | M_iA) with P(query | M_iB)

Distance between pairs of models, without colour information: mean ranks 2.9 and 2.0

Distance between pairs of models, with colour information: mean ranks 89.7 and 7.3 Indeed, colour dominates texture

Colour Experiments Conclusion: –Model from colour image only captures colour information (mean ranks for the different query/model combinations: 1, 7.3, 89.7)

EM initialisation EM sensitive to initialisation –Build collection with several models for each frame –Compare scores for different models from same frame –Concentrate on top ranks

EM initialisation Collection with: –2 Videos –5 frames / shot –10 models / frame From random initialisations Models from same frame should have similar scores
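A hedged sketch of this setup using scikit-learn's GaussianMixture as a stand-in for the original EM implementation: fit several models to the same frame's block features from different random initialisations and compare the scores they assign to a query.

```python
# Sketch of the initialisation experiment: several GMMs are fitted to the
# same frame's block features from different random seeds ("10 models per
# frame"); similar query scores indicate ranking is robust to initialisation.
import numpy as np
from sklearn.mixture import GaussianMixture

def models_from_random_inits(block_features, n_models=10, n_components=8):
    return [GaussianMixture(n_components=n_components, covariance_type="diag",
                            init_params="random", random_state=seed
                            ).fit(block_features)
            for seed in range(n_models)]

def query_scores(models, query_blocks):
    # Log-likelihood of the query under each of the models of one frame
    return np.array([m.score_samples(query_blocks).sum() for m in models])
```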

EM initialisation

Results –Models from query frame all near the top of the list (mean rank: 8.06, std. dev. 5.95) –Models from same shot closer together than models from other frames –In general: higher-ranking frames have their models closer together Although EM is sensitive to initialisation, this does not affect ranking much

Concluding Remarks

Lessons TREC-10 Generalization remains a problem –Good results require examples from the collection Textual search outperforms visual search –Even with topics designed for visual retrieval! Successful visual retrieval often comes down to luck (background match, known-item) Combining textual and visual results is possible in the presented framework –When both have reasonable performance, the combination outperforms the individual runs

Lessons TREC-10 Component queries retrieve intuitive results Convenient for query articulation! Color dominates texture Sensitivity of EM to initialization does not harm results Note: findings are specific to this model, but at least suggest hypotheses for others to investigate

Need 4 Test Collections Results on one collection do not automatically transfer to another –Multiple collections needed to conclude one technique is better than another What is a good Test Collection? –Should be representative of a realistic task This is what TREC tries to achieve –Results should be measurable Like when using Corel

Plans for TREC-11 Better video representation –More frames per shot –Audio GMM (on MFCC) Spatial and temporal aspects –Shot = background + “objects” Special research interest in the right balance between interactive query articulation and (semi-)automatic query formulation

Future plans Balancing results for heterogeneous queries Propagating generic concepts

Care for more? A Probabilistic Multimedia Retrieval Model and Its Evaluation, Thijs Westerveld, Arjen de Vries, Alex van Ballegooij, Franciska de Jong and Djoerd Hiemstra, EURASIP Journal on Applied Signal Processing, 2003:2