TRECVID 2004
Probabilistic Approaches to Video Retrieval
The Lowlands Team at TRECVID 2004
Tzvetanka (‘Tzveta’) I. Ianeva Lioudmila (‘Mila’) Boldareva Thijs Westerveld Roberto Cornacchia Djoerd Hiemstra (the 1 and only) Arjen P. de Vries

Generative Models…
A statistical model for generating data
–Probability distribution over samples in a given ‘language’ M
P(· | M) = P(· | M) P(· | M) P(· | M, ·)
aka ‘Language Modelling’
© Victor Lavrenko, Aug

… in Information Retrieval
Basic question:
–What is the likelihood that this document is relevant to this query?
P(rel|I,Q) = P(I,Q|rel) P(rel) / P(I,Q)
P(I,Q|rel) = P(Q|I,rel) P(I|rel)
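
Substituting the second identity into the first makes the query-generation factor explicit. The block below is only that algebraic step written out; it adds no assumptions beyond the two formulas on the slide.

```latex
\begin{align*}
P(\mathrm{rel}\mid I,Q)
  &= \frac{P(I,Q\mid \mathrm{rel})\,P(\mathrm{rel})}{P(I,Q)} \\
  &= \frac{P(Q\mid I,\mathrm{rel})\,P(I\mid \mathrm{rel})\,P(\mathrm{rel})}{P(I,Q)}
\end{align*}
```

The factor P(Q|I,rel) is the term that a query-generation model scores for each document.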

Retrieval (Query generation) Models
[Figure: the query is scored against the model of each document: P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4)]
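
A minimal sketch of query-generation ranking, assuming each document model is a unigram text model with additive smoothing. The toy documents, the `mu` constant and the function names are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab, mu=1.0):
    """Unigram model with additive smoothing; `mu` is a hypothetical constant."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}

def log_query_likelihood(query, model):
    """log P(Q|M) under the assumption that query terms are drawn independently."""
    return sum(math.log(model[w]) for w in query)

# Toy collection (made up for illustration).
docs = {
    "M1": "goal scored in the football match".split(),
    "M2": "the weather forecast predicts rain".split(),
    "M3": "football fans watch the match".split(),
}
query = "football match".split()

vocab = sorted({w for d in docs.values() for w in d} | set(query))
models = {name: unigram_model(tokens, vocab) for name, tokens in docs.items()}

# Rank documents by the likelihood that their model generates the query.
ranking = sorted(models,
                 key=lambda name: log_query_likelihood(query, models[name]),
                 reverse=True)
print(ranking)
```

Working in log space avoids underflow when queries or visual samples contain many terms.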

‘Language Modelling’
Not just ‘English’
But also, the language of
–author
–newspaper
–text document
–image
Hiemstra or Robertson?
‘Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing.’

‘Language Modelling’
Guardian or Times?
Not just ‘English’
But also, the language of
–author
–newspaper
–text document
–image

‘Language Modelling’
[image] or [image]?
Not just English!
But also, the language of
–author
–newspaper
–text document
–image

Application to Video Retrieval
Matching against multiple modalities gives robustness
–GMM of shot (‘dynamic’) or key-frame (‘static’)
–MNM of associated text (ASR)
–Assume scores for both modalities are independent
Merge the results of multiple examples in round-robin (RR) fashion
Interactive search is much more successful than manual search: the role of the user is very important

TRECVID 2004: Research Questions Pursued
Modelling video content:
–How to best model the visual content?
–How to best model the textual content?
Does audio-visual content modelling contribute to better retrieval results?
–Both in manual and interactive?
How to translate the topic into a query?

Experimental Set-up
Build models for each shot
–Static, Dynamic, Language
Build queries from topics
–Automatic as well as manually constructed simple keyword text queries
–Select visual example

Modelling Visual Content

Static Model
[Figure: Docs → Models]
Indexing
–Estimate Gaussian Mixture Models from images using EM
–Based on feature vectors with colour, texture and position information from pixel blocks
–Fixed number of components

Static Model
Indexing
–Estimate a Gaussian Mixture Model from each keyframe (using EM)
–Fixed number of components (C=8)
–Feature vectors contain colour, texture, and position information from pixel blocks
[Feature-vector illustration not preserved in this transcript]
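
A minimal EM sketch for fitting a Gaussian mixture to block feature vectors. It assumes diagonal covariances, a variance floor (`min_var`), initial means drawn from the data, and synthetic 2-D toy data with 2 components to stay short. The slide fixes C=8 and does not give these details, so all of them are illustrative assumptions.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def fit_gmm(data, n_components, n_iter=20, seed=0, min_var=1e-3):
    """EM for a diagonal GMM. Returns weights, means, variances and the
    log-likelihood measured at the start of each iteration."""
    rng = random.Random(seed)
    dim = len(data[0])
    means = [list(x) for x in rng.sample(data, n_components)]
    variances = [[1.0] * dim for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    trace = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp, total = [], 0.0
        for x in data:
            logs = [math.log(w) + log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(l - top) for l in logs))
            total += norm
            resp.append([math.exp(l - norm) for l in logs])
        trace.append(total)
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(n_components):
            mass = sum(r[c] for r in resp) + 1e-12
            weights[c] = mass / len(data)
            means[c] = [sum(r[c] * x[d] for r, x in zip(resp, data)) / mass
                        for d in range(dim)]
            variances[c] = [
                max(min_var,
                    sum(r[c] * (x[d] - means[c][d]) ** 2
                        for r, x in zip(resp, data)) / mass)
                for d in range(dim)]
    return weights, means, variances, trace

# Synthetic stand-in for the block features of one keyframe.
toy_rng = random.Random(1)
blocks = ([[toy_rng.gauss(0.0, 0.5), toy_rng.gauss(0.0, 0.5)] for _ in range(50)] +
          [[toy_rng.gauss(5.0, 0.5), toy_rng.gauss(5.0, 0.5)] for _ in range(50)])
weights, means, variances, trace = fit_gmm(blocks, n_components=2)
print(weights, means)
```

Because the initial means are sampled at random, different seeds can give different fits, which is the sensitivity to initialization that more training data is meant to reduce.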

Dynamic Model
Indexing: GMM of multiple frames around keyframe
Feature vectors extended with time-stamp normalized in [0,1]
[Figure: time axis with ticks at 0.5 and 1]
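
A small sketch of how the time-stamp extension could look, assuming the frames of a shot are given in temporal order and that the time-stamp is spread linearly over [0,1]. The function name and the linear mapping are assumptions; the slide only states that the time-stamp is normalized.

```python
def add_normalized_time(frames):
    """Append a time-stamp in [0, 1] to every block feature vector.

    `frames` is a list of frames in temporal order; each frame is a list of
    per-block feature vectors. A single frame gets time-stamp 0.5.
    """
    n = len(frames)
    extended = []
    for i, blocks in enumerate(frames):
        t = i / (n - 1) if n > 1 else 0.5
        extended.extend(list(vec) + [t] for vec in blocks)
    return extended

# Three frames around a keyframe, one block each (toy values).
frames = [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]]]
samples = add_normalized_time(frames)
print(samples)
```

With an odd number of frames centred on the keyframe, the keyframe blocks end up at time-stamp 0.5, and all samples can then be pooled into one mixture model for the shot.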

Examples

Dynamic vs. Static
Dynamic model
–Retrieves more relevant shots (227 vs. 212)
–Places these higher in the result lists (MAP [value missing] vs. [value missing])
Topic 142 (has example from collection)
–Dynamic finds 15 relevant vs. static 3

Example: Topic 136
Dynamic rank 1-4 (8 found): [images not preserved]
Static rank 1-4 (4 found): [images not preserved]

Dynamic Model Advantages
More training data for models
–Less sensitive to random initialization
Reduced dependency upon selecting appropriate keyframe
Spatio-temporal aspects of shot are captured

Modelling Textual Content

Hierarchical Language Model
MNM smoothed over multiple levels:
Alpha * P(T|Shot) + Beta * P(T|‘Scene’) + Gamma * P(T|Video) + (1 - Alpha - Beta - Gamma) * P(T|Collection)
Additional video level is beneficial
–On 2003 data, [value missing] vs. [value missing]
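
The interpolation above can be sketched as follows, assuming each level is estimated by maximum likelihood from its token list. The weights 0.4, 0.3 and 0.2 and the toy token lists are made up for illustration; the slides do not report the values used.

```python
def hierarchical_prob(term, shot, scene, video, collection, alpha, beta, gamma):
    """Mix maximum-likelihood estimates of P(term) over four levels."""
    rest = 1.0 - alpha - beta - gamma
    if min(alpha, beta, gamma, rest) < 0.0:
        raise ValueError("interpolation weights must be non-negative and sum to at most 1")

    def ml(tokens):
        # Relative frequency of the term at this level (0 for an empty level).
        return tokens.count(term) / len(tokens) if tokens else 0.0

    return (alpha * ml(shot) + beta * ml(scene)
            + gamma * ml(video) + rest * ml(collection))

# Toy ASR tokens at each level (hypothetical).
shot = ["a", "b"]
scene = ["a", "b", "c", "d"]
video = ["a", "b", "c", "d", "e", "f", "g", "h"]
collection = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

p_a = hierarchical_prob("a", shot, scene, video, collection, 0.4, 0.3, 0.2)
p_j = hierarchical_prob("j", shot, scene, video, collection, 0.4, 0.3, 0.2)
print(p_a, p_j)
```

A term that never occurs in the shot, such as "j" here, still receives probability mass from the collection level, so a single missing query term does not zero out the shot's score.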

Using Video-OCR
ASR
–MAP [value missing]
ASR+OCR
–MAP [value missing]
–Higher initial precision, more relevant
–Difference is not statistically significant
Further improvements possible?
–Pre-process OCR data? Add captions?

MULTI: modalities, examples

Multi-modal Retrieval
Combining visual and text scores (using independence assumption) gives better results than each modality on its own
–Dynamic+ASR (manual) finds 18 additional relevant shots over ASR only (565 vs. 547)
–Consistent with TRECVID 2003 finding!
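
Under the independence assumption, the joint likelihood of the textual and visual parts of a query is the product of the per-modality likelihoods, which is a sum in log space. The sketch below assumes both modalities already provide a log-likelihood per shot; the score values and shot names are invented.

```python
def combine_independent(text_loglik, visual_loglik):
    """Rank shots by log P(Q_text|shot) + log P(Q_visual|shot)."""
    shots = text_loglik.keys() & visual_loglik.keys()
    joint = {s: text_loglik[s] + visual_loglik[s] for s in shots}
    return sorted(joint, key=joint.get, reverse=True), joint

# Hypothetical per-shot log-likelihoods from the two modalities.
text_scores = {"s1": -2.0, "s2": -1.0, "s3": -5.0}
visual_scores = {"s1": -1.0, "s2": -4.0, "s3": -1.5}

ranking, joint = combine_independent(text_scores, visual_scores)
print(ranking)
```

In this toy case the top shot is neither the best text match nor the best visual match, but the one that does reasonably well on both.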

Query by Multiple Examples
Rank-based vs. Score-based
–Round-robin (min{rank})
–CMS (mean{score})
Results:
–RR gives better MAP ([value missing] vs. [value missing])
–CMS finds more relevant (239 vs. 227)
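
The two merging strategies can be sketched as follows. The code assumes every example scores every shot, and that round-robin ties (equal rank in different lists) are broken by the order in which the example lists are supplied; both are assumptions, as are the toy scores.

```python
def round_robin(rankings):
    """Merge ranked lists by each shot's best (minimum) rank.

    Ties on rank are broken by the position of the list in `rankings`.
    """
    best = {}
    for list_idx, ranking in enumerate(rankings):
        for rank, shot in enumerate(ranking):
            key = (rank, list_idx)
            if shot not in best or key < best[shot]:
                best[shot] = key
    return sorted(best, key=best.get)

def mean_score(score_tables):
    """Merge by the mean score over all examples (the min/mean contrast on the slide)."""
    shots = score_tables[0].keys()
    mean = {s: sum(t[s] for t in score_tables) / len(score_tables) for s in shots}
    return sorted(mean, key=mean.get, reverse=True)

# Scores of three shots for two query examples (hypothetical values).
example_1 = {"a": 0.9, "b": 0.5, "c": 0.1}
example_2 = {"a": 0.2, "b": 0.65, "c": 0.7}

rankings = [sorted(t, key=t.get, reverse=True) for t in (example_1, example_2)]
rr = round_robin(rankings)
cms = mean_score([example_1, example_2])
print(rr, cms)
```

Round-robin keeps each example's top result near the top of the merged list, while the mean favours shots that score moderately well for every example. With this tie-breaking rule, the order in which the example lists are supplied changes the first few results.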

Query by Multiple Examples
A manually made selection of examples gave better results than using all
Order effect with RR
–Dynamic: video examples first
–Static: image examples first
–Difference results from the initial precision

Interactive Search

Interactive System
Based on pre-computed similarity matrix
–ASR language model
–Static key-frame model (using ALA)
Update probability scores from searcher’s feedback
–See Boldareva & Hiemstra, CIVR 2004
Select most informative modality automatically
–Monitor marginal entropy to indicate user-system performance, and use it to choose the update strategy (text/visual/combined) for the next iteration
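
The slides do not define how the marginal entropy is computed. As a loosely related illustration only, the sketch below computes the Shannon entropy of a normalized relevance distribution over shots. Reading the monitored quantity this way is an assumption, and the function is not the system's actual measure.

```python
import math

def shannon_entropy(scores):
    """Entropy in bits of non-negative scores after normalizing them to sum to 1."""
    total = sum(scores)
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)

# Uniform belief over four shots versus belief concentrated on one shot.
uncertain = shannon_entropy([1.0, 1.0, 1.0, 1.0])
confident = shannon_entropy([1.0, 0.0, 0.0, 0.0])
print(uncertain, confident)
```

Under this reading, entropy falls as feedback concentrates probability on a few shots, which is the kind of signal that could be compared across text, visual and combined update strategies.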

Marginal Entropy ~ MAP

Interactive Results
Interactive strategy combining multiple modalities is in general beneficial (MAP=0.1900), even when one modality does not perform well
Monitoring marginal entropy is not yet successful for deciding between modalities for the update strategy (but still promising)

Surprise, Surprise…

Under the Hood
Work in Progress
Back to the Future – DB+IR!!!
–All static model processing has been moved from customised Matlab scripts to MonetDB query plans (CWI’s open-source main-memory DBMS)
–Parallel training process on Linux cluster
Next steps:
–Integration with MonetDB’s XQuery front-end (Pathfinder) and the Cirquid project’s XML-IR system (TIJAH)

Conclusions
For most topics, neither the static nor the dynamic visual model captures the user information need sufficiently…
…averaged over all topics, however, it is better to use both modalities than ASR only
Working hypothesis: Matching against both modalities gives robustness

Conclusions
Visual aspects of an information need are best captured by using multiple examples
Combining results for multiple (good) examples in round-robin fashion, each ranked on both modalities, gives near-best performance for almost all topics

Unfinished Research!
Analysis of TRECVID 2004 results
–Q: Why is the dynamic model better? More training data, spatio-temporal aspects in model, varying number of components, less dependent on keyframe, …
–Q: Why does the audio not help?
–Q: Why does the entropy-based monitoring of user-system performance not help?

Unfinished Research!
Comparison to TRECVID 2003 results
–Apply 2004 training procedure to 2003 data
–Apply anchor-person detector
–Apply 2003 topic processing (& vice-versa)
Static model
–Full covariance matrices
–Varying number of components

Future Research
Retrieval Model
–Apply document generation approach
–How to properly model multiple modalities?
–How to handle multiple query examples?
System Aspects
–Integration of INEX and TRECVID systems
–Top-K query processing

Thanks!!!