ICME 2004 Tzvetanka I. Ianeva Arjen P. de Vries Thijs Westerveld A Dynamic Probabilistic Multimedia Retrieval Model.

Slides:



Advertisements
Similar presentations
Image Retrieval With Relevant Feedback Hayati Cam & Ozge Cavus IMAGE RETRIEVAL WITH RELEVANCE FEEDBACK Hayati CAM Ozge CAVUS.
Advertisements

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Patch to the Future: Unsupervised Visual Prediction
Activity Recognition Aneeq Zia. Agenda What is activity recognition Typical methods used for action recognition “Evaluation of local spatio-temporal features.
TRECVID 2004 Tzvetanka (‘Tzveta’) I. Ianeva Lioudmila (‘Mila’) Boldareva Thijs Westerveld Roberto Cornacchia Djoerd Hiemstra (the 1 and only) Arjen P.
Image Indexing and Retrieval using Moment Invariants Imran Ahmad School of Computer Science University of Windsor – Canada.
ICASSP, May Arjen P. de Vries Thijs Westerveld Tzvetanka I. Ianeva Combining Multiple Representations on the TRECVID Search Task.
IEEE TCSVT 2011 Wonjun Kim Chanho Jung Changick Kim
ICME 2008 Huiying Liu, Shuqiang Jiang, Qingming Huang, Changsheng Xu.
Modeling Pixel Process with Scale Invariant Local Patterns for Background Subtraction in Complex Scenes (CVPR’10) Shengcai Liao, Guoying Zhao, Vili Kellokumpu,
Natan Jacobson, Yen-Lin Lee, Vijay Mahadevan, Nuno Vasconcelos, Truong Q. Nguyen IEEE, ICME 2010.
HMM-BASED PATTERN DETECTION. Outline  Markov Process  Hidden Markov Models Elements Basic Problems Evaluation Optimization Training Implementation 2-D.
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman ICCV 2003 Presented by: Indriyati Atmosukarto.
T.Sharon 1 Internet Resources Discovery (IRD) Video IR.
Incremental Learning of Temporally-Coherent Gaussian Mixture Models Ognjen Arandjelović, Roberto Cipolla Engineering Department, University of Cambridge.
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman University of Oxford ICCV 2003.
1 LM Approaches to Filtering Richard Schwartz, BBN LM/IR ARDA 2002 September 11-12, 2002 UMASS.
Visual Information Retrieval Chapter 1 Introduction Alberto Del Bimbo Dipartimento di Sistemi e Informatica Universita di Firenze Firenze, Italy.
1 An Empirical Study on Large-Scale Content-Based Image Retrieval Group Meeting Presented by Wyman
A Probabilistic Framework for Video Representation Arnaldo Mayer, Hayit Greenspan Dept. of Biomedical Engineering Faculty of Engineering Tel-Aviv University,
Scalable Text Mining with Sparse Generative Models
Multiple Object Class Detection with a Generative Model K. Mikolajczyk, B. Leibe and B. Schiele Carolina Galleguillos.
A fuzzy video content representation for video summarization and content-based retrieval Anastasios D. Doulamis, Nikolaos D. Doulamis, Stefanos D. Kollias.
Jacinto C. Nascimento, Member, IEEE, and Jorge S. Marques
Information Retrieval in Practice
Using Probabilistic Models for Multimedia Retrieval Arjen P. de Vries (Joint research with Thijs Westerveld) Centrum voor Wiskunde en Informatica.
Multimedia Databases (MMDB)
Image and Video Retrieval INST 734 Doug Oard Module 13.
Prakash Chockalingam Clemson University Non-Rigid Multi-Modal Object Tracking Using Gaussian Mixture Models Committee Members Dr Stan Birchfield (chair)
Exploiting Ontologies for Automatic Image Annotation M. Srikanth, J. Varner, M. Bowden, D. Moldovan Language Computer Corporation
Characterizing activity in video shots based on salient points Nicolas Moënne-Loccoz Viper group Computer vision & multimedia laboratory University of.
Marcin Marszałek, Ivan Laptev, Cordelia Schmid Computer Vision and Pattern Recognition, CVPR Actions in Context.
Understanding The Semantics of Media Chapter 8 Camilo A. Celis.
1 CS 430: Information Discovery Lecture 22 Non-Textual Materials: Informedia.
Kevin Cherry Robert Firth Manohar Karki. Accurate detection of moving objects within scenes with dynamic background, in scenarios where the camera is.
Music Information Retrieval Information Universe Seongmin Lim Dept. of Industrial Engineering Seoul National University.
PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL Seo Seok Jun.
A Model for Learning the Semantics of Pictures V. Lavrenko, R. Manmatha, J. Jeon Center for Intelligent Information Retrieval Computer Science Department,
Chapter 23: Probabilistic Language Models April 13, 2004.
A DISTRIBUTION BASED VIDEO REPRESENTATION FOR HUMAN ACTION RECOGNITION Yan Song, Sheng Tang, Yan-Tao Zheng, Tat-Seng Chua, Yongdong Zhang, Shouxun Lin.
Event retrieval in large video collections with circulant temporal encoding CVPR 2013 Oral.
Chapter 5 Multi-Cue 3D Model- Based Object Tracking Geoffrey Taylor Lindsay Kleeman Intelligent Robotics Research Centre (IRRC) Department of Electrical.
Probabilistic Latent Query Analysis for Combining Multiple Retrieval Sources Rong Yan Alexander G. Hauptmann School of Computer Science Carnegie Mellon.
March 31, 1998NSF IDM 98, Group F1 Group F Multi-modal Issues, Systems and Applications.
ICIP 2004, Singapore, October A Comparison of Continuous vs. Discrete Image Models for Probabilistic Image and Video Retrieval Arjen P. de Vries.
Topic Models Presented by Iulian Pruteanu Friday, July 28 th, 2006.
Relevance-Based Language Models Victor Lavrenko and W.Bruce Croft Department of Computer Science University of Massachusetts, Amherst, MA SIGIR 2001.
 Present by 陳群元.  Introduction  Previous work  Predicting motion patterns  Spatio-temporal transition distribution  Discerning pedestrians  Experimental.
M4 / September Integrating multimodal descriptions to index large video collections M4 meeting – Munich Nicolas Moënne-Loccoz, Bruno Janvier,
Digital Video Library Network Supervisor: Prof. Michael Lyu Student: Ma Chak Kei, Jacky.
MULTIMEDIA DATA MODELS AND AUTHORING
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman University of Oxford ICCV 2003.
哈工大信息检索研究室 HITIR ’ s Update Summary at TAC2008 Extractive Content Selection Using Evolutionary Manifold-ranking and Spectral Clustering Reporter: Ph.d.
1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Unsupervised Learning Part 2. Topics How to determine the K in K-means? Hierarchical clustering Soft clustering with Gaussian mixture models Expectation-Maximization.
Visual Information Retrieval
(Note: a lot of input from Thijs Westerveld)
Introduction Multimedia initial focus
Traffic Sign Recognition Using Discriminative Local Features Andrzej Ruta, Yongmin Li, Xiaohui Liu School of Information Systems, Computing and Mathematics.
A Forest of Sensors: Using adaptive tracking to classify and monitor activities in a site Eric Grimson AI Lab, Massachusetts Institute of Technology
Statistical Models for Automatic Speech Recognition
Video Google: Text Retrieval Approach to Object Matching in Videos
Dynamical Statistical Shape Priors for Level Set Based Tracking
Image Segmentation Techniques
SMEM Algorithm for Mixture Models
PRAKASH CHOCKALINGAM, NALIN PRADEEP, AND STAN BIRCHFIELD
Multimedia Information Retrieval
Video Google: Text Retrieval Approach to Object Matching in Videos
EM Algorithm and its Applications
Presentation transcript:

ICME 2004 Tzvetanka I. Ianeva Arjen P. de Vries Thijs Westerveld A Dynamic Probabilistic Multimedia Retrieval Model

ICME 2004 Introduction Video Representation schemes used for retrieval: –Static –Spatio-temporal Video is a temporal media so a ‘good’ model solves the limitations of keyframe-based shot representation

ICME 2004 Spatio-temporal grouping Spatial priority and tracking of regions from frame to frame Joint spatial and temporal segmentation –Human vision finds salient structures jointly in space and time (Gepshtein and Kubovy, 2000)

ICME 2004 Motivation Pursue video retrieval instead of image (keyframe) retrieval Extension of the Static Probabilistic Multimedia Retrieval model (2003) GMM in DCT-space-time domain –Diagonal covariance

ICME 2004 Static Model DocsModels Indexing - Estimate Gaussian Mixture Models from images using EM - Based on feature vector with colour, texture and position information from pixel blocks - Fixed number of components

ICME 2004 Static Model Indexing –Estimate a Gaussian Mixture Model from each keyframe (using EM) –Fixed number of components (C=8) –Feature vectors contain colour, texture, and position information from pixel blocks:

ICME 2004 Static Model Models P(Q|M 1 ) P(Q|M 4 ) P(Q|M 3 ) P(Q|M 2 ) Query Retrieval –Calculate conditional probabilities of query samples given models in collection

ICME 2004 Dynamic Model Selecting frames – 1 second sequence around the keyframe – Entire video shot as sequence of frames sampled at regular intervals Features

ICME 2004 Dynamic Model Indexing: GMM of multiple frames around keyframe Feature vectors extended with time- stamp normalized in [0,1]: 0.5 1

ICME 2004 Dynamic Model

ICME 2004 Query example: A single image Artificial sequence of 29 images as the single query example where the time is normalized between 0 and 1 Extend the query example image’s features with a fixed temporal feature value of 0.5 – Better results and lower computational cost

ICME 2004 Dynamic Model Advantages More training data for models –Less sensitive to random initialization Reduced dependency upon selecting appropriate keyframe Some spatio-temporal aspects of shot are captured –(Dis-)appearance of objects

ICME 2004 Dynamic Model

ICME 2004 Dynamic Model

ICME 2004 Dynamic Model

ICME 2004 Retrieval Framework Smoothing Building dynamic GMMs Likelihood goes to infinity ???

ICME 2004 Experimental Set-up Build models for each shot –Static, Dynamic, Language Build Queries from topics –Construct simple keyword text query –Select visual example –Rescale and compress example images to match video size and quality

ICME 2004 Combining Modalities Independence assumption textual/visual –P(Q t,Q v |Shot) = P(Q t |LM) * P(Q v |GMM) Combination works if both runs useful [CWI:TREC:2002] Dynamic run more useful than static run RunMAP ASR only.130 Static only.022 Static+ASR.105 Dynamic only.022 Dynamic+ASR.132

ICME 2004 Combining Modalities Dynamic: Higher Initial Precision

ICME 2004 Dynamic: Higher initial precision Static run Dynamic run

ICME 2004 Dow Jones Topic (120)

ICME 2004 Dow Jones Topic (120) “Dow Jones Industrial Average rise day points” + =

ICME 2004 Conclusions Dynamic model captures visual similarity better –Spatio-temporal aspects –More training data –Apropriate key-frame less critical –Less sensitive to the random initialization ASR + dynamic better than either alone

ICME 2004 Future work More data needs more computation effort – optimizations ? Avoid the singular solutions Dynamic number of components ? Full covariance in space-time Integration of audio

ICME 2004 Thanks !!!

ICME 2004 Merging Run Results Combining (conflicting) examples difficult [CWI:TREC:2002] Single example  Miss relevant shots Round-Robin Merging Combined

ICME 2004 Merging Run Results

ICME 2004 Merging Run Results Combining (conflicting) examples difficult [CWI:TREC:2002] Single example  Miss relevant shots Round-Robin Merging Combined ASR Single All Selected Best

ICME 2004 Conclusions Visual aspects of an information need are best captured by using multiple examples Combining results for multiple (good) examples in round-robin fashion, each ranked on both modalities, gives near- best performance for almost all topics