1
A Dynamic Probabilistic Multimedia Retrieval Model
Tzvetanka I. Ianeva, Arjen P. de Vries, Thijs Westerveld
ICME 2004
2
Introduction
Video representation schemes used for retrieval:
– Static
– Spatio-temporal
Video is a temporal medium, so a ‘good’ model should overcome the limitations of keyframe-based shot representation
3
Spatio-temporal grouping
Spatial priority and tracking of regions from frame to frame
Joint spatial and temporal segmentation
– Human vision finds salient structures jointly in space and time (Gepshtein and Kubovy, 2000)
4
Motivation
Pursue video retrieval instead of image (keyframe) retrieval
Extension of the Static Probabilistic Multimedia Retrieval model (2003)
GMM in DCT-space-time domain
– Diagonal covariance
5
Static Model
[Figure: Docs, Models]
Indexing
– Estimate Gaussian Mixture Models from images using EM
– Based on feature vectors with colour, texture, and position information from pixel blocks
– Fixed number of components
6
Static Model
Indexing
– Estimate a Gaussian Mixture Model from each keyframe (using EM)
– Fixed number of components (C=8)
– Feature vectors contain colour, texture, and position information from pixel blocks
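The indexing step above can be sketched as follows. This is a minimal, standard-library illustration of EM for a diagonal-covariance GMM with C=8 components, not the authors' implementation. The function name `fit_diag_gmm`, the toy data, the iteration count, and the variance floor `var_floor` are assumptions made for the example.

```python
import math
import random

def fit_diag_gmm(data, n_components=8, n_iter=25, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initialization: pick distinct samples as starting means.
    means = [list(x) for x in rng.sample(data, n_components)]
    variances = [[1.0] * dim for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = []
            for w, mu, var in zip(weights, means, variances):
                lp = math.log(w)
                for xi, m, v in zip(x, mu, var):
                    lp -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
                logs.append(lp)
            top = max(logs)  # subtract the maximum before exponentiating
            probs = [math.exp(l - top) for l in logs]
            norm = sum(probs)
            resp.append([p / norm for p in probs])
        # M-step: re-estimate weights, means, and per-dimension variances.
        for c in range(n_components):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / len(data)
            for d in range(dim):
                mean = sum(r[c] * x[d] for r, x in zip(resp, data)) / nc
                var = sum(r[c] * (x[d] - mean) ** 2 for r, x in zip(resp, data)) / nc
                means[c][d] = mean
                variances[c][d] = max(var, var_floor)  # assumed floor, not from the slides

    return weights, means, variances

# Toy "pixel block" features (hypothetical values, three dimensions each).
toy_rng = random.Random(1)
blocks = [[toy_rng.gauss(0, 1), toy_rng.gauss(0, 1), toy_rng.random()] for _ in range(200)]
weights, means, variances = fit_diag_gmm(blocks)
```

Because the starting means are drawn at random, different seeds can give different models.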
7
Static Model
Retrieval
– Calculate conditional probabilities of query samples given the models in the collection
[Figure: a query scored against the models: P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4)]
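A hedged sketch of this retrieval step: each model is scored by the log-likelihood of the query samples, and models are ranked by that score. Treating the query samples as independent (so their log-densities add) is an assumption of this example, as are the names `log_gauss_diag`, `log_likelihood`, `rank_models`, and the toy models.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_likelihood(samples, gmm):
    """log P(Q|M): sum over query samples of the log mixture density."""
    total = 0.0
    for x in samples:
        terms = [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

def rank_models(samples, models):
    """Return model names sorted by decreasing log P(Q|M)."""
    scores = {name: log_likelihood(samples, gmm) for name, gmm in models.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy collection: each model is a list of (weight, mean, variance) components.
models = {
    "M1": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "M2": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [-5.0, -5.0], [1.0, 1.0])],
}
query = [[4.8, 5.1], [5.2, 4.9]]
ranking = rank_models(query, models)
```

Working in log space avoids underflow when a query contributes many samples.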
8
Dynamic Model
Selecting frames
– 1-second sequence around the keyframe
– Entire video shot as a sequence of frames sampled at regular intervals
Features
9
Dynamic Model
Indexing: GMM of multiple frames around the keyframe
Feature vectors extended with a time-stamp normalized to [0, 1]
[Figure: time axis with marks at 0.5 and 1]
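One way to picture the time-stamp extension, as a sketch under assumptions: frames are taken in order, and frame i of n receives t = i / (n - 1), so the first frame maps to 0, the middle frame to 0.5, and the last to 1. The linear mapping and the helper name `add_timestamps` are illustrative; the slides only state that the time-stamp is normalized to [0, 1].

```python
def add_timestamps(frames):
    """Append a time-stamp in [0, 1] to every block feature vector of every frame.

    `frames` is a list of frames; each frame is a list of block feature vectors.
    """
    n = len(frames)
    samples = []
    for i, frame in enumerate(frames):
        # Assumed linear normalization; a single frame is placed at the mid-point.
        t = i / (n - 1) if n > 1 else 0.5
        for features in frame:
            samples.append(list(features) + [t])
    return samples

# Three frames with one (hypothetical) two-dimensional block feature each.
frames = [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]]]
samples = add_timestamps(frames)
```

The pooled `samples` from all frames would then be used to estimate one GMM for the shot.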
10
Dynamic Model
11
Query example: a single image
Two options:
– Artificial sequence of 29 images as the single query example, with time normalized between 0 and 1
– Extend the query example image’s features with a fixed temporal feature value of 0.5
The fixed value gives better results and lower computational cost
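The two query constructions can be compared in a short sketch. The helper names and the toy block features are hypothetical; the point is only that the artificial sequence multiplies the number of query samples by 29, while the fixed-value variant keeps one sample per block.

```python
def query_fixed_time(blocks, t=0.5):
    """Extend each block feature vector of the query image with a fixed time value."""
    return [list(b) + [t] for b in blocks]

def query_artificial_sequence(blocks, n_frames=29):
    """Repeat the query image n_frames times with time-stamps spread over [0, 1]."""
    samples = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        samples.extend(list(b) + [t] for b in blocks)
    return samples

# Hypothetical query image with two block feature vectors.
blocks = [[0.2, 0.7], [0.9, 0.1]]
fixed = query_fixed_time(blocks)
sequence = query_artificial_sequence(blocks)
```

Since every query sample has to be evaluated under every shot model, the fixed-value query needs roughly 1/29 of the likelihood evaluations, which is consistent with the lower computational cost noted on the slide.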
12
Dynamic Model
Advantages
More training data for models
– Less sensitive to random initialization
Reduced dependency upon selecting an appropriate keyframe
Some spatio-temporal aspects of the shot are captured
– (Dis-)appearance of objects
13
Dynamic Model
14
Dynamic Model
15
Dynamic Model
16
Retrieval Framework
Smoothing
Building dynamic GMMs
– Likelihood goes to infinity ???
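One plausible reading of "likelihood goes to infinity" is the well-known singularity of maximum-likelihood GMM fitting: when a component collapses onto a single sample, its variance shrinks toward zero and the density at that sample grows without bound. The sketch below only illustrates that general effect; it is not taken from the slides, and `log_gauss_1d` is a hypothetical helper.

```python
import math

def log_gauss_1d(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

# A component centred exactly on a sample: shrinking the variance
# makes the log density at that sample grow without limit.
for var in (1.0, 1e-2, 1e-4, 1e-8):
    print(var, log_gauss_1d(0.3, 0.3, var))
```

Each hundredfold reduction of the variance adds about 2.3 to the log density at the sample, so EM can keep increasing the likelihood by collapsing a component.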
17
Experimental Set-up
Build models for each shot
– Static, Dynamic, Language
Build queries from topics
– Construct a simple keyword text query
– Select a visual example
– Rescale and compress example images to match video size and quality
18
Combining Modalities
Independence assumption textual/visual
– P(Q_t, Q_v | Shot) = P(Q_t | LM) * P(Q_v | GMM)
Combination works if both runs are useful [CWI:TREC:2002]
Dynamic run more useful than static run

Run            MAP
ASR only       .130
Static only    .022
Static+ASR     .105
Dynamic only   .022
Dynamic+ASR    .132
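Under the independence assumption, the joint score of a shot is the product of the textual and visual probabilities, which becomes a sum in log space. The sketch below shows that combination on made-up log-probabilities; the shot names, the values, and the function `combine_scores` are illustrative only.

```python
def combine_scores(text_logp, visual_logp):
    """Per shot: log P(Q_t, Q_v | Shot) = log P(Q_t | LM) + log P(Q_v | GMM)."""
    shared = text_logp.keys() & visual_logp.keys()
    return {shot: text_logp[shot] + visual_logp[shot] for shot in shared}

# Hypothetical log-probabilities for three shots.
text_logp = {"shot1": -12.0, "shot2": -9.5, "shot3": -15.0}
visual_logp = {"shot1": -40.0, "shot2": -47.0, "shot3": -38.0}

combined = combine_scores(text_logp, visual_logp)
ranking = sorted(combined, key=combined.get, reverse=True)
```

In this toy case neither the best textual shot nor the best visual shot ranks first after combination; the shot that scores reasonably on both does.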
19
Combining Modalities
Dynamic: Higher Initial Precision
20
Dynamic: Higher initial precision
[Figure: static run vs. dynamic run]
21
Dow Jones Topic (120)
22
Dow Jones Topic (120)
“Dow Jones Industrial Average rise day points”
[Figure: keyword query + visual example = combined result]
23
Conclusions
Dynamic model captures visual similarity better
– Spatio-temporal aspects
– More training data
– Appropriate keyframe less critical
– Less sensitive to the random initialization
ASR + dynamic better than either alone
24
Future work
More data needs more computational effort: optimizations?
Avoid the singular solutions
Dynamic number of components?
Full covariance in space-time
Integration of audio
25
Thanks!
26
Merging Run Results
Combining (conflicting) examples is difficult [CWI:TREC:2002]
Single example: misses relevant shots
Round-Robin Merging
[Figure: two ranked lists (ranks 1 to 10 each) interleaved into a combined list: 1, 2, 3, 4, ...]
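Round-robin merging can be sketched as taking the rank-1 item from each run, then the rank-2 item from each run, and so on. Skipping shots that already appear in the merged list is an assumption of this example (the slide does not say how duplicates are handled), and `round_robin_merge` and the shot identifiers are hypothetical.

```python
def round_robin_merge(runs):
    """Interleave ranked lists rank by rank, keeping the first occurrence of each shot."""
    merged, seen = [], set()
    longest = max((len(run) for run in runs), default=0)
    for rank in range(longest):
        for run in runs:
            if rank < len(run) and run[rank] not in seen:
                seen.add(run[rank])
                merged.append(run[rank])
    return merged

# Two ranked result lists, one per query example.
run_a = ["s1", "s2", "s3", "s4"]
run_b = ["s5", "s2", "s6", "s1"]
combined = round_robin_merge([run_a, run_b])
```

Merging by rank rather than by score avoids having to compare likelihood values across different query examples.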
27
Merging Run Results
28
Merging Run Results
Combining (conflicting) examples is difficult [CWI:TREC:2002]
Single example: misses relevant shots
Round-Robin Merging
[Figure: two ranked lists (ranks 1 to 10 each) interleaved into a combined list: 1, 2, 3, 4, ...]

Examples    Visual only   +ASR
Single      .022          .132
All         .031          .149
Selected    .039          .151
Best        .050          .155
29
Conclusions
Visual aspects of an information need are best captured by using multiple examples
Combining results for multiple (good) examples in round-robin fashion, each ranked on both modalities, gives near-best performance for almost all topics