1
Combining Multiple Representations on the TRECVID Search Task
Arjen P. de Vries, Thijs Westerveld, Tzvetanka I. Ianeva
ICASSP, May 21 2004
2
Introduction
Video retrieval should take advantage of information from all available sources and modalities
– …but so far ASR has been the best source for almost any query
LL11@TRECVID2003: combining information sources
– Different models/modalities
– Multiple example images
3
‘Language Modelling’ approach to IR
[Figure: documents (Docs) and the models estimated from them (Models)]
4
Calculate the conditional probability of observing the query samples given each model in the collection
[Figure (Retrieval): the Query is compared with each of the Models, giving $P(Q \mid M_1)$, $P(Q \mid M_2)$, $P(Q \mid M_3)$, $P(Q \mid M_4)$]
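Below is a minimal Python sketch of this retrieval step. It assumes that each model is a diagonal-covariance GMM, stored as a list of (weight, means, variances) tuples. It also assumes the query samples are independent, so $\log P(Q \mid M)$ is the sum of per-sample log-likelihoods. The function names and the model representation are hypothetical and not taken from the authors' system.

```python
import math


def log_gauss_diag(x, means, variances):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, means, variances))


def log_p_sample(x, gmm):
    """log sum_c w_c N(x; mu_c, Sigma_c), computed with log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def log_p_query(query_samples, gmm):
    """log P(Q|M), treating the query samples as independent draws from the model."""
    return sum(log_p_sample(x, gmm) for x in query_samples)


def rank_models(query_samples, models):
    """Return model ids sorted by decreasing log P(Q|M)."""
    scores = {mid: log_p_query(query_samples, gmm) for mid, gmm in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: three 2-D models; the query samples lie near the origin.
models = {
    "M1": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "M2": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [0.5, 0.5], [1.0, 1.0])],
    "M3": [(1.0, [10.0, 10.0], [1.0, 1.0])],
}
query = [[0.1, -0.2], [0.3, 0.2]]
print(rank_models(query, models))  # → ['M1', 'M2', 'M3']
```

The sketch works in log space because multiplying many small per-sample probabilities would underflow.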
5
Static Model
Indexing
– Estimate a Gaussian Mixture Model (GMM) from each keyframe (using EM)
– Fixed number of components (C = 8)
– Feature vectors contain colour, texture, and position information from pixel blocks
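The indexing step can be sketched as plain expectation-maximisation for a GMM with C components. This is a minimal sketch under several assumptions: diagonal covariances, initialisation from randomly chosen samples, a fixed number of iterations and a variance floor (`min_var`) are all illustrative choices. `fit_gmm_em` is a hypothetical name, and the extraction of colour, texture and position features from pixel blocks is not shown.

```python
import math
import random


def log_gauss_diag(x, means, variances):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, means, variances))


def fit_gmm_em(samples, n_components=8, n_iter=50, min_var=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to one keyframe's feature vectors with EM.

    samples: list of equal-length feature vectors (needs >= n_components of them).
    Returns a list of (weight, means, variances) tuples, one per component.
    """
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])
    # Assumed initialisation: means at random samples, shared global variance, uniform weights.
    mean0 = [sum(x[d] for x in samples) / n for d in range(dim)]
    var0 = [max(min_var, sum((x[d] - mean0[d]) ** 2 for x in samples) / n)
            for d in range(dim)]
    means = [list(x) for x in rng.sample(samples, n_components)]
    variances = [list(var0) for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample (log-sum-exp for stability).
        resp = []
        for x in samples:
            logs = [math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
                    for c in range(n_components)]
            top = max(logs)
            probs = [math.exp(lp - top) for lp in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for c in range(n_components):
            nc = sum(r[c] for r in resp)
            weights[c] = max(nc, 1e-10) / n  # floor keeps log(weight) defined
            if nc < 1e-10:
                continue  # leave an (almost) empty component's mean and variance unchanged
            means[c] = [sum(r[c] * x[d] for r, x in zip(resp, samples)) / nc
                        for d in range(dim)]
            variances[c] = [max(min_var, sum(r[c] * (x[d] - means[c][d]) ** 2
                                             for r, x in zip(resp, samples)) / nc)
                            for d in range(dim)]
    return list(zip(weights, means, variances))
```

For one keyframe, the call would be `fit_gmm_em(block_features, n_components=8)`, matching C = 8. Here `block_features` is a hypothetical list of per-block feature vectors.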
6
Dynamic Model
Indexing: GMM of multiple frames (N = 29) around the keyframe
Feature vectors extended with a time-stamp in [0,1]
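The following minimal Python sketch shows how the training data for a dynamic model might be assembled. It assumes N consecutive frames centred on the keyframe, with the window shifted to stay inside the shot. It also assumes a time-stamp spread evenly over [0,1] in frame order. The slide does not say exactly which frames are used, and `window_around` and `add_timestamps` are hypothetical helpers. The pooled vectors would then be fitted with a single GMM, just as for the static model.

```python
def window_around(keyframe, shot_start, shot_end, n=29):
    """Indices of up to n consecutive frames centred on the keyframe.

    The window is shifted, not shrunk, when it would cross a shot boundary;
    it is only shorter than n when the whole shot has fewer than n frames.
    """
    start = max(shot_start, min(keyframe - n // 2, shot_end - n + 1))
    return list(range(start, min(start + n, shot_end + 1)))


def add_timestamps(frames):
    """Append a normalised time-stamp t in [0, 1] to every feature vector.

    frames: feature vectors per frame, in temporal order (one list per frame).
    Returns a single pooled list of extended vectors for training one GMM.
    """
    pooled = []
    for i, vectors in enumerate(frames):
        t = i / (len(frames) - 1) if len(frames) > 1 else 0.5
        pooled.extend(list(v) + [t] for v in vectors)
    return pooled


print(window_around(100, 0, 1000))  # 29 indices, 86 through 114, keyframe 100 in the middle
print(add_timestamps([[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]))
# → [[1.0, 2.0, 0.0], [3.0, 4.0, 0.5], [5.0, 6.0, 1.0]]
```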
7
Dynamic Model
8
Dynamic Model Advantages
More training data for the models
Reduced dependency on selecting an appropriate keyframe
Some spatio-temporal aspects of the shot are captured
– (Dis-)appearance of objects
9
Experimental Set-up
Build models for each shot
– Static, Dynamic, Language
Build queries from topics
– Construct a simple keyword text query
– Select a visual example
– Rescale and compress example images to match the video size and quality
10
Combining Modalities
Independence assumption textual/visual
– $P(Q_t, Q_v \mid \text{Shot}) = P(Q_t \mid \text{LM}) \cdot P(Q_v \mid \text{GMM})$
Combination works if both runs are useful [CWI:TREC:2002]
Dynamic run more useful than static run

Run             MAP
ASR only        .130
Static only     .022
Static+ASR      .105
Dynamic only    .022
Dynamic+ASR     .132
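Under the independence assumption, the joint score is a product, so in log space the two runs are simply added. The Python sketch below makes two assumptions. For the text model, it uses Jelinek-Mercer smoothing with an illustrative λ; the slide does not state the smoothing method. The visual log-likelihoods $\log P(Q_v \mid \text{GMM})$ are taken as precomputed. `log_p_text` and `rank_shots` are hypothetical names.

```python
import math
from collections import Counter


def log_p_text(query_terms, doc_terms, collection_tf, collection_len, lam=0.5):
    """log P(Q_t | LM_shot): query likelihood with Jelinek-Mercer smoothing.

    The smoothing method and lam are assumptions for illustration.
    Query terms that never occur in the collection are skipped.
    """
    tf = Counter(doc_terms)
    dlen = max(len(doc_terms), 1)
    score = 0.0
    for term in query_terms:
        p_col = collection_tf.get(term, 0) / collection_len
        if p_col == 0.0:
            continue
        score += math.log(lam * tf[term] / dlen + (1 - lam) * p_col)
    return score


def rank_shots(text_scores, visual_scores):
    """Rank shots by log P(Q_t, Q_v | shot) = log P(Q_t | LM) + log P(Q_v | GMM).

    Only shots scored in both modalities are ranked.
    """
    shared = text_scores.keys() & visual_scores.keys()
    combined = {shot: text_scores[shot] + visual_scores[shot] for shot in shared}
    return sorted(combined, key=combined.get, reverse=True)


print(rank_shots({"a": -1.0, "b": -3.0}, {"a": -10.0, "b": -5.0}))  # → ['b', 'a']
```

In the usage line, the visual scores overturn the text ranking. This is the sense in which a visual run can help or hurt the combination. The sketch leaves out shots missing from either run; how such shots were handled in the original runs is not stated.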
11
Combining Modalities
Dynamic: higher initial precision
12
Dow Jones Topic (120)
13
Dow Jones Topic (120)
“Dow Jones Industrial Average rise day points” + [image] = [image]
14
Dow Jones Topic (120)
15
Arafat Topic (103)
16
Arafat Topic (103)
17
Basketball Topic (101)
Baseball Topic (102)
18
Basketball Topic
19
Merging Run Results
20
Merging Run Results
Combining (conflicting) examples is difficult [CWI:TREC:2002]
A single example misses relevant shots
Round-robin merging
[Figure: two ranked lists (ranks 1 to 10) interleaved into one combined list (1, 2, 3, 4, …)]
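Round-robin merging can be sketched in a few lines of Python. The sketch takes the top shot of every per-example run, then the second shot of each, and so on, keeping each shot only at its first (best) position. The order of runs within each round, the duplicate handling and the cut-off `depth` (1000 here) are assumptions of this sketch. `round_robin_merge` is a hypothetical name.

```python
def round_robin_merge(runs, depth=1000):
    """Interleave ranked lists: rank 1 of every run, then rank 2 of every run, ...

    runs: list of ranked lists of shot ids, one per query example.
    A shot that appears in several runs keeps only its first (best) position.
    """
    merged, seen = [], set()
    longest = max((len(run) for run in runs), default=0)
    for rank in range(longest):
        for run in runs:
            if rank < len(run) and run[rank] not in seen:
                seen.add(run[rank])
                merged.append(run[rank])
                if len(merged) == depth:
                    return merged
    return merged


print(round_robin_merge([["a", "b", "c"], ["b", "d"], ["e"]]))
# → ['a', 'b', 'e', 'd', 'c']
```

Every example places its top-ranked shots near the head of the merged list. As a result, a single poor example cannot push the shots found by the good examples far down, which is the robustness the slide aims for.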
21
Merging Run Results
MAP for different sets of examples:

Examples    Visual    +ASR
Single      .022      .132
All         .031      .149
Selected    .039      .151
Best        .050      .155
22
Flames Topic (112)
23
Flames Topic (112)
24
Conclusions
For most topics, neither the static nor the dynamic visual model captures the user's information need sufficiently…
…however, averaged over 25 topics, using both modalities is better than using ASR only
Working hypothesis: matching against both modalities gives robustness
25
Conclusions
The dynamic model captures visual similarity better
– Thanks to spatio-temporal aspects? Experiments with a full covariance matrix for -dims
The static model of the keyframe (KF) is too fragile
– Dependency on a single KF? To be tested by ranking on max(all I-frames in shot)
– Not enough training data?
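The proposed test ranks each shot by its best-matching I-frame. A Python sketch of it follows. It assumes a separate static model per I-frame, with each model's $\log P(Q \mid M)$ already computed. The dictionary layout and `rank_by_best_frame` are hypothetical.

```python
def rank_by_best_frame(frame_scores):
    """Rank shots by the max over their I-frames of log P(Q | M_frame).

    frame_scores: dict mapping shot id -> list of per-I-frame log-likelihoods.
    Shots without any scored I-frame are left out.
    """
    best = {shot: max(scores) for shot, scores in frame_scores.items() if scores}
    return sorted(best, key=best.get, reverse=True)


print(rank_by_best_frame({"s1": [-50.0, -20.0, -40.0], "s2": [-30.0, -25.0], "s3": []}))
# → ['s1', 's2']
```

Taking the maximum removes the dependence on which frame was chosen as the keyframe, which is the fragility the slide questions.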
26
Conclusions
Visual aspects of an information need are best captured by using multiple examples
Combining results for multiple (good) examples in round-robin fashion, each ranked on both modalities, gives near-best performance for almost all topics