Published by Libby Timmons, modified over 10 years ago
1
Visual Event Recognition in Videos by Learning from Web Data
Lixin Duan, Dong Xu, Ivor Tsang, Jiebo Luo
Nanyang Technological University, Singapore
Kodak Research Labs, Rochester, NY, USA
2
Outline
Overview of the Event Recognition System
Similarity between Videos
– Aligned Space-Time Pyramid Matching
Cross-Domain Problem
– Adaptive Multiple Kernel Learning
Experiments
Conclusion
3
Overview
GOAL: Recognize consumer videos
Challenges: large intra-class variability; limited labeled videos
[Figure: example consumer videos for Sports, Picnic and Wedding]
4
Overview
GOAL: Recognize consumer videos by leveraging a large number of loosely labeled web videos (e.g., from YouTube)
[Figure: example consumer videos and a large number of web videos for Sports, Picnic and Wedding]
5
Overview
[Figure: flowchart of the system, with a video database, a test video, the classifier and its output]
6
Similarity between Videos
Pyramid matching methods
– Temporally aligned pyramid matching, D. Xu and S.-F. Chang [1]
– Unaligned space-time pyramid matching, I. Laptev [2]
[Figure: pyramid partitions along the time axis, the space axes and the space-time axes]
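Both kinds of space-time pyramid start from the same step: the local features of a video are grouped into non-overlapping space-time volumes at each pyramid level. The sketch below is illustrative only and assumes that level l splits the width, height and duration each into 2^l equal segments (8^l volumes in total); the function name `partition_into_volumes` and the `(x, y, t, descriptor)` feature layout are hypothetical.

```python
from collections import defaultdict


def partition_into_volumes(features, width, height, length, level):
    """Group local features into non-overlapping space-time volumes.

    Assumption (illustrative): at pyramid level `level`, each of the x, y and
    t axes is split into 2**level equal segments, giving 8**level volumes.
    `features` is a list of (x, y, t, descriptor) tuples (hypothetical layout).
    Returns a dict mapping a volume index (i, j, k) to its descriptors.
    """
    cells = 2 ** level
    volumes = defaultdict(list)
    for x, y, t, desc in features:
        # Clamp so that points lying exactly on the far border land in the last cell.
        i = min(int(x * cells / width), cells - 1)
        j = min(int(y * cells / height), cells - 1)
        k = min(int(t * cells / length), cells - 1)
        volumes[(i, j, k)].append(desc)
    return dict(volumes)


feats = [(10, 10, 5, "a"), (300, 200, 90, "b"), (320, 240, 100, "c")]
print(partition_into_volumes(feats, 320, 240, 100, level=1))
```

Only occupied volumes appear in the result; empty volumes would simply have no features to match.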
7
Similarity between Videos
8
Aligned Space-Time Pyramid Matching – Level 1 Distance
9
Similarity between Videos
Distance: integer-flow Earth Mover's Distance (EMD), Y. Rubner [3]
[Equation: EMD objective, s.t. integer flow constraints]
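If both videos are split into the same number R of volumes, each volume carries unit weight, and the flows are restricted to 0 or 1, then the integer-flow EMD reduces to a minimum-cost one-to-one matching of volumes, normalized by the total flow R. The sketch assumes the volume-to-volume ground distances D_rc are already computed; `integer_flow_emd` is a hypothetical name, and the brute-force search is only meant for small R such as the 8 volumes at Level 1.

```python
from itertools import permutations


def integer_flow_emd(ground):
    """Integer-flow EMD between two videos with the same number R of volumes.

    `ground[r][c]` is an (assumed precomputed) distance between volume r of the
    first video and volume c of the second. With unit weights and flows in
    {0, 1}, every volume is matched to exactly one volume, so the problem is a
    minimum-cost assignment. Brute force over all R! matchings (small R only).
    Returns (normalized distance, matching as a tuple r -> c).
    """
    r_count = len(ground)
    best_cost, best_match = float("inf"), None
    for perm in permutations(range(r_count)):
        cost = sum(ground[r][perm[r]] for r in range(r_count))
        if cost < best_cost:
            best_cost, best_match = cost, perm
    # The total flow equals R because each volume sends exactly one unit.
    return best_cost / r_count, best_match


print(integer_flow_emd([[4, 1], [2, 5]]))
```

For larger R, `scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time.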
11
Cross-Domain Problem
Data distribution mismatch between consumer videos and web videos
– Consumer videos: naturally captured
– Web videos: edited and selected
The mismatch is measured with the Maximum Mean Discrepancy (MMD), K. M. Borgwardt [4]
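MMD compares two samples through the distance between their means in a kernel feature space. The sketch below is the standard biased empirical estimate of the squared MMD; the Gaussian kernel and its `gamma` are illustrative choices (the slide does not specify a kernel), and the function names are hypothetical. Here `source` would hold web video features and `target` consumer video features.

```python
import math


def gaussian_kernel(a, b, gamma):
    # k(a, b) = exp(-gamma * ||a - b||^2)
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mmd_squared(source, target, gamma=1.0):
    """Biased empirical estimate of the squared MMD between two samples.

    MMD^2 = mean k(s, s') + mean k(t, t') - 2 * mean k(s, t),
    i.e. the squared distance between the two sample means in feature space.
    """
    def mean_k(xs, ys):
        total = sum(gaussian_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return mean_k(source, source) + mean_k(target, target) - 2 * mean_k(source, target)


print(mmd_squared([[0.0], [1.0]], [[3.0], [4.0]]))
```

Identical samples give 0, and the value grows as the two distributions drift apart.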
12
Cross-Domain Problem
Prior information
13
Cross-Domain Problem
14
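Adaptive Multiple Kernel Learning (A-MKL)
[Equation: A-MKL objective]
where the annotated terms are
– MMD: the distribution mismatch between the web video and consumer video domains
– Structural risk functional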
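The A-MKL objective itself did not survive extraction. As a hedged sketch of the MMD part only, assume (following the A-MKL formulation as we understand it) that the kernel is a nonnegative combination K = Σ_m d_m K_m of base kernels. The squared MMD under K is then linear in d: MMD² = hᵀd with h_m = tr(K_m S) = sᵀK_m s, and the regularizer is taken to be ½(hᵀd)². The function names are hypothetical, and the structural risk functional, which requires training an SVM-style classifier, is omitted.

```python
def mmd_coefficients(base_kernels, n_aux, n_target):
    """Return h with h_m = s^T K_m s = tr(K_m S), where S = s s^T.

    Samples are assumed ordered with the n_aux auxiliary (web) samples first
    and the n_target target (consumer) samples last; s holds 1/n_aux for
    auxiliary entries and -1/n_target for target entries.
    """
    s = [1.0 / n_aux] * n_aux + [-1.0 / n_target] * n_target
    n = len(s)
    return [sum(s[i] * K[i][j] * s[j] for i in range(n) for j in range(n))
            for K in base_kernels]


def mmd_term(d, h):
    """0.5 * Omega(d)^2 with Omega(d) = h^T d, the squared MMD under sum_m d_m K_m."""
    omega = sum(dm * hm for dm, hm in zip(d, h))
    return 0.5 * omega ** 2


h = mmd_coefficients([[[1, 0], [0, 1]], [[1, 1], [1, 1]]], n_aux=1, n_target=1)
print(h, mmd_term([0.5, 0.5], h))
```

Because h is computed once from the base kernels, the MMD term stays cheap to evaluate while the weights d are being optimized.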
15
Cross-Domain Problem
17
Experiments
Data set
– 195 consumer videos and 906 web videos, collected by ourselves and from the Kodak Consumer Video Benchmark Data Set [5]
– 6 events: wedding, birthday, picnic, parade, show and sports
– Training data: 3 consumer videos per event and all web videos
– Test data: the remaining consumer videos
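The protocol above fixes the split sizes: 6 × 3 = 18 consumer training videos, 18 + 906 = 924 training videos in total, and 195 − 18 = 177 consumer test videos. A minimal sketch of such a split follows; the random selection, the seed, the helper name and the round-robin labels in the example are hypothetical, since the slide does not say how the 3 videos per event were chosen or how the 195 videos are distributed over events.

```python
import random
from collections import defaultdict

EVENTS = ["wedding", "birthday", "picnic", "parade", "show", "sports"]


def split_consumer_videos(consumer_labels, per_event=3, seed=0):
    """Pick `per_event` consumer videos per event for training; the rest are test.

    `consumer_labels` maps a video id to its event name. Random selection with
    a fixed seed is an illustrative assumption.
    """
    by_event = defaultdict(list)
    for vid, event in sorted(consumer_labels.items()):
        by_event[event].append(vid)
    rng = random.Random(seed)
    train, test = [], []
    for event in EVENTS:
        vids = by_event[event]
        chosen = set(rng.sample(vids, per_event))
        train.extend(v for v in vids if v in chosen)
        test.extend(v for v in vids if v not in chosen)
    return train, test


# Hypothetical labels: 195 consumer videos assigned to events round-robin.
labels = {i: EVENTS[i % 6] for i in range(195)}
train, test = split_consumer_videos(labels)
print(len(train), len(train) + 906, len(test))  # consumer train, all train, test
```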
18
Experiments
19
Aligned Space-Time Pyramid Matching (ASTPM) vs. Unaligned Space-Time Pyramid Matching (USTPM)
– ASTPM is better than USTPM at Level 1
[Figure: results of the aligned and unaligned methods]
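The mechanism behind the comparison can be illustrated on a toy ground-distance matrix: USTPM compares volume r of one video only with volume r of the other, while ASTPM may match any volume to any other through the integer-flow EMD. The numbers are made up (the same event occurring in the left half of one video and the right half of the other) and say nothing about the reported results; the function names are hypothetical.

```python
from itertools import permutations


def unaligned_distance(ground):
    # Fixed correspondence: volume r is compared only with volume r.
    return sum(ground[r][r] for r in range(len(ground))) / len(ground)


def aligned_distance(ground):
    # Any one-to-one matching is allowed; keep the cheapest (integer-flow EMD, brute force).
    n = len(ground)
    return min(sum(ground[r][p[r]] for r in range(n)) for p in permutations(range(n))) / n


toy = [[5.0, 1.0],
       [1.0, 5.0]]
print(unaligned_distance(toy), aligned_distance(toy))
```

Since the identity matching is one of the candidates, the aligned distance can never exceed the unaligned one.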
20
Experiments
21
Comparisons of cross-domain learning methods
– (a) SIFT features
– (b) ST features
– (c) SIFT features and ST features
– parade: 75.7% (A-MKL) vs. 62.2% (FR)
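For the parade event, the two scores differ by 13.5 percentage points, which is about a 21.7% relative gain of A-MKL over FR. The sketch assumes relative improvement is defined as (new − baseline) / baseline; `improvements` is a hypothetical helper.

```python
def improvements(new, baseline):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


abs_gain, rel_gain = improvements(75.7, 62.2)
print(f"{abs_gain:.1f} points, {rel_gain:.1f}% relative")  # prints "13.5 points, 21.7% relative"
```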
22
Experiments
Comparisons of cross-domain learning methods
Relative improvements of A-MKL over
– SVM_T: 36.9%
– SVM_AT: 8.6%
– Feature Replication (FR) [6]: 7.6%
– Adaptive SVM (A-SVM) [8]: 49.6%
– Domain Transfer SVM (DTSVM) [7]: 9.9%
MKL-based methods
– Better fuse SIFT features and ST features
– Handle noise in the loose labels
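MKL fuses feature types by learning one weight per base kernel and classifying with the weighted sum of kernels. The sketch below only forms that weighted sum for given weights: the two 2 × 2 kernels standing in for a SIFT kernel and an ST kernel are made up, the weights are fixed by hand rather than learned, and `combine_kernels` is a hypothetical helper.

```python
def combine_kernels(kernels, weights):
    """Return K = sum_m d_m K_m for nonnegative weights d that sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


k_sift = [[1.0, 0.2], [0.2, 1.0]]  # hypothetical kernel on SIFT features
k_st = [[1.0, 0.6], [0.6, 1.0]]    # hypothetical kernel on ST features
print(combine_kernels([k_sift, k_st], [0.25, 0.75]))
```

In MKL the weights are optimized jointly with the classifier, so a feature type that separates the events poorly can receive a small weight.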
23
Conclusion
– We propose a new event recognition framework for consumer videos by leveraging a large number of loosely labeled web videos.
– We develop a new aligned space-time pyramid matching method.
– We present a new cross-domain learning method, A-MKL, which handles the mismatch between the data distributions of the consumer video domain and the web video domain.
24
References
[1] D. Xu and S.-F. Chang. Video event recognition using kernel methods with multi-level temporal alignment. T-PAMI, 30(11):1985–1997, 2008.
[2] I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR, 2008.
[3] Y. Rubner, C. Tomasi, and L. J. Guibas. The Earth Mover's Distance as a metric for image retrieval. IJCV, 40(2):99–121, 2000.
[4] K. M. Borgwardt, A. Gretton, M. J. Rasch, H.-P. Kriegel, B. Schölkopf, and A. Smola. Integrating structured biological data by kernel maximum mean discrepancy. In ISMB, 2006.
25
[5] F. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality and the SMO algorithm. In ICML, 2004.
[6] H. Daumé III. Frustratingly easy domain adaptation. In ACL, 2007.
[7] L. Duan, I. W. Tsang, D. Xu, and S. J. Maybank. Domain transfer SVM for video concept detection. In CVPR, 2009.
[8] J. Yang, R. Yan, and A. G. Hauptmann. Cross-domain video concept detection using adaptive SVMs. In ACM MM, 2007.
[9] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
26
Thank you!