Published by Colin Griffin. Modified over 11 years ago.
1
Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion
Yu-Gang Jiang
Digital Video and Multimedia Lab, Columbia University
VIREO Research Group, City University of Hong Kong
2
Consider this question first
Can you find video clips containing fire?
[Figure: Video 1, Video 2, and Video 3 laid out along a time axis]
3
[Image: http://jaeger.earthsci.unimelb.edu.au/Images/Topographic/Whole_Earth/Earth_50.jpg]
12,756.32 kilometers
4
The commercial search engines…
Text (e.g., tags) is noisy even when humans do the annotation. We need to look beyond text!
5
Recent developments
There has been a great deal of interest in developing content-based approaches for video/image annotation.
–Everingham, Zisserman, Williams and Van Gool, Oxford Tech. Report, 2006.
–Fei-Fei, Fergus and Perona, CVPR workshop 2004.
–Grauman and Darrell, ICCV 2005.
–Lazebnik, Schmid, and Ponce, CVPR 2006.
–Jiang, Ngo, Yang, CIVR 2007.
Example annotations: sky, mountain, tree, bridge, river…
6
Recent developments
Bag of visual words (J. Sivic et al., ICCV 2003)
7
Our feature representation framework
(Chang, TRECVID 2008; Jiang, Yang, Ngo & Hauptmann, IEEE TMM, to appear)
8
Recent results on TRECVID
9
Limitations
Most existing methods assign concept labels individually
–but concepts do not occur in isolation!
Domain change between training and testing data is not considered
[Figure: co-occurring concepts (military personnel, smoke, explosion_fire, road, outdoor, vehicle, building) in Documentary Videos and Broadcast News Videos]
10
Method overview
Domain adaptive semantic diffusion (DASD)
[Figure: detection scores of the concepts road, vehicle, and water on test shots]
Jiang, Wang, Chang, Ngo, ICCV 2009
11
Method overview (cont.)
Domain adaptive semantic diffusion (DASD)
–Semantic graph
  Nodes are concepts
  Edges represent concept correlation
–Graph diffusion
  Smooth concept detection scores w.r.t. the concept correlation
[Figure: semantic graph over the concepts vehicle, road, water, and sky, with per-shot detection scores]
12
Formulation
Energy function, defined in terms of:
–the detection scores of concept c_i on the test samples
–the concept affinity matrix
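The equation itself is not reproduced in the slide text. As a hedged sketch only, one common energy consistent with the two labelled quantities is the normalized graph smoothness term below. The symbols g(c_i) (score vector of concept c_i over the test samples), W (concept affinity matrix), d_i (row sum of W), and m (number of concepts) are assumed notation, not taken from the slide.

```latex
E(g, W) \;=\; \frac{1}{2} \sum_{i,j=1}^{m} W_{ij}
  \left\| \frac{g(c_i)}{\sqrt{d_i}} - \frac{g(c_j)}{\sqrt{d_j}} \right\|^{2},
\qquad d_i = \sum_{j=1}^{m} W_{ij}
```

Under this form the energy is small when strongly related concepts receive similar (normalized) scores on the same samples.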
13
Formulation (cont.)
Detection score smoothing process
Gradually minimizing the energy function brings the detection scores into accordance with the concept relationships
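The smoothing process can be sketched as plain gradient descent on a graph smoothness energy. This is a minimal illustration, not necessarily the exact update used in the talk: it assumes a symmetric, nonnegative affinity matrix W, the normalized energy g^T (I - S) g per shot with S = D^{-1/2} W D^{-1/2}, and a hand-picked step size. The toy concepts, scores, and all function names are hypothetical.

```python
import math

def normalized_affinity(W):
    """S = D^{-1/2} W D^{-1/2}, where d_i is the i-th row sum of W."""
    d = [sum(row) for row in W]
    m = len(W)
    return [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(m)] for i in range(m)]

def energy(G, W):
    """Sum over shots of g^T (I - S) g, with g the shot's concept score vector."""
    S = normalized_affinity(W)
    m = len(W)
    total = 0.0
    for g in G:
        total += sum(g[i] * g[i] for i in range(m))
        total -= sum(S[i][j] * g[i] * g[j] for i in range(m) for j in range(m))
    return total

def semantic_diffusion(G, W, step=0.1, iterations=20):
    """Assumed update: g <- g - step * 2 (g - S g), applied to every shot."""
    S = normalized_affinity(W)
    m = len(W)
    G = [list(g) for g in G]
    for _ in range(iterations):
        new_G = []
        for g in G:
            # Gradient of g^T (I - S) g is 2 (g - S g) when S is symmetric.
            Sg = [sum(S[i][j] * g[j] for j in range(m)) for i in range(m)]
            new_G.append([g[i] - step * 2.0 * (g[i] - Sg[i]) for i in range(m)])
        G = new_G
    return G

# Hypothetical toy data: concepts (road, vehicle, water), four shots.
W = [[0.0, 0.8, 0.1],
     [0.8, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
G0 = [[0.9, 0.2, 0.1],
      [0.8, 0.7, 0.0],
      [0.1, 0.1, 0.9],
      [0.5, 0.1, 0.2]]
G1 = semantic_diffusion(G0, W)
print(energy(G0, W), energy(G1, W))
```

Each shot is updated independently of the others, so the work grows linearly with the number of shots.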
14
Formulation (cont.)
Domain changes…
[Figure: concept affinity matrix values in Broadcast News Videos versus Documentary Videos for the concepts VEHICLE, DESERT, SKY, CLOUDS, WEAPON, CAR, and PARKING_LOT; edge values shown: 0.09, 0.64, 0.20, 0.24, 0.29, 0.17]
15
Formulation (cont.)
Graph adaptation
Graph adaptation process
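The adaptation equation is not reproduced in the slide text, so the following is only a hedged sketch of one way a concept graph could be adapted to a new domain: gradient descent on a smoothness energy with respect to the affinities, holding the degree normalization fixed within each step. Under that simplification the gradient for edge (i, j) is minus the normalized co-activation of the two concepts in the target-domain scores. The function name, step size, and toy data are hypothetical.

```python
import math

def adapt_graph(W, G, step=0.01, iterations=5):
    """Assumed update: W_ij <- W_ij + step * sum_s g_s[i] * g_s[j] / sqrt(d_i * d_j).

    W: symmetric, nonnegative concept affinity matrix from the source domain.
    G: target-domain detection scores, one row per shot, one column per concept.
    Degrees d are recomputed from the current W at the start of each iteration.
    """
    m = len(W)
    W = [list(row) for row in W]
    for _ in range(iterations):
        d = [sum(row) for row in W]
        new_W = [list(row) for row in W]
        for i in range(m):
            for j in range(m):
                if i == j:
                    continue  # keep the diagonal (self-affinity) untouched
                co = sum(g[i] * g[j] for g in G)
                new_W[i][j] = W[i][j] + step * co / math.sqrt(d[i] * d[j])
        W = new_W
    return W

# Hypothetical source-domain graph over (road, vehicle, water).
W_src = [[0.0, 0.8, 0.1],
         [0.8, 0.0, 0.1],
         [0.1, 0.1, 0.0]]
# Hypothetical target-domain scores in which road and water tend to co-occur.
G_tgt = [[0.9, 0.1, 0.8],
         [0.8, 0.0, 0.9],
         [0.1, 0.7, 0.1]]
W_new = adapt_graph(W_src, G_tgt)
print(W_new)
```

In this toy case the road-water edge grows faster than the road-vehicle edge, reflecting the co-occurrence pattern of the target domain rather than the source domain.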
16
Graph adaptation - example
[Figure: the concept graph adapting from the broadcast news video domain to the documentary video domain, shown at iterations 0, 4, 8, 12, 16, and 20]
17
Experiments
Datasets
–TRECVID 2005, 2006, 2007
Baseline detectors: VIREO-374
Graph construction:
–Ground-truth labels on TRECVID 2005
Concepts, TRECVID 05/06 (Broadcast News Videos): SPORTS, WEATHER, OFFICE, BUILDING, DESERT, MOUNTAIN, WALKING, PEOPLE-MARCHING, EXPLOSION-FIRE, MAP, TRUCK, CORP. LEADER
Concepts, TRECVID 07 (Documentary Videos): SPORTS, WEATHER, OFFICE, DESERT, MOUNTAIN, WATER, POLICE, MILITARY, ANIMAL, TWO PEOPLE, NIGHT TIME, TELEPHONE, STREET, CLASSROOM, BUS
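The slide says only that the graph is built from ground-truth labels. One plausible construction, shown here as an assumption rather than the talk's confirmed method, is to take the Pearson correlation between the binary label columns of each pair of concepts. The label matrix and function names below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def concept_affinity(labels):
    """labels[s][c] is 1 if concept c is annotated in shot s, else 0."""
    m = len(labels[0])
    cols = [[row[c] for row in labels] for c in range(m)]
    return [[pearson(cols[i], cols[j]) for j in range(m)] for i in range(m)]

# Hypothetical annotations for (road, vehicle, water) on six shots.
labels = [[1, 1, 0],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 1],
          [0, 0, 1],
          [0, 1, 0]]
A = concept_affinity(labels)
print(A)
```

Correlation can be negative for concepts that rarely co-occur, so a method that needs nonnegative edge weights would have to clip or rescale these values.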
18
Results
Performance gain on TRECVID 05-07

Datasets                  TRECVID-2005   2006    2007
# of evaluated concepts   39             20      20
Baseline (MAP)            0.166          0.154   0.099
SD                        11.8%          15.6%   12.1%
DASD                      11.9%          17.5%   16.2%

SD: semantic diffusion
–Consistent improvement over all 3 data sets
DASD: domain adaptive semantic diffusion
–Graph adaptation further improves the performance
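To make the table easier to read, the sketch below converts the DASD percentages into implied MAP values. It assumes the percentages are relative improvements over the baseline MAP, which the slide suggests ("performance gain") but does not state explicitly. The derived MAP values are arithmetic consequences of that assumption, not figures reported in the talk.

```python
# Baseline MAP and DASD gain per dataset, copied from the table.
baseline_map = {"TRECVID-2005": 0.166, "TRECVID-2006": 0.154, "TRECVID-2007": 0.099}
dasd_gain = {"TRECVID-2005": 0.119, "TRECVID-2006": 0.175, "TRECVID-2007": 0.162}

def implied_map(baseline, relative_gain):
    """MAP after applying a relative gain (assumed interpretation of the table)."""
    return baseline * (1.0 + relative_gain)

implied = {name: implied_map(baseline_map[name], dasd_gain[name]) for name in baseline_map}
for name, value in implied.items():
    print(f"{name}: baseline {baseline_map[name]:.3f} -> implied DASD MAP {value:.3f}")
```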
19
Results (cont.)
Per-concept performance on TRECVID 2006 test data
20
Results (cont.)
Comparison with the state of the art

TRECVID   Jiang et al.   Aytar et al.   Weng et al.   DASD
2005      2.2%           4.0%           N/A           11.9%
2006      N/A            N/A            16.7%         17.5%

[Figure: various baseline detectors (TRECVID-06)]
21
Computational time
Complexity is O(mn)
–m: # concepts; n: # video shots
Only 2 milliseconds per shot/keyframe!

        TRECVID 05   TRECVID 06   TRECVID 07
SD      59s          84s          12s
DASD    89s          165s         28s

Single image/frame computation time
22
Summary
A judicious approach using local features achieves very impressive results for visual annotation
Context information is helpful!
–Domain adaptive semantic diffusion
  is effective for enhancing video annotation accuracy
  can alleviate the effect of data domain changes
  is highly efficient
–Future directions include:
  detector reliability: diffusion over a directed graph
  web data annotation: utilize contextual information to improve the quality of tags