Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion
Yu-Gang Jiang
Digital Video and Multimedia Lab, Columbia University
VIREO Research Group, City University of Hong Kong
Consider this question first: can you find video clips containing fire?
[Figure: timelines of Video 1, Video 2, and Video 3 along a time axis]
12, kilometers
The commercial search engines…
Texts (e.g., tags) are noisy, even when humans do the annotation. Need to look beyond texts!
Recent developments
There has been a great deal of interest in developing content-based approaches for video/image annotation.
–Everingham, Zisserman, Williams and Van Gool, Oxford Tech. Report
–Fei-Fei, Fergus and Perona, CVPR workshop
–Grauman and Darrell, ICCV
–Lazebnik, Schmid, and Ponce, CVPR
–Jiang, Ngo, Yang, CIVR
[Figure: example image annotated with sky, mountain, tree, bridge, river…]
Recent developments: bag of visual words (J. Sivic et al., ICCV 2003)
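To make the bag-of-visual-words idea concrete, here is a minimal sketch. It assumes a codebook of visual words has already been learned (for example by clustering local descriptors) and uses hard nearest-word assignment. The function name `bow_histogram` and the toy data are illustrative only and do not come from the slides.

```python
import numpy as np

def bow_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantize local descriptors against a codebook; return an L1-normalized histogram."""
    # Squared Euclidean distance between every descriptor and every visual word
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Hard assignment (an assumption of this sketch): each descriptor votes
    # for its single nearest visual word
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy data: three 2-D "visual words" and four local descriptors
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]])
hist = bow_histogram(descriptors, codebook)
print(hist)
```

The resulting fixed-length histogram is what a per-concept classifier would consume, regardless of how many local descriptors an image or keyframe produces.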
Our feature representation framework (Chang, TRECVID 2008; Jiang, Yang, Ngo & Hauptmann, IEEE TMM, to appear)
Recent results on TRECVID
Limitations
Most existing methods aim at the assignment of concept labels individually
–but concepts do not occur in isolation!
Domain change between training and testing data was not considered
[Figure: example keyframes from Documentary Videos and Broadcast News Videos, labeled military personnel, smoke, explosion_fire, road, outdoor, vehicle, building]
Method overview: domain adaptive semantic diffusion (DASD) (Jiang, Wang, Chang, Ngo, ICCV 2009)
[Figure: concept graph with nodes road, vehicle, water]
Method overview (cont.)
Domain adaptive semantic diffusion (DASD)
–Semantic graph: nodes are concepts; edges represent concept correlation
–Graph diffusion: smooth concept detection scores w.r.t. the concept correlation
[Figure: semantic graph with nodes vehicle, road, water, sky, …]
Formulation: energy function
[Equation not recovered from the slide. Annotated terms: detection score of concept c_i on test samples; concept affinity matrix]
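The energy function itself did not survive in the slide text. As a hedged reconstruction, a normalized graph-smoothness energy that matches the two annotated terms could look like the following. The symbols g(c_i) for the score vector of concept c_i on the test samples, W for the concept affinity matrix, d_i for the node degree, and m for the number of concepts are assumptions of this sketch, not a transcription of the slide.

```latex
E(g, W) = \frac{1}{2} \sum_{i,j=1}^{m} W_{ij}
  \left\| \frac{g(c_i)}{\sqrt{d_i}} - \frac{g(c_j)}{\sqrt{d_j}} \right\|^{2},
\qquad d_i = \sum_{j=1}^{m} W_{ij}
```

Under this form, the energy is small when strongly related concepts (large W_ij) receive similar degree-normalized scores across the test samples.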
Formulation (cont.)
Detection score smoothing process: gradually smoothing the function brings the detection scores into accordance with the concept relationships
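A minimal sketch of such a smoothing process follows. It assumes the energy is the normalized graph-smoothness form tr(g L gᵀ) with L = I − D^(−1/2) W D^(−1/2) and minimizes it by plain gradient descent. The step size, iteration count, function names, and toy values are hypothetical and not taken from the slides.

```python
import numpy as np

def normalized_laplacian(affinity):
    """Return I - D^{-1/2} W D^{-1/2} for a symmetric, non-negative affinity matrix."""
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    return np.eye(len(degree)) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]

def smoothness_energy(scores, affinity):
    """Assumed energy tr(g L g^T): low when related concepts have consistent scores."""
    return float(np.trace(scores @ normalized_laplacian(affinity) @ scores.T))

def semantic_diffusion(scores, affinity, step=0.05, iterations=20):
    """Smooth detection scores over the concept graph by gradient descent.

    scores:   (n_samples, m_concepts) initial detector outputs
    affinity: (m_concepts, m_concepts) concept affinity matrix
    """
    laplacian = normalized_laplacian(affinity)
    g = np.asarray(scores, dtype=float).copy()
    for _ in range(iterations):
        # Gradient of tr(g L g^T) with respect to g is 2 g L
        g -= step * 2.0 * (g @ laplacian)
    return g

# Toy graph over (road, vehicle, water): road and vehicle are strongly related
affinity = np.array([[0.0, 0.8, 0.1],
                     [0.8, 0.0, 0.1],
                     [0.1, 0.1, 0.0]])
# One test shot: confident "road", weak "vehicle", weak "water"
scores = np.array([[0.9, 0.2, 0.1]])
smoothed = semantic_diffusion(scores, affinity)
print(smoothed)
```

In this toy case the weak "vehicle" score is pulled up by the confident "road" score, which is the kind of context effect the slide describes.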
Formulation (cont.)
Domain changes… concept affinity matrix
[Figure: concept affinity graphs for Broadcast News Videos and Documentary Videos; node labels VEHICLE, DESERT, SKY, CLOUDS, WEAPON, CAR, PARKING_LOT; edge weights shown: 0.09 and 0.17]
Formulation (cont.): graph adaptation
[Equation not recovered from the slide: graph adaptation process]
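The adaptation equations are not recoverable from the slide text, so the following is only a sketch of one plausible scheme: alternate a gradient step on the affinity matrix W with a gradient step on the scores g, both under an assumed energy tr(g L gᵀ) with L = I − D^(−1/2) W D^(−1/2). Holding the degrees fixed when differentiating with respect to W is a simplifying assumption of this sketch, and all names, step sizes, and toy values are hypothetical.

```python
import numpy as np

def inv_sqrt_degree(affinity):
    """Diagonal of D^{-1/2} as a vector, guarded against zero degrees."""
    degree = affinity.sum(axis=1)
    return 1.0 / np.sqrt(np.maximum(degree, 1e-12))

def adapt_and_diffuse(scores, affinity, score_step=0.05, graph_step=0.01, iterations=20):
    """Alternate graph adaptation and score diffusion on target-domain data."""
    g = np.asarray(scores, dtype=float).copy()
    w = np.asarray(affinity, dtype=float).copy()
    eye = np.eye(w.shape[0])
    for _ in range(iterations):
        # Assumed gradient w.r.t. W with degrees held fixed:
        # dE/dW = -D^{-1/2} g^T g D^{-1/2}
        r = inv_sqrt_degree(w)
        grad_w = -(r[:, None] * (g.T @ g) * r[None, :])
        w = w - graph_step * grad_w
        np.fill_diagonal(w, 0.0)  # keep the graph free of self-loops
        # Score diffusion step on the adapted graph (gradient 2 g L)
        r = inv_sqrt_degree(w)
        laplacian = eye - r[:, None] * w * r[None, :]
        g = g - score_step * 2.0 * (g @ laplacian)
    return g, w

# Source-domain graph over (road, vehicle, water) with uniform weak affinities
source_graph = np.array([[0.0, 0.1, 0.1],
                         [0.1, 0.0, 0.1],
                         [0.1, 0.1, 0.0]])
# Target-domain scores: road and vehicle fire together, water stays low
target_scores = np.array([[0.9, 0.8, 0.1],
                          [0.8, 0.9, 0.0],
                          [0.7, 0.9, 0.1]])
adapted_scores, adapted_graph = adapt_and_diffuse(target_scores, source_graph)
print(adapted_graph)
```

In this toy run the road-vehicle edge grows faster than the road-water edge, because the target-domain scores for road and vehicle co-occur.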
Graph adaptation: example
[Figure: concept graph adapting from the broadcast news video domain to the documentary video domain, shown at iterations 0, 4, 8, 12, 16, and 20]
Experiments
Datasets
–TRECVID 2005, 2006, 2007
Baseline detectors: VIREO-374
Graph construction:
–Ground-truth labels on TRECVID 2005
[Figure: evaluated concepts for TRECVID 05/06 (Broadcast News Videos) and TRECVID 07 (Documentary Videos). Labels recovered, in order of appearance: SPORTS, WEATHER, OFFICE, BUILDING, DESERT, MOUNTAIN, WALKING, PEOPLE-MARCHING, EXPLOSION-FIRE, MAP, TRUCK, CORP. LEADER; then SPORTS, WEATHER, OFFICE, DESERT, MOUNTAIN, WATER, POLICE, MILITARY, ANIMAL, TWO PEOPLE, NIGHT TIME, TELEPHONE, STREET, CLASSROOM, BUS]
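One way to build a concept graph from ground-truth labels is to correlate the label columns. The sketch below uses the Pearson correlation between binary concept labels and clips negative values to zero. Both choices are assumptions of this illustration, as the slide only states that ground-truth labels on TRECVID 2005 were used. The function name and toy labels are hypothetical.

```python
import numpy as np

def concept_affinity(labels):
    """Build a concept affinity matrix from a (n_shots, m_concepts) binary label matrix."""
    labels = np.asarray(labels, dtype=float)
    # Pearson correlation between concept columns
    corr = np.corrcoef(labels, rowvar=False)
    # Concepts with constant labels yield NaN correlations; treat them as unrelated
    corr = np.nan_to_num(corr)
    # Assumption of this sketch: keep only positive correlations as edge weights
    affinity = np.clip(corr, 0.0, None)
    np.fill_diagonal(affinity, 0.0)
    return affinity

# Toy labels for (road, vehicle, water) over six shots
labels = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1],
                   [0, 0, 1],
                   [0, 0, 0]])
graph = concept_affinity(labels)
print(np.round(graph, 3))
```

Here road and vehicle co-occur and receive a positive edge, while road and water never co-occur and receive none.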
Results
Performance gain on TRECVID datasets
TRECVID                    2005     2006     2007
# of evaluated concepts    39       20       [not recovered]
Baseline (MAP)             [values not recovered]
SD                         11.8%    15.6%    12.1%
DASD                       11.9%    17.5%    16.2%
SD: semantic diffusion. Consistent improvement over all 3 data sets
DASD: domain adaptive semantic diffusion. Graph adaptation further improves the performance
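The gains above are relative to a baseline measured in mean average precision (MAP). As a reminder of how that measure works, here is a minimal sketch of non-interpolated average precision over a full ranked list. TRECVID's official scoring may differ in details such as list depth, so treat this as an illustration only. The function names and toy data are hypothetical.

```python
import numpy as np

def average_precision(scores, relevant):
    """Non-interpolated average precision of a ranking induced by `scores`."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    rel = np.asarray(relevant, dtype=bool)[order]
    if not rel.any():
        return 0.0
    hits = np.cumsum(rel)                  # relevant items seen up to each rank
    ranks = np.arange(1, len(rel) + 1)
    # Average the precision values at the ranks where a relevant item appears
    return float(np.sum(hits[rel] / ranks[rel]) / rel.sum())

def mean_average_precision(score_matrix, label_matrix):
    """MAP over concepts; rows are shots, columns are concepts."""
    score_matrix = np.asarray(score_matrix, dtype=float)
    label_matrix = np.asarray(label_matrix)
    aps = [average_precision(score_matrix[:, c], label_matrix[:, c])
           for c in range(score_matrix.shape[1])]
    return float(np.mean(aps))

# Toy ranking for one concept: relevant shots land at ranks 1 and 3
ap = average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(ap)
```

A relative gain such as those in the table would then be (MAP after diffusion − baseline MAP) / baseline MAP.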
Results (cont.)
Per-concept performance on TRECVID 2006 test data
[Figure: per-concept performance chart]
Results (cont.)
Comparison with the state of the art
TRECVID   Jiang et al     Aytar et al   Weng et al   DASD
2005      [value lost]%   4.0%          N/A          11.9%
2006      N/A             N/A           16.7%        17.5%
[Figure: various baseline detectors (TRECVID-06)]
Computational time
Complexity is O(mn)
–m: # concepts; n: # video shots
Only 2 milliseconds per shot/keyframe! (single image/frame computation time)
        TRECVID 05   TRECVID 06   TRECVID 07
SD      59s          84s          12s
DASD    89s          165s         28s
Summary
A judicious approach using local features achieves very impressive results for visual annotation
Context information is helpful!
–Domain adaptive semantic diffusion
  effective for enhancing video annotation accuracy
  can alleviate the effect of data domain changes
  highly efficient
–Future directions include:
  detector reliability: diffusion over directed graph
  web data annotation: utilize contextual information to improve the quality of tags