Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO.

Slides:



Advertisements
Similar presentations
Towards Data Mining Without Information on Knowledge Structure
Advertisements

Shih-Fu Chang1, Yu-Gang Jiang1,2, Junfeng He1, Elie El Khoury3,
You have been given a mission and a code. Use the code to complete the mission and you will save the world from obliteration…
1 Yell / The Law and Special Education, Second Edition Copyright © 2006 by Pearson Education, Inc. All rights reserved.
Feichter_DPG-SYKL03_Bild-01. Feichter_DPG-SYKL03_Bild-02.
Kapitel 14 Recognition – p. 1 Recognition Scene understanding / visual object categorization Pose clustering Object recognition by local features Image.
Kapitel 14 Recognition Scene understanding / visual object categorization Pose clustering Object recognition by local features Image categorization Bag-of-features.
Video Surveillance E Senior/Feris/Tian 1 Emerging Topics in Video Surveillance Rogerio Feris IBM TJ Watson Research Center
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. *See PowerPoint Lecture Outline for a complete, ready-made.
Chapter 1 The Study of Body Function Image PowerPoint
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
1 Copyright © 2010, Elsevier Inc. All rights Reserved Fig 2.1 Chapter 2.
1 Copyright © 2013 Elsevier Inc. All rights reserved. Chapter 38.
By D. Fisher Geometric Transformations. Reflection, Rotation, or Translation 1.
Chapter 1 Image Slides Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Business Transaction Management Software for Application Coordination 1 Business Processes and Coordination.
Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang, Jun Wang, Shih-Fu Chang, Chong-Wah Ngo VIREO Research Group.
Performance of Hedges & Long Futures Positions in CBOT Corn Goodland, Kansas March 2, 2009 Daniel OBrien, Extension Ag Economist K-State Research and Extension.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
Multiplying binomials You will have 20 seconds to answer each of the following multiplication problems. If you get hung up, go to the next problem when.
0 - 0.
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
MULT. INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Addition Facts
Year 6 mental test 5 second questions
1 Hierarchical Part-Based Human Body Pose Estimation * Ramanan Navaratnam * Arasanathan Thayananthan Prof. Phil Torr * Prof. Roberto Cipolla * University.
ZMQS ZMQS
Richmond House, Liverpool (1) 26 th January 2004.
BT Wholesale October Creating your own telephone network WHOLESALE CALLS LINE ASSOCIATED.
Break Time Remaining 10:00.
ABC Technology Project
IP Multicast Information management 2 Groep T Leuven – Information department 2/14 Agenda •Why IP Multicast ? •Multicast fundamentals •Intradomain.
© Charles van Marrewijk, An Introduction to Geographical Economics Brakman, Garretsen, and Van Marrewijk.
© Charles van Marrewijk, An Introduction to Geographical Economics Brakman, Garretsen, and Van Marrewijk.
© Charles van Marrewijk, An Introduction to Geographical Economics Brakman, Garretsen, and Van Marrewijk.
15. Oktober Oktober Oktober 2012.
1 Breadth First Search s s Undiscovered Discovered Finished Queue: s Top of queue 2 1 Shortest path from s.
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Squares and Square Root WALK. Solve each problem REVIEW:
We are learning how to read the 24 hour clock
MOTION. 01. When an object’s distance from another object is changing, it is in ___.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
Chapter 5 Test Review Sections 5-1 through 5-4.
GG Consulting, LLC I-SUITE. Source: TEA SHARS Frequently asked questions 2.
Addition 1’s to 20.
25 seconds left…...
: 3 00.
5 minutes.
Week 1.
Visions of Australia – Regional Exhibition Touring Fund Applicant organisation Exhibition title Exhibition Sample Support Material Instructions 1) Please.
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Clock will move after 1 minute
1 Unit 1 Kinematics Chapter 1 Day
PSSA Preparation.
How Cells Obtain Energy from Food
Select a time to count down from the clock above
Murach’s OS/390 and z/OS JCLChapter 16, Slide 1 © 2002, Mike Murach & Associates, Inc.
Self-training with Products of Latent Variable Grammars Zhongqiang Huang, Mary Harper, and Slav Petrov.
Using Closed Captions to Train Activity Recognizers that Improve Video Retrieval Sonal Gupta and Raymond Mooney University of Texas at Austin.
Foreground Focus: Finding Meaningful Features in Unlabeled Images Yong Jae Lee and Kristen Grauman University of Texas at Austin.
DVMM Lab, Columbia UniversityVideo Event Recognition Video Event Recognition: Multilevel Pyramid Matching Dong Xu and Shih-Fu Chang Digital Video and Multimedia.
Presentation transcript:

Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO Research Group, City University of Hong Kong 1

Consider this question first Can you find video clips containing fire? … … … time Video 3 Video 2 Video 1 2

12, kilometers 3

Texts (e.g., tags) are noisy even with human doing the annotation. Need to look beyond texts! Texts (e.g., tags) are noisy even with human doing the annotation. Need to look beyond texts! The commercial search engines… 4

There has been a great deal of interests in developing content-based approaches for video/image annotation. –Everingham, Zisserman, Williams and Van Gool, Oxford Tech. Report, –Fei-Fei, Fergus and Perona, CVPR workshop –Grauman and Darrell, ICCV –Lazebnik, Schmid, and Ponce, CVPR –Jiang, Ngo, Yang, CIVR Recent developments sky, mountain, tree, bridge, river… 5

Recent developments 6 Bag of visual words J. Sivic et al, ICCV 2003

Our feature representation framework 7 Chang TRECVID 2008; Jiang, Yang, Ngo & Hauptmann, IEEE TMM, to appear

Recent results on TRECVID 8

Limitations Most existing methods aim at the assignment of concept labels individually –but concepts do not occur in isolation! Domain change between training and testing data was not considered military personnel smoke explosion_fire roadoutdoor vehicle building 9 Documentary Videos Broadcast News Videos

Method overview Domain adaptive semantic diffusion (DASD) 10 road vehicle Water Jiang, Wang, Chang, Ngo, ICCV 2009

Method overview (cont.) Domain adaptive semantic diffusion (DASD) –Semantic graph Nodes are concepts Edges represent concept correlation – Graph diffusion Smooth concept detection scores w.r.t the concept correlation vehicle road Watersky … … … …

Formulation Energy function Detection score of concept c i on test samples Concept affinity matrix 12

Formulation (cont.) Gradually smooth the function makes the detection scores in accordance with the concept relationships Detection score smoothing process 13

Formulation (cont.) Domain changes… concept affinity matrix Broadcast News Videos Documentary Videos 0.09 VEHICLE DESERT SKY CLOUDS WEAPON 0.17 CAR PARKING_LOT

Formulation (cont.) Graph adaptation Graph adaptation process 15

Iteration: 8Iteration: 12Iteration: 0Iteration: 4Iteration: 16Iteration: 20 Graph adaptation - example Broadcast news video domain Documentary video domain 16

Experiments Datasets –TRECVID 2005, 2006, 2007 Baseline detectors: VIREO-374 Graph construction: –Ground-truth labels on TRECVID 2005 SPORTSWEATHER OFFICEBUILDING DESERT MOUNTAIN WALKING PEOPLE- MARCHING EXPLOSION-FIRE MAP TRUCK CORP. LEADER SPORTSWEATHEROFFICE DESERT MOUNTAIN WATER POLICE MILITARY ANIMAL TWO PEOPLE NIGHT TIME TELEPHONE STREET CLASSROOMBUS TRECVID 05/06 (Broadcast News Videos) TRECVID 07 (Documentary Videos) 17

Results Performance gain on TRECVID Datasets TRECVID # of evaluated concepts3920 Baseline (MAP) SD11.8%15.6%12.1% DASD11.9%17.5%16.2% 18 SD: semantic diffusion Consistent improvement over all 3 data sets DASD: domain adaptive semantic diffusion Graph adaptation further improves the performance

Results (cont.) TRECVID 2006 Test Data Per-concept performance 19

Results (cont.) 20 TRECVIDJiang et alAytar et alWeng et alDASD %4.0%N/A11.9% 2006N/A 16.7%17.5% Comparison with the state-of-the-arts various baseline detectors (TRECVID-06)

Computational time Complexity is O(mn) –m: # concepts; n: # video shots Only 2 milliseconds per shot/keyframe! 21 TRECVID 05TRECVID 06TRECVID 07 SD59s84s12s DASD89s165s28s Single image/frame computation time

Summary 22 A judicious approach using local features achieves very impressive results for visual annotation Context information is helpful ! –Domain adaptive semantic diffusion effective for enhancing video annotation accuracy can alleviate the effect of data domain changes highly efficient – Future directions include: detector reliability: diffusion over directed graph web data annotation: utilize contextual information to improve the quality of tags