Published by Charla Singleton. Modified over 9 years ago.
1
Bridge Semantic Gap: A Large Scale Concept Ontology for Multimedia (LSCOM) Guo-Jun Qi Beckman Institute University of Illinois at Urbana-Champaign
2
LSCOM (Large Scale Concept Ontology for Multimedia) A broadcast news video dataset 200+ news videos/ 170 hours 61,901 shots Language ◦ English/Arabic/Chinese
3
Why a broadcast news ontology? Critical mass of users, content providers, applications Good content availability (TRECVID LDC FBIS) Shares a large set of core concepts with other domains
4
LSCOM Provides Richly annotated video content for accomplishing required access and analysis functions over massive amounts of video content A large-scale, useful, well-defined semantic lexicon ◦ More than 3000 concepts ◦ 374 annotated concepts ◦ Bridging the semantic gap from low-level features to high-level concepts
5
An LSCOM concept 000 - Parade Concept ID: 000 Name: Parade Definition: Multiple units of marchers, devices, bands, banners or Music. Labeled: Yes
6
LSCOM Hierarchy http://www.lscom.org/ontology/index.html
Thing
.Individual
..Dangerous_Thing
...Dangerous_Situation
....Emergency_Incident
.....Disaster_Event
......Natural_Disaster
....Natural_Hazard
.....Avalance
.....Earthquake
.....Mudslide
.....Natural_Disaster
.....Tornado
...Dangerous_Tangible_Thing
....Cutting_Device
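In the dotted notation of the LSCOM hierarchy, the number of leading dots gives the depth of a concept. The following is a minimal Python sketch of how such a listing could be turned into child/parent pairs. It assumes one concept per line and one dot per level; the function name `parse_dotted_hierarchy` is hypothetical and not part of any LSCOM tool.

```python
def parse_dotted_hierarchy(lines):
    """Return (concept, parent) pairs from dot-indented concept lines.

    Assumption: each leading '.' means one level deeper in the tree.
    The root concept gets parent None.
    """
    stack = []   # ancestors of the line currently being read
    edges = []
    for line in lines:
        name = line.lstrip(".")
        if not name:
            continue                     # skip empty or dots-only lines
        depth = len(line) - len(name)    # number of leading dots
        del stack[depth:]                # drop siblings and deeper nodes
        parent = stack[-1] if stack else None
        edges.append((name, parent))
        stack.append(name)
    return edges


listing = [
    "Thing",
    ".Individual",
    "..Dangerous_Thing",
    "...Dangerous_Situation",
    "....Natural_Hazard",
    ".....Earthquake",
    ".....Tornado",
    "...Dangerous_Tangible_Thing",
    "....Cutting_Device",
]
parents = dict(parse_dotted_hierarchy(listing))
```

Here `parents["Tornado"]` is `"Natural_Hazard"`. Note that a plain dictionary only works when concept names are unique; the full listing contains `Natural_Disaster` twice, so the list of pairs is the safer representation there.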
7
Definition: What is an ontology? (Wikipedia) An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.
8
Ontology Represents the visual knowledge base in a structured way ◦ Graph structure ◦ Tree (hierarchy) structure Images/videos can be effectively learned and retrieved by exploiting the coherence between concepts ◦ Logical coherence ◦ Statistical coherence
9
An Ontology Hierarchy: Military Vehicle
10
An example from Wikipedia
11
Ontology Tree for LSCOM
12
A Light Scale Concept Ontology for Multimedia Understanding (LSCOM-Lite) The aim is to break down the semantic space using a small set of concepts (39 concepts). Selection Criteria ◦ Semantic Coverage: the lite concept set should cover as many of the semantic concepts in news videos as possible. ◦ Compactness: these concepts should not semantically overlap. ◦ Modelability: these concepts should be ones that can be modeled with a smaller semantic gap.
13
Selected concept dimensions Divide the semantic space into a multi-dimensional space, where each dimension is nearly orthogonal ◦ Program Category ◦ Setting/Scene/Site ◦ People ◦ Objects ◦ Activities ◦ Events ◦ Graphics
14
Histogram of LSCOM-Lite Concepts
15
Some example keyframes
16
Applications Application I: Conceptual Fusion (most basic: early fusion) Application II: Cross-Category Classification (inter-class relation) Application III: Event Dynamics in Concept Space
17
Application I: Conceptual Fusion [Diagram labels: Video; Visual Features; Concept 1, Concept 2, Concept 3, …, Concept n; Classifier]
18
LSCOM 374 Models 374 LIBSVM models ◦ http://www.ee.columbia.edu/ln/dvmm/columbia374/ ◦ Features used (MPEG-7 descriptors) Color Moments Edge Histogram Wavelet Texture ◦ LIBSVM: a library for support vector machines, at http://www.csie.ntu.edu.tw/~cjlin/libsvm/
19
Application II: cross-category classification with concept transfer G.-J. Qi et al. Towards Cross-Category Knowledge Propagation for Learning Visual Concepts, in CVPR 2011
20
Instance-Level Concept Correlation [Figure labels: +1, +1; Mountain, Castle; mountain and castle, castle only, mountain only]
21
Transfer Function [Figure labels, four cases: Mountain and Castle; Mountain; Castle; none of them]
22
Model Concept Relations
23
Automatically construct ontology in a data-driven manner
24
Application III: Event Dynamics in Concept Space
25
Event Detection with Concept Dynamics W. Jiang et al., Semantic event detection based on visual concept prediction, ICME, Germany, 2008.
26
Open Problems Cross-Dataset Gap ◦ Generalize from the LSCOM dataset to other datasets (e.g., non-news video datasets) Cross-Domain Gap ◦ Can the text scripts associated with news videos help information extraction for visual concepts? Automatic ontology construction ◦ Task dependent vs. task independent ◦ Data driven vs. prior knowledge (e.g., WordNet) ◦ Incorporate prior human knowledge (logic relations etc.)
27
TRECVID Competition Task I: High-Level Feature Extraction ◦ Input: subshot ◦ Output: detection results for 39 LSCOM-Lite concepts in the subshot
28
High-Level Feature Extraction Each concept is assumed to be binary (absent or present) in each subshot Submission: Find the subshots that contain a given concept, rank them by detection confidence score, and submit the top 2000. Evaluation: NIST evaluated 20 medium-frequency concepts out of the 39, using a 50% random sample of the submission pools
29
20 Evaluated Concepts
30
Evaluation Metric: Average Precision Relevant subshots should be ranked higher than irrelevant ones. R is the total number of relevant images, R_j is the number of relevant images among the top j images, and I_j indicates whether the j-th image is relevant (1) or not (0).
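The formula on this slide did not survive in the transcript. With the symbols defined on the slide, the standard non-interpolated average precision is AP = (1/R) · Σ_j (R_j / j) · I_j, and the sketch below assumes that form. The function name `average_precision` and the list-of-0/1 input format are illustrative choices, not part of the TRECVID tooling.

```python
def average_precision(relevance, total_relevant=None):
    """Non-interpolated average precision of a ranked list.

    relevance[j-1] plays the role of I_j: 1 if the j-th ranked item is
    relevant, 0 otherwise. total_relevant is R; if omitted, it is taken
    to be the number of relevant items present in the ranked list.
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0        # R_j: relevant items among the top j
    ap_sum = 0.0
    for j, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            ap_sum += hits / j   # precision at rank j, counted only at relevant ranks
    return ap_sum / total_relevant


# Relevant items at ranks 1 and 3: AP = (1/1 + 2/3) / 2
score = average_precision([1, 0, 1, 0])
```

Passing `total_relevant` explicitly matters when the submitted list is truncated (for example to the top 2000) and some relevant subshots were never returned; they then lower the score, as intended.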
31
Results
32
TRECVID Competition Task II: Video Search ◦ Input: 24 text-based topics ◦ Output: relevant subshots in the database
33
Topics to search
34
Topics to search (cont’d)
35
Topics to search
36
Three Types of Search Systems
37
Results: Automatic Runs
38
Results: Manual Runs
39
Results: Interactive Runs
40
Machine Problem 7: Shot Boundary Detection in Videos
41
Goals Detect the abrupt content changes between consecutive frames. ◦ Scene changes ◦ Scene cuts
42
Steps Step 1: Measuring the change of content between video frames ◦ Visual/Acoustic measurements Step 2: Compare the content distance between successive frames. If the distance is larger than a certain threshold, then a shot boundary may exist.
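The two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: each frame is already represented by a feature vector (a plain list of numbers), the L1 distance is used as the content distance, and the names `l1_distance` and `detect_shot_boundaries` are hypothetical. The assignment leaves the choice of distance and threshold to you.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_shot_boundaries(features, threshold, distance=l1_distance):
    """Return indices i such that frame i likely starts a new shot.

    A boundary is reported at i when the distance between the features of
    frames i-1 and i is strictly larger than the threshold.
    """
    boundaries = []
    for i in range(1, len(features)):
        if distance(features[i - 1], features[i]) > threshold:
            boundaries.append(i)
    return boundaries


# Toy example: the content changes abruptly between frames 1 and 2.
toy_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_boundaries = detect_shot_boundaries(toy_features, threshold=0.5)
```

A fixed global threshold is the simplest choice; plotting the successive-frame distances for the two videos is one way to see where a reasonable threshold lies.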
43
Measuring Content Based on Visual Information 256-dimensional color histogram ◦ In RGB space, normalize r, g, b to [0, 1] ◦ In the normalized (nr, ng) color space, build an 8×8 histogram
44
Color Histograms Divide each image into four parts; each part has an 8×8 histogram (64 bins), giving 4 × 64 = 256 feature dimensions in total.
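A possible pure-Python sketch of this 256-dimensional feature is shown below. Several details are assumptions, because the slide formulas are not in the transcript: nr and ng are taken to be the normalized chromaticities r/(r+g+b) and g/(r+g+b), the four parts are taken to be the four quadrants of the frame, and black pixels (r+g+b = 0) are treated as neutral gray. The function names are hypothetical.

```python
def chromaticity_histogram(pixels, bins=8):
    """bins x bins histogram over (nr, ng), flattened and normalized to sum to 1."""
    hist = [0.0] * (bins * bins)
    count = 0
    for r, g, b in pixels:
        total = r + g + b
        if total == 0:
            nr = ng = 1.0 / 3.0          # assumption: black maps to neutral gray
        else:
            nr, ng = r / total, g / total
        i = min(int(nr * bins), bins - 1)   # clamp nr == 1.0 into the last bin
        j = min(int(ng * bins), bins - 1)
        hist[i * bins + j] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def frame_feature(frame, bins=8):
    """Concatenate the histograms of the four quadrants of a frame.

    frame is a list of rows, each row a list of (r, g, b) tuples.
    With bins=8 the result has 4 * 64 = 256 values.
    """
    height, width = len(frame), len(frame[0])
    feature = []
    for rows in (range(0, height // 2), range(height // 2, height)):
        for cols in (range(0, width // 2), range(width // 2, width)):
            quadrant = [frame[y][x] for y in rows for x in cols]
            feature.extend(chromaticity_histogram(quadrant, bins))
    return feature


# A 4x4 frame of pure red pixels.
red_frame = [[(255, 0, 0)] * 4 for _ in range(4)]
red_feature = frame_feature(red_frame)
```

Because each quadrant histogram is normalized separately, frames of different sizes give comparable features, and the per-quadrant layout keeps some coarse spatial information that a single global histogram would lose.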
45
Acoustic Features 12 cepstral coefficients Energy (sum of squares of the raw signal samples) Zero crossing rate (ZCR) ZCR = sum(|sign(S(2:N))-sign(S(1:N-1))|) Hint: normalize the energy so that it does not dominate the distance computed between successive frames
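The energy and ZCR definitions translate directly into Python. The sketch below follows the slide's formulas as written, with the audio frame assumed to be a plain list of samples; the function names are illustrative, and the cepstral coefficients are not covered here.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def energy(samples):
    """Sum of squares of the raw signal samples."""
    return sum(s * s for s in samples)


def zero_crossing_rate(samples):
    """ZCR = sum(|sign(S(2:N)) - sign(S(1:N-1))|), as on the slide.

    With this definition, every change between a positive and a negative
    sample contributes 2 to the total.
    """
    return sum(abs(sign(b) - sign(a)) for a, b in zip(samples, samples[1:]))


toy_signal = [1.0, -2.0, 3.0, -1.0]
toy_energy = energy(toy_signal)            # 1 + 4 + 9 + 1
toy_zcr = zero_crossing_rate(toy_signal)   # three sign changes, 2 each
```

Raw energy can be orders of magnitude larger than the other features, which is why the slide suggests normalizing it (for example by the maximum energy over the video) before computing distances.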
46
Datasets Two videos of a little over one minute Manually label the shot boundaries
47
What to submit Source code Report ◦ Compare the shot boundary detection results returned by your algorithm with the manually labeled boundaries ◦ Compare ◦ Explain your choice of threshold ◦ Explain the differences between the acoustic-based and visual-based detection results
48
Where and when to submit Email to ece.ece.ece.417@gmail.com Due: May 2nd
49
Thanks! Q&A