Published by Allen Stevenson. Modified over 9 years ago.
1
Copyright © 2004
Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues
Alex Hauptmann, Howard Wactlar
Carnegie Mellon University, Pittsburgh, USA
October 2004
2
Our Meaning of Contexture
Definition: The weaving or assembly of [multimedia] parts into a cohesive whole in order to provide a more complete picture or information structure [to both questions and answers]
- Interpreting and communicating an associated visual and verbal context to information
  - May contain language, imagery and gestures
  - May illuminate the meaning or significance
  - May explain its circumstances
  - More like a collegial expert response than an encyclopedic source
- Accelerating discovery by both system and analyst
- Understanding video perspectives
  - Subtle opinions, attitudes, biases
  - Both visual and textual rhetoric
- Continuously updating video biographs and event timelines
3
Conceptual Overview
[Figure: system diagram with an example video biograph timeline for Wesley Clark; captions are truncated in the source]
Timeline dates: 9/11/2001, 3/18/1998, 7/2/1999, 9/19/2001, 3/19/2003, 9/17/2003
Timeline entry types: People Association, Interviewee, Monologue, Word Quotation, Visual Quotation
Timeline captions:
- A word quotation of Retired Wesley Clark describing for the president in the [...]
- Being a CNN Military Analyst on military deployment in preparation for the wa[...]
- Being an interviewee in a news program as a presidential candidate for [...]
- Wesley Clark (without being mentioned in the video clip) sitting next to Madeleine [...] hearing on a resolution that would direct President Clinton to [...]
- A visual quotation of Wesley Clark (referred commander) describing President Milosevic’s intent in [...]
- Being a CNN Military Analyst on action in the Iraq War.
Diagram labels: Multiple Multimedia Information Sources (Domestic Sources, Foreign Sources); Structured Data; Analyst; Analyst’s Profile & QA History; Biograph data; Visualizations; Synthesized video clips; Contextual info
Extract Semantic Data
- Scene classification
- Event detection
- Title/topic labeling
- Named entity extraction
- Verify entities with structured data
Context Analysis
- Semantic relations on entities
- Harden with structured data
- Perspective interpretation
- Video ontology
Generate Information Contexture
- Understand questions
- Provide context-rich answers
- Produce video biographs
- Enable context-based iterative QA process
4
Scope of Work
- Extracting information from video sources for finding answers
- Applying broadcast TV news ontology (joint with USC)
- Understanding multimedia questions and their context
- Understanding the bias of source, topical, or rhetorical perspectives
- Integration and evaluation
- Incorporating video biographs and perspectives into answer contextures
- Contexture dialogue
- Learning from the analyst
5
Scope of Work
- Applying broadcast TV news ontology (joint with USC)
- Extracting information from video sources for finding answers
- Understanding multimedia questions and their context
- Understanding the bias of source, topical, or rhetorical perspectives
- Incorporating video biographs and perspectives into answer contextures
- Contexture dialogue
- Learning from the analyst
- Integration and evaluation
6
Scope of Work
- Extracting information from video sources for finding answers
- Applying broadcast TV news ontology (joint with USC)
- Understanding multimedia questions and their context
- Understanding the bias of source, topical, or rhetorical perspectives
- Integration and evaluation
- Incorporating video biographs and perspectives into answer contextures
- Contexture dialogue
- Learning from the analyst
7
Understanding Multimedia Questions
1. Find shots of Pope John Paul II.
2. Find shots of a rocket or missile taking off.
3. Find shots of the Tomb of the Unknown Soldier at Arlington National Cemetery.
4. Find shots of the front of the White House in the day-time with the fountain running.
8
Automatic Video Retrieval System
[Diagram: a multi-modal query (example: "Pope John Paul II") is run against the video library by multiple modality video analysis experts; a weighted fusion of their similarity rankings yields the final ranked list of video shots]
Multiple Modality Video Analysis Experts:
- Speech Trans.
- Video OCR
- Color Feature
- Texture Feature
- Audio Feature
- Semantic Class Filter
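The "weighted fusion of similarity rankings" step can be illustrated with a minimal sketch. It assumes each expert returns a score in [0, 1] per shot and that fusion is a plain weighted sum; the slides do not state the exact fusion formula, and the expert names, shot ids, scores, and weights below are made up for illustration.

```python
from collections import defaultdict


def fuse_rankings(expert_scores, weights):
    """Combine per-expert similarity scores into one ranked list of shots.

    expert_scores: {expert_name: {shot_id: score in [0, 1]}}  (hypothetical layout)
    weights:       {expert_name: non-negative weight}
    A shot that an expert did not score contributes 0 for that expert.
    """
    fused = defaultdict(float)
    for expert, scores in expert_scores.items():
        w = weights.get(expert, 0.0)
        for shot_id, score in scores.items():
            fused[shot_id] += w * score
    # Highest fused score first; ties are broken by shot id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative data only (not from the presentation).
expert_scores = {
    "speech_transcript": {"shot_1": 0.9, "shot_2": 0.2},
    "video_ocr": {"shot_1": 0.1, "shot_3": 0.8},
    "color_feature": {"shot_2": 0.6, "shot_3": 0.5},
}
weights = {"speech_transcript": 0.6, "video_ocr": 0.3, "color_feature": 0.1}

ranked = fuse_rankings(expert_scores, weights)
for shot_id, score in ranked:
    print(shot_id, round(score, 3))
```

A weighted sum keeps each expert's contribution easy to inspect, which is convenient when the weights themselves are what is being tuned.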
9
Finding the combination weights
[Diagram labels: Offline; Online; Video Library; Learn Weights; Training Queries; Training Data; Similarity rankings from multiple experts; Query]
10
Finding the combination weights
[Diagram labels: Offline; Online; Video Library; Classify Queries; Learn Weights; Training Queries; Training Data; Similarity rankings from multiple experts; Query]
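The offline "Learn Weights" step could be sketched as follows. The slides do not say which learning method is used, so this example substitutes a coarse grid search that maximizes mean average precision over training queries; the data layout, function names, and toy queries are assumptions.

```python
from itertools import product


def average_precision(ranked_ids, relevant):
    """Average precision of one ranked list against a set of relevant shot ids."""
    hits, total = 0, 0.0
    for rank, shot_id in enumerate(ranked_ids, start=1):
        if shot_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def rank_with_weights(expert_scores, weights):
    """Rank shots by the weighted sum of expert scores (missing scores count as 0)."""
    shots = {s for scores in expert_scores.values() for s in scores}
    fused = {
        s: sum(w * expert_scores[e].get(s, 0.0) for e, w in weights.items())
        for s in shots
    }
    return sorted(shots, key=lambda s: (-fused[s], s))


def learn_weights(training_queries, experts, steps=5):
    """Grid search over weight vectors summing to 1 (a stand-in for the real learner)."""
    grid = [i / (steps - 1) for i in range(steps)]
    best_weights, best_map = None, -1.0
    for combo in product(grid, repeat=len(experts)):
        if abs(sum(combo) - 1.0) > 1e-9:
            continue
        weights = dict(zip(experts, combo))
        mean_ap = sum(
            average_precision(rank_with_weights(q["scores"], weights), q["relevant"])
            for q in training_queries
        ) / len(training_queries)
        if mean_ap > best_map:
            best_weights, best_map = weights, mean_ap
    return best_weights, best_map


# Toy training queries (invented): text search is the more reliable expert here.
training_queries = [
    {"scores": {"text": {"a": 0.9, "b": 0.1}, "color": {"a": 0.2, "b": 0.8}},
     "relevant": {"a"}},
    {"scores": {"text": {"c": 0.8, "d": 0.3}, "color": {"c": 0.1, "d": 0.9}},
     "relevant": {"c"}},
]
best_weights, best_map = learn_weights(training_queries, ["text", "color"])
print(best_weights, best_map)
```

To get one weight vector per query type, the same routine could be run separately on the training queries assigned to each type.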
11
Query Types for Video Retrieval
- Named person queries, possibly with constraints
  "Find shots of Yasser Arafat", "Find shots of Ronald Reagan speaking".
- Named object queries for an object with a unique name.
  "Find shots of the Statue of Liberty", "Find shots of the Mercedes logo".
- General object queries for a type of object. They may be qualified.
  "Find shots of snow-covered mountains", "Find shots of one or more cats".
- Scene queries for multiple types of objects in certain relationships.
  "Find shots of roads with lots of vehicles", "Find shots of people spending leisure time on the beach".
12
Finding the Combination Weights for Merging Search Results
- Uniform, fixed weights for all queries
- Individual weightings for each query
  - Not enough known about each query
- Weightings for each of 4 query types
- Text search usually does better and is more consistent than any other single search modality
13
Query Classification
[Decision tree, reconstructed from the flattened diagram]
Query X
- Named-Entity extraction
  - people name: Person Q ("Find shots of Bill Clinton")
  - organization or location name: Specific Object Q ("Find shots of Capitol Hill")
  - no proper name: POS tagging + NP chunking
    - multiple NPs: Scene Q ("Find shots with (multiple pedestrians) and (multiple vehicles in motion)")
    - single NP: Syntactic parsing
      - nested NPs: Scene Q ("Find shots of (a person diving into the water)")
      - no nested NP: General Object Q ("Find shots of (one or more cats)")
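One way to read the decision tree on this slide is as a short chain of rules. The sketch below assumes the named-entity, chunking, and parsing results have already been computed and are passed in as simple flags; the `QueryAnalysis` container and its field names are hypothetical, and the branch order is a reconstruction of the flattened diagram.

```python
from dataclasses import dataclass


@dataclass
class QueryAnalysis:
    """Hypothetical container for the outputs of the NLP steps named on the slide."""
    has_person_name: bool           # from named-entity extraction
    has_org_or_location_name: bool  # from named-entity extraction
    noun_phrase_count: int          # from POS tagging + NP chunking
    has_nested_noun_phrase: bool    # from syntactic parsing


def classify_query(q):
    """Map an analyzed query to one of the four query types."""
    if q.has_person_name:
        return "person"
    if q.has_org_or_location_name:
        return "specific_object"
    if q.noun_phrase_count > 1:
        return "scene"
    if q.has_nested_noun_phrase:
        return "scene"
    return "general_object"


# Flags set by hand to mirror the example queries on the slide.
clinton = QueryAnalysis(True, False, 1, False)        # "Find shots of Bill Clinton"
capitol = QueryAnalysis(False, True, 1, False)        # "Find shots of Capitol Hill"
street = QueryAnalysis(False, False, 2, False)        # pedestrians and vehicles
diving = QueryAnalysis(False, False, 1, True)         # a person diving into the water
cats = QueryAnalysis(False, False, 1, False)          # one or more cats

for q in (clinton, capitol, street, diving, cats):
    print(classify_query(q))
```

Because each query receives exactly one label, a query such as a named person in front of a named object would still be forced into a single type.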
14
Hierarchical Mixture of Experts
[Diagram labels: Video Shots; Query; Text Retrieval Expert 1 Retrieval Expert n; Query Type]
15
Performance of different weighting schemes
16
Performance of different weighting schemes
17
Current Limitations
- Unable to assign multiple query types to one query
  "Finding Bill Clinton speaking in front of a US flag" (person, object)
- Unable to capture the query-specific aspects
  "Finding day-time scenes of the Federal Reserve Building, Washington DC"
18
Scope of Work
- Extracting information from video sources for finding answers
- Applying broadcast TV news ontology (joint with USC)
- Understanding multimedia questions and their context
- Understanding the bias of source, topical, or rhetorical perspectives
- Integration and evaluation
- Incorporating video biographs and perspectives into answer contextures
- Learning from the analyst
- Contexture dialogue
19
Labeling Every Face with a News Structure Model (NSM)
Sources of information:
- Audio transcripts + Named Entity extraction
- Overlaid text
- Speaker audio characteristics
- Temporal position of name w.r.t. video segment
- Temporal structure of news ("Grammar")
- Constraints based on image similarity
- Constraints from speaker audio similarity
20
Baseline Algorithm
[Flowchart, reconstructed from the flattened diagram]
For each shot s:
- Transcript clues exist for anchor OR s is first shot in story?
  - Y: anchor name
  - N: Transcript clues exist for reporter?
    - Y: reporter name(s) by distance
    - N: news-subject name(s) by distance
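The baseline flowchart can be read as two nested tests followed by a distance-based ordering of names. The sketch below follows that reading; the dictionary keys, the use of seconds as the distance unit, and the sample shot are assumptions, not details from the presentation.

```python
def baseline_name_candidates(shot):
    """Return candidate names for the face in a shot, best guess first.

    shot is a dict with these hypothetical keys:
      anchor_clue, reporter_clue : bool, transcript clues were found
      is_first_shot_in_story     : bool
      anchor_name                : str
      reporter_names, subject_names : lists of (name, distance) pairs, where
          distance is the temporal offset between the name mention and the shot
    """
    if shot["anchor_clue"] or shot["is_first_shot_in_story"]:
        return [shot["anchor_name"]]
    if shot["reporter_clue"]:
        names = shot["reporter_names"]
    else:
        names = shot["subject_names"]
    # Names mentioned closest in time to the shot are ranked first.
    return [name for name, distance in sorted(names, key=lambda nd: abs(nd[1]))]


# Invented example: no anchor or reporter clue, so news-subject names are ranked.
example_shot = {
    "anchor_clue": False,
    "reporter_clue": False,
    "is_first_shot_in_story": False,
    "anchor_name": "Anchor A",
    "reporter_names": [],
    "subject_names": [("Subject X", 12.0), ("Subject Y", -3.5)],
}
candidates = baseline_name_candidates(example_shot)
print(candidates)
```

Returning a ranked list rather than a single name matches the way accuracy is reported for both the top-1 and top-3 guesses.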
21
Overlaid Text with Video OCR
Overlaid text: Rep. NEWT GINGRICH
VOCR text: rgp nev~j ginuhicij
Edit distance to names:
  Bill Clinton      0.67
  Newt Gingrich     0.46
  David Ensor       0.72
  Saddam Hussein    0.78
  Elizabeth Vargas  0.88
  Bill Richardson   0.80
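Matching noisy Video OCR output to a list of known names can be sketched with a normalized Levenshtein distance. The slide does not say how its distances were normalized or preprocessed, so the choices here (lowercasing, dividing by the longer string's length) are assumptions and the resulting values need not equal the figures on the slide.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer length, after lowercasing (assumed scheme)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def best_name_match(ocr_text, names):
    """Pick the candidate name with the smallest normalized distance."""
    return min(names, key=lambda name: normalized_distance(ocr_text, name))


vocr_text = "rgp nev~j ginuhicij"
candidate_names = [
    "Bill Clinton", "Newt Gingrich", "David Ensor",
    "Saddam Hussein", "Elizabeth Vargas", "Bill Richardson",
]
print(best_name_match(vocr_text, candidate_names))
```

Restricting the comparison to names already extracted from the transcript keeps the candidate list short, which makes even badly garbled OCR text usable.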
22
Detection of Anchors, Reporters and News-Subjects
[Image labels: anchor, reporter, news-subject, anchor]
23
Image and Audio Similarity Constraints
24
Naming Accuracy of Different Approaches
MAP for (count)    Anchor (187)   Reporter (64)   News-Subject (125)   Overall (376)
Top-1  Baseline    0.834          0.359           0.256                0.561
Top-1  NSM         0.957          0.734           0.512                0.771
Top-3  Baseline    0.877          0.515           0.560                0.710
Top-3  NSM         0.983          0.922           0.752                0.896
25
Visual Gender Classification
[Diagram labels: Original; Face detection; scale; Haar wavelets; Feature extraction; Boosting classifiers; Output: male, female, male; Correct; Error]
26
Interface Showing the People Labeled with Names
27
Scope of Work
- Understanding the bias of source, topical, or rhetorical perspectives
- Contexture dialogue
- Extracting information from video sources for finding answers
- Applying broadcast TV news ontology (joint with USC)
- Understanding multimedia questions and their context
- Integration and evaluation
- Incorporating video biographs and perspectives into answer contextures
- Learning from the analyst
28
Finding Stories with Different Perspectives
29
Show length and shot type 22 sec 2 min 24 sec 12 sec
30
FOX and CNN news coverage of David Kay Report on the search for WMD in Iraq. Perspective of broadcaster can be seen in text overlay
31
FOX and CNN news coverage of David Kay Report on the search for WMD in Iraq. FOX uses faster cut rate and has more participation by the anchor
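Cut rate and anchor participation can both be computed from a list of shots once each shot has a duration and a type. The sketch below is one way to do that; the shot types, durations, and the two sample stories are invented for illustration and are not measurements from the FOX or CNN coverage.

```python
def cut_rate_per_minute(shots):
    """Number of cuts (shot changes) per minute of story.

    shots: list of (duration_in_seconds, shot_type) pairs in broadcast order.
    """
    total = sum(duration for duration, _ in shots)
    if total <= 0:
        raise ValueError("story must have positive total duration")
    cuts = len(shots) - 1
    return cuts * 60.0 / total


def anchor_share(shots):
    """Fraction of story time spent in shots typed as 'anchor'."""
    total = sum(duration for duration, _ in shots)
    if total <= 0:
        raise ValueError("story must have positive total duration")
    anchor_time = sum(duration for duration, kind in shots if kind == "anchor")
    return anchor_time / total


# Hypothetical stories: A cuts more often and uses the anchor more than B.
story_a = [(10, "anchor"), (5, "footage"), (10, "anchor"), (5, "footage")]
story_b = [(10, "anchor"), (20, "reporter"), (30, "footage")]

print(cut_rate_per_minute(story_a), anchor_share(story_a))
print(cut_rate_per_minute(story_b), anchor_share(story_b))
```

Comparing these two numbers across broadcasters covering the same event gives a simple, quantitative handle on differences in presentation style.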
32
Scope of Work
- Integration and evaluation
- Contexture dialogue
- Extracting information from video sources for finding answers
- Applying broadcast TV news ontology (joint with USC)
- Understanding multimedia questions and their context
- Understanding the bias of source, topical, or rhetorical perspectives
- Incorporating video biographs and perspectives into answer contextures
- Learning from the analyst
33
Metrics-based Evaluations
NIST TRECVID 2004 Video Search Evaluation
- Submitted classification results for 10 different semantic features
  - Similar to a "Routing Task" for video clips
- Submitted Informedia system video search answers for
  - Interactive runs comparing expert/novice users
  - Interactive runs using either complete or only visual information
  - Automatic/Manual runs contrasting components of the system
- Results to be announced later in October...
34
Thank you
Carnegie Mellon University
Pittsburgh, PA, USA