Published by Justin Mathews, modified over 8 years ago
1. Evaluation of Multi-Media Data QA Systems
AQUAINT Breakout Session, June 2002
Howard Wactlar, Carnegie Mellon
Yiming Yang, Carnegie Mellon
Herb Gish, BBN
2. Extracting Answers From…
- Web-based charts and graphs (Yiming Yang)
- Continuous speech (Herb Gish)
- Video (Howard Wactlar)
3. Video Extraction
- Long-term goal: the ability to respond to the same challenges and metrics as the current and forthcoming TREC-QA evaluations
  – but with different source data
  – and with multiple channels of information (with temporal and spatial dimensions) used simultaneously
- Many of the speech goals and measures are applicable
- Errors are inherent in extracting and interpreting what is seen and heard
  – …but parallel sources of data may corroborate or refute them
4. Multimedia Data Extraction
- Long-term goal: the ability to respond to the same challenges and metrics as the current and forthcoming TREC-QA evaluations
  – but with different source data
  – and with multiple channels of information (with temporal and spatial dimensions) used simultaneously
- Problem: errors are inherent in extraction and interpretation
  – …but parallel sources of data may corroborate or refute them
5. Examples of Parallel Data
- Mention of a person's name while that person appears in the imagery
- Appearance of a face in an image with the person's name and affiliation overlaid as text (interpreted visually)
- Description of a scene (or activity or event) and the corresponding imagery
- Spoken location name and appearance of the corresponding map
- Spoken location name and appearance of a street or store sign
- Spoken location, appearance of a street sign, and embedded GPS data (forthcoming in all cameras)
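The parallel-data examples above suggest one channel can be used to check another. The minimal sketch below illustrates that idea under loudly stated assumptions: each channel (speech transcript, overlaid on-screen text, face identification) is assumed to have already been reduced to timestamped entity detections, and the `Detection` record, the function names, the 10-second window and the two-channel threshold are all hypothetical choices for illustration, not part of the original proposal.

```python
from dataclasses import dataclass


# Hypothetical record: one entity detected in one channel of a video.
@dataclass(frozen=True)
class Detection:
    channel: str   # e.g. "speech", "overlay_text", "face" (illustrative labels)
    entity: str    # normalized entity label, e.g. a person or place name
    time_s: float  # seconds from the start of the video


def corroborating_channels(detections, entity, time_s, window_s=10.0):
    """Return the distinct channels that report `entity` within
    +/- window_s seconds of time_s (window size is an assumption)."""
    return {
        d.channel
        for d in detections
        if d.entity == entity and abs(d.time_s - time_s) <= window_s
    }


def is_corroborated(detections, entity, time_s, window_s=10.0, min_channels=2):
    """Treat an entity as corroborated when at least `min_channels`
    independent channels report it in the same time window."""
    return len(corroborating_channels(detections, entity, time_s, window_s)) >= min_channels


if __name__ == "__main__":
    dets = [
        Detection("speech", "Ankara", 12.0),        # spoken location name
        Detection("overlay_text", "Ankara", 15.5),  # street sign read visually
        Detection("speech", "Riyadh", 40.0),        # heard, but not seen
    ]
    print(is_corroborated(dets, "Ankara", 12.0))  # True: two channels agree
    print(is_corroborated(dets, "Riyadh", 40.0))  # False: speech only
```

Counting distinct channels rather than raw detections keeps repeated mentions in a single channel from looking like independent confirmation; a refutation check (a conflicting entity in another channel at the same time) could be layered on in the same way.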
6. Answering Questions from Video
- Factoid answers may be contained only within an image (while the audio track leads you to the region where the answer can be found)
- Generally, collateral video information provides context and background
- Finding a relevant image in addition to the textual answer
- Answering a query that specifically seeks an image
  – e.g., who was accompanying Muhammed Atef when he was visiting Ankara, and was this the same person who was in Riyadh with him?
7. Some Measures
- Did we find a relevant image related to the textual answer? (precision measures)
- Did we correctly answer a question that specifically seeks an image?
  – Video TREC precision and recall* measures (*recall on fully ground-truthed data)
- How long does it take to interactively extract a useful answer?
  – Number of steps, total elapsed time
  – Qualitatively different from textual search
- Extraction performance
  – e.g., how do named-entity and factoid extraction degrade as speech recognition accuracy decreases?
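The measures above can be made concrete with a minimal sketch. It assumes, hypothetically, that system results and ground-truth judgments are sets of item IDs (e.g., shot or image IDs), and that speech recognition accuracy is summarized by word error rate (WER), a standard measure the slide itself does not name. The function names are illustrative and are not taken from any TREC or AQUAINT tool.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall over hypothetical item IDs.

    Recall is only meaningful when `relevant` is complete, i.e. the
    collection is fully ground-truthed. Empty inputs yield 0.0 by convention.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein (edit distance) alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two of four returned shots are relevant; one relevant shot is missed.
    print(precision_recall(["s1", "s2", "s3", "s4"], ["s2", "s4", "s7"]))
    # One substituted word out of four reference words.
    print(word_error_rate("atef was in ankara", "atef was in angora"))  # 0.25
```

Pairing the two functions lets one plot extraction precision and recall against the WER of the transcripts they were extracted from, which is one way to quantify the degradation question on the slide.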
8. Evaluation Task Proposal: Multi-modal Biography Generation
- Task description:
  – Given a name, generate a multi-modal summary of the key characteristics and events of the person's life within a specified time frame.
- Response:
  – Natural-language passages/sentences describing key events
  – Recent/historical pictures of the individual
  – Video/radio segments (e.g., speeches, interviews) augmenting key events
  – Org chart, map of movements
- Data:
  – Web images and text
  – Video and radio from news sources
- Eval:
  – Subjective: assessment by analysts or other users
  – Objective: precision, recall, recency, coherence
- Goal for next meeting: define an application-based task and metrics
9. Attendees Who Indicated Further Interest
- Robert Irie, irier@spawar.navy.mil
- Anita Kulman, akulman@starpower.net
- Jay Peltz, famfare@sprintmail.com
- Mark Maybury, maybury@mitre.org
- John Lowe, jblowe@socrates.berkeley.edu
- Yiming Yang, yiming@cs.cmu.edu
- Donna Berry, NGIC
- Robert Hecht-Nielsen, r@hnc.com
- Herbert Gish, hgish@bbn.com
- John Prange, jprange@nsa.gov
- Carol Van Ess-Dykema, cjvanes@afterlife.ncsc.mil
- Howard Wactlar, wactlar@cmu.edu
- Alex Hauptmann, hauptmann@cs.cmu.edu
- 5-6 others also attended