1
Digital Video Library Experience in Large Scale Content Management
VIEW Technologies Symposium – CUHK – August 2002
Howard Wactlar, Carnegie Mellon University, USA
2
Video Life Cycle
Acquisition: Surveillance, Radio, Broadcast TV, Training Film, Satellite
Analysis and Organization: Digital Compression, Segmentation, Speech Recognition, Image Analysis, Natural Language Interpretation, Database
Distribution: Cable, PDA, Cell Phone, Internet
3
Informedia Overview
Mission: Enable Search and Discovery in the Video Medium
REQUIREMENTS:
  Automated process for information extraction from video
  Full-content search and retrieval from all spoken language and visual documents
  Establishment of large video libraries as a network-searchable information resource
APPROACH:
  Integration of machine speech, image and natural language understanding for library creation and exploration
4
Sample Corpora
CNN News Broadcasts 1997-2002 (2050 hours)
  68,000 segments/stories
  1.7 million “shots”
China Historical and Cultural Documentaries (100 hours)
  English language
  Western perspective
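To give a feel for the granularity of the CNN corpus, the three figures on this slide can be turned into averages. This is a minimal sketch that uses only the slide's numbers; the variable names are illustrative, and the results are averages over the whole collection, not measured distributions.

```python
# Figures from the Sample Corpora slide (CNN News Broadcasts 1997-2002).
HOURS = 2050
SEGMENTS = 68_000
SHOTS = 1_700_000

# Average story length, shots per story, and shot duration.
minutes_per_segment = HOURS * 60 / SEGMENTS
shots_per_segment = SHOTS / SEGMENTS
seconds_per_shot = HOURS * 3600 / SHOTS

print(f"{minutes_per_segment:.1f} minutes per segment")  # 1.8
print(f"{shots_per_segment:.0f} shots per segment")      # 25
print(f"{seconds_per_shot:.1f} seconds per shot")        # 4.3
```

On these averages, a typical news story runs a little under two minutes and is cut into about 25 shots.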
5
Some Examples
6
Why is Multimedia Difficult?
7
Challenges of Data Extraction
8
Recognizing Scene Text and Faces
Scene Text Detection
9
Interpreting Images Containing Similar Content
10
Understanding Speech in Natural Settings
Style Variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
Voice Quality: breathy, creaky, whispery, tense, lax, modal
Context: sport, professional, interview, free conversation, man-machine dialogue
Speaking Rate: normal, slow, fast, very fast
Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load
11
Gathering Information with Faulty Technology
Retrieval performance in the presence of inaccuracy and ambiguity in the underlying cognitive processing
Approximate match in meaning and visualization
Presentation and reuse of library content
New data type with space and time dimensions
Restricted use intellectual property
Interoperability in the absence of standards
12
Challenge of Continuous Production
13
Annual Video and Audio Production
Commercial
  4,500 motion pictures -> 9,000 hours/year (4.5 TB)
  33,000 TV stations x 4 hrs/day -> 48,000,000 hrs/yr (24,000 TB)
  44,000 radio stations x 4 hrs/day -> 65,500,000 hrs/yr (3,275 TB)
Personal
  Photographs: 80 billion images -> 410,000 TB/yr
  Home videos: 1.4 billion tapes -> 300,000 TB/yr
  X-rays: 2 billion -> 17,000 TB/yr
Surveillance
  Airports: 14,000 terminals x 140 cameras x 24 hrs/day -> 48 M hrs/day
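The arithmetic behind the commercial and surveillance figures can be checked with a short script. This is a sketch under two assumptions the slide does not state: a 365-day year and decimal units (1 TB = 1000 GB). The function and variable names are illustrative. The recomputed products land close to the slide's totals but not exactly on them, which suggests the slide used rounded or slightly different inputs.

```python
DAYS_PER_YEAR = 365  # assumption: the slide does not state the day count used


def implied_gb_per_hour(terabytes, hours):
    """Storage rate implied by a (TB, hours) pair, taking 1 TB = 1000 GB."""
    return terabytes * 1000 / hours


# Station and camera counts as given on the slide.
tv_hours = 33_000 * 4 * DAYS_PER_YEAR      # 48,180,000 (slide: 48,000,000)
radio_hours = 44_000 * 4 * DAYS_PER_YEAR   # 64,240,000 (slide: 65,500,000)
airport_hours_per_day = 14_000 * 140 * 24  # 47,040,000 (slide: 48 M)

# Storage rates implied by the slide's own hour and terabyte totals.
film_rate = implied_gb_per_hour(4.5, 9_000)          # 0.5 GB per hour
tv_rate = implied_gb_per_hour(24_000, 48_000_000)    # 0.5 GB per hour
radio_rate = implied_gb_per_hour(3_275, 65_500_000)  # 0.05 GB per hour

print(tv_hours, radio_hours, airport_hours_per_day)
print(film_rate, tv_rate, radio_rate)
```

The slide's totals therefore imply roughly 0.5 GB per hour for film and television and about a tenth of that for radio.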
14
Annual Print Production
Commercial
  22,600 newspapers x 30 pgs/day -> 124 TB/year
  80,000 periodicals x 5,000 pgs/yr -> 52 TB/yr
  40,000 scholarly journals x 1,700 pgs/yr -> 9 TB/yr
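The print figures can be unpacked the same way, into page counts and an implied size per page. This sketch again assumes a 365-day year and decimal units (1 TB = 1,000,000 MB), neither of which is stated on the slide; the names are illustrative.

```python
DAYS_PER_YEAR = 365  # assumption: the slide does not state the day count used


def implied_mb_per_page(terabytes, pages):
    """Storage per page implied by a (TB, pages) pair, taking 1 TB = 1,000,000 MB."""
    return terabytes * 1_000_000 / pages


# Page counts per year from the slide's multipliers.
newspaper_pages = 22_600 * 30 * DAYS_PER_YEAR  # 247,470,000
periodical_pages = 80_000 * 5_000              # 400,000,000
journal_pages = 40_000 * 1_700                 # 68,000,000

for label, terabytes, pages in [
    ("newspapers", 124, newspaper_pages),
    ("periodicals", 52, periodical_pages),
    ("scholarly journals", 9, journal_pages),
]:
    print(f"{label}: {pages:,} pages, {implied_mb_per_page(terabytes, pages):.2f} MB/page")
```

Under these assumptions the newspaper figure implies about 0.5 MB per page and the other two about 0.13 MB per page; the slide does not explain the difference.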
15
Video Visualization
Summarizing and Visualizing the Result Set
16
Summarizing Thousands of Videos
Example: Map Collage
Map collage summarizing “El Niño effects”, showing distribution by nation with overlaid thumbnails
Map labels: North Pacific Ocean, South Pacific Ocean, Drought, Fire, Floods
17
The Need for Visualization Strategies
As digital video assets grow, so do possible result sets
We transmit with limited bandwidths to limited screen “real estate”
As automated processing improves, more metadata enables more dimensions and interfaces into the video content
Users want to apply multiple perspectives interchangeably
Direct manipulation interfaces are required to place the user in control
18
Some Examples
19
Video Digests
Overview first, zoom and filter, then details-on-demand
Concatenate scene elements into a single panoramic view
Visualize word-based relationships
Establish timelines showing trends against time
Present maps (or diagrams) showing geographic (or spatial) correlations
Combine digests into a single view, or animate them into a temporal presentation (the auto-documentary)
20
Content-based Metadata Extraction Enables Video Visualization and Summarization
Components: Metadata Extractor, Summarizer, User Perspective Templates, Personalized Presentation
Metadata: People, Event, Affiliation, Location, Topics, Time
21
Information Goals
Generate information perspectives on-demand: e.g., by time, location, personalities, events
Eliminate redundancy
Link all the way back to source content to interactively and dynamically provide any level of detail and summarization
Communicate results
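The first three goals can be illustrated with a small grouping routine. This is a hypothetical sketch, not the Informedia implementation: the `Segment` record, its field names, and the sample data are invented for illustration, and "redundancy" is reduced here to dropping exact duplicate records. Each grouped entry keeps a `(source_id, start_seconds)` pair so that a summary can link back to the source content.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical metadata record for one video segment."""
    source_id: str      # identifier of the source video (illustrative)
    start_seconds: int  # offset of the segment within the source
    year: int
    location: str
    people: tuple
    topic: str


def perspective(segments, facet):
    """Group unique segments by one metadata facet, keeping links to the source."""
    groups = defaultdict(list)
    seen = set()
    for seg in segments:
        if seg in seen:  # drop exact duplicates only
            continue
        seen.add(seg)
        value = getattr(seg, facet)
        # Multi-valued facets (tuples) file the segment under each value.
        keys = value if isinstance(value, tuple) else (value,)
        for key in keys:
            groups[key].append((seg.source_id, seg.start_seconds))
    return dict(groups)


# Invented sample data.
first = Segment("video-001", 120, 1998, "Peru", ("Reporter A",), "El Niño")
second = Segment("video-002", 45, 1998, "Indonesia", ("Reporter B", "Reporter A"), "El Niño")
library = [first, second, first]  # the repeated record is ignored

by_location = perspective(library, "location")
by_people = perspective(library, "people")
print(by_location)
print(by_people)
```

The same records yield a different view for each facet name passed in, which is the sense in which a perspective is generated on demand from a single metadata store.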
22
Knowledge Goals
Detect trends
Reveal relationships
Infer causality
Discover anomalies
….
23
Video Life Cycle
Acquisition: Surveillance, Radio, Broadcast TV, Training Film, Satellite
Analysis and Organization: Digital Compression, Segmentation, Speech Recognition, Image Analysis, Natural Language Interpretation, Database
Distribution: Cable, PDA, Cell Phone, Internet
Diagram annotations: $$, $$
24
Application Space
Consumer and Business
  Evolving and archived news and information
  Education and training
  Sports and entertainment
  Interactive television
  Personal memory aids
Professional and Enterprise
  Conventions and tradeshows
  Meetings/corporate memory
25
Digital Video Library
Thank you