Understanding the Semantics of Media. Lecture Notes on Video Search & Mining, Spring 2012. Presented by Jun Hee Yoo, Biointelligence Laboratory, School of Computer Science and Engineering, Seoul National University
Semantic Understanding: Problem Statement. Some tools attempt to segment video at a higher level, but this level of analysis does not tell us much about the meaning represented in the media. © 2012, SNU CSE Biointelligence Lab.
Approach. Segmentation literature: use latent semantic indexing (LSI) because it allows us to quantify the position of a portion of the document in a multi-dimensional semantic space. The proposal is to summarize the text with LSI and analyze the resulting signal with smoothing Gaussians. Semantic retrieval literature: use mixtures of probability experts for semantic-audio retrieval (MPESAR) to build a more sophisticated model connecting words and media.
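To make the idea of a position in semantic space concrete, here is a minimal sketch of LSI using a truncated SVD. The toy term counts, the function name `lsi_project`, and the choice of `k=2` are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def lsi_project(term_doc, k):
    """Project each document (column of term_doc) into a k-dimensional latent space."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular directions; rows of the result are documents.
    return (np.diag(s[:k]) @ vt[:k]).T

def cosine(a, b):
    """Cosine similarity between two latent-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical counts: rows are terms, columns are transcript chunks.
term_doc = np.array([
    [2, 1, 0, 0],  # "jet"
    [1, 2, 0, 0],  # "wing"
    [0, 0, 2, 1],  # "news"
    [0, 0, 1, 2],  # "anchor"
], dtype=float)

coords = lsi_project(term_doc, k=2)
same_topic = cosine(coords[0], coords[1])   # chunks 0 and 1 share vocabulary
cross_topic = cosine(coords[0], coords[2])  # chunks 0 and 2 do not
```

In this toy case chunks that share vocabulary land close together in the latent space, while chunks from different topics are nearly orthogonal.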
Analysis Tools
Segmenting Video: Temporal Properties of Video. Color: it provides robust evidence for a shot change in a video signal, but it cannot tell us the global structure of the video. Random words from a transcript: the words indicate a lot about the overall structure of the story.
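As an illustration of how color can signal a shot change, the sketch below compares intensity histograms of consecutive frames and flags large jumps. The bin count, the threshold, and the synthetic frames are assumptions chosen for the example; the lecture does not specify them.

```python
import numpy as np

def histogram_distance(frame_a, frame_b, bins=8):
    """Half the L1 distance between normalized intensity histograms (0 = same, 1 = disjoint)."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return 0.5 * float(np.abs(ha - hb).sum())

def detect_shot_changes(frames, threshold=0.5):
    """Return indices i where frame i differs strongly from frame i - 1."""
    return [
        i for i in range(1, len(frames))
        if histogram_distance(frames[i - 1], frames[i]) > threshold
    ]

# Synthetic clip (hypothetical): three dark frames followed by three bright frames.
dark = np.full((4, 4), 20)
bright = np.full((4, 4), 220)
frames = [dark, dark, dark, bright, bright, bright]
cuts = detect_shot_changes(frames)
```

A detector like this reacts only to local changes between neighboring frames, which is consistent with the point that color alone says little about global structure.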
Segmenting Video: Test Material. CNN Headline News (30-minute TV show). 21st Century Jet (documentary). Automatic speech recognition (ASR) is used to provide a transcript of the audio.
Segmenting Video: Scale Space. Convert the original signal into scale space. In scale space, we analyze a signal with many different kernels. [Figure labels: Histogram; With Low Pass Filter]
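A minimal sketch of a scale-space representation follows, assuming Gaussian kernels of increasing width and edge padding at the signal borders. The particular widths and the step-shaped test signal are illustrative choices only.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def scale_space(signal, sigmas):
    """Smooth `signal` with each kernel width; returns an array of shape (len(sigmas), len(signal))."""
    signal = np.asarray(signal, dtype=float)
    rows = []
    for sigma in sigmas:
        k = gaussian_kernel(sigma)
        # Repeat the border values so the output keeps the input length.
        padded = np.pad(signal, len(k) // 2, mode="edge")
        rows.append(np.convolve(padded, k, mode="valid"))
    return np.stack(rows)

# Hypothetical signal with a single abrupt change in the middle.
signal = [0.0] * 20 + [1.0] * 20
space = scale_space(signal, sigmas=[1, 2, 4])
# Steepest slope at each scale; wider kernels blur the edge more.
steepness = np.abs(np.diff(space, axis=1)).max(axis=1)
```

Each row of `space` is the same signal seen at a coarser scale, so an edge that survives in the lower rows corresponds to a change that is visible even after heavy smoothing.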
Segmenting Video: Combined Image and Audio Data. Color, words, and scale-space analysis are combined. The result is a 20-dimensional vector function of time and scale.
Segmenting Video: Hierarchical Segmentation Results. Color and word autocorrelations for the Boeing 777 video.
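One plausible way to compute such an autocorrelation is the average cosine similarity between feature vectors a fixed lag apart. This is a sketch under that assumption; the lecture does not state the exact definition, and the two-topic feature sequence is made up for the example.

```python
import numpy as np

def feature_autocorrelation(features, max_lag):
    """Mean cosine similarity between feature vectors separated by 0..max_lag steps."""
    x = np.asarray(features, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # rows must be nonzero
    return np.array([
        np.mean(np.sum(x[: len(x) - lag] * x[lag:], axis=1))
        for lag in range(max_lag + 1)
    ])

# Hypothetical sequence: five time steps on one topic, then five on another.
features = [[1, 0]] * 5 + [[0, 1]] * 5
ac = feature_autocorrelation(features, max_lag=5)
```

The correlation falls off as the lag approaches the length of a topic, which is the kind of decay such a plot would show for both color and word features.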
Segmenting Video: Hierarchical Segmentation Results. Grouping 4-8 sentences produces a larger semantic autocorrelation.
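The grouping step can be sketched as pooling the words of consecutive sentences into one bag-of-words per group. The function name, whitespace tokenization, and the example sentences are assumptions for illustration.

```python
from collections import Counter

def group_sentences(sentences, group_size=4):
    """Pool word counts over consecutive, non-overlapping groups of sentences."""
    groups = []
    for start in range(0, len(sentences), group_size):
        chunk = sentences[start:start + group_size]
        words = " ".join(chunk).lower().split()
        groups.append(Counter(words))
    return groups

# Hypothetical transcript: eight short sentences.
sentences = [
    "the jet has two engines",
    "the wing is large",
    "engineers test the wing",
    "the jet flies",
    "the news begins",
    "the anchor reads the news",
    "weather follows the news",
    "the anchor signs off",
]
groups = group_sentences(sentences, group_size=4)
```

Pooling gives each group more words to estimate its topic from, which is one reason a grouped signal could correlate more strongly with itself than single sentences do.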
Segmenting Video: Intermediate Results. A scale-space segmentation algorithm produced a boundary map showing the edges in the signal.
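A boundary map of this kind can be sketched by marking, at every scale, the time steps where the change between neighboring samples is a local maximum. The threshold and the small hand-built scale-space array are assumptions; the actual algorithm used in the lecture may differ.

```python
import numpy as np

def boundary_map(space, threshold=0.1):
    """Boolean map (scale x time) of local maxima in the absolute time difference."""
    space = np.asarray(space, dtype=float)
    grad = np.abs(np.diff(space, axis=1))
    # Neighbors to the left and right along time; borders compare against -inf.
    left = np.pad(grad, ((0, 0), (1, 0)), constant_values=-np.inf)[:, :-1]
    right = np.pad(grad, ((0, 0), (0, 1)), constant_values=-np.inf)[:, 1:]
    return (grad >= left) & (grad > right) & (grad > threshold)

# Hypothetical two-scale signal: a sharp step and a blurred version of it.
space = [
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.25, 0.75, 1.0, 1.0],
]
edges = boundary_map(space)
```

Column `j` of the map refers to the transition between samples `j` and `j + 1`; an edge that appears in the same column at several scales is a candidate segment boundary.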
Segmenting Video: a comparison with ground truth. Left: estimated result. Right: ground truth.
Segmenting Video: Shot Boundary Segmentation. Uses a commercial product designed by YesVideo.
Segmenting Video: manual segmentation result.
Semantic Retrieval: MPESAR process
Semantic Retrieval: acoustic signal processing chain; acoustic-to-semantic lookup
Semantic Retrieval: Testing
Retrieval Results. Histogram of true label ranks based on likelihoods from audio-to-semantic tests. Histogram of true label ranks based on likelihoods from semantic-to-acoustic tests.
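The rank histograms can be read as follows: for each test item, the candidate labels are sorted by likelihood and the position of the true label is recorded. A minimal sketch, with made-up likelihood values and hypothetical function names:

```python
from collections import Counter

def true_label_rank(likelihoods, true_index):
    """Rank of the true label among all candidates (1 = highest likelihood)."""
    true_score = likelihoods[true_index]
    return 1 + sum(1 for score in likelihoods if score > true_score)

def rank_histogram(trials):
    """Count how often the true label lands at each rank.

    `trials` is a list of (likelihoods, true_index) pairs.
    """
    return Counter(true_label_rank(scores, idx) for scores, idx in trials)

# Hypothetical test results: three items, three candidate labels each.
trials = [
    ([0.7, 0.2, 0.1], 0),  # true label scored highest
    ([0.1, 0.6, 0.3], 2),  # true label scored second
    ([0.5, 0.3, 0.2], 0),  # true label scored highest
]
hist = rank_histogram(trials)
```

A histogram concentrated at low ranks means the model usually places the true label near the top of its candidate list.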