Classifying Motion Picture Audio Eirik Gustavsen 07.06.07.

Presentation transcript:

Classifying Motion Picture Audio Eirik Gustavsen

Outline
- Motivation
- Thesis
- State of the Art
- Proposed system
- Experimental setup
- Results
- Future work
- Conclusion

Motivation
- Most projects classify clear classes or classes with noise.
- Motion picture audio has few clear boundaries.
- Descriptions of movies are subjective.
- Movie content is difficult to compare.

Thesis
It is possible to automatically create a table of contents (TOC) of a motion picture based on its audio track only.

Research questions
- Find the best low level descriptors (LLDs) to classify motion picture audio.
- Detect boundaries between audio classes within complex audio segments.
- Automatically create a TOC based on the audio track only.

Pre-Processing
- Hz sample rate
- Mono
- 16 bits
- 30 ms windows (L_W)
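The windowing step can be sketched as follows. This is a minimal illustration, not the thesis implementation: the sample rate is missing from the slide text, so the 16000 Hz used here is purely an assumed value, and the slide does not say whether windows overlap, so non-overlapping windows are assumed. The function name `frame_signal` is hypothetical.

```python
def frame_signal(samples, sample_rate, window_ms=30):
    """Split a mono signal into consecutive, non-overlapping windows.

    Non-overlapping windows are an assumption; the slides only state
    a 30 ms window length (L_W).
    """
    window_len = int(sample_rate * window_ms / 1000)  # L_W in samples
    if window_len <= 0:
        raise ValueError("window length must be at least one sample")
    n_frames = len(samples) // window_len  # the incomplete tail is dropped
    return [samples[i * window_len:(i + 1) * window_len] for i in range(n_frames)]


# Hypothetical input: one second of silence at an ASSUMED 16000 Hz rate.
frames = frame_signal([0] * 16000, sample_rate=16000)
print(len(frames), len(frames[0]))
```

Each resulting window is the unit on which the low level descriptors are computed.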

Low Level Descriptors
- Time domain
- Frequency domain

Low Level Descriptors
Total of 23 low level descriptors

TIME DOMAIN
- Audio Power
- Audio Wave Form
- Root-Mean Square
- Short Time Energy
- Low Short Time Energy Ratio
- Zero-Crossing Rate
- High Zero-Crossing Rate Ratio

FREQUENCY DOMAIN
- Audio Spectrum Centroid
- Fundamental Frequency
- 10 Mel-Frequency Cepstral Coefficients
- Spectrum Flux
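A few of the time-domain descriptors can be sketched per window as below. The slides give no formulas, so these follow common textbook definitions and may differ in detail from the ones used in the thesis. In particular, the mean-based short time energy and the 1.5 factor in the HZCRR threshold are assumptions, and all function names are hypothetical.

```python
import math


def rms(frame):
    """Root-mean square amplitude of one window."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def short_time_energy(frame):
    """Mean squared amplitude of one window (one common STE definition)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def hzcrr(zcrs, factor=1.5):
    """Fraction of windows whose ZCR exceeds factor * mean ZCR.

    The factor of 1.5 is an assumed, commonly used threshold.
    """
    mean_zcr = sum(zcrs) / len(zcrs)
    return sum(1 for z in zcrs if z > factor * mean_zcr) / len(zcrs)


# Hypothetical window that alternates sign on every sample.
frame = [1, -1, 1, -1]
print(rms(frame), short_time_energy(frame), zero_crossing_rate(frame))
```

The ratio descriptors (HZCRR and, analogously, the Low Short Time Energy Ratio) are computed over a run of windows rather than a single one, which is why `hzcrr` takes a list of per-window ZCR values.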

Dimensionality reduction
Principal components analysis (PCA) is a technique used to reduce multidimensional data sets to lower dimensions for analysis.
f(1), f(2), f(3), f(4), f(5), ..., f(23) -> PCA -> d(1), d(2), d(3)
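The reduction from 23 descriptors to 3 components can be sketched with NumPy as below. This is one standard way to compute PCA (via the singular value decomposition of the centered data), not necessarily the procedure used in the thesis. The function name `pca_reduce` and the random input data are hypothetical.

```python
import numpy as np


def pca_reduce(features, n_components=3):
    """Project feature vectors onto their first principal components."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical data: 200 windows, each described by f(1)..f(23).
rng = np.random.default_rng(0)
f = rng.normal(size=(200, 23))
d = pca_reduce(f)  # columns correspond to d(1), d(2), d(3)
print(d.shape)
```

Because the components are ordered by explained variance, d(1) carries the most variance and d(3) the least of the three.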

K Nearest Neighbors
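A K nearest neighbors classifier assigns a query point the majority label among its k closest training points. The sketch below is a minimal version using Euclidean distance. The choice of k = 3, the distance metric, and the toy training points are assumptions, since the slide gives only the method name.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 3-dimensional training data, e.g. PCA outputs d(1)..d(3).
train_points = [
    (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0),
    (5.0, 5.0, 5.0), (5.0, 5.5, 5.0), (5.5, 5.0, 5.0),
]
train_labels = ["speech", "speech", "speech", "music", "music", "music"]

print(knn_classify(train_points, train_labels, (0.2, 0.1, 0.0)))
```

An odd k avoids tied votes when there are only two classes; with more classes ties remain possible and are broken here by the order in which labels were encountered.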

Proposed system
Pre-processing -> LLD -> Norm -> PCA -> KNN -> Post-processing -> TOC generation
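The slide does not say which normalization the Norm step uses. One plausible choice, shown here purely as an assumption, is z-score normalization of each descriptor, which keeps descriptors with large numeric ranges from dominating the PCA step. The function name `zscore_columns` is hypothetical.

```python
import statistics


def zscore_columns(rows):
    """Scale every column to zero mean and unit standard deviation.

    Z-score normalization is an ASSUMED choice; the slides only name
    a "Norm" step without specifying the method.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero deviation; divide by 1.0 instead.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical descriptors on very different scales.
rows = [[1.0, 100.0], [3.0, 300.0]]
normalized = zscore_columns(rows)
print(normalized)
```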

Classifying Audio
- Speech
- Noise (white)
- Music
- "Silence"
- Mixed audio classes

Class Boundary Detection
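The slide gives no detail on how boundaries are detected. One simple approach, offered only as an illustrative assumption and not as the method used in the thesis, is to smooth the per-window labels with a sliding majority vote and then report every position where the label changes. The function names, the smoothing radius, and the sample labels are hypothetical; the 30 ms window length comes from the pre-processing slide.

```python
from collections import Counter


def smooth_labels(labels, radius=2):
    """Replace each label by the majority label in its neighbourhood."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius):i + radius + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


def find_boundaries(labels, window_ms=30):
    """Return (start time in seconds, label) for every run of equal labels."""
    return [
        (i * window_ms / 1000, labels[i])
        for i in range(len(labels))
        if i == 0 or labels[i] != labels[i - 1]
    ]


# Hypothetical per-window classifier output with one isolated misclassification.
raw = ["speech"] * 5 + ["music"] + ["speech"] * 4 + ["music"] * 6
print(find_boundaries(smooth_labels(raw)))
```

A list of (start time, class) pairs like this is also a natural starting point for a table of contents, since each entry marks where a new audio class begins.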

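Finding the most suitable LLDs
Most suitable:
- ASC (Audio Spectrum Centroid)
- AWF (Audio Wave Form)
- RMS (Root-Mean Square)
- HZCRR (High Zero-Crossing Rate Ratio)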

Sample Results
- Music with low volume
- Clear speech
- Speech with background environmental sounds
- Fading between music and speech
- Speech with background music
- Jingle
"Some mistakes"

Future Work
To be done in this thesis:
- Post-processing
- TOC
Open research questions for future work:
- New motion picture audio classes
- Detecting sound objects
- Speech recognition

Conclusion
- Pre-processing makes it possible to classify motion picture audio correctly.
- Using the right combination of LLDs improves the classification result.

Questions?