

Computer Science Department
A Speech / Music Discriminator using RMS and Zero-crossings
Costas Panagiotakis and George Tziritas
Department of Computer Science, University of Crete, Heraklion, Greece

Presentation Organization
  I. Introduction
  II. Segmentation
  III. Classification
  IV. Results
  V. Conclusion
EUSIPCO 2002, Toulouse, France

Introduction (1/3)
Input
  Figure 1: Original Sound Signal (44100 or sample rate)
Output
  Figure 2: Real time Segmentation and Classification (Speech, Music, Silence)

Introduction (2/3)
Approaches
  Features extraction (energy, frequency)
  Feature based Segmentation and Classification
Basic purpose
  Real time segmentation and classification
Algorithmic - computation constraints
  Low feature number
  Low change extraction error (20 msec)
  Low minimum distance between two changes (1 sec)
  High accuracy (95 %)

Introduction (3/3)
Basic Features
  Root Mean Square (RMS), A: signal energy
  Zero Crossings (ZC): mean frequency
  Computed every 20 msec
  Independent characteristics
Figure 3: RMS in music
Figure 4: RMS in speech
Figure 5: ZC in music
Figure 6: ZC in speech
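The two basic features can be sketched as follows. This is a minimal illustration, assuming non-overlapping 20 msec frames and the textbook definitions of RMS and zero-crossing count; the slide's own formula for A did not survive, and the function name `frame_features` is hypothetical.

```python
import math

def frame_features(samples, sample_rate, frame_ms=20):
    """Return (rms_values, zc_values), one entry per non-overlapping frame.

    Assumption: textbook RMS and zero-crossing count, not necessarily
    the exact definitions used by the authors.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    rms_values, zc_values = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # RMS: square root of the mean squared amplitude of the frame.
        rms_values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
        # ZC: number of sign changes between consecutive samples
        # (a sample equal to zero is treated as non-negative).
        zc_values.append(sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        ))
    return rms_values, zc_values

# Usage: one second of a 440 Hz tone sampled at 8 kHz gives 50 frames.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
rms, zc = frame_features(tone, sr)
```

With 20 msec frames, one second of audio yields the 50 RMS values per 1 sec frame that the segmentation stage works on.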

Segmentation (1/3)
Basic characteristics
  RMS based
  The χ² distribution fits the RMS histograms well
    Parameters: m (mean), s² (variance); the formulas involve Γ(a + 1)
  Two stage algorithm
    Stage 1: 1 sec accuracy (low computation cost)
    Stage 2: 20 msec accuracy (high computation cost)
Figure 7: Histogram RMS in speech, approximation by χ² distribution
Figure 8: Histogram RMS in speech, approximation by χ² distribution
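The density formula on this slide is lost apart from the Γ(a + 1) term. As a sketch only, the following assumes the generalized χ² (gamma-family) density p(x) = b^(a+1) x^a e^(-bx) / Γ(a + 1), with a and b chosen so that the density has the observed mean m and variance s². The parameterization and both function names are assumptions, not the authors' formula.

```python
import math

def fit_gamma_by_moments(values):
    """Match mean m and variance s2 of the data (assumed parameterization).

    For p(x) = b**(a+1) * x**a * exp(-b*x) / Gamma(a+1):
    mean = (a+1)/b and variance = (a+1)/b**2, hence the formulas below.
    Requires at least two distinct values (s2 > 0).
    """
    m = sum(values) / len(values)
    s2 = sum((v - m) ** 2 for v in values) / len(values)
    a = m * m / s2 - 1.0
    b = m / s2
    return a, b

def gamma_pdf(x, a, b):
    """Evaluate the assumed density; computed in the log domain for stability."""
    if x <= 0:
        return 0.0
    log_p = (a + 1) * math.log(b) + a * math.log(x) - b * x - math.lgamma(a + 1)
    return math.exp(log_p)

# Usage: fit the RMS values of one frame, then evaluate the fitted density.
a, b = fit_gamma_by_moments([0.10, 0.14, 0.12, 0.18, 0.11])
density_at_mean = gamma_pdf(0.13, a, b)
```

Only two numbers per frame (mean and variance) are needed to describe the RMS distribution, which fits the low-cost constraint stated in the introduction.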

Segmentation (2/3)
Stage 1
  Partitioning into 1 sec frames (50 RMS values)
  Change in frame i: frame i-1 and frame i+1 have to differ
  Computation of the frame distance D (Matusita distance) using the frame similarity p(p1, p2)
  Frame i is a candidate for Stage 2 (there is a change) if D(i) > threshold and D(i) is a local maximum
Figure: RMS versus time, split into 1 sec frames (frame i-1, frame i, frame i+1, frame i+2), with the distance marked HIGH or LOW and the change in frame i indicated
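Stage 1 can be sketched as below. The sketch assumes each 1 sec frame is summarized by a moment-matched gamma density, that the similarity is the Bhattacharyya coefficient ρ = ∫ sqrt(p1 p2) dx (which has a closed form for gamma densities), and that the Matusita distance is sqrt(2 - 2ρ). The slide's exact formulas are lost, so these definitions, the threshold value and all function names are assumptions.

```python
import math

def gamma_params(frame):
    """Moment-matched (shape, rate) for one frame; guards avoid log(0)."""
    m = max(sum(frame) / len(frame), 1e-9)
    s2 = max(sum((v - m) ** 2 for v in frame) / len(frame), 1e-12)
    return m * m / s2, m / s2

def similarity(p, q):
    """Closed-form Bhattacharyya coefficient between two gamma densities."""
    (k1, b1), (k2, b2) = p, q
    k, b = (k1 + k2) / 2, (b1 + b2) / 2
    log_rho = (math.lgamma(k) - 0.5 * (math.lgamma(k1) + math.lgamma(k2))
               + 0.5 * (k1 * math.log(b1) + k2 * math.log(b2))
               - k * math.log(b))
    return math.exp(log_rho)

def matusita(p, q):
    return math.sqrt(max(0.0, 2.0 - 2.0 * similarity(p, q)))

def stage1_candidates(rms, threshold, frame_len=50):
    """Return (candidate frame indices, distance per frame)."""
    frames = [rms[i:i + frame_len]
              for i in range(0, len(rms) - frame_len + 1, frame_len)]
    params = [gamma_params(f) for f in frames]
    dist = [0.0] * len(frames)
    for i in range(1, len(frames) - 1):
        # Frame i changes if its two neighbours differ.
        dist[i] = matusita(params[i - 1], params[i + 1])
    candidates = [i for i in range(1, len(frames) - 1)
                  if dist[i] > threshold
                  and dist[i] > dist[i - 1] and dist[i] >= dist[i + 1]]
    return candidates, dist

# Usage: synthetic RMS track whose level changes in the middle of frame 5.
low = [0.10 + 0.01 * (n % 5) for n in range(275)]
high = [0.50 + 0.05 * (n % 5) for n in range(275)]
candidates, dist = stage1_candidates(low + high, threshold=0.5)
```

The local-maximum test is written with a strict comparison on one side so that a plateau of equal distances yields a single candidate.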

Segmentation (3/3)
Stage 2
  20 msec accuracy for each candidate frame (i) from Stage 1
  1. Move 2 successive frames (1 sec) located before and after frame (i)
  2. Find the time instant where the 2 successive frames have the maximum Matusita distance in RMS distribution
  Possible oversegmentation
Figure 10: The RMS data and the distance D
Figure 11: The segmentation result and the RMS data
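The Stage 2 refinement can be sketched as a sliding boundary: for each 20 msec position inside the candidate frame, compare the 1 sec window before it with the 1 sec window after it and keep the position with the largest distance. In this sketch the distance is passed in as a function; `mean_gap` is a toy stand-in used only to keep the example short, not the Matusita distance of the slides, and both names are hypothetical.

```python
def refine_change(rms, candidate, distance, frame_len=50):
    """Return (best_index, best_distance) for one Stage 1 candidate frame.

    best_index is an index into the 20 msec RMS sequence, so the change
    time in seconds is best_index * 0.02.
    """
    # Search inside the candidate frame, keeping room for a full
    # window on each side of the boundary.
    lo = max(candidate * frame_len, frame_len)
    hi = min((candidate + 1) * frame_len, len(rms) - frame_len)
    best_t, best_d = None, float("-inf")
    for t in range(lo, hi + 1):
        d = distance(rms[t - frame_len:t], rms[t:t + frame_len])
        if d > best_d:
            best_t, best_d = t, d
    return best_t, best_d

def mean_gap(left, right):
    """Toy distance (assumption): absolute difference of window means."""
    return abs(sum(left) / len(left) - sum(right) / len(right))

# Usage: the level jumps at RMS index 275, i.e. inside frame 5.
track = [0.1] * 275 + [0.6] * 275
change_index, change_distance = refine_change(track, 5, mean_gap)
change_time_s = change_index * 0.02
```

Because the expensive sliding comparison runs only on frames flagged by Stage 1, the 20 msec accuracy comes at a cost proportional to the number of candidates rather than to the signal length.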

Classification (1/4)
Basic purpose
  Segment classification into one of the following classes: Music, Speech, Silence
Hypothesis
  Segmentation gives homogeneous segments
Input
  Basic characteristics: RMS, ZC
Main Algorithm
  Computation of the actual features of the segment
  Classification based on the actual feature values

Classification (2/4)
Actual Features specification
  Normalized RMS variance, σ²_A
    Usually (86 %) σ²_A (music) < σ²_A (speech)
  The probability of null ZC, ZC0
    Always ZC0 (music) = 0
    Usually (40 %) ZC0 (speech) > 0
  Maximal mean frequency, max(ZC)
    Almost always in speech max(ZC) 2.4 kHz
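The three features on this slide can be sketched per segment as below. The formula for σ²_A is missing from the transcript, so the sketch assumes the common normalization (variance divided by the squared mean). Converting a zero-crossing count to a frequency assumes two crossings per period. The function name and dictionary keys are hypothetical.

```python
def segment_features(rms, zc, frame_s=0.02):
    """Compute per-segment features from per-frame RMS and ZC values."""
    n = len(rms)
    mean = sum(rms) / n
    var = sum((v - mean) ** 2 for v in rms) / n
    # Assumption: normalized variance = variance / mean**2.
    norm_var = var / (mean * mean) if mean > 0 else 0.0
    # ZC0: fraction of frames with no zero crossing at all.
    zc0 = sum(1 for z in zc if z == 0) / len(zc)
    # Assumption: a tone of f Hz crosses zero 2*f times per second.
    max_freq_hz = max(zc) / (2 * frame_s)
    return {"norm_var": norm_var, "zc0": zc0, "max_freq_hz": max_freq_hz}

# Usage on a tiny two-frame segment.
features = segment_features([1.0, 3.0], [0, 40])
```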

Classification (3/4)
Actual Features specification
  Joint RMS/ZC measure, Cz
    Speech: high correlation between RMS and ZC; many void intervals (low RMS and ZC)
    Music: essentially independent RMS and ZC
  Void intervals frequency, Fu
    Void intervals detection (20 msec): (RMS < T1) && (RMS < 0.1 max(RMS(i))) && (RMS < T2) || (ZC = 0)
    Group neighboring silent intervals
    Fu: frequency of grouped silent intervals
    Always in speech Fu > 0.6
    In at least 65% of music Fu < 0.6
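A sketch of the void interval frequency Fu follows. It reads the detection test with the usual operator precedence (the three RMS conditions joined by "and", then "or" with ZC = 0), groups runs of consecutive void frames, and assumes Fu is the number of groups per second of segment. The grouping of the conditions, the unit of Fu and the threshold values used in the example are assumptions.

```python
def void_interval_frequency(rms, zc, t1, t2, frame_s=0.02):
    """Return grouped void intervals per second (assumed definition of Fu)."""
    peak = max(rms)
    # A 20 msec frame is void if its RMS is below all three bounds,
    # or if it contains no zero crossing.
    void = [(r < t1 and r < 0.1 * peak and r < t2) or z == 0
            for r, z in zip(rms, zc)]
    # Count runs of consecutive void frames as one grouped interval.
    groups = sum(1 for i, v in enumerate(void)
                 if v and (i == 0 or not void[i - 1]))
    return groups / (len(rms) * frame_s)

# Usage: one second (50 frames) containing two short pauses.
rms_track = [1.0] * 10 + [0.01] * 5 + [1.0] * 10 + [0.01] * 5 + [1.0] * 20
zc_track = [10] * 50
fu = void_interval_frequency(rms_track, zc_track, t1=0.05, t2=0.05)
```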

Classification (4/4)
Silence segment recognition
  Segment is silence when E < Threshold
Decision making algorithm
  Silence segment check: silence
  Actual features check: speech or music
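The decision flow can be sketched as a short rule chain. Only two rules follow directly from the slides: music always has ZC0 = 0, so ZC0 > 0 indicates speech, and speech always has Fu > 0.6, so Fu at or below 0.6 indicates music. The fallback on the normalized RMS variance, both default threshold values and the function name are hypothetical; the actual decision algorithm is not recoverable from the transcript.

```python
def classify_segment(energy, zc0, fu, norm_var,
                     silence_threshold=1e-3, var_threshold=0.5):
    """Return 'silence', 'speech' or 'music' for one segment (sketch)."""
    # Silence segment check comes first.
    if energy < silence_threshold:
        return "silence"
    # Music always has ZC0 = 0, so any null-ZC frame points to speech.
    if zc0 > 0:
        return "speech"
    # Speech always has Fu > 0.6, so a lower value points to music.
    if fu <= 0.6:
        return "music"
    # Hypothetical fallback: speech usually has the larger normalized variance.
    return "speech" if norm_var > var_threshold else "music"

# Usage with made-up feature values.
label = classify_segment(energy=0.2, zc0=0.0, fu=0.3, norm_var=0.1)
```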

Results
Data
  Speech and music excerpts (durations in sec)
Data source
  70% audio CDs
  15% WWW
  15% recordings
Segmentation performance
  97% detection probability
  Change accuracy ~ 0.2 sec
Actual features performance
  Table: accuracy by feature set (σ²_A, Cz, ZC0, Fu, their combinations, and all features)

Conclusion
Complexity
  Minimum complexity O(N)
  Low computation cost
Summary
  Real time segmentation and classification in three classes
  Energy distribution (RMS) suffices for segmentation
  RMS and ZC suffice for classification
  Purpose: minimum cost and high performance
Future extension
  Content-based indexing and retrieval of audio signals
  Pre-processing stage for speech recognition

Segmentation - Classification Demo

Sound Player Demo