Communications & Multimedia Signal Processing
Analysis of the Effects of Train Noise on Recognition Rate Using Formants and MFCC
Esfandiar Zavarehei
Department of Electronic and Computer Engineering, Brunel University
28 January 2004

Contents
– The effect of noise on LP-model poles
– Formant extraction using the LP model of speech
– Recognition:
   – Formants vs. MFCC features
   – The effect of maximum normalization and mean subtraction
   – Static vs. dynamic features

Histogram of Pole Frequencies for Different Phonemes (male speaker, train noise, SNR = 0 dB)
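The histogram above is built from the frequencies of LP-model poles. A minimal sketch of that computation follows, assuming Hamming-windowed frames, the autocorrelation method with the Levinson-Durbin recursion, and the usual approximation B = -(fs/π) ln|z| for the bandwidth of a pole z. The frame length, hop, LP order and bin count are illustrative choices, not values from the slides, and all function names are hypothetical. `add_noise_at_snr` shows one way a noise recording can be scaled to a target SNR such as 0 dB. A per-phoneme histogram would run `pole_histogram` on the segments labelled with one phoneme; that segmentation is not shown.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def lpc(frame, order):
    """LP coefficients [1, a1, ..., ap] by the autocorrelation method (Levinson-Durbin)."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def pole_frequencies(a, fs):
    """Frequencies and approximate bandwidths (Hz) of the LP poles with positive angle."""
    z = np.roots(a)
    z = z[np.imag(z) > 0]  # keep one pole of each complex-conjugate pair
    freqs = np.angle(z) * fs / (2 * np.pi)
    bws = -np.log(np.abs(z)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]


def pole_histogram(signal, fs, order=12, frame_len=256, hop=128, bins=40):
    """Histogram of LP pole frequencies over all Hamming-windowed frames of `signal`."""
    window = np.hamming(frame_len)
    freqs = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start : start + frame_len] * window
        if np.any(frame):  # skip all-zero frames, whose r[0] would be 0
            f, _ = pole_frequencies(lpc(frame, order), fs)
            freqs.extend(f)
    return np.histogram(freqs, bins=bins, range=(0.0, fs / 2))
```

Only the poles in the upper half of the z-plane are counted, so each resonance contributes one entry per frame.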

Formant Extraction Using LP-Model Poles
– Maximum bandwidth of formants
– Limited frequency range
– Fixed number of formants
– Candidate sets
– Distance measure
– Procedure in consonants
Procedure: pre-processing → signal windowing → LP modelling and LP-pole extraction → do the poles meet the conditions? If not, increase the LP order and repeat the modelling. If so, save the formants, move to the next segment, and repeat the procedure until the end of the signal is reached.
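The order-increasing loop of this procedure can be sketched as follows. This is a simplified reading, assuming that the conditions are a maximum bandwidth, a frequency range and a required number of formants, that the LP order grows by one per iteration, and that the lowest-frequency qualifying poles are kept. The candidate sets, distance measure and consonant handling listed on the slide are not reproduced, pre-emphasis stands in for the unspecified pre-processing, and all function names and default values are hypothetical.

```python
import numpy as np


def lpc(frame, order):
    """LP coefficients [1, a1, ..., ap] by the autocorrelation method (Levinson-Durbin)."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def formants_for_frame(frame, fs, n_formants=4, max_bw=500.0,
                       f_range=(90.0, 3800.0), start_order=8, max_order=20):
    """Increase the LP order until at least `n_formants` poles meet the conditions.

    Returns (frequencies, bandwidths) in Hz, or None if no order up to max_order works.
    """
    for order in range(start_order, max_order + 1):
        z = np.roots(lpc(frame, order))
        z = z[np.imag(z) > 0]  # one pole per conjugate pair
        freqs = np.angle(z) * fs / (2 * np.pi)
        bws = -np.log(np.abs(z)) * fs / np.pi
        ok = (bws < max_bw) & (freqs > f_range[0]) & (freqs < f_range[1])
        if np.count_nonzero(ok) >= n_formants:
            idx = np.argsort(freqs[ok])[:n_formants]  # keep the lowest candidates
            return freqs[ok][idx], bws[ok][idx]
    return None


def track_formants(signal, fs, frame_len=256, hop=128, pre_emphasis=0.97, **conditions):
    """Pre-emphasise, window and process the signal segment by segment."""
    x = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    window = np.hamming(frame_len)
    tracks = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start : start + frame_len] * window
        tracks.append(formants_for_frame(frame, fs, **conditions) if np.any(frame) else None)
    return tracks
```

A segment for which no order up to `max_order` satisfies the conditions is returned as `None`, so it can be handled by a later step.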

Using LP Formants as Features for Recognition
In addition to the frequencies of the poles, their bandwidths and magnitudes are used as features. The HMMs are trained on monophones.
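A per-frame vector of pole frequencies, bandwidths and magnitudes could be assembled as below. The slides do not define the pole "magnitude"; this sketch assumes it is the level of the LP spectral envelope 1/|A(e^{jω})| at each pole frequency, in dB, and that the lowest `n_formants` poles in the upper half plane are used without further selection. The function name is hypothetical.

```python
import numpy as np


def formant_feature_vector(a, fs, n_formants):
    """[frequencies, bandwidths, magnitudes (dB)] of the lowest LP poles of A(z)."""
    z = np.roots(a)
    z = z[np.imag(z) > 0]                      # one pole per conjugate pair
    z = z[np.argsort(np.angle(z))][:n_formants]
    if len(z) < n_formants:
        raise ValueError("fewer complex pole pairs than requested formants")
    freqs = np.angle(z) * fs / (2 * np.pi)
    bws = -np.log(np.abs(z)) * fs / np.pi
    # ASSUMPTION: magnitude = envelope level 1/|A(e^{jw})| at each pole frequency, in dB
    w = 2 * np.pi * freqs / fs
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a  # A(e^{jw}) = sum_k a_k e^{-jwk}
    mags_db = -20.0 * np.log10(np.abs(A))
    return np.concatenate([freqs, bws, mags_db])
```

With four formants this gives 12 static values per frame.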

Recognition Results: Formants vs. MFCC
The MFCC features include C0, delta and delta-delta coefficients. The appended features are the MFCC vectors appended to the formant vectors (length = 75).
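The dynamic coefficients and the appending step can be sketched as follows, with features stored as (dimensions × frames) arrays. The delta formula is the common regression form d_t = Σθ θ(c_{t+θ} - c_{t-θ}) / (2 Σθ θ²) with Θ = 2 and replicated edge frames; the slides do not say which formula was used. Assuming 13 static MFCCs (C0 plus C1 to C12), the MFCC part has 39 values, which would leave 36 of the stated 75 for the formant part, for instance 4 formants × (frequency, bandwidth, magnitude) with deltas and accelerations. That breakdown is a hypothesis, not something the slide states, and the function names are hypothetical.

```python
import numpy as np


def deltas(X, theta=2):
    """Regression deltas along time; X has shape (dims, frames), edge frames are replicated."""
    T = X.shape[1]
    padded = np.pad(X, ((0, 0), (theta, theta)), mode="edge")
    denom = 2.0 * sum(t * t for t in range(1, theta + 1))
    d = np.zeros(X.shape, dtype=float)
    for t in range(1, theta + 1):
        d += t * (padded[:, theta + t : theta + t + T] - padded[:, theta - t : theta - t + T])
    return d / denom


def with_dynamics(static):
    """Stack static values, deltas and delta-deltas (accelerations)."""
    d = deltas(static)
    return np.vstack([static, d, deltas(d)])


def append_features(formant_feats, mfcc_feats):
    """Append the MFCC vectors to the formant vectors, frame by frame."""
    if formant_feats.shape[1] != mfcc_feats.shape[1]:
        raise ValueError("both feature streams need the same number of frames")
    return np.vstack([formant_feats, mfcc_feats])


# Hypothetical dimensions: 13 static MFCCs (C0..C12) and 12 static formant values.
rng = np.random.default_rng(0)
mfcc = with_dynamics(rng.standard_normal((13, 100)))      # 39 x 100
formants = with_dynamics(rng.standard_normal((12, 100)))  # 36 x 100
print(append_features(formants, mfcc).shape)              # (75, 100)
```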

Maximum Normalizing and Mean Subtracting the Features
In maximum normalization, each row is divided by the maximum absolute value of that row. In mean subtraction, the mean of each row is subtracted from it, so that every row has zero mean. When the two are combined, the features are first mean subtracted and then maximum normalized.
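The two normalizations translate directly into array operations. In this sketch each row of `X` is one feature dimension and each column one frame, which is an assumption about the layout the slide calls "rows"; rows that are constant after mean subtraction are left at zero instead of being divided by zero. The function names are hypothetical.

```python
import numpy as np


def mean_subtract(X):
    """Subtract each row's mean so every row has zero mean."""
    return X - X.mean(axis=1, keepdims=True)


def max_normalize(X):
    """Divide each row by its maximum absolute value (all-zero rows are left unchanged)."""
    peak = np.max(np.abs(X), axis=1, keepdims=True)
    return X / np.where(peak > 0, peak, 1.0)


def mean_sub_max_norm(X):
    """Mean subtraction first, then maximum normalization."""
    return max_normalize(mean_subtract(X))


X = np.array([[1.0, 2.0, 3.0],
              [0.0, -4.0, 2.0]])
print(mean_sub_max_norm(X))  # approximately [[-1, 0, 1], [0.2, -1, 0.8]]
```

Because the mean is removed first and the result is then divided by its peak, adding a constant to a row or scaling it by a positive factor leaves the output unchanged. A stationary offset in a coefficient such as C0 is therefore removed, while noise effects that vary over time are not.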

Recognition Results: MFCC vs. Mean-Subtracted, Maximum-Normalized MFCC (with C0)
C0 is badly affected by noise.

Recognition Results: MFCC vs. Mean-Subtracted, Maximum-Normalized MFCC (without C0)
The effect of noise on C0 can be compensated to some extent by normalizing the features.

Recognition Results: Formants vs. Mean-Subtracted, Maximum-Normalized Formants
Normalization increases the recognition rate by 10% in noisy conditions.

MFCC: Dynamic vs. ‘Static’ Features
Dynamic values are the delta and acceleration values; ‘static’ values are the actual values.
[Table: recognition rate at each SNR (including clean speech) for six feature sets: dynamic and ‘static’; dynamic only; ‘static’ only; dynamic and ‘static’, normalized; dynamic only, normalized; ‘static’ only, normalized]
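The six feature sets in the table can be generated from the static coefficients as below. The sketch repeats a regression delta (Θ = 2, replicated edges) and the mean-subtraction/maximum-normalization step so that it runs on its own, and assumes that normalization is applied after the dynamic coefficients are computed; the slides do not say in which order this was done. The 13 static values and the function names are hypothetical.

```python
import numpy as np


def deltas(X, theta=2):
    """Regression deltas along time; X has shape (dims, frames), edge frames are replicated."""
    T = X.shape[1]
    padded = np.pad(X, ((0, 0), (theta, theta)), mode="edge")
    denom = 2.0 * sum(t * t for t in range(1, theta + 1))
    d = np.zeros(X.shape, dtype=float)
    for t in range(1, theta + 1):
        d += t * (padded[:, theta + t : theta + t + T] - padded[:, theta - t : theta - t + T])
    return d / denom


def mean_sub_max_norm(X):
    """Per-row mean subtraction followed by per-row maximum normalization."""
    Y = X - X.mean(axis=1, keepdims=True)
    peak = np.max(np.abs(Y), axis=1, keepdims=True)
    return Y / np.where(peak > 0, peak, 1.0)


def feature_sets(static):
    """Build the six feature sets of the table from (dims x frames) static values."""
    d = deltas(static)
    acc = deltas(d)
    base = {
        "dynamic and static": np.vstack([static, d, acc]),
        "dynamic only": np.vstack([d, acc]),
        "static only": static,
    }
    sets = dict(base)
    for name, X in base.items():
        # ASSUMPTION: normalization is applied to the final vectors, after the deltas
        sets[name + ", normalized"] = mean_sub_max_norm(X)
    return sets


sets = feature_sets(np.random.default_rng(1).standard_normal((13, 50)))
for name, X in sets.items():
    print(f"{name}: {X.shape[0]} values per frame")
```

The same function applies unchanged to the formant features, since it only depends on the static rows it is given.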

Formants: Dynamic vs. ‘Static’ Features
Dynamic values are the delta and acceleration values; ‘static’ values are the actual values.
[Table: recognition rate at each SNR (including clean speech) for six feature sets: dynamic and ‘static’; dynamic only; ‘static’ only; dynamic and ‘static’, normalized; dynamic only, normalized; ‘static’ only, normalized]