RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise Amin Fazel Sharif.

Presentation transcript:

RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise
Amin Fazel, Sharif University of Technology
Hossein Sameti, Mohammad T. Manzuri
February 2005
Computer Engineering Department, Sharif University of Technology

Outline
  Introduction
  Feature based methods
    MFCC, RCC, CMN, PLP, RASTA
  Mean Normalization Root Cepstral Coefficients
  Experimental Results
    Experiment 1 – Sharif CSR and TFARSDAT Database
    Experiment 2 – HTK CSR and AURORA 2 Database
  Summary
Wednesday, February 18, 2005

Effect of Noise on ASR
  Two phases in most ASR systems
    Training
    Operating (testing)
  Mismatch between the two phases reduces accuracy
  Mismatch occurs because of
    Environment
      Microphone, babble, distance, transmission channel
    Speaker
      Specific speaker: speed, ...
      Various speakers: gender, age, accent, ...

Effect of Noise on ASR
  Noise
    Additive noise
      Babble, car, subway
      Exhibition, office, ...
    Convolutional noise
      Channel, telephone line
      Microphone effect
      Distance of speaker to microphone
    Others
      Lombard effect, reflection of building noise
  Noise may be stationary or non-stationary

Effect of Noise on ASR
  Simple model
    Clean speech -> convolutional noise -> additive noise -> corrupted speech
  Robust speech recognition is the study of building speech recognizers that handle mismatched conditions.
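The simple model above can be sketched in a few lines. This is a minimal illustration, assuming the usual form y[n] = x[n] * h[n] + d[n] (clean speech convolved with a channel, plus additive noise). The function name `corrupt`, the three-tap channel, and the 10 dB SNR are hypothetical choices for the example and do not come from the slides.

```python
import numpy as np

def corrupt(clean, channel_ir, noise, snr_db):
    """Return channel-filtered speech plus noise scaled to the requested SNR."""
    # Convolutional distortion: filter the clean signal with the channel impulse response.
    filtered = np.convolve(clean, channel_ir)[: len(clean)]
    noise = noise[: len(clean)]
    # Scale the noise so that 10*log10(P_signal / P_noise) equals snr_db.
    p_signal = np.mean(filtered ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    # Additive distortion: add the scaled noise to the filtered signal.
    return filtered + gain * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000.0)  # stand-in for a speech signal
channel_ir = np.array([1.0, -0.5, 0.2])                      # hypothetical channel response
noise = rng.standard_normal(8000)
noisy = corrupt(clean, channel_ir, noise, snr_db=10.0)
```

Keeping the two distortions as separate steps mirrors the split between convolutional and additive noise made on the slides.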

Robustness Methods
  Signal
    Speech enhancement
  Feature
    Robust feature extraction
  Model
    Change of the model parameters
    Model training
  Diagram (training phase and testing phase): speech signal -> feature extraction -> features -> training -> model

Mel-Frequency Cepstral Coefficient
  Compute magnitude-squared of Fourier transform
  Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
  Take log of outputs (for RCC we take root instead of log)
  Compute cepstra using discrete cosine transform
  Smooth by dropping higher-order coefficients
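The steps above can be sketched as follows. This is an illustrative sketch rather than the authors' front end: the Hamming window, the 23-filter bank, the 13 retained coefficients, and the root exponent 0.1 are all assumptions, and the helper names (`mel_filterbank`, `cepstra`) are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular weights with centres spaced uniformly on the mel scale."""
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):          # rising edge of the triangle
            fbank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling edge of the triangle
            fbank[i, k] = (right - k) / (right - centre)
    return fbank

def cepstra(frame, fbank, n_ceps=13, root=None):
    """MFCC when root is None; RCC when root is a compression exponent."""
    n_fft = 2 * (fbank.shape[1] - 1)
    # Magnitude-squared Fourier transform of the windowed frame.
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Triangular frequency weighting (floored to avoid log(0)).
    energies = np.maximum(fbank @ power, 1e-10)
    # Log compression for MFCC, root compression for RCC.
    compressed = np.log(energies) if root is None else energies ** root
    # DCT-II; keeping only the first n_ceps rows drops the higher-order coefficients.
    n_filters = fbank.shape[0]
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n_filters) + 0.5) / n_filters)
    return basis @ compressed

sample_rate, n_fft = 8000, 256
fbank = mel_filterbank(23, n_fft, sample_rate)
frame = np.sin(2 * np.pi * 300.0 * np.arange(200) / sample_rate)  # one 25 ms test frame
mfcc = cepstra(frame, fbank)
rcc = cepstra(frame, fbank, root=0.1)
```

The only difference between the two feature types in this sketch is the compression line, which matches the slide's remark that RCC takes a root where MFCC takes a log.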

Temporal processing
  To capture the temporal features of the spectral envelope and to provide robustness:
    Delta features: first- and second-order differences; regression
    Cepstral mean subtraction: normalizes for channel effects and adjusts for spectral slope
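Both operations are short enough to sketch directly. The code below assumes features are stored as a (frames x coefficients) array, uses the common regression formula for deltas with a +/- 2 frame window, and pads the edges by repeating the first and last frame. The window size and the padding rule are assumptions, not details taken from the slides.

```python
import numpy as np

def cepstral_mean_subtraction(features):
    """Subtract the per-utterance mean of each coefficient (rows are frames)."""
    return features - features.mean(axis=0, keepdims=True)

def delta(features, window=2):
    """Regression-based delta over +/- window frames, edges padded by repetition."""
    n_frames = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(t * t for t in range(1, window + 1))
    out = np.zeros(features.shape, dtype=float)
    for t in range(1, window + 1):
        ahead = padded[window + t : window + t + n_frames]
        behind = padded[window - t : window - t + n_frames]
        out += t * (ahead - behind)
    return out / denom

# A linear ramp has a slope of one per frame, so its interior deltas equal 1.
ramp = np.arange(20.0).reshape(-1, 1) * np.ones((1, 3))
ramp_delta = delta(ramp)
# Second-order (acceleration) features are the delta of the delta.
ramp_delta2 = delta(ramp_delta)
```

A constant offset added to every frame, which is how a fixed channel appears in the log-cepstral domain, is removed exactly by `cepstral_mean_subtraction`.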

Perceptual Linear Prediction (PLP)
  Compute magnitude-squared of Fourier transform
  Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
  Apply compressive nonlinearities
  Compute discrete cosine transform
  Smooth using autoregressive modeling
  Compute cepstra using linear recursion

PLP (Cont.)
  Algorithm
    Speech signal -> Critical Band Analysis -> Equal Loudness Pre-Emphasis -> Intensity-Loudness Conversion -> Inverse DFT -> Find Autoregressive Coefficients -> All-pole model

RelAtive SpecTral Analysis (RASTA)
  Makes PLP (and possibly also some other short-term spectrum based techniques) more robust to linear spectral distortions
  The new spectral estimate is less sensitive to slow variations in the short-term spectrum
  Filters the temporal trajectories of some function of each of the spectral values, to provide more reliable spectral features
  The filter is usually a band-pass filter that maintains the linguistically important spectral envelope modulations (1-16 Hz)
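The slides give no filter coefficients. The sketch below assumes the transfer function commonly quoted for RASTA, H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1), applied causally to each band's trajectory, with log and exp assumed as the compressing and expanding nonlinearities. The function name `rasta_filter` and the random test data are hypothetical.

```python
import numpy as np

def rasta_filter(trajectories, pole=0.98):
    """Band-pass filter each column (one spectral band over time, rows are frames)."""
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    x = np.asarray(trajectories, dtype=float)
    y = np.zeros_like(x)
    for n in range(x.shape[0]):
        acc = np.zeros(x.shape[1])
        # FIR part: a slope estimate over five frames (coefficients sum to zero).
        for k, b in enumerate(numer):
            if n - k >= 0:
                acc += b * x[n - k]
        # IIR part: leaky integration of the slope.
        if n > 0:
            acc += pole * y[n - 1]
        y[n] = acc
    return y

rng = np.random.default_rng(0)
band_energies = np.abs(rng.standard_normal((300, 20))) + 1.0  # stand-in filterbank outputs
# Compress, filter the temporal trajectories, then expand.
rasta_out = np.exp(rasta_filter(np.log(band_energies)))
```

Because the numerator coefficients sum to zero, a component that is constant over time (such as a fixed channel in the log domain) decays to zero at the output, which is the robustness to linear spectral distortion described above.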

RASTA (Cont.)
  Algorithm
    Speech signal -> Spectral analysis -> Bank of compressing static nonlinearities -> Bank of linear band-pass filters -> Bank of expanding static nonlinearities -> Optional processing

RASTA-PLP Algorithm
  [Figure: RASTA-PLP algorithm]

RCC-Mean Normalization
  Root Cepstral Coefficients (RCC)
    Derived using root compression rather than log compression on the filterbank energies
  Advantages of RCC over MFCC
    More immune to noise
    Faster decoding

RCC-Mean Normalization
  If we approximate the root with a logarithm:
  [Equation missing from the transcript]
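The equation from this slide is not available, so the following is only one plausible reading of the statement, not the authors' derivation. It assumes a filterbank energy X_t(k) at frame t and band k, a time-invariant channel gain H(k), an observed energy Y_t(k) = H(k) X_t(k), and a small root exponent \gamma. All of these symbols are introduced here for illustration. The first-order expansion e^u \approx 1 + u holds when |\gamma \ln Y_t(k)| is small.

```latex
\begin{align}
Y_t(k)^{\gamma} &= \bigl(H(k)\,X_t(k)\bigr)^{\gamma}
                 = e^{\gamma \ln H(k) + \gamma \ln X_t(k)} \\
                &\approx 1 + \gamma \ln H(k) + \gamma \ln X_t(k), \\
Y_t(k)^{\gamma} - \frac{1}{T}\sum_{\tau=1}^{T} Y_\tau(k)^{\gamma}
                &\approx \gamma \Bigl( \ln X_t(k) - \frac{1}{T}\sum_{\tau=1}^{T} \ln X_\tau(k) \Bigr).
\end{align}
```

Under this approximation the channel term \gamma \ln H(k) is constant over time, so subtracting the temporal mean removes it, just as cepstral mean subtraction does for log-compressed features.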

Experiment 1
  Database: TFARSDAT
    64 speakers
    8 hours of telephone speech data
  ASR: Sharif ASR system
    HMM based
    Training: segmental K-means
    Search: beam Viterbi
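The slides name the search strategy but do not describe it. As a rough illustration of the general idea, the sketch below runs a log-domain Viterbi search in which, at every frame, states scoring more than a fixed `beam` below the best state are not extended. The pruning rule, the beam width, and the two-state toy model are assumptions and say nothing about how the Sharif system implements its search.

```python
import numpy as np

def beam_viterbi(log_init, log_trans, log_obs, beam=10.0):
    """Most likely state path; states below (best - beam) are pruned each frame."""
    n_frames, n_states = log_obs.shape
    score = log_init + log_obs[0]
    backptr = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # Prune: hypotheses far below the current best are not extended.
        active = score >= score.max() - beam
        pruned = np.where(active, score, -np.inf)
        cand = pruned[:, None] + log_trans      # cand[i, j]: reach state j from state i
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_obs[t]
    # Trace back from the best final state.
    path = [int(score.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy two-state model: the observations favour state 0 for two frames, then state 1.
log_init = np.log(np.array([0.9, 0.1]))
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_obs = np.log(np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]))
best_path = beam_viterbi(log_init, log_trans, log_obs)
```

A narrower beam extends fewer hypotheses per frame and runs faster, at the risk of pruning the path that would have turned out to be best.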

Experiment 1
  Test results

  Feature      Accuracy (%)   Correctness (%)
  MFCC         54.97          59.32
  MFCC_CMS     51.62          56.63
  RASTA_PLP    58.38          65.59
  RCC          55.67          59.85
  RCC_MN       56.89          64.31
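The slides do not define the two columns. A common convention, used for example by HTK's HResults tool, counts deletions and substitutions in both measures and additionally penalizes insertions in accuracy. Whether the Sharif system scores results the same way is an assumption. The function name and the error counts below are invented for illustration and are unrelated to the table.

```python
def correctness_and_accuracy(n_ref, deletions, substitutions, insertions):
    """Return (correctness %, accuracy %) from alignment error counts."""
    hits = n_ref - deletions - substitutions
    correctness = 100.0 * hits / n_ref
    # Accuracy also subtracts insertions, so it never exceeds correctness.
    accuracy = 100.0 * (hits - insertions) / n_ref
    return correctness, accuracy

# Hypothetical counts: 100 reference units, 5 deleted, 10 substituted, 3 inserted.
example_scores = correctness_and_accuracy(100, 5, 10, 3)
```

Under this convention accuracy is always less than or equal to correctness, which is consistent with every row of the table.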

Experiment 2
  Aurora 2.0
    Noisy connected-digit recognition
    4 hours of training data, 2 hours of test data in 70 noise type/SNR conditions
  HTK
    HMM based
    One model for each digit
    16 states with 3 Gaussian mixtures

Experiment 2
  Average results on AURORA
  Average obtained over the various SNRs of each noise
  [Figure: average results on AURORA]

Experiment 2
  [Figure: subway noise at various SNRs]

Experiment 2
  [Figure: babble noise at various SNRs]

Experiment 2
  [Figure: car noise at various SNRs]

Experiment 2
  [Figure: exhibition noise at various SNRs]

Summary
  Various robust features were tested
  RCC_MN was introduced
  In the first experiment, RASTA-PLP gave the best results, although RCC_MN also performed well
  In the second experiment, RCC_MN gave the best results

Thanks for your patience!