RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise
Amin Fazel, Hossein Sameti, Mohammad T. Manzuri
Computer Engineering Department, Sharif University of Technology
February 2005
Outline
  Introduction
  Feature-based methods
    MFCC, RCC, CMN, PLP, RASTA
  Mean Normalization Root Cepstral Coefficients
  Experimental Results
    Experiment 1 – Sharif CSR and TFARSDAT Database
    Experiment 2 – HTK CSR and AURORA 2 Database
  Summary
Wednesday, February 18, 2005
Effect of Noise on ASR
  Two phases in most ASR systems
    Training
    Operation (testing)
  Mismatch between the two phases reduces accuracy
  Mismatch occurs because of
    Environment: microphone, babble, distance, transmission channel
    Speaker
      A specific speaker: speaking rate, …
      Different speakers: gender, age, accent, …
Effect of Noise on ASR
  Types of noise
    Additive noise: babble, car, subway, exhibition hall, office, …
    Convolutional noise: channel, telephone line, microphone effect, distance of the speaker from the microphone
    Others: Lombard effect, sound reflections inside buildings
  Stationary / non-stationary
Effect of Noise on ASR
  Robust speech recognition is the study of building speech recognizers that can handle mismatched conditions.
  Simple model
    [Diagram labels: clean speech, convolutional noise, additive noise, corrupted speech]
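The simple model on this slide is commonly written as below. This is a sketch under the usual assumption that the clean speech first passes through a linear channel and the additive noise is added afterwards; the symbols x, h, d and y are not from the slide, and the power-spectrum form additionally assumes that speech and noise are uncorrelated.

```latex
y[n] = x[n] * h[n] + d[n]
\qquad\Longrightarrow\qquad
|Y(\omega)|^2 \approx |X(\omega)|^2\,|H(\omega)|^2 + |D(\omega)|^2
```

Here x is the clean speech, h the channel impulse response (convolutional noise), d the additive noise, and y the corrupted speech.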
Robustness Methods
  Signal level: speech enhancement
  Feature level: robust feature extraction
  Model level: changing the model parameters, model training
  [Diagram labels: training phase, testing phase; speech signal, feature extraction, features, training, model]
Mel-Frequency Cepstral Coefficients (MFCC)
  1. Compute the squared magnitude of the Fourier transform.
  2. Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution.
  3. Take the log of the filter outputs (for RCC, take a root instead of the log).
  4. Compute the cepstrum using the discrete cosine transform.
  5. Smooth by dropping the higher-order coefficients.
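The five steps above can be sketched for a single frame as follows. This is a minimal illustration, not the implementation used in the experiments: the function names, the 8 kHz sample rate, the 23 filters, the 13 coefficients and the root exponent gamma = 0.1 are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers equally spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return x @ basis.T

def cepstral_features(frame, sample_rate=8000, n_fft=256, n_filters=23,
                      n_ceps=13, compression="log", gamma=0.1):
    """MFCC (compression='log') or RCC (compression='root') for one frame."""
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2                 # step 1
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power  # step 2
    energies = np.maximum(energies, 1e-10)  # avoid log(0)
    if compression == "log":                                          # step 3
        compressed = np.log(energies)
    else:
        compressed = energies ** gamma      # gamma is an assumed value
    return dct_ii(compressed, n_ceps)                                 # steps 4-5
```

Calling `cepstral_features(frame, compression="root")` gives the RCC variant; the only difference from MFCC is the compression in step 3.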
Temporal Processing
  Captures the temporal behavior of the spectral envelope and provides robustness
  Delta features: first- and second-order differences; regression
  Cepstral mean subtraction (CMS): normalizes for channel effects and adjusts for spectral slope
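Both operations on this slide act on the sequence of cepstral vectors of an utterance. The sketch below assumes the features are stored as a NumPy array of shape (frames, coefficients); the function names and the regression half-width of 2 frames are assumptions made for the example.

```python
import numpy as np

def cepstral_mean_subtraction(ceps):
    """Subtract the per-utterance mean of every cepstral coefficient."""
    return ceps - ceps.mean(axis=0, keepdims=True)

def delta(ceps, width=2):
    """Regression-based delta features over +/- `width` frames.

    Edge frames are handled by repeating the first and last frame.
    """
    n_frames = ceps.shape[0]
    padded = np.pad(ceps, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(ceps, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n: width + n + n_frames]
        behind = padded[width - n: width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom
```

Second-order (acceleration) features can be obtained by applying `delta` to its own output. A constant offset added to every frame, which is how a fixed channel appears in the log-cepstral domain, is removed exactly by `cepstral_mean_subtraction`.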
Perceptual Linear Prediction (PLP)
  1. Compute the squared magnitude of the Fourier transform.
  2. Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution.
  3. Apply compressive nonlinearities.
  4. Compute the discrete cosine transform.
  5. Smooth using autoregressive modeling.
  6. Compute the cepstrum using a linear recursion.
PLP (Cont.)
  Algorithm (block diagram, in processing order)
    Speech signal
    Critical band analysis
    Equal-loudness pre-emphasis
    Intensity-loudness conversion
    Inverse DFT
    Find autoregressive coefficients
    All-pole model
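The back end of the PLP chain (intensity-loudness conversion, inverse DFT, autoregressive coefficients, cepstral recursion) can be sketched as below. This is an illustration under assumptions, not the configuration used in the experiments: the input is taken to be a vector of positive critical-band energies covering 0 to the Nyquist frequency, the critical-band analysis and equal-loudness pre-emphasis stages are omitted, and the function names and the model order of 8 are hypothetical.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve for all-pole coefficients a (a[0] = 1) from autocorrelation r."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # prediction error update
    return a, err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of 1/A(z), A(z) = 1 + sum_k a[k] z^-k, by linear recursion."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc
    return c[1:]

def plp_cepstra(band_energies, order=8, n_ceps=8):
    """Cube-root compression -> inverse DFT -> all-pole model -> cepstrum."""
    loudness = np.cbrt(band_energies)       # intensity-loudness conversion
    r = np.fft.irfft(loudness)              # autocorrelation-like sequence
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

The all-pole fit is what the previous slide calls smoothing by autoregressive modeling: a low model order keeps the broad spectral envelope and discards fine detail.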
RelAtive SpecTrAl (RASTA) Analysis
  Makes PLP (and possibly some other short-term-spectrum-based techniques) more robust to linear spectral distortions
  The new spectral estimate is less sensitive to slow variations in the short-term spectrum
  Filters the temporal trajectory of some function of each spectral value to provide more reliable spectral features
  The filter is usually a bandpass filter that preserves the linguistically important spectral envelope modulations (1-16 Hz)
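The temporal filtering described above can be illustrated with the band-pass filter commonly associated with RASTA, H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1). The slide does not give the filter coefficients, so the transfer function, the pole value 0.98 and the function name below are assumptions; the sketch is causal and therefore delays the output by a few frames.

```python
import numpy as np

def rasta_filter(traj, pole=0.98):
    """Band-pass filter trajectories along axis 0 (time).

    `traj` holds compressed (e.g. log) band energies, shape (frames,) or
    (frames, bands). The numerator sums to zero, so constant components
    are suppressed.
    """
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    x = np.asarray(traj, dtype=float)
    y = np.zeros_like(x)
    for t in range(x.shape[0]):
        acc = 0.0
        for k, b in enumerate(numer):
            if t - k >= 0:
                acc = acc + b * x[t - k]      # FIR part
        if t >= 1:
            acc = acc + pole * y[t - 1]       # single-pole recursion
        y[t] = acc
    return y
```

Because a fixed linear channel adds a constant to every log band energy, the filtered output of a long utterance converges to the same values with or without that offset, which is the robustness to linear spectral distortion mentioned on the slide.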
RASTA (Cont.)
  Algorithm (block diagram, in processing order)
    Speech signal
    Spectral analysis
    Bank of compressing static nonlinearities
    Bank of linear band-pass filters
    Bank of expanding static nonlinearities
    Optional processing
RASTA-PLP Algorithm
  [Figure: RASTA-PLP block diagram]
RCC-Mean Normalization
  Root Cepstral Coefficients (RCC)
    Derived by applying root compression, rather than log compression, to the filterbank energies
  Advantages of RCC over MFCC
    More immune to noise
    Faster decoding
RCC-Mean Normalization
  If we approximate the root with a logarithm:
  [Equation]
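The equations for this step are not available in the text, so the following is only one plausible reading, stated as an assumption. It uses the first-order expansion of the root, with Y_t the observed filterbank energy in frame t, X_t the clean energy, H a fixed channel gain, gamma the root exponent and T the number of frames; none of these symbols come from the slide.

```latex
\begin{align*}
Y_t &= X_t\,H \\
Y_t^{\gamma} &= e^{\gamma \ln Y_t} \approx 1 + \gamma \ln X_t + \gamma \ln H
  \qquad \text{(valid when } \gamma \ln Y_t \text{ is small)} \\
Y_t^{\gamma} - \frac{1}{T}\sum_{\tau=1}^{T} Y_\tau^{\gamma}
  &\approx \gamma\Bigl(\ln X_t - \frac{1}{T}\sum_{\tau=1}^{T}\ln X_\tau\Bigr)
\end{align*}
```

Under this approximation the channel term cancels when the mean is subtracted, as it does for log-compressed features, and because the DCT is linear the same holds for the resulting cepstral coefficients.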
Experiment 1
  Database: TFARSDAT
    64 speakers
    8 hours of telephone speech data
  ASR: Sharif ASR system
    HMM-based
    Training: segmental K-means
    Search: beam Viterbi
Experiment 1: Test Results
  Feature      Accuracy (%)   Correctness (%)
  MFCC         54.97          59.32
  MFCC_CMS     51.62          56.63
  RASTA_PLP    58.38          65.59
  RCC          55.67          59.85
  RCC_MN       56.89          64.31
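The slide does not define its two result columns. A common convention, used for example by HTK's scoring tool, is that correctness ignores insertion errors while accuracy penalizes them, which would explain why accuracy is the lower figure in every row. The sketch below assumes that convention applies here; the function names and the error counts in the usage note are hypothetical.

```python
def correctness(n_ref, deletions, substitutions):
    """Percent correct: insertion errors are not counted."""
    return 100.0 * (n_ref - deletions - substitutions) / n_ref

def accuracy(n_ref, deletions, substitutions, insertions):
    """Percent accuracy: insertions are counted as errors as well."""
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref
```

For instance, with 100 reference units, 5 deletions, 10 substitutions and 7 insertions, `correctness(100, 5, 10)` is 85.0 and `accuracy(100, 5, 10, 7)` is 78.0.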
Experiment 2
  Database: Aurora 2.0
    Noisy connected-digit recognition
    4 hours of training data; 2 hours of test data in 70 noise type/SNR conditions
  ASR: HTK
    HMM-based
    One model per digit: 16 states, 3 Gaussian mixture components per state
Experiment 2: Average results on AURORA
  Averages taken over the SNRs of each noise type
  [Figure]
Experiment 2: Subway noise at various SNRs
  [Figure]
Experiment 2: Babble noise at various SNRs
  [Figure]
Experiment 2: Car noise at various SNRs
  [Figure]
Experiment 2: Exhibition noise at various SNRs
  [Figure]
Summary
  Various robust features were tested.
  RCC_MN was introduced.
  In the first experiment, RASTA-PLP performed best, although RCC_MN also performed well.
  In the second experiment, RCC_MN performed best.
Thanks for your patience!