RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise
Amin Fazel
Sharif University of Technology
Hossein Sameti, Mohammad T. Manzuri
February 2005
Computer Engineering Department, Sharif University of Technology
Outline
- Introduction
- Feature based methods
  - MFCC, RCC, CMN, PLP, RASTA
- Mean Normalization Root Cepstral Coefficients
- Experimental Results
  - Experiment 1 – Sharif CSR and TFARSDAT Database
  - Experiment 2 – HTK CSR and AURORA 2 Database
- Summary
Wednesday, February 18, 2005 Computer Engineering Department Sharif University of Technology
Effect of Noise on ASR
- Two phases in most ASR systems
  - Training
  - Operating (testing)
- Mismatch between the two phases reduces accuracy
- Mismatch occurs because of
  - Environment: microphone, babble, distance, transmission channel
  - Speaker
    - Specific speaker: speed, …
    - Various speakers: gender, age, accent, …
Effect of Noise on ASR
- Noise
  - Additive noise
    - Babble, car, subway
    - Exhibition, office, …
  - Convolutional noise
    - Channel, telephone line
    - Microphone effect
    - Distance of speaker to microphone
  - Others
    - Lombard effect, reflection of building noise
- Noise may be stationary or non-stationary
Effect of Noise on ASR
- Robust speech recognition is the study of building speech recognizers that handle mismatched conditions.
- Simple model: clean speech is distorted by convolutional noise, then additive noise is added, giving the corrupted speech
  Clean speech → Convolutional noise → (+ Additive noise) → Corrupted speech
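The simple model above can be written as y[n] = (x * h)[n] + d[n], where x is the clean speech, h the channel impulse response and d the additive noise. A minimal sketch, assuming this discrete-time form; the function names `convolve` and `corrupt` and the symbols are illustrative and do not come from the slides.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (pure Python)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def corrupt(clean, channel, noise):
    """Return y[n] = (x * h)[n] + d[n], truncated to the length of the clean signal."""
    if len(noise) != len(clean):
        raise ValueError("noise must have the same length as the clean signal")
    filtered = convolve(clean, channel)[:len(clean)]
    return [f + d for f, d in zip(filtered, noise)]


# Illustrative values only: a two-tap channel and a constant noise offset.
noisy = corrupt([1.0, 2.0, 3.0], [1.0, 0.5], [0.1, 0.1, 0.1])
```

In the power-spectral domain the channel becomes a multiplicative term and the noise stays additive, which is why the feature-level methods in the following slides treat the two differently.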
Robustness Methods
- Signal: speech enhancement
- Feature: robust feature extraction
- Model: change of the model parameters, model training
- Processing chain (training phase and testing phase): Speech signal → Feature extraction → Features → Training → Model
Mel-Frequency Cepstral Coefficient
1. Compute the magnitude-squared of the Fourier transform
2. Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
3. Take the log of the outputs (for RCC, take a root instead of the log)
4. Compute the cepstral coefficients using the discrete cosine transform
5. Smooth by dropping the higher-order coefficients
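The five steps can be sketched for a single frame as follows. This is an illustrative sketch, not the authors' front end: the mel-scale formula, the number of filters, the number of cepstra, and the root exponent passed as `root` are all assumptions, and windowing and pre-emphasis are left out because the slide does not list them.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Step 1: magnitude-squared DFT for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def mel_filterbank(num_filters, num_bins, sample_rate):
    """Step 2: triangular weights with centres equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    edges = [f / nyquist * (num_bins - 1) for f in edges_hz]  # fractional bin positions
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(num_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def cepstrum(frame, sample_rate, num_filters=20, num_ceps=12, root=None):
    """Log cepstrum (root=None) or root cepstrum (root=exponent) of one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(spec), sample_rate)
    energies = [max(sum(w * p for w, p in zip(row, spec)), 1e-10) for row in bank]
    # Step 3: log compression, or root compression for RCC.
    if root is None:
        compressed = [math.log(e) for e in energies]
    else:
        compressed = [e ** root for e in energies]
    # Steps 4 and 5: DCT-II, keeping only coefficients c0..c_num_ceps.
    return [
        sum(c * math.cos(math.pi * q * (j + 0.5) / num_filters)
            for j, c in enumerate(compressed))
        for q in range(num_ceps + 1)
    ]


# Illustrative frame: 256 samples of a 440 Hz tone at an assumed 8 kHz sampling rate.
frame = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
mfcc_like = cepstrum(frame, 8000)
rcc_like = cepstrum(frame, 8000, root=0.1)
```

The only difference between the two feature types in this sketch is the compression in step 3, which matches the note on the slide.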
Temporal processing
- Captures the temporal features of the spectral envelope and provides robustness
- Delta features: first- and second-order differences; regression
- Cepstral mean subtraction: normalizes for channel effects and adjusts for spectral slope
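Both operations act on the sequence of cepstral vectors of an utterance. A minimal sketch, assuming per-utterance mean subtraction and the regression formula commonly used for delta features (window half-width `window`, edge frames replicated); neither detail is stated on the slide.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over the utterance from every frame."""
    num_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def delta(frames, window=2):
    """Regression deltas: sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2)."""
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, num_frames - 1)][d]   # replicate last frame
                behind = frames[max(t - k, 0)][d]               # replicate first frame
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out
```

Second-order differences are obtained by applying `delta` to its own output. A fixed channel adds the same vector to every log-cepstral frame, so it changes the mean but not the mean-subtracted frames.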
Perceptual Linear Prediction (PLP)
1. Compute the magnitude-squared of the Fourier transform
2. Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
3. Apply compressive nonlinearities
4. Compute the discrete cosine transform
5. Smooth using autoregressive modeling
6. Compute the cepstral coefficients using a linear recursion
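The last two steps can be sketched with two textbook routines: the Levinson-Durbin recursion, which fits the autoregressive (all-pole) model to autocorrelation values, and the linear recursion that converts the predictor coefficients to cepstra. This is an illustration under the sign convention H(z) = G / (1 - sum_k a_k z^-k); the function names are hypothetical, the gain term c0 is omitted, and the slides do not say which implementation was used.

```python
def levinson_durbin(autocorr, order):
    """Predictor coefficients a[1..order] and residual error from autocorrelation values."""
    coeffs = [0.0] * (order + 1)  # index 0 is unused
    error = autocorr[0]
    for i in range(1, order + 1):
        acc = autocorr[i] - sum(coeffs[j] * autocorr[i - j] for j in range(1, i))
        reflection = acc / error
        updated = coeffs[:]
        updated[i] = reflection
        for j in range(1, i):
            updated[j] = coeffs[j] - reflection * coeffs[i - j]
        coeffs = updated
        error *= 1.0 - reflection * reflection
    return coeffs[1:], error


def lpc_to_cepstrum(coeffs, num_ceps):
    """Cepstra c1..c_num_ceps of the all-pole model, computed by linear recursion."""
    order = len(coeffs)
    ceps = []
    for n in range(1, num_ceps + 1):
        value = coeffs[n - 1] if n <= order else 0.0
        for k in range(1, n):
            if n - k <= order:
                value += (k / n) * ceps[k - 1] * coeffs[n - k - 1]
        ceps.append(value)
    return ceps


# Illustrative check with a first-order process whose autocorrelation is 0.5 ** lag.
a, err = levinson_durbin([1.0, 0.5, 0.25], 2)
c = lpc_to_cepstrum(a, 3)
```

For this first-order example the cepstra follow 0.5 ** n / n, which is the series expansion of -ln(1 - 0.5 z^-1).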
PLP (Cont.)
Algorithm:
1. Speech signal
2. Critical band analysis
3. Equal loudness pre-emphasis
4. Intensity-loudness conversion
5. Inverse DFT
6. Find autoregressive coefficients
7. All-pole model
RelAtive SpecTrAl (RASTA) Analysis
- Makes PLP (and possibly some other techniques based on the short-term spectrum) more robust to linear spectral distortions
- The new spectral estimate is less sensitive to slow variations in the short-term spectrum
- Filters the temporal trajectory of some function of each spectral value to provide more reliable spectral features
- The filter is usually a band-pass filter that preserves the linguistically important modulations of the spectral envelope (1-16 Hz)
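The slides do not give the filter itself. A minimal sketch, assuming the transfer function commonly quoted for RASTA, H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1) up to a pure delay; the pole value 0.98 and the name `rasta_filter` are assumptions, and other pole values are also in use.

```python
def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one temporal trajectory (e.g. one log band energy over frames)."""
    numerator = [0.2, 0.1, 0.0, -0.1, -0.2]  # sums to zero, so the gain at DC is zero
    out = []
    previous = 0.0
    for t in range(len(trajectory)):
        acc = 0.0
        for k, b in enumerate(numerator):
            if t - k >= 0:
                acc += b * trajectory[t - k]
        current = acc + pole * previous
        out.append(current)
        previous = current
    return out


# A constant trajectory models a fixed channel offset in the log-spectral domain.
constant = rasta_filter([3.0] * 2000)
```

Because the gain at zero frequency is zero, a constant offset in a log-spectral trajectory decays away after the initial transient, which is the sense in which the estimate becomes less sensitive to slow variations.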
RASTA (Cont.)
Algorithm:
1. Speech signal
2. Spectral analysis
3. Bank of compressing static nonlinearities
4. Bank of linear band-pass filters
5. Bank of expanding static nonlinearities
6. Optional processing
RASTA-PLP Algorithm
RCC-Mean Normalization
- Root Cepstral Coefficients (RCC): derived using root compression rather than log compression of the filterbank energies
- Advantages of RCC over MFCC
  - More immune to noise
  - Faster decoding
RCC-Mean Normalization
If we approximate the root with a logarithm
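The slide's own equation did not survive extraction. One plausible reading, offered as an assumption rather than as the authors' derivation, uses the first-order expansion of the root function. The symbols below are introduced only for illustration: X_j(t) is the clean filterbank energy in band j at frame t, H_j the channel gain in that band, and γ the root exponent.

```latex
% Assumed notation, not taken from the slides.
\[
  Y_j(t) = H_j \, X_j(t)
  \quad\Longrightarrow\quad
  Y_j(t)^{\gamma} = e^{\gamma \ln Y_j(t)}
  \approx 1 + \gamma \ln H_j + \gamma \ln X_j(t),
  \qquad |\gamma \ln Y_j(t)| \ll 1 .
\]
```

Under this approximation the channel contributes a term γ ln H_j that does not change from frame to frame. The discrete cosine transform is linear, so the term stays constant over time in the cepstral domain, and subtracting the mean over frames would remove it in the same way that cepstral mean subtraction does for log cepstra.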
Experiment 1
- Database: TFARSDAT
  - 64 speakers
  - 8 hours of telephony speech data
- ASR: Sharif ASR system
  - HMM based
  - Training: segmental K-means
  - Search: beam Viterbi
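For readers unfamiliar with the search step, beam Viterbi is ordinary Viterbi decoding in which hypotheses scoring too far below the current best are not extended. The sketch below is a generic illustration on a toy HMM, not the Sharif ASR decoder; the function name, the argument layout, and all probabilities are hypothetical.

```python
import math


def beam_viterbi(log_init, log_trans, log_emit, beam):
    """Best state path under log-domain scores, pruning states below (best - beam)."""
    num_states = len(log_init)
    neg_inf = float("-inf")
    scores = [log_init[s] + log_emit[0][s] for s in range(num_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        best = max(scores)
        # Only states inside the beam are allowed to propagate to the next frame.
        active = [s for s in range(num_states) if scores[s] >= best - beam]
        new_scores = [neg_inf] * num_states
        pointers = [-1] * num_states
        for s in range(num_states):
            for p in active:
                candidate = scores[p] + log_trans[p][s]
                if candidate > new_scores[s]:
                    new_scores[s] = candidate
                    pointers[s] = p
            if pointers[s] >= 0:
                new_scores[s] += log_emit[t][s]
        scores = new_scores
        backpointers.append(pointers)
    if max(scores) == neg_inf:
        raise ValueError("no surviving path")
    state = max(range(num_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, max(scores)


# Toy two-state model with made-up probabilities.
init = [math.log(0.6), math.log(0.4)]
trans = [[math.log(0.7), math.log(0.3)], [math.log(0.3), math.log(0.7)]]
emit = [[math.log(0.9), math.log(0.1)],
        [math.log(0.8), math.log(0.2)],
        [math.log(0.1), math.log(0.9)]]
path, score = beam_viterbi(init, trans, emit, beam=1.0)
```

A narrow beam reduces computation at the risk of pruning the path that would have turned out best; a very wide beam reproduces exact Viterbi.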
Experiment 1: Test results

Feature      Accuracy   Correctness
MFCC         54.97 %    59.32 %
MFCC_CMS     51.62 %    56.63 %
RASTA_PLP    58.38 %    65.59 %
RCC          55.67 %    59.85 %
RCC_MN       56.89 %    64.31 %
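The slides do not define the two measures. Assuming the definitions commonly used in HMM toolkits, correctness ignores insertion errors while accuracy penalizes them, which would explain why accuracy is the lower figure in every row. The counts in the sketch are hypothetical.

```python
def correctness(num_ref, deletions, substitutions):
    """Percent correct: (N - D - S) / N * 100."""
    return 100.0 * (num_ref - deletions - substitutions) / num_ref


def accuracy(num_ref, deletions, substitutions, insertions):
    """Percent accuracy: (N - D - S - I) / N * 100."""
    return 100.0 * (num_ref - deletions - substitutions - insertions) / num_ref


# Hypothetical counts: 100 reference units, 5 deletions, 10 substitutions, 3 insertions.
corr = correctness(100, 5, 10)
acc = accuracy(100, 5, 10, 3)
```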
Experiment 2
- Aurora 2.0
  - Noisy connected-digit recognition
  - 4 hours of training data, 2 hours of test data in 70 noise type/SNR conditions
- HTK
  - HMM based
  - One model for each digit, 16 states with 3 Gaussian mixtures
Experiment 2: Average results on AURORA (average taken over the various SNRs of each noise)
Experiment 2: Subway noise at various SNRs
Experiment 2: Babble noise at various SNRs
Experiment 2: Car noise at various SNRs
Experiment 2: Exhibition noise at various SNRs
Summary
- Various robust features were tested
- RCC_MN was introduced
- In the first experiment, RASTA-PLP gave the best results, although RCC_MN also performed well
- In the second experiment, RCC_MN gave the best results
Thanks for your patience!