1
Communications & Multimedia Signal Processing
Analysis of the Effects of Train Noise on Recognition Rate using Formants and MFCC
Esfandiar Zavarehei
Department of Electronic and Computer Engineering, Brunel University
28 January, 2004
2
Contents
The effect of noise on LP-model poles
Formant extraction using the LP model of speech
Recognition:
– Formants vs. MFCC features
– The effect of maximum normalization and mean subtraction
– Static vs. dynamic features
3
Histogram of Pole Frequencies for Different Phonemes
[Figure: male speaker, train noise, SNR = 0]
4
Formant Extraction Using LP-Model Poles
– Maximum bandwidth of formants
– Limited frequency range
– Fixed number of formants
– Candidate sets
– Distance measure
– Procedure in consonants
Flowchart:
– Pre-processing
– Signal windowing
– LP modelling and LP-pole extraction
– Do the poles meet the conditions? If no, increase the LP order and repeat. If yes, save the formants.
– Move to the next segment and repeat the procedure until the end of the signal is reached.
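The loop in the flowchart above can be sketched in Python with NumPy. This is a minimal illustration, not the procedure used in the presentation: the function names, the bandwidth limit (400 Hz), the frequency range, the number of formants, and the LP order range are all assumed values, and the candidate-set and distance-measure steps are left out.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LP coefficients [1, a1, ..., ap]."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    # Toeplitz normal equations: R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))

def pole_candidates(a, fs):
    """Frequencies and bandwidths (Hz) of the upper-half-plane LP poles."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # one pole per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]

def extract_formants(frame, fs, n_formants=3, max_bw=400.0,
                     f_range=(90.0, 4000.0), start_order=10, max_order=20):
    """Raise the LP order until enough poles satisfy the conditions.

    All thresholds are illustrative assumptions, not values from the slides.
    """
    windowed = frame * np.hamming(len(frame))    # signal windowing
    for order in range(start_order, max_order + 1, 2):
        freqs, bws = pole_candidates(lpc_coefficients(windowed, order), fs)
        keep = (bws < max_bw) & (freqs > f_range[0]) & (freqs < f_range[1])
        if keep.sum() >= n_formants:             # conditions met: save formants
            return freqs[keep][:n_formants], bws[keep][:n_formants]
    return None                                  # no order met the conditions
```

Calling `extract_formants` on successive frames of a signal gives one formant set per segment; a frame for which it returns `None` would need a fallback rule, which this sketch does not define.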
5
Using LP Formants as Features for Recognition
In addition to the pole frequencies, their bandwidths and magnitudes are used as features.
The HMMs are trained on monophones.
6
Recognition Results: Formants vs. MFCC
MFCC features include C0, delta and delta-delta features.
Appended features are vectors of MFCC appended to formants (length = 75).
7
Maximum Normalizing and Mean Subtracting the Features
In maximum normalization, each row is divided by the maximum absolute value of that row.
In mean subtraction, the mean of each row is subtracted, so that each row has zero mean.
When the two are combined, the features are first mean subtracted, then maximum normalized.
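A short NumPy sketch of the two operations and their combination follows. It assumes the feature matrix has one row per feature coefficient and one column per frame; the slides do not state the layout. The function names are illustrative, and the guard for all-zero rows is an addition for numerical safety.

```python
import numpy as np

def mean_subtract(features):
    """Subtract each row's mean so that every row has zero mean."""
    return features - features.mean(axis=1, keepdims=True)

def max_normalize(features):
    """Divide each row by its maximum absolute value."""
    peak = np.abs(features).max(axis=1, keepdims=True)
    peak[peak == 0] = 1.0          # assumed guard: leave all-zero rows as they are
    return features / peak

def mean_subtract_then_max_normalize(features):
    """Combined form: mean subtraction first, then maximum normalization."""
    return max_normalize(mean_subtract(features))
```

Because the division comes after the mean subtraction, each row keeps its zero mean and its largest absolute value becomes 1.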
8
Recognition Results: MFCC vs. Mean-Subtracted, Max-Normalized MFCC (with C0)
C0 is badly affected by noise.
9
Recognition Results: MFCC vs. Mean-Subtracted, Max-Normalized MFCC (without C0)
The effect of noise on C0 can be compensated to some extent by normalizing the features.
10
Recognition Results: Formants vs. Mean-Subtracted, Max-Normalized Formants
Normalization increases the recognition rate by 10% in noisy conditions.
11
MFCC: Dynamic vs. ‘Static’ Features
Dynamic values are the delta and acceleration values.
‘Static’ values are the actual values.

SNR                                  0      5      10     15     20     25     30     Clean
Dynamic and ‘Static’                 35.9   46.72  57.24  65.68  72.01  77.28  80.47  88.52
Dynamic Only                         42.44  49.81  58.18  64.88  71.33  76.17  79.95  84.60
‘Static’                             25.14  33.63  43.40  51.98  60.04  65.24  69.19  76.38
Dynamic and ‘Static’, Normalized     47.19  54.57  63.98  71.00  76.64  80.74  83.15  85.93
Dynamic Only, Normalized             46.20  52.69  59.23  66.33  72.59  76.63  80.06  84.32
‘Static’, Normalized                 37.17  41.10  47.83  54.78  60.29  66.31  68.98  76.04
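The slides do not say how the delta and acceleration values were computed. One common choice is the regression formula d_t = Σ_k k (c_{t+k} − c_{t−k}) / (2 Σ_k k²), with acceleration taken as the delta of the delta. The sketch below assumes that formula, a window of ±2 frames, edge padding, and a (coefficients × frames) layout; the function names are illustrative.

```python
import numpy as np

def delta(features, window=2):
    """Regression-based delta along the time axis.

    features: array of shape (n_coeffs, n_frames). The window size and
    edge padding are assumptions, not settings taken from the slides.
    """
    n_frames = features.shape[1]
    padded = np.pad(features, ((0, 0), (window, window)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, window + 1):
        ahead = padded[:, window + k:window + k + n_frames]    # c[t + k]
        behind = padded[:, window - k:window - k + n_frames]   # c[t - k]
        out += k * (ahead - behind)
    return out / denom

def add_dynamic(static):
    """Stack 'static', delta and acceleration rows into one feature matrix."""
    d = delta(static)
    a = delta(d)
    return np.vstack([static, d, a])
```

In this layout the "Dynamic Only" rows of the table correspond to keeping only the delta and acceleration rows, and the ‘Static’ rows to keeping only the original values.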
12
Formants: Dynamic vs. ‘Static’ Features
Dynamic values are the delta and acceleration values.
‘Static’ values are the actual values.

SNR                                  0      5      10     15     20     25     30     Clean
Dynamic and ‘Static’                 34.69  39.18  44.10  50.63  56.90  62.88  69.13  79.22
Dynamic Only                         40.13  42.21  43.77  47.00  50.47  53.62  57.04  71.08
‘Static’                             31.97  37.74  45.02  50.17  55.30  60.34  63.84  71.45
Dynamic and ‘Static’, Normalized     44.48  46.59  50.93  55.15  59.52  64.29  69.08  78.32
Dynamic Only, Normalized             39.88  41.13  42.04  44.45  47.03  51.08  53.87  68.01
‘Static’, Normalized                 38.68  39.30  42.35  46.56  51.81  57.64  62.44  70.80