Published by Ashlee Ward. Modified over 9 years ago.
Noise Reduction in Speech Recognition
Professor: Jian-Jiun Ding
Student: Yung Chang
2011/05/06
Outline
Mel Frequency Cepstral Coefficients (MFCC)
Mismatch in speech recognition
Feature-based: CMS, CMVN, HEQ
Feature-based: RASTA, data-driven
Speech enhancement: spectral subtraction, Wiener filtering
Conclusions and applications
Mel Frequency Cepstral Coefficients (MFCC)
The most commonly used feature in speech recognition.
Advantages: high accuracy and low complexity.
39 dimensions in total.
Mel Frequency Cepstral Coefficients (MFCC)
The framework of feature extraction:
speech signal x(n) → Pre-emphasis → x'(n) → Window → x_t(n) → DFT → A_t(k) → Mel filter-bank → Y_t(m) → Log(| |^2) → Y_t'(m) → IDFT → y_t(j) = MFCC
plus the frame energy e_t and the derivatives of all coefficients.
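The per-frame chain above can be sketched in Python. This is a simplified illustration, not the exact HTK/ETSI recipe: the sampling rate, filter count, and coefficient count are assumed values, and the usual DCT-II plays the role of the IDFT step.

```python
import numpy as np

def mfcc_frame(x_t, sr=16000, n_mels=26, n_ceps=13):
    """One frame x_t(n) -> MFCC y_t(j), following the slide's chain:
    window, DFT, mel filter-bank, log(| |^2), DCT."""
    n = len(x_t)
    spec = np.abs(np.fft.rfft(x_t * np.hamming(n))) ** 2    # |A_t(k)|^2
    # triangular mel filters, uniformly spaced on the mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(0.0, mel(sr / 2.0), n_mels + 2))
    bins = np.round(edges / (sr / 2.0) * (len(spec) - 1)).astype(int)
    fb = np.zeros((n_mels, len(spec)))
    for m in range(n_mels):
        lo, c, hi = bins[m], bins[m + 1], bins[m + 2]
        for k in range(lo, c):                  # rising edge of triangle
            fb[m, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi + 1):              # falling edge of triangle
            fb[m, k] = (hi - k) / max(hi - c, 1)
    Y = np.log(fb @ spec + 1e-10)                           # Y_t'(m)
    j = np.arange(n_ceps)[:, None]
    m_idx = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * j * (2 * m_idx + 1) / (2 * n_mels))
    return dct @ Y                                          # y_t(j)
```

In a full front end this would run per frame, with the energy e_t and delta coefficients appended afterwards.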
Pre-emphasis
Pre-emphasis boosts the spectrum at higher frequencies: x[n] → x'[n].
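As a filter this is x'[n] = x[n] − a·x[n−1]; a minimal sketch (a = 0.97 is a typical choice, not stated on the slide):

```python
def pre_emphasis(x, a=0.97):
    """x'[n] = x[n] - a*x[n-1]: a first-order high-pass that boosts
    the higher frequencies before further processing."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]
```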
End-point Detection (Voice Activity Detection)
Separates the speech segments from the noise (silence) segments.
Windowing
Rectangular window vs. Hamming window.
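The Hamming window tapers the frame edges, reducing the spectral leakage a rectangular window would cause; a minimal sketch:

```python
import math

def hamming(N):
    """w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1))
            for n in range(N)]
```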
Mel-filter bank
After the DFT we get the spectrum (amplitude vs. frequency).
Mel-filter bank
Triangular filters in frequency (overlapped).
Uniformly spaced below 1 kHz.
Logarithmically spaced above 1 kHz.
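The linear-below-1 kHz, logarithmic-above spacing comes from the mel scale; one commonly used formula (an assumption, since the slide does not give one explicitly):

```python
import math

def hz_to_mel(f):
    """Roughly linear below ~1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used to place the triangular filter edges."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```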
Delta Coefficients
1st/2nd-order differences:
1st-order deltas add 13 dimensions to the 13 static coefficients; adding 2nd-order deltas as well brings the total to 39 dimensions.
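One common way to compute the 1st-order differences is a regression over ±M neighbouring frames, shown here for a single coefficient trajectory (M = 2 is a typical choice, not specified on the slide):

```python
def delta(traj, M=2):
    """Regression delta: d_t = sum_m m*(c_{t+m} - c_{t-m}) / (2*sum_m m^2),
    with the trajectory clamped at its ends. Applying delta() to the
    delta stream gives the 2nd-order (acceleration) coefficients."""
    T = len(traj)
    denom = 2.0 * sum(m * m for m in range(1, M + 1))
    return [sum(m * (traj[min(t + m, T - 1)] - traj[max(t - m, 0)])
                for m in range(1, M + 1)) / denom
            for t in range(T)]
```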
Mismatch in Statistical Speech Recognition
Possible approaches for acoustic environment mismatch:
The original speech x[n] passes through convolutional noise h[n] (acoustic reception, microphone distortion, phone/wireless channel) and picks up additive noise n1(t), n2(t), giving the observed input signal y[n].
Feature extraction turns y[n] into feature vectors O = o1 o2 ... oT; search with the acoustic models, lexicon, and language model (trained from a speech corpus and a text corpus) yields the output sentence W = w1 w2 ... wR.
Compensation can act at three points: speech enhancement (recovering x[n] from y[n] in the signal domain), feature-based approaches (normalizing the feature vectors), or model-based approaches (adapting the acoustic models).
Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
Cepstral Mean Subtraction (CMS) targets convolutional noise.
Convolutional noise in the time domain becomes additive in the cepstral domain:
y[n] = x[n] * h[n]  ⇒  y = x + h (x, y, h in the cepstral domain)
Most convolutional noise changes only very slightly over a reasonable time interval, so x = y − h.
Assuming E[x] = 0, we get E[y] = h, and therefore
x_CMS = y − E[y]
This shifts the distribution P(y) back toward P(x).
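A minimal sketch of x_CMS = y − E[y], computed per cepstral dimension over one utterance:

```python
def cms(cepstra):
    """cepstra: list of frames, each a list of cepstral coefficients y_t.
    The utterance-level mean E[y] estimates the convolutional term h,
    so subtracting it removes slowly varying channel effects."""
    T, D = len(cepstra), len(cepstra[0])
    mean = [sum(f[d] for f in cepstra) / T for d in range(D)]
    return [[f[d] - mean[d] for d in range(D)] for f in cepstra]
```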
Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
CMVN normalizes the variance as well:
x_CMVN = x_CMS / [Var(x_CMS)]^(1/2)
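Extending the sketch to CMVN, each dimension is both mean- and variance-normalized (the small floor on the standard deviation is an added safeguard, not from the slide):

```python
def cmvn(cepstra):
    """x_CMVN = (y - E[y]) / sqrt(Var(y)) per dimension; a tiny floor
    guards against division by a zero variance."""
    T, D = len(cepstra), len(cepstra[0])
    mean = [sum(f[d] for f in cepstra) / T for d in range(D)]
    std = [max((sum((f[d] - mean[d]) ** 2 for f in cepstra) / T) ** 0.5,
               1e-10) for d in range(D)]
    return [[(f[d] - mean[d]) / std[d] for d in range(D)] for f in cepstra]
```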
Feature-based Approach: HEQ (Histogram Equalization)
The whole distribution is equalized, not just its first moments:
y = CDF_y^(-1)[CDF_x(x)]
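With empirical CDFs, y = CDF_y^(-1)[CDF_x(x)] amounts to rank matching: each test value is replaced by the reference value at the same quantile. A sketch for one feature dimension:

```python
def heq(test_vals, ref_vals):
    """Map test_vals through the empirical CDFs so their histogram
    matches that of ref_vals."""
    order = sorted(range(len(test_vals)), key=lambda i: test_vals[i])
    ref_sorted = sorted(ref_vals)
    n = max(len(order) - 1, 1)
    out = [0.0] * len(test_vals)
    for rank, i in enumerate(order):
        # rank/n approximates CDF_test(x_i); look up that quantile of ref
        out[i] = ref_sorted[round(rank * (len(ref_sorted) - 1) / n)]
    return out
```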
Feature-based Approach: RASTA
Each feature dimension, viewed as a trajectory over time, has its own spectrum over modulation frequency; RASTA performs filtering on these trajectories (temporal filtering).
Feature-based Approach: RASTA (RelAtive SpecTrAl temporal filtering)
Assume the rate of change of noise often lies outside the typical rate of change of the vocal tract shape.
A specially designed temporal band-pass filter over modulation frequency (Hz) emphasizes the speech components.
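A sketch of the classic RASTA band-pass, applied to one temporal trajectory. The coefficients below follow the widely cited Hermansky-Morgan filter H(z) = 0.1(2 + z^-1 - z^-3 - 2z^-4)/(1 - 0.98z^-1); the slide itself only shows the response shape, so treat the exact numbers as an assumption:

```python
def rasta_filter(traj):
    """IIR band-pass over modulation frequency: the zero at DC removes
    constant (convolutional) offsets, while the pole retains the slow
    speech modulations around a few Hz."""
    b = [0.2, 0.1, 0.0, -0.1, -0.2]   # 0.1*(2 + z^-1 - z^-3 - 2z^-4)
    a = 0.98
    hist, prev, out = [0.0] * len(b), 0.0, []
    for x in traj:
        hist = [x] + hist[:-1]
        y = sum(bi * xi for bi, xi in zip(b, hist)) + a * prev
        out.append(y)
        prev = y
    return out
```

Because the numerator taps sum to zero, a constant input (a pure channel offset) is rejected in the steady state.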
Data-driven Temporal Filtering
PCA (Principal Component Analysis): project the data onto the directions of greatest variance.
Data-driven Temporal Filtering
We should not guess our filter, but derive it from data: for each feature dimension, take length-L windows z_k(1), z_k(2), z_k(3), ... along the frame index of the original feature stream y_t, and learn FIR filters B1(z), B2(z), ..., Bn(z) from them; applying a filter is a convolution along the frame index.
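One way to derive such a filter with PCA: stack length-L windows z_k of a feature trajectory and take the leading eigenvector of their covariance as the filter taps. A sketch (L = 5 is an arbitrary illustrative choice):

```python
import numpy as np

def pca_filter(stream, L=5):
    """Return the first principal component of length-L windows of the
    feature stream; filtering the stream with it is the inner product
    of each window with these taps (a convolution over frames)."""
    Z = np.array([stream[k:k + L] for k in range(len(stream) - L + 1)])
    Zc = Z - Z.mean(axis=0)
    _, vecs = np.linalg.eigh(Zc.T @ Zc / len(Zc))
    taps = vecs[:, -1]            # eigenvector of the largest eigenvalue
    return taps, Z @ taps
```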
Speech Enhancement: Spectral Subtraction (SS)
Produce a better signal by trying to remove the noise, for listening purposes or recognition purposes.
Noise n[n] changes quickly and unpredictably in the time domain, but its spectrum N(ω) changes relatively slowly, so an average noise spectrum can be estimated and subtracted from the noisy speech spectrum.
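A minimal magnitude-domain sketch. The noise spectrum estimate would come from, e.g., leading silence frames; the oversubtraction factor alpha and spectral floor beta are common practical add-ons, not from the slide:

```python
import numpy as np

def spectral_subtract(noisy, noise_mag, alpha=1.0, beta=0.01):
    """|X(w)|^2 = max(|Y(w)|^2 - alpha*|N(w)|^2, beta*|Y(w)|^2),
    reusing the noisy phase, then back to the time domain."""
    Y = np.fft.rfft(noisy)
    power = np.abs(Y) ** 2 - alpha * noise_mag ** 2
    power = np.maximum(power, beta * np.abs(Y) ** 2)  # spectral floor
    X = np.sqrt(power) * np.exp(1j * np.angle(Y))
    return np.fft.irfft(X, n=len(noisy))
```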
Conclusions
We gave a general framework for how to extract speech features.
We introduced the mainstream robustness techniques.
There are still numerous other noise reduction methods (left to the references).
References
Q & A