1 Noise Reduction in Speech Recognition
Professor: Jian-Jiun Ding
Student: Yung Chang
2011/05/06

2 Outline
- Mel Frequency Cepstral Coefficients (MFCC)
- Mismatch in speech recognition
  - Feature-based: CMS, CMVN, HEQ
  - Feature-based: RASTA, data-driven
  - Speech enhancement: spectral subtraction, Wiener filtering
- Conclusions and applications

3 Outline
- Mel Frequency Cepstral Coefficients (MFCC)
- Mismatch in speech recognition
  - Feature-based: CMS, CMVN, HEQ
  - Feature-based: RASTA, data-driven
  - Speech enhancement: spectral subtraction, Wiener filtering
- Conclusions and applications

4 Mel Frequency Cepstral Coefficients (MFCC)
- The most commonly used features in speech recognition
- Advantages: high accuracy and low complexity
- 39 dimensions

5 Mel Frequency Cepstral Coefficients (MFCC)
The framework of feature extraction:
speech signal x(n) -> pre-emphasis x'(n) -> windowing x_t(n) -> DFT A_t(k) -> Mel filter-bank Y_t(m) -> log(| |^2) Y_t'(m) -> IDFT y_t(j) -> MFCC, together with frame energy e_t and derivatives
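Reading the block diagram above, a minimal end-to-end sketch of the pipeline in Python/NumPy might look like the following. The frame length, hop size, FFT size, number of filters, and the use of a DCT for the final "IDFT" step are common choices assumed here, not values taken from the slides.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(x, fs, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])                   # pre-emphasis
    n_frames = 1 + (len(x) - frame_len) // hop                   # framing
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)                      # Hamming window
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2              # DFT -> power spectrum
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):                               # triangular Mel filters
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)                     # log(| |^2) through the filter bank
    return dct(logmel, type=2, axis=1, norm='ortho')[:, :n_ceps] # DCT ("IDFT") -> cepstral coefficients
```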

6 Pre-emphasis
- Pre-emphasis boosts the spectrum at higher frequencies: x[n] -> x'[n]
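A one-line sketch of the pre-emphasis filter x'[n] = x[n] - a*x[n-1]; a = 0.97 is a commonly used coefficient assumed here, not a value given on the slide.

```python
import numpy as np

def pre_emphasis(x, a=0.97):
    # x'[n] = x[n] - a*x[n-1]; keep the first sample unchanged
    return np.append(x[0], x[1:] - a * x[:-1])
```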

7 End-point Detection (Voice Activity Detection)
- Separates speech segments from noise (silence)
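A toy energy-based end-point detector, assuming that frames whose short-time energy exceeds a multiple of an estimated noise floor are speech; the frame size, hop, percentile, and threshold factor are all illustrative assumptions, not parameters from the slides.

```python
import numpy as np

def energy_vad(x, frame_len=400, hop=160, factor=3.0):
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    noise_floor = np.percentile(energy, 10)    # assume the quietest frames are noise/silence
    return energy > factor * noise_floor       # True = speech frame, False = noise (silence)
```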

8 Windowing
- Rectangular window vs. Hamming window
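For comparison, a tiny sketch of the two windows mentioned above; the frame length is an illustrative assumption.

```python
import numpy as np

frame_len = 400                    # illustrative frame length (assumption)
rect = np.ones(frame_len)          # rectangular window: sharp edges, strong spectral leakage
hamm = np.hamming(frame_len)       # Hamming window: tapered edges, lower sidelobes
# windowed_frame = frame * hamm
```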

9 Mel-filter bank
- After the DFT we obtain the spectrum (amplitude vs. frequency)

10 Mel-filter bank
- Triangular filters in frequency (overlapped)
- Uniformly spaced below 1 kHz
- Logarithmic scale above 1 kHz
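To see the spacing described above, a small sketch computes the filter center frequencies on the Mel scale; the 16 kHz sampling rate and 26 filters are illustrative assumptions.

```python
import numpy as np

fs, n_mels = 16000, 26
mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
# equally spaced on the Mel axis -> roughly linear spacing at low Hz, logarithmic at high Hz
centers_hz = inv_mel(np.linspace(mel(0.0), mel(fs / 2.0), n_mels + 2))[1:-1]
print(np.round(centers_hz))   # dense centers below ~1 kHz, sparse centers above
```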

11 Delta Coefficients
- 1st/2nd-order differences
- Static features: 13 dimensions; with 1st- and 2nd-order deltas: 39 dimensions
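A minimal delta-coefficient sketch: appending 1st- and 2nd-order time differences turns the 13 static coefficients into a 39-dimensional vector. The regression window N = 2 is a common assumption, not a value from the slides.

```python
import numpy as np

def deltas(feat, N=2):
    # feat: (n_frames, n_dims); standard delta regression over +/- N frames
    T = len(feat)
    padded = np.pad(feat, ((N, N), (0, 0)), mode='edge')
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:T + N + n] - padded[N - n:T + N - n])
               for n in range(1, N + 1)) / denom

# static = mfcc(...)                                                   # 13-dim static features
# full = np.hstack([static, deltas(static), deltas(deltas(static))])   # 39-dim feature vector
```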

12 Outline
- Mel Frequency Cepstral Coefficients (MFCC)
- Mismatch in speech recognition
  - Feature-based: CMS, CMVN, HEQ
  - Feature-based: RASTA, data-driven
  - Speech enhancement: spectral subtraction, Wiener filtering
- Conclusions and applications

13 Mismatch in Statistical Speech Recognition
- The original speech x[n] is corrupted during acoustic reception by additive noise n1(t), n2(t) and convolutional noise h[n] (microphone distortion, phone/wireless channel), giving the input signal y[n].
- Feature extraction turns y[n] into feature vectors O = o1 o2 ... oT; search over the acoustic models, lexicon, and language model (trained from a speech corpus and a text corpus) produces the output sentence W = w1 w2 ... wR.
- Possible approaches for acoustic environment mismatch: speech enhancement (on y[n]), feature-based approaches, and model-based approaches, applied in training and/or recognition.

14 Outline
- Mel Frequency Cepstral Coefficients (MFCC)
- Mismatch in speech recognition
  - Feature-based: CMS, CMVN, HEQ
  - Feature-based: RASTA, data-driven
  - Speech enhancement: spectral subtraction, Wiener filtering
- Conclusions and applications

15 Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
- Cepstral Mean Subtraction (CMS) handles convolutional noise.
- Convolutional noise in the time domain becomes additive in the cepstral domain: y[n] = x[n] * h[n]  =>  y = x + h (x, y, h in the cepstral domain).
- Most convolutional noise changes only very slightly over a reasonable time interval, so x = y - h.
- Assuming E[x] = 0, then E[y] = h, and x_CMS = y - E[y].

16 Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
- CMVN: the variance is normalized as well: x_CMVN = x_CMS / [Var(x_CMS)]^(1/2)
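A minimal per-utterance sketch of both normalizations over a cepstral feature matrix (frames x dimensions); the small epsilon guarding the division is an implementation assumption.

```python
import numpy as np

def cms(c):
    # x_CMS = y - E[y], per cepstral dimension
    return c - c.mean(axis=0)

def cmvn(c):
    # x_CMVN = x_CMS / sqrt(Var(x_CMS))
    c0 = cms(c)
    return c0 / (c0.std(axis=0) + 1e-10)
```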

17 Feature-based Approach: HEQ (Histogram Equalization)
- The whole distribution is equalized: y = CDF_y^{-1}[CDF_x(x)]
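One possible sketch of HEQ for a single feature dimension, assuming a reference (training) data vector is available to define CDF_y; the empirical-CDF ranking and the quantile-based inverse are implementation assumptions.

```python
import numpy as np

def heq(test_feat, ref_feat):
    # empirical CDF value of each test sample within its own utterance
    ranks = np.argsort(np.argsort(test_feat))
    cdf_x = (ranks + 0.5) / len(test_feat)
    # inverse reference CDF via quantiles of the training (reference) data
    return np.quantile(ref_feat, cdf_x)        # y = CDF_y^{-1}[CDF_x(x)]
```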

18 Outline
- Mel Frequency Cepstral Coefficients (MFCC)
- Mismatch in speech recognition
  - Feature-based: CMS, CMVN, HEQ
  - Feature-based: RASTA, data-driven
  - Speech enhancement: spectral subtraction, Wiener filtering
- Conclusions and applications

19 Feature-based Approach: RASTA
- Perform filtering on the time trajectories of the features (temporal filtering), viewed in the modulation-frequency domain

20 Feature-based Approach: RASTA (Relative Spectral Temporal Filtering)
- Assume the rate of change of noise often lies outside the typical rate of change of the vocal tract shape
- A specially designed temporal filter over modulation frequency (Hz) emphasizes the speech components
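A sketch of this kind of temporal filtering along each coefficient trajectory. The band-pass coefficients below follow a commonly cited RASTA filter form (derivative-like FIR numerator with a leaky-integrator pole); applying it with SciPy's lfilter is an implementation choice, not something specified on the slide.

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(feat):
    # feat: (n_frames, n_dims) log-spectral or cepstral trajectories
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])   # FIR part: suppresses slow (DC) and fast modulations
    denom = np.array([1.0, -0.98])                        # pole near 1: integrates to form the band-pass
    return lfilter(numer, denom, feat, axis=0)            # filter along the time (frame) axis
```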

21 Data-driven Temporal Filtering
- PCA (Principal Component Analysis)

22 Data-driven Temporal Filtering
- Rather than guessing the filter, obtain it from data: each filter B_i(z) is learned and convolved with the original feature stream y_t over a window of L frames, producing the filtered streams z_k(i); a sketch follows below.
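A sketch of deriving such a filter from data for one coefficient trajectory, assuming the first principal component of length-L trajectory windows is used as the FIR filter taps; L = 9 and the use of SVD for PCA are illustrative assumptions.

```python
import numpy as np

def pca_temporal_filter(traj, L=9):
    # traj: (n_frames,) trajectory of a single cepstral coefficient over time
    windows = np.lib.stride_tricks.sliding_window_view(traj, L)   # (n_frames - L + 1, L)
    windows = windows - windows.mean(axis=0)                      # center before PCA
    _, _, vt = np.linalg.svd(windows, full_matrices=False)
    b = vt[0]                                   # leading principal component = filter taps B(z)
    return np.convolve(traj, b, mode='same')    # filtered feature stream
```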

23 Outline
- Mel Frequency Cepstral Coefficients (MFCC)
- Mismatch in speech recognition
  - Feature-based: CMS, CMVN, HEQ
  - Feature-based: RASTA, data-driven
  - Speech enhancement: spectral subtraction, Wiener filtering
- Conclusions and applications

24 Speech Enhancement: Spectral Subtraction (SS)
- Produces a better signal by trying to remove the noise, for listening or recognition purposes
- Noise n[n] changes quickly and unpredictably in the time domain, but relatively slowly in the frequency domain, N(w)
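A minimal magnitude-domain sketch, assuming the first few frames are noise-only so they can provide the noise estimate N(w); the STFT parameters, noise-frame count, and spectral floor are illustrative assumptions, not values from the slides.

```python
import numpy as np

def spectral_subtraction(x, frame_len=512, hop=256, n_noise_frames=10, floor=0.01):
    win = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:n_noise_frames].mean(axis=0)           # noise estimate N(w) from leading frames
    clean_mag = np.maximum(mag - noise_mag, floor * mag)    # subtract, then floor to avoid negatives
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), axis=1)
    out = np.zeros(len(x))                                  # overlap-add resynthesis
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += clean[i] * win
    return out
```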

25 Outline
- Mel Frequency Cepstral Coefficients (MFCC)
- Mismatch in speech recognition
  - Feature-based: CMS, CMVN, HEQ
  - Feature-based: RASTA, data-driven
  - Speech enhancement: spectral subtraction, Wiener filtering
- Conclusions and applications

26 Conclusions
- We gave a general framework for extracting speech features
- We introduced the mainstream robustness techniques
- There are still numerous noise-reduction methods (left to the references)

27 References

28 Q & A

