Published by Lucy Welch. Modified over 8 years ago.
1 LOW-RESOURCE NOISE-ROBUST FEATURE POST-PROCESSING ON AURORA 2.0
Chia-Ping Chen, Jeff Bilmes and Katrin Kirchhoff
SSLI Lab, Department of Electrical Engineering, University of Washington
ICSLP 2002
Presenter: Shih-Hsiang (士翔)
2 Introduction
- The performance of ASR systems often drops dramatically as the noise level increases
- The degradation is minor when the signal-to-noise ratio (SNR) is high, but quite significant at low SNR
- A variety of techniques have been proposed in the past:
  - Principal component analysis and a discriminative neural network (Ellis et al. 2001)
  - Missing data theory (Cooke et al. 2001)
  - Voice activity detection (VAD) and variable frame rate processing, used to drop noisy feature vectors and reduce insertion errors (John et al. 2001)
  - Nonlinear spectral subtraction, noise masking, feature filters, and model adaptation (Lieb et al. 2001)
  - Data-driven temporal filters, on-line mean and variance normalization, voice activity detection, and server-side discriminant features, integrated together to improve noise robustness (Morgan et al. 2001)
  - … etc.
3 Literature Review
- Ellis et al. 2001
- John et al. 2001
  - Variable frame rate (VFR) processing: an observation vector is discarded if it does not differ much from the previous observation vector. In our implementation of VFR, frame-to-frame variation is estimated as the Euclidean norm of the sub-vector corresponding to the delta-cepstrum.
  - Voice activity detection
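The VFR rule described on this slide can be illustrated with a minimal sketch. It assumes the features are stored as a (frames x dimensions) array in which a known slice holds the delta-cepstrum. The function name, the `delta_slice` argument, and the `threshold` value are hypothetical and are not taken from the cited work.

```python
import numpy as np

def variable_frame_rate(features, delta_slice, threshold):
    """Drop frames whose delta-cepstrum sub-vector has a small Euclidean norm.

    features    : array of shape (T, D), one row per frame
    delta_slice : slice selecting the delta-cepstrum columns (assumed layout)
    threshold   : hypothetical cut-off; frames below it are discarded
    """
    x = np.asarray(features, dtype=float)
    # Frame-to-frame variation, estimated as the norm of the delta sub-vector.
    variation = np.linalg.norm(x[:, delta_slice], axis=1)
    keep = variation >= threshold
    return x[keep], keep
```

In practice the threshold would have to be tuned on held-out data. The sketch only shows the selection rule, not how the delta features themselves are computed.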
4 Literature Review
- Morgan et al. (2001)
5 Proposed method
- The first step is standard mean subtraction (MS)
- The second step is variance normalization (VN)
- The third step is autoregressive moving-average (ARMA) filtering
- Notation: the filter operates on the feature vector (cepstral coefficients) of each frame, and M is the order of the ARMA filter
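The three steps can be sketched in a few lines of NumPy. The slide's equations are not reproduced in this transcript, so the ARMA form used here is an assumption: each output frame is the average of the previous M outputs and the current and next M normalized inputs, divided by 2M+1. Leaving the first and last M frames unfiltered is also an assumption, as is the function name `mva`.

```python
import numpy as np

def mva(features, order=2):
    """Mean subtraction, variance normalization, then ARMA smoothing.

    features : array of shape (T, D), one row per frame, one column per
               cepstral coefficient
    order    : the ARMA filter order M
    """
    x = np.asarray(features, dtype=float)
    T = x.shape[0]
    M = order
    # Step 1: mean subtraction, per coefficient over the whole utterance.
    x = x - x.mean(axis=0)
    # Step 2: variance normalization; constant columns are left at zero.
    std = x.std(axis=0)
    x = x / np.where(std > 0, std, 1.0)
    # Step 3: ARMA filtering (assumed form, see the paragraph above).
    y = x.copy()  # edge frames stay unfiltered (assumption)
    for t in range(M, T - M):
        past_outputs = y[t - M:t].sum(axis=0)       # y[t-M] .. y[t-1]
        future_inputs = x[t:t + M + 1].sum(axis=0)  # x[t] .. x[t+M]
        y[t] = (past_outputs + future_inputs) / (2 * M + 1)
    return y
```

All statistics are computed per utterance, so the sketch needs no training data or noise estimate, which matches the low-resource framing of the talk.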
6 Choosing a proper order M of the filter
- The number of zeros in the frequency response of the ARMA filter is approximately proportional to its order
- This suggests that a large M will perform poorly, since it could filter out important speech information
- The transfer function is:
- The frequency response of the ARMA filter of order M is:
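The relationship between the order and the number of zeros can be checked numerically. This sketch assumes the filter has the form y[t] = (y[t-1] + ... + y[t-M] + x[t] + ... + x[t+M]) / (2M+1), which is not stated in this transcript. Under that assumption the transfer function is H(z) = (1 + z + ... + z^M) / ((2M+1) - z^-1 - ... - z^-M), and the frequency response follows from setting z = e^{jω}. The function names are hypothetical.

```python
import numpy as np

def arma_frequency_response(order, omega):
    """Evaluate H(e^{j*omega}) for the assumed ARMA filter of order M."""
    M = order
    z = np.exp(1j * np.asarray(omega, dtype=float))
    # Numerator: 1 + z + ... + z^M (the non-causal moving-average part).
    numerator = sum(z ** k for k in range(0, M + 1))
    # Denominator: (2M+1) - z^-1 - ... - z^-M (the autoregressive part).
    denominator = (2 * M + 1) - sum(z ** (-k) for k in range(1, M + 1))
    return numerator / denominator

def zero_frequencies(order):
    """Frequencies in (0, 2*pi) at which the numerator vanishes."""
    k = np.arange(1, order + 1)
    return 2 * np.pi * k / (order + 1)
```

Under this form the numerator vanishes at ω = 2πk/(M+1) for k = 1, ..., M, so the response has M zeros on the unit circle. The gain at ω = 0 is (M+1)/(2M+1-M) = 1, meaning slowly varying components pass through unchanged.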
7 Gain and phase shifts of the ARMA filter
8 Time sequences of the cepstral coefficient c1 for the digit string 5376869, corrupted with different levels of noise
9 Evaluation
- Evaluated on the Aurora 2.0 noisy digits database
- Two training sets and three test sets
  - Training sets: clean speech only / multi-condition speech
  - Test sets: stationary noise / non-stationary noise / convolutional noise
- 7 noise levels: clean, 20 dB, 15 dB, 10 dB, 5 dB, 0 dB, -5 dB
- Recognizer: simple HMM-based system using whole-word models
  - Digits "zero" to "nine" and "oh": 16 states per word, 3 Gaussian mixture components per state
  - Silence: 3 states
10 Recognition results
- Word accuracies (as percentages)
- Top: multi-condition training
- Bottom: clean training
11 A comparison of different orders of the ARMA filter
- A small M retains the short-term cepstral information but is more vulnerable to noise
- A large M makes the processed features less corrupted by noise, but the short-term cepstral information is lost
- Top: multi-condition training
- Bottom: clean training
12 Testing the effectiveness of the proposed technique
- The results show that while variance normalization and mean subtraction improve performance over the baseline, the addition of the ARMA filter provides significant further improvements
13 Comparison of different filters
- Causal ARMA filter
- Non-causal MA filter
- Causal MA filter
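The three filter variants named on this slide can be sketched as follows. Their exact definitions do not appear in this transcript, so the forms below are assumptions: the causal ARMA filter averages the previous M outputs with the current input, the non-causal MA filter averages inputs over a symmetric window of 2M+1 frames, and the causal MA filter averages the current and previous M inputs. Frames without a full window are left unfiltered, which is also an assumption.

```python
import numpy as np

def causal_arma(x, M):
    """Assumed form: y[t] = (y[t-M] + ... + y[t-1] + x[t]) / (M + 1)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for t in range(M, len(x)):
        y[t] = (y[t - M:t].sum(axis=0) + x[t]) / (M + 1)
    return y

def noncausal_ma(x, M):
    """Assumed form: y[t] = (x[t-M] + ... + x[t+M]) / (2M + 1)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for t in range(M, len(x) - M):
        y[t] = x[t - M:t + M + 1].sum(axis=0) / (2 * M + 1)
    return y

def causal_ma(x, M):
    """Assumed form: y[t] = (x[t-M] + ... + x[t]) / (M + 1)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for t in range(M, len(x)):
        y[t] = x[t - M:t + 1].sum(axis=0) / (M + 1)
    return y
```

The causal variants need no look-ahead frames and so add no latency, whereas the non-causal MA filter needs M future frames. That is one reason such a comparison is of practical interest.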
14 Comparison of different filters (cont.)
- Top: multi-condition training
- Bottom: clean training