HIWIRE MEETING CRETE, SEPTEMBER 23-24, 2004 JOSÉ C. SEGURA LUNA GSTC UGR

José C. Segura Luna HIWIRE Meeting – Crete, 23-24 September, 2004

Schedule
- VAD for noise suppression and frame dropping
  - Long-term spectral divergence
  - Subband OS-based detector
- Non-linear feature normalization
  - Histogram equalization
  - OS-based equalization
  - Segmental implementation

VAD (1)
VAD: motivation
- To get an estimate of the background noise for Wiener filter or spectral subtraction design
- To discard non-speech frames
[Block diagram: NOISY SPEECH, NOISE ESTIMATION, VAD, WIENER FILTER / SS, FRAME DROPPING, RECOGNIZER]
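As a minimal sketch of how VAD decisions can drive the noise estimate for a Wiener filter, the Python below averages the power spectrum over the frames labeled as non-speech and applies a per-bin Wiener gain. The function names, the maximum-likelihood approximation of the a priori SNR and the gain floor of 0.1 are assumptions for illustration, not the front-end described in these slides.

```python
import numpy as np

def estimate_noise(power_spec: np.ndarray, speech_mask: np.ndarray) -> np.ndarray:
    """Average noise power spectrum over the frames the VAD labels as non-speech.

    power_spec: (frames x bins) power spectrum; speech_mask: bool per frame.
    Assumes at least one frame is labeled non-speech."""
    return power_spec[~speech_mask].mean(axis=0)

def wiener_gain(power_spec, noise_psd, floor=0.1):
    """Per-bin Wiener gain xi / (1 + xi).

    Assumption: the a priori SNR xi is approximated by max(P/N - 1, 0)
    (a maximum-likelihood estimate) and the gain is floored at `floor`."""
    xi = np.maximum(power_spec / noise_psd - 1.0, 0.0)
    return np.maximum(xi / (1.0 + xi), floor)

def enhance(power_spec, speech_mask):
    """Noise estimation from VAD decisions followed by Wiener filtering.

    The gain multiplies the complex spectrum, so the power spectrum scales by gain**2."""
    noise = estimate_noise(power_spec, speech_mask)
    return wiener_gain(power_spec, noise) ** 2 * power_spec
```

The quality of the noise estimate depends directly on the VAD: speech frames wrongly labeled as non-speech inflate the noise spectrum and lead to over-suppression.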

VAD (2)
Our approach
- Use rather long time spans (~100 ms) instead of instantaneous measures, to increase discrimination
- Use a statistical model in the log filter-bank energy (log-FBE) domain, for smoother estimates
- Use a feedback decision coupled with noise suppression, so that the VAD works on less noisy speech
- Use order statistics (OS), for more robust estimation

Long-Term Spectral Divergence (1)
J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, "Efficient voice activity detection algorithms using long-term speech information," Speech Communication 42 (2004) 271–287.
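The long-term spectral divergence can be sketched as follows, based on our reading of the cited paper. A long-term spectral envelope (LTSE) takes, for each frequency bin, the maximum amplitude over frames l-n..l+n. The LTSD is the bin average of LTSE²/N², expressed in dB and compared with a threshold. The window half-length n = 5 (11 frames, about 110 ms at an assumed 10 ms frame shift), the 6 dB threshold and the function names are illustrative. The noise spectrum N is taken as given rather than updated during non-speech periods.

```python
import numpy as np

def ltse(amp: np.ndarray, n: int) -> np.ndarray:
    """Long-term spectral envelope: for each frame l and bin k, the maximum
    amplitude over frames l-n..l+n (window truncated at the utterance edges).

    amp: (frames x bins) amplitude spectrum."""
    frames = amp.shape[0]
    env = np.empty_like(amp, dtype=float)
    for l in range(frames):
        env[l] = amp[max(0, l - n):min(frames, l + n + 1)].max(axis=0)
    return env

def ltsd(amp: np.ndarray, noise_amp: np.ndarray, n: int) -> np.ndarray:
    """LTSD in dB: 10*log10 of the bin average of LTSE^2 / N^2, where noise_amp
    is an (assumed given) average noise amplitude spectrum N(k)."""
    ratio = ltse(amp, n) ** 2 / noise_amp[np.newaxis, :] ** 2
    return 10.0 * np.log10(ratio.mean(axis=1))

def ltsd_vad(amp, noise_amp, n=5, threshold_db=6.0):
    """Speech/non-speech decision per frame (illustrative n and threshold)."""
    return ltsd(amp, noise_amp, n) > threshold_db
```

With a burst in frames 20-29 over a flat noise floor, `ltsd_vad` flags frames 15-34. The long window extends each decision by n frames on both sides.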

Long-Term Spectral Divergence (2)

Long-Term Spectral Divergence (3)

Long-Term Spectral Divergence (4)

Long-Term Spectral Divergence (5)

Long-Term Spectral Divergence (7)
Recognition experiments with AURORA 2 and 3

Long-Term Spectral Divergence (6)

Subband OSF VAD (1)
J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, "An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition," IEEE Trans. on Speech and Audio Processing (to appear in 2005).
- The decision is based on an averaged quantile SNR (QSNR), defined as an inter-quantile difference
- Feedback structure: the VAD operates on the noise-reduced signal
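A rough sketch of a subband order-statistics VAD in the spirit of this slide follows. The exact QSNR definition of the cited paper is not reproduced here. As an assumption, this sketch takes a high quantile (p = 0.9) of the subband log energies over a window of 2n+1 frames, subtracts a running noise level, averages the difference over subbands and thresholds it. The feedback is only partial: the noise level is updated from the frames the detector itself labels as non-speech, while the noise-reduction stage the VAD operates on is omitted. All names and parameter values are hypothetical.

```python
import numpy as np

def subband_log_energies(power_spec, n_bands=4):
    """Log energy in n_bands equal-width subbands of a (frames x bins) power spectrum."""
    bands = np.array_split(power_spec, n_bands, axis=1)
    energies = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log(energies + 1e-12)

def osf_vad(log_e, n=5, p=0.9, threshold=1.0, init_frames=10, alpha=0.95):
    """Hypothetical subband order-statistics VAD.

    For frame l, take the p-quantile of each subband's log energy over frames
    l-n..l+n, subtract the current noise level (assumed QSNR form), average over
    subbands and compare with `threshold`. The noise level starts as the median of
    the first `init_frames` frames and is updated only on frames labeled non-speech."""
    frames = log_e.shape[0]
    noise = np.median(log_e[:init_frames], axis=0)
    decisions = np.zeros(frames, dtype=bool)
    for l in range(frames):
        window = log_e[max(0, l - n):min(frames, l + n + 1)]
        qsnr = np.quantile(window, p, axis=0) - noise
        decisions[l] = qsnr.mean() > threshold
        if not decisions[l]:
            # feedback: non-speech decisions refresh the noise level
            noise = alpha * noise + (1.0 - alpha) * np.median(window, axis=0)
    return decisions
```

Using a quantile rather than the maximum of the window makes the statistic less sensitive to isolated noise peaks, which is the usual argument for order-statistics filters.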

Subband OSF VAD (2)

Subband OSF VAD (3)

Subband OSF VAD (4)

Subband OSF VAD (5)

Accurate VAD
Open topics
- New alternatives to improve performance
  - New decision criteria based on OS filters, already used for edge detection in images
- Computational efficiency
  - Development of computationally efficient algorithms

Feature normalization
Objective: transform the features to remove undesired variability
Linear techniques
- CMS (cepstral mean subtraction): removes the effect of linear channel distortion
- CMVN (cepstral mean and variance normalization): extension of CMS to deal with the variance reduction caused by additive noise
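Both linear techniques are simple operations on a (frames x coefficients) matrix. A minimal NumPy sketch, assuming per-utterance statistics rather than running estimates:

```python
import numpy as np

def cms(feats: np.ndarray) -> np.ndarray:
    """Cepstral mean subtraction on a (frames x coefficients) matrix: remove the
    per-utterance mean of each coefficient."""
    return feats - feats.mean(axis=0, keepdims=True)

def cmvn(feats: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cepstral mean and variance normalization: CMS followed by scaling each
    coefficient to unit variance (eps guards against constant coefficients)."""
    return cms(feats) / (feats.std(axis=0, keepdims=True) + eps)
```

Because a linear channel becomes an additive bias in the cepstral domain, `cms(x + h)` equals `cms(x)` for any constant vector `h`.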

Feature normalization
Non-linear feature distortion
- Environment effects are non-linear for MFCC features and can hardly be removed with linear techniques, because not only the location (mean) and scale (variance) of the feature distributions are affected, but also their shape (higher-order moments)
Non-linear extensions
- CDF-matching approaches (HEQ and related) have been shown to be more effective than linear ones
- They normalize not only the first two moments of the probability distributions, but the whole distribution

CDF-matching based equalization
- The main idea: transform the features so that they match a given (reference) PDF
- In the one-dimensional case, CDF matching gives the solution
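For a monotonically increasing transformation T and continuous CDFs, the one-dimensional solution follows in a few steps. The symbols below are notation introduced here for illustration: C_x is the CDF of the observed feature and C_ref is the CDF of the reference distribution.

```latex
\begin{aligned}
&y = T(x), \quad T \text{ monotonically increasing} \\
&C_y\big(T(x)\big) = P\big(T(X) \le T(x)\big) = P(X \le x) = C_x(x) \\
&\text{requiring } C_y = C_{\mathrm{ref}}: \quad C_{\mathrm{ref}}\big(T(x)\big) = C_x(x) \\
&\Rightarrow\; T(x) = C_{\mathrm{ref}}^{-1}\big(C_x(x)\big)
\end{aligned}
```

With a Gaussian reference, C_ref^{-1} is the inverse Gaussian CDF, so equalization amounts to estimating C_x from data and composing the two functions.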

Equalization and robust classifiers

Invariance
- CMS is invariant to an additive bias
- CMVN is invariant to affine transformations with positive scale (y = ax + b, a > 0)
- Equalization to a reference distribution is invariant to any invertible, monotonically increasing transformation (including non-linear ones)
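The three invariance properties can be checked numerically on a single coefficient track. The sketch below assumes a rank-based estimate of the empirical CDF, (rank + 0.5)/n, and a standard Gaussian reference. Neither detail comes from the slides. The last line shows that the invariance of equalization holds for increasing transformations: a decreasing map reverses the ranks and flips the sign of the output.

```python
import numpy as np
from statistics import NormalDist

def cms(x):
    """Cepstral mean subtraction for one coefficient track."""
    return x - x.mean()

def cmvn(x):
    """Cepstral mean and variance normalization for one coefficient track."""
    return (x - x.mean()) / x.std()

def equalize(x):
    """Rank-based equalization of one track to a standard Gaussian reference.
    The empirical CDF value (rank + 0.5) / n is an assumed estimator."""
    n = len(x)
    ranks = np.argsort(np.argsort(x))  # 0 for the smallest sample
    inv = NormalDist().inv_cdf
    return np.array([inv((r + 0.5) / n) for r in ranks])

rng = np.random.default_rng(0)
x = rng.normal(size=200)

print(np.allclose(cms(x + 3.0), cms(x)))              # True: additive bias
print(np.allclose(cmvn(2.0 * x + 3.0), cmvn(x)))      # True: affine, positive scale
print(np.allclose(cmvn(np.exp(x)), cmvn(x)))          # False: non-linear map
print(np.allclose(equalize(np.exp(x)), equalize(x)))  # True: non-linear, increasing
print(np.allclose(equalize(-x), -equalize(x)))        # True: decreasing map flips sign
```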

HEQ for robust speech recognition (1)
A. de la Torre, A.M. Peinado, J.C. Segura, J.L. Pérez, C. Benítez and A.J. Rubio, "Histogram equalization of speech representation for robust speech recognition," IEEE Trans. on Speech and Audio Processing (to appear in 2005).
- Transformation of each component of the MFCC vector to a Gaussian reference
- Cumulative distributions are estimated using histograms
- Performance compared with CMS, CMVN and model-based feature compensation (VTS, vector Taylor series)
- Combination with VTS
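Here is a minimal sketch of histogram-based equalization of each feature dimension to a standard Gaussian reference. The CDF of every dimension is estimated from a histogram of the whole utterance, interpolated linearly within bins, and mapped through the inverse Gaussian CDF. The bin count (50), the clipping of CDF values away from 0 and 1, and the function name are assumptions, not parameters from the cited paper.

```python
import numpy as np
from statistics import NormalDist

def heq(feats: np.ndarray, n_bins: int = 50) -> np.ndarray:
    """Histogram equalization of each column of a (frames x dims) matrix to N(0, 1).

    For each dimension: build an n_bins histogram over the utterance, turn it into a
    cumulative distribution at the bin edges, interpolate it linearly at every sample,
    and map the result through the inverse standard Gaussian CDF."""
    inv = np.vectorize(NormalDist().inv_cdf)
    frames, dims = feats.shape
    out = np.empty((frames, dims))
    for d in range(dims):
        x = feats[:, d]
        counts, edges = np.histogram(x, bins=n_bins)
        cdf_at_edges = np.concatenate(([0.0], np.cumsum(counts) / frames))
        c = np.interp(x, edges, cdf_at_edges)
        # keep CDF values strictly inside (0, 1) so the inverse Gaussian CDF is finite
        c = np.clip(c, 0.5 / frames, 1.0 - 0.5 / frames)
        out[:, d] = inv(c)
    return out
```

The mapping is monotonic within each dimension, so the order of the samples is preserved while the shape of their distribution is changed.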

HEQ for robust speech recognition (2)

HEQ for robust speech recognition (3)

HEQ for robust speech recognition (4)

HEQ for robust speech recognition (5)

Segmental HEQ (1)
J.C. Segura, C. Benítez, A. de la Torre, A.J. Rubio and J. Ramírez, "Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition," IEEE Signal Processing Letters, 11(5), May 2004.
- A segmental implementation of HEQ for non-stationary noise
- A temporal buffer is used for histogram estimation instead of the full sentence
- The algorithmic delay is T frames
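A segmental variant can be sketched by estimating the CDF from a buffer centered on the current frame instead of the whole sentence. In this sketch, the buffer spans frames l-T..l+T, so the output for frame l waits for T future frames, matching the stated delay. The buffer is truncated at the utterance edges. The CDF value is taken from the rank of the current frame in the buffer rather than from an explicit histogram. These choices, the default T = 50 and the function name are assumptions.

```python
import numpy as np
from statistics import NormalDist

def segmental_eq(feats: np.ndarray, t: int = 50) -> np.ndarray:
    """Segmental equalization of a (frames x dims) matrix to N(0, 1).

    Frame l is equalized with the CDF estimated from frames max(0, l-t)..l+t,
    so its output needs t frames of look-ahead."""
    inv = NormalDist().inv_cdf
    frames, dims = feats.shape
    out = np.empty((frames, dims))
    for l in range(frames):
        lo, hi = max(0, l - t), min(frames, l + t + 1)
        buf = feats[lo:hi]
        # rank of the current frame within the buffer, per dimension (ties count half)
        below = (buf < feats[l]).sum(axis=0)
        equal = (buf == feats[l]).sum(axis=0)
        cdf = (below + 0.5 * equal) / (hi - lo)
        out[l] = [inv(c) for c in cdf]
    return out
```

After a sudden level change in the input, the output recovers zero mean once the buffer has moved past the change, which a whole-sentence estimate cannot do.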

Segmental HEQ (2)

OSEQ: An efficient implementation (1)
A very computationally efficient algorithm based on Order Statistics
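Order statistics can make the segmental transformation cheap. The buffer changes by one frame in and one frame out per step, so a sorted copy of it can be updated incrementally with binary search instead of being re-sorted for every frame. The sketch below does this for a single feature track with Python's `bisect` module. It illustrates the idea and is not necessarily the algorithm presented in these slides. The function name and the default t = 50 are hypothetical. Insertion and deletion in a Python list still shift elements (O(T) memory moves per frame), but each search is O(log T).

```python
import bisect
from statistics import NormalDist

def oseq(samples, t=50):
    """Order-statistics equalization of one feature track to a standard Gaussian.

    The buffer holds samples[max(0, l-t) : l+t+1] as a sorted list. Moving from
    frame l to l+1 inserts at most one new sample and removes at most one old
    one, each located by binary search, instead of re-sorting the buffer."""
    inv = NormalDist().inv_cdf
    n = len(samples)
    sorted_buf = []
    out = []
    lo = hi = 0  # the buffer currently holds samples[lo:hi]
    for l in range(n):
        while hi < min(n, l + t + 1):  # look-ahead of t frames (the algorithmic delay)
            bisect.insort(sorted_buf, samples[hi])
            hi += 1
        while lo < l - t:  # drop frames that left the window
            del sorted_buf[bisect.bisect_left(sorted_buf, samples[lo])]
            lo += 1
        x = samples[l]
        below = bisect.bisect_left(sorted_buf, x)           # samples < x
        equal = bisect.bisect_right(sorted_buf, x) - below  # samples == x (ties count half)
        out.append(inv((below + 0.5 * equal) / len(sorted_buf)))
    return out
```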

OSEQ: An efficient implementation (2)

Feature normalization
Open topics
- Reference distribution: clean speech / Gaussian / others?
- Normalization of dynamic features (Δ and ΔΔ): after, before or simultaneously [Obuchi, Stern, EUSP’03]
- Progressive normalization: not all MFCCs are equally affected, nor do they have equal discriminative power [de Wet, …, ICASSP’03]
- Normalization of lower-order moments [Hsu, Lee, ICASSP’04]
- Parametric techniques: current approaches are non-parametric [Haverinen, Kiss, EUSP’03]
- New applications: speaker independence and adaptation; multi-stream normalization
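The question of normalizing dynamic features after, before or simultaneously can be made concrete with the usual regression formula for dynamic features, d_t = Σ_{n=1..K} n (c_{t+n} - c_{t-n}) / (2 Σ n²). The sketch below compares two orderings. In the first, the static features are normalized before Δ is computed. In the second, statics and Δ are normalized together afterwards. CMVN stands in for the normalization step for brevity; an equalization could be substituted. K = 2, edge replication and all function names are assumptions.

```python
import numpy as np

def deltas(feats: np.ndarray, k: int = 2) -> np.ndarray:
    """Regression-based dynamic features, with edge frames replicated:
    d_t = sum_{n=1..k} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..k} n^2)."""
    frames = feats.shape[0]
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    num = sum(n * (padded[k + n:k + n + frames] - padded[k - n:k - n + frames])
              for n in range(1, k + 1))
    return num / (2 * sum(n * n for n in range(1, k + 1)))

def cmvn(feats: np.ndarray) -> np.ndarray:
    """Stand-in normalization (per-utterance mean and variance)."""
    return (feats - feats.mean(axis=0)) / feats.std(axis=0)

def normalize_then_delta(static: np.ndarray) -> np.ndarray:
    """Normalize the static features, then derive the dynamic ones from them."""
    s = cmvn(static)
    return np.hstack([s, deltas(s)])

def delta_then_normalize(static: np.ndarray) -> np.ndarray:
    """Derive the dynamic features first, then normalize statics and deltas together."""
    return cmvn(np.hstack([static, deltas(static)]))
```

With the first ordering, the Δ features keep whatever scale the differencing gives them. With the second, they are explicitly brought to zero mean and unit variance.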

Combination of techniques
Development of a combined robust front-end:
- An accurate VAD, for noise parameter estimation
- A noise reduction technique
  - Spectral subtraction or Wiener filter
  - Statistical feature compensation
- A frame-dropping algorithm, to discard non-speech frames
- A feature normalization block, for compensation of residual non-linear distortion

VAD (1)
Development of a combined robust front-end
[Block diagram: NOISY SPEECH, NOISE ESTIMATION, VAD, WIENER FILTER / SS, FRAME DROPPING, FEATURE EQUALIZATION, RECOGNIZER]
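The back end of the chain can be sketched as frame dropping followed by feature equalization. This ordering, the rank-based Gaussian equalization and the function name are assumptions for illustration. The noise-reduced features and the VAD decisions are taken as inputs computed upstream. One motivation for dropping frames first (our assumption, not a claim from the slides) is that the equalization statistics are then estimated only from the frames the recognizer actually receives, so long silences do not dominate the estimated CDF.

```python
import numpy as np
from statistics import NormalDist

def drop_and_equalize(feats: np.ndarray, speech_mask: np.ndarray) -> np.ndarray:
    """Hypothetical back end of the combined front-end.

    feats: (frames x dims) features after noise reduction (computed upstream).
    speech_mask: per-frame VAD decisions (computed upstream).
    Frame dropping keeps only speech frames. Each dimension of the kept frames is
    then equalized to a standard Gaussian using rank-based CDF values (rank + 0.5) / n."""
    kept = feats[speech_mask]                     # frame dropping
    n = kept.shape[0]
    ranks = kept.argsort(axis=0).argsort(axis=0)  # per-dimension ranks 0..n-1
    inv = np.vectorize(NormalDist().inv_cdf)
    return inv((ranks + 0.5) / n)                 # feature equalization
```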