Published by Jared Holmes. Modified over 9 years ago.
1
A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST
Beena Ahmed (1) and W. Harvey Holmes (2)
(1) RMIT University, Melbourne, VIC 3001, Australia
(2) University of New South Wales, Sydney, NSW 2052, Australia
Reporter: Chen, Hung-Bin
ICASSP 2004
2
Outline
Introduction
This paper proposes
Testing data: speech plus noise
The Chi-Square test
Noise estimation using the Chi-Square test
Block diagram of the proposed VAD
Experiments and results
Summary and Conclusions
3
Introduction
VAD significance: for speech recognition systems designed for real-world conditions, robust discrimination of speech from other sounds is a crucial step.
Advantage: speech discrimination can also be used for coding or telecommunication applications.
Proposed system: a feature set inspired by investigations of various stages of the auditory system.
4
This paper proposes
This paper proposes a voice activity detector (VAD) that makes the speech/noise classification by applying the statistical chi-square test to each frame. It also uses a continuous update of the background noise estimate. The distribution of each frame is compared with the current noise estimate: if the chi-square test determines that they are close, the frame is declared to be noise, otherwise speech (although a "speech" decision does not necessarily mean that the frame contains real speech data).
5
Testing data
There are two ways of interpreting the noise model:
1. The ‘additive noise’ point of view, i.e. the speech signal is corrupted by additive noise.
2. The ‘additive signal’ point of view, i.e. the residual noise signal has speech added to it.
Noise model: in this paper, the speech signal s(t) is corrupted by uncorrelated additive noise w(t), giving the degraded composite signal y(t):
y(t) = s(t) + w(t)
6
The Chi-Square test
In this paper, the chi-square test is applied to two problems: noise estimation and voice activity detection. The chi-square test seeks to determine whether there is a good fit between the frequencies of the observed data and the frequencies of the expected (theoretical) data, which here come from the noise estimate. Two hypotheses are proposed for each frame:
$H_0$: noise-only frame
$H_1$: noise-plus-speech frame
7
Noise estimation using the Chi-Square test
To estimate the noise with improved spectral sensitivity, the input signal is first passed through a bandpass filterbank H1, …, HM, as shown in the figure. Each of these sub-band signals is then divided into time frames of size L, where M << L.
8
Noise estimation using the Chi-Square test
Let $y_{k,p}$ be the signal in sub-band $k$ and frame $p$, and let $W_{k,p}$ be the corresponding noise vector of $L$ samples (the equation defining its samples is not recoverable from this transcript). The noise estimates $W_{k,p-1}$ from the previous frame, $p-1$, are first grouped into $N$ bins, giving the expected frequencies $e_i$. The vector $o$ of observations is obtained in the same way from the current signal $y_{k,p}$, using the same $N$ bins.
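The grouping step above can be sketched as a plain histogram count. This is a minimal illustration, not the authors' code: the function names, the equal-width bin edges, and the choice to clamp out-of-range samples into the end bins are all assumptions made for the example.

```python
from bisect import bisect_right


def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 ascending edges spanning [lo, hi] (assumed binning scheme)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_counts(samples, edges):
    """Count samples per bin; bin i covers edges[i] <= x < edges[i + 1]."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in samples:
        i = bisect_right(edges, x) - 1
        # Assumption: values outside the edge range are clamped into the end bins.
        i = min(max(i, 0), n_bins - 1)
        counts[i] += 1
    return counts


# The same edges are used for the previous noise estimate (expected)
# and for the current frame (observed).
edges = equal_width_edges(-1.0, 1.0, 4)
expected = bin_counts([-0.9, -0.2, 0.1, 0.3, 0.8], edges)
observed = bin_counts([-0.7, -0.6, 0.2, 0.9, 1.5], edges)
```

Sharing one set of edges between the two histograms is what makes the bin-by-bin comparison in the chi-square test meaningful.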
9
Noise estimation using the Chi-Square test
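The chi-square test is then applied to these bins, where the chi-square statistic is given by
$$\chi^2 = \sum_{i=1}^{N} \frac{(o_i - e_i)^2}{e_i}$$
with $o_i$ and $e_i$ the observed and expected frequencies in bin $i$. The hypothesis test can thus be written as a comparison of $\chi^2$ with a critical value: the noise-only hypothesis $H_0$ is accepted if $\chi^2$ does not exceed the critical value, otherwise the noise-plus-speech hypothesis $H_1$ is accepted.
Background noise is usually assumed to have a Gaussian distribution. Speech is (relatively) non-stationary and is non-Gaussian. The chi-square test can therefore be used to effectively identify the noise-only segments of the signal.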
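A minimal sketch of the statistic and the accept/reject decision follows, using the standard Pearson form. The function names are hypothetical, skipping bins with zero expected count is an assumption, and the critical value 12.592 (the 95% point of a chi-square distribution with 6 degrees of freedom, i.e. 7 bins minus 1) is an illustrative choice; the slides do not state the significance level or degrees of freedom used.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic over matching bins."""
    # Assumption: bins with zero expected count are skipped to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Assumed threshold: 95% point of chi-square with 6 degrees of freedom.
CRITICAL_VALUE = 12.592


def is_noise_frame(observed, expected, critical=CRITICAL_VALUE):
    """Accept the noise-only hypothesis when the statistic is at most the critical value."""
    return chi_square_statistic(observed, expected) <= critical


expected = [10, 20, 30]
close_frame = [12, 18, 30]   # histogram close to the noise estimate
far_frame = [30, 20, 10]     # histogram far from the noise estimate
```

With these numbers the first frame would be kept as noise and the second would be flagged as speech, since its statistic is far above the threshold.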
10
simple one-pole smoothing filter λ
If the current frame is determined to be a noise-only frame, i.e. similar to the current noise estimate, the pdf of the noise is updated using a simple one-pole smoothing filter with smoothing coefficient λ. Testing found that a smoothing coefficient of 0.95 gave the best results.
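The update can be sketched as below. The slides do not give the filter equation, so the usual one-pole form, new = λ · old + (1 − λ) · current, applied bin by bin, is an assumption, as is the function name.

```python
def update_noise_estimate(noise_hist, frame_hist, lam=0.95):
    """One-pole recursive smoothing of the noise histogram (assumed form)."""
    # Each bin keeps a fraction lam of the old estimate and takes
    # (1 - lam) of the current noise-only frame.
    return [lam * n + (1.0 - lam) * f for n, f in zip(noise_hist, frame_hist)]


noise_hist = [10.0, 20.0]
frame_hist = [20.0, 0.0]
updated = update_noise_estimate(noise_hist, frame_hist)
```

With λ = 0.95 a single frame moves the estimate by only 5% of the difference, which keeps the noise model stable while still tracking slow changes.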
11
Block diagram of the proposed VAD
The proposed VAD consists of three main components:
1. Chi-square based noise estimator (as in Section 3),
2. Noise reduction system, and
3. Chi-square based decision module.
A block diagram of the complete system is shown in the figure.
12
Implementation Details
An analysis frame size of 256 samples was used, with a step size of 64 samples. The optimum band gains were estimated from the spectrally decomposed noisy signal using the Ephraim and Malah gain function with a smoothing parameter of 0.98. Noise estimates were obtained from the chi-square noise estimator described in Section 3.
13
Implementation Details
The VAD decision was made using another chi-square detector on the synthesised noise-reduced signal. In this detector a frame size of 15 ms, i.e. 125 samples, was used with an overlap of 25 samples at a sampling frequency of 8192 Hz. The signal was divided into 8 equal sub-bands using a digital IIR filterbank of elliptic bandpass filters of order 10 (the bandwidth value in Hz is not preserved in this transcript). The histogram of the current frame was calculated and divided into 7 classes (or bins) $e_i$. The first three frames were recursively averaged to provide the starting value of the noise vector.
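The framing figures quoted above can be checked with a few lines of arithmetic. This is only a sketch of the bookkeeping: the function names are hypothetical, and dropping a trailing partial frame is an assumption.

```python
def frame_starts(n_samples, frame_len=125, overlap=25):
    """Start indices of full frames; the hop is frame_len - overlap samples."""
    hop = frame_len - overlap
    # Assumption: a trailing partial frame is dropped.
    return list(range(0, n_samples - frame_len + 1, hop))


def frame_duration_ms(frame_len=125, fs=8192):
    """Duration of one frame in milliseconds at sampling frequency fs."""
    return 1000.0 * frame_len / fs


starts = frame_starts(325)
duration = frame_duration_ms()
```

At 8192 Hz a 125-sample frame lasts about 15.3 ms, which matches the quoted 15 ms, and the 25-sample overlap gives a hop of 100 samples.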
14
Implementation Details
The current input frame was combined with the seven previous frames, giving frame sizes of 122 ms for noise estimation. The observed and expected distributions in each sub-band were divided into 7 classes $e_i$ and tested using the chi-square test. Only those frames in which the null hypothesis was accepted for all 8 sub-bands were declared to be noise. The noise estimate was then updated using a single-sided one-pole recursive filter. Testing found that a smoothing coefficient of 0.95 gave the best results.
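The all-sub-bands rule can be sketched as a logical AND over per-band tests. The function names are hypothetical, and the critical value 12.592 (95% point for 6 degrees of freedom, matching 7 classes) is an assumed threshold that the slides do not specify.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic; bins with zero expected count are skipped (assumption)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def frame_is_noise(observed_by_band, expected_by_band, critical=12.592):
    """Declare noise only if the null hypothesis is accepted in every sub-band."""
    return all(
        chi_square(o, e) <= critical
        for o, e in zip(observed_by_band, expected_by_band)
    )


# 8 sub-bands with 7 classes each; the counts are made up for illustration.
expected_by_band = [[10] * 7 for _ in range(8)]
all_quiet = [[10] * 7 for _ in range(8)]
one_active = [[10] * 7 for _ in range(7)] + [[30, 0, 10, 10, 10, 10, 0]]
```

Requiring agreement in every band makes the noise decision conservative: a single sub-band that departs from the noise model is enough to reject the frame as noise.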
15
Testing data
The VAD was applied to 12 sentences with added babble, car, pink, and white noises at SNRs of 0, 5, and 10 dB. The chi-square detector manages to pick up short bursts of speech that are only a few hundred samples (20 ms) long.
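Test signals of this kind are typically built by scaling the noise to hit a target SNR before adding it to the clean speech. The sketch below shows one common way to do that; it is not taken from the paper, the function names are hypothetical, and the SNR is assumed to be measured over the whole signal.

```python
import math


def power(x):
    """Mean squared value of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Return (speech + gain * noise, gain) so that the mixture has the target SNR."""
    # Solve 10 * log10(P_speech / (gain^2 * P_noise)) = snr_db for gain.
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * w for s, w in zip(speech, noise)], gain


speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed_0db, gain_0db = mix_at_snr(speech, noise, 0)
```

Calling `mix_at_snr` with 0, 5, and 10 would produce the three noise levels used in the experiments for any given speech and noise pair.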
16
Experimental Results
17
Experimental Results
18
Reference VAD proposed by Sohn et al. [5]
The developed VAD employs the decision-directed parameter estimation method for the likelihood ratio test.
19
Summary and Conclusions
The proposed VAD shows better performance in various environmental conditions, while requiring only a few parameters to be optimized compared with the other methods.
Applications include automatic classification and segmentation of animal sounds, and efficient encoding of speech and music.
With an auditory-based description, the approach can be applied effectively to many more areas involving sound and speech.