A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST

Beena Ahmed (1) and W. Harvey Holmes (2)
(1) RMIT University, Melbourne, VIC 3001, Australia
(2) University of New South Wales, Sydney, NSW 2052, Australia
Reporter: Chen, Hung-Bin
ICASSP 2004

Outline
- Introduction
- This paper proposes
- Testing data: speech plus noise
- The Chi-Square test
- Noise estimation using the Chi-Square test
- Block diagram of the proposed VAD
- Experiments and results
- Summary and conclusions

Introduction: VAD
Significance: for speech recognition systems designed for real-world conditions, robust discrimination of speech from other sounds is a crucial step.
Advantage: speech discrimination can also be used in coding and telecommunication applications.
Proposed system: a feature set inspired by investigations of various stages of the auditory system.

This paper proposes
This paper proposes a voice activity detector (VAD) that classifies each frame as speech or noise by applying the statistical chi-square test to it. It also continuously updates the background noise estimate. If the chi-square test determines that the distribution of the current frame is close to that of the noise estimate, the frame is declared to be noise; otherwise it is declared to be speech. (A "speech" decision does not necessarily mean that the frame contains real speech, only that it does not match the noise estimate.)

Testing data
There are two ways of interpreting the noise model:
1. The "additive noise" point of view: the speech signal is corrupted by additive noise.
2. The "additive signal" point of view: the residual noise signal has speech added to it.
In this paper's noise model, the speech signal s(t) is corrupted by uncorrelated additive noise w(t), giving the degraded composite signal y(t):
$$y(t) = s(t) + w(t)$$

The Chi-Square test
In this paper, the chi-square test is applied to two problems: noise estimation and voice activity detection. The chi-square test determines whether there is a good fit between the observed frequencies of the data and the expected (theoretical) frequencies, here those of the noise. For each frame, the following hypotheses are proposed:
$H_0$: noise-only frame
$H_1$: noise-plus-speech frame

Noise estimation using the Chi-Square test
To estimate the noise with improved spectral sensitivity, the input signal is first passed through a bandpass filterbank $H_1, \ldots, H_M$, as shown in the figure. Each of these sub-band signals is then divided into time frames of size $L$, where $M \ll L$.
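As a minimal sketch of the framing step, the NumPy code below splits each sub-band signal into frames of length L. The function name `split_into_frames` is hypothetical, the sub-band signals are random placeholders rather than real filterbank outputs, and the step of 100 samples is an assumed reading of the later implementation details (L = 125 samples with a 25-sample overlap).

```python
import numpy as np

def split_into_frames(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Return a (num_frames, frame_len) array of frames starting every `hop` samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    num_frames = 1 + (len(x) - frame_len) // hop
    starts = hop * np.arange(num_frames)
    # Row r of idx holds the sample indices starts[r] .. starts[r] + frame_len - 1.
    idx = starts[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

# Placeholder sub-band signals: M = 8 bands, 1 s at 8192 Hz (random, for illustration).
rng = np.random.default_rng(0)
subbands = rng.standard_normal((8, 8192))

L = 125        # frame length in samples
HOP = L - 25   # 25-sample overlap -> 100-sample step (assumed reading of the slides)
frames = np.stack([split_into_frames(band, L, HOP) for band in subbands])
print(frames.shape)  # (8, 81, 125)
```

The result is indexed as (sub-band k, frame p, sample), which matches the $y_{k,p}$ notation used in the next slide.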

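Noise estimation using the Chi-Square test
Let $y_{k,p}(i)$, $1 \le i \le L$, be the signal in sub-band $k$ and frame $p$. The noise vector for that sub-band and frame holds the $L$ noise samples
$$\mathbf{W}_{k,p} = [W_{k,p}(1), \ldots, W_{k,p}(L)].$$
The noise estimates $\mathbf{W}_{k,p-1}$ from the previous frame, $p-1$, are first grouped into $N$ bins, giving the vector of expected frequencies $\mathbf{e} = [e_1, \ldots, e_N]$. The vector of observations $\mathbf{o} = [o_1, \ldots, o_N]$ is obtained in the same way from the current signal $y_{k,p}(i)$, using the same $N$ bins.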
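The binning step can be sketched as follows: bin edges are derived once and reused for both the previous noise estimate and the current frame, so $\mathbf{e}$ and $\mathbf{o}$ are directly comparable. This is a sketch under assumptions not stated in the slides: equal-width bins spanning the range of the noise estimate, clipping of out-of-range samples into the outer bins, and rescaling of the expected counts to the observed total. `shared_bin_counts` is a hypothetical name, and the input signals are random placeholders.

```python
import numpy as np

def shared_bin_counts(noise_est: np.ndarray, current: np.ndarray, num_bins: int):
    """Histogram the previous noise estimate and the current frame into the same bins.

    Returns (o, e): observed counts for the current frame and expected counts from
    the noise estimate, rescaled so that both sum to the same total.
    """
    # ASSUMPTION: equal-width bins spanning the range of the noise estimate.
    edges = np.histogram_bin_edges(noise_est, bins=num_bins)
    e, _ = np.histogram(noise_est, bins=edges)
    # Clip the current frame so samples outside the noise range fall into the
    # first or last bin instead of being dropped by np.histogram.
    o, _ = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)
    e = e * (o.sum() / e.sum())
    return o, e

rng = np.random.default_rng(1)
noise_prev = rng.standard_normal(1000)      # stands in for W_{k,p-1} (placeholder)
current = 3.0 * rng.standard_normal(1000)   # stands in for y_{k,p}: a louder frame
o, e = shared_bin_counts(noise_prev, current, num_bins=7)
print(o, np.round(e, 1))
```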

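Noise estimation using the Chi-Square test
The chi-square test is then applied to these bins, with the chi-square statistic given by
$$\chi^2 = \sum_{i=1}^{N} \frac{(o_i - e_i)^2}{e_i}.$$
The hypothesis test can thus be written as
$$\chi^2 \le \chi^2_{\alpha} \Rightarrow H_0 \ \text{(noise only)}, \qquad \chi^2 > \chi^2_{\alpha} \Rightarrow H_1 \ \text{(noise plus speech)},$$
where $\chi^2_{\alpha}$ is the critical value of the chi-square distribution at significance level $\alpha$. Background noise is usually assumed to have a Gaussian distribution, whereas speech is (relatively) non-stationary and non-Gaussian. The chi-square test can therefore be used to identify the noise-only segments of the signal effectively.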
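The statistic and the decision rule above translate directly into code. In this minimal sketch, the bins with zero expected count are skipped to avoid division by zero (an assumption; merging sparse bins is a common alternative). The critical value of 12.59 is the 5% point of the chi-square distribution with 6 degrees of freedom, which assumes N = 7 bins and N − 1 degrees of freedom. The counts are made up for illustration.

```python
import numpy as np

def chi_square_statistic(observed, expected) -> float:
    """Pearson statistic: sum over bins of (o_i - e_i)^2 / e_i, skipping bins with e_i == 0."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    mask = e > 0
    return float(np.sum((o[mask] - e[mask]) ** 2 / e[mask]))

def is_noise_frame(observed, expected, critical_value: float) -> bool:
    """Accept H0 (noise only) when the statistic does not exceed the critical value."""
    return chi_square_statistic(observed, expected) <= critical_value

# ASSUMPTION: 7 bins -> 6 degrees of freedom; the 5% critical value is about 12.59.
CRIT_6DOF_5PCT = 12.59
expected = [10, 20, 40, 60, 40, 20, 10]
print(is_noise_frame([12, 18, 41, 58, 39, 22, 10], expected, CRIT_6DOF_5PCT))  # True
print(is_noise_frame([40, 10, 20, 30, 20, 10, 70], expected, CRIT_6DOF_5PCT))  # False
```

The first observation vector stays close to the expected counts (statistic about 0.92), so $H_0$ is accepted. The second puts far too much mass in the outer bins, so $H_0$ is rejected.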

Noise update: simple one-pole smoothing filter
If the current frame is determined to be a noise-only frame similar to the current noise estimate, the pdf of the noise is updated using a simple one-pole smoothing filter with smoothing coefficient $\lambda$. Testing found that a smoothing coefficient of $\lambda = 0.95$ gave the best results.
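A minimal sketch of the update, assuming the usual convention that $\lambda$ weights the previous estimate: $\hat{p}_{p} = \lambda\,\hat{p}_{p-1} + (1-\lambda)\,p_{\text{frame}}$, applied only to frames accepted as noise. The slides give only the coefficient, so the exact form is an assumption, and `update_noise_pdf` is a hypothetical name.

```python
import numpy as np

def update_noise_pdf(noise_pdf, frame_pdf, is_noise: bool, lam: float = 0.95):
    """One-pole smoothing of the noise pdf, applied only to noise-only frames.

    ASSUMPTION: lam weights the previous estimate: new = lam*old + (1 - lam)*frame.
    """
    noise_pdf = np.asarray(noise_pdf, dtype=float)
    if not is_noise:
        return noise_pdf  # frames classified as speech leave the estimate unchanged
    return lam * noise_pdf + (1.0 - lam) * np.asarray(frame_pdf, dtype=float)

old = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
new = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
print(update_noise_pdf(old, new, is_noise=True))
```

With $\lambda = 0.95$, each noise frame moves the estimate only 5% of the way toward the new histogram. This keeps the estimate stable while still letting it track slowly varying noise. Because both inputs sum to 1, the updated pdf also sums to 1.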

Block diagram of the proposed VAD
The proposed VAD consists of three main components:
1. Chi-square based noise estimator (as in Section 3),
2. Noise reduction system, and
3. Chi-square based decision module.
A block diagram of the complete system is shown in the figure.

Implementation Details
An analysis frame size of 256 samples was used, with a step size of 64 samples. The optimum band gains were estimated from the spectrally decomposed noisy signal using the Ephraim and Malah gain function with a smoothing parameter of 0.98. Noise estimates were obtained from the chi-square noise estimator described in Section 3.
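A sketch of the noise-reduction gain. It assumes the standard Ephraim and Malah MMSE short-time spectral amplitude (MMSE-STSA) gain, $G(\xi,\gamma) = \frac{\sqrt{\pi}}{2}\frac{\sqrt{v}}{\gamma}e^{-v/2}\left[(1+v)I_0(v/2) + vI_1(v/2)\right]$ with $v = \frac{\xi}{1+\xi}\gamma$. It also assumes that the smoothing parameter of 0.98 is the weight of the decision-directed a priori SNR estimate; the slides do not say which quantity the 0.98 smooths. SciPy's exponentially scaled Bessel functions `scipy.special.i0e` and `i1e` are used to avoid overflow. The function names, the −25 dB floor on $\xi$, and the placeholder spectra are illustrative.

```python
import numpy as np
from scipy.special import i0e, i1e

def mmse_stsa_gain(xi, gamma):
    """Ephraim-Malah MMSE-STSA gain for a priori SNR xi and a posteriori SNR gamma (linear)."""
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    v = xi / (1.0 + xi) * gamma
    # exp(-v/2) * I_n(v/2) equals the scaled Bessel function i{n}e(v/2).
    return (np.sqrt(np.pi) / 2.0) * np.sqrt(v) / gamma * (
        (1.0 + v) * i0e(v / 2.0) + v * i1e(v / 2.0)
    )

def band_gains(noisy_power, noise_power, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Per-frame gains using a decision-directed a priori SNR estimate.

    noisy_power: (frames, bins) array of |Y|^2; noise_power: (bins,) noise estimate,
    e.g. from a chi-square noise estimator. ASSUMPTION: alpha=0.98 is the
    decision-directed smoothing parameter; xi_min is an illustrative floor.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    gains = np.empty_like(noisy_power)
    prev_amp2 = np.zeros(noisy_power.shape[1])  # squared enhanced amplitude, previous frame
    for t, y2 in enumerate(noisy_power):
        gamma = np.maximum(y2 / noise_power, 1e-10)
        xi = alpha * prev_amp2 / noise_power + (1 - alpha) * np.maximum(gamma - 1, 0)
        xi = np.maximum(xi, xi_min)
        g = mmse_stsa_gain(xi, gamma)
        gains[t] = g
        prev_amp2 = g ** 2 * y2
    return gains

rng = np.random.default_rng(0)
Y2 = rng.exponential(1.0, size=(20, 129))  # placeholder power spectra (256-point frames)
G = band_gains(Y2, np.ones(129))
print(G.shape)  # (20, 129)
```

At high SNR, the gain approaches the Wiener gain $\xi/(1+\xi)$. For example, `mmse_stsa_gain(100.0, 101.0)` is about 0.99.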

Implementation Details
The VAD decision was made using another chi-square detector on the synthesised noise-reduced signal. This detector used a frame size of 15 ms, i.e. 125 samples at a sampling frequency of 8192 Hz, with an overlap of 25 samples. The signal was divided into 8 equal sub-bands using a digital IIR filterbank of 10th-order elliptic bandpass filters, each with a bandwidth of 487.5 Hz. The histogram of the current frame was calculated and divided into 7 classes (or bins) ($e_i$). The first three frames were recursively averaged to provide the starting value of the noise vector.
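A sketch of the sub-band filterbank using `scipy.signal.ellip` and `scipy.signal.sosfilt`. The sampling rate, number of bands, and bandwidth come from the slide. The lower edge of the first band (100 Hz, so that the 8 bands span 100 to 4000 Hz), the passband ripple, and the stopband attenuation are assumptions, because the slides do not give them. The sketch also reads "order 10" as the order of the bandpass filter: `ellip(5, ..., btype="bandpass")` produces a 10th-order bandpass design.

```python
import numpy as np
from scipy.signal import ellip, sosfilt

FS = 8192          # sampling frequency (Hz), from the slides
NUM_BANDS = 8
BANDWIDTH = 487.5  # Hz per band, from the slides
F_LOW = 100.0      # ASSUMPTION: lower edge of the first band

def make_filterbank(rp=0.5, rs=40.0):
    """Design contiguous elliptic bandpass filters as second-order sections.

    ASSUMPTIONS: rp (passband ripple, dB) and rs (stopband attenuation, dB) are
    illustrative; ellip(5, ...) with btype="bandpass" yields a 10th-order filter.
    """
    bank = []
    for k in range(NUM_BANDS):
        lo = F_LOW + k * BANDWIDTH
        bank.append(ellip(5, rp, rs, [lo, lo + BANDWIDTH],
                          btype="bandpass", output="sos", fs=FS))
    return bank

def analyze(x, bank):
    """Return a (num_bands, len(x)) array of sub-band signals."""
    return np.stack([sosfilt(sos, x) for sos in bank])

bank = make_filterbank()
t = np.arange(FS) / FS
tone = np.sin(2 * np.pi * 1318.75 * t)   # centre of the third band (1075-1562.5 Hz)
energies = (analyze(tone, bank) ** 2).sum(axis=1)
print(int(np.argmax(energies)))  # 2
```

Second-order sections are used instead of a single transfer function because high-order elliptic filters are numerically fragile in direct form.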

Implementation Details
The current input frame was combined with the seven previous frames, giving a segment length of about 122 ms for noise estimation. The observed and expected distributions in each sub-band were divided into 7 classes ($e_i$) and compared using the chi-square test. Only frames for which the null hypothesis was accepted in all 8 sub-bands were declared to be noise. The noise estimate was then updated using a single-sided one-pole recursive filter; testing found that a smoothing coefficient of 0.95 gave the best results.
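A minimal sketch of the bookkeeping and decision rules on this slide, under several assumptions:
- Each sub-band's noise estimate is held as a per-bin probability vector over 7 bins.
- The critical value is the 5% point for 6 degrees of freedom.
- "Recursively averaged" is read as a plain mean of the first three frames' normalised histograms.
- All function names are hypothetical.

```python
import numpy as np
from collections import deque

def pearson(o, e):
    """Pearson chi-square statistic, ignoring bins with zero expected count."""
    mask = e > 0
    return float(np.sum((o[mask] - e[mask]) ** 2 / e[mask]))

def frame_is_noise(observed_counts, noise_pdfs, critical_value=12.59):
    """Noise only if H0 is accepted in every sub-band.

    observed_counts: (bands, bins) histogram counts of the current segment.
    noise_pdfs: (bands, bins) noise probabilities per bin (each row sums to 1).
    critical_value: ASSUMED 5% point of chi-square with 6 degrees of freedom.
    """
    for o, p in zip(np.asarray(observed_counts, float), np.asarray(noise_pdfs, float)):
        expected = p * o.sum()          # expected counts under the noise pdf
        if pearson(o, expected) > critical_value:
            return False                # a single rejecting sub-band -> speech
    return True

def initial_noise_pdfs(first_frames_counts):
    """Mean of the normalised histograms of the first three frames -> (bands, bins)."""
    c = np.asarray(first_frames_counts, float)[:3]   # (3, bands, bins)
    return (c / c.sum(axis=2, keepdims=True)).mean(axis=0)

def noise_segment(history: deque, frame) -> np.ndarray:
    """Append the current frame and join it with up to seven previous frames."""
    history.append(np.asarray(frame, float))
    return np.concatenate(list(history))

history = deque(maxlen=8)                 # current frame + seven previous frames
pdfs = np.full((8, 7), 1 / 7)
flat = np.full((8, 7), 20)                # matches the noise pdf in every band
spiky = flat.copy()
spiky[5] = [0, 0, 0, 140, 0, 0, 0]        # one band departs strongly
print(frame_is_noise(flat, pdfs), frame_is_noise(spiky, pdfs))  # True False
```

Eight frames of 125 samples give 1000 samples, which is about 122 ms at 8192 Hz. This matches the segment length quoted above.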

Testing data
The VAD was applied to 12 sentences with added babble, car, pink, and white noise at SNRs of 0, 5, and 10 dB. The chi-square detector manages to pick up short bursts of speech that are only a few hundred samples (20 ms) long.
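Test signals like these are typically built from the additive model $y(t) = s(t) + w(t)$ by scaling the noise to reach a target SNR. In this minimal sketch, the SNR is assumed to be a global one (powers averaged over the whole signal), since the slides do not define it. The speech and noise are placeholders (a tone and white noise), and `add_noise_at_snr` is a hypothetical name.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Return y = s + g*w, with g chosen so that 10*log10(P_s / P_gw) equals snr_db.

    ASSUMPTION: powers are averaged over the whole signal (global SNR).
    """
    s = np.asarray(speech, dtype=float)
    w = np.asarray(noise, dtype=float)
    if len(w) < len(s):
        raise ValueError("noise must be at least as long as the speech")
    w = w[: len(s)]
    p_s = np.mean(s ** 2)
    p_w = np.mean(w ** 2)
    g = np.sqrt(p_s / (p_w * 10 ** (snr_db / 10)))
    return s + g * w

fs = 8192
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t)                     # placeholder "speech"
w = np.random.default_rng(0).standard_normal(fs)    # placeholder white noise
for snr in (0, 5, 10):
    y = add_noise_at_snr(s, w, snr)
    measured = 10 * np.log10(np.mean(s ** 2) / np.mean((y - s) ** 2))
    print(snr, round(measured, 6))
```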

Experimental Results
[Result figures not reproduced in this transcript.]

Reference VAD
The reference VAD is the one proposed by Sohn et al. [5]. It employs the decision-directed parameter estimation method for the likelihood ratio test.
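For comparison, here is a minimal sketch of a likelihood-ratio-test VAD of the kind attributed to Sohn et al. It assumes the usual complex Gaussian model, in which the per-bin log likelihood ratio is $\log\Lambda_k = \frac{\gamma_k \xi_k}{1+\xi_k} - \log(1+\xi_k)$. A frame is declared speech when the mean over bins exceeds a threshold $\eta$. The threshold value 0.15 and the weight 0.98 in the decision-directed estimate are placeholders, the function names are hypothetical, and any hang-over scheme from the original method is not included.

```python
import numpy as np

def decision_directed_xi(gamma, prev_amp2, noise_psd, alpha=0.98):
    """A priori SNR: alpha*|A_prev|^2/lambda_N + (1 - alpha)*max(gamma - 1, 0)."""
    gamma = np.asarray(gamma, dtype=float)
    prev = np.asarray(prev_amp2, dtype=float) / np.asarray(noise_psd, dtype=float)
    return alpha * prev + (1 - alpha) * np.maximum(gamma - 1, 0)

def mean_log_lr(gamma, xi):
    """Mean over bins of log Lambda_k = gamma*xi/(1 + xi) - log(1 + xi)."""
    gamma = np.asarray(gamma, dtype=float)
    xi = np.asarray(xi, dtype=float)
    return float(np.mean(gamma * xi / (1 + xi) - np.log1p(xi)))

def is_speech(gamma, xi, eta=0.15):
    """Speech if the mean log likelihood ratio exceeds the (placeholder) threshold eta."""
    return mean_log_lr(gamma, xi) > eta

print(is_speech(np.full(64, 11.0), np.full(64, 10.0)))  # True
print(is_speech(np.ones(64), np.full(64, 0.01)))        # False
```

In contrast to the chi-square VAD, this test relies on a parametric (Gaussian) model of both speech and noise spectra instead of comparing empirical amplitude histograms.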

Summary and Conclusions
Compared with the other methods, the proposed VAD shows better performance in various environmental conditions while requiring only a few parameters to be optimized.
Applications include automatic classification and segmentation of animal sounds, and efficient encoding of speech and music.
By using auditory-based descriptions, the approach can be applied effectively to many more sound- and speech-related tasks.