
语音与音频信号处理研究室 Speech and Audio Signal Processing Lab

Multiplicative Update of AR Gains in Codebook-driven Speech Enhancement

Qi He¹, Changchun Bao¹, and Feng Bao²
¹ Beijing University of Technology, China
² The University of Auckland, New Zealand

Outline
- Speech Enhancement Review
  - Background
  - Traditional Methods
- Multiplicative Update of AR Gains in Codebook-driven Speech Enhancement
  - Estimation of the spectral shape of noise
  - Estimation of AR gains
  - Bayesian MMSE estimation
  - Codebook-driven Wiener filter
  - Experimental Results

Background

Noise exists everywhere:
- Office noise
- Factory noise
- Street noise
- Babble noise

Background

Speech enhancement applications:
- Mobile phones / communication
- Hearing aids
- Robust speech, speaker, and language recognition, etc.

Background

Speech enhancement aims at:
- suppressing the noise in noisy speech
- improving the quality and intelligibility of the enhanced speech

The noisy speech is modeled as the sum of clean speech and additive noise:

y(n) = x(n) + w(n)    (1)

where n is the frame index, y is the noisy speech, x is the clean speech, and w is the noise.
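The additive model in Eq. 1 can be sketched numerically; the function name `mix_at_snr` and the toy sinusoid/noise signals below are illustrative, not from the paper:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + noise has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power follows from P_x / P_w = 10^(SNR/10)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000.0)  # toy "speech"
w = rng.standard_normal(8000)                            # toy noise
y = mix_at_snr(x, w, snr_db=5.0)                         # noisy speech at 5 dB SNR
```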

Traditional Methods

Traditional speech enhancement methods:
- Spectral subtraction
- Wiener filtering
- Subspace method
- ...

These methods use no a priori information. Their performance is good for stationary noises but poor for non-stationary noises.
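As an illustration of the first method in the list, a bare-bones magnitude spectral-subtraction loop might look like the sketch below; the frame size, the spectral-floor factor, and the absence of windowing/overlap-add are simplifications of my own, not the paper's setup:

```python
import numpy as np

def spectral_subtraction(noisy, noise_psd, frame=256, floor=0.01):
    """Spectral subtraction, frame by frame (no window or overlap-add,
    for brevity): subtract a noise PSD estimate from each frame's power
    spectrum and resynthesise with the noisy phase."""
    noisy = np.asarray(noisy, dtype=float)
    out = np.zeros_like(noisy)
    for start in range(0, len(noisy) - frame + 1, frame):
        seg = noisy[start:start + frame]
        spec = np.fft.rfft(seg)
        power = np.abs(spec) ** 2
        # Spectral floor keeps the gain from collapsing to zero
        clean_power = np.maximum(power - noise_psd, floor * power)
        gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
        out[start:start + frame] = np.fft.irfft(gain * spec, frame)
    return out

rng = np.random.default_rng(0)
y = rng.standard_normal(1024)
identity = spectral_subtraction(y, 0.0)     # zero noise PSD: passthrough
denoised = spectral_subtraction(y, 100.0)   # positive PSD: energy shrinks
```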

Traditional Methods

Codebook-based methods (AR: auto-regressive):
1) Codebook-based method using an ML estimator [1].
2) Codebook-based method using a Bayesian MMSE estimator [2].

[Block diagram: speech and noise corpora are used to train the speech and noise codebooks offline; the noisy speech is transformed by FFT, the AR gains are estimated over the codebooks by ML or Bayesian MMSE estimation, a Wiener filter is applied to the noisy spectrum, and the enhanced speech is obtained by IFFT.]

Traditional Methods

Traditional method for AR gain estimation: for each pair of code-words from the speech and noise codebooks, the corresponding AR gains are obtained by minimizing the distortion between the observed noisy spectrum P_y(ω) and the modeled noisy spectrum P̂_y(ω):

{ĝ_x, ĝ_w} = argmin_{g_x, g_w} d( P_y(ω), P̂_y(ω) )    (2)

with

P̂_y(ω) = g_x / |A_x(ω)|² + g_w / |A_w(ω)|²

where A_x and A_w are the AR polynomials of the speech and noise code-words.

Traditional Methods

Traditional method for AR gain estimation: since there is no closed-form solution for the optimal speech and noise AR gains, the conventional codebook-driven methods obtain the AR gain estimates indirectly from the log-spectral (LS) distortion, which admits a closed-form solution after a series expansion:

d_LS = (1/2π) ∫ [ ln P_y(ω) − ln P̂_y(ω) ]² dω    (3)

Traditional Methods

Traditional method for AR gain estimation: by differentiating Eq. 3 with respect to the AR gains and setting the results to zero, the AR gains can be calculated in closed form (Eq. 4).

After getting the AR gains corresponding to each code-word combination, the ML estimator or the Bayesian MMSE estimator can be used to obtain the AR parameters of speech and noise (Eq. 5).

Traditional Methods

Traditional method for de-noising: a Wiener filter constructed from the estimated AR parameters of speech and noise is used to enhance the noisy speech:

H(ω) = P̂_x(ω) / ( P̂_x(ω) + P̂_w(ω) )    (6)

Although codebook-driven speech enhancement methods are well suited to eliminating non-stationary noise, some problems remain to be addressed:
1) Noise classification;
2) The accuracy of the gain estimation can be further improved;
3) The residual noise between the harmonics of noisy speech should be further suppressed.
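A sketch of how such a Wiener gain can be built from AR coefficients and gains; the helper names are my own, and the AR spectra are evaluated on an FFT grid as g / |A(e^{jω})|²:

```python
import numpy as np

def ar_power_spectrum(a, gain, n_fft=512):
    """Power spectrum g / |A(e^{jw})|^2 of an AR model with coefficients a,
    where A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..."""
    A = np.fft.rfft(np.concatenate(([1.0], np.asarray(a, dtype=float))), n_fft)
    return gain / np.abs(A) ** 2

def wiener_gain(a_x, g_x, a_w, g_w, n_fft=512):
    """Codebook-driven Wiener filter H = P_x / (P_x + P_w)."""
    p_x = ar_power_spectrum(a_x, g_x, n_fft)
    p_w = ar_power_spectrum(a_w, g_w, n_fft)
    return p_x / (p_x + p_w)
```

With identical speech and noise models the gain is 0.5 at every bin, and it always stays strictly between 0 and 1 since both AR spectra are positive.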

Proposed Method

Estimation of the spectral shape of noise: to avoid the noise-classification problem, the spectral shape of noise is estimated online by the Minima Controlled Recursive Averaging (MCRA) algorithm in the proposed method (Eqs. 7 and 8).
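A heavily simplified minima-tracking noise estimator in the spirit of MCRA can be sketched as follows; this omits the bias compensation and the smoothed speech-presence probability of the full algorithm, and all parameter values here are illustrative defaults, not the paper's:

```python
import numpy as np

def mcra_noise_psd(frames_psd, alpha=0.85, alpha_s=0.8, win=8, ratio_thr=5.0):
    """Simplified MCRA-style noise PSD tracker over a (frames x bins) array:
    smooth the periodogram, track its local minimum, freeze the recursive
    average wherever the smoothed PSD rises well above that minimum."""
    n_frames, _ = frames_psd.shape
    noise = frames_psd[0].copy()
    s = frames_psd[0].copy()            # smoothed periodogram
    s_min = s.copy()                    # tracked minimum
    out = np.empty_like(frames_psd)
    for t in range(n_frames):
        s = alpha_s * s + (1 - alpha_s) * frames_psd[t]
        s_min = np.minimum(s_min, s)
        if t % win == win - 1:          # periodically restart minimum tracking
            s_min = s.copy()
        # Speech assumed present where the PSD exceeds the local minimum
        presence = (s / np.maximum(s_min, 1e-12)) > ratio_thr
        a = np.where(presence, 1.0, alpha)   # a=1 freezes the noise update
        noise = a * noise + (1 - a) * frames_psd[t]
        out[t] = noise
    return out
```

On stationary noise the tracker simply converges to the input PSD; its point is that sudden speech energy is excluded from the average rather than absorbed into it.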

Proposed Method

Estimation of the AR gains: in this paper, we use a multiplicative update rule [3-4] to obtain an approximately closed-form solution for the IS distortion. Since only the shape codebook of the speech spectrum is trained offline, while the spectral shape of noise is estimated online, the modeled noisy spectrum for each speech code-word can be rewritten as

P̂_y(ω) = g_x Φ_x(ω) + g_w Φ̂_w(ω)    (9)

writing Φ_x for the spectral shape of the speech code-word and Φ̂_w for the online noise-shape estimate. Expressing Eq. 9 in matrix form gives

p̂_y = Φ g,  with Φ = [Φ_x, Φ̂_w] and g = [g_x, g_w]ᵀ

Proposed Method

Estimation of the AR gains: the IS distortion is rewritten as

d_IS = Σ_ω [ P_y(ω) / P̂_y(ω) − ln( P_y(ω) / P̂_y(ω) ) − 1 ]    (10)

By differentiating Eq. 10 with respect to the gain matrices, we have [3-4]:

∇_g d_IS = Φᵀ ( P̂_y^{.−1} ) − Φᵀ ( P_y .* P̂_y^{.−2} )    (11)

The symbol '.*' indicates point-wise multiplication, and '.−1', '.−2' denote point-wise powers. Setting the gradient to zero yields the ratio of the two terms that drives the multiplicative update (Eq. 12).
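The IS distortion itself is straightforward to evaluate directly; a small helper, with names of my choosing:

```python
import numpy as np

def is_distortion(p_obs, p_mod):
    """Itakura-Saito distortion between two power spectra: zero if and only
    if the spectra are identical, and asymmetric in its arguments."""
    r = np.asarray(p_obs, dtype=float) / np.asarray(p_mod, dtype=float)
    return float(np.mean(r - np.log(r) - 1.0))
```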

Proposed Method

Estimation of the AR gains: the estimates ĝ_x and ĝ_w are obtained by iterating the following multiplicative rule to minimize the IS distortion:

g ← g .* [ Φᵀ ( P_y .* P̂_y^{.−2} ) ] ./ [ Φᵀ ( P̂_y^{.−1} ) ]    (13)

whose fixed point gives the gain estimates ĝ_x and ĝ_w (Eq. 14).
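Assuming the modeled spectrum is a gain-weighted sum of two fixed spectral shapes, the multiplicative IS updates of [3-4] can be sketched as below; the random shapes are stand-ins for a speech code-word and the online noise estimate, not real spectra:

```python
import numpy as np

def update_gains(p_obs, phi_x, phi_w, n_iter=500):
    """Multiplicative IS-divergence updates for the two AR gains, with the
    two spectral shapes held fixed (NMF-style rule, cf. Eq. 13)."""
    W = np.column_stack((phi_x, phi_w))   # fixed shapes: speech, noise
    g = np.ones(2)                        # [g_x, g_w]; stays non-negative
    for _ in range(n_iter):
        p_hat = W @ g
        num = W.T @ (p_obs * p_hat ** -2.0)
        den = W.T @ (p_hat ** -1.0)
        g = g * (num / den)               # multiplicative, so g > 0 throughout
    return g

# Recover known gains from a synthetic, exactly representable mixture
rng = np.random.default_rng(1)
phi_x = rng.uniform(0.5, 2.0, 128)        # stand-in speech code-word shape
phi_w = rng.uniform(0.5, 2.0, 128)        # stand-in online noise shape
p_obs = 3.0 * phi_x + 0.5 * phi_w
g = update_gains(p_obs, phi_x, phi_w)
```

Because the update multiplies by a ratio of positive terms, non-negativity of the gains is preserved automatically, which is exactly why these rules suit AR gain estimation.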

Proposed Method

Estimation of the AR gains: an example of the average IS distortion is illustrated in Fig. 1. The average IS distortion is defined as the IS distortion averaged over the N_x code-word combinations, where N_x is the size of the speech codebook. The AR gains are estimated by the conventional and the proposed methods, respectively. The speech material is corrupted by white noise at an SNR of 5 dB.

Fig. 1. Comparison of the average IS distortion.

Proposed Method

Bayesian MMSE estimation: let θ_x denote the random variable corresponding to the speech AR coefficients, and let g_x and g_w denote the random variables corresponding to the speech and noise AR gains, respectively. Let θ = [θ_x, g_x, g_w] denote the set of random variables. After evaluating each candidate, the desired Bayesian MMSE estimate is the conditional mean

θ̂ = E[ θ | y ] = ∫ θ p(θ | y) dθ    (15)
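Over a finite codebook this integral reduces to a posterior-weighted average of the candidates. A sketch of that combination step, assuming a uniform prior over code-words and given per-candidate log-likelihoods (the function name and interface are mine):

```python
import numpy as np

def mmse_combine(thetas, log_likelihoods):
    """Posterior-weighted average of candidate parameter vectors: the MMSE
    estimate under a uniform prior over code-word candidates."""
    ll = np.asarray(log_likelihoods, dtype=float)
    w = np.exp(ll - ll.max())      # stabilised; proportional to p(y | theta_i)
    w /= w.sum()                   # normalise into posterior weights
    return w @ np.asarray(thetas, dtype=float)
```

Equal likelihoods give the plain average of the candidates, while one dominant likelihood makes the estimate collapse onto that candidate.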

Proposed Method

Fig. 2. AR gain estimation of clean speech.

Proposed Method

Modified codebook-driven Wiener filter: the conventional codebook-driven Wiener filter is constructed from the estimated spectral envelopes of speech and noise, which usually fits the spectra between the harmonics of speech inaccurately. Consequently, residual noise remains between the harmonics of the enhanced speech. In this section, we introduce the speech presence probability (SPP) to modify the traditional codebook-driven Wiener filter so as to suppress this residual noise (Eq. 16).
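The slide does not reproduce the paper's exact SPP modification; purely as an illustration of the idea, one common way to fold an SPP into a gain function is a probability-weighted blend with a gain floor, where `g_min` is an assumed floor value:

```python
def spp_modified_gain(h_wiener, spp, g_min=0.1):
    """Blend the Wiener gain with a gain floor using the speech presence
    probability: keep the Wiener gain where speech is likely, attenuate
    toward the floor where it is not. A sketch, not the paper's Eq. 16."""
    return spp * h_wiener + (1.0 - spp) * g_min
```

With spp = 1 the Wiener gain passes through unchanged; with spp = 0 only the residual-noise floor remains.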

Performance Evaluation

Experiments:
- Four types of noise: white, babble, office, and street.
- Test materials: 9 utterances from 4 female and 5 male speakers.
- Sampling rate: 8 kHz.
- Speech codebook size: 6 bits.

TABLE 1. TEST RESULTS OF PESQ: average PESQ at 0, 5, and 10 dB for Noisy, ML-CB [1], MMSE-CB [2], and the proposed method.

Performance Evaluation

TABLE 2. TEST RESULTS OF SSNR IMPROVEMENT: average SSNR improvement at 0, 5, and 10 dB for ML-CB [1], MMSE-CB [2], and the proposed method (no improvement is defined for the noisy input).

TABLE 3. TEST RESULTS OF LSD: average LSD at 0, 5, and 10 dB for Noisy, ML-CB [1], MMSE-CB [2], and the proposed method.

Demos

(a) clean speech; (b) noisy speech (white noise, SNR = 10 dB); (c) enhanced speech using ML-CB; (d) enhanced speech using MMSE-CB; (e) enhanced speech using our method without SPP; (f) enhanced speech using our method with SPP.

Demos

(a) clean speech; (b) noisy speech (babble noise, SNR = 10 dB); (c) enhanced speech using ML-CB; (d) enhanced speech using MMSE-CB; (e) enhanced speech using our method without SPP; (f) enhanced speech using our method with SPP.

References

[1] S. Srinivasan, J. Samuelsson, and W. B. Kleijn, "Codebook driven short-term predictor parameter estimation for speech enhancement," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 163–176, Jan. 2006.
[2] S. Srinivasan, J. Samuelsson, and W. B. Kleijn, "Codebook-based Bayesian speech enhancement for nonstationary environments," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 2, pp. 441–452, Feb. 2007.
[3] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in NIPS, 2000, pp. 556–562.
[4] C. Févotte, N. Bertin, and J.-L. Durrieu, "Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis," Neural Comput., vol. 21, no. 3, pp. 793–830, 2009.

Q & A