Published by Lizbeth Wilkins. Modified over 8 years ago.

Multiplicative Update of AR Gains in Codebook-driven Speech Enhancement
Qi He (1), Changchun Bao (1), and Feng Bao (2)
(1) Beijing University of Technology, China
(2) The University of Auckland, New Zealand
Speech and Audio Signal Processing Lab, http://www.bjut.edu.cn/sci/voice/index.htm
2016-3-25

Outline
- Speech Enhancement Review
  - Background
  - Traditional Methods
- Multiplicative Update of AR Gains in Codebook-driven Speech Enhancement
  - Estimation of spectral shape of noise
  - Estimation of AR gains
  - Bayesian MMSE estimation
  - Codebook-driven Wiener filter
- Experimental Results

Background
Noise exists everywhere:
- Office noise
- Factory noise
- Street noise
- Babble noise

Background
Speech enhancement applications:
- Mobile phones / communication
- Hearing aids
- Robust speech / speaker / language recognition, etc.

Background
Speech enhancement aims at:
- suppressing the noise in noisy speech
- improving the quality and intelligibility of the enhanced speech
Block diagram: noisy speech -> speech enhancement -> enhanced speech.
Signal model, Eq. (1): the noisy signal is the sum of the speech and the noise, where n is the frame index.

Traditional Methods
Traditional speech enhancement methods:
- Spectral subtraction
- Wiener filtering
- Subspace method
- ...
Performance of these methods:
- Stationary noise: good
- Non-stationary noise: poor
These methods use no a priori information.

Traditional Methods
Codebook-based methods:
1) Codebook-based method using the ML estimator [1].
2) Codebook-based method using the Bayesian MMSE estimator [2].
AR: auto-regressive.
Block diagram:
- Training: speech corpus -> speech codebook; noise corpus -> noise codebook.
- Enhancement: noisy speech -> FFT -> noisy spectrum -> AR gains estimate (using the speech and noise codebooks) -> ML or Bayesian MMSE estimation -> Wiener filter -> IFFT -> enhanced speech.

Traditional Methods
Traditional method for AR gains estimation
For each pair of code-words from the speech and noise codebooks, the corresponding AR gains are obtained from Eq. (2), which compares the observed noisy spectrum with the modeled noisy spectrum.
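
A minimal sketch of the quantities named above, assuming the usual AR model in which a spectrum is a gain times the shape 1/|A(w)|^2 and the modeled noisy spectrum is the sum of the speech and noise terms. The Itakura-Saito (IS) form of the distortion and all function names are assumptions for illustration; the slide's Eq. (2) itself is not reproduced in this transcript.

```python
import cmath
import math


def ar_shape(a, num_bins):
    """Spectral shape 1/|A(w)|^2 for AR coefficients a = [1, a1, ..., ap]."""
    shape = []
    for k in range(num_bins):
        w = math.pi * k / num_bins  # frequencies on [0, pi)
        big_a = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(a))
        shape.append(1.0 / abs(big_a) ** 2)
    return shape


def modeled_noisy_spectrum(g_x, shape_x, g_w, shape_w):
    """Modeled noisy spectrum: speech gain * speech shape + noise gain * noise shape."""
    return [g_x * sx + g_w * sw for sx, sw in zip(shape_x, shape_w)]


def is_distortion(p_obs, p_mod):
    """Itakura-Saito distortion between observed and modeled spectra (mean over bins)."""
    total = 0.0
    for po, pm in zip(p_obs, p_mod):
        r = po / pm
        total += r - math.log(r) - 1.0
    return total / len(p_obs)
```

The distortion is zero when the two spectra coincide and grows as the model departs from the observation, which is why the gains are chosen to minimize it.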

Traditional Methods
Traditional method for AR gains estimation
There is no closed-form solution for the optimal speech and noise AR gains. The conventional codebook-driven methods therefore estimate the AR gains indirectly from the log-spectral (LS) distortion, which admits a closed-form solution after a series expansion; see Eq. (3).

Traditional Methods
Traditional method for AR gains estimation
Differentiating Eq. (3) with respect to the AR gains and setting the results to zero yields the AR gains, Eq. (4).
After obtaining the AR gains for each code-word combination, the ML estimator or the Bayesian MMSE estimator is used to obtain the AR parameters of speech and noise, Eq. (5).
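
One way to read the closed-form step: with the first-order expansion ln(1 + z) ≈ z, the log-spectral distortion reduces to a weighted least-squares error between the observed and modeled noisy spectra, and setting its derivatives to zero gives a 2x2 linear system in the two gains. The sketch below solves that system. The 1/P_y^2 weighting and every name in the code are assumptions for illustration, not the slide's Eqs. (3)-(4).

```python
def closed_form_gains(p_obs, shape_x, shape_w):
    """Solve for (g_x, g_w) minimizing sum_k ((P_y - g_x*s_x - g_w*s_w) / P_y)^2.

    Assumed weighted least-squares form of the series-expanded LS distortion.
    """
    # Entries of the 2x2 normal-equation matrix and right-hand side.
    a11 = sum((sx / py) ** 2 for sx, py in zip(shape_x, p_obs))
    a22 = sum((sw / py) ** 2 for sw, py in zip(shape_w, p_obs))
    a12 = sum(sx * sw / py ** 2 for sx, sw, py in zip(shape_x, shape_w, p_obs))
    b1 = sum(sx / py for sx, py in zip(shape_x, p_obs))
    b2 = sum(sw / py for sw, py in zip(shape_w, p_obs))

    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("speech and noise shapes are proportional")

    # Cramer's rule. The result is not constrained to be non-negative;
    # how negative gains are handled is a policy the caller must choose.
    g_x = (b1 * a22 - b2 * a12) / det
    g_w = (a11 * b2 - a12 * b1) / det
    return g_x, g_w
```

When the observed spectrum is exactly a non-negative combination of the two shapes, this recovers the true gains; otherwise the unconstrained solution can turn negative, which is one motivation for an update that keeps the gains positive.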

Traditional Methods
Traditional method for de-noising
A Wiener filter constructed from the estimated AR parameters of speech and noise is used to enhance the noisy speech, Eq. (6).
Although codebook-driven speech enhancement methods are better suited to non-stationary noise, some problems remain:
1) Noise classification.
2) The accuracy of gain estimation can be further improved.
3) The residual noise between the harmonics of noisy speech should be further suppressed.
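
A minimal sketch of a Wiener filter built from AR parameters, assuming the textbook per-bin gain P_x / (P_x + P_w) with P_x and P_w given by gain times spectral shape. The function names are hypothetical, and the slide's Eq. (6) may differ in detail.

```python
def wiener_gain(g_x, shape_x, g_w, shape_w):
    """Per-bin Wiener gain from speech and noise AR spectra (gain * shape)."""
    gains = []
    for sx, sw in zip(shape_x, shape_w):
        p_x = g_x * sx  # estimated speech power in this bin
        p_w = g_w * sw  # estimated noise power in this bin
        gains.append(p_x / (p_x + p_w))
    return gains


def apply_gain(noisy_spectrum, gains):
    """Scale each bin of the noisy spectrum by the corresponding gain."""
    return [y * h for y, h in zip(noisy_spectrum, gains)]
```

Because the gain depends only on smooth spectral envelopes, it cannot distinguish a harmonic peak from the valley next to it, which is the source of problem 3) in the list above.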

Proposed Method
The estimation of spectral shape of noise
To solve the problem of noise classification, the proposed method estimates the spectral shape of noise online with the Minima Controlled Recursive Averaging (MCRA) algorithm, Eqs. (7) and (8).
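
A simplified sketch of MCRA-style noise tracking, assuming the standard recipe: smooth the periodogram over time, track its minimum over a window, flag speech where the smoothed power exceeds a multiple of that minimum, and slow down the noise update in proportion to the resulting speech presence probability. Frequency smoothing is omitted, and the parameter values and names are illustrative assumptions. How the slides turn the tracked noise power into a spectral shape (Eqs. (7) and (8)) is not recoverable from this transcript.

```python
def mcra_noise_psd(frames, alpha_s=0.8, alpha_d=0.95, alpha_p=0.2,
                   delta=5.0, win=50):
    """Track the noise power per bin from a list of periodogram frames.

    frames: list of equal-length lists holding |Y(k, l)|^2 for each frame l.
    Returns the noise power estimate after the last frame.
    """
    num_bins = len(frames[0])
    smooth = list(frames[0])     # time-smoothed periodogram
    s_min = list(frames[0])      # running minimum of the smoothed periodogram
    s_tmp = list(frames[0])      # minimum within the current window
    p_speech = [0.0] * num_bins  # smoothed speech presence probability
    noise = list(frames[0])      # noise power estimate

    for l, frame in enumerate(frames[1:], start=1):
        for k in range(num_bins):
            smooth[k] = alpha_s * smooth[k] + (1.0 - alpha_s) * frame[k]
            if l % win == 0:
                # Restart the window so the minimum can follow rising noise.
                s_min[k] = min(s_tmp[k], smooth[k])
                s_tmp[k] = smooth[k]
            else:
                s_min[k] = min(s_min[k], smooth[k])
                s_tmp[k] = min(s_tmp[k], smooth[k])

            present = 1.0 if smooth[k] > delta * s_min[k] else 0.0
            p_speech[k] = alpha_p * p_speech[k] + (1.0 - alpha_p) * present

            # The more likely speech is present, the closer a gets to 1
            # and the less the noise estimate follows the current frame.
            a = alpha_d + (1.0 - alpha_d) * p_speech[k]
            noise[k] = a * noise[k] + (1.0 - a) * frame[k]
    return noise
```

Since the estimate is updated from the noisy signal itself, no noise codebook or noise classifier is needed, which is the point made on the slide.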

Proposed Method
The estimation of AR gains
In this paper, we use a multiplicative update rule [3-4] to obtain an approximate closed-form solution for the Itakura-Saito (IS) distortion. Since only the shape codebook of the speech spectrum is trained offline and the spectral shape of noise is estimated online, the modeled noisy spectrum can be rewritten for each speech code-word as Eq. (9). Eq. (9) can then be expressed in matrix form.

Proposed Method
The estimation of AR gains
The IS distortion is rewritten as Eq. (10). Differentiating Eq. (10) with respect to the gain matrices gives Eq. (11) [3-4], where the symbol '.' indicates point-wise multiplication. Simplifying Eq. (11) gives Eq. (12).

Proposed Method
The estimation of AR gains
The AR gains of speech and noise are obtained by iterating the multiplicative rules of Eq. (13) to minimize the IS distortion, which then gives Eq. (14).
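
A minimal sketch of a multiplicative update for the two gains, assuming the IS-divergence rule from the non-negative matrix factorization literature [4] with the spectral shapes held fixed and only the gains updated. Each gain is multiplied by the ratio of the negative and positive parts of the gradient, so a positive starting value stays positive. The names, the initial values, and the iteration count are assumptions; the slide's Eqs. (13) and (14) are not reproduced here.

```python
import math


def is_distortion(p_obs, p_mod):
    """Itakura-Saito distortion between two spectra (mean over bins)."""
    return sum(po / pm - math.log(po / pm) - 1.0
               for po, pm in zip(p_obs, p_mod)) / len(p_obs)


def multiplicative_gain_update(p_obs, shape_x, shape_w,
                               g_x=1.0, g_w=1.0, num_iter=500):
    """Iterate IS-divergence multiplicative updates for the speech and noise gains."""
    for _ in range(num_iter):
        # Modeled noisy spectrum for the current gains.
        p_mod = [g_x * sx + g_w * sw for sx, sw in zip(shape_x, shape_w)]

        num_x = sum(po * sx / pm ** 2
                    for po, sx, pm in zip(p_obs, shape_x, p_mod))
        den_x = sum(sx / pm for sx, pm in zip(shape_x, p_mod))
        num_w = sum(po * sw / pm ** 2
                    for po, sw, pm in zip(p_obs, shape_w, p_mod))
        den_w = sum(sw / pm for sw, pm in zip(shape_w, p_mod))

        # Both gains are updated from the same p_mod (simultaneous update).
        g_x *= num_x / den_x
        g_w *= num_w / den_w
    return g_x, g_w
```

At a fixed point the numerator and denominator of each ratio are equal, which is exactly the condition that the derivative of the IS distortion with respect to that gain vanishes.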

Proposed Method
The estimation of AR gains
An example of the average IS distortion is shown in Fig. 1. The average IS distortion is defined over the speech codebook, where N_x is the size of the speech codebook. The AR gains are estimated by the conventional and proposed methods, respectively. The speech material is corrupted by white noise at an SNR of 5 dB.
Fig. 1. Comparison of the average IS distortion.

Proposed Method
Bayesian MMSE estimation
Let θ_x denote the random variable corresponding to the speech AR coefficients, and let g_x and g_w denote the random variables corresponding to the speech and noise AR gains, respectively. Let θ = [θ_x, g_x, g_w] denote the set of random variables. After θ is obtained for each speech code-word, the desired Bayesian MMSE estimate is given by Eq. (15).
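
A minimal sketch of an MMSE-style estimate over a codebook, assuming it takes the form of a weighted average of the per-code-word parameter vectors, with weights that decay exponentially in the IS distortion of each candidate. The weighting function, the `scale` parameter, and the names are hypothetical; the likelihood and prior terms of the slide's Eq. (15) are not reproduced in this transcript.

```python
import math


def mmse_estimate(candidates, distortions, scale=1.0):
    """Weighted average of candidate parameter vectors.

    candidates: one vector [theta_x..., g_x, g_w] per speech code-word.
    distortions: IS distortion of each candidate against the observed spectrum.
    scale: hypothetical factor controlling how sharply poor fits are discounted.
    """
    # Subtract the smallest distortion so the largest weight is exactly 1.
    d_min = min(distortions)
    weights = [math.exp(-scale * (d - d_min)) for d in distortions]
    total = sum(weights)

    dim = len(candidates[0])
    return [sum(w * c[j] for w, c in zip(weights, candidates)) / total
            for j in range(dim)]
```

With equal distortions this reduces to the plain mean of the candidates; when one code-word fits far better than the rest, the estimate approaches that single candidate, which is the behavior of an ML-style selection.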

Proposed Method
Fig. 2. AR gain estimation of clean speech.

Proposed Method
Modified codebook-driven Wiener filter
The conventional codebook-driven Wiener filter is constructed from the estimated spectral envelopes of speech and noise, which usually fit the spectra between the harmonics of speech inaccurately. Consequently, residual noise remains between the harmonics of the enhanced speech. In this section, we introduce the speech presence probability (SPP) to modify the traditional codebook-driven Wiener filter and suppress the residual noise, Eq. (16).

Experiments
Performance Evaluation
Experimental setup:
- Four types of noise: white, babble, office, and street.
- Test material: 9 utterances from 4 female speakers and 5 male speakers.
- Sampling rate: 8 kHz.
- Size of the speech codebook: 6 bits.

TABLE 1. TEST RESULTS OF PESQ (average PESQ by input SNR)
| Enhancement method | 0 dB | 5 dB | 10 dB |
|---|---|---|---|
| Noisy | 1.87 | 2.20 | 2.55 |
| ML-CB [1] | 2.05 | 2.45 | 2.71 |
| MMSE-CB [2] | 2.33 | 2.64 | 2.90 |
| Proposed | 2.40 | 2.76 | 3.06 |

Performance Evaluation

TABLE 2. TEST RESULTS OF SSNR IMPROVEMENT (average SSNR improvement by input SNR)
| Enhancement method | 0 dB | 5 dB | 10 dB |
|---|---|---|---|
| Noisy | - | - | - |
| ML-CB [1] | 9.74 | 8.71 | 7.59 |
| MMSE-CB [2] | 13.23 | 11.97 | 10.69 |
| Proposed | 16.42 | 15.46 | 14.16 |

TABLE 3. TEST RESULTS OF LSD (average LSD by input SNR)
| Enhancement method | 0 dB | 5 dB | 10 dB |
|---|---|---|---|
| Noisy | 14.57 | 12.66 | 10.88 |
| ML-CB [1] | 10.68 | 8.99 | 7.92 |
| MMSE-CB [2] | 9.09 | 7.73 | 6.46 |
| Proposed | 7.41 | 6.11 | 5.15 |
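
A minimal sketch of the two objective measures in Tables 2 and 3, assuming their common definitions: segmental SNR (SSNR) as the frame-wise SNR in dB averaged over frames, and log-spectral distance (LSD) as the root-mean-square difference of log power spectra averaged over frames. The clamping range, the frame handling, and the names are assumptions; the slides do not state the exact definitions used.

```python
import math


def segmental_snr(clean, enhanced, frame_len, lo=-10.0, hi=35.0):
    """Average frame-wise SNR in dB; each frame is clamped to [lo, hi]."""
    values = []
    # Only complete frames are used; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        sig = sum(v * v for v in c)
        err = sum((a - b) ** 2 for a, b in zip(c, e))
        # Small floors avoid log of zero and division by zero.
        snr = 10.0 * math.log10(max(sig, 1e-12) / max(err, 1e-12))
        values.append(min(max(snr, lo), hi))
    return sum(values) / len(values)


def log_spectral_distance(psd_ref_frames, psd_est_frames):
    """Average over frames of the RMS difference between log power spectra, in dB."""
    per_frame = []
    for ref, est in zip(psd_ref_frames, psd_est_frames):
        sq = [(10.0 * math.log10(r) - 10.0 * math.log10(e)) ** 2
              for r, e in zip(ref, est)]
        per_frame.append(math.sqrt(sum(sq) / len(sq)))
    return sum(per_frame) / len(per_frame)
```

Under these definitions, an SSNR improvement is the SSNR of the enhanced signal minus the SSNR of the noisy input, so larger values in Table 2 are better, while smaller LSD values in Table 3 are better.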

Demos
(a) clean speech
(b) noisy speech (white noise, SNR = 10 dB)
(c) enhanced speech using ML-CB
(d) enhanced speech using MMSE-CB
(e) enhanced speech using our method without SPP
(f) enhanced speech using our method with SPP

Demos
(a) clean speech
(b) noisy speech (babble noise, SNR = 10 dB)
(c) enhanced speech using ML-CB
(d) enhanced speech using MMSE-CB
(e) enhanced speech using our method without SPP
(f) enhanced speech using our method with SPP

References
[1] S. Srinivasan, J. Samuelsson, and W. B. Kleijn, “Codebook driven short-term predictor parameter estimation for speech enhancement,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 163–176, Jan. 2006.
[2] S. Srinivasan, J. Samuelsson, and W. B. Kleijn, “Codebook-based Bayesian speech enhancement for nonstationary environments,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 2, pp. 441–452, Feb. 2007.
[3] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in NIPS, 2000, pp. 556–562.
[4] C. Févotte, N. Bertin, and J. L. Durrieu, “Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis,” Neural Comput., vol. 21, pp. 793–830, 2009.

Q & A