Speech Enhancement with Binaural Cues Derived from a Priori Codebook
Reporter: Nan Chen, Beijing University of Technology
http://www.bjut.edu.cn/sci/voice/index.htm
Students and teachers, good afternoon. I am glad to have the chance to give my presentation here. Today I would like to talk to you about some of our work in the field of codebook-based speech enhancement. The title of my presentation is "…".
Contents
1. Introduction
2. The Proposed Method
3. Results and Conclusions
I would like to give this presentation in three parts. First, I will introduce the background of the work. Then the proposed method is described in detail. Finally, we give the experimental results and summarize the conclusions.
1. Introduction
Introduction
Noise: street, car, babble, office
Introduction
The traditional methods of speech enhancement
1. Spectral-subtractive algorithms
2. Wiener filtering
3. Statistical-model-based methods
4. Subspace algorithms
Monaural speech enhancement remains a challenging task for speech communication applications such as speech coding and speech recognition. The traditional methods … perform well for stationary noise, but their performance degrades when non-stationary noise is present. The reason is that an accurate noise estimate cannot be obtained from the noisy observation. If some prior information about the speech and the noise is known in advance, the performance can be improved.
Introduction
Binaural Cue Coding (BCC) framework
Purpose: recovering the perception of the original input signals
BCC analysis: extracts the side information of the input signals
BCC synthesis: recovers the input signals by making use of the side information and the mono signal
Figure 1: Block diagram of analysis and synthesis for BCC
Now I want to talk a little about the BCC framework, which is shown in Figure 1. The purpose of BCC is … From Figure 1, we can see that …
Introduction
Once the discrete Fourier transform (DFT) coefficients of the mono signal are known, the DFT coefficients of each output channel, S_{c,k}, can be calculated as in equations (1) to (3). In these equations, one term is the inter-channel level difference (ICLD) between channel 1 and channel c for the nth sub-band, and another is a random variable controlled by the inter-channel coherence (ICC). As can be seen in Figure 1, f is used to determine a level modification of the DFT coefficients, c is the index of the channel, and n is the frequency index.
Introduction
BCC: recovering the perception of the original input signals.
Speech enhancement: separating the clean signal from the noisy signal.
The BCC principle is introduced to estimate the clean signal. The noisy speech is enhanced by the BCC principle, where channel 1 is assumed to be the clean speech and channel 2 is regarded as the noise.
Diagram labels: clean speech, noise, noisy speech
BCC aims at recovering the perception of the original input signals, while the main purpose of speech enhancement is to separate the clean signal from the noisy signal. For this reason, we introduce the BCC technique into monaural speech enhancement … But we need to find the appropriate side information.
2. The Proposed Method
The Proposed Method
Side information
The clean cue: speech and noise level difference (SNLD); speech and noise correlation (SNC)
The pre-enhanced cue: pre-enhanced speech and noise level difference (PNLD); pre-enhanced speech and noise correlation (PNC); posterior SNR (PSNR); speech presence probability (SPP)
In the BCC scheme, the binaural cues are considered as side information. Here, however, the clean cue, which corresponds to the binaural cues, cannot be obtained directly, so we obtain it through the pre-enhanced cue. In our method, the side information therefore contains both the clean cue and the pre-enhanced cue. The clean cue is …; the pre-enhanced cue is …
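To make the level-difference and correlation cues more concrete, here is a minimal sketch. The slides do not give the exact definitions, so the formulas below are assumptions modeled on the usual BCC-style cues: a per-band power ratio in dB for the level difference, and a normalized cross-spectrum magnitude for the correlation. The function name `band_cues`, the `band_edges` layout, and the `eps` regularizer are all hypothetical.

```python
import math


def band_cues(speech_spec, noise_spec, band_edges, eps=1e-12):
    """Per-band level difference (dB) and normalized correlation.

    ASSUMED definitions, not taken from the slides.
    speech_spec, noise_spec: complex DFT coefficients of one frame.
    band_edges: increasing bin indices; band b covers bins
                [band_edges[b], band_edges[b + 1]).
    """
    level_diff, correlation = [], []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        s = speech_spec[lo:hi]
        n = noise_spec[lo:hi]
        # Band powers of the two "channels" (speech and noise).
        p_s = sum(abs(x) ** 2 for x in s)
        p_n = sum(abs(x) ** 2 for x in n)
        # Magnitude of the cross-spectrum summed over the band.
        cross = abs(sum(x * y.conjugate() for x, y in zip(s, n)))
        level_diff.append(10.0 * math.log10((p_s + eps) / (p_n + eps)))
        correlation.append(cross / math.sqrt((p_s + eps) * (p_n + eps)))
    return level_diff, correlation
```

For example, if the speech spectrum is exactly twice the noise spectrum in every bin, each band has a level difference of about 6.02 dB and a correlation of about 1.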
The Proposed Method
Figure 2: Block diagram of the proposed monaural speech enhancement method
Figure 2 describes the proposed method, which has two parts: an offline training stage and an online enhancing stage. At the training stage, the pre-enhanced speech is obtained through pre-processing, and the pre-enhanced cue is extracted from it. The clean cue is extracted from the clean speech. The noisy speech and the clean speech correspond one-to-one. Finally, the pre-enhanced cue and the clean cue are used to train the codebook. At the enhancing stage, we first obtain the pre-enhanced speech; then the online clean cue is estimated by weighted codebook mapping, using the trained codebook and the online pre-enhanced cue.
The Proposed Method
Weighted codebook mapping algorithm
Figure 3: Block diagram of the weighted codebook mapping
Figure 3 shows the scheme of the weighted codebook mapping (WCBM) algorithm.
The Proposed Method
Estimation of the clean cue:
1) By comparing the Euclidean distance (ED) between the online pre-enhanced cue and the trained pre-enhanced cues, choose the M code-vectors with the smallest ED from the trained codebook.
2) Calculate the degree of membership ρ of each chosen code-vector.
3) Define the weight of each chosen code-vector.
4) Obtain the online clean cue by weighting the trained clean cues stored in the chosen code-vectors.
The corresponding formulas are numbered (4) and (5).
The way to … by the weighted codebook mapping (WCBM) algorithm is introduced here.
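The four steps of the clean cue estimation can be sketched as follows. This is only an illustration under stated assumptions: the slides do not show the membership function ρ or the weight formula, so the sketch assumes an inverse-distance membership and weights obtained by normalizing the memberships to sum to one. The function name `weighted_codebook_mapping`, the codebook layout (pairs of trained pre-enhanced cue and trained clean cue), and the default values of `m` and `eps` are hypothetical.

```python
import math


def weighted_codebook_mapping(pre_cue, codebook, m=3, eps=1e-9):
    """Estimate the online clean cue from an online pre-enhanced cue.

    codebook: list of (trained_pre_cue, trained_clean_cue) pairs,
              each cue being a list of floats.
    """
    # Step 1: Euclidean distance to every trained pre-enhanced cue,
    # then keep the M code-vectors with the smallest distance.
    scored = [(math.dist(pre_cue, pre), clean) for pre, clean in codebook]
    nearest = sorted(scored, key=lambda item: item[0])[:m]

    # Step 2: degree of membership. ASSUMPTION: inverse distance,
    # so closer code-vectors get a larger membership.
    memberships = [1.0 / (d + eps) for d, _ in nearest]

    # Step 3: weights. ASSUMPTION: memberships normalized to sum to 1.
    total = sum(memberships)
    weights = [rho / total for rho in memberships]

    # Step 4: weighted sum of the trained clean cues.
    dim = len(nearest[0][1])
    return [
        sum(w * clean[i] for w, (_, clean) in zip(weights, nearest))
        for i in range(dim)
    ]
```

With this choice, a query that lies exactly halfway between two code-vectors returns the average of their clean cues, and a query that coincides with a code-vector returns almost exactly that code-vector's clean cue.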
The Proposed Method
Speech enhancement:
According to the BCC principle, we have a relation in which one term is a random function with zero mean and constant variance. Finally, the noisy speech is enhanced using the estimated cues. The corresponding formulas are numbered (6) to (8).
After we get the online clean cue, which contains the speech and noise level difference and the speech and noise correlation, we can enhance the noisy speech.
3. Results and Conclusions
Results
SSNR:
This table shows the segmental SNR (SSNR) improvement under different input SNR conditions for various noise types. Ref. A denotes the MMSE spectral amplitude estimation method, and Ref. B indicates the codebook-based MMSE method. From this table, we can see that the proposed method performs better than the two references in most cases.
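For readers unfamiliar with the measure, the segmental SNR improvement can be sketched as below. The slides do not state the frame length or the clamping limits, so the values here (256-sample non-overlapping frames, per-frame SNR limited to the range from -10 dB to 35 dB) are assumptions based on common practice; the function names are hypothetical.

```python
import math


def segmental_snr(clean, processed, frame_len=256,
                  lo_db=-10.0, hi_db=35.0, eps=1e-12):
    """Mean per-frame SNR in dB between a clean and a processed signal.

    Frames do not overlap; each frame SNR is clamped to [lo_db, hi_db]
    (ASSUMED limits) before averaging.
    """
    n_frames = min(len(clean), len(processed)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    total = 0.0
    for k in range(n_frames):
        seg = range(k * frame_len, (k + 1) * frame_len)
        sig = sum(clean[i] ** 2 for i in seg)
        err = sum((clean[i] - processed[i]) ** 2 for i in seg)
        snr = 10.0 * math.log10((sig + eps) / (err + eps))
        total += min(max(snr, lo_db), hi_db)
    return total / n_frames


def ssnr_improvement(clean, noisy, enhanced, **kwargs):
    """SSNR of the enhanced signal minus SSNR of the noisy input."""
    return (segmental_snr(clean, enhanced, **kwargs)
            - segmental_snr(clean, noisy, **kwargs))
```

A positive value from `ssnr_improvement` means the enhanced signal is closer to the clean reference than the noisy input was, frame by frame.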
Results
PESQ:
This table gives the PESQ test results. We can see that the proposed method performs better than the two references, especially under noisy conditions with high input SNR.
Results
LSD:
In Table 3, we show the test results for the log-spectral distance (LSD). According to Table 3, the proposed method performs better than Ref. A. However, compared with Ref. B, it does not perform as well in some noise conditions. Ref. B models the spectral envelope, which explains its good performance in this table.
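As a reminder of what the LSD measures, here is a minimal sketch of one common definition: for each frame, the root-mean-square difference in dB between the clean and the estimated power spectra, averaged over all frames. The slides do not specify which variant was used, so this definition, the function name, and the `eps` regularizer are assumptions.

```python
import math


def log_spectral_distance(clean_power, est_power, eps=1e-12):
    """Mean per-frame RMS log-spectral difference in dB.

    clean_power, est_power: lists of frames, each frame a list of
    power-spectrum values (same number of bins in both inputs).
    """
    frame_dists = []
    for c_frame, e_frame in zip(clean_power, est_power):
        # Squared dB difference in every frequency bin of this frame.
        sq = [
            (10.0 * math.log10((c + eps) / (e + eps))) ** 2
            for c, e in zip(c_frame, e_frame)
        ]
        frame_dists.append(math.sqrt(sum(sq) / len(sq)))
    return sum(frame_dists) / len(frame_dists)
```

A smaller value means the estimated spectrum is closer to the clean one; identical spectra give 0 dB, and a uniform factor of 10 in power gives 10 dB.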
Results
5 dB babble: clean, Ref. A, proposed, Ref. B
These are some demos for the two references and the proposed method; I will play them at the end.
Results
10 dB babble: clean, Ref. A, Ref. B, proposed
Conclusions
We enhance the noisy speech by modeling the spectral detail, which is why the method can reduce the noise between harmonics.
Noise classification is no longer needed, because the binaural cues that we introduce as a priori information are not correlated with the type of noise.
In this presentation, we have presented two contributions.
Thank You!