Published by Alexina Allison. Modified over 9 years ago.
1
Noise Reduction: Two-Stage Mel-Warped Wiener Filter Approach
2
Intellectual Property: Advanced front-end feature extraction algorithm, ETSI ES 202 050 V1.1.3 (2003-11). European Telecommunications Standards Institute (ETSI), Technical Committee Speech Processing, Transmission and Quality Aspects (STQ).
3
Noise Reduction: Based on Wiener filter theory. Noise reduction is performed in two stages. The input signal is de-noised in the first stage. The second stage applies dynamic noise reduction based on the SNR of the processed signal.
4
First Stage (block diagram): Spectrum Estimation, PSD Mean, WF Design, Mel Filter-Bank, Mel IDCT, Apply Filter, VADNest. Output goes to the second stage.
5
Second Stage (block diagram): Input comes from the first stage. Spectrum Estimation, PSD Mean, WF Design, Mel Filter-Bank, Gain Factorization, Mel IDCT, Apply Filter, OFF (offset compensation). Output.
6
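Buffering: [Figure: two four-frame buffers, Buffer 1 (frames A-D in slots 0-3) and Buffer 2 (frames E-H in slots 0-3). After a shift they hold B, C, D, new and F, G, H, de-noised (1st stage). Labels: De-noised (1st stage), De-noised (output).] 1 frame = 80 samples. 1 buffer = 4 frames.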
7
Spectrum Estimation: The input signal is divided into overlapping frames of N_in = 200 samples. A 25 ms frame length and 10 ms frame shift (80 samples) are used. Each frame s_w(n) is windowed with a Hanning window of length N_in.
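The framing and windowing step can be sketched as follows. This is a minimal illustration, not the standard's reference code: the function names are made up here, and the exact window formula (with the half-sample offset) is an assumption, since the slide only names a Hanning window of length N_in.

```python
import math

N_IN = 200   # frame length in samples (25 ms, which implies 8 kHz sampling)
SHIFT = 80   # frame shift in samples (10 ms)

def hanning(n_in=N_IN):
    # Assumed window form: 0.5 - 0.5*cos(2*pi*(n + 0.5) / N_in)
    return [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / n_in)
            for n in range(n_in)]

def windowed_frames(signal, n_in=N_IN, shift=SHIFT):
    """Split `signal` into overlapping frames and window each one."""
    w = hanning(n_in)
    frames = []
    # Only full frames are produced; trailing samples are dropped.
    for start in range(0, len(signal) - n_in + 1, shift):
        frames.append([signal[start + n] * w[n] for n in range(n_in)])
    return frames
```

With 360 input samples this yields three frames, starting at samples 0, 80 and 160.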
8
Spectrum Estimation: The windowed frame is zero-padded from N_in up to N_FFT - 1, where N_FFT = 256.
9
Spectrum Estimation: The frequency representation of each frame is computed, then its power spectrum, which is then smoothed.
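A rough sketch of these three steps is given below. The naive DFT stands in for a real FFT purely for readability. The smoothing rule (averaging adjacent bin pairs) is an assumption, because the slide's formula did not survive; `power_spectrum` and `smooth_pairs` are illustrative names.

```python
import cmath

N_FFT = 256

def power_spectrum(frame_w, n_fft=N_FFT):
    """Zero-pad a windowed frame to n_fft and return |X(bin)|^2 for bins 0..n_fft/2."""
    padded = list(frame_w) + [0.0] * (n_fft - len(frame_w))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        # Naive DFT for clarity; a real implementation would call an FFT.
        x_k = sum(s * cmath.exp(-2j * cmath.pi * k * n / n_fft)
                  for n, s in enumerate(padded))
        spectrum.append(abs(x_k) ** 2)
    return spectrum

def smooth_pairs(p):
    """Assumed smoothing: average adjacent bin pairs, keep the last bin as is."""
    half = (len(p) - 1) // 2
    return [(p[2 * b] + p[2 * b + 1]) / 2 for b in range(half)] + [p[-1]]
```

Under this assumption, a 129-bin power spectrum is reduced to 65 smoothed bins.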
10
Power Spectral Density Mean: For each bin, compute the mean of P_in(bin) over the last T_PSD = 2 frames.
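One way to keep this running mean is a two-slot history, as in the sketch below. The class name is hypothetical, and averaging over a single frame at start-up is an assumption about how the first frame is handled.

```python
from collections import deque

T_PSD = 2  # number of frames averaged

class PsdMean:
    def __init__(self, t_psd=T_PSD):
        # deque drops the oldest frame automatically once full
        self.history = deque(maxlen=t_psd)

    def update(self, p_in):
        """Add the current frame's spectrum and return the per-bin mean."""
        self.history.append(list(p_in))
        n = len(self.history)  # 1 on the very first frame (assumption)
        return [sum(frame[b] for frame in self.history) / n
                for b in range(len(p_in))]
```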
11
Wiener Filter Design: A forgetting factor (weight) λ_NSE is computed for each frame.
If t < 100 frames: λ_NSE = 1 - 1/t
Else: λ_NSE = 0.99
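The effect of this schedule is easy to check in a few lines. The sketch below assumes t is a 1-based frame index; `smooth` is a hypothetical helper that applies the usual recursion est = λ·est + (1 - λ)·x.

```python
def forgetting_factor(t, threshold=100, lam_max=0.99):
    """Forgetting factor for 1-based frame index t."""
    return 1.0 - 1.0 / t if t < threshold else lam_max

def smooth(values):
    """Recursive average of `values` using the frame-dependent factor."""
    est = 0.0
    for t, x in enumerate(values, start=1):
        lam = forgetting_factor(t)
        est = lam * est + (1.0 - lam) * x
    return est
```

While λ = 1 - 1/t, the recursion is exactly the arithmetic mean of the frames seen so far, so the estimate adapts quickly at start-up and then settles into slow exponential smoothing with λ = 0.99.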
12
Wiener Filter Design: The first-stage noise spectrum estimate is updated based on the VAD flag.
If flag = 0 (non-speech frame t_n): P_noise^{1/2}(bin, t_n) = max(λ_NSE · P_noise^{1/2}(bin, t_n - 1) + (1 - λ_NSE) · PSD_mean, exp(-10))
If flag = 1 (speech frame): P_noise^{1/2}(bin, t) = P_noise^{1/2}(bin, t_n), the estimate from the last non-speech frame.
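A minimal sketch of this update follows. It takes exp(-10) as a lower floor on the estimate (hence `max`), which is an assumption about the intent of the constant. The function name and argument layout are illustrative, and the PSD mean is assumed to be in the same domain as the noise estimate.

```python
import math

EPS = math.exp(-10)  # floor that keeps the estimate strictly positive

def update_noise_stage1(noise_prev, psd_mean, lam, vad_flag):
    """Return the new per-bin noise estimate for the first stage."""
    if vad_flag:
        # Speech frame: hold the estimate from the last non-speech frame.
        return list(noise_prev)
    return [max(lam * n + (1.0 - lam) * p, EPS)
            for n, p in zip(noise_prev, psd_mean)]
```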
13
Wiener Filter Design: The second-stage noise estimate is updated on every frame.
If t < 11: P_noise(bin, t) = λ_NSE · P_noise(bin, t - 1) + (1 - λ_NSE) · PSD_mean
Else:
update = 0.9 + 0.1 × P_inPSD(bin, t) / (P_inPSD(bin, t) + P_noise(bin, t - 1)) × (1 + 1 / (1 + 0.1 × P_inPSD(bin, t) / P_inPSD(bin, t - 1)))
P_noise(bin, t) = P_noise(bin, t - 1) × update
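The second-stage rule can be sketched directly from the slide's expressions. The function name is hypothetical, all spectra are assumed strictly positive (no division guards are shown), and the last ratio uses the previous frame's input PSD exactly as the slide writes it.

```python
def update_noise_stage2(noise_prev, p_in, p_in_prev, lam, t):
    """Per-bin second-stage noise update for 1-based frame index t."""
    if t < 11:
        # Start-up: plain recursive averaging
        return [lam * n + (1.0 - lam) * p for n, p in zip(noise_prev, p_in)]
    out = []
    for n, p, pp in zip(noise_prev, p_in, p_in_prev):
        update = 0.9 + 0.1 * p / (p + n) * (1.0 + 1.0 / (1.0 + 0.1 * p / pp))
        out.append(n * update)  # multiplicative correction of the estimate
    return out
```

After start-up the estimate is scaled by a factor close to 1, so it drifts slowly rather than jumping to the current frame's spectrum.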
14
Wiener Filter Design: The noiseless spectrum is estimated:
P_den^{1/2}(bin, t) = 0.98 × P_den^{1/2}(bin, t - 1) + (1 - 0.98) × T[PSD_mean - P_noise^{1/2}(bin, t)]
where T is a threshold function.
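As a sketch, the recursion might look like this. The threshold function T is assumed here to be half-wave rectification (negative differences are set to zero), which is a common choice for spectral subtraction but is not spelled out in the slide text; the function name is illustrative.

```python
def update_denoised(den_prev, psd_mean, noise, gamma=0.98):
    """Smoothed estimate of the noiseless spectrum, per bin."""
    return [gamma * d + (1.0 - gamma) * max(p - n, 0.0)  # T[.] assumed = max(., 0)
            for d, p, n in zip(den_prev, psd_mean, noise)]
```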
15
Wiener Filter Design: The a priori SNR is calculated, and from it the filter transfer function.
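For orientation only, the textbook relationship between a priori SNR and Wiener gain is sketched below. This is the generic form H = η / (1 + η) with η = P_den / P_noise, not necessarily the exact expression used on the slide; the `eps` guard and both function names are additions for the example.

```python
def a_priori_snr(p_den, p_noise, eps=1e-12):
    """Per-bin ratio of de-noised to noise spectrum (eps avoids division by zero)."""
    return [d / max(n, eps) for d, n in zip(p_den, p_noise)]

def wiener_gain(snr):
    """Textbook Wiener gain: near 1 at high SNR, near 0 at low SNR."""
    return [eta / (1.0 + eta) for eta in snr]
```

For example, bins with SNR 1, 0 and 3 get gains 0.5, 0 and 0.75.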
16
Wiener Filter Design: The filter transfer function is used to improve the noiseless signal estimate, from which an improved a priori SNR is computed.
17
Voice Activity Detection: VAD is used to detect noise frames. First set the long-term energy factor: if t < 10 (frame threshold), LTE = 1 - 1/t; else LTE = 0.97. Then calculate the frame energy.
18
Voice Activity Detection: Use the frame energy to update the mean energy:
If (frameEn - meanEn) < 20 (SNR threshold) or t < 10 then
  If (frameEn < meanEn) or (t < 10): meanEn = meanEn + (1 - LTE) * (frameEn - meanEn)
  Else: meanEn = meanEn + (1 - 0.99) * (frameEn - meanEn)
If (meanEn < 80): meanEn = 80
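The mean-energy tracker can be written out as below. This is a sketch under two assumptions: t is a 1-based frame index, and the floor of 80 is applied on every call (the slide does not make its nesting explicit). Function names are illustrative.

```python
def lte_factor(t):
    """Long-term energy factor for 1-based frame index t."""
    return 1.0 - 1.0 / t if t < 10 else 0.97

def update_mean_energy(frame_en, mean_en, t, snr_threshold=20.0, floor=80.0):
    """Track the mean (noise-level) energy from the current frame energy."""
    lte = lte_factor(t)
    if (frame_en - mean_en) < snr_threshold or t < 10:
        if frame_en < mean_en or t < 10:
            mean_en += (1.0 - lte) * (frame_en - mean_en)   # fast tracking
        else:
            mean_en += (1.0 - 0.99) * (frame_en - mean_en)  # slow upward drift
    if mean_en < floor:
        mean_en = floor
    return mean_en
```

Frames far above the mean (20 or more) leave it unchanged, so loud speech does not pull the noise-level estimate up.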
19
Voice Activity Detection: Is the current frame speech?
If t > 4:
  If (frameEn - meanEn) > 15: IT IS SPEECH, nbSpeechFrame++
  Else:
    If nbSpeechFrame > 4: hangover = 15, nbSpeechFrame = 0
    If (hangover != 0): IT IS SPEECH
    Else: IT IS NOT SPEECH
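A small state machine captures this decision. The sketch makes three assumptions that the slide text leaves open: frames with t <= 4 are reported as non-speech, the speech-frame counter is reset on every non-speech decision, and the hangover counter is decremented each time it is used (otherwise it would never expire). The class name is hypothetical.

```python
class SpeechDetector:
    def __init__(self):
        self.nb_speech_frame = 0
        self.hangover = 0

    def is_speech(self, frame_en, mean_en, t):
        if t <= 4:
            return False  # assumption: no decision during the first frames
        if frame_en - mean_en > 15:
            self.nb_speech_frame += 1
            return True
        if self.nb_speech_frame > 4:
            self.hangover = 15      # a long enough burst arms the hangover
        self.nb_speech_frame = 0    # assumption: reset on every non-speech decision
        if self.hangover != 0:
            self.hangover -= 1      # assumption: hangover counts down per frame
            return True
        return False
```

With these assumptions, five or more consecutive speech frames are followed by 15 hangover frames still flagged as speech, which protects weak word endings from being treated as noise.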
20
Mel Filter Bank: The linear-frequency Wiener filter coefficients are smoothed and transformed to the Mel-frequency scale. The mel scale is a scale of pitches judged by listeners to be equal in distance from one another.
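For reference, a widely used analytic approximation of the mel scale is mel(f) = 2595 · log10(1 + f/700). The sketch below implements that formula and its inverse; the slide text does not preserve its own filter-bank definition, so the formula and the function names are assumptions for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale approximation: 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Equal steps in mel correspond to progressively wider steps in Hz, so a mel-spaced filter bank has narrow bands at low frequencies and wide bands at high frequencies.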
21
Mel IDCT: The time-domain impulse response of the Wiener filter is computed from the Mel-Wiener filter coefficients using a Mel-warped inverse Discrete Cosine Transform.
22
Gain Factorization: Factorization of the Mel-warped Wiener filter coefficients is performed to control the aggression of noise reduction in the second stage. The de-noised frame signal energy is calculated first.
23
Gain Factorization: The noise energy of the current frame is then estimated.
24
Gain Factorization: The smoothed SNR is evaluated using three de-noised frame energies and the noise energy, combined into a value Ratio.
If Ratio > 0.0001: SNR_avg(t) = 6.67 × log10(Ratio)
Else: SNR_avg(t) = -33.3
25
Gain Factorization: To decide the degree of aggression, the SNR is tracked:
If (SNR_avg(t) - SNR_low-track(t - 1)) < 10 or t < 10:
  calculate λ_SNR(t)
  SNR_low-track(t) = λ_SNR(t) × SNR_low-track(t - 1) + (1 - λ_SNR(t)) × SNR_avg(t)
Else: SNR_low-track(t) = SNR_low-track(t - 1)
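The smoothed SNR and its low tracker can be sketched as two small functions. `ratio` and `lam_snr` are taken as inputs because the slide text does not preserve how Ratio and λ_SNR(t) are computed; the function names are illustrative.

```python
import math

def snr_average(ratio):
    """Smoothed SNR in dB-like units from the energy ratio."""
    if ratio > 0.0001:
        return 6.67 * math.log10(ratio)
    return -33.3

def track_low_snr(snr_avg, low_prev, t, lam_snr):
    """Follow the SNR only while it stays close to the tracked low level."""
    if (snr_avg - low_prev) < 10 or t < 10:
        return lam_snr * low_prev + (1.0 - lam_snr) * snr_avg
    return low_prev  # large upward jump (likely speech): hold the low level
```

Because jumps of 10 or more are ignored after start-up, the tracker follows the noise-only SNR level rather than speech peaks.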
26
Gain Factorization: Gain factorization applies more aggressive noise reduction to purely noisy frames and less to frames containing speech. The aggression coefficient takes on a value of 10% for speech + noise frames and 80% for noise frames.
27
Apply Filter: The causal impulse response is obtained, truncated and weighted by a Hanning window. The input signal is filtered with the filter impulse response to produce the noise-reduced signal.
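In outline, this step is a windowed FIR filter. The sketch below truncates an impulse response to a chosen length, weights it with a Hanning window, and convolves it with the input. The truncation length, the window formula, and the function names are assumptions for the example rather than values from the slides.

```python
import math

def hanning(n):
    # Assumed window form with half-sample offset
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / n) for i in range(n)]

def apply_filter(signal, impulse_response, length):
    """Truncate, window, then causally convolve: y[n] = sum_k h[k] * x[n-k]."""
    w = hanning(length)
    h = [c * wk for c, wk in zip(impulse_response[:length], w)]
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * signal[n - k]
        out.append(acc)
    return out
```

For instance, an impulse response truncated to [0, 1, 0] simply delays the signal by one sample, because the window equals 1 at its centre.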
28
Offset Compensation: A filter is used to remove the DC offset over the frame length interval (80 samples), where s_nr is the noise-reduced signal.
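A typical DC-removal filter for this purpose is a first-order notch at 0 Hz. The sketch below uses y[n] = x[n] - x[n-1] + (1 - 1/1024) · y[n-1]; both the recursion and the coefficient are assumptions here, since the slide's formula is not preserved, and the function name is illustrative.

```python
def remove_offset(s_nr, pole=1.0 - 1.0 / 1024.0):
    """First-order DC-blocking filter applied to the noise-reduced signal."""
    out = []
    x_prev = 0.0
    y_prev = 0.0
    for x in s_nr:
        y = x - x_prev + pole * y_prev  # differentiator plus leaky integrator
        out.append(y)
        x_prev, y_prev = x, y
    return out
```

A constant input decays toward zero at the output, while faster-varying content passes through almost unchanged.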
29
Results Noisy test file: After de-noise:
30
Results Footloose: Not Footloose:
31
Results: why didn’t this work? Hair dryer: Still there?!?!:
32
Results Hair dryer: Gone: