Noise Reduction: Two-Stage Mel-Warped Wiener Filter Approach


1 Noise Reduction: Two-Stage Mel-Warped Wiener Filter Approach

2 Intellectual Property Advanced front-end feature extraction algorithm, ETSI ES 202 050 V1.1.3 (2003-11). European Telecommunications Standards Institute (ETSI), Technical Committee Speech Processing, Transmission and Quality Aspects (STQ).

3 Noise Reduction
Based on Wiener filter theory.
Noise reduction is performed in two stages.
The input signal is de-noised in the first stage.
Second stage: dynamic noise reduction based on the SNR of the processed signal.

4 First Stage (block diagram) Blocks: Spectrum Estimation, PSD Mean, WF Design, Mel Filter-Bank, Mel IDCT, Apply Filter, VADNest. Output: to Second Stage.

5 Second Stage (block diagram) Input: from First Stage. Blocks: Spectrum Estimation, PSD Mean, WF Design, Mel Filter-Bank, Gain Factorization, Mel IDCT, Apply Filter, OFF. Output.

6 Buffering (diagram) Buffer 1 and Buffer 2, each with frame slots 0 to 3 (frames A to D and E to H). 1 frame = 80 samples; 1 buffer = 4 frames. Diagram labels: new, De-noised (1st Stage), De-noised (output).

7 Spectrum Estimation The input signal is divided into overlapping frames of $N_{in} = 200$ samples. A 25 ms frame length and a 10 ms frame shift (80 samples) are used. Each frame $S_w(n)$ is windowed with a Hanning window of length $N_{in}$.
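The framing step above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the half-sample offset in the window formula is an assumption (the slide does not show the window equation), and trailing samples that do not fill a whole frame are simply dropped.

```python
import math

N_IN = 200        # frame length in samples (25 ms, per the slide)
FRAME_SHIFT = 80  # frame shift in samples (10 ms, per the slide)


def hann_window(n_in=N_IN):
    """Hann window of length n_in.

    The (n + 0.5) offset is an assumption; it makes the window symmetric
    about the frame centre.
    """
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 0.5) / n_in)
            for n in range(n_in)]


def split_frames(signal, n_in=N_IN, shift=FRAME_SHIFT):
    """Cut signal into overlapping frames and apply the window to each."""
    window = hann_window(n_in)
    frames = []
    # Only full frames are produced; a trailing partial frame is dropped.
    for start in range(0, len(signal) - n_in + 1, shift):
        frames.append([signal[start + n] * window[n] for n in range(n_in)])
    return frames
```

With a frame shift of 80 and a frame length of 200, consecutive frames share 120 samples.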

8 Spectrum Estimation The windowed frame is zero-padded from $N_{in}$ up to $N_{FFT} - 1$, where $N_{FFT} = 256$.

9 Spectrum Estimation Frequency representation: Power spectrum: Smoothing:

10 Power Spectral Density Mean For each bin, compute the mean of $P_{in}(bin)$ over the last $T_{PSD} = 2$ frames.
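As a rough sketch, the two-frame mean can be kept with a short history buffer. The class name is hypothetical, and averaging over only the frames available so far (for the very first frame) is an assumption, since the slide does not say how start-up is handled.

```python
from collections import deque

T_PSD = 2  # number of frames averaged, per the slide


class PsdMean:
    """Per-bin running mean of the power spectrum over the last T_PSD frames."""

    def __init__(self, t_psd=T_PSD):
        # deque with maxlen discards the oldest frame automatically.
        self.history = deque(maxlen=t_psd)

    def update(self, p_in):
        """Add one frame's power spectrum and return the per-bin mean."""
        self.history.append(list(p_in))
        count = len(self.history)  # 1 on the first frame (assumption)
        return [sum(frame[b] for frame in self.history) / count
                for b in range(len(p_in))]
```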

11 Wiener Filter Design
A forgetting factor (weight) $\lambda_{NSE}$ is computed for each frame:
If (t < 100 frames) $\lambda_{NSE} = 1 - 1/t$
else $\lambda_{NSE} = 0.99$
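The rule above is small enough to write directly. In this sketch the function name is hypothetical and the frame index t is assumed to start at 1, so that the first frame gets a factor of 0 and the estimate is initialised entirely from the current frame.

```python
def forgetting_factor(t):
    """Forgetting factor for frame index t (assumed to be 1-based)."""
    if t < 1:
        raise ValueError("frame index t is assumed to start at 1")
    if t < 100:
        return 1.0 - 1.0 / t
    return 0.99
```

The factor rises from 0 toward 0.99 over the first 100 frames, so early frames adapt quickly and later frames average over a longer history.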

12 Wiener Filter Design
The first-stage noise spectrum estimate is updated based on the VAD flag:
If flag = 0: $P_{noise}^{1/2}(bin, t_n) = \max(\lambda_{NSE} \cdot P_{noise}^{1/2}(bin, t_n - 1) + (1 - \lambda_{NSE}) \cdot PSD_{mean}, \exp(-10))$
If flag = 1: $P_{noise}^{1/2}(bin, t) = P_{noise}^{1/2}(bin, t_n)$ (last non-speech frame)
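A minimal Python sketch of this first-stage update follows. The function and parameter names are illustrative, the forgetting factor is passed in by the caller, and exp(-10) is treated as a lower bound (a floor) on the estimate, which is an assumption about the intent of that term.

```python
import math

EPS = math.exp(-10.0)  # small constant used as a floor (assumption)


def update_noise_first_stage(prev_noise_sqrt, psd_mean, lam_nse, vad_flag):
    """Return the updated per-bin noise estimate for one frame.

    prev_noise_sqrt: estimate from the last non-speech frame
    psd_mean:        current per-bin PSD mean
    lam_nse:         forgetting factor for this frame
    vad_flag:        0 for a noise frame, 1 for a speech frame
    """
    if vad_flag:
        # Speech frame: hold the estimate from the last non-speech frame.
        return list(prev_noise_sqrt)
    # Noise frame: recursive average of old estimate and current PSD mean.
    return [max(lam_nse * prev + (1.0 - lam_nse) * cur, EPS)
            for prev, cur in zip(prev_noise_sqrt, psd_mean)]
```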

13 Wiener Filter Design
The second-stage noise spectrum estimate is updated on every frame:
If (t < 11)
  $P_{noise}(bin, t) = \lambda_{NSE} \cdot P_{noise}(bin, t - 1) + (1 - \lambda_{NSE}) \cdot PSD_{mean}$
else
  $update = 0.9 + 0.1 \times \frac{P_{inPSD}(bin, t)}{P_{inPSD}(bin, t) + P_{noise}(bin, t - 1)} \times \left(1 + \frac{1}{1 + 0.1 \times P_{inPSD}(bin, t) / P_{inPSD}(bin, t - 1)}\right)$
  $P_{noise}(bin, t) = P_{noise}(bin, t - 1) \times update$

14 Wiener Filter Design
The noiseless spectrum is estimated:
$P_{den}^{1/2}(bin, t) = 0.98 \times P_{den}^{1/2}(bin, t - 1) + (1 - 0.98) \times T[PSD_{mean} - P_{noise}^{1/2}(bin, t)]$
where T is a threshold function.

15 Wiener Filter Design The a priori SNR is calculated: The filter transfer function is:

16 Wiener Filter Design The filter transfer function is used to improve the noiseless signal estimate: The improved a priori SNR is:

17 Voice Activity Detection
VAD is used to detect noise frames.
Set the long-term energy factor (LTE):
  If t < 10 (frame threshold) LTE = 1 - 1/t
  Else LTE = 0.97
Calculate frame energy:

18 Voice Activity Detection
Use frame energy to update mean energy:
If (frameEn - meanEn) < 20 (SNR threshold) or t < 10
  if (frameEn < meanEn) or (t < 10)
    meanEn = meanEn + (1 - LTE) * (frameEn - meanEn)
  else
    meanEn = meanEn + (1 - 0.99) * (frameEn - meanEn)
  if (meanEn < 80) meanEn = 80
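The mean-energy update can be sketched in Python as below. The constant and function names are hypothetical, the frame index t is assumed to start at 1, and applying the floor of 80 only when an update takes place is one reading of the slide.

```python
SNR_THRESHOLD_UPD = 20.0  # update only if frameEn - meanEn is below this
MIN_FRAME = 10            # always update during the first frames
LTE_STEADY = 0.97         # long-term energy factor after start-up
LTE_HIGHER = 0.99         # slower factor when frame energy is above the mean
ENERGY_FLOOR = 80.0       # lower bound on the mean energy


def lte_factor(t):
    """Long-term energy factor for frame index t (assumed 1-based)."""
    return 1.0 - 1.0 / t if t < MIN_FRAME else LTE_STEADY


def update_mean_energy(t, frame_en, mean_en):
    """Return the updated mean energy after seeing one frame."""
    if (frame_en - mean_en) < SNR_THRESHOLD_UPD or t < MIN_FRAME:
        if frame_en < mean_en or t < MIN_FRAME:
            mean_en += (1.0 - lte_factor(t)) * (frame_en - mean_en)
        else:
            mean_en += (1.0 - LTE_HIGHER) * (frame_en - mean_en)
        # Floor applied inside the update branch (assumed placement).
        if mean_en < ENERGY_FLOOR:
            mean_en = ENERGY_FLOOR
    return mean_en
```

Frames whose energy exceeds the mean by 20 or more are treated as likely speech and leave the mean untouched, so the mean tracks the noise level rather than the speech level.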

19 Voice Activity Detection
Is the current frame speech?
If t > 4
  if (frameEn - meanEn) > 15
    IT IS SPEECH
    nbSpeechFrame++
  else
    if nbSpeechFrame > 4: hangover = 15, nbSpeechFrame = 0
    if (hangover != 0) IT IS SPEECH
    else IT IS NOT SPEECH
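The decision logic might look like the following Python sketch. Several details are assumptions rather than something the slide states: frames with t <= 4 are reported as non-speech, the speech-frame counter is reset on every low-energy frame, and the hangover counter is decremented once per frame while it is active. All names are hypothetical.

```python
from dataclasses import dataclass

SNR_THRESHOLD_VAD = 15.0  # frameEn - meanEn above this means speech
MIN_SPEECH_FRAMES = 4     # speech run length needed to arm the hangover
HANGOVER = 15             # frames still flagged as speech after a run


@dataclass
class VadState:
    nb_speech_frame: int = 0
    hangover: int = 0


def is_speech(t, frame_en, mean_en, state):
    """Return True if frame t is classified as speech; updates state."""
    if t <= 4:
        return False  # assumption: no decision during the first frames
    if frame_en - mean_en > SNR_THRESHOLD_VAD:
        state.nb_speech_frame += 1
        return True
    if state.nb_speech_frame > MIN_SPEECH_FRAMES:
        state.hangover = HANGOVER
    state.nb_speech_frame = 0  # assumption: reset on every low-energy frame
    if state.hangover != 0:
        state.hangover -= 1    # assumption: the slide omits the decrement
        return True
    return False
```

Under these assumptions, a run of five or more speech frames keeps the flag set for 15 further frames, which helps avoid clipping weak word endings.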

20 Mel Filter Bank The linear-frequency Wiener filter coefficients are smoothed and transformed to the Mel-frequency scale. The mel scale is a scale of pitches judged by listeners to be equal in distance from one another.
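The slide does not give a formula for the mel scale. The sketch below assumes the widely used mapping mel = 2595 * log10(1 + f / 700); the function names are hypothetical and the slides do not confirm which variant the filter bank uses.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (assumed 2595/700 mapping)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Filter-bank centre frequencies are typically placed at equal steps on the mel axis and mapped back to Hz, which spaces them closely at low frequencies and widely at high frequencies.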

21 Mel IDCT The time-domain impulse response of the Wiener filter is computed from the Mel-Wiener filter coefficients by using Mel-warped inverse Discrete Cosine Transform:

22 Gain Factorization Factorization of the Mel-warped Wiener filter coefficients is performed to control the aggressiveness of noise reduction in the second stage. The de-noised frame signal energy is calculated as:

23 Gain Factorization The noise energy of the current frame is estimated as:

24 Gain Factorization
The smoothed SNR is evaluated using 3 de-noised frame energies and the noise energy:
If (Ratio > 0.0001) Then $SNR_{avg}(t) = 6.67 \times \log_{10}(Ratio)$
Else $SNR_{avg}(t) = -33.3$
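As a small illustration, the rule above maps directly to a function of Ratio. How Ratio is formed from the three de-noised frame energies and the noise energy is not preserved in the transcript, so the sketch takes it as an input; the function name is hypothetical.

```python
import math


def smoothed_snr(ratio):
    """Smoothed SNR for one frame, given the energy ratio."""
    if ratio > 0.0001:
        return 6.67 * math.log10(ratio)
    # Fixed value for very small (or zero) ratios avoids log10 of 0.
    return -33.3
```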

25 Gain Factorization
To decide the degree of aggression, the SNR is tracked:
If {($SNR_{avg}(t) - SNR_{low\text{-}track}(t - 1)$) < 10 or t < 10}
  calculate $\lambda_{SNR}(t)$
  $SNR_{low\text{-}track}(t) = \lambda_{SNR}(t) \times SNR_{low\text{-}track}(t - 1) + (1 - \lambda_{SNR}(t)) \times SNR_{avg}(t)$
Else
  $SNR_{low\text{-}track}(t) = SNR_{low\text{-}track}(t - 1)$
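The tracking rule can be sketched as follows. The slide does not show how the weight for the current frame is calculated, so the sketch takes it as a caller-supplied parameter; the function and parameter names are hypothetical.

```python
def track_low_snr(snr_avg, snr_low_prev, t, lam_snr):
    """Return the low-SNR track value for frame t.

    snr_avg:      smoothed SNR of the current frame
    snr_low_prev: low-SNR track value from the previous frame
    lam_snr:      smoothing weight for this frame (supplied by the caller)
    """
    if (snr_avg - snr_low_prev) < 10.0 or t < 10:
        return lam_snr * snr_low_prev + (1.0 - lam_snr) * snr_avg
    # SNR jumped well above the track: likely speech, so hold the track.
    return snr_low_prev
```

Holding the track when the SNR rises sharply keeps it near the noise-only level, which is what the later aggression decision is compared against.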

26 Gain Factorization Gain factorization applies more aggressive noise reduction to purely noisy frames and less to frames containing speech. The aggression coefficient takes on a value of 10% for speech + noise frames and 80% for noise frames.

27 Apply Filter The causal impulse response is obtained, truncated and weighted by a Hanning window. The input signal is filtered with the filter impulse response to produce the noise-reduced signal.

28 Offset Compensation A filter is used to remove the DC offset over the frame length interval (80 samples). Here $S_{nr}$ is the noise-reduced signal.

29 Results Noisy test file: After de-noise:

30 Results Footloose: Not Footloose:

31 Results: why didn’t this work? Hair dryer: Still there?!?!:

32 Results Hair dryer: Gone:

