Published by Shana James. Modified over 9 years ago.
1
Speech Enhancement Using a Minimum Mean Square Error Short-Time Spectral Amplitude Estimation method
2
Process Flow
3
Segmenting of Signal
The signal is divided into frames 25 ms long with a frame shift of 10 ms, which is 40% of the frame length. The Window Length in samples is 25 ms times the Sampling Frequency.
– Example: with a Sampling Frequency of 8000 samples/s, Window Length = 0.025 s × 8000 samples/s = 200 samples.
Each frame is then windowed using a Hamming window.
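The framing step can be sketched as follows. This is a minimal illustration, not the presenter's code: the function name `frame_signal`, the 440 Hz test tone, and the choice to drop a trailing partial frame are all assumptions.

```python
import math

def frame_signal(signal, fs, frame_sec=0.025, shift_fraction=0.4):
    """Split a signal into overlapping, Hamming-windowed frames."""
    win_len = round(frame_sec * fs)          # 0.025 s * 8000 samples/s = 200 samples
    shift = round(shift_fraction * win_len)  # 40% of 200 = 80 samples (10 ms)
    # Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (win_len - 1))
              for n in range(win_len)]
    frames = []
    # Assumption: a trailing partial frame is discarded.
    for start in range(0, len(signal) - win_len + 1, shift):
        frames.append([signal[start + n] * window[n] for n in range(win_len)])
    return frames, win_len, shift

fs = 8000
# One second of a 440 Hz tone as a stand-in input (hypothetical test signal).
signal = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
frames, win_len, shift = frame_signal(signal, fs)
print(win_len, shift, len(frames))
```

With the slide's values this yields 200-sample frames advanced by 80 samples.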
4
Initial Silence Segments
The initial silence, or speech inactivity period, is assumed to be 250 ms.
– This provides enough data to estimate the Noise Spectrum before attempting Voice Activity Detection (VAD).
The Number of Initial Silence Segments (NISS) = (Initial Silence × Sampling Frequency − Window Length) / (Shift Percentage × Window Length).
– Example, using the previous values: NISS = (0.25 s × 8000 samples/s − 200 samples) / (0.4 × 200 samples) = 1800 / 80 = 22.5.
– The value is rounded down to the nearest whole number, giving NISS = 22.
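The NISS arithmetic can be checked with a few lines of Python. The function name `initial_silence_segments` is a hypothetical helper introduced only for this illustration.

```python
import math

def initial_silence_segments(initial_silence_sec, fs, win_len, shift_fraction):
    """NISS = floor((silence samples - window length) / frame shift in samples)."""
    silence_samples = initial_silence_sec * fs   # 0.25 s * 8000 = 2000 samples
    shift = shift_fraction * win_len             # 0.4 * 200 = 80 samples
    return math.floor((silence_samples - win_len) / shift)

niss = initial_silence_segments(0.25, 8000, 200, 0.4)
print(niss)  # (2000 - 200) / 80 = 22.5, rounded down
```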
5
Phase Calculation using FFT
The Fast Fourier Transform of each frame is calculated. The phase component of the FFT is kept for use in reconstructing the enhanced signal.
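A small sketch of splitting a frame's spectrum into magnitude and phase. To stay dependency-free it uses a naive O(N²) DFT in place of an FFT routine, which is an assumption made for illustration; the 16-sample test frame is also hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) discrete Fourier transform, standing in for an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def magnitude_and_phase(frame):
    """Return the per-bin magnitude and phase (in radians) of a frame."""
    spectrum = dft(frame)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]

# Cosine in bin 2 with a phase offset of pi/4 (hypothetical test frame).
frame = [math.cos(2 * math.pi * 2 * t / 16 + math.pi / 4) for t in range(16)]
mag, phase = magnitude_and_phase(frame)
print(round(mag[2], 6), round(phase[2], 6))
```

The magnitude is what the gain is later applied to; the phase is stored unchanged.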
6
Noise Power Spectrum
An initial Noise Power Spectrum and the Noise Power Spectrum Variance (λ_d) are calculated using the mean values of the FFT over the NISS frames. For each frame in the NISS, the Noise Power Spectrum and the Noise Power Spectrum Variance are updated. The frames after the NISS are evaluated using a Voice Activity Detector (VAD), which utilizes the Noise Power Spectrum.
– If a frame is determined to contain only noise, the Noise Power Spectrum and the Noise Power Spectrum Variance are updated.
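The slide does not give the VAD rule or the update formula, so the sketch below fills them with assumptions: a log-spectral-distance VAD with a 3 dB threshold, and a recursive average that weights the previous estimate 9 to 1. All function names and both constants are hypothetical.

```python
import math

def initial_noise_estimate(mag_frames, niss):
    """Mean magnitude and mean power per bin over the first niss frames."""
    bins = len(mag_frames[0])
    noise_mag = [sum(f[k] for f in mag_frames[:niss]) / niss for k in range(bins)]
    noise_var = [sum(f[k] ** 2 for f in mag_frames[:niss]) / niss for k in range(bins)]
    return noise_mag, noise_var

def is_noise_only(mag, noise_mag, threshold_db=3.0):
    """Assumed VAD: mean positive log-spectral distance from the noise estimate."""
    eps = 1e-12  # avoids log of zero
    dist = [max(20 * math.log10((m + eps) / (n + eps)), 0.0)
            for m, n in zip(mag, noise_mag)]
    return sum(dist) / len(dist) < threshold_db

def update_noise(mag, noise_mag, noise_var, smoothing=9):
    """Assumed recursive average; `smoothing` weights the previous estimate."""
    new_mag = [(smoothing * n + m) / (smoothing + 1) for n, m in zip(noise_mag, mag)]
    new_var = [(smoothing * v + m * m) / (smoothing + 1) for v, m in zip(noise_var, mag)]
    return new_mag, new_var

# Two quiet frames followed by one loud frame (hypothetical magnitudes).
mag_frames = [[1.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
noise_mag, noise_var = initial_noise_estimate(mag_frames, niss=2)
quiet = is_noise_only(mag_frames[1], noise_mag)
loud = is_noise_only(mag_frames[2], noise_mag)
updated_mag, updated_var = update_noise([2.0, 2.0], noise_mag, noise_var)
print(quiet, loud, updated_mag, updated_var)
```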
7
Signal to Noise Ratio
Using the Noise Power Spectrum, the a priori SNR (ξ_k) and the a posteriori SNR (γ_k) are calculated.
a posteriori SNR:
– γ_k = R_k² / λ_d(k), where R_k is the modulus of the noisy (signal plus noise) spectral component.
a priori SNR:
– ξ_k(n) = α G² γ_k(n−1) + (1 − α) P[γ_k(n) − 1], where α = 0.99 is a smoothing factor, G is the Gain Function from the MMSE estimator evaluated for the previous frame, and P[x] is defined as x if x > 0 and 0 otherwise.
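The two SNR estimates can be sketched per frequency bin as below, with γ_k = R_k² / λ_d(k) as the a posteriori SNR and the recursive ξ_k update as the a priori SNR. The function names and the sample values are assumptions for illustration.

```python
def a_posteriori_snr(mag, noise_var):
    """gamma_k = R_k^2 / lambda_d(k) for every bin."""
    return [m * m / v for m, v in zip(mag, noise_var)]

def a_priori_snr(gain_prev, gamma_prev, gamma, alpha=0.99):
    """xi_k(n) = alpha * G^2 * gamma_k(n-1) + (1 - alpha) * P[gamma_k(n) - 1]."""
    return [alpha * g * g * gp + (1.0 - alpha) * max(gm - 1.0, 0.0)
            for g, gp, gm in zip(gain_prev, gamma_prev, gamma)]

# Hypothetical two-bin frame: one bin above the noise level, one below it.
mag = [2.0, 0.5]
noise_var = [1.0, 1.0]
gamma = a_posteriori_snr(mag, noise_var)
# Assumed starting state for the recursion: unit gain and unit SNR.
xi = a_priori_snr(gain_prev=[1.0, 1.0], gamma_prev=[1.0, 1.0], gamma=gamma)
print(gamma, xi)
```

The `max(..., 0.0)` call plays the role of P[x], so a bin whose power falls below the noise estimate contributes nothing to the new term.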
8
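Gain Calculation
The gain (G) of the signal is then updated using the Signal to Noise Ratios.
– G = (√π / 2) (√η / γ_k) e^(−η/2) [(1 + η) I_0(η/2) + η I_1(η/2)]
where η = γ_k ξ_k / (1 + ξ_k), and I_0 and I_1 are the modified Bessel functions of the first kind of order 0 and 1.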
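A minimal sketch of the gain, assuming the closed form from the cited Ephraim and Malah paper: G = (√π / 2)(√η / γ) e^(−η/2) [(1 + η) I_0(η/2) + η I_1(η/2)] with η = γ ξ / (1 + ξ). The power-series Bessel helper, the floor on γ, and the switch to the high-SNR limit ξ / (1 + ξ) for large η are implementation assumptions, not part of the slide.

```python
import math

def bessel_i(order, x):
    """Modified Bessel function of the first kind I_order(x), by power series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= half * half / (k * (k + order))
        total += term
        k += 1
    return total

def mmse_stsa_gain(xi, gamma):
    """Spectral gain for one bin from a priori SNR xi and a posteriori SNR gamma."""
    gamma = max(gamma, 1e-10)          # assumed floor, avoids division by zero
    eta = gamma * xi / (1.0 + xi)
    if eta > 500.0:                    # assumed cutoff: use the high-SNR limit
        return xi / (1.0 + xi)
    bracket = (1.0 + eta) * bessel_i(0, eta / 2.0) + eta * bessel_i(1, eta / 2.0)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(eta) / gamma) \
        * math.exp(-eta / 2.0) * bracket

high_snr_gain = mmse_stsa_gain(100.0, 101.0)
low_snr_gain = mmse_stsa_gain(0.1, 1.0)
print(high_snr_gain, low_snr_gain)
```

At high SNR the gain approaches ξ / (1 + ξ), so the bin passes almost unchanged, while at low SNR the bin is strongly attenuated.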
9
Signal Enhancement and Reconstruction
The signal is cleaned by multiplying the FFT magnitude of each frame by the gain. The enhanced magnitudes are then recombined with the phase of the original FFT, and the signal is reconstructed using the overlap-add method.
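The enhancement and reconstruction step can be sketched as follows: scale each bin's magnitude by the gain, reattach the stored phase, invert the transform, and overlap-add the frames. The naive inverse DFT (used instead of an inverse FFT) and all function names are assumptions for illustration.

```python
import cmath
import math

def enhance_spectrum(mag, phase, gain):
    """Apply the gain to the magnitude and reattach the original phase."""
    return [cmath.rect(g * m, p) for m, p, g in zip(mag, phase, gain)]

def idft_real(spectrum):
    """Naive inverse DFT, keeping the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def overlap_add(frames, shift):
    """Sum time-domain frames, each offset by `shift` samples from the last."""
    win_len = len(frames[0])
    out = [0.0] * (shift * (len(frames) - 1) + win_len)
    for i, frame in enumerate(frames):
        for n, sample in enumerate(frame):
            out[i * shift + n] += sample
    return out

# A unit impulse has a flat spectrum: magnitude 1 and phase 0 in every bin.
enhanced = enhance_spectrum([1.0] * 4, [0.0] * 4, [0.5] * 4)
frame_out = idft_real(enhanced)
combined = overlap_add([[1.0] * 4, [1.0] * 4], shift=2)
print(frame_out, combined)
```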
10
Sample – Hair Dryer Background
11
Sample – Jack Hammer Background
12
Sample – Air Conditioner Background
13
Sample – Cafeteria Background
14
Sample – Automobile Background
15
Sample – Coffee Grinder Background
16
Sample – Fan Background
17
Sample – Feedback Background
18
Sample – White Noise Background
19
Sample – Static Background
20
References
Ephraim, Yariv, and David Malah. “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator”. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 32, No. 6, December 1984.