Speech Enhancement Using a Minimum Mean Square Error Short-Time Spectral Amplitude Estimation method.

Process Flow

Segmenting of Signal
The signal is divided into frames 25 ms long, with a frame shift of 40% of the frame length (10 ms). The Window Length in samples is the frame duration times the Sampling Frequency.
– Example: with a Sampling Frequency of 8000 samples/s, Window Length = 0.025 s × 8000 samples/s = 200 samples, and the shift is 0.4 × 200 = 80 samples.
Each frame is then windowed using a Hamming window.
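A minimal Python sketch of the segmentation step, using only the standard library. It assumes a symmetric Hamming window (0.54 − 0.46 cos) and simply drops trailing samples that do not fill a whole frame; the function names are illustrative and not taken from the original work.

```python
import math


def window_params(fs, frame_dur=0.025, shift_pct=0.4):
    """Return (window length, frame shift) in samples."""
    win_len = int(round(frame_dur * fs))     # 0.025 s * 8000 Hz -> 200 samples
    shift = int(round(shift_pct * win_len))  # 40% of 200 -> 80 samples (10 ms)
    return win_len, shift


def hamming(n):
    """Symmetric Hamming window of length n (assumed variant)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def segment(signal, fs, frame_dur=0.025, shift_pct=0.4):
    """Split a signal into overlapping, Hamming-windowed frames.

    Trailing samples that do not fill a complete frame are dropped (assumption).
    """
    win_len, shift = window_params(fs, frame_dur, shift_pct)
    w = hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, shift):
        chunk = signal[start:start + win_len]
        frames.append([x * wi for x, wi in zip(chunk, w)])
    return frames


if __name__ == "__main__":
    fs = 8000
    print(window_params(fs))               # (200, 80)
    print(len(segment([1.0] * fs, fs)))    # 98 frames from one second of audio
```

One second at 8000 samples/s yields (8000 − 200) // 80 + 1 = 98 complete frames.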

Initial Silence Segments
The initial silence (speech-inactive) period is assumed to be 250 ms.
– This provides enough noise-only data to estimate the Noise Spectrum before Voice Activity Detection (VAD) is attempted.
The Number of Initial Silence Segments is NISS = (Initial Silence × Sampling Frequency − Window Length) / (Shift Percentage × Window Length), rounded down to the nearest whole number.
– Example: using the previous values, NISS = (0.25 s × 8000 samples/s − 200 samples) / (0.4 × 200 samples) = 1800/80 = 22.5, which is rounded down to 22.
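The NISS arithmetic can be checked with a small helper. This is a hypothetical function that follows the slide's expression directly, with the rounding down done by `math.floor`:

```python
import math


def niss(initial_silence, fs, win_len, shift_pct):
    """Number of Initial Silence Segments:
    floor((IS * fs - W) / (SP * W))."""
    shift = shift_pct * win_len  # frame shift in samples
    return math.floor((initial_silence * fs - win_len) / shift)


if __name__ == "__main__":
    print(niss(0.25, 8000, 200, 0.4))  # 22  (1800 / 80 = 22.5, rounded down)
```

Counting every complete frame that fits inside the 250 ms would give one more (the formula plus 1, i.e. 23); using 22 simply leaves the last of those silent frames to be classified by the VAD.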

Phase Calculation Using the FFT
The Fast Fourier Transform (FFT) of each windowed frame is calculated. Its magnitude is used in the following steps, and its phase is stored for use in reconstructing the enhanced signal.
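For illustration, the magnitude and phase of a frame can be computed as below. A naive O(N²) DFT stands in for an FFT so the sketch needs no third-party packages; in practice a library routine such as `numpy.fft.fft` would be used. The helper names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform; stands in for an FFT."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_and_phase(frame):
    """Return the magnitude R_k and the phase of every frequency bin of a frame."""
    spectrum = dft(frame)
    magnitudes = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]  # kept for reconstruction
    return magnitudes, phases


if __name__ == "__main__":
    mags, phases = magnitude_and_phase([1.0, 0.0, -1.0, 0.0])
    print([round(m, 6) for m in mags])  # approximately [0, 2, 0, 2]
```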

Noise Power Spectrum
An initial Noise Power Spectrum and the Noise Power Spectrum Variance $\lambda_d(k)$ (the expected noise power in frequency bin $k$) are calculated by averaging the FFTs of the NISS frames, and both are updated for each frame in the NISS. The frames after the NISS are evaluated by a Voice Activity Detector (VAD) that uses the Noise Power Spectrum as its reference.
– If a frame is determined to contain only noise, the Noise Power Spectrum and the Noise Power Spectrum Variance are updated with that frame.
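The noise tracking above might look like the following sketch. The slides do not specify the VAD, so the spectral-distance detector, the 3 dB margin, and the recursive-averaging weight of 9 are all assumptions, as is tracking the Noise Power Spectrum as the mean magnitude spectrum and $\lambda_d(k)$ as the mean squared magnitude.

```python
import math


def initial_noise(mag_frames):
    """Average the magnitude spectra of the initial-silence frames.

    Returns (noise_mag, lambda_d): mean magnitude and mean power per bin.
    """
    count = len(mag_frames)
    bins = len(mag_frames[0])
    noise_mag = [sum(f[k] for f in mag_frames) / count for k in range(bins)]
    lambda_d = [sum(f[k] ** 2 for f in mag_frames) / count for k in range(bins)]
    return noise_mag, lambda_d


def is_noise_only(mag, noise_mag, margin_db=3.0, eps=1e-12):
    """Hypothetical VAD: a frame is noise if its mean positive log-spectral
    distance from the noise estimate is below margin_db (assumed threshold)."""
    dist = [max(20 * math.log10((m + eps) / (n + eps)), 0.0)
            for m, n in zip(mag, noise_mag)]
    return sum(dist) / len(dist) < margin_db


def update_noise(mag, noise_mag, lambda_d, memory=9):
    """Recursively average a noise-only frame into both estimates;
    `memory` (assumed) is the weight given to the old estimate."""
    noise_mag = [(memory * n + m) / (memory + 1) for n, m in zip(noise_mag, mag)]
    lambda_d = [(memory * lam + m ** 2) / (memory + 1) for lam, m in zip(lambda_d, mag)]
    return noise_mag, lambda_d


if __name__ == "__main__":
    noise_mag, lambda_d = initial_noise([[1.0, 2.0], [3.0, 2.0]])
    frame = [2.0, 2.0]
    if is_noise_only(frame, noise_mag):
        noise_mag, lambda_d = update_noise(frame, noise_mag, lambda_d)
    print(noise_mag, lambda_d)
```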

Signal to Noise Ratio
Using the noise estimate $\lambda_d(k)$, the a posteriori SNR ($\gamma_k$) and the a priori SNR ($\xi_k$) are calculated.
– a posteriori SNR: $\gamma_k(n) = R_k^2(n) / \lambda_d(k)$, where $R_k$ is the modulus of the $k$-th spectral component of the noisy (signal plus noise) frame.
– a priori SNR (decision-directed estimate): $\xi_k(n) = \alpha\, G_k^2(n-1)\, \gamma_k(n-1) + (1-\alpha)\, P[\gamma_k(n) - 1]$, where $\alpha = 0.99$ is a smoothing factor, $G_k(n-1)$ is the MMSE gain from the previous frame, and $P[x] = x$ if $x > 0$ and $0$ otherwise.
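A sketch of the two SNR computations, assuming the per-bin magnitudes and $\lambda_d(k)$ are already available as lists. For the very first frame there is no previous gain or a posteriori SNR; initializing both to 1 is an assumption made here, not something the slides state.

```python
def a_posteriori_snr(magnitudes, lambda_d):
    """gamma_k(n) = R_k(n)^2 / lambda_d(k) for every bin k."""
    return [r * r / lam for r, lam in zip(magnitudes, lambda_d)]


def a_priori_snr(gamma, prev_gain, prev_gamma, alpha=0.99):
    """Decision-directed a priori SNR:
    xi_k(n) = alpha * G_k(n-1)^2 * gamma_k(n-1) + (1 - alpha) * P[gamma_k(n) - 1],
    with P[x] = max(x, 0)."""
    return [
        alpha * g * g * gp + (1.0 - alpha) * max(gn - 1.0, 0.0)
        for gn, g, gp in zip(gamma, prev_gain, prev_gamma)
    ]


if __name__ == "__main__":
    gamma = a_posteriori_snr([2.0, 1.0], [1.0, 4.0])  # [4.0, 0.25]
    # First frame: previous gain and previous gamma set to 1 (hypothetical init)
    xi = a_priori_snr(gamma, prev_gain=[1.0, 1.0], prev_gamma=[1.0, 1.0])
    print(gamma, xi)
```

With $\alpha = 0.99$ the estimate leans heavily on the previous frame, which smooths $\xi_k$ over time.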

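Gain Calculation
The gain $G_k$ of each spectral component is then updated using the two SNRs:
– $G_k = \frac{\xi_k}{1+\xi_k} \exp\left(\frac{1}{2} \int_{v_k}^{\infty} \frac{e^{-t}}{t}\, dt\right)$, where $v_k = \frac{\xi_k}{1+\xi_k}\, \gamma_k$.
– This closed form is the log-spectral amplitude (LSA) variant of the MMSE estimator. The short-time spectral amplitude (STSA) gain of the referenced 1984 paper is instead $G_k = \frac{\sqrt{\pi}}{2} \frac{\sqrt{v_k}}{\gamma_k} \exp\left(-\frac{v_k}{2}\right) \left[(1+v_k)\, I_0\left(\frac{v_k}{2}\right) + v_k\, I_1\left(\frac{v_k}{2}\right)\right]$, where $I_0$ and $I_1$ are the modified Bessel functions of order zero and one.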
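The sketch below implements the gain in the closed form $G = \frac{\xi}{1+\xi}\exp\left(\frac{1}{2}E_1(v)\right)$ with $v = \xi\gamma/(1+\xi)$ (the log-spectral amplitude variant), where $E_1(v) = \int_v^\infty e^{-t}/t\,dt$ is the exponential integral. $E_1$ is computed with a power series for $v \le 1$ and a continued fraction for $v > 1$ so the code runs with the standard library alone; SciPy users could call `scipy.special.exp1` instead. The small floor on $v$ is an assumption that avoids evaluating $E_1(0)$.

```python
import math

EULER_GAMMA = 0.5772156649015329


def e1(x, eps=1e-12, max_iter=200):
    """Exponential integral E1(x) = integral from x to infinity of e^(-t)/t dt, x > 0."""
    if x <= 0.0:
        raise ValueError("e1 requires x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) + sum_{i>=1} (-1)^(i+1) x^i / (i * i!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for i in range(1, max_iter + 1):
            term *= -x / i        # term = (-x)^i / i!
            delta = -term / i
            total += delta
            if abs(delta) < abs(total) * eps:
                break
        return total
    # Continued fraction evaluated with the modified Lentz method (x > 1)
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(-x)


def lsa_gain(xi, gamma, v_floor=1e-10):
    """G = xi/(1+xi) * exp(E1(v)/2), with v = xi*gamma/(1+xi)."""
    v = max(xi * gamma / (1.0 + xi), v_floor)  # floor avoids E1(0) (assumption)
    return xi / (1.0 + xi) * math.exp(0.5 * e1(v))


if __name__ == "__main__":
    print(lsa_gain(1.0, 2.0))      # about 0.558
    print(lsa_gain(100.0, 100.0))  # close to 100/101: little attenuation at high SNR
```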

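Signal Enhancement and Reconstruction
Each frame is enhanced by multiplying its FFT magnitude by the gain, $\hat{A}_k = G_k R_k$. The enhanced magnitude is recombined with the phase of the noisy FFT and transformed back to the time domain with the inverse FFT, and the frames are joined using the overlap-add method.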
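A sketch of the enhancement and overlap-add step. A naive inverse DFT replaces an inverse FFT, and the helper names are hypothetical. No normalization for the Hamming window's overlap gain is applied; with a 40% shift the overlapped windows need not sum to a constant, so a real implementation may rescale the output.

```python
import cmath
import math


def idft(spectrum):
    """Naive inverse DFT; returns the real part, since the input frame was real."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def enhance_frame(magnitudes, phases, gains):
    """Scale each magnitude R_k by G_k, reattach the noisy phase, and invert."""
    spectrum = [g * r * cmath.exp(1j * p) for g, r, p in zip(gains, magnitudes, phases)]
    return idft(spectrum)


def overlap_add(frames, shift):
    """Sum time-domain frames placed `shift` samples apart."""
    win_len = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * shift + win_len)
    for i, frame in enumerate(frames):
        for t, x in enumerate(frame):
            out[i * shift + t] += x
    return out


if __name__ == "__main__":
    # Spectrum of [1, 0, -1, 0]: magnitudes [0, 2, 0, 2], all phases 0
    frame = enhance_frame([0.0, 2.0, 0.0, 2.0], [0.0] * 4, [0.5] * 4)
    print([round(x, 6) for x in frame])  # approximately [0.5, 0, -0.5, 0]
    print(overlap_add([[1.0] * 4, [1.0] * 4], 2))  # [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
```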

Sample – Hair Dryer Background

Sample – Jack Hammer Background

Sample – Air Conditioner Background

Sample – Cafeteria Background

Sample – Automobile Background

Sample – Coffee Grinder Background

Sample – Fan Background

Sample – Feedback Background

Sample – White Noise Background

Sample – Static Background

References
Ephraim, Yariv, and David Malah. “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator.” IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 32, No. 6, pp. 1109–1121, December 1984.
