Noise Reduction: Two-Stage Mel-Warped Wiener Filter Approach


Intellectual Property: Advanced front-end feature extraction algorithm, ETSI ES V1.1.3 ( ), European Telecommunications Standards Institute (ETSI), Technical Committee Speech Processing, Transmission and Quality Aspects (STQ).

Noise Reduction: Based on Wiener filter theory. Noise reduction is performed in two stages: the input signal is de-noised in the first stage; the second stage applies dynamic noise reduction based on the SNR of the processed signal.

First Stage (block diagram): Spectrum Estimation → PSD Mean → WF Design → Mel Filter-Bank → Mel IDCT → Apply Filter → to Second Stage, with VADNest supplying the speech/non-speech flag.

Second Stage (block diagram): from First Stage → Spectrum Estimation → PSD Mean → WF Design → Mel Filter-Bank → Gain Factorization → Mel IDCT → Apply Filter → OFF (offset compensation) → Output.

Buffering (diagram): frames A…H move through two 4-frame buffers (1 frame = 80 samples, 1 buffer = 4 frames); frames are de-noised by the first stage in Buffer 1 and leave Buffer 2 as de-noised output.

Spectrum Estimation: The input signal is divided into overlapping frames of N_in = 200 samples, using a 25 ms frame length and a 10 ms frame shift (80 samples). Each frame is windowed with a Hanning window of length N_in to give s_w(n).

Spectrum Estimation: The windowed frame s_w(n) is zero-padded from N_in up to N_FFT − 1, where N_FFT = 256, before the FFT.

Spectrum Estimation: Frequency representation: the FFT of the zero-padded, windowed frame. Power spectrum: the squared magnitude of the FFT bins. Smoothing: adjacent power-spectrum bins are averaged to produce the smoothed spectrum P_in(bin).
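A minimal sketch of the spectrum-estimation chain from the last three slides, assuming an 8 kHz signal so that 200 samples span 25 ms. All names are illustrative, and the bin-pair smoothing is an assumption about the exact scheme.

```python
import numpy as np

N_IN, SHIFT, N_FFT = 200, 80, 256

def frames(signal):
    """Yield overlapping frames of N_IN samples with an 80-sample (10 ms) shift."""
    for start in range(0, len(signal) - N_IN + 1, SHIFT):
        yield signal[start:start + N_IN]

def frame_power_spectrum(frame):
    """Window one frame, zero-pad to N_FFT, and return a smoothed power spectrum."""
    n = np.arange(N_IN)
    hann = 0.5 - 0.5 * np.cos(2.0 * np.pi * (n + 0.5) / N_IN)  # Hanning window of length N_IN
    padded = np.zeros(N_FFT)
    padded[:N_IN] = frame * hann                               # zero-pad from N_IN up to N_FFT - 1
    power = np.abs(np.fft.rfft(padded)) ** 2                   # frequency representation -> power spectrum
    # Smoothing: average pairs of adjacent bins (the exact scheme is an assumption).
    return 0.5 * (power[0:-1:2] + power[1::2])
```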

Power Spectral Density Mean: For each bin of P_in(bin), compute the mean over the last T_PSD = 2 frames.
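A small sketch of the PSD-mean step (per-bin mean of P_in over the last T_PSD = 2 frames); the class and variable names are illustrative.

```python
import numpy as np

T_PSD = 2  # number of frames averaged

class PsdMean:
    """Keep the last T_PSD smoothed spectra and return their per-bin mean."""
    def __init__(self, n_bins):
        self.history = np.zeros((T_PSD, n_bins))
        self.count = 0

    def update(self, p_in):
        self.history[self.count % T_PSD] = p_in   # overwrite the oldest frame
        self.count += 1
        return self.history[:min(self.count, T_PSD)].mean(axis=0)
```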

Wiener Filter Design: A forgetting factor (weight) λ_NSE is computed for each frame:
If t < 100 frames: λ_NSE = 1 − 1/t
Else: λ_NSE = 0.99

Wiener Filter Design: The first-stage noise spectrum estimate is updated based on the VAD flag:
If flag = 0 (non-speech):
P_noise^1/2(bin, t_n) = max(λ_NSE · P_noise^1/2(bin, t_n − 1) + (1 − λ_NSE) · PSD_mean, exp(−10))
If flag = 1 (speech):
P_noise^1/2(bin, t) = P_noise^1/2(bin, t_n) (the estimate from the last non-speech frame)
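A sketch that transcribes the forgetting factor and the VAD-driven noise update from the two slides above; function and argument names are illustrative, and the slide's formula is followed literally.

```python
import numpy as np

EPS = np.exp(-10.0)  # floor from the slide

def lambda_nse(t):
    """Forgetting factor: 1 - 1/t for the first 100 frames, then 0.99."""
    return 1.0 - 1.0 / t if t < 100 else 0.99

def update_noise_first_stage(p_noise_sqrt, psd_mean, t, vad_flag):
    """Update the square-root noise spectrum estimate P_noise^(1/2)(bin, t_n)."""
    if vad_flag == 0:   # non-speech frame: track the PSD mean
        lam = lambda_nse(t)
        return np.maximum(lam * p_noise_sqrt + (1.0 - lam) * psd_mean, EPS)
    return p_noise_sqrt  # speech frame: keep the estimate from the last non-speech frame
```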

Wiener Filter Design: The second-stage noise estimate is updated permanently (every frame):
If t < 11:
P_noise(bin, t) = λ_NSE · P_noise(bin, t − 1) + (1 − λ_NSE) · PSD_mean
Else:
update = … × P_inPSD(bin, t) / (P_inPSD(bin, t) + P_noise(bin, t − 1)) × (1 + 1/(1 + 0.1 × P_inPSD(bin, t) / P_inPSD(bin, t − 1)))
P_noise(bin, t) = P_noise(bin, t − 1) × update

Wiener Filter Design: The noiseless spectrum is estimated as:
P_den^1/2(bin, t) = 0.98 × P_den^1/2(bin, t − 1) + (1 − 0.98) × T[PSD_mean − P_noise^1/2(bin, t)]
where T[·] is a threshold function that floors negative values to zero.

Wiener Filter Design: The a priori SNR is calculated from the de-noised and noise spectrum estimates, and the Wiener filter transfer function is derived from it.

Wiener Filter Design: The filter transfer function is used to improve the noiseless-signal estimate, and an improved a priori SNR is computed from it.
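The exact formulas on these two slides were images and are not reproduced here; the sketch below shows the standard Wiener-filter relations they correspond to (a priori SNR from the de-noised and noise spectra, gain H = SNR/(1+SNR), and an improved SNR from the re-filtered spectrum). Names and the snr_floor parameter are assumptions.

```python
import numpy as np

def wiener_gain(snr_prio):
    """Wiener filter transfer function H = SNR / (1 + SNR)."""
    return snr_prio / (1.0 + snr_prio)

def improved_prior_snr(p_den, p_noise, p_in, snr_floor=0.0):
    """Two-step estimate: first gain from P_den / P_noise, then an improved SNR."""
    snr_prio = p_den / np.maximum(p_noise, 1e-12)      # a priori SNR
    h = wiener_gain(snr_prio)                          # first-pass transfer function
    p_den_improved = h * p_in                          # improved noiseless-spectrum estimate
    return np.maximum(p_den_improved / np.maximum(p_noise, 1e-12), snr_floor)
```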

Voice Activity Detection: VAD is used to detect noise-only frames.
The long-term energy forgetting factor λ_LTE is set per frame:
If t < 10 (frame threshold): λ_LTE = 1 − 1/t
Else: λ_LTE = 0.97
Then the energy of the current frame is calculated.

Voice Activity Detection: The frame energy is used to update the mean (noise) energy:
If (frameEn − meanEn) < 20 (SNR threshold) or t < 10:
  If (frameEn < meanEn) or (t < 10): meanEn = meanEn + (1 − λ_LTE) × (frameEn − meanEn)
  Else: meanEn = meanEn + ( ) × (frameEn − meanEn)
If (meanEn < 80): meanEn = 80

Voice Activity Detection: Is the current frame speech?
If t > 4:
  If (frameEn − meanEn) > 15: IT IS SPEECH; nbSpeechFrame++
  Else:
    If nbSpeechFrame > 4: hangover = 15
    nbSpeechFrame = 0
    If (hangover != 0): IT IS SPEECH
    Else: IT IS NOT SPEECH
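A sketch of the decision logic on this slide; the state handling (hangover decrement, behaviour for the first few frames) is an assumption, and names are illustrative.

```python
class VadNest:
    """Energy-based speech/non-speech decision with a 15-frame hangover."""
    def __init__(self):
        self.nb_speech_frames = 0
        self.hangover = 0

    def is_speech(self, frame_en, mean_en, t):
        if t <= 4:                        # too few frames to decide (treated as non-speech here)
            return False
        if frame_en - mean_en > 15:       # frame energy clearly above the noise-energy mean
            self.nb_speech_frames += 1
            return True
        if self.nb_speech_frames > 4:     # a speech burst just ended: start the hangover
            self.hangover = 15
        self.nb_speech_frames = 0
        if self.hangover > 0:
            self.hangover -= 1            # decrementing the hangover here is an assumption
            return True
        return False
```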

Mel Filter Bank: The linear-frequency Wiener filter coefficients are smoothed and transformed to the Mel-frequency scale. The mel scale is a scale of pitches judged by listeners to be equal in distance from one another.

Mel IDCT: The time-domain impulse response of the Wiener filter is computed from the Mel-Wiener filter coefficients using the Mel-warped inverse Discrete Cosine Transform.
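A rough sketch of the idea behind the Mel IDCT: the time-domain response is built from cosines at the mel-band center frequencies, weighted by the Mel-domain Wiener gains. The center frequencies, normalization, response length, and sampling rate here are illustrative assumptions, not the exact transform definition.

```python
import numpy as np

def mel_idct_impulse_response(mel_gains, center_freqs_hz, fs=8000.0, length=17):
    """Approximate h(n) as a cosine sum over the mel-band center frequencies."""
    n = np.arange(length)
    h = np.zeros(length)
    for gain, fc in zip(mel_gains, center_freqs_hz):
        h += gain * np.cos(2.0 * np.pi * fc * n / fs)   # IDCT-like basis at warped frequencies
    return h / len(mel_gains)                           # crude normalization (assumption)
```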

Gain Factorization: Factorization of the Mel-warped Wiener filter coefficients is performed to control how aggressively noise is reduced in the second stage. The de-noised frame signal energy is calculated.

Gain Factorization: The noise energy of the current frame is also estimated.

Gain Factorization: The smoothed SNR is evaluated using three de-noised frame energies and the noise energy:
If (Ratio > ): SNR_avg(t) = 6.67 × log10(Ratio)
Else: SNR_avg(t) = −33.3

Gain Factorization: To decide the degree of aggression, the SNR is tracked:
If (SNR_avg(t) − SNR_low-track(t − 1)) < 10 or t < 10:
  calculate λ_SNR(t)
  SNR_low-track(t) = λ_SNR(t) × SNR_low-track(t − 1) + (1 − λ_SNR(t)) × SNR_avg(t)
Else:
  SNR_low-track(t) = SNR_low-track(t − 1)
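A sketch of the SNR low-tracker on this slide; the computation of λ_SNR(t) is not reproduced on the slide, so it is passed in as a parameter here.

```python
def update_snr_low_track(snr_low_track_prev, snr_avg, t, lambda_snr):
    """Track the low envelope of the smoothed SNR for the aggression decision."""
    if (snr_avg - snr_low_track_prev) < 10 or t < 10:
        return lambda_snr * snr_low_track_prev + (1.0 - lambda_snr) * snr_avg
    return snr_low_track_prev   # SNR well above the track: leave it unchanged
```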

Gain Factorization: Gain factorization applies more aggressive noise reduction to purely noisy frames and less to frames containing speech. The aggression coefficient takes a value of 10% for speech-plus-noise frames and 80% for noise-only frames.

Apply Filter: The causal impulse response is obtained, truncated, and weighted by a Hanning window. The input signal is filtered with this impulse response to produce the noise-reduced signal.
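A sketch of the filtering step; the truncation length and the way the Hanning weighting is applied are assumptions, so treat this only as an illustration of the idea.

```python
import numpy as np

def apply_filter(frame, impulse_response, keep=17):
    """Truncate the causal impulse response, weight it with a Hanning window, filter the frame."""
    h = np.asarray(impulse_response, dtype=float)[:keep]
    window = np.hanning(2 * len(h) + 1)[len(h):-1]   # decaying half of a Hanning window
    h = h * window
    return np.convolve(frame, h)[:len(frame)]        # noise-reduced frame, same length as input
```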

Offset Compensation: A filter is applied to the noise-reduced signal s_nr to remove the DC offset over the frame-length interval (80 samples).
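The slide's formula was an image; this is a sketch of a typical DC-offset removal (notch) filter applied to the noise-reduced signal s_nr, with the feedback coefficient 0.999 as an assumption.

```python
import numpy as np

def offset_compensation(s_nr, coeff=0.999):
    """y(n) = s_nr(n) - s_nr(n-1) + coeff * y(n-1): a first-order notch that removes DC."""
    y = np.zeros(len(s_nr))
    prev_in = prev_out = 0.0
    for i, x in enumerate(s_nr):
        prev_out = x - prev_in + coeff * prev_out
        prev_in = float(x)
        y[i] = prev_out
    return y
```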

Results Noisy test file: After de-noise:

Results Footloose: Not Footloose:

Results: why didn’t this work? Hair dryer: Still there?!?!:

Results Hair dryer: Gone: