Noise Reduction in Speech Recognition
Professor: Jian-Jiun Ding
Student: Yung Chang
2011/05/06

Outline
 Mel Frequency Cepstral Coefficients (MFCC)
 Mismatch in speech recognition
   - Feature-based: CMS, CMVN, HEQ
   - Feature-based: RASTA, data-driven
   - Speech enhancement: spectral subtraction, Wiener filtering
 Conclusions and applications

Mel Frequency Cepstral Coefficients (MFCC)
 The most commonly used features in speech recognition
 Advantages: high accuracy and low complexity
 39 dimensions

Mel Frequency Cepstral Coefficients (MFCC)
 The framework of feature extraction:
speech signal x(n) → Pre-emphasis → x'(n) → Window → x_t(n) → DFT → A_t(k) → Mel filter-bank → Y_t(m) → Log(|·|^2) → Y'_t(m) → IDFT → y_t(j) = MFCC, plus energy e_t and derivatives

Pre-emphasis
 Pre-emphasis boosts the spectrum at higher frequencies: x'[n] = x[n] - a·x[n-1], with a typically close to 1 (e.g. 0.97)
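A minimal NumPy sketch of this step (the coefficient value 0.97 is a common choice, not specified on the slide):

```python
import numpy as np

def pre_emphasis(x, a=0.97):
    """Boost high frequencies: x'[n] = x[n] - a * x[n-1]."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - a * x[:-1])
```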

End-point Detection (Voice Activity Detection)
 Locates the boundaries between noise (silence) and speech segments
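A crude energy-based end-point detector, sketched under the assumption of a fixed threshold relative to the loudest frame; real detectors adapt the threshold and use extra cues such as zero-crossing rate:

```python
import numpy as np

def energy_vad(x, frame_len=400, hop=160, ratio=0.1):
    """Mark a frame as speech when its energy exceeds a fraction
    of the maximum frame energy in the utterance."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.array([np.sum(x[i * hop : i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    return energy > ratio * energy.max()
```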

Windowing
 Rectangular window vs. Hamming window: the Hamming window tapers the frame edges, reducing spectral leakage in the DFT
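The Hamming window sketched in NumPy (the frame length 400 assumes 25 ms frames at 16 kHz, an illustrative choice):

```python
import numpy as np

N = 400  # 25 ms frame at 16 kHz sampling
n = np.arange(N)
hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
rectangular = np.ones(N)  # rectangle window: no tapering at the edges
# apply by elementwise multiplication: windowed_frame = frame * hamming
```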

Mel-filter bank
 After the DFT we obtain the amplitude spectrum (amplitude as a function of frequency)

Mel-filter bank
 Triangular shape in frequency (overlapped)
 Uniformly spaced below 1 kHz
 Logarithmically spaced above 1 kHz
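A sketch of constructing such a bank of triangular, overlapping filters; the filter count (26) and FFT size (512) are illustrative assumptions, and the mel scale formula used gives the roughly linear-below-1-kHz, logarithmic-above behavior described on the slide:

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, fs=16000):
    """Triangular overlapping filters with centers spaced uniformly
    on the mel scale: mel(f) = 2595 * log10(1 + f / 700)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank
```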

Delta Coefficients
 1st/2nd-order differences of the cepstral coefficients
 Static coefficients: 13 dimensions; adding 1st- and 2nd-order deltas gives 39 dimensions
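The delta features can be sketched with the standard regression formula over neighboring frames (the half-width M = 2 is a common choice, assumed here); applying it once to the 13 static coefficients and again to the result yields the 39-dimensional vector:

```python
import numpy as np

def deltas(feat, M=2):
    """First-order regression deltas over +/- M frames.
    feat: (T, D) array of static features; edge frames are replicated."""
    T = len(feat)
    padded = np.pad(feat, ((M, M), (0, 0)), mode='edge')
    denom = 2.0 * sum(m * m for m in range(1, M + 1))
    d = np.zeros_like(feat, dtype=float)
    for m in range(1, M + 1):
        d += m * (padded[M + m : M + m + T] - padded[M - m : M - m + T])
    return d / denom
```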

Mismatch in Statistical Speech Recognition
 Possible approaches for acoustic environment mismatch:
The original speech x[n] is corrupted by additive noise n1(t), by convolutional noise h[n] (acoustic reception, microphone distortion, phone/wireless channel), and by further additive noise n2(t), yielding the input signal y[n]. Feature extraction converts y[n] into feature vectors O = o1 o2 … oT, and the search stage uses acoustic models, a lexicon, and a language model (trained from speech and text corpora) to produce the output sentences W = w1 w2 ... wR.
 Countermeasures can be applied at three points: speech enhancement (recovering x[n] from y[n]), feature-based approaches (normalizing the feature vectors), and model-based approaches (adapting the acoustic models); the same feature extraction is used in training and recognition.

Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
 Cepstral Mean Subtraction (CMS) targets convolutional noise
Convolutional noise in the time domain becomes additive in the cepstral domain: y[n] = x[n] * h[n] → y = x + h, with x, y, h in the cepstral domain
Most convolutional noise changes only very slightly over a reasonable time interval, so x = y - h
 Cepstral Mean Subtraction (CMS): assuming E[x] = 0, then E[y] = h, and x_CMS = y - E[y]

Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
 CMVN: the variance is normalized as well
x_CMVN = x_CMS / [Var(x_CMS)]^(1/2)
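Both normalizations, sketched per utterance over the frame axis (the small epsilon guarding against division by zero is an implementation detail, not from the slide):

```python
import numpy as np

def cms(c):
    """Cepstral mean subtraction: remove the per-coefficient mean,
    which (by the slide's argument) estimates the channel h."""
    return c - c.mean(axis=0)

def cmvn(c, eps=1e-8):
    """CMS plus normalization of each coefficient to unit variance."""
    out = cms(c)
    return out / (out.std(axis=0) + eps)
```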

Feature-based Approach: HEQ (Histogram Equalization)
 The whole distribution is equalized, not only the first moments:
y = CDF_y^(-1)[CDF_x(x)]
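A quantile-matching sketch of this mapping: each value of x is passed through its empirical CDF and then through the inverse CDF of a reference distribution (here supplied as a sample `ref`; in practice the reference CDF would be estimated from clean training data):

```python
import numpy as np

def heq(x, ref):
    """y = CDF_ref^{-1}(CDF_x(x)), approximated by rank / quantile matching."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))      # empirical rank of each value
    cdf = (ranks + 0.5) / len(x)           # CDF_x(x), kept inside (0, 1)
    return np.quantile(np.asarray(ref, dtype=float), cdf)
```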

Feature-based Approach: RASTA
 Each feature coefficient traced across frames forms a temporal trajectory with its own modulation-frequency spectrum; RASTA performs filtering on these trajectories (temporal filtering)

Feature-based Approach: RASTA (RelAtive SpecTrAl temporal filtering)
 Assumption: the rate of change of noise often lies outside the typical rate of change of the vocal tract shape
 A specially designed band-pass filter over modulation frequency (Hz) emphasizes the speech components
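The classic RASTA filter of Hermansky and Morgan, H(z) = 0.1 · (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1), applied along one feature trajectory; this sketch implements the published transfer function directly as a loop rather than via a filtering library:

```python
import numpy as np

def rasta_filter(traj):
    """Band-pass filter one temporal trajectory of a (log) spectral feature.
    The numerator taps 0.1 * [2, 1, 0, -1, -2] sum to zero, so DC
    (a constant channel offset) is rejected, while the pole at 0.98
    keeps the slower modulations typical of speech."""
    b = np.array([0.2, 0.1, 0.0, -0.1, -0.2])
    y = np.zeros(len(traj), dtype=float)
    for n in range(len(traj)):
        acc = sum(b[k] * traj[n - k] for k in range(5) if n - k >= 0)
        if n > 0:
            acc += 0.98 * y[n - 1]
        y[n] = acc
    return y
```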

Data-driven Temporal Filtering
 PCA (Principal Component Analysis): find the directions of maximum variance in the data

Data-driven Temporal Filtering
 We should not guess our filter, but derive it from data: the original feature stream y_t is convolved, over a window of L frame indices, with filters B1(z), B2(z), …, Bn(z) learned from training data, producing filtered streams z_k(1), z_k(2), z_k(3), …
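One way to obtain such a filter from data, sketched here with PCA as on the previous slide: stack length-L windows of a feature trajectory and take the leading principal component as the filter's impulse response (the window length L = 9 is an illustrative assumption):

```python
import numpy as np

def pca_temporal_filter(traj, L=9):
    """Learn a length-L temporal filter as the first principal component
    of windowed segments of the trajectory, then apply it by convolution.
    Returns the filtered trajectory and the learned filter taps."""
    traj = np.asarray(traj, dtype=float)
    X = np.array([traj[t : t + L] for t in range(len(traj) - L + 1)])
    X = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(X.T @ X / len(X))
    h = eigvecs[:, -1]              # leading eigenvector (unit norm)
    return np.convolve(traj, h, mode='same'), h
```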

Speech Enhancement: Spectral Subtraction (SS)
 Produces a better signal by trying to remove the noise
 Useful for listening purposes or recognition purposes
 Noise n[n] changes quickly and unpredictably in the time domain, but its spectrum N(w) changes relatively slowly, so N(w) can be estimated during non-speech segments and subtracted from the noisy spectrum
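A magnitude-domain sketch: the noise spectrum is estimated by averaging the leading frames (assumed to be silence), subtracted from every frame, and floored to avoid negative magnitudes; the number of noise frames and the flooring factor are illustrative assumptions, and many variants (e.g. oversubtraction) exist. The Wiener-filter alternative named in the outline would instead apply a per-bin gain of the form S/(S + N).

```python
import numpy as np

def spectral_subtraction(mag, n_noise_frames=10, floor=0.01):
    """mag: (T, F) magnitude spectrogram. Estimate the noise magnitude
    from the first n_noise_frames and subtract it from every frame."""
    noise = mag[:n_noise_frames].mean(axis=0)
    clean = mag - noise
    return np.maximum(clean, floor * mag)   # floor out negative magnitudes
```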

Conclusions
 We gave a general framework of how to extract speech features
 We introduced the mainstream robustness techniques
 There are still numerous other noise reduction methods (left to the references)

References

Q & A