LOW-RESOURCE NOISE-ROBUST FEATURE POST-PROCESSING ON AURORA 2.0 — Chia-Ping Chen, Jeff Bilmes and Katrin Kirchhoff, SSLI Lab, Department of Electrical Engineering, University of Washington

1 LOW-RESOURCE NOISE-ROBUST FEATURE POST-PROCESSING ON AURORA 2.0
Chia-Ping Chen, Jeff Bilmes and Katrin Kirchhoff
SSLI Lab, Department of Electrical Engineering, University of Washington
Presenter: Shih-Hsiang (士翔)
ICSLP 2002

2 Introduction
 The performance of ASR systems often decreases dramatically as the noise level increases
  The degradation is minor when the signal-to-noise ratio (SNR) is high, but quite significant at low SNR levels
 In the past, a variety of techniques have been proposed
  Principal component analysis and a discriminative neural network (Ellis et al., 2001)
  Missing-data theory (Cooke et al., 2001)
  A voice activity detector (VAD) and variable frame rate processing, used to drop noisy feature vectors and reduce insertion errors (John et al., 2001)
  Nonlinear spectral subtraction, noise masking, feature filters, and model adaptation (Lieb et al., 2001)
  Data-driven temporal filters, on-line mean and variance normalization, voice activity detection, and server-side discriminative features, integrated together to improve noise robustness (Morgan et al., 2001)
  etc.

3 Literature Review
 Ellis et al. (2001): principal component analysis and a discriminative neural network
 John et al. (2001)
  Variable frame rate (VFR) processing: an observation vector is discarded if it does not differ much from the previous observation vector. In our implementation of VFR, frame-to-frame variation is estimated as the Euclidean norm of the sub-vector corresponding to the delta-cepstrum.
  Voice activity detection
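The VFR criterion above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold value and the rule of always keeping the first frame are assumptions.

```python
import numpy as np

def variable_frame_rate(features, delta_slice, threshold):
    """Keep a frame only if the Euclidean norm of its delta-cepstrum
    sub-vector (frame-to-frame variation estimate) exceeds a threshold.

    features:    (T, D) array of observation vectors
    delta_slice: slice selecting the delta-cepstrum sub-vector
    threshold:   assumed tunable cutoff (hypothetical value)
    """
    kept = [features[0]]  # assumption: always keep the first frame
    for frame in features[1:]:
        if np.linalg.norm(frame[delta_slice]) >= threshold:
            kept.append(frame)
    return np.array(kept)
```

Frames whose delta sub-vector is near zero (little spectral change, typically silence or stationary noise) are dropped, which is how VFR reduces insertion errors.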

4 Literature Review  Morgan et al. (2001)

5 Proposed method
 The first step is standard mean subtraction (MS):
  y̅_t = y_t − (1/T) Σ_{τ=1..T} y_τ
 The second step is variance normalization (VN), dividing each component of y̅_t by its standard deviation over the utterance:
  ŷ_t = y̅_t / σ
 The third step is auto-regressive moving-average (ARMA) filtering:
  ẑ_t = (ẑ_{t−M} + … + ẑ_{t−1} + ŷ_t + ŷ_{t+1} + … + ŷ_{t+M}) / (2M + 1)
 Here y_t is the feature vector (cepstral coefficients) at frame t, and M is the order of the ARMA filter
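The three post-processing steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation; in particular, the handling of the ARMA recursion at utterance boundaries (truncated sums) is an assumption.

```python
import numpy as np

def mva(cepstra, order=2):
    """Mean subtraction, Variance normalization, ARMA filtering
    applied along the time axis of an utterance.

    cepstra: (T, D) array of cepstral vectors
    order:   ARMA filter order M
    """
    # Step 1: per-utterance mean subtraction
    y = cepstra - cepstra.mean(axis=0)
    # Step 2: per-dimension variance normalization
    y = y / (y.std(axis=0) + 1e-8)
    # Step 3: ARMA smoothing,
    # z[t] = (z[t-M] + ... + z[t-1] + y[t] + ... + y[t+M]) / (2M+1)
    T, D = y.shape
    M = order
    z = np.zeros_like(y)
    for t in range(T):
        ar = z[max(0, t - M):t].sum(axis=0)       # past M outputs
        ma = y[t:min(T, t + M + 1)].sum(axis=0)   # current + M future inputs
        z[t] = (ar + ma) / (2 * M + 1)
    return z
```

The ARMA step is autoregressive in the past outputs and moving-average over the current and future inputs, so it smooths the normalized cepstral trajectories with a short look-ahead.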

6 Choosing a proper order M of the filter
 The number of zeros in the frequency response of the ARMA filter is approximately proportional to its order
 This supports the expectation that a large M will perform poorly, since it could filter out important speech information
 The transfer function is:
  H(z) = (1 + z + … + z^M) / ((2M + 1) − (z^{−1} + … + z^{−M}))
 The frequency response of the ARMA filter of order M is:
  H(e^{jω}) = Σ_{k=0..M} e^{jωk} / ((2M + 1) − Σ_{k=1..M} e^{−jωk})
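The relation between the order M and the number of spectral nulls can be checked numerically. A small sketch, assuming the recursion z_t = (z_{t−M} + … + z_{t−1} + y_t + … + y_{t+M}) / (2M+1), whose numerator Σ_{k=0..M} e^{jωk} vanishes at ω = 2πm/(M+1), so the number of nulls in (0, π] grows with M:

```python
import numpy as np

def arma_magnitude_response(M, w):
    """|H(e^jw)| of the order-M ARMA smoother, assuming
    H(e^jw) = sum_{k=0..M} e^{jwk} / ((2M+1) - sum_{k=1..M} e^{-jwk})."""
    w = np.asarray(w, dtype=float)
    num = sum(np.exp(1j * w * k) for k in range(M + 1))
    den = (2 * M + 1) - sum(np.exp(-1j * w * k) for k in range(1, M + 1))
    return np.abs(num / den)
```

At ω = 0 the gain is (M+1)/(M+1) = 1, so the filter passes the slowly varying cepstral trajectory unchanged while nulling higher modulation frequencies; more nulls for larger M means more of the spectrum is suppressed.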

7 Gain and phase shifts of the ARMA filter

8 The time sequences of the cepstral coefficient c1 for the digit string corrupted with different levels of noise

9 Evaluation
 Evaluated on the Aurora 2.0 noisy-digits database
  Two training sets and three test sets
   Training sets: clean training set only / multi-condition speech
   Test sets: stationary-noise sets / non-stationary-noise sets / convolutional noise
  7 different noise levels
   Clean, 20 dB, 15 dB, 10 dB, 5 dB, 0 dB, −5 dB
 Recognizer
  Simple HMM-based system using whole-word models
   "zero" through "nine" and "oh": 16 states per word, 3-mixture Gaussians per state
   silence: 3 states

10 Recognition results
Word accuracies (as percentages). Top: multi-condition training; bottom: clean training

11 A comparison of different orders of the ARMA filter
 A small M will retain the short-term cepstral information but is more vulnerable to noise
 A large M will make the processed features less corrupted by noise, but the short-term cepstral information will be lost
(Top: multi-condition training; bottom: clean training)

12 Test of the effectiveness of the proposed technique
 The results show that while variance normalization and mean subtraction improve performance over the baseline, the addition of the ARMA filter provides significant further improvements

13 Comparison of different filters
 causal ARMA filter
 non-causal MA filter
 causal MA filter
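For illustration, the two moving-average (MA) baselines can be sketched as plain window averages. The window lengths here (2M+1 taps, to match the ARMA filter) are an assumption, not details from the paper:

```python
import numpy as np

def ma_noncausal(y, M):
    """Non-causal MA: z[t] = mean of y over frames t-M .. t+M
    (uses future frames, like the MA part of the ARMA filter)."""
    T = len(y)
    return np.array([y[max(0, t - M):min(T, t + M + 1)].mean(axis=0)
                     for t in range(T)])

def ma_causal(y, M):
    """Causal MA: z[t] = mean of y over frames t-2M .. t
    (same 2M+1 taps, but only past and current frames)."""
    return np.array([y[max(0, t - 2 * M):t + 1].mean(axis=0)
                     for t in range(len(y))])
```

The causal variant introduces no look-ahead delay but its averaging window lags the current frame; the non-causal variant is symmetric around the current frame, which is why the two behave differently in the comparison.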

14 Comparison of different filters (cont.)
Top: multi-condition training; bottom: clean training