Multi-Resolution Spectral Entropy based Feature for Robust ASR
Hemant Misra, Shajith Ikbal, Hervé Bourlard
IDIAP Research Institute, Martigny, Switzerland

Presented by Chen-Wei Liu
Graduate Institute of Computer Science, National Taiwan Normal University
2005/4/21

Introduction (1/2)
- Most state-of-the-art ASR systems use cepstral features derived from the short-time Fourier transform (STFT) spectrum of the speech signal
- While cepstral features are a fairly good representation, they capture the absolute energy response of the spectrum
- Further, it is not certain that all the information present in the STFT spectrum is captured by them
- This paper suggests capturing further information from the spectrum by computing its entropy

Introduction (2/2)
- For voiced sounds, spectra have clear formants, so the entropies of such spectra will be low
- Spectra of unvoiced sounds, on the other hand, are flatter, so their entropies should be higher
- Therefore, the entropy of a spectrum can be used as an estimate for the voiced/unvoiced decision
- This paper extends the idea further and introduces a multi-band/multi-resolution entropy feature

Multi-Resolution Spectral Entropy Feature
- Entropy can be used to capture the peakiness of a probability mass function (PMF)
- A PMF with a sharp peak will have low entropy, while a PMF with a flat distribution will have high entropy
- In the STFT spectra of speech, the distinct peaks and their positions depend on the phoneme under consideration
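The peakiness property above can be verified with a small sketch. The `entropy` helper below is a hypothetical name, not from the paper; it implements the standard Shannon entropy in nats:

```python
import numpy as np

def entropy(pmf):
    """Shannon entropy of a probability mass function (in nats)."""
    pmf = np.asarray(pmf, dtype=float)
    nz = pmf[pmf > 0]          # ignore zero bins: 0 * log(0) is taken as 0
    return float(-np.sum(nz * np.log(nz)))

# A sharply peaked PMF has low entropy...
peaked = np.array([0.97, 0.01, 0.01, 0.01])
# ...while a flat PMF attains the maximum entropy, log(N).
flat = np.full(4, 0.25)

assert entropy(peaked) < entropy(flat)
assert np.isclose(entropy(flat), np.log(4))
```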

Multi-Resolution Spectral Entropy Feature
- To compute the entropy of a spectrum, the spectrum is converted into a PMF-like function by normalizing it: p_k = X_k / sum_j X_j
- The entropy of each frame is then computed as H = -sum_k p_k log p_k
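A minimal sketch of this per-frame computation (the function name `frame_entropy` and the small `eps` guard against an all-zero frame are my additions, not the paper's):

```python
import numpy as np

def frame_entropy(spectrum, eps=1e-12):
    """Entropy of one STFT magnitude spectrum, treated as a PMF.

    `spectrum` is a non-negative array, e.g. |X_k| for one frame.
    Normalizing by the sum gives p_k = X_k / sum_j X_j, and the
    frame entropy is H = -sum_k p_k * log(p_k).
    """
    spectrum = np.asarray(spectrum, dtype=float)
    pmf = spectrum / (spectrum.sum() + eps)
    pmf = pmf[pmf > 0]
    return float(-np.sum(pmf * np.log(pmf)))

# A frame dominated by one strong spectral peak yields low entropy,
# while a flat spectrum yields entropy close to log(N).
peaked_frame = np.concatenate([np.ones(8), [100.0], np.ones(8)])
flat_frame = np.ones(17)
assert frame_entropy(peaked_frame) < frame_entropy(flat_frame)
```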

Multi-Resolution Spectral Entropy Feature
- Full-band entropy can capture only the gross peakiness of the spectrum, not the location of the formants
- A multi-resolution entropy feature, in contrast, can capture the location of the formants
- To extract multi-band entropy features, the full-band spectrum is divided into J non-overlapping sub-bands of equal size
- Entropy is computed for each sub-band, giving one entropy value per sub-band
- These sub-band entropy values indicate the presence or absence of a formant in each sub-band
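This multi-band extraction can be sketched as follows. Here each sub-band is normalized into its own PMF, which is the earlier scheme that the next slide's modification replaces; the function name and the use of `numpy.array_split` are my assumptions:

```python
import numpy as np

def subband_entropies(spectrum, J=4):
    """Multi-band entropy feature: split the spectrum into J equal,
    non-overlapping sub-bands and compute one entropy per band.
    Each sub-band is normalized into its own PMF (the original
    scheme, later modified by the authors)."""
    spectrum = np.asarray(spectrum, dtype=float)
    bands = np.array_split(spectrum, J)
    feats = []
    for band in bands:
        pmf = band / band.sum()
        pmf = pmf[pmf > 0]
        feats.append(float(-np.sum(pmf * np.log(pmf))))
    return feats  # J-dimensional entropy feature vector
```

A sub-band containing a formant peak produces a low entropy value; a flat sub-band produces an entropy near log of the sub-band length.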

Modifications
- A previous paper advised that each sub-band spectrum be converted into a sub-band PMF, with entropy computed for each sub-band PMF
- Converting each sub-band into a PMF enhances false local peaks, since the normalization is done only within that sub-band
- This false relative-peak enhancement gives low entropy values even for peaks that are small in the full band but appear as high relative peaks in a normalized sub-band

Modifications
- To overcome this problem, each sub-band is no longer converted into a PMF separately
- Instead, the full band is converted into a PMF and then divided into sub-bands, and an entropy feature is obtained from each sub-band
- This ensures that the peaks retain their relative strength in each sub-band
- It is equivalent to computing the contribution of each sub-band to the full-band entropy
- In this paper, the parameter J is varied from 1 to 32, working with a 129-point STFT spectrum for better frequency resolution
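The modified scheme can be sketched like this (function name and `eps` guard are mine). Because the PMF is formed once over the full band, the J sub-band values add up exactly to the full-band entropy, which is the "contribution" interpretation above:

```python
import numpy as np

def subband_entropy_contributions(spectrum, J=4, eps=1e-12):
    """Modified multi-band entropy: normalize the FULL band into a
    single PMF first, then sum -p*log(p) within each sub-band.
    The J values sum to the full-band entropy, so spectral peaks
    keep their relative strength across sub-bands."""
    spectrum = np.asarray(spectrum, dtype=float)
    pmf = spectrum / (spectrum.sum() + eps)
    bands = np.array_split(pmf, J)
    feats = []
    for band in bands:
        b = band[band > 0]
        feats.append(float(-np.sum(b * np.log(b))))
    return feats
```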

Modifications
- This paper also explores overlapping sub-bands of unequal size, dividing the full band into 24 overlapping sub-bands defined by the Mel scale and computing entropy from each sub-band
- Moreover, to incorporate temporal information, the first- and second-order time derivatives of the multi-band entropy feature are also used in the system
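The time derivatives can be computed with the standard regression ("delta") formula commonly used for ASR features; the paper does not state its exact delta window, so the half-width N=2 and the function name below are assumptions:

```python
import numpy as np

def deltas(feats, N=2):
    """Regression deltas over a feature trajectory of shape (T, D):

        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)

    Frames beyond the ends are replicated (edge padding)."""
    feats = np.asarray(feats, dtype=float)
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(feats)
    for t in range(T):
        acc = sum(n * (padded[t + N + n] - padded[t + N - n])
                  for n in range(1, N + 1))
        d[t] = acc / denom
    return d

# Static + delta + delta-delta entropy features, per frame:
#   x = np.hstack([e, deltas(e), deltas(deltas(e))])
```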

Experimental Setup
- Numbers95 database: US English connected-digit telephone speech; 30 words represented by 27 phonemes
- Training is performed on clean speech
- Testing uses speech corrupted by factory noise from the Noisex92 database, added to Numbers95 at different SNRs
- 3330 utterances for training; 1143 utterances for testing
- PLP features are used: 13 cepstral coefficients appended with their first- and second-order time derivatives

Results
- The entropy feature is used alone in this experiment
- The two entropy values are appended to form a 2-dimensional entropy feature vector used for training and testing the system
- As the number of sub-bands increases, the performance improves

Results
- Appending the time derivatives of the entropy feature to the entropy feature

Results
- Feature combination

Discussion
- This paper suggested dividing the normalized full-band spectrum into sub-bands and obtaining the contribution to entropy from each sub-band
- Overlapping sub-bands and appending time derivatives to the feature improve performance further
- Improved performance is obtained when the multi-band entropy feature is appended to the usual PLP cepstral features