

MEAN TEMPORAL DISTANCE: PREDICTING ASR ERROR FROM TEMPORAL PROPERTIES OF SPEECH SIGNAL
Hynek Hermansky, Ehsan Variani & Vijayaditya Peddinti
Center for Language and Speech Processing, Johns Hopkins University

ABSTRACT
A simple but effective method of estimating ASR performance computes a function M(Δt), the mean distance between speech feature vectors evaluated over a finite time interval, expressed as a function of the temporal distance Δt between the vectors.

MEANS & VARIANCES OF M(Δt) CURVES
Mplp(Δt), derived from PLP cepstra, supports the notion that speech is composed of sequences of phonemes spaced roughly 70 ms apart, each overlapping with its immediate neighbors. The 200 ms duration marks the extent beyond which even the longest phoneme ceases to influence its neighboring phonemes.

Mpost(Δt), computed from phoneme posterior features with the symmetric Kullback-Leibler distance, does not exhibit a peak at the phoneme center time span, because posterior features are trained to be constant over phoneme durations.

For longer time spans (Δt > 200 ms), the value of M(Δt) decreases with decreasing SNR: stationary distortions come to dominate the signal and the feature vector values, making the vectors more similar, so the divergence values among them decrease. M(Δt) therefore appears to be a good predictor of the error rate of a phoneme classifier trained on clean data.

DURATION vs ERROR ESTIMATION ACCURACY
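The M(Δt) statistic described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `mean_temporal_distance` is assumed, and PLP feature extraction is taken as given (any per-frame feature matrix works).

```python
import numpy as np

def mean_temporal_distance(features, max_lag):
    """M(dt): mean Euclidean distance between feature vectors that are
    dt frames apart, for dt = 1 .. max_lag.  `features` is a
    (num_frames, dim) array, e.g. PLP cepstra at a 10 ms frame rate."""
    curve = np.empty(max_lag)
    for dt in range(1, max_lag + 1):
        diffs = features[dt:] - features[:-dt]  # all pairs dt frames apart
        curve[dt - 1] = np.mean(np.linalg.norm(diffs, axis=1))
    return curve
```

At a 10 ms frame rate, index 6 of the returned curve corresponds to the ~70 ms lag at which the PLP-derived curves peak, and indices beyond 19 to the Δt > 200 ms region used for noise assessment. For posterior features, the Euclidean norm would be replaced by the symmetric KL distance.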
M(Δt) CURVES IN CLEAN AND NOISY SPEECH
PLP-derived curves:
- exhibit peaks at around 70 ms, which corresponds to the average spacing between centers of neighboring phonemes
- flatten at about Δt > 200 ms, due to phoneme coarticulation
- stabilize at syllable length

Idealized conditions illustrated by the curves:
- feature vectors in each sound are stationary and all sounds are of equal length
- feature vectors are not stationary but change gradually, due to coarticulation in speech production
- feature vectors are corrupted by slowly varying or stationary distortions

MEAN OF M(Δt) vs ACCURACY
The Mplp(Δt) measure (PLP features with Euclidean distance) has the weakest correlation; however, it does not require estimating posterior probabilities of phonemes and is much easier to compute.

M(Δt) CURVE FOR STREAM SELECTION IN MULTI-STREAM SYSTEMS
M(Δt) peaks at about the average phoneme length and stays approximately constant after syllable length.

Environment      | PLP-MLP | Multi-Stream + PM | Oracle Stream Selection
Clean            | 31%     | 28%               | 25%
Car at 0 dB SNR  | 54%     | 38%               |

REFERENCES
E. Variani and H. Hermansky, "Estimating classifier performance in unknown noise," in Proc. Interspeech, 2012.
N. Mesgarani, S. Thomas, and H. Hermansky, "A multistream multiresolution framework for phoneme recognition," in Proc. Interspeech, 2010, pp. 318-321.
H. Hermansky and J. Cohen, "Report from WS 1996," Tech. Rep., Center for Language and Speech Processing, Johns Hopkins University, 1996.

* The research presented in this paper was partially funded by the DARPA RATS program under D10PC20015 and by the JHU Human Language Technology Center of Excellence. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA or the JHU HLTCOE.
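The symmetric Kullback-Leibler distance used for posterior features, and the idea of selecting the stream whose M(Δt) stays highest beyond ~200 ms, can be sketched as follows. This is a hypothetical sketch: the names `symmetric_kl` and `select_stream`, the 10 ms frame rate, and the argmax selection rule are assumptions, not the authors' exact procedure.

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler distance between two phoneme
    posterior vectors (each non-negative and summing to one)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum((p - q) * (np.log(p) - np.log(q))))

def select_stream(m_curves, frame_ms=10, min_lag_ms=200):
    """Prefer the processing stream whose M(dt) curve stays highest
    beyond ~200 ms, where stationary distortions depress M(dt) in
    corrupted streams.  `m_curves` is a list of M(dt) arrays, one per
    stream, indexed by lag in frames."""
    start = min_lag_ms // frame_ms
    scores = [float(np.mean(curve[start:])) for curve in m_curves]
    return int(np.argmax(scores))
```

The rule exploits the observation above that stationary distortions make distant feature vectors more similar, pulling M(Δt) down at long lags for degraded streams.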