Error Analysis: Indicators of the Success of Adaptation Arindam Mandal, Mari Ostendorf, & Ivan Bulyko University of Washington.

Slides:



Advertisements
Similar presentations
Chapter 9: Simple Regression Continued
Advertisements

Stages of Learning Chapter 5.
Punctuation Generation Inspired Linguistic Features For Mandarin Prosodic Boundary Prediction CHEN-YU CHIANG, YIH-RU WANG AND SIN-HORNG CHEN 2012 ICASSP.
Speech Recognition with Hidden Markov Models Winter 2011
SRI 2001 SPINE Evaluation System Venkata Ramana Rao Gadde Andreas Stolcke Dimitra Vergyri Jing Zheng Kemal Sonmez Anand Venkataraman.
1 Multiple Regression A single numerical response variable, Y. Multiple numerical explanatory variables, X 1, X 2,…, X k.
Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan IEEE 2007 Min-Hsuan.
ASSESSING SEARCH TERM STRENGTH IN SPOKEN TERM DETECTION Amir Harati and Joseph Picone Institute for Signal and Information Processing, Temple University.
Week 2 Video 4 Metrics for Regressors.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Adaptation Resources: RS: Unsupervised vs. Supervised RS: Unsupervised.
Psychology 202b Advanced Psychological Statistics, II February 10, 2011.
Detecting Prosody Improvement in Oral Rereading Minh Duong and Jack Mostow Project LISTEN Carnegie Mellon University The research.
Confidence Estimation for Machine Translation J. Blatz et.al, Coling 04 SSLI MTRG 11/17/2004 Takahiro Shinozaki.
BA 555 Practical Business Analysis
Comparing Distributions III: Chi squared test, ANOVA By Peter Woolf University of Michigan Michigan Chemical Process Dynamics and Controls.
Multiple Linear Regression Introduction to Business Statistics, 5e Kvanli/Guynes/Pavur (c)2000 South-Western College Publishing.
Speaker Detection Without Models Dan Gillick July 27, 2004.
Detecting missrecognitions Predicting with prosody.
Incorporating Tone-related MLP Posteriors in the Feature Representation for Mandarin ASR Overview Motivation Tone has a crucial role in Mandarin speech.
1 Language Model Adaptation in Machine Translation from Speech Ivan Bulyko, Spyros Matsoukas, Richard Schwartz, Long Nguyen, and John Makhoul.
Optimal Adaptation for Statistical Classifiers Xiao Li.
1 USING CLASS WEIGHTING IN INTER-CLASS MLLR Sam-Joo Doh and Richard M. Stern Department of Electrical and Computer Engineering and School of Computer Science.
Improved Tone Modeling for Mandarin Broadcast News Speech Recognition Xin Lei 1, Manhung Siu 2, Mei-Yuh Hwang 1, Mari Ostendorf 1, Tan Lee 3 1 SSLI Lab,
On Comparing Classifiers: Pitfalls to Avoid and Recommended Approach Published by Steven L. Salzberg Presented by Prakash Tilwani MACS 598 April 25 th.
Chapter 14 Introduction to Linear Regression and Correlation Analysis
Creating Empirical Models Constructing a Simple Correlation and Regression-based Forecast Model Christopher Oludhe, Department of Meteorology, University.
Introduction to Linear Regression and Correlation Analysis
Introduction Mohammad Beigi Department of Biomedical Engineering Isfahan University
1 International Computer Science Institute Data Sampling for Acoustic Model Training Özgür Çetin International Computer Science Institute Andreas Stolcke.
Speech and Language Processing
March 24, 2005EARS STT Workshop1 A Study of Some Factors Impacting SuperARV Language Modeling Wen Wang 1 Andreas Stolcke 1 Mary P. Harper 2 1. Speech Technology.
On Speaker-Specific Prosodic Models for Automatic Dialog Act Segmentation of Multi-Party Meetings Jáchym Kolář 1,2 Elizabeth Shriberg 1,3 Yang Liu 1,4.
Chap 12-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 12 Introduction to Linear.
Testing Hypotheses about Differences among Several Means.
Data Sampling & Progressive Training T. Shinozaki & M. Ostendorf University of Washington In collaboration with L. Atlas.
Lesson Multiple Regression Models. Objectives Obtain the correlation matrix Use technology to find a multiple regression equation Interpret the.
Regression Lines. Today’s Aim: To learn the method for calculating the most accurate Line of Best Fit for a set of data.
Handing Uncertain Observations in Unsupervised Topic-Mixture Language Model Adaptation Ekapol Chuangsuwanich 1, Shinji Watanabe 2, Takaaki Hori 2, Tomoharu.
Higher Grade / Intermediate 2 Skills & Techniques.
CHAPTER 7 COMPARING TWO MEANS Difference between two groups.
1 Update on WordWave Fisher Transcription Owen Kimball, Chia-lin Kao, Jeff Ma, Rukmini Iyer, Rich Schwartz, John Makhoul.
Recurrent neural network based language model Tom´aˇs Mikolov, Martin Karafia´t, Luka´sˇ Burget, Jan “Honza” Cˇernocky, Sanjeev Khudanpur INTERSPEECH 2010.
ISL Meeting Recognition Hagen Soltau, Hua Yu, Florian Metze, Christian Fügen, Yue Pan, Sze-Chen Jou Interactive Systems Laboratories.
1 Broadcast News Segmentation using Metadata and Speech-To-Text Information to Improve Speech Recognition Sebastien Coquoz, Swiss Federal Institute of.
Performance Comparison of Speaker and Emotion Recognition
Fita Ariyana Rombel 7 (Thursday 9 am).
Copyright © 2013 by Educational Testing Service. All rights reserved. Evaluating Unsupervised Language Model Adaption Methods for Speaking Assessment ShaSha.
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences Recurrent Neural Network-based Language Modeling for an Automatic.
1 Voicing Features Horacio Franco, Martin Graciarena Andreas Stolcke, Dimitra Vergyri, Jing Zheng STAR Lab. SRI International.
Multivariate Transformation. Multivariate Transformations  Started in statistics of psychology and sociology.  Also called multivariate analyses and.
APPLICATIONS OF DIRICHLET PROCESS MIXTURES TO SPEAKER ADAPTATION Amir Harati and Joseph PiconeMarc Sobel Institute for Signal and Information Processing,
Maximum Entropy techniques for exploiting syntactic, semantic and collocational dependencies in Language Modeling Sanjeev Khudanpur, Jun Wu Center for.
Gaussian Mixture Language Models for Speech Recognition Mohamed Afify, Olivier Siohan and Ruhi Sarikaya.
Machine Learning in Practice Lecture 9 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
Regression Analysis: A statistical procedure used to find relations among a set of variables B. Klinkenberg G
ASSESSING SEARCH TERM STRENGTH IN SPOKEN TERM DETECTION Amir Harati and Joseph Picone Institute for Signal and Information Processing, Temple University.
Flexible Speaker Adaptation using Maximum Likelihood Linear Regression Authors: C. J. Leggetter P. C. Woodland Presenter: 陳亮宇 Proc. ARPA Spoken Language.
Machine Learning in Practice Lecture 9 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
A Study on Speaker Adaptation of Continuous Density HMM Parameters By Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang Presented by: 陳亮宇 1990 ICASSP/IEEE.
H ADVANCES IN MANDARIN BROADCAST SPEECH RECOGNITION Overview Goal Build a highly accurate Mandarin speech recognizer for broadcast news (BN) and broadcast.
Yandell – Econ 216 Chap 15-1 Chapter 15 Multiple Regression Model Building.
Detecting Prosody Improvement in Oral Rereading
CHAPTER 29: Multiple Regression*
Progress Report of Sphinx in Summer 2004 (July 1st to Aug 31st )
Jun Wu and Sanjeev Khudanpur Center for Language and Speech Processing
Anthor: Andreas Tsiartas, Prasanta Kumar Ghosh,
Speaker Identification:
Partners: ______________ & _____________
Presentation transcript:

Error Analysis: Indicators of the Success of Adaptation Arindam Mandal, Mari Ostendorf, & Ivan Bulyko University of Washington

Goals and Motivation Compare WER1 and WER2 Identify situations where MLLR breaks (then later try to improve these cases) Understand the role of WER vs. speaker characteristics new hyp 1 st pass recognizer MLLR 2nd pass recognizer speech hyp + conf WER1 WER2

Some Possible Analyses Predict relative WER reduction at the speaker level: (WER1-WER2)/WER1 Predict relative WER reduction at the utterance level Predict error correction (and introduction) at the word level Our current focus is the speaker level.

Experimental Paradigm Data Train: Eval Eval03 (Fisher set) [472 speakers] Test: Eval03 (swbd set) [72 speakers] System: SRI Aug 03, mfcc sub-system 1 st pass: phone-loop adapt + rescore* 2 nd pass: unsupervised MLLR + rescore* * Rescoring with 4-gram LM, duration model, pron model & pause model

WER Gains for Different Speakers More than 15% of the speakers take a hit when we use MLLR!

Prediction Approach Model: Multiple linear regression Mixed forward & backward feature selection using cross-validation Features Various functions of confidence before/after rescore Mean and variance of rate of speech for this speaker Speaker VTL warp factor F0 mean/sigma, F0 variation Energy mean/sigma, energy variation

Results Mean % change for test data = 6.9 RMSE predicting % change in WER Features RMSE None (use training mean) 8.72 WER (oracle) 8.65 Confidence features 8.07 Confidence + f0 + rate 8.06

Results (cont.) Since overall prediction is not great, we looked at the “goats” (negative % change) Regression prediction does much worse for these speakers! (9.2 vs. 7.9 rmse) Can we predict “sheep vs. goats”? Somewhat better than WER change… 33% error relative to 50% chance

Summary Some observations Confidence is the single most important factor; WER alone does not explain changes Easier to predict categories of speakers than relative WER change Future work: New acoustic-prosodic features Analyses aimed at how to use confidence in adaptation