An Evaluation of Lattice Scoring using a Smoothed Estimate of Word Accuracy
Mohamed Kamel Omar and Lidia Mangu
IBM T.J. Watson Research Center
ICASSP 2007

Outline
- Introduction
- Problem formulation
- Implementation
  - MSWA algorithm
  - MSWA-CN algorithm
- Experiments and results
- Conclusions

Introduction
- In ASR systems, the maximum a posteriori probability is the standard decoding criterion.
  - It minimizes an estimate of the sentence-level error.
  - This is inconsistent with the word-level evaluation metrics of ASR.
- The motivation of this paper:
  - Select a hypothesis which minimizes an estimate of the word error rate over the hypothesis lattice.
  - Avoid the computational infeasibility of calculating the pair-wise Levenshtein distance between every two possible paths.

Introduction
- In LVCSR systems, word lattices are commonly used as a compact representation of the alternative hypotheses.
- However, calculating pair-wise word error rates for the different hypotheses in a lattice is computationally infeasible.
- Related work:
  - [L. Mangu et al., 2000] Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
  - [V. Goel et al., 2006] Segmental Minimum Bayes-Risk Decoding for Automatic Speech Recognition Systems
  - [F. Wessel et al., 2001] Explicit Word Error Minimization Using Word Hypothesis Posterior Probabilities
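For concreteness, the quantity these methods approximate at the lattice level is the exact word accuracy between two individual word sequences. The minimal sketch below (standard dynamic-programming Levenshtein alignment, not code from the paper) shows that baseline computation, whose pair-wise application over all lattice paths is what becomes infeasible.

```python
# Illustrative sketch (not from the paper): exact word accuracy of a hypothesis
# against a reference via word-level Levenshtein alignment.
def word_accuracy(hyp, ref):
    """Return 1 - (edit distance / reference length) for two word sequences."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return 1.0 - d[n][m] / max(n, 1)

print(word_accuracy("the cat sat".split(), "the cat sat down".split()))  # 0.75
```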

Formulation
Given two lattices, a measure of the word accuracy of the hypothesis lattice with respect to the reference lattice can be approximated by

  E_{H,R|Y}[ Acc(H, R) ] ≈ Σ_H Σ_R P(H|Y) P(R|Y) A(H, R)    (1)

where
- E_{H,R|Y}[·] is the expected value over the joint probability mass function (PMF) of the hypothesis word sequence H and the reference word sequence R, given the observation vector Y
- Acc(H, R) is the word accuracy of H with respect to R
- P(R|Y) is the posterior probability of the reference string, estimated from the reference lattice
- P(H|Y) is the posterior probability of the hypothesis string, estimated from the hypothesis lattice
- A(H, R) is a smoothed approximation of the word accuracy of H with respect to R, which takes phonetic similarity into consideration
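A minimal sketch of the double sum in eq. (1), assuming each lattice has been collapsed to an N-best list of (word sequence, posterior) pairs; the paper evaluates this directly on the lattices, and smoothed_accuracy is only a stand-in for A(H, R).

```python
def lattice_swa(hyp_nbest, ref_nbest, smoothed_accuracy):
    """Eq. (1): sum over H and R of P(H|Y) * P(R|Y) * A(H, R).
    hyp_nbest / ref_nbest: lists of (word_sequence, posterior) pairs."""
    return sum(p_hyp * p_ref * smoothed_accuracy(hyp, ref)
               for hyp, p_hyp in hyp_nbest
               for ref, p_ref in ref_nbest)
```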

Maximum Smoothed Word Accuracy Approach
- Our goal is to select the word sequence in the hypothesis lattice which maximizes the estimate of the word accuracy (MSWA):

  H* = argmax_{H in the hypothesis lattice} Σ_R P(R|Y) A(H, R)    (2)

- This word sequence can be estimated using the Viterbi algorithm.
- Alternatively, we can assign to each word arc w in the hypothesis lattice the conditional value of the objective function in eq. (1) given this word arc:

  S(w) = E_{H,R|Y}[ Acc(H, R) | w ∈ H ]    (3)
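As an illustration of the selection rule in eq. (2), the sketch below picks, from an N-best approximation of the hypothesis lattice, the word sequence with the largest expected smoothed accuracy against the reference lattice; the paper instead performs this search with the Viterbi algorithm over the lattice itself.

```python
def mswa_decode(hypotheses, ref_nbest, smoothed_accuracy):
    """hypotheses: candidate word sequences from the hypothesis lattice (N-best list).
    ref_nbest: list of (reference word sequence, posterior) pairs."""
    def expected_acc(hyp):
        # Eq. (2) objective for one hypothesis: sum_R P(R|Y) * A(hyp, R)
        return sum(p_ref * smoothed_accuracy(hyp, ref) for ref, p_ref in ref_nbest)
    return max(hypotheses, key=expected_acc)
```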

Smoothed Estimate of Word Accuracy
- The measure approximates the accuracy of a word arc in the hypothesis lattice with respect to a path in the reference lattice that may start or end in the middle of a reference arc, as long as it coincides with the word arc in time.
[Figure: presenter's example of a hypothesis lattice and a reference lattice over Chinese words (e.g. 颱風 vs. 台灣), with phone-level decompositions such as t_a ai f_e eng, illustrating how phonetic similarity gives partial credit.]
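The transcript does not preserve the exact definition of the smoothed arc accuracy, so the sketch below is only a plausible stand-in: it gives a hypothesis word arc full credit for an exact match with a time-overlapping reference arc and partial credit based on phone overlap. The field names and the scoring rule are assumptions, not the paper's formula.

```python
def arc_accuracy(hyp_arc, ref_arc):
    """hyp_arc / ref_arc: dicts with keys 'word', 'phones', 'start', 'end' (seconds).
    Hypothetical smoothed accuracy of one hypothesis arc against one reference arc."""
    overlap = min(hyp_arc["end"], ref_arc["end"]) - max(hyp_arc["start"], ref_arc["start"])
    if overlap <= 0:
        return 0.0                      # no time overlap, no credit
    if hyp_arc["word"] == ref_arc["word"]:
        return 1.0                      # exact word match
    shared = len(set(hyp_arc["phones"]) & set(ref_arc["phones"]))
    total = max(len(hyp_arc["phones"]), len(ref_arc["phones"]), 1)
    return shared / total               # partial credit for phonetically similar words
```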

Implementation
- Two requirements for computing the smoothed arc accuracies:
  - The forward-backward algorithm has to be applied to the reference lattice.
  - The state sequence for each arc in the reference lattice has to be known.
- In this paper, two approaches were used to estimate the hypothesis which approximately maximizes the objective function:
  - The Viterbi-based MSWA algorithm was used to estimate the word sequence according to eq. (2).
  - The MSWA-CN algorithm, which is based on the confusion network (CN) algorithm, was used to estimate the best word sequence using the conditional values in eq. (3).

Viterbi-Based MSWA Algorithm
1. Initialization: for each starting arc in the hypothesis lattice, initialize its accumulated score.
2. Forward propagation: for each non-starting arc, apply the Viterbi update equations, keeping the best-scoring predecessor.
[Figure: the same example hypothesis and reference lattices, used to illustrate the forward pass.]

Viterbi-Based MSWA Algorithm (cont.)
3. Backtracking: follow the best-predecessor pointers back from the best-scoring ending arc.
4. Set the result and exit with the output word sequence.
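A minimal sketch of the four steps above over a hypothesis lattice stored as a topologically ordered list of arcs. arc_gain stands in for the per-arc smoothed-accuracy contribution whose update equations are in the paper but lost in this transcript.

```python
def viterbi_mswa(arcs, arc_gain):
    """arcs: topologically ordered list of dicts with keys 'id', 'word', 'prev'
    (list of predecessor arc ids, empty for starting arcs) and 'is_end' (bool).
    arc_gain(arc): assumed per-arc smoothed-accuracy contribution."""
    score, back = {}, {}
    for arc in arcs:
        local = arc_gain(arc)
        if not arc["prev"]:                                # 1. initialization
            score[arc["id"]], back[arc["id"]] = local, None
        else:                                              # 2. forward propagation
            best_prev = max(arc["prev"], key=lambda p: score[p])
            score[arc["id"]] = score[best_prev] + local
            back[arc["id"]] = best_prev
    by_id = {arc["id"]: arc for arc in arcs}
    end = max((a["id"] for a in arcs if a["is_end"]), key=lambda i: score[i])
    words = []                                             # 3. backtracking
    while end is not None:
        words.append(by_id[end]["word"])
        end = back[end]
    return list(reversed(words))                           # 4. output word sequence
```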

MSWA-CN Algorithm
1. Initialization:
   - For each starting arc in the hypothesis lattice, initialize its forward score.
   - For each ending arc in the hypothesis lattice, initialize its backward score.
2. Forward propagation: apply the forward update equations to each non-starting arc.

MSWA-CN Algorithm (cont.)
3. Backward propagation: apply the backward update equations to each non-ending arc.
4. For each word arc in the hypothesis lattice, combine the forward and backward quantities into a conditional score.
- The confusion network algorithm is then used to find the best path in the hypothesis lattice, after replacing the word posterior probability used in the original algorithm with this conditional score.
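A hedged sketch of the forward/backward bookkeeping behind the conditional arc score: the combination rule (a max over partial paths here) and the data layout are assumptions, arc_gain is the same stand-in as in the Viterbi sketch, and the confusion-network construction itself is not reimplemented.

```python
def conditional_arc_scores(arcs, arc_gain):
    """arcs: topologically ordered list of dicts with keys 'id', 'prev' and 'next'
    (lists of predecessor / successor arc ids)."""
    fwd, bwd = {}, {}
    for arc in arcs:                                       # forward propagation
        prev = [fwd[p] for p in arc["prev"]]
        fwd[arc["id"]] = arc_gain(arc) + (max(prev) if prev else 0.0)
    for arc in reversed(arcs):                             # backward propagation
        nxt = [bwd[n] for n in arc["next"]]
        bwd[arc["id"]] = arc_gain(arc) + (max(nxt) if nxt else 0.0)
    # Score of the best complete path constrained to pass through each arc;
    # this plays the role of the conditional value handed to the CN decoder.
    return {arc["id"]: fwd[arc["id"]] + bwd[arc["id"]] - arc_gain(arc) for arc in arcs}
```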

Experimental Setup
- Corpus: DARPA 2004 Rich Transcription evaluation data (RT04); 2005 broadcast news Arabic test set (BNAT05)
- Two systems: an unvowelized system and a vowelized system
- Feature extraction: LDA+MLLT
- Acoustic modeling: penta-phone models with 4000K Gaussians, trained with a combination of fMPE and MPE
- Language modeling (lexicon size: 617K): 4-gram LM trained with modified Kneser-Ney smoothing

Experimental Results
- The main difference between the two systems is that the vowelized system explicitly models the short vowels, which are pronounced in Arabic but almost never transcribed.

Conclusions
- In this paper, a new smoothed word accuracy (SWA) objective function for lattice scoring has been examined.
- Two algorithms which use the SWA objective function to estimate the best hypothesis in a given lattice have been described:
  - The Viterbi-based MSWA algorithm
  - The MSWA-CN algorithm
- In the future, the authors intend to assess the usefulness of the conditional score in eq. (3) for confidence annotation.

Derivations