Sausage
Lidia Mangu, Eric Brill, Andreas Stolcke
Presenter: Jen-Wei Kuo
2004/9/24

References
CSL'00: Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
Eurospeech'99: Finding Consensus among Words: Lattice-Based Word Error Minimization
Eurospeech'97: Explicit Word Error Minimization in N-Best List Rescoring

Motivation
There is a mismatch between the standard scoring paradigm (MAP) and the evaluation metric (WER): maximizing the sentence posterior probability minimizes sentence-level error, not word-level error.
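In symbols (a standard formulation, not taken verbatim from the slides), the two criteria optimize different objectives; Lev(W', W) below is the word-level Levenshtein distance:

```latex
\hat{W}_{\mathrm{MAP}} = \arg\max_{W} P(W \mid A)
\qquad \text{vs.} \qquad
\hat{W}_{\mathrm{minWE}} = \arg\min_{W'} \sum_{W} P(W \mid A)\, \mathrm{Lev}(W', W)
```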

An Example
Correct answer: I'M DOING FINE

Word Error Minimization
Choose, from the set of potential hypotheses, the one that minimizes the expected word error under the posterior distribution.

N-best Approximation
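The expectation can be approximated by restricting both the candidate set and the posterior distribution to the N best hypotheses. A minimal sketch of this approximation (the hypothesis list and posteriors are illustrative, not from the slides):

```python
from math import inf

def levenshtein(a, b):
    """Word-level edit distance between two token lists."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[m][n]

def nbest_mbr(nbest):
    """nbest: list of (hypothesis_words, posterior) pairs.
    Returns the hypothesis minimizing expected word error over the list."""
    best, best_risk = None, inf
    for hyp, _ in nbest:
        risk = sum(p * levenshtein(hyp, ref) for ref, p in nbest)
        if risk < best_risk:
            best, best_risk = hyp, risk
    return best

nbest = [("i'm doing fine".split(), 0.4),
         ("i'm doing find".split(), 0.35),
         ("and doing fine".split(), 0.25)]
print(nbest_mbr(nbest))  # -> ["i'm", 'doing', 'fine']
```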

Lattice-Based Word Error Minimization
Computational problem: the hypothesis set encoded by a lattice is several orders of magnitude larger than N-best lists of practical size, and no efficient algorithm of this kind is known.
Fundamental difficulty: the objective function is based on pairwise string distance, a non-local measure.
Solution: replace pairwise string alignment with a modified multiple string alignment, turning word error (WE) into modified word error (MWE).

Lattice to Confusion Network: Multiple Alignment

Multiple Alignment
Finding the optimal multiple alignment is a problem for which no efficient solution is known (Gusfield, 1992); we resort to a heuristic approach based on lattice topology.

Algorithm Steps
Step 1. Arc pruning
Step 2. Same-arc clustering
Step 3. Intra-word clustering
Step 4*. Same-phones clustering
Step 5. Inter-word clustering
Step 6. Adding the null hypothesis
Step 7. Consensus-based lattice pruning

Arc Pruning
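The slide gives only the step name; below is a minimal sketch of posterior-based arc pruning, assuming each arc is a dict carrying a 'posterior' field and that arcs below a fixed fraction of the best arc's posterior are discarded (the exact criterion and data layout are assumptions, not the paper's):

```python
def prune_arcs(arcs, beam=1e-3):
    """Drop arcs whose posterior is below `beam` times the best posterior."""
    best = max(arc["posterior"] for arc in arcs)
    return [arc for arc in arcs if arc["posterior"] >= beam * best]

arcs = [{"word": "fine", "start": 30, "end": 55, "posterior": 0.52},
        {"word": "find", "start": 30, "end": 55, "posterior": 0.31},
        {"word": "vine", "start": 31, "end": 55, "posterior": 0.0001}]
print(len(prune_arcs(arcs)))  # -> 2; the "vine" arc is pruned
```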

Intra-Word Clustering
Same-arc clustering: arcs with the same word_id, start frame, and end frame are merged first.
Intra-word clustering: arcs with the same word_id are then merged.
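A minimal sketch of the same-arc merging step, reusing the arc dicts from the pruning sketch above; merging is assumed to sum posteriors (an assumption, not stated on the slide):

```python
from collections import defaultdict

def merge_identical_arcs(arcs):
    """Merge arcs sharing (word, start, end), summing their posteriors."""
    merged = defaultdict(float)
    for arc in arcs:
        merged[(arc["word"], arc["start"], arc["end"])] += arc["posterior"]
    return [{"word": w, "start": s, "end": e, "posterior": p}
            for (w, s, e), p in merged.items()]
```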

Same-Phones Clustering
Arcs with the same phone sequence are clustered at this stage.

Inter-Word Clustering
The remaining arcs are clustered at this final stage.

Adding the Null Hypothesis
For each equivalence class, if the sum of the posterior probabilities is less than a threshold (0.6), add the null hypothesis to the class.
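A minimal sketch, assuming each equivalence class is a dict with a 'words' map from word to posterior and that the null hypothesis receives the residual probability mass (both details are assumptions):

```python
def add_null_hypotheses(classes, threshold=0.6):
    """If a class covers less than `threshold` of the probability mass,
    add a null (deletion) hypothesis carrying the remainder."""
    for cls in classes:
        total = sum(cls["words"].values())
        if total < threshold:
            cls["words"]["-"] = 1.0 - total
    return classes

classes = [{"start": 30, "end": 55, "words": {"fine": 0.4, "find": 0.1}}]
add_null_hypotheses(classes)
print(classes[0]["words"]["-"])  # -> 0.5
```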

Consensus-Based Lattice Pruning
Standard method (likelihood-based): paths whose overall score differs by more than a threshold from the best-scoring path are removed from the word graph.
Proposed method (consensus-based): first construct a pruned confusion network, then intersect the original lattice with the pruned confusion network.
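A simplified sketch of the intersection idea, with the same arc and class dicts as in the earlier sketches: keep only lattice arcs whose word survives in some time-overlapping class of the pruned confusion network (this reading of "intersect" is an assumption, not the paper's exact operation):

```python
def consensus_prune_lattice(arcs, classes):
    """Keep arcs whose word appears in an overlapping pruned CN class."""
    def overlaps(arc, cls):
        return arc["start"] < cls["end"] and cls["start"] < arc["end"]
    return [arc for arc in arcs
            if any(overlaps(arc, cls) and arc["word"] in cls["words"]
                   for cls in classes)]
```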

Algorithm

An Example
Given overlapping arcs for the word sequences 我 是 ("I am") and 我 是 誰 ("Who am I"), how should the arcs be merged into aligned equivalence classes?

Computational Issues: Maintaining the Partial Order
Naive method: history-based look-ahead. Apply a first-pass search to find the history arcs of each arc, generating the initial partial ordering. Whenever clusters are merged, a lot of (recursive) update computation is needed, and with thousands of arcs this requires a lot of memory.
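A minimal sketch of building the initial partial order as reachability over the lattice's arc-precedence graph (the graph representation is an assumption); clustering may merge two classes only if this relation leaves them unordered:

```python
def build_partial_order(successors):
    """successors: dict mapping each arc id to the arc ids that directly
    follow it. Returns, for every arc, the set of all arcs reachable from
    it -- the 'precedes' relation the clustering steps must preserve."""
    order = {}
    def reach(a):
        if a not in order:
            order[a] = set()
            for b in successors.get(a, ()):
                order[a].add(b)
                order[a] |= reach(b)
        return order[a]
    for a in successors:
        reach(a)
    return order

def can_merge(order, x, y):
    """Two classes may be merged only if neither precedes the other."""
    return y not in order.get(x, set()) and x not in order.get(y, set())

order = build_partial_order({"A": {"B", "C"}, "B": {"D"}, "C": {"D"}})
print(can_merge(order, "B", "C"))  # -> True: B and C are unordered
```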

Computational Issues – An example CA If we merge B and C, what happened? JA DA KA A B FA MA A C D F GA LA NA E G J H L N I K M

Experimental Set-up
Lattices were built using HTK.
Training corpus: acoustic models trained on about 60 hours of Switchboard speech; the language model is a backoff trigram trained on 2.2 million words of Switchboard transcripts.
Testing corpus: the test set from the 1997 JHU summer workshop.

Experimental Results

Experimental Results (WER, %)

Hypothesis            F0    F1    F2    F3    F4    F5    FX    Overall  Short utt.  Long utt.
MAP                   13.0  30.8  42.1  31.0  22.8  52.3  53.9  33.1     33.3        31.5
N-best (center)       –     30.6  –     31.1  22.6  52.4  –     33.0     –           –
Lattice (consensus)   11.9  30.5  –     30.7  22.3  51.8  52.7  32.5     –           –

(– : value missing in the source)

Confusion Network Analyses

Other Approaches
ROVER (Recognizer Output Voting Error Reduction): aligns the outputs of multiple recognizers and selects words by weighted voting.