A word graph algorithm for large vocabulary continuous speech recognition
Stefan Ortmanns, Hermann Ney, Xavier Aubert
Presented by Bang-Xuan Huang, Department of Computer Science & Information Engineering, National Taiwan Normal University

Outline
- Introduction
- Review of the integrated method
- Word graph method
- Experimental results

Introduction
Why is LVCSR difficult? This paper describes a method for constructing a word graph (or lattice) for large-vocabulary continuous speech recognition. The advantage of a word graph is that the final search at the word level with a complicated language model can then be carried out over a small set of word alternatives rather than the full search space. The difficulty in efficiently constructing a good word graph is the following: the start time of a word depends, in general, on the predecessor words.
Integrated method:
- Word-conditioned lexical tree search algorithm
- bigram
- trigram

Review of the integrated method (1/4)

Review of the integrated method (2/4)
Within words, the dynamic programming recursion of the word-conditioned lexical tree search is

$$Q_v(t, s) = \max_{\sigma} \{ q(x_t, s \mid \sigma) \cdot Q_v(t-1, \sigma) \}, \qquad \sigma_v^{\max}(t, s) = \operatorname*{argmax}_{\sigma} \{ q(x_t, s \mid \sigma) \cdot Q_v(t-1, \sigma) \},$$

where $\sigma_v^{\max}(t, s)$ is the optimum predecessor state for the hypothesis $(t, s)$ and predecessor word $v$, and $q(x_t, s \mid \sigma)$ is the product of the transition and emission probabilities of the hidden Markov models used for the context-dependent or context-independent phonemes. The start time is passed along the best path by the back pointers, $B_v(t, s) = B_v(t-1, \sigma_v^{\max}(t, s))$. At word boundaries, we have to find the best predecessor word $v$ for each word $w$. To this purpose, we define

$$H(w; t) = \max_{v} \{ p(w \mid v) \cdot Q_v(t, S_w) \},$$

where the state $S_w$ denotes the terminal state of word $w$ in the lexical tree.
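A minimal sketch of this recursion in Python, working in the log domain so that products of probabilities become sums. All names here (`log_q`, `predecessors`, `terminal_state`, `histories`) are hypothetical placeholders for illustration, not the paper's notation:

```python
NEG_INF = float("-inf")

def dp_step(Q_prev, B_prev, t, log_q, states, predecessors, histories):
    """One time frame of the word-conditioned tree search (log domain).

    Q_prev[(v, s)]: best log score at frame t-1 for HMM state s in the
                    tree copy conditioned on predecessor word v.
    B_prev[(v, s)]: start time (back pointer) of that hypothesis.
    log_q(t, s, sigma): log transition * emission probability.
    """
    Q, B = {}, {}
    for v in histories:                      # one virtual tree copy per predecessor word
        for s in states:
            best, best_sigma = NEG_INF, None
            for sigma in predecessors[s]:    # maximize over predecessor states sigma
                cand = Q_prev.get((v, sigma), NEG_INF) + log_q(t, s, sigma)
                if cand > best:
                    best, best_sigma = cand, sigma
            if best_sigma is not None:
                Q[(v, s)] = best                     # Q_v(t, s)
                B[(v, s)] = B_prev[(v, best_sigma)]  # propagate the start time

    return Q, B

def recombine(Q, log_bigram, terminal_state, words, histories):
    """H(w; t) = max_v { log p(w|v) + Q_v(t, S_w) }, with the argmax over v."""
    H, best_pred = {}, {}
    for w in words:
        S_w = terminal_state[w]
        for v in histories:
            cand = Q.get((v, S_w), NEG_INF) + log_bigram(w, v)
            if cand > H.get(w, NEG_INF):
                H[w], best_pred[w] = cand, v
    return H, best_pred
```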

Review of the integrated method (3/4)
To propagate a path hypothesis into the lexical tree hypotheses, or to start them up if they do not exist yet, we have to pass on the score and the time index before processing the hypotheses for time frame t:

$$Q_v(t-1, s=0) = H(v; t-1), \qquad B_v(t-1, s=0) = t-1,$$

where s = 0 denotes the virtual root state of the tree copy conditioned on predecessor word v.
- Garbage collection
- Pruning techniques and language model look-ahead
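Continuing the same hypothetical sketch, the start-up step simply copies the score and the time index into the root state of each tree copy:

```python
ROOT = 0  # virtual root state s = 0 of every tree copy

def start_up(Q, B, H, t_minus_1):
    """Q_v(t-1, s=0) = H(v; t-1) and B_v(t-1, s=0) = t-1."""
    for v, score in H.items():
        Q[(v, ROOT)] = score
        B[(v, ROOT)] = t_minus_1
    return Q, B
```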

Review of the integrated method (4/4)
Extension to trigram language models: instead of one tree copy per predecessor word v, a tree copy is needed for each two-word history (u, v), so the hypotheses become Q_{uv}(t, s) and the recombination at word boundaries uses p(w | u, v). This enlarges the search space accordingly.

Word graph method
The fundamental problem of word graph construction: hypothesizing a word w and its ending time t, how can we find a limited number of "most likely" predecessor words?
[Figure: a predecessor word v ending at a boundary time, followed by word w ending at time t]

Word-Pair Approximation: for each path hypothesis, the position of the word boundary between the last two words is independent of the other words of this path hypothesis.
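Stated formally (the boundary-time notation $\tau$ below is assumed here for illustration): for a path hypothesis ending in the word sequence $\ldots u\,v\,w$ at time $t$, the approximation asserts

$$\tau(\ldots,\, u,\, v,\, w;\; t) \;=\; \tau(v, w;\, t),$$

i.e. the boundary between $v$ and $w$ depends only on the word pair $(v, w)$ and the end time $t$, so one boundary time per word pair suffices when building the graph.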

Word graph generation algorithm
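The algorithm itself appeared as a figure on the original slide. A minimal sketch of the core bookkeeping in Python (continuing the hypothetical names above): at each word end, the word-pair approximation lets us record one graph edge per predecessor v and word w, with boundary time tau = B_v(t, S_w).

```python
def record_word_ends(graph_edges, Q, B, t, terminal_state, words, histories):
    """Collect word graph edges at frame t.

    For every word w and predecessor v with an active hypothesis at the
    terminal state S_w, store the edge (tau, t, w) together with its
    path score; tau = B_v(t, S_w) by the word-pair approximation.
    """
    for w in words:
        S_w = terminal_state[w]
        for v in histories:
            score = Q.get((v, S_w), NEG_INF)
            if score == NEG_INF:
                continue
            tau = B[(v, S_w)]  # boundary time between v and w
            # The per-word score h(w; tau, t) can be recovered by
            # subtracting the path score of v up to tau (not tracked here).
            graph_edges.append((tau, t, w, v, score))
    return graph_edges
```

A node in the resulting graph is then a (word, boundary-time) pair, and the final search with a more complex language model runs over this much smaller graph instead of the full search space.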

Word-conditioned Search
- A virtual tree copy is explored for each active LM history
- More complicated in HMM state manipulation (dependent on the LM complexity)
Time-conditioned Search
- A virtual tree copy is entered at each time frame by the word end hypotheses of that time
- More complicated in LM-level recombination

Garbage collection
For large-vocabulary recognition, it is essential to keep the storage costs as low as possible. To reduce the memory requirements, back pointers and traceback arrays are used; the traceback arrays record the decisions about the best predecessor word for each word start-up. To remove obsolete hypothesis entries from the traceback arrays, we apply a garbage collection or purging method. In principle, this garbage collection process could be performed every time frame, but to reduce the overhead it is sufficient to perform it at regular intervals, say every 50th time frame.
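A minimal sketch of such a purging step, assuming a simple list-based traceback array (all names hypothetical): entries still reachable from an active hypothesis are marked, then the array is compacted and the surviving indices remapped.

```python
def purge_traceback(traceback, active_back_pointers):
    """Compact the traceback array, keeping only entries reachable
    from the currently active hypotheses.

    traceback: list of entries (word, boundary_time, parent_index),
               where parent_index is None for sentence-initial entries.
    active_back_pointers: indices referenced by live hypotheses.
    Returns the compacted array and a map old_index -> new_index.
    """
    # Mark phase: follow the parent chain from every live hypothesis.
    keep, stack = set(), list(active_back_pointers)
    while stack:
        i = stack.pop()
        if i is None or i in keep:
            continue
        keep.add(i)
        stack.append(traceback[i][2])  # parent_index
    # Compact phase: parents were created before their children, so a
    # sorted pass can remap parent indices on the fly.
    remap, compacted = {}, []
    for i in sorted(keep):
        remap[i] = len(compacted)
        word, tau, parent = traceback[i]
        compacted.append((word, tau, None if parent is None else remap[parent]))
    return compacted, remap
```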

Pruning techniques and language model look-ahead
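The original slide presented this material as a figure. As a hedged illustration of the two ideas (hypothetical names, not necessarily the paper's exact formulation): beam pruning discards hypotheses whose score falls too far below the current best, and LM look-ahead folds into each tree node's score the best language model probability of any word reachable below that node, so that unpromising branches are pruned earlier.

```python
def lm_lookahead(tree_children, node_words, log_lm, v, node=0):
    """pi_v(s): best log probability p(w | v) over all words w reachable
    from tree node s, by a simple recursive DFS (for clarity, not speed;
    a real system would precompute and cache these per history v)."""
    best = max((log_lm(w, v) for w in node_words.get(node, [])), default=NEG_INF)
    for child in tree_children.get(node, []):
        best = max(best, lm_lookahead(tree_children, node_words, log_lm, v, child))
    return best

def beam_prune(Q, beam):
    """Keep only hypotheses within `beam` (log domain) of the best one."""
    best = max(Q.values(), default=NEG_INF)
    return {k: s for k, s in Q.items() if s >= best - beam}
```

In the pruning step, each hypothesis would be compared using Q_v(t, s) plus its look-ahead score pi_v(s) rather than Q alone, which tightens the beam without changing the final result for the surviving paths.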