8.0 Search Algorithms for Speech Recognition
References: 1. 12.1-12.5 of Huang; or 2. 7.2-7.6 of Becchetti; or 3. 5.1-5.7, 6.1-6.5 of Jelinek; 4. "Progress in Dynamic Programming Search for LVCSR (Large Vocabulary Continuous Speech Recognition)", Proceedings of the IEEE, Aug. 2000


Basic Approach for Large Vocabulary Speech Recognition: A Simplified Block Diagram
Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence
–Acoustic Models trained from Speech Corpora (Acoustic Model Training); Language Model constructed from Text Corpora (Language Model Construction); Lexicon links acoustic units to words
Example: Input Sentence "this is speech"
–Acoustic Models: (th-ih-s-ih-z-s-p-iy-ch)
–Lexicon: (th-ih-s) → this, (ih-z) → is, (s-p-iy-ch) → speech
–Language Model: (this) – (is) – (speech), P(this) P(is | this) P(speech | this is)
–P(w_i | w_{i-1}): bi-gram language model, P(w_i | w_{i-1}, w_{i-2}): tri-gram language model, etc. (see the scoring sketch below)
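A minimal scoring sketch in Python for the bi-gram factorization above; the probability values are purely illustrative placeholders, not numbers from the lecture.

```python
import math

# Hypothetical bi-gram probabilities, for illustration only.
unigram = {"this": 0.01}
bigram = {("this", "is"): 0.20, ("is", "speech"): 0.005}

def sentence_log_prob(words):
    """log P(w_1...w_n) = log P(w_1) + sum_i log P(w_i | w_(i-1)) under a bi-gram model."""
    logp = math.log(unigram[words[0]])
    for prev, cur in zip(words, words[1:]):
        logp += math.log(bigram[(prev, cur)])
    return logp

print(sentence_log_prob(["this", "is", "speech"]))   # bi-gram approximation of P(this is speech)
```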

DTW and Dynamic Programming
Dynamic Time Warping (DTW)
–well-accepted pre-HMM approach
–finds an optimal path for matching two templates with different lengths
–good for small-vocabulary isolated-word recognition even today
Test Template [y_j, j = 1, 2, ..., N] and Reference Template [x_i, i = 1, 2, ..., M]
–warping both templates to a common length L
–warping functions: f_x(m) = i, f_y(m) = j, m = 1, 2, ..., L
–endpoint constraints: f_x(1) = f_y(1) = 1, f_x(L) = M, f_y(L) = N
–monotonic constraints: f_x(m+1) ≥ f_x(m), f_y(m+1) ≥ f_y(m)
–matching pair: (x_{f_x(m)}, y_{f_y(m)}) for every m, m: index for matching pairs
–recursive relationship (one typical form): D(i, j) = d(x_i, y_j) + min[ D(i-1, j), D(i-1, j-1), D(i, j-1) ]
–global constraints/local constraints
–lack of a good approach to train a good reference pattern
Dynamic Programming
–replacing the problem by a smaller sub-problem and formulating an iterative procedure
Reference: 4.7 up to ... of Rabiner and Juang
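A minimal DTW sketch in Python under the constraints listed above, assuming the two templates are sequences of feature vectors and using Euclidean local distance; this is an illustrative implementation, not the lecture's exact local-constraint choice.

```python
import numpy as np

def dtw_distance(x, y):
    """Accumulated distance D(M, N) between reference x (M frames) and test y (N frames).
    Recursion: D(i, j) = d(x_i, y_j) + min[D(i-1, j), D(i-1, j-1), D(i, j-1)]."""
    M, N = len(x), len(y)
    D = np.full((M, N), np.inf)
    for i in range(M):
        for j in range(N):
            d = np.linalg.norm(np.asarray(x[i]) - np.asarray(y[j]))  # local distance of matching pair
            if i == 0 and j == 0:
                D[i, j] = d                                          # endpoint constraint f_x(1) = f_y(1) = 1
                continue
            preds = []
            if i > 0:            preds.append(D[i - 1, j])
            if j > 0:            preds.append(D[i, j - 1])
            if i > 0 and j > 0:  preds.append(D[i - 1, j - 1])
            D[i, j] = d + min(preds)                                 # monotonic local constraints
    return D[M - 1, N - 1]                                           # endpoint constraint f_x(L) = M, f_y(L) = N

print(dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [0.9], [1.1], [2.0]]))
```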

DTW programming (figure: DTW grid with the reference template on one axis and the test template on the other)

DTW

f_x(m) = i, f_y(m) = j, m = 1, 2, ..., L
f_x(1) = f_y(1) = 1, f_x(L) = M, f_y(L) = N
f_x(m+1) ≥ f_x(m), f_y(m+1) ≥ f_y(m)

Viterbi Algorithm (P.21 of 4.0)
δ_t(i) = max over q_1, q_2, ..., q_{t-1} of P[q_1, q_2, ..., q_{t-1}, q_t = i, o_1, o_2, ..., o_t | λ]
δ_{t+1}(j) = [ max_i δ_t(i) a_ij ] · b_j(o_{t+1})
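A minimal Viterbi sketch in Python for the recursion above, assuming the state-observation likelihoods b_j(o_t) have already been evaluated into a matrix B (illustrative layout, not the lecture's code).

```python
import numpy as np

def viterbi(A, B, pi):
    """A[i, j] = a_ij, B[j, t] = b_j(o_t) precomputed, pi[i] = initial state probability.
    Returns the single best state sequence and its probability."""
    n, T = B.shape
    delta = np.zeros((T, n))
    psi = np.zeros((T, n), dtype=int)                  # backtrack pointers
    delta[0] = pi * B[:, 0]
    for t in range(1, T):
        for j in range(n):
            scores = delta[t - 1] * A[:, j]            # delta_t(i) * a_ij for all i
            psi[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[psi[t, j]] * B[j, t]  # [max_i delta_t(i) a_ij] * b_j(o_{t+1})
    best_last = int(np.argmax(delta[T - 1]))
    path = [best_last]
    for t in range(T - 1, 0, -1):                      # backtrack through psi
        path.append(int(psi[t, path[-1]]))
    return list(reversed(path)), float(delta[T - 1, best_last])
```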

Viterbi Algorithm (P.22 of 4.0)

Continuous Speech Recognition Example: Digit String Recognition ― One-stage Search
–Unknown Number of Digits
–No Lexicon/Language Model Constraints
–Search over a 3-dim Grid (time frames × digit models × HMM states within each model)
–Switched to the First State of the Next Model at the End of the Previous Model
–May Result in Substitution, Deletion and Insertion Errors

Recognition Errors
–Reference word sequence: T words
–Recognized word sequence may contain insertions (I), substitutions (S) and deletions (D)
–Accuracy = (T - S - D - I) / T
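A minimal sketch of counting S, D and I by aligning the recognized string against the reference with an edit-distance dynamic program; the word strings below are made-up examples.

```python
def error_counts(ref, hyp):
    """Return (S, D, I) from a minimum-edit-distance alignment of reference and recognized words."""
    T, N = len(ref), len(hyp)
    cost = [[0] * (N + 1) for _ in range(T + 1)]
    for i in range(T + 1): cost[i][0] = i                       # all deletions
    for j in range(N + 1): cost[0][j] = j                       # all insertions
    for i in range(1, T + 1):
        for j in range(1, N + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    i, j, S, D, I = T, N, 0, 0, 0                               # backtrack to classify errors
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            S += int(ref[i - 1] != hyp[j - 1]); i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            D += 1; i -= 1
        else:
            I += 1; j -= 1
    return S, D, I

ref = "one two three four".split()
hyp = "one too three three four five".split()
S, D, I = error_counts(ref, hyp)
print("Accuracy =", (len(ref) - S - D - I) / len(ref))
```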

Continuous Speech Recognition Example: Digit String Recognition ― Level-Building
–Known Number of Digits
–No Lexicon/Language Model Constraints
–Higher Computation Complexity, No Deletion/Insertion
–number of levels = number of digits in an utterance
–automatic transition from the last state of the previous model to the first state of the next model

Time (Frame)-Synchronous Viterbi Search for Large-Vocabulary Continuous Speech Recognition
MAP Principle
–W* = arg max_W P(W | X) = arg max_W P(X | W) P(W) (acoustic model score × language model score)
An Approximation
–the word sequence with the highest probability for the most probable state sequence usually has approximately the highest probability over all state sequences
–Viterbi search, a sub-optimal approach
Viterbi Search ― Dynamic Programming
–replacing the problem by a smaller sub-problem and formulating an iterative procedure
–time (frame)-synchronous: the best score at time t is updated from all states at time t-1
Tree Lexicon as the Basic Working Structure (arcs from the HMMs, word transitions from the Language Model; see the sketch below)
–each arc is an HMM (phoneme, tri-phone, etc.)
–each leaf node is a word
–search effort for a segment of the utterance can be shared through units common to different words
–search space constrained by the lexicon
–the same tree copy reproduced at each leaf node in principle

Basic Problem 2 for HMM (P.24 of 4.0)
Application Example of Viterbi Algorithm ― Isolated word recognition
–score the observation sequence against each word model λ_i, 1 ≤ i ≤ n, and choose the word whose model gives the highest score
–Basic Problem 1, Forward Algorithm: P(O | λ_i) summed over all paths
–Basic Problem 2, Viterbi Algorithm: probability of the single best path
–the model with the highest probability for the most probable path usually also has the highest probability over all possible paths
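A minimal sketch of this decision rule: the forward score sums over all state paths, and the word with the best-scoring model is returned (the viterbi() sketch above could be substituted for the single-best-path variant); array layouts are illustrative.

```python
import numpy as np

def forward_score(A, B, pi):
    """P(O | lambda) summed over all state paths: alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] b_j(o_t)."""
    n, T = B.shape
    alpha = pi * B[:, 0]
    for t in range(1, T):
        alpha = (alpha @ A) * B[:, t]
    return float(alpha.sum())

def recognize_isolated_word(models, likelihoods):
    """models: list of (A, pi) per word; likelihoods: per-word matrices B[j, t] = b_j(o_t)."""
    scores = [forward_score(A, B, pi) for (A, pi), B in zip(models, likelihoods)]
    return int(np.argmax(scores))                                # index of the recognized word
```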

Tree Lexicon (figure: search space with HMM states on the vertical axis and time on the horizontal axis, observations o_1 o_2 ... o_t ... o_T)

Time (Frame)-Synchronous Viterbi Search for Large-Vocabulary Continuous Speech Recognition
Define Key Parameters
–D(t, q_t, w): objective function for the best partial path ending at time t in state q_t for the word w
–h(t, q_t, w): backtrack pointer to the previous state at the previous time when the best partial path ends at time t in state q_t for the word w
Intra-word Transition ― HMM only, no Language Model
Inter-word Transition ― Language Model only, no HMM (bi-gram as an example)
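A minimal sketch of the two update steps in the notation above, assuming a bi-gram language model; the data layouts (a score array per word, a dictionary of word-final scores) are illustrative simplifications of a real tree-lexicon decoder.

```python
import numpy as np

def intra_word_step(D_prev, A, b_t):
    """D(t, q, w) = max_q' D(t-1, q', w) a_{q'q} b_q(o_t); h holds the backtrack pointers.
    D_prev: scores over states of word w at t-1, A: transition matrix, b_t: b_q(o_t) per state."""
    scores = D_prev[:, None] * A                        # scores[q', q]
    h = scores.argmax(axis=0)
    return scores.max(axis=0) * b_t, h

def inter_word_step(final_scores, bigram, word):
    """Score for entering word w at time t: max_v D(t, q_f(v), v) * P(w | v).
    final_scores: {v: score at word-final state q_f(v)}, bigram: {(v, w): P(w | v)}."""
    best_v = max(final_scores, key=lambda v: final_scores[v] * bigram.get((v, word), 0.0))
    return final_scores[best_v] * bigram.get((best_v, word), 0.0), best_v
```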

Time-Synchronous Viterbi Search (figure: accumulating D(t, q_t, w) over observations o_t and states q_t within word w)

(figure: inter-word transition at the word-final state q_f(v) at time t)

Time (Frame)-Synchronous Viterbi Search for Large-Vocabulary Continuous Speech Recognition
Beam Search
–at each time t only a subset of promising paths is kept
–example 1: define a beam width L (i.e. keep only the L best paths at each time)
–example 2: define a threshold Th (i.e. all paths with D < D_max,t - Th are deleted)
–very helpful in reducing the search space (see the pruning sketch below)
Two-pass Search (or Multi-pass Search)
–use less knowledge or fewer constraints (e.g. acoustic models with less context dependency or language models of lower order) in the first pass, then more knowledge or more constraints for rescoring in the second pass
–search space significantly reduced by decoupling the complicated search process into simpler processes
N-best List and Word Graph (Lattice)
–similarly constructed with dynamic programming iterations
–(figure: utterance X → N-best list or word graph generation → N-best list / word graph of word hypotheses W_1 ... W_10 along the time axis → rescoring → output W)
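A minimal sketch of the two beam-pruning rules above, applied to a dictionary of active partial-path scores at time t; the path names and log-domain scores are made up for illustration.

```python
def prune_by_width(active, L):
    """Example 1: keep only the L best partial paths at each time t."""
    return dict(sorted(active.items(), key=lambda kv: kv[1], reverse=True)[:L])

def prune_by_threshold(active, Th):
    """Example 2: delete every path whose score D falls below D_max,t - Th."""
    d_max = max(active.values())
    return {p: d for p, d in active.items() if d >= d_max - Th}

active = {"path_a": -120.0, "path_b": -150.0, "path_c": -310.0}   # log-domain scores
print(prune_by_width(active, 2))        # {'path_a': -120.0, 'path_b': -150.0}
print(prune_by_threshold(active, 100))  # path_c falls more than 100 below the best and is dropped
```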

Some Search Algorithm Fundamentals
An Example: a city-traveling problem
–S: starting city, G: goal city; find the minimum-distance path from S to G
–(figure: road map over cities S, A, B, C, D, E, F, G with inter-city distances, and the corresponding search tree (graph) rooted at S)
Blind Search Algorithms
–Depth-first Search: pick an arbitrary alternative and proceed
–Breadth-first Search: consider all nodes on the same level before going to the next level
–no sense of where the goal is
Heuristic Search
–Best-first Search: based on some knowledge, or "heuristic information" (see the sketch below)
 f(n) = g(n) + h*(n)
 g(n): distance traveled up to node n
 h*(n): heuristic estimate of the remaining distance from n to G (e.g. the straight-line distance)
–heuristic pruning
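A minimal best-first search sketch for the minimum-distance city problem: candidates are kept in a priority queue ordered by f(n) = g(n) + h*(n), and the best one is expanded first. The graph edges and straight-line estimates below are placeholders, not the figure's actual numbers.

```python
import heapq

def best_first_search(graph, h, start, goal):
    """graph[n] = [(neighbor, edge_distance), ...]; h[n] = heuristic estimate of remaining distance."""
    frontier = [(h[start], 0.0, start, [start])]        # entries are (f, g, node, path)
    while frontier:
        f, g, node, path = heapq.heappop(frontier)      # expand the candidate with the smallest f(n)
        if node == goal:
            return path, g
        for nxt, dist in graph[node]:
            heapq.heappush(frontier, (g + dist + h[nxt], g + dist, nxt, path + [nxt]))
    return None, float("inf")

# Placeholder map: distances and heuristics are illustrative only.
graph = {"S": [("A", 3.0), ("B", 4.0)], "A": [("G", 6.0)], "B": [("G", 4.0)], "G": []}
h = {"S": 6.0, "A": 5.0, "B": 3.0, "G": 0.0}
print(best_first_search(graph, h, "S", "G"))            # (['S', 'B', 'G'], 8.0)
```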

Heuristic Search: Another Example
Problem: find the path with the highest score from the root node "A" to some leaf node (one of "L1", "L2", "L3", "L4")
–(figure: search tree with root A, internal nodes B-G and leaf nodes L1-L4, plus a table of g(n), h*(n) and f(n) for each node)
List of Candidate Steps (the top candidate, with the highest f(n), is expanded at each step; values in parentheses are f(n)):
Top      | Candidate List
A(15)    | A(15)
C(15)    | C(15), B(13), D(7)
G(14)    | G(14), B(13), F(9), D(7)
B(13)    | B(13), L3(12), F(9), D(7)
L3(12)   | L3(12), E(11), F(9), D(7) → L3 is a leaf, so the search stops here

A* Search and Speech Recognition
Admissibility
–a search algorithm is admissible if it is guaranteed that the first solution found is optimal, if one exists (for example, beam search is NOT admissible)
It can be shown
–for a highest-score problem, the heuristic search is admissible if h*(n) ≥ h(n) for all n, where h(n) is the true best remaining score from node n (i.e. the heuristic never underestimates what remains)
–it is called A* search when the above is satisfied
Procedure
–keep a list of next-step candidates, and proceed with the one with the highest f(n) (for a highest-score problem)
A* Search in Speech Recognition
–example 1: use weak constraints in the first pass to generate heuristic estimates for the later pass of a multi-pass search
–example 2: estimated average score per frame as the heuristic information