Dynamic Programming Search

Dynamic Programming Search for Continuous Speech Recognition 2004/11/3 Presenter: 陳怡婷

Problems of Concern
- Can this huge search space be handled by DP in an efficient way?
- How do we design a good DP search?
- Does DP compute only the single best sentence?

Outline
- Problems and the decision rule in speech recognition
- One-Pass DP Search Using a Linear Lexicon
- One-Pass DP Search Using a Tree Lexicon
- One-Pass DP Search for Word Graph Construction

Difficulties: boundaries of the acoustic signal and of words, speaker variability, quality of the speech signal, natural language, ...
Bayes Decision Rule (shown in Figure 1): maximize Pr(w1…wN | x1…xT), i.e. maximize Pr(w1…wN) · Pr(x1…xT | w1…wN)
Language model: syntactic, semantic, ... (for large vocabulary: a bigram or trigram model)
Acoustic-phonetic model (training, HMMs, pronunciation lexicon or dictionary)

(Figure 1: Bayes Decision Rule)
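A minimal sketch of the Bayes decision rule above, assuming toy probability tables. The names (bigram_lm, acoustic_lookup) and all scores are illustrative stand-ins, not from the slides; a real decoder never enumerates hypotheses like this but explores the space with the DP search described in the following slides.

```python
import math

def log_lm_score(words, bigram_lm):
    """Log language-model score of a word sequence under a toy bigram table."""
    score, prev = 0.0, "<s>"
    for w in words:
        score += math.log(bigram_lm.get((prev, w), 1e-10))
        prev = w
    return score

def log_acoustic_score(words, acoustic_lookup):
    """Placeholder for log Pr(x1..xT | w1..wN); a real system runs word HMMs here."""
    return sum(acoustic_lookup.get(w, -50.0) for w in words)

def bayes_decode(candidate_sequences, bigram_lm, acoustic_lookup):
    """argmax over candidate word sequences W of log Pr(W) + log Pr(X | W)."""
    return max(candidate_sequences,
               key=lambda W: log_lm_score(W, bigram_lm) + log_acoustic_score(W, acoustic_lookup))
```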

Specification of the Search Problem
- Decision on the spoken words
- Knowledge sources used in the optimization: language model, acoustic-phonetic model, pronunciation lexicon
- Way of thinking: a hypothesized word sequence corresponds to one large "super" HMM

Consider only the most probable state sequence: the Viterbi approximation. The search therefore has to be performed at two levels: the state level and the word level. Hypotheses are recombined at both levels by DP, combined with beam search.

One-Pass DP Search Using a Linear Lexicon
For a three-word vocabulary, the search space is:

Maximum approximation: assign each acoustic vector observed at time t to a (state, word) index pair: (s1, w1), …, (st, wt), …, (sT, wT)

HMMs in the word interior (top) and at word boundaries (bottom). The quantity to be maximized is Pr(w1…wt) · Pr(x1…xt, s1…st | w1…wt). Note that the unknown word sequence and the unknown state sequence are determined simultaneously by the DP algorithm.

DP Recursion. Two quantities:
Q(t, s; w): score of the best path up to time t that ends in state s of word w.
B(t, s; w): start time of the best path up to time t that ends in state s of word w.
The DP recursion handles two types of transition rules for a path: within the word interior and across word boundaries.

In the word interior:
Q(t, s; w) = max over s' of { P(xt, s | s'; w) · Q(t-1, s'; w) }
B(t, s; w) = B(t-1, s_max(t, s; w); w)
where s_max(t, s; w) is the optimum predecessor state for the hypothesis (t, s; w).
At word boundaries: H(w; t) := max over v of { P(w | v) · Q(t, S_v; v) }, where S_v denotes the terminal state of predecessor word v.
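A compact sketch of this two-level recursion for a linear lexicon, in the log domain. The HMM and bigram interfaces (hmm.states, hmm.trans, hmm.emit, hmm.final_state, bigram) are assumed stand-ins; the point is the within-word maximization over predecessor states and the word-boundary recombination H(w; t).

```python
NEG_INF = float("-inf")

def one_pass_dp_linear(T, words, hmm, bigram):
    """Time-synchronous one-pass DP over a linear lexicon (sketch, log-domain).
    Assumed interfaces: hmm.states(w) -> states of word w (excluding the virtual
    entry state 0), hmm.trans(w, s_prev, s) and hmm.emit(w, s, t) -> log scores,
    hmm.final_state(w) -> terminal state S_w, bigram[(v, w)] -> log p(w | v)."""
    Q = {}                                  # Q[(t, s, w)]: best score ending in (s, w) at time t
    B = {}                                  # B[(t, s, w)]: start time of that best path
    H = {0: {w: 0.0 for w in words}}        # word-boundary scores H[t][w]; uniform start

    for t in range(1, T + 1):
        for w in words:
            for s in hmm.states(w):
                best, best_prev = NEG_INF, None
                # word interior: Q(t,s;w) = max_{s'} { p(x_t, s | s'; w) * Q(t-1, s'; w) },
                # with s' = 0 meaning "enter the word from a boundary hypothesis"
                for s_prev in [0] + hmm.states(w):
                    prev = H[t - 1][w] if s_prev == 0 else Q.get((t - 1, s_prev, w), NEG_INF)
                    cand = prev + hmm.trans(w, s_prev, s) + hmm.emit(w, s, t)
                    if cand > best:
                        best, best_prev = cand, s_prev
                Q[(t, s, w)] = best
                B[(t, s, w)] = t - 1 if best_prev == 0 else B.get((t - 1, best_prev, w), t - 1)
        # word boundary: H(w; t) = max_v { p(w | v) * Q(t, S_v; v) }
        H[t] = {w: max(bigram.get((v, w), NEG_INF) + Q.get((t, hmm.final_state(v), v), NEG_INF)
                       for v in words)
                for w in words}
    return Q, B, H
```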

The search procedure works with a time-synchronous breadth-first strategy (Table 2). To reduce the storage requirements, a traceback array is used. To keep the search tractable, a beam search strategy is applied.

Table 2

One-Pass DP Search Using a Tree Lexicon
For large-vocabulary recognition, the lexicon is organized as a prefix tree for efficiency reasons. How should the search algorithm be formulated for such a context?

Structure of the search space (bigram language model):

DP Recursion
Qv(t, s) := score of the best partial path that ends at time t in state s of the lexical tree for predecessor word v.
Bv(t, s) := start time of the best partial path that ends at time t in state s of the lexical tree for predecessor word v.

In the word interior:
Qv(t, s) = max over s' of { P(xt, s | s') · Qv(t-1, s') }
Bv(t, s) = Bv(t-1, s_v,max(t, s))
where s_v,max(t, s) is the optimum predecessor state for the hypothesis (t, s) and predecessor word v.
At the word boundaries:
H(w; t) := max over v of { P(w | v) · Qv(t, S_w) }
Qv(t-1, s=0) = H(v; t-1)
Bv(t-1, s=0) = t-1
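A sketch of how the tree-lexicon version keeps one set of Q/B values per predecessor word v (a "tree copy") and re-seeds the root state s = 0 of each copy from the word-boundary score of that predecessor. The tree/HMM interface (tree.states, tree.predecessors, tree.score) is hypothetical.

```python
NEG_INF = float("-inf")

def tree_dp_frame(t, active, tree, H):
    """One time frame of the one-pass DP over a prefix-tree lexicon (sketch, log-domain).
    active[v] = (Q_v, B_v): dictionaries over tree states for predecessor word v.
    Assumed interfaces: tree.states(), tree.predecessors(s), and
    tree.score(s_prev, s, t) -> log p(x_t, s | s') for an HMM arc inside the tree."""
    for v, (Qv, Bv) in active.items():
        newQ, newB = {}, {}
        for s in tree.states():
            best, best_prev = NEG_INF, None
            # Q_v(t, s) = max_{s'} { p(x_t, s | s') * Q_v(t-1, s') }
            for s_prev in tree.predecessors(s):
                cand = Qv.get(s_prev, NEG_INF) + tree.score(s_prev, s, t)
                if cand > best:
                    best, best_prev = cand, s_prev
            newQ[s] = best
            newB[s] = Bv.get(best_prev, t - 1)
        # restart the root of this tree copy from the boundary score of its predecessor:
        # Q_v(t, s=0) = H(v; t), B_v(t, s=0) = t
        newQ[0], newB[0] = H[t].get(v, NEG_INF), t
        active[v] = (newQ, newB)
    return active
```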

Extension to Trigram Language Models
- The root of each tree copy is labeled with its two-word history.
- The probabilities or costs of each edge depend only on the edge itself.
- The size of the potential search space increases drastically.
- The pruning strategy becomes even more crucial.

Refinements and Implementation Issues: Pruning Refinements
- Acoustic pruning: Q_AC(t) := max over (v, s) of Qv(t, s); prune hypothesis (v, s) if Qv(t, s) < f_AC · Q_AC(t)
- Language model pruning (word-end pruning): Q_LM(t) := max over v of Qv(t, s=0); prune if Qv(t, s=0) < f_LM · Q_LM(t)
- Histogram pruning: keep only a maximum number of state hypotheses per frame
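A sketch of the three pruning operations applied to the active state hypotheses of one frame. The data layout (a list of (v, s, score) tuples) and the parameter names are assumptions; because scores are kept in the log domain here, the multiplicative beam factors of the slide become additive thresholds.

```python
def prune_hypotheses(hyps, f_ac, f_lm, max_states):
    """Apply acoustic, language-model, and histogram pruning to one frame (sketch).
    hyps: list of (v, s, score) with log-domain scores; s == 0 marks a word-end/root
    hypothesis. f_ac and f_lm are negative log-domain beam widths."""
    if not hyps:
        return hyps

    # acoustic pruning: keep hypotheses within the beam around the best overall score
    q_ac = max(score for _, _, score in hyps)
    hyps = [h for h in hyps if h[2] >= q_ac + f_ac]

    # language-model (word-end) pruning: a tighter beam applied only to word-end hypotheses
    word_ends = [h for h in hyps if h[1] == 0]
    if word_ends:
        q_lm = max(score for _, _, score in word_ends)
        hyps = [h for h in hyps if h[1] != 0 or h[2] >= q_lm + f_lm]

    # histogram pruning: keep at most max_states hypotheses, best first
    hyps.sort(key=lambda h: h[2], reverse=True)
    return hyps[:max_states]
```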

Language Model Look-Ahead
- Incorporate the language model probabilities as early as possible into the search process: for a bigram, the look-ahead factor of tree state s under predecessor v is the best LM probability max over w in W(s) of p(w | v), where W(s) is the set of words that can be reached from tree state s.
- Incorporate the anticipated LM probabilities into the three pruning operations of the search.
- Why? It reduces the number of state hypotheses. Problem? The look-ahead probabilities must themselves be computed and stored efficiently.
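A sketch of computing the bigram look-ahead factors with a single bottom-up pass over the prefix tree; each state inherits the best reachable-word probability from its children. The tree interface (states_bottom_up, children, words_ending_at) is an assumption for illustration.

```python
NEG_INF = float("-inf")

def lm_lookahead_factors(tree, bigram, v):
    """Compute pi[s] = max over w in W(s) of log p(w | v) for every tree state s (sketch).
    W(s) is the set of words reachable from state s. Assumed interfaces:
    tree.states_bottom_up() yields children before their parents, tree.children(s),
    tree.words_ending_at(s) lists the words whose pronunciation ends in state s."""
    pi = {}
    for s in tree.states_bottom_up():
        best = NEG_INF
        for w in tree.words_ending_at(s):
            best = max(best, bigram.get((v, w), NEG_INF))
        for c in tree.children(s):
            best = max(best, pi[c])      # best reachable word probability propagates upward
        pi[s] = best
    return pi
```

In this sketch, pi[s] would be added to Qv(t, s) before the pruning comparisons, so branches of the tree that lead only to improbable successor words are cut earlier.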

Fig. 11

Implementation
To arrive at an efficient implementation of the tree search:
- Set representation of active hypotheses: a set of active tree arcs and a set of active HMM states (index s, score Q, back pointer)
- Forward DP recombination (reduces computational cost) at word boundaries, phoneme boundaries, and HMM states
- Direct access to each new successor hypothesis (with an exception for the trigram LM)

Implementation: Traceback and Garbage Collection
- Each back pointer refers into a special traceback array (word index, end time of the predecessor word, score, and back pointer).
- A garbage collection or purging method is applied by extending each entry with an additional component (a time stamp).
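A sketch of the traceback array and the time-stamp-based garbage collection. The entry layout follows the slide (word index, predecessor end time, score, back pointer) plus the time stamp used for purging; the purge routine itself is a generic mark-and-compact pass, shown here as one plausible realization.

```python
from dataclasses import dataclass

@dataclass
class TracebackEntry:
    word: int        # word index
    pred_end: int    # end time of the predecessor word
    score: float
    back: int        # index of the predecessor entry in the traceback array (-1 = none)
    stamp: int = -1  # time stamp: last frame at which a live hypothesis reached this entry

def purge_traceback(traceback, active_back_pointers, t):
    """Mark-and-compact garbage collection over the traceback array (sketch).
    Follow the back-pointer chains of the currently active hypotheses, stamp every
    reachable entry with the current frame t, then drop and re-index the rest."""
    for bp in active_back_pointers:
        e = bp
        while e >= 0 and traceback[e].stamp != t:   # stop as soon as a chain is already marked
            traceback[e].stamp = t
            e = traceback[e].back
    keep = [i for i, entry in enumerate(traceback) if entry.stamp == t]
    remap = {old: new for new, old in enumerate(keep)}
    compacted = [traceback[i] for i in keep]
    for entry in compacted:                          # rewrite back pointers into the new array
        entry.back = remap.get(entry.back, -1)
    return compacted, remap
```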

End of presentation. P.S. Thank you to my senior labmates for their guidance. ^^

Table 4

One-Pass DP Search for Word Graph Construction
Main idea: represent word alternatives in regions of the speech signal.
- Keep track of word sequence hypotheses whose scores are very close to the optimal hypothesis but that do not survive the single-best search.
- Represent these word sequences by a word graph; each word sequence in the graph should be close to the single best sentence.

Using the same principle of time synchrony for the word graph, two quantities are defined:
h(w; τ, t) := conditional probability that word w produces the acoustic vectors x_{τ+1} … x_t
G(w1 … wn; t) := joint probability of observing the acoustic vectors x_1 … x_t and a word sequence w1 … wn with end time t

Decomposition: G(w1 … wn; t) = max over τ of { G(w1 … wn-1; τ) · Pr(wn | w1 … wn-1) · h(wn; τ, t) }, where Pr(wn | w1 … wn-1) is the probability given by the language model.
To construct a word graph, introduce a formal definition of the word boundary as the time τ that achieves this maximum.

Exploiting an m-gram language model, we can recombine word sequence hypotheses at the phrase level if they do not differ in their final (m-1) words. It is therefore sufficient to distinguish partial word sequence hypotheses by their final (m-1) words (the pruning strategy is included as before).

Word Pair Approximation
The crucial assumption now is that the dependence of the word boundary can be confined to the final word pair (v, w), i.e. the boundary of w1 … wn at time t is the same as the boundary of the pair (v, w). Assuming the word pair approximation:
- At every time frame t, consider all word pairs (v, w).
- For each triple (t; v, w), keep track of the word boundary and the word score.
- At the end of the speech signal, the word graph is constructed from these records.
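A sketch of the bookkeeping implied by the word pair approximation: for every frame t and every word pair (v, w) ending at t, the decoder records the optimal boundary time and the word score, and the word graph is later assembled from these records. The data layout and field names are assumptions.

```python
def record_word_pair(records, t, v, w, tau, h_score):
    """Keep, for every triple (t; v, w), the best word boundary tau and word score (sketch).
    tau is the boundary time between predecessor v and word w; h_score is the (log)
    acoustic score of w over frames tau+1 .. t."""
    prev = records.get((t, v, w))
    if prev is None or h_score > prev[1]:
        records[(t, v, w)] = (tau, h_score)

def build_word_graph(records):
    """At the end of the utterance, turn the (t; v, w) records into word graph edges.
    Each edge carries the word, its start and end times, its predecessor, and its score."""
    edges = []
    for (t, v, w), (tau, h_score) in records.items():
        edges.append({"word": w, "start": tau + 1, "end": t,
                      "predecessor": v, "score": h_score})
    return edges
```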

Fig. 13

How is this computed?

Table 5

A third level: the phrase level.
Depending on whether the phrase-level recognition is carried out in a time-synchronous fashion:
- Extended one-pass approach
- Two-pass approach (e.g. with a cache-based language model)

Fig. 14

Principal properties:
- There is a maximum for the number of incoming word edges at any node, namely the vocabulary size.
- There is no maximum for the number of outgoing word edges.
Two refinements of the word graph method address:
- Short words
- Long words with identical ending portions, which may waste search effort

Table 6

Table 8