Dynamic Programming Search


1 Dynamic Programming Search for Continuous Speech Recognition
2004/11/3 Presenter: 陳怡婷

2 Concerned Problems
Can this huge search space be handled by DP in an efficient way? How do we design a good DP search? Does DP compute only the single best sentence?

3 Outline
Problems and rules of speech recognition
One-Pass DP Search Using a Linear Lexicon
One-Pass DP Search Using a Tree Lexicon
One-Pass DP Search for Word Graph Construction

4 Difficulties: boundaries of the acoustic signal and of words, speaker variability, quality of the speech signal, natural language, …
Bayes Decision Rule (shown in Figure 1): maximize Pr(w1…wN | x1…xT), i.e. Pr(w1…wN)·Pr(x1…xT | w1…wN)
Language Model: syntactic, semantic, … (for large vocabularies: bigram or trigram model)
Acoustic-Phonetic Model (training, HMMs, pronunciation lexicon or dictionary)
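
The implication on this slide holds because the evidence Pr(x1…xT) does not depend on the word sequence; written out as a restatement (not from the slides themselves):

```latex
% Bayes decision rule: the recognized word sequence maximizes the
% posterior; the evidence Pr(x_1^T) is constant and drops out.
\hat{w}_1^N = \operatorname*{arg\,max}_{w_1^N} \Pr(w_1^N \mid x_1^T)
            = \operatorname*{arg\,max}_{w_1^N} \Pr(w_1^N)\,\Pr(x_1^T \mid w_1^N)
```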

5 (Figure 1: Bayes Decision Rule)

6 Specification of the Search Problem
Decision on the spoken words combines the knowledge sources: language model, acoustic-phonetic model, pronunciation lexicon (the optimization is carried out over these knowledge sources).
Way of thinking: a super HMM for a hypothesized word sequence.

7 Consider only the most probable state sequence – Viterbi approximation
∴ The search has to be performed at two levels: the state level and the word level.
Recombine hypotheses at both levels by DP – beam search.

8

9 One-Pass DP Search Using a Linear Lexicon
For a three-word vocabulary, the search space looks as follows:

10 Maximum approximation:
Assign each acoustic vector observed at time t to a (state, word) index pair: (s1, w1), …, (st, wt), …, (sT, wT)

11 HMMs – the word interior (top)
Word boundaries (bottom)
Maximize Pr(w1…wt)·Pr(x1…xt; s1…st | w1…wt)
Note that the unknown word sequence and the unknown state sequence are determined simultaneously – the DP algorithm.

12 DP Recursion
Two quantities:
Q(t, s; w) := score of the best path up to time t that ends in state s of word w
B(t, s; w) := start time of the best path up to time t that ends in state s of word w
The DP recursion distinguishes two types of transition rules along a path: within the word interior and across word boundaries.

13 In the word interior:
Q(t, s; w) = max over s' { P(xt, s | s'; w) · Q(t-1, s'; w) }
B(t, s; w) = B(t-1, smax(t, s; w); w)
where smax(t, s; w) is the optimum predecessor state for the hypothesis (t, s; w).
At word boundaries: H(w; t) := max over v { P(w|v) · Q(t, Sv; v) }, where Sv is the terminal state of word v.
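
A minimal Python sketch of this recursion, assuming probability (not log) scores; `p_acoustic` and `p_lm` are hypothetical stand-ins for P(xt, s | s'; w) and the bigram P(w|v), and `states[w]` lists the HMM states of word w with the terminal state last:

```python
# One-pass DP step for a linear lexicon: word-interior recursion followed
# by word-boundary recombination, as on this slide.

def dp_step(t, x_t, words, states, Q, B, p_acoustic, p_lm):
    # Word interior: Q(t,s;w) = max_{s'} P(x_t, s | s'; w) * Q(t-1, s'; w)
    for w in words:
        for s in states[w]:
            best_score, s_max = max(
                (p_acoustic(x_t, s, s_prev, w) * Q[t - 1][w][s_prev], s_prev)
                for s_prev in states[w])
            Q[t][w][s] = best_score
            # B(t,s;w) = B(t-1, s_max(t,s;w); w): the start time survives
            B[t][w][s] = B[t - 1][w][s_max]

    # Word boundary: H(w;t) = max_v P(w|v) * Q(t, S_v; v), with S_v the
    # terminal state of predecessor word v (here: last state in states[v]).
    return {w: max(p_lm(w, v) * Q[t][v][states[v][-1]] for v in words)
            for w in words}
```

In practice the maximization is carried out in the log domain and only over predecessor states that survived pruning.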

14 The search procedure works with a time-synchronous breadth-first strategy (Table 2).
To reduce the storage requirements: a traceback array.
Pruning strategy: beam search.

15 Table 2

16 One-Pass DP Search Using a Tree Lexicon
For large-vocabulary recognition, for efficiency reasons, the pronunciation lexicon is organized in the form of a prefix tree. How do we formulate the search algorithm for such a context?
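
A small sketch of the prefix-tree organization; the toy lexicon (`speech`, `speed`) and its phoneme strings are made-up examples:

```python
# Organize a pronunciation lexicon as a prefix tree so that words sharing
# a phoneme prefix share the same initial arcs (and HMM states).

class TreeNode:
    def __init__(self):
        self.children = {}   # phoneme -> TreeNode
        self.word = None     # word identity, known only at a word-end node

def build_prefix_tree(lexicon):
    root = TreeNode()
    for word, phonemes in lexicon.items():
        node = root
        for ph in phonemes:
            node = node.children.setdefault(ph, TreeNode())
        node.word = word
    return root

# Hypothetical toy lexicon: "speech" and "speed" share the prefix /s p iy/.
lexicon = {"speech": ["s", "p", "iy", "ch"], "speed": ["s", "p", "iy", "d"]}
tree = build_prefix_tree(lexicon)
```

Because words sharing a prefix share arcs, the word identity is known only when a word-end node is reached; the LM probability can therefore no longer be applied at the word start, which is what the look-ahead technique later in this talk addresses.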

17 Structure the search space as follow: ( bigram )

18 DP Recursion
Qv(t, s) := score of the best partial path that ends at time t in state s of the lexical tree for predecessor word v.
Bv(t, s) := start time of the best partial path that ends at time t in state s of the lexical tree for predecessor word v.

19 In the word interior:
Qv(t, s) = max over s' { P(xt, s | s') · Qv(t-1, s') }
Bv(t, s) = Bv(t-1, svmax(t, s))
where svmax(t, s) is the optimum predecessor state for the hypothesis (t, s) and predecessor word v.
At the word boundaries:
H(w; t) := max over v { P(w|v) · Qv(t, Sw) }, where Sw is the terminal state of word w.
To start up the tree copy for predecessor word v:
Qv(t-1, s=0) = H(v; t-1)
Bv(t-1, s=0) = t-1
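
A sketch of the word-boundary step for the tree search, under the same assumptions as the earlier linear-lexicon sketch (`p_lm` is a hypothetical bigram score, `S[w]` the terminal state of word w):

```python
# Word-boundary recombination for the tree lexicon (bigram LM).
# Q[v][t][s] is Q_v(t, s); B[v][t][s] is B_v(t, s).

def word_boundary(t, words, Q, B, S, p_lm):
    # H(w; t) = max_v P(w|v) * Q_v(t, S_w)
    H = {w: max(p_lm(w, v) * Q[v][t][S[w]] for v in words) for w in words}
    for v in words:
        # Start up the tree copy for predecessor v at the virtual root s=0:
        # Q_v(t, s=0) = H(v; t), B_v(t, s=0) = t
        Q[v][t][0] = H[v]
        B[v][t][0] = t
    return H
```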

20

21 Extension to Trigram Language Models
The root of each tree copy is labeled with its two-word history. The probabilities or costs of each edge depend only on the edge itself. The size of the potential search space increases drastically, so the pruning strategy is even more crucial.
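
A minimal sketch of the bookkeeping change this implies: tree copies are now keyed by the two-word history rather than by a single predecessor word (the container layout is illustrative, not from the paper):

```python
# With a trigram LM, each lexical-tree copy is identified by its two-word
# history (u, v); active hypotheses are keyed by that pair, so the
# hypothesis store grows with the number of active histories.
from collections import defaultdict

tree_copies = defaultdict(dict)   # (u, v) -> {state s: (score Q, start B)}

def start_tree_copy(t, u, v, score):
    # Root hypothesis for history (u, v): Q_{(u,v)}(t, s=0) = score
    tree_copies[(u, v)][0] = (score, t)
```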

22

23 Refinements and Implementation Issues
Pruning refinements:
Acoustic pruning: QAC(t) := max over (v, s) { Qv(t, s) }; prune the hypothesis if Qv(t, s) < fAC · QAC(t)
Language model pruning (word-end pruning): QLM(t) := max over v { Qv(t, s=0) }; prune if Qv(t, s=0) < fLM · QLM(t)
Histogram pruning: keep at most a fixed number of the best hypotheses per time frame.
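
A sketch of the three pruning operations, assuming probability scores and, as on the previous slides, state s=0 marking a word end; `f_ac`, `f_lm`, and `max_hyps` are hypothetical tuning parameters:

```python
# Acoustic, LM (word-end), and histogram pruning for one time frame t.

def prune(hyps, f_ac, f_lm, max_hyps):
    # hyps: list of (score, state) pairs active at time t.
    # Acoustic pruning: drop everything far below the best score Q_AC(t).
    q_ac = max(score for score, _ in hyps)
    hyps = [h for h in hyps if h[0] >= f_ac * q_ac]

    # LM (word-end) pruning: same idea, restricted to word-end hypotheses.
    word_ends = [score for score, s in hyps if s == 0]
    if word_ends:
        q_lm = max(word_ends)
        hyps = [(score, s) for score, s in hyps
                if s != 0 or score >= f_lm * q_lm]

    # Histogram pruning: keep at most max_hyps of the best hypotheses.
    return sorted(hyps, reverse=True)[:max_hyps]
```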

24 Language Model Look-Ahead
Incorporate the language model probabilities as early as possible into the search process: distribute the anticipated LM probability πv(s) = max over w in W(s) of P(w|v) over the tree, where W(s) is the set of words that can be reached from tree state s.
Incorporate the anticipated LM probabilities into the three pruning operations of the search.
Why? It reduces the number of state hypotheses. Problem?
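
A sketch of the look-ahead computation under that bigram definition, reusing the hypothetical `TreeNode` from the prefix-tree sketch above:

```python
# Bigram LM look-ahead: pi_v(s) = max over w in W(s) of P(w|v), computed
# bottom-up over the lexical tree for predecessor word v.

def lm_lookahead(node, v, p_lm):
    # Best LM probability of any word still reachable below this state.
    best = p_lm(node.word, v) if node.word is not None else 0.0
    for child in node.children.values():
        best = max(best, lm_lookahead(child, v, p_lm))
    node.pi = best   # factored into Q_v(t, s) during the pruning steps
    return best
```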

25 Fig. 11

26 Implementation
To arrive at an efficient implementation of the tree search:
Set representation of active hypotheses: set of active tree arcs; set of active HMM states (index s, score Q, back pointer).
Forward DP recombination (reduces computational cost): at word boundaries, phoneme boundaries, and HMM states.
Direct access to each new successor hypothesis; an exception: the trigram LM.

27 Implementation: Traceback and Garbage Collection
Back pointers point into a special traceback array (word index, end time of the predecessor word, score, and back pointer).
Apply a garbage collection or purging method: extend each entry by an additional component (a time stamp).
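
A sketch of the traceback array and the time-stamp-based purging; the field names follow the slide's list, while the marking scheme itself is an assumption:

```python
# Traceback array with an extra time stamp used for garbage collection.
from dataclasses import dataclass

@dataclass
class TracebackEntry:
    word: int          # word index
    pred_end: int      # end time of the predecessor word
    score: float
    back_ptr: int      # index of the predecessor entry (-1 for none)
    time_stamp: int = -1   # extra component used for purging

def garbage_collect(traceback, active_back_ptrs, t):
    # Mark every entry reachable from a live hypothesis with the current
    # time frame; unmarked entries can be purged or their slots reused.
    for ptr in active_back_ptrs:
        while ptr >= 0 and traceback[ptr].time_stamp != t:
            traceback[ptr].time_stamp = t
            ptr = traceback[ptr].back_ptr
```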

28 End of presentation. P.S. Thanks to my senior labmates for their guidance ^^

29 Table 4

30 One-Pass DP Search for Word Graph Construction
Main idea: represent word alternatives in regions of the speech signal.
Keep track of word sequence hypotheses whose scores are very close to the optimal hypothesis but that do not survive the single-best search.
Represent these word sequences as a word graph; each word sequence should be close to the single best sentence.

31 Using the same principle of time synchrony for the word graph, define:
h(w; τ, t) := Pr(xτ+1…xt | w) = conditional probability that word w produces the acoustic vectors xτ+1…xt
G(w1…wn; t) := joint probability of observing the acoustic vectors x1…xt and a word sequence w1…wn with end time t

32 Decomposition:
G(w1…wn; t) = max over τ { G(w1…wn-1; τ) · Pr(wn | w1…wn-1) · h(wn; τ, t) }
where Pr(wn | w1…wn-1) is the language model probability.
=> To construct a word graph, introduce a formal definition of the word boundary: τ(w1…wn; t), the optimizing boundary τ in the decomposition above.

33 Exploiting an m-gram language model, we can recombine word sequence hypotheses at the phrase level if they do not differ in their final (m-1) words. => It is sufficient to distinguish partial word sequence hypotheses by their final words (this includes the pruning strategy).
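
A small sketch of this recombination; hypotheses are stored here as word-sequence tuples, which is illustrative only (a real decoder recombines scores in place):

```python
# Phrase-level recombination under an m-gram LM: hypotheses ending at the
# same time are merged if they share their final (m-1) words.

def recombine(hyps, m):
    # hyps: {word sequence (tuple): score}. Keep the best-scoring
    # hypothesis per equivalence class of final (m-1) words.
    best = {}
    for seq, score in hyps.items():
        key = seq[-(m - 1):] if m > 1 else ()
        if key not in best or score > best[key][1]:
            best[key] = (seq, score)
    return dict(best.values())
```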

34 Word Pair Approximation
The crucial assumption is that the dependence of the word boundary can be confined to the final word pair: τ(w1…wn; t) = τ(v, w; t) for wn-1 = v, wn = w.
Assuming the word pair approximation:
At every time frame t, consider all word pairs (v, w).
For each triple (t; v, w), keep track of the word boundary τ(t; v, w) and the word score h(w; τ, t).
At the end of the speech signal, the word graph is constructed by tracing back through these triples, as sketched below.
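
A sketch of this bookkeeping and of the final traceback, with a made-up table layout: entries are keyed by (end time t, word w) and record (predecessor v, boundary τ, word score):

```python
# Word graph construction under the word pair approximation.
# table[(t, w)] holds entries (v, tau, score): word w ends at time t with
# predecessor v, spans (tau, t], and has word score h(w; tau, t).

def record_word_end(table, t, w, v, tau, score):
    table.setdefault((t, w), []).append((v, tau, score))

def build_word_graph(table, T, final_words):
    # Trace back from the final frame T; every recorded triple becomes
    # a word edge of the graph.
    edges, stack, seen = [], [(T, w) for w in final_words], set()
    while stack:
        t, w = stack.pop()
        if (t, w) in seen:
            continue
        seen.add((t, w))
        for v, tau, score in table.get((t, w), []):
            edges.append((tau, t, w, score))   # edge: w spans (tau, t]
            stack.append((tau, v))             # continue with predecessor
    return edges
```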

35 Fig. 13

36 How is this computed?

37 Table 5

38 A third level: the phrase level
Depending on whether the phrase-level recognition is carried out in a time-synchronous fashion:
Extended one-pass approach
Two-pass approach: e.g., with a cache-based language model

39 Fig. 14

40 Principal properties:
The number of incoming word edges at any node is bounded, namely by the vocabulary size.
There is no bound on the number of outgoing word edges.
Two refinements of the word graph method:
Short words
Long words with identical ending portions may waste search effort.

41 Table 6

42 Table 8

