Sausage
Lidia Mangu, Eric Brill, Andreas Stolcke
Presenter: Jen-Wei Kuo, 2004/9/24
References
- CSL'00: Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
- Eurospeech'99: Finding Consensus among Words: Lattice-Based Word Error Minimization
- Eurospeech'97: Explicit Word Error Minimization in N-Best List Rescoring
Motivation
- There is a mismatch between the standard scoring paradigm (MAP) and the evaluation metric (WER): MAP maximizes the sentence posterior probability, which minimizes sentence-level error, while recognition output is evaluated at the word level.
An Example
Correct answer: I'M DOING FINE
Word Error Minimization
- Choose, from among the potential hypotheses, the one that minimizes the expected word error under the posterior distribution.
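In symbols (a sketch; W ranges over word sequences, A is the acoustic evidence, and WE is the word-level edit distance; the notation is assumed, not taken from the slides):

```latex
% MAP decoding: maximize the sentence posterior
\hat{W}_{\mathrm{MAP}} = \arg\max_{W} P(W \mid A)

% Expected word error minimization: minimize the posterior-weighted
% edit distance to all competing hypotheses W'
\hat{W}_{\mathrm{WE}} = \arg\min_{W} \sum_{W'} P(W' \mid A)\,\mathrm{WE}(W, W')
```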
N-best Approximation
- Approximate both the hypothesis space and the posterior distribution by the N-best list: rescore each hypothesis by its expected word error against all others and pick the minimizer (the "center" of the list).
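A minimal sketch of this N-best rescoring, assuming each hypothesis is a word list paired with a normalized posterior (the data layout and function names are illustrative):

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def nbest_center(nbest):
    """Pick the hypothesis minimizing expected word error.

    nbest: list of (word_list, posterior) pairs, posteriors summing to 1.
    """
    def expected_we(w):
        return sum(p * edit_distance(w, w2) for w2, p in nbest)
    return min((w for w, _ in nbest), key=expected_we)
```

Each candidate is scored against every other hypothesis, so the cost is quadratic in N, which is why this stays practical only for modest list sizes.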
Lattice-Based Word Error Minimization
- Computational problem: a lattice encodes several orders of magnitude more hypotheses than N-best lists of practical size, and no efficient algorithm is known for exact minimization over it.
- Fundamental difficulty: the objective function is based on pairwise string distance, a nonlocal measure.
- Solution: replace pairwise string alignment with a modified multiple string alignment, i.e., replace WE (word error) with MWE (modified word error).
Lattice to Confusion Network
- A multiple alignment of all lattice paths groups the arcs into a linear sequence of equivalence classes, yielding a confusion network (a "sausage").
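Once the confusion network is built, the MWE hypothesis can be read off locally: pick the highest-posterior word in each equivalence class. A minimal sketch, assuming each slot is a dict mapping words to posteriors, with '-' as the null word (this representation is an assumption):

```python
def consensus_hypothesis(confusion_network):
    """confusion_network: list of slots, each a dict word -> posterior.

    Slots where the null word '-' wins are skipped in the output.
    """
    words = []
    for slot in confusion_network:
        best = max(slot, key=slot.get)  # highest-posterior word in the slot
        if best != "-":
            words.append(best)
    return words
```

Under the network's alignment, the expected word error decomposes across slots (one minus the winning posterior per slot), which is what makes the minimization local and cheap.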
Multiple Alignment
- Finding the optimal multiple alignment is a problem for which no efficient solution is known (Gusfield, 1992).
- We therefore resort to a heuristic approach based on the lattice topology.
Algorithms
Step 1. Arc pruning
Step 2. Same-arc clustering
Step 3. Intra-word clustering
Step 4*. Same-phones clustering
Step 5. Inter-word clustering
Step 6. Adding null hypotheses
Step 7. Consensus-based lattice pruning
Arc Pruning
- Arcs whose posterior probability falls below a pruning threshold are removed from the lattice before clustering.
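A sketch of this step, assuming arc posteriors have already been computed (e.g., by a forward-backward pass over the lattice); the arc representation is illustrative:

```python
def prune_arcs(arcs, threshold=1e-3):
    """Drop arcs whose posterior falls below the pruning threshold.

    arcs: list of dicts with keys 'word', 'start', 'end', 'posterior'
    (representation assumed for illustration).
    """
    return [a for a in arcs if a["posterior"] >= threshold]
```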
Same-Arc / Intra-Word Clustering
- Same-arc clustering: arcs with the same word_id, start frame, and end frame are merged first.
- Intra-word clustering: arcs with the same word_id (but different time spans) are then merged.
Same-Phones Clustering
- Arcs with the same phone sequence are clustered at this stage.
Inter-Word Clustering
- Finally, the remaining arcs are clustered at this stage; the greedy loop shared by the clustering steps is sketched below.
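Steps 3 through 5 share the same greedy structure: repeatedly merge the most similar pair of clusters that are unordered with respect to the lattice partial order. A simplified sketch; the `similarity` and `precedes` functions are assumptions (the paper's actual similarity measures use time overlap, posteriors, and phonetic distance):

```python
def agglomerative_clustering(clusters, similarity, precedes):
    """Greedy clustering loop shared by the clustering steps (sketch).

    clusters: list of sets of arcs.
    similarity(c1, c2) -> float: higher means merge sooner.
    precedes(c1, c2): True if some arc in c1 precedes some arc in c2
    on a lattice path; such pairs must never be merged.
    """
    while True:
        # candidate pairs: unordered w.r.t. the lattice partial order
        candidates = [(similarity(c1, c2), i, j)
                      for i, c1 in enumerate(clusters)
                      for j, c2 in enumerate(clusters)
                      if i < j and not precedes(c1, c2)
                      and not precedes(c2, c1)]
        if not candidates:
            break
        score, i, j = max(candidates)
        if score <= 0:  # stop when no sufficiently similar pair remains
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```

Merging two clusters changes what precedes what, so `precedes` must reflect the merged state on each iteration; this is exactly the update cost discussed under Computational Issues below.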
Adding Null Hypotheses
- For each equivalence class, if the sum of the word posterior probabilities is less than a threshold (0.6), add the null hypothesis to the class.
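A sketch of this step; giving the null word the residual probability mass is an illustrative assumption of this sketch, not something stated on the slide:

```python
def add_null_hypotheses(confusion_network, threshold=0.6):
    """For each equivalence class (slot), if the total word posterior is
    below the threshold, add the null word '-' so the class can align
    with a deletion.
    """
    for slot in confusion_network:
        total = sum(slot.values())
        if total < threshold:
            # residual mass for '-' is an illustrative choice
            slot["-"] = 1.0 - total
    return confusion_network
```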
Consensus-Based Lattice Pruning
- Standard method (likelihood-based): paths whose overall score differs by more than a threshold from the best-scoring path are removed from the word graph.
- Proposed method (consensus-based): first construct a pruned confusion network, then intersect the original lattice with the pruned network.
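A much-simplified sketch of the intersection, assuming each lattice arc was assigned to a network slot during clustering (the `slot_of` mapping and data layout are assumptions; the real operation is an intersection of the lattice with the network as automata):

```python
def consensus_prune_lattice(arcs, pruned_network, slot_of):
    """Keep only arcs whose word survives in its slot of the pruned network.

    arcs: lattice arcs (dicts with a 'word' key).
    slot_of: arc -> index of the slot it was clustered into.
    pruned_network: list of dicts word -> posterior, after pruning.
    """
    return [a for a in arcs
            if a["word"] in pruned_network[slot_of(a)]]
```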
Algorithm
An Example
[Lattice figure: competing arcs over the Chinese words 我 ("I"), 是 ("is/am"), and 誰 ("who").] How should the arcs be merged?
Computational Issues
- A partial order over arcs must be maintained so that arcs lying on a common path are never merged into the same class.
- Naive method (history-based look-ahead): apply a first-pass search to find the history arcs of each arc and generate the initial partial ordering.
- As clusters are merged, many (recursive) updates to the ordering are needed, and with thousands of arcs the storage cost is high.
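A sketch of the first-pass precedence computation: arc X precedes arc Y when Y is reachable from X's end node. Memoizing the reachable sets is what drives the memory footprint with thousands of arcs (the lattice representation is assumed):

```python
def compute_precedence(arcs, successors):
    """First-pass computation of the arc partial order over a lattice (a DAG).

    arcs: list of (start_node, end_node, word) triples.
    successors: node -> list of indices of arcs leaving that node.
    Returns after[i] = set of arc indices that follow arc i on some path.
    """
    after = {}

    def reachable(i):
        if i in after:              # memoized: each arc expanded once
            return after[i]
        result = set()
        for j in successors[arcs[i][1]]:  # arcs leaving i's end node
            result.add(j)
            result |= reachable(j)
        after[i] = result
        return result

    for i in range(len(arcs)):
        reachable(i)
    return after
```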
Computational Issues – An Example
[Lattice figure with arcs labeled A through N.] If we merge B and C, what happens to the partial order?
Experimental Set-up
- Lattices were built using HTK.
- Training: acoustic models trained on about 60 hours of Switchboard speech; the LM is a backoff trigram trained on 2.2 million words of Switchboard transcripts.
- Testing: test set from the 1997 JHU workshop.
Experimental Results
Experimental Results
WER (%) by focus condition ("-" marks values not recovered from the slide):

Hypothesis           F0    F1    F2    F3    F4    F5    FX    Overall  Short utt.  Long utt.
MAP                  13.0  30.8  42.1  31.0  22.8  52.3  53.9  33.1     33.3        31.5
N-best (center)      -     30.6  -     31.1  22.6  52.4  -     33.0     -           -
Lattice (consensus)  11.9  30.5  -     30.7  22.3  51.8  52.7  32.5     -           -
Confusion Network Analyses
Other Approaches
- ROVER (Recognizer Output Voting Error Reduction)