1 An Evaluation of Lattice Scoring Using a Smoothed Estimate of Word Accuracy
Mohamed Kamel Omar and Lidia Mangu ICASSP 2007 IBM T.J. Watson Research Center

2 Outline
Introduction
Problem formulation
Implementation
MSWA algorithm
MSWA-CN algorithm
Experiments and results
Conclusions

3 Introduction
In ASR systems, the maximum a posteriori (MAP) probability is the standard decoding criterion
It minimizes an estimate of the sentence-level error
This is inconsistent with the word-level evaluation metrics of ASR
The motivation of this paper:
To select the hypothesis which minimizes an estimate of the word error rate over the hypothesis lattice
To avoid the computational infeasibility of calculating the pair-wise Levenshtein distance between every two possible paths

4 Introduction
In LVCSR systems, word lattices are commonly used as a compact representation of the alternative hypotheses
However, calculating pair-wise word error rates between the different hypotheses in a lattice is computationally infeasible
Related work:
[L. Mangu et al. 2000] Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
[V. Goel et al. 2006] Segmental Minimum Bayes-Risk Decoding for Automatic Speech Recognition Systems
[F. Wessel et al. 2001] Explicit Word Error Minimization Using Word Hypothesis Posterior Probabilities

5 Formulation
Given two lattices, a measure of the word accuracy of the hypothesis lattice with respect to the reference lattice can be approximated by

$$\mathrm{SWA}(L_H, L_R) = E\big[\mathrm{WA}(H, R) \mid Y\big] \approx \sum_{H \in L_H} \sum_{R \in L_R} P(H \mid Y)\, P(R \mid Y)\, A(H, R) \qquad (1)$$

where:
E[· | Y] is the expected value over the joint probability mass function (PMF) of the hypothesis word sequence, H, and the reference word sequence, R, given the observation vector Y
WA(H, R) is the word accuracy of H with respect to R
P(R | Y) is the posterior probability of the reference string, estimated from the reference lattice
P(H | Y) is the posterior probability of the hypothesis string, estimated from the hypothesis lattice
A(H, R) is a smoothed approximation of the word accuracy of H with respect to R which takes phonetic similarity into consideration
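As a concrete illustration of Eq. (1), the sketch below evaluates the double expectation over N-best approximations of the two lattices. It is a minimal, hypothetical sketch: word_accuracy is a plain Levenshtein-based accuracy standing in for the paper's phonetically smoothed A(H, R), and the N-best lists with normalized posteriors are assumed inputs.

```python
# Minimal sketch of Eq. (1) over N-best approximations of the two lattices.
# hyp_nbest / ref_nbest: lists of (word_tuple, posterior) pairs whose
# posteriors each sum to one. word_accuracy() is a simple stand-in for the
# paper's phonetically smoothed accuracy A(H, R).

def levenshtein(a, b):
    """Edit distance between two word sequences."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (wa != wb)))    # substitution
        prev = cur
    return prev[-1]

def word_accuracy(hyp, ref):
    """Accuracy of hyp w.r.t. ref, i.e. 1 - WER (may be negative)."""
    return 1.0 - levenshtein(hyp, ref) / max(len(ref), 1)

def smoothed_word_accuracy(hyp_nbest, ref_nbest):
    """The double expectation of Eq. (1) over the two N-best lists."""
    return sum(p_h * p_r * word_accuracy(h, r)
               for h, p_h in hyp_nbest
               for r, p_r in ref_nbest)
```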

6 Maximum Smoothed Word Accuracy Approach
Our goal is to select the word sequence in the hypothesis lattice which maximizes the estimate of the word accuracy (the MSWA criterion):

$$\hat{H} = \arg\max_{H \in L_H} \sum_{R \in L_R} P(R \mid Y)\, A(H, R) \qquad (2)$$

This word sequence can be estimated using the Viterbi algorithm
Alternatively, we can assign to each word arc w in the hypothesis lattice the conditional value of the objective function in Eq. (1) given this word arc, that is

$$S(w) = \sum_{H \in L_H : \, w \in H} \sum_{R \in L_R} P(H \mid Y)\, P(R \mid Y)\, A(H, R) \qquad (3)$$
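Continuing the N-best illustration, the MSWA decision of Eq. (2) reduces to an argmax over hypotheses. A minimal sketch, with the per-pair accuracy passed in (e.g. the word_accuracy stand-in from the previous sketch):

```python
def mswa_decode(hyp_nbest, ref_nbest, accuracy):
    """Eq. (2): pick the hypothesis with the highest expected accuracy.
    hyp_nbest / ref_nbest: lists of (word_tuple, posterior) pairs;
    accuracy(h, r): any per-pair accuracy measure."""
    def expected_accuracy(h):
        # E_R[A(h, R) | Y] under the reference-lattice posterior
        return sum(p_r * accuracy(h, r) for r, p_r in ref_nbest)
    return max((h for h, _ in hyp_nbest), key=expected_accuracy)
```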

7 Smoothed Estimate of Word Accuracy
The approximate measure of the accuracy of a word arc w in the hypothesis lattice is computed with respect to each path r in the reference lattice which coincides with w in time; such a path may start or end in the middle of an arc of the reference lattice
[Figure: example hypothesis (Hyp) and reference (Ref) lattices over the words SIL, 國立, 台灣, 師範, 大學, 颱風, 吃飯, 司法, with the phone-level decomposition of the reference arcs, illustrating time coincidence between hypothesis and reference arcs]
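The per-arc accuracy formula itself is not preserved in this transcript. As a plausible stand-in, the sketch below uses the well-known time-overlap approximation from minimum phone error (MPE) training, applied at the word level; the paper's actual measure additionally smooths over phonetic similarity, which this sketch does not model.

```python
def overlap_fraction(hyp_arc, ref_arc):
    """Fraction of the reference arc's duration covered by the hypothesis arc.
    Arcs are (word, start_time, end_time) triples (a hypothetical format)."""
    _, h0, h1 = hyp_arc
    _, r0, r1 = ref_arc
    overlap = max(0.0, min(h1, r1) - max(h0, r0))
    return overlap / max(r1 - r0, 1e-9)

def arc_accuracy(hyp_arc, ref_arcs):
    """MPE-style stand-in for the smoothed per-arc accuracy:
    -1 + 2e for a matching word, -1 + e otherwise, where e is the
    time-overlap fraction, maximized over overlapping reference arcs."""
    best = -1.0
    for ref_arc in ref_arcs:
        e = overlap_fraction(hyp_arc, ref_arc)
        if e > 0.0:
            score = -1.0 + (2.0 * e if hyp_arc[0] == ref_arc[0] else e)
            best = max(best, score)
    return best
```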

8 Implementation
Two requirements for computing the per-arc accuracy scores:
The forward-backward algorithm has to be applied to the reference lattice
The state sequence for each arc in the reference lattice has to be known
In this paper, two approaches are used to estimate the hypothesis which approximately maximizes the objective function:
The Viterbi-based MSWA algorithm estimates the word sequence according to Eq. (2)
The MSWA-CN algorithm, which is based on the confusion network (CN) algorithm, estimates the best word sequence using the conditional values in Eq. (3)
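A minimal sketch of the first requirement: forward-backward over a reference-lattice DAG to obtain arc posteriors. The lattice representation (arc ids in topological order, predecessor/successor links, per-arc log-likelihoods) is an assumption of this sketch, not the paper's data structure.

```python
import math

NEG = float("-inf")

def logaddexp(a, b):
    """log(exp(a) + exp(b)), safe when one operand is -inf."""
    if a == NEG:
        return b
    if b == NEG:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def arc_posteriors(arcs, preds, succs, starts, ends, loglik):
    """Forward-backward over a lattice DAG, in the log domain.
    arcs: arc ids in topological order; preds/succs: dicts id -> list of ids;
    starts/ends: sets of start/end arc ids; loglik: dict id -> log-likelihood."""
    alpha, beta = {}, {}
    for a in arcs:                       # forward pass over prefixes
        acc = loglik[a] if a in starts else NEG
        for p in preds.get(a, []):
            acc = logaddexp(acc, alpha[p] + loglik[a])
        alpha[a] = acc
    for a in reversed(arcs):             # backward pass over suffixes
        acc = 0.0 if a in ends else NEG
        for s in succs.get(a, []):
            acc = logaddexp(acc, beta[s] + loglik[s])
        beta[a] = acc
    total = NEG                          # total lattice log-likelihood
    for a in ends:
        total = logaddexp(total, alpha[a])
    return {a: math.exp(alpha[a] + beta[a] - total) for a in arcs}
```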

9 Viterbi-Based MSWA Algorithm
1. Initialization: each starting arc in the hypothesis lattice is assigned its own per-arc accuracy score
2. Forward Propagation: for each non-starting arc w, the Viterbi update keeps the best-scoring predecessor of w and adds the score of w to the accumulated score
[Figure: the example hypothesis and reference lattices from slide 7]

10 Viterbi-Based MSWA Algorithm
3. Backtracking: starting from the highest-scoring ending arc, follow the stored back-pointers
4. Exit with the output word sequence read off the backtracked path
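A hedged sketch of the complete search over the hypothesis lattice, with the per-arc score passed in (e.g. the expected smoothed accuracy of the arc against the reference lattice). The transcript's exact update equations were not preserved, so this is standard Viterbi over a DAG of arcs, not the paper's verbatim recursion.

```python
def viterbi_mswa(arcs, preds, starts, ends, words, arc_score):
    """Viterbi search over a hypothesis-lattice DAG.
    arcs: arc ids in topological order; preds: dict id -> predecessor ids;
    words: dict id -> word label; arc_score: dict id -> per-arc score."""
    best, back = {}, {}
    for a in arcs:
        if a in starts:                    # 1. initialization
            best[a], back[a] = arc_score[a], None
        else:                              # 2. forward propagation
            p = max(preds[a], key=lambda q: best[q])
            best[a], back[a] = best[p] + arc_score[a], p
    a = max(ends, key=lambda q: best[q])   # 3. backtrack from the best end arc
    out = []
    while a is not None:                   # 4. read off the word sequence
        out.append(words[a])
        a = back[a]
    return list(reversed(out))
```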

11 MSWA-CN Algorithm
1. Initialization:
For each starting arc in the hypothesis lattice, the forward quantities are initialized
For each ending arc in the hypothesis lattice, the backward quantities are initialized
2. Forward Propagation: for each non-starting arc w, the forward quantities are updated from the predecessors of w

12 MSWA-CN Algorithm
3. Backward Propagation: for each non-ending arc w, the backward quantities are updated from the successors of w
4. For each word arc w in the hypothesis lattice, the forward and backward quantities are combined into the conditional score S(w) of Eq. (3)
The confusion network algorithm is then used to find the best path in the hypothesis lattice after replacing the word posterior probability in the original algorithm with S(w)
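The transcript does not preserve the propagation equations, so the sketch below shows one plausible realization under an additive per-arc accuracy: the forward pass accumulates prefix probability mass and accuracy-weighted mass, the backward pass does the same for suffixes, and the two combine into a per-arc conditional score in the spirit of Eq. (3). The combination rule here is an assumption, not the paper's verbatim algorithm.

```python
def conditional_arc_scores(arcs, preds, succs, starts, ends, prob, g):
    """For each arc w: the probability-weighted sum, over all lattice paths
    through w, of the path's accumulated per-arc accuracies g (assumed form
    of Eq. (3), up to normalization by the total lattice probability).
    arcs: arc ids in topological order; prob: dict id -> arc probability."""
    fwd_p, fwd_s, bwd_p, bwd_s = {}, {}, {}, {}
    for a in arcs:                        # forward propagation over prefixes
        p = prob[a] if a in starts else 0.0
        s = prob[a] * g[a] if a in starts else 0.0
        for q in preds.get(a, []):
            p += fwd_p[q] * prob[a]
            s += (fwd_s[q] + fwd_p[q] * g[a]) * prob[a]
        fwd_p[a], fwd_s[a] = p, s
    for a in reversed(arcs):              # backward propagation over suffixes
        p = 1.0 if a in ends else 0.0
        s = 0.0
        for q in succs.get(a, []):
            p += bwd_p[q] * prob[q]
            s += (bwd_s[q] + bwd_p[q] * g[q]) * prob[q]
        bwd_p[a], bwd_s[a] = p, s
    # A path through arc a splits into a prefix ending at a and a suffix
    # after a; its probability-weighted score splits accordingly.
    return {a: fwd_s[a] * bwd_p[a] + fwd_p[a] * bwd_s[a] for a in arcs}
```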

13 Experimental Setup
Corpus:
DARPA 2004 Rich Transcription evaluation data (RT04)
2005 broadcast news Arabic test set (BNAT05)
Two systems:
unvowelized system
vowelized system
Feature extraction: LDA+MLLT
Acoustic modeling (penta-phone): 4000K Gaussians trained with a combination of fMPE and MPE
Language modeling (lexicon size: 617K): 4-gram LM trained with modified Kneser-Ney smoothing

14 Experimental Results
The main difference between the two systems is that the vowelized system explicitly models the short vowels, which are pronounced in Arabic but almost never transcribed.

15 Conclusions
In this paper, a new smoothed word accuracy (SWA) objective function for lattice scoring has been examined
Two algorithms which use the SWA objective function to estimate the best hypothesis in a given lattice have been described:
The Viterbi-based MSWA algorithm
The MSWA-CN algorithm
In the future, the authors intend to assess the usefulness of the conditional score in Eq. (3) for confidence annotation

16 Derivations

