1
Lattice Segmentation and Minimum Bayes Risk Discriminative Training for Large Vocabulary Continuous Speech Recognition
Vlasios Doumpiotis, William Byrne
Johns Hopkins University
Speech Communication (accepted, in revision)
Presented by Shih-Hung, 2005/10/31
2
References
–Discriminative training for segmental minimum Bayes risk decoding (ICASSP 2003; Alphadigits)
–Lattice segmentation and minimum Bayes risk discriminative training (Eurospeech 2003; Alphadigits)
–Pinched lattice minimum Bayes risk discriminative training for large vocabulary continuous speech recognition (ICSLP 2004)
–Minimum Bayes risk estimation and decoding in large vocabulary continuous speech recognition (ATR Workshop 2004)
3
Outline
Introduction
Minimum Bayes risk discriminative training
–Updating parameters via the Extended Baum-Welch algorithm
–Efficient computation of risk
–Risk-based pruning of the evidence space
Pinched lattice minimum Bayes risk discriminative training
Pinched lattice MMIE for whole-word acoustic models
One-worst pinched lattice MBRDT
Small vocabulary ASR performance and analysis (Alphadigits)
MBRDT for LVCSR results (Switchboard, MALACH-CZ)
Conclusion
4
Introduction
Discriminative acoustic modeling procedures, such as MMIE, are powerful modeling techniques that can be used to improve the performance of ASR systems.
MMI is often motivated as an estimation procedure by observing that it increases the a posteriori probability of the correct transcription of the speech in the training set:
ML: $F_{\mathrm{ML}}(\theta) = \log P_\theta(O \mid W)$
MMI: $F_{\mathrm{MMI}}(\theta) = \log \dfrac{P_\theta(O \mid W)\,P(W)}{\sum_{W'} P_\theta(O \mid W')\,P(W')}$
where $O$ is the training speech and $W$ its reference transcription.
5
Introduction
Since the ultimate goal is to reduce the number of words in error, which we define as the loss, estimation procedures that reduce the loss rather than improve the likelihood are preferable.
One such risk-based parameter estimation procedure was developed by Kaiser et al. (2000, 2002) to reduce the expected loss, or risk, over the training set.
6
Introduction
Their approach is a generalization of MMI in that both are derived via the Extended Baum-Welch algorithm; MMI is the special case of risk minimization under the sentence-error loss function.
7
Introduction
The risk-based estimation algorithm of Kaiser et al. is not suited for direct application to LVCSR.
–The difficulty arises from the need to compute the risk over many recognition hypotheses to obtain reliable statistical estimates.
In small vocabulary tasks, N-best lists are adequate to represent the space of hypotheses.
Lattice algorithms have been developed to compute the statistics needed for likelihood-based estimation procedures such as MMI.
–Risk-based estimation algorithms, however, are not easily formulated over lattices.
8
Introduction
The focus of this paper is the efficient computation of loss and likelihood in risk-based parameter estimation for LVCSR.
We use lattice-cutting techniques developed for Minimum Bayes Risk decoding (Goel 2001) to efficiently compute the statistics needed by the algorithm of Kaiser et al.
9
Minimum Bayes risk discriminative training
Likelihood-based (MMI): maximize $F_{\mathrm{MMI}}(\theta) = \log P_\theta(W \mid O)$
Risk-based (MBR): minimize the expected loss over the evidence space $\mathcal{W}$,
$R(\theta) = \sum_{W' \in \mathcal{W}} l(W', W)\, P_\theta(W' \mid O)$
Parameters are updated via the Extended Baum-Welch algorithm: with $r_q = l(W_q, W)$ the loss of hypothesis $W_q$ and $\bar{r} = \sum_q P_\theta(W_q \mid O)\, r_q$ the average (MBR) risk, the Forward-Backward statistics of $W_q$ enter the reestimation with weight $C_q = P_\theta(W_q \mid O)\,(\bar{r} - r_q)$, so that hypotheses better than average are reinforced and hypotheses worse than average are suppressed.
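To make the weighting concrete, here is a minimal Python sketch, assuming an N-best list with normalized posteriors and precomputed losses (the function name and data layout are illustrative, not from the paper):

```python
# Minimal sketch: risk-based EBW weights over an N-best list.
# Assumes hypotheses come with normalized posteriors P(W_q | O)
# and that losses l(W_q, W) have already been computed (e.g. Levenshtein).

def mbr_ebw_weights(posteriors, losses):
    """posteriors[q] = P(W_q | O); losses[q] = l(W_q, W_ref).
    Returns (r_avg, weights) with weights[q] = P(W_q|O) * (r_avg - r_q):
    below-average-risk hypotheses get positive weight, above-average negative."""
    assert abs(sum(posteriors) - 1.0) < 1e-6
    r_avg = sum(p * r for p, r in zip(posteriors, losses))  # expected risk
    weights = [p * (r_avg - r) for p, r in zip(posteriors, losses)]
    return r_avg, weights

# Illustrative 3-best list: a zero-loss hypothesis plus two competitors.
r_avg, c = mbr_ebw_weights([0.5, 0.3, 0.2], [0.0, 1.0, 2.0])
print(r_avg, c)  # ~0.7, [0.35, -0.09, -0.26]; the weights sum to zero
```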
10
Computing statistics over the evidence space
In LVCSR tasks, the evidence space W is often a lattice generated by the ASR decoder.
Through the conditional independence assumptions underlying the ASR system, the posterior can be found by summing over the lattice arcs.
–Dense lattices are needed to obtain robust and unbiased estimates.
However, the risk-minimizing estimation procedure is not readily formulated over lattices.
–The Levenshtein trellis (DP alignment) is not consistent with the structure of ASR lattices.
11
Computing statistics over the evidence space
The only possibility for exact lattice-based estimation is simply to expand the lattices into N-best lists so that the string-to-string comparison and the gathering of statistics (done via the Forward-Backward procedure) can be carried out exactly (Kaiser 2000).
While correct, this approach is not feasible for LVCSR tasks, where these N-best lists would have to be extremely deep to contain a significant portion of the most likely hypotheses.
We now discuss risk-based MMI variants for parameter estimation in LVCSR.
12
Efficient computation of risk
The key is an efficient lattice-to-string alignment algorithm that finds l(W', W) for every W' in the lattice.
By tracing a path through the lattice, accumulating the Levenshtein alignment costs, and weighting them by the arc likelihoods (copied from the original lattice), the risk can be computed.
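The string-to-string cost being accumulated is the standard Levenshtein distance; a minimal dynamic-programming sketch is below (this shows only the underlying string comparison, not the paper's lattice-traversal implementation):

```python
# Standard Levenshtein word-error loss l(W', W) via dynamic programming.
# The lattice algorithm in the paper accumulates these costs along lattice
# paths; here we show only the underlying string-to-string computation.

def levenshtein(hyp, ref):
    """Minimum number of substitutions, insertions, and deletions
    needed to turn the word list `hyp` into the word list `ref`."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all hyp words
    for j in range(n + 1):
        d[0][j] = j                      # insert all ref words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match/substitution
    return d[m][n]

print(levenshtein("A B 4 C".split(), "A V 4 C".split()))  # 1 (B -> V)
```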
13
Lattice to string alignment
14
Risk-based pruning of the evidence space
Risk-based lattice segmentation proceeds by segmenting the lattice with respect to the reference string. For the K words of the reference string, we identify K-1 node cut sets:
–identify all lattice subpaths that align to the i-th reference word
–the cut set N_i consists of the final lattice nodes of all these subpaths
The evidence space is then pruned in two steps:
–The likelihood of each lattice arc is used to discard unlikely paths through every confusion set, so that only likely alternatives to the reference word remain.
–We count all the confusion pairs in the training-set lattices, and if any pair occurs fewer times than a set threshold, that pair is everywhere pruned back to the reference transcription.
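A minimal sketch of the count-threshold pruning step, assuming each pinched segment is represented as a (reference word, competitor set) pair (this representation and the threshold value are illustrative assumptions):

```python
from collections import Counter

# Second pruning step: count confusion pairs over all training lattices and
# prune rare pairs back to the reference word. `all_segments` is assumed to
# hold, per utterance, a list of (reference_word, {competitor words}).

def prune_rare_pairs(all_segments, min_count=10):
    counts = Counter()
    for segments in all_segments:
        for ref, competitors in segments:
            for c in competitors:
                counts[(ref, c)] += 1
    pruned = []
    for segments in all_segments:
        pruned.append([
            (ref, {c for c in competitors if counts[(ref, c)] >= min_count})
            for ref, competitors in segments
        ])  # segments whose competitor set empties out collapse to the reference
    return pruned
```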
15
Lattice to string alignment
16
Induced loss function
The original motivation for refining the evidence space was to speed up MBR search, but lattice pinching also allows us to refine the string-to-string loss within W. We refer to the corresponding loss as the induced loss function: over the pinched lattice, the total loss decomposes into a sum of independent, local losses over the segments.
If the initial lattice-to-string alignment was good, the induced loss will be a good approximation to the true Levenshtein loss l(W', W).
The induced loss function and the pinched and pruned evidence space can be used to reduce the computational cost of MBRDT in LVCSR.
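A small illustration of the induced loss under the segment-independence assumption (the segment layout is hypothetical): the total loss is just the sum of per-segment disagreements.

```python
# Induced loss over a pinched lattice: the total loss decomposes into a sum
# of independent per-segment losses against the reference words.

def induced_loss(hyp_segments, ref_segments):
    """Both arguments are lists of words, one per pinched segment.
    The induced loss is the number of segments where they disagree."""
    return sum(h != r for h, r in zip(hyp_segments, ref_segments))

ref = ["A", "B", "4", "C"]
hyp = ["A", "V", "4", "C"]          # confusable B/V segment
print(induced_loss(hyp, ref))       # 1, matching the Levenshtein loss here
```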
17
Pinched lattice minimum Bayes risk discriminative training (PLMBRDT)
18
Pinched lattice minimum Bayes risk discriminative training (PLMBRDT)
PLMBRDT algorithm:
step 1: Generate lattices over the training set
step 2: Align the training-set lattices to the reference transcription
step 3: Segment the lattices
step 4: Prune the confusion sets to confusion pairs
step 5: Discard infrequently occurring confusion pairs
step 6: Expand each pinched lattice into an N-best list, keeping the most likely hypotheses
step 7: Compute the loss r_q of each hypothesis and the average risk r̄
step 8: For each hypothesis W_q, compute its weight C_q
step 9: Perform a Forward-Backward pass for each W_q
step 10: Apply the reestimation equations
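Since a pinched lattice is a sequence of small independent confusion segments, step 6 amounts to taking the product of the per-segment alternatives; a sketch under that assumption (segment layout and truncation policy are illustrative):

```python
from itertools import product

# Step 6 sketch: expand a pinched lattice into its hypothesis list.
# Each segment carries a reference word plus surviving competitors, so the
# pinched lattice encodes the cartesian product of the segment choices.

def expand_pinched_lattice(segments, max_hyps=100):
    """segments: list of lists of words, reference word first in each.
    Yields up to max_hyps hypotheses (word tuples)."""
    for k, hyp in enumerate(product(*segments)):
        if k >= max_hyps:
            break
        yield hyp

segments = [["A"], ["B", "V"], ["4"], ["C", "Z"]]
for hyp in expand_pinched_lattice(segments):
    print(" ".join(hyp))  # A B 4 C, A B 4 Z, A V 4 C, A V 4 Z
```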
19
Pinched lattice MMIE for whole-word acoustic models (PLMMI)
20
Pinched lattice MMIE for whole-word acoustic models (PLMMI)
PLMMI algorithm:
step 1: Generate lattices over the training set
step 2: Align the training-set lattices to the reference transcription
step 3: Segment the lattices
step 4: Prune the confusion sets to confusion pairs
step 5: Discard infrequently occurring confusion pairs
step 6: Tag word hypotheses in confusion pairs
step 7: Regenerate lattices over the training set, using the tagged and pinched lattice to constrain recognition
step 8: Perform lattice-based MMI using the word boundary times obtained from the lattice
Statistics are gathered only for those word instances that appear in confusion sets.
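A sketch of the tagging in step 6, assuming tagged identities of the form word#pairid so that each confusion pair can receive its own whole-word models (the tag format and data layout are illustrative assumptions, not the paper's notation):

```python
# Step 6 sketch: give word hypotheses that participate in confusion pairs a
# tagged identity, so that pair-specific whole-word models can be estimated.
# The "word#pairid" tag format here is illustrative, not the paper's notation.

def tag_confusion_words(segments, pair_ids):
    """segments: list of (reference_word, competitor or None).
    pair_ids: dict mapping a (ref, competitor) pair to a stable id."""
    tagged = []
    for ref, comp in segments:
        if comp is None:                # unambiguous segment: leave untagged
            tagged.append((ref, None))
        else:
            pid = pair_ids[(ref, comp)]
            tagged.append((f"{ref}#{pid}", f"{comp}#{pid}"))
    return tagged

pair_ids = {("B", "V"): 0}
print(tag_confusion_words([("A", None), ("B", "V"), ("4", None)], pair_ids))
# [('A', None), ('B#0', 'V#0'), ('4', None)]
```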
21
One-worst pinched lattice MBRDT
22
One-worst pinched lattice MBRDT
One-worst PLMBRDT algorithm (corrective training):
step 1: Generate lattices over the training set
step 2: Align the training-set lattices to the reference transcription
step 3: Segment the lattices
step 4: Prune the confusion sets to confusion pairs
step 5: Discard infrequently occurring confusion pairs
step 6: Extract the most errorful likely hypothesis W*
step 7: Perform Forward-Backward passes with respect to the reference W and W*
step 8: Perform MMI-style reestimation, contrasting W against W*
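One plausible construction of W* under the pinched-lattice structure, taking the highest-posterior competitor in each segment where one survives pruning (an illustrative assumption, not necessarily the paper's exact selection rule):

```python
# One-worst sketch: build a single maximally errorful yet likely hypothesis
# W* from a pinched lattice. The segment layout and the selection rule (take
# the best competitor wherever one survived pruning) are assumptions.

def one_worst(segments):
    """segments: list of (reference_word, [(competitor, posterior), ...]).
    Picks, per segment, the highest-posterior competitor if any survive,
    else the reference word."""
    w_star = []
    for ref, competitors in segments:
        if competitors:
            best, _ = max(competitors, key=lambda wp: wp[1])
            w_star.append(best)
        else:
            w_star.append(ref)
    return w_star

segments = [("A", []), ("B", [("V", 0.4)]), ("4", []), ("C", [("Z", 0.2)])]
print(one_worst(segments))  # ['A', 'V', '4', 'Z']
```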
23
Small vocabulary ASR performance and analysis (Alphadigits; results figures and tables, slides 23-27)
28
MMIE baseline
29
MBRDT for LVCSR results (Switchboard and MALACH-CZ; results tables, slides 29-31)
32
Conclusion
We have demonstrated how techniques developed for MBR decoding make it possible to apply risk-based parameter estimation algorithms to LVCSR:
–PLMBRDT, PLMMI, one-worst PLMBRDT
Our approach starts from the original derivation of Kaiser et al., which shows how the Extended Baum-Welch algorithm can be used to derive a parameter estimation procedure that reduces the expected loss over the training data.