1 Minimum Bayes-Risk Methods in Automatic Speech Recognition Vaibhava Goel and William Byrne IBM; Johns Hopkins University 2003 by CRC Press LLC 2005/4/26 Presented by 士弘
2 Outline
Introduction
Minimum Bayes-Risk Classification Framework
--Likelihood ratio based hypothesis testing
--Maximum a-posteriori probability classification
Practical MBR Procedures for ASR
--MBR recognition with N-best lists
--MBR recognition with lattices
--A* search under general loss functions
--Single stack search under Levenshtein loss function
--Prefix tree search under Levenshtein loss function
Segmental MBR Procedures
--Segmental voting
--ROVER (recognizer output voting for error reduction)
--e-ROVER
Experimental Results
Summary
3 Introduction Maximum likelihood techniques that underlie the training and decision processes of most current ASR systems are not sensitive to application-specific goals. A promising approach towards the construction of speech recognizers that are tuned for specific tasks is known as minimum Bayes-risk (MBR) automatic speech recognition.
4 Introduction The MBR framework assumes that a quantitative measure of recognition performance is known and that recognition should be a decision process that attempts to minimize the expected error under this measure. The three components of this decision process are:
1. the given error measure
2. the space of possible decisions
3. a probability distribution that allows the measurement of expected error
5 Minimum Bayes-Risk Classification Framework
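In its standard form (conventional notation, assumed here rather than copied from the slides), the MBR classifier chooses the word string with the smallest expected loss, where the expectation is taken under the posterior distribution of word strings W given the acoustic data A:

```latex
W^{*} \;=\; \operatorname*{argmin}_{W' \in \mathcal{W}} \;\sum_{W \in \mathcal{W}} l(W, W')\, P(W \mid A)
```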
9 Likelihood ratio based hypothesis testing
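For a binary decision (e.g., accept or reject a putative keyword), the MBR rule reduces to a likelihood ratio test. In assumed notation, with hypotheses H_0 and H_1, priors P(H_i), and loss l_{ij} for deciding H_i when H_j is true, the standard derivation gives:

```latex
\text{decide } H_1 \quad\text{if}\quad
\frac{P(A \mid H_1)}{P(A \mid H_0)} \;>\;
\frac{\bigl(l_{10} - l_{00}\bigr)\, P(H_0)}{\bigl(l_{01} - l_{11}\bigr)\, P(H_1)}
```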
10 Maximum a-posteriori probability classification
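Under the 0/1 (sentence-error) loss, the MBR rule reduces to the familiar MAP rule; this is the standard special case:

```latex
l(W, W') = \mathbf{1}(W \neq W')
\quad\Longrightarrow\quad
W^{*} = \operatorname*{argmax}_{W \in \mathcal{W}} P(W \mid A)
      = \operatorname*{argmax}_{W \in \mathcal{W}} P(A \mid W)\, P(W)
```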
11 Practical MBR Procedures for ASR The MBR recognizer is difficult to implement for three reasons:
1. The evidence and hypothesis spaces tend to be quite large.
2. The problem of large spaces is worsened by the fact that an ASR system often has to process many consecutive utterances.
3. Although there are efficient DP techniques for the MAP recognizer, such methods are not yet available for an MBR recognizer under an arbitrary loss function.
12 Practical MBR Procedures for ASR How to implement?
–Two implementations: an N-best list rescoring procedure, or a search over a recognition lattice.
–Segment long acoustic data into sentence- or phrase-length utterances.
–Restrict the evidence and hypothesis spaces to manageable sets of word strings.
13 MBR recognition with N-best lists
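The N-best rescoring implementation is simple enough to sketch directly. The minimal Python sketch below (illustrative, not the chapter's code) takes an N-best list with posterior probabilities and returns the hypothesis with the smallest expected Levenshtein loss; the helper function and the input format are assumptions made here for illustration.

```python
def levenshtein(ref, hyp):
    """Word-level edit distance via dynamic programming."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution / match
    return d[-1]

def mbr_nbest(nbest):
    """nbest: list of (word_tuple, posterior) pairs, posteriors summing to 1.
    Returns the hypothesis minimizing the expected Levenshtein loss."""
    def expected_loss(w_prime):
        return sum(p * levenshtein(w, w_prime) for w, p in nbest)
    return min((w_prime for w_prime, _ in nbest), key=expected_loss)

# Toy example: MAP would pick ("a", "b", "d") (highest posterior), but the
# expected-loss criterion prefers ("a", "b", "c"), which agrees more with
# the rest of the list.
nbest = [(("a", "b", "d"), 0.4),
         (("a", "b", "c"), 0.3),
         (("a", "b", "c", "c"), 0.3)]
print(mbr_nbest(nbest))
```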
14 MBR recognition with lattices
15 MBR recognition with lattices
16 MBR recognition with lattices
17 MBR recognition with lattices
18 MBR recognition with lattices (example lattice with arc scores)
19 A* search under general loss functions
20 A* search under general loss functions Two cost functions are required for the search.
–The first cost function is associated with each hypothesis, whether partial or complete. Its value is a lower bound on the expected loss that can be obtained by extending the hypothesis through the lattice to completion.
21 A* search under general loss functions
–The second cost function is only associated with complete hypotheses. It is an over-estimate of the expected loss of a complete hypothesis.
–Hypotheses are kept in a stack (priority queue) sorted by cost C, with the smallest-cost hypothesis at the top.
–At every iteration the hypothesis at the top of the stack is extended.
22 A* search under general loss functions When to terminate?
1. When there is a complete hypothesis at the top of the stack, its second (over-estimate) cost is computed; the search terminates if this over-estimate is smaller than the under-estimate cost C of the next hypothesis on the stack.
2. The search also terminates when there is no partial hypothesis left in the stack.
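A minimal Python sketch of this single-stack loop, assuming lattice-specific helpers are supplied by the caller; the names expand, is_complete, underestimate, and overestimate are illustrative, not taken from the chapter.

```python
import heapq

def mbr_stack_search(initial_hyps, expand, is_complete,
                     underestimate, overestimate):
    """Schematic stack (priority-queue) search for the MBR hypothesis.

    expand(hyp)        -> list of one-word extensions of a partial hypothesis
    is_complete(hyp)   -> True if hyp spans the whole lattice
    underestimate(hyp) -> lower bound C on the expected loss of any completion
    overestimate(hyp)  -> upper bound on the expected loss (complete hyps only)
    """
    stack = [(underestimate(h), i, h) for i, h in enumerate(initial_hyps)]
    heapq.heapify(stack)
    counter = len(stack)  # tie-breaker so hypotheses themselves are never compared

    while stack:
        cost, _, hyp = heapq.heappop(stack)
        if is_complete(hyp):
            # Termination test 1: the complete hypothesis at the top has an
            # over-estimate no larger than the best remaining stack cost.
            if not stack or overestimate(hyp) <= stack[0][0]:
                return hyp
            # Otherwise keep it around, now keyed by its over-estimate cost.
            heapq.heappush(stack, (overestimate(hyp), counter, hyp))
            counter += 1
            continue
        for ext in expand(hyp):
            heapq.heappush(stack, (underestimate(ext), counter, ext))
            counter += 1
    return None  # Termination test 2: no partial hypotheses remain
```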
23 Single stack search under Levenshtein loss function We now present usable cost functions for the Levenshtein distance. These costs are not unique, and the efficiency of the search depends on the quality of both the under-estimate and the over-estimate. The Levenshtein loss function is not sensitive to the word time boundaries; therefore, the word time boundaries are summed over during the search.
24 Single stack search under Levenshtein loss function
25 Single stack search under Levenshtein loss function
26 Single stack search under Levenshtein loss function
27 Single stack search under Levenshtein loss function
28 Single stack search under Levenshtein loss function
29 Prefix tree search under Levenshtein loss function
30 Prefix tree search under Levenshtein loss function
31 Prefix tree search under Levenshtein loss function
32 Prefix tree search under Levenshtein loss function
33 Prefix tree search under Levenshtein loss function The single stack search and the prefix tree search have the disadvantage that the costs of partial hypotheses of different lengths are compared. This is acceptable under the search formulation, but it is not a good comparison for use in pruning, since it favors short hypotheses and is therefore sub-optimal. How to solve it?
–Use a multistack implementation that maintains a separate stack for each hypothesis length.
–It has been found to have better pruning characteristics in practice.
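A small Python sketch of the multistack idea (the class and its methods are illustrative, not from the chapter): hypotheses are binned by length, so pruning only ever compares hypotheses of equal length.

```python
import heapq
from collections import defaultdict

class MultiStack:
    """Illustrative multistack: one priority queue per hypothesis length."""

    def __init__(self):
        self.stacks = defaultdict(list)  # length -> heap of (cost, count, hyp)
        self._count = 0

    def push(self, hyp, cost):
        # hyp is assumed to be a sequence of words, so len(hyp) is its length.
        heapq.heappush(self.stacks[len(hyp)], (cost, self._count, hyp))
        self._count += 1

    def prune(self, length, beam):
        """Keep only the `beam` lowest-cost hypotheses of a given length."""
        self.stacks[length] = heapq.nsmallest(beam, self.stacks[length])
        heapq.heapify(self.stacks[length])

    def pop(self, length):
        cost, _, hyp = heapq.heappop(self.stacks[length])
        return hyp, cost
```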
34 Segmental MBR Procedures Segmental MBR (SMBR): utterance-level recognition is divided into a sequence of simpler MBR recognition problems. The lattices or N-best lists are segmented into sets of words. Advantages:
–The segmentation can be performed to identify high-confidence regions within the evidence space.
–Within those regions we can produce reliable word hypotheses.
–SMBR can then focus its effort on the low-confidence regions.
35 Segmental MBR Procedures (definition diagram: the evidence and hypothesis spaces are segmented into word sequences, from which the full word strings are reconstructed)
36 Segmental MBR Procedures Assume that the utterance-level loss can be found from the losses over the segment sets, as in the decomposition sketched below.
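Presumably this is the standard additive decomposition over the segment sets; in assumed notation, with W_i and W'_i denoting the i-th evidence and hypothesis segment sets and N the number of segments:

```latex
l(W, W') \;=\; \sum_{i=1}^{N} l\bigl(W_i,\, W'_i\bigr)
```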
37 Segmental MBR Procedures Therefore, under this assumption on the losses over the segment sets, utterance-level MBR recognition becomes a sequence of smaller MBR recognition problems. In practice it may be difficult to segment the evidence and hypothesis spaces. The utterance-level induced loss function l_I is defined as the loss obtained under such a segmentation. The overall performance under the desired loss function l should depend on how well l_I approximates l.
38 Segmental Voting A special case of segmental MBR recognition: suppose each evidence and hypothesis segment set contains at most one word, and there is a 0/1 loss function on segment sets. The utterance-level induced loss for segmental voting is then the number of segments in which the two strings disagree, as sketched below.
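In assumed notation, with w_i and w'_i the (at most one-word) contents of the i-th segment sets, the induced loss is presumably

```latex
l_I(W, W') \;=\; \sum_{i=1}^{N} \mathbf{1}\bigl(w_i \neq w'_i\bigr)
```

Minimizing this loss segment by segment amounts to choosing, in each segment, the word with the highest posterior probability, i.e., taking a vote.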
39 Segmental Voting We will now describe two versions of segmental MBR recognition used in state-of-the-art ASR systems. Both these procedures attempt to reduce the word error rate (WER) and thus are based on the Levenshtein loss function.
40 ROVER Recognizer Output Voting for Error Reduction (ROVER) is an N-best list segmental voting procedure. It combines the hypotheses from multiple independent recognizers under the Levenshtein loss.
41 ROVER The word strings of N_e are arranged in a word transition network (WTN) that represents an approximate simultaneous alignment of these hypotheses.
42 ROVER The utterance-level induced loss in ROVER is derived from this WTN alignment. This loss is similar to the Levenshtein distance between strings W and W' when their alignment is specified by the WTN.
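Once the WTN has been built, ROVER's decision step is a per-slot vote. The sketch below assumes the WTN is already given as a list of correspondence sets and uses plain vote counts (real ROVER can also mix in word confidence scores); the names and input format are illustrative.

```python
from collections import Counter

def rover_vote(wtn, null_word="@"):
    """Schematic ROVER voting over an already-built word transition network.

    `wtn` is a list of correspondence sets, one per alignment slot; each set
    lists the words the individual recognizers hypothesized in that slot,
    with `null_word` marking a deletion. Building the WTN itself (the
    approximate simultaneous alignment) is the hard part and is not shown.
    """
    output = []
    for correspondence_set in wtn:
        word, _ = Counter(correspondence_set).most_common(1)[0]
        if word != null_word:  # a winning null word means "emit nothing"
            output.append(word)
    return output

# Example: three recognizers aligned into four correspondence sets.
wtn = [["the", "the", "the"],
       ["cat", "cat", "cap"],
       ["@",   "sat", "sat"],
       ["down", "down", "down"]]
print(rover_vote(wtn))  # ['the', 'cat', 'sat', 'down']
```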
43 e-ROVER Extended ROVER (e-ROVER). The utterance-level loss function of e-ROVER is obtained as follows.
–Start with the initial WTN.
–Merge two consecutive correspondence sets.
–Let the loss function on the expanded set be the Levenshtein distance.
–The loss function on correspondence sets that did not expand remains the 0/1 loss.
44 e-ROVER
45 e-ROVER The utterance-level induced loss in e-ROVER sums the Levenshtein distance over the merged correspondence sets and the 0/1 loss over the sets that were not merged. It follows from the definition of Levenshtein distance that this induced loss approximates the true Levenshtein loss at least as closely as the ROVER induced loss (see the relation below).
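The relation in question is presumably the chain of bounds below (the superscripts are assumed notation); it holds because the Levenshtein distance of concatenated segments never exceeds the sum of the per-segment distances:

```latex
l(W, W') \;\le\; l_I^{\,\text{e-ROVER}}(W, W') \;\le\; l_I^{\,\text{ROVER}}(W, W')
```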
46 e-ROVER There are two consequences of joining correspondence sets:
–After the joining operation, the loss function on the expanded set is no longer the 0/1 loss but is instead the Levenshtein distance.
–The size of the expanded set grows exponentially with the number of joining operations, making the expected-loss computation over it progressively more difficult to implement.
Therefore, it is important to choose the sets to be joined carefully, so as to yield the maximum gain in Levenshtein distance approximation with the minimum number of combinations of correspondence sets.
47 Parameter Tuning within the MBR Classification Rule The joint distribution to be used in the MBR recognizers is derived by combining probabilities from the acoustic and language models. It is customary in ASR to use two tuning parameters in the computation of this joint probability.
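A common convention (assumed here; the slide's own formula is not reproduced) is a language-model scale α and a per-word insertion penalty β, so that the joint score takes the form

```latex
P(W, A) \;\propto\; P(A \mid W)\, P(W)^{\alpha}\, \beta^{|W|}
```

where |W| is the number of words in W.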
48 Parameter Tuning within the MBR Classification Rule
49 Utterance Level MBR Word and Keyword Recognition
50 ROVER and e-ROVER for Multilingual ASR
51 Summary We have described automatic speech recognition algorithms that attempt to minimize the average misrecognition cost under a task-specific loss function. These recognizers, although generally more computationally complex than the more widely used MAP algorithms, can be implemented efficiently using an N-best list rescoring procedure or as an A* search over recognition lattices. Segmental MBR is described as a special case of MBR recognition that results from the segmentation of the recognition search space. The segmentation is done with the assumption that the induced loss function is a good approximation to the original, desired loss function.
52 Conclusion Hard to implement!