
1 Minimum Bayes-risk Methods in Automatic Speech Recognition
Vaibhava Goel and William Byrne
IBM; Johns Hopkins University
© 2003 by CRC Press LLC
2005/4/26, presented by 士弘

2 Outline
Introduction
Minimum Bayes-Risk Classification Framework
-- Likelihood ratio based hypothesis testing
-- Maximum a-posteriori probability classification
Practical MBR Procedures for ASR
-- MBR recognition with N-best lists
-- MBR recognition with lattices
-- A* search under general loss functions
-- Single stack search under Levenshtein loss function
-- Prefix tree search under Levenshtein loss function
Segmental MBR Procedures
-- Segmental voting
-- ROVER (recognizer output voting for error reduction)
-- e-ROVER
Experimental Results
Summary

3 Introduction
Maximum likelihood techniques that underlie the training and decision processes of most current ASR systems are not sensitive to application-specific goals. A promising approach towards the construction of speech recognizers that are tuned for specific tasks is known as minimum Bayes-risk (MBR) automatic speech recognition.

4 Introduction
The MBR framework assumes that a quantitative measure of recognition performance is known and that recognition should be a decision process that attempts to minimize the expected error under this measure. The three components of this decision process are:
1. the given error measure
2. the space of possible decisions
3. a probability distribution that allows the measurement of expected error

5 Minimum Bayes-Risk Classification Framework
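The equations on this and the following slides were not captured in the transcript. The central decision rule of the MBR framework chooses the hypothesis with the smallest expected loss under the posterior distribution; written here in generic notation:

\hat{W} = \arg\min_{W' \in \mathcal{W}} \sum_{W \in \mathcal{W}} l(W, W') \, P(W \mid A)

where A is the acoustic observation, \mathcal{W} is the space of word strings, l(W, W') is the task-specific loss incurred when W' is chosen and W is the truth, and P(W \mid A) is the posterior probability of W given A.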

6–8 Minimum Bayes-Risk Classification Framework (continued)
(Slide content not captured in the transcript.)

9 Likelihood ratio based hypothesis testing
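The formulas on this slide are missing from the transcript. For a two-class problem with hypotheses H_0 and H_1, the minimum Bayes-risk decision reduces to the classical likelihood ratio test, stated here in standard textbook form with l_{ij} denoting the loss of deciding H_i when H_j is true:

\frac{P(A \mid H_1)}{P(A \mid H_0)} \;\gtrless\; \frac{(l_{10} - l_{00}) \, P(H_0)}{(l_{01} - l_{11}) \, P(H_1)}

so H_1 is chosen when the likelihood ratio exceeds a threshold determined by the losses and the prior probabilities.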

10 Maximum a-posteriori probability classification
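The slide's equations are not in the transcript. MAP classification is the special case of the MBR rule obtained under the 0/1 loss, l(W, W') = 0 if W = W' and 1 otherwise; minimizing the expected 0/1 loss is equivalent to maximizing the posterior:

\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W) \, P(W)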

11 Practical MBR Procedures for ASR
The MBR recognizer is difficult to implement for three reasons:
1. The evidence and hypothesis spaces tend to be quite large.
2. The problem of large spaces is worsened by the fact that an ASR system often has to process many consecutive utterances.
3. While there are efficient dynamic programming (DP) techniques for the MAP recognizer, such methods are not yet available for an MBR recognizer under an arbitrary loss function.

12 Practical MBR Procedures for ASR
How to implement?
– Two implementations:
    an N-best list rescoring procedure
    a search over a recognition lattice
– Segment long acoustic data into sentence or phrase length utterances.
– Restrict the evidence and hypothesis spaces to manageable sets of word strings.

13 MBR recognition with N-best lists
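The details of this slide are not captured. As a rough illustration of the N-best rescoring idea, the sketch below scores every hypothesis in the list against every other hypothesis, with the posterior P(W | A) approximated by normalizing the joint scores over the list; the function and variable names are illustrative and not taken from the chapter.

import math

def mbr_rescore(nbest, loss):
    # nbest: list of (word_sequence, log_joint_score) pairs, where the score
    #        is log P(W, A) up to a constant.
    # loss:  loss(w, w_prime) giving the task loss, e.g. Levenshtein distance.
    # Returns the hypothesis with the smallest expected loss over the list.
    log_scores = [s for _, s in nbest]
    log_norm = max(log_scores)
    weights = [math.exp(s - log_norm) for s in log_scores]
    total = sum(weights)
    posteriors = [w / total for w in weights]          # P(W | A) restricted to the list

    best_hyp, best_risk = None, float("inf")
    for w_prime, _ in nbest:                           # candidate decisions (hypothesis space)
        risk = sum(p * loss(w, w_prime)
                   for (w, _), p in zip(nbest, posteriors))  # expectation over evidence space
        if risk < best_risk:
            best_hyp, best_risk = w_prime, risk
    return best_hyp, best_risk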

14–17 MBR recognition with lattices
(Slide content not captured in the transcript.)

18 MBR Recognition with Lattices
(Worked lattice example; only scattered numeric scores survive in the transcript.)

19 A* search under general loss functions

20 A* search under general loss functions
Two cost functions are required for the search.
– The first cost function is associated with each hypothesis, whether partial or complete. Its value is a lower bound on the expected loss that can be obtained by extending the hypothesis through the lattice to completion.

21 A* search under general loss functions
– The second cost function is only associated with complete hypotheses. It is an over-estimate of the expected loss of a complete hypothesis.
– Hypotheses are kept in a stack (priority queue) sorted by cost C, with the smallest-cost hypothesis at the top.
– At every iteration the hypothesis at the top of the stack is extended.

22 A* search under general loss functions
When to terminate?
1. When there is a complete hypothesis at the top of the stack, its second (over-estimate) cost is computed; the search terminates with this hypothesis if the over-estimate is smaller than the under-estimate cost C of the next stack hypothesis.
2. When there is no partial hypothesis left in the stack.
A sketch of this search loop is given below.
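The following is a rough sketch of the single-stack search described on the last three slides. All problem-specific pieces (how hypotheses are expanded through the lattice and how the two cost estimates are computed) are left as placeholder callables, so this illustrates the control flow only, not the chapter's implementation.

import heapq
import itertools

def mbr_stack_search(start, expand, is_complete, under_cost, over_cost):
    # start:        empty partial hypothesis
    # expand:       expand(h) -> successor hypotheses along the lattice
    # is_complete:  is_complete(h) -> True if h is a full path through the lattice
    # under_cost:   first cost: lower bound on the expected loss reachable from h
    # over_cost:    second cost: over-estimate of the expected loss of a complete h
    tick = itertools.count()                        # tie-breaker so hypotheses are never compared
    stack = [(under_cost(start), next(tick), start)]
    while stack:
        _, _, hyp = heapq.heappop(stack)            # hypothesis with the smallest cost C
        if is_complete(hyp):
            next_cost = stack[0][0] if stack else float("inf")
            if over_cost(hyp) <= next_cost:         # termination condition 1
                return hyp
            # Otherwise keep the complete hypothesis, now keyed by its over-estimate,
            # and continue expanding the cheaper partial hypotheses.
            heapq.heappush(stack, (over_cost(hyp), next(tick), hyp))
        else:
            for succ in expand(hyp):
                heapq.heappush(stack, (under_cost(succ), next(tick), succ))
    return None                                     # stack exhausted, no hypothesis found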

23 Single stack search under Levenshtein loss function
We now present usable cost functions for the Levenshtein distance. These costs are not unique, and the efficiency of the search depends on the quality of both the under-estimate and the over-estimate. The Levenshtein loss function is not sensitive to word time boundaries; therefore, the word time boundaries are summed over during the search.
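The Levenshtein loss referred to here is the standard word-level edit distance (the minimum number of substitutions, insertions and deletions). A minimal dynamic programming implementation, which could also serve as the loss function in the N-best sketch above:

def levenshtein(ref, hyp):
    # Levenshtein (edit) distance between two word sequences.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                              # deletions
    for j in range(n + 1):
        d[0][j] = j                              # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + sub) # match or substitution
    return d[m][n]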

24–28 Single stack search under Levenshtein loss function
(Derivation of the under-estimate and over-estimate costs; the slide equations are not captured in the transcript.)

29–32 Prefix tree search under Levenshtein loss function
(Slide content not captured in the transcript.)

33 Prefix tree search under Levenshtein loss function
The single stack search and the prefix tree search have the disadvantage that the costs of partial hypotheses of different lengths are compared. This is acceptable under the search formulation, but it is not a good comparison for use in pruning, since it favors short hypotheses and makes the pruned search sub-optimal.
How to solve it?
– Use a multistack implementation that maintains a separate stack for each hypothesis length.
– This has been found to have better pruning characteristics in practice.

34 Segmental MBR Procedures
Segmental MBR (SMBR) turns utterance level recognition into a sequence of simpler MBR recognition problems. The lattices or N-best lists are segmented into sets of words.
Advantages:
– The segmentation can be performed to identify high confidence regions within the evidence space.
– Within these regions we can produce reliable word hypotheses.
– SMBR can then focus on the low confidence regions.

35 Segmental MBR Procedures
(Figure showing how the evidence and hypothesis spaces are segmented and the word sequence is reconstructed; only the labels "evidence", "hypothesis space", "word sequence", "Definition" and "reconstruct" survive in the transcript.)

36 Segmental MBR Procedures
Assume that the utterance level loss can be found from the losses over the segment sets, as shown below.
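The equation the slide refers to is missing from the transcript. In the usual segmental formulation, the two word strings are segmented as W = W_1 ... W_N and W' = W'_1 ... W'_N, and the utterance level loss is assumed to decompose over the N segment sets:

l(W, W') = \sum_{i=1}^{N} l_i(W_i, W'_i)

When the desired loss does not decompose exactly, the right-hand side serves as the induced loss l_I mentioned on the next slide.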

37 Segmental MBR Procedures
Therefore, under this assumption on the segment-set losses, utterance level MBR recognition becomes a sequence of smaller MBR recognition problems. In practice it may be difficult to segment the evidence and hypothesis spaces so that the desired loss decomposes exactly; an utterance level induced loss function l_I is therefore defined as the sum of the segment-set losses. The overall performance under the desired loss function l then depends on how well l_I approximates l.

38 Segmental Voting
Segmental voting is a special case of segmental MBR recognition. Suppose each evidence and hypothesis segment set contains at most one word, and that a 0/1 loss function is used on the segment sets. The utterance level induced loss for segmental voting is then as given below.
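The induced loss formula is missing from the transcript; with at most one word per segment and a 0/1 loss on each segment set, it is simply the number of segment positions at which the two strings disagree:

l_I(W, W') = \sum_{i=1}^{N} \mathbf{1}\{ W_i \neq W'_i \}

Under this loss, the MBR decision in each segment is the word (or empty hypothesis) with the largest posterior probability, i.e. a vote.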

39 Segmental Voting
We will now describe two versions of segmental MBR recognition used in state-of-the-art ASR systems. Both of these procedures attempt to reduce the word error rate (WER) and are thus based on the Levenshtein loss function.

40 ROVER
Recognizer Output Voting for Error Reduction (ROVER) is an N-best list segmental voting procedure. It combines the hypotheses from multiple independent recognizers under the Levenshtein loss.

41 ROVER
The word strings of N_e are arranged in a word transition network (WTN) that represents an approximate simultaneous alignment of these hypotheses.
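To make the voting step concrete, the sketch below takes an already-built WTN (a list of correspondence sets, one per alignment slot) and outputs the plurality word in each slot. The WTN construction itself is not shown, and the null-word symbol and example data are illustrative assumptions; real ROVER can additionally weight votes by word confidence scores.

from collections import Counter

def rover_vote(wtn, null_word="@"):
    # wtn: list of correspondence sets, each a list of words proposed by the
    #      individual recognizers at that slot; a recognizer with no word in a
    #      slot contributes the null word.
    output = []
    for corr_set in wtn:
        word, _ = Counter(corr_set).most_common(1)[0]  # plurality vote in this slot
        if word != null_word:                          # skip null arcs
            output.append(word)
    return output

# Hypothetical example: three recognizers, four correspondence sets.
wtn = [["the", "the", "a"],
       ["cat", "cat", "cat"],
       ["@", "sat", "@"],
       ["down", "down", "town"]]
print(rover_vote(wtn))   # ['the', 'cat', 'down']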

42 ROVER
The utterance level induced loss in ROVER is derived from this WTN alignment. This loss is similar to the Levenshtein distance between the strings W and W' when their alignment is specified by the WTN.

43 e-ROVER
Extended ROVER (e-ROVER). The utterance level loss function of e-ROVER is constructed as follows:
– Start with the initial WTN.
– Merge two consecutive correspondence sets.
– Let the loss function on the expanded set be the Levenshtein distance.
– The loss function on correspondence sets that were not expanded remains the 0/1 loss.

44 e-ROVER

45 e-ROVER
The utterance level induced loss in e-ROVER is given below. It follows from the definition of the Levenshtein distance that this induced loss is a closer approximation to the true Levenshtein loss than the ROVER induced loss.
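Both formulas are missing from the transcript. One natural way to write the induced loss, following the construction on the previous slide, is as a Levenshtein term over the merged correspondence sets plus a 0/1 term over the sets that were left alone:

l_I^{\text{e-ROVER}}(W, W') = \sum_{j \in \text{merged}} \mathrm{lev}(W_j, W'_j) + \sum_{i \in \text{unmerged}} \mathbf{1}\{ W_i \neq W'_i \}

Because the Levenshtein distance is the minimum over all alignments, while these sums fix the alignment given by the WTN, one then has

\mathrm{lev}(W, W') \;\le\; l_I^{\text{e-ROVER}}(W, W') \;\le\; l_I^{\text{ROVER}}(W, W')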

46 e-ROVER
There are two consequences of joining correspondence sets:
– After the joining operation, the loss function on the expanded set is no longer the 0/1 loss but is instead the Levenshtein distance.
– The size of the expanded set grows exponentially with the number of joining operations, making the per-segment MBR computation progressively more difficult to implement.
Therefore, it is important to choose the sets to be joined carefully, so as to yield the maximum gain in the Levenshtein distance approximation with the minimum number of combined correspondence sets.

47 Parameter Tuning within the MBR Classification Rule
The joint distribution used in the MBR recognizer is derived by combining probabilities from the acoustic and language models. It is customary in ASR to use two tuning parameters in the computation of this joint probability.
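The exact parameterisation used in the chapter is not in the transcript; a customary form scales the acoustic and language model probabilities with two exponents tuned on held-out data:

P(W, A) \;\propto\; P(A \mid W)^{\alpha} \, P(W)^{\beta}

where \alpha and \beta are the acoustic and language model scale factors. In MBR decoding these exponents also control how sharply peaked the resulting posterior P(W \mid A) is.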

48 Parameter Tuning within the MBR Classification Rule

49 Utterance Level MBR Word and Keyword Recognition

50 ROVER and e-ROVER for Multilingual ASR

51 Summary
We have described automatic speech recognition algorithms that attempt to minimize the average misrecognition cost under a task-specific loss function. These recognizers, although generally more computationally complex than the more widely used MAP algorithms, can be efficiently implemented using an N-best list rescoring procedure or an A* search over recognition lattices. Segmental MBR is described as a special case of MBR recognition that results from segmentation of the recognition search space. The segmentation is done under the assumption that the induced loss function is a good approximation to the original, desired loss function.

52 Conclusion
Hard to implement!

