Detection of Recognition Errors and Out of the Spelling Dictionary Names in a Spelled Name Recognizer for Spanish

R. San-Segundo, J. Macías-Guarasa, J. Ferreiros, P. Martín, J.M. Pardo
Grupo de Tecnología del Habla, Universidad Politécnica de Madrid, Spain
{lapiz, macias, jfl, ppajaro, pardo}@die.upm.es

Previous work, presented at ICSLP'00:
- Description of the spelling task for Spanish.
- Recognition of continuously spelled names over the telephone line.
- Comparison of different recognition architectures: two-level, integrated, and hypothesis-verification.
- Integration of noise models in all recognition architectures.

In this paper:
- New adjustments to the hypothesis-verification architecture.
- Confidence measures for detecting recognition errors and out-of-dictionary names.
- Neural networks that combine the confidence features into a single confidence measure.

SUMMARY
- New version of the Spanish spelled-name recognizer over the telephone: more than a 90.0% recognition rate with a 10,000-name dictionary.
- Proposed features for confidence annotation in hypothesis-verification systems.
- 57.9% of incorrectly recognized names and 68.3% of names out of the spelling dictionary are detected at a 5% false rejection rate.
- Best feature for detecting recognition errors: CSD-3. Best feature for detecting names out of the dictionary: SR-3.
- Discriminating between recognition errors and out-of-dictionary names is a difficult task.

EXPERIMENTAL SETUP
Recognition experiments (dictionaries of 1,000, 5,000 and 10,000 names):
- 2,100 utterances in the training set.
- 300 utterances in evaluation set 1, for calculating the penalty values.
- 300 utterances in evaluation set 2, for development.
- 300 utterances in the testing set.
- 6-round-robin training; the results presented are the average over all rounds.
Confidence annotation experiments (10,000-name dictionary):
- 1,200 utterances in the training set, for training the neural networks.
- 300 utterances in evaluation.
- 300 utterances in testing.
- 6-round-robin training; the results presented are the average over all rounds.

SYSTEM ARCHITECTURE
[Block diagram: speech analysis feeds the HYPOTHESIS STAGE, where an HMM recogniser (letter models plus an N-gram of letters) produces the N-best letter sequences as a letter graph; a DP alignment with trained penalty values matches them against the dictionary and passes the M-best names to the VERIFICATION STAGE, which applies a constrained grammar and outputs the recognised name.]
- RASTA-PLP parameterisation.
- 40 continuous letter models: 30 standard pronunciations, 6 second (alternative) pronunciations and 4 noise models.
- The penalty values in the DP alignment have been trained on an evaluation set.
- Verification stage: a constrained grammar that considers possible noise models between letters.

RECOGNITION RESULTS
The average confusion (in parentheses) is the average number of name pairs in the dictionary that differ by only one letter substitution. M is the number of candidates passed from the hypothesis stage to the verification stage.

FEATURES
Hypothesis stage
From the HMM recognizer (F-1):
- Best Score (BS-1): acoustic score of the 1st letter sequence, divided by the number of frames.
- Score Difference (SD-1): acoustic score difference between the 1st and 2nd letter sequences, divided by the number of frames.
From the DP alignment (F-2; a computational sketch follows this section):
- Best Cost (BC-2): lowest alignment cost between the N-best letter sequences and the names of the dictionary, divided by the length of the 1st letter sequence.
- Cost Difference (CD-2): difference between the two best alignment costs, divided by the length of the 1st letter sequence.
- Cost Mean (CM-2): average cost over the 50 best alignment costs, divided by the length of the 1st letter sequence.
- Cost Variance (CV-2): cost variance over the 50 best alignment costs, divided by the length of the 1st letter sequence.
Verification stage (F-3)
- Candidate Score (CS-3): acoustic score of the best candidate name obtained after the verification stage, divided by the number of frames.
- Candidate Score Difference (CSD-3): acoustic score difference between the two best candidates obtained in the verification stage, divided by the number of frames.
- Candidate Score Mean (CSM-3): average score over the 50 best candidate names, divided by the number of frames.
- Candidate Score Variance (CSV-3): score variance over the 50 best candidate names, divided by the number of frames.
- Score Ratio (SR-3): difference between the score of the 1st letter sequence (hypothesis stage) and the score of the best candidate name (verification stage), divided by the number of frames.
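The F-2 set reduces to order statistics of DP alignment costs between a recognized letter sequence and the dictionary. Below is a minimal Python sketch under simplifying assumptions: uniform insertion/deletion/substitution penalties (the system trains its penalty values on an evaluation set) and alignment of only the 1st letter sequence rather than the full N-best list; dp_alignment_cost and f2_features are illustrative names, not the system's code.

```python
# Sketch of the F-2 (DP alignment) confidence features with uniform penalties;
# the poster's penalty values are trained on an evaluation set instead.
from statistics import mean, pvariance

def dp_alignment_cost(letters, name, ins_pen=1.0, del_pen=1.0, sub_pen=1.0):
    """Edit-distance DP between a recognized letter sequence and a dictionary name."""
    n, m = len(letters), len(name)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * del_pen
    for j in range(1, m + 1):
        cost[0][j] = j * ins_pen
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if letters[i - 1] == name[j - 1] else sub_pen
            cost[i][j] = min(cost[i - 1][j] + del_pen,   # deletion
                             cost[i][j - 1] + ins_pen,   # insertion
                             cost[i - 1][j - 1] + sub)   # substitution / match
    return cost[n][m]

def f2_features(best_letters, dictionary, k=50):
    """BC-2, CD-2, CM-2, CV-2 over the k best alignment costs, each normalized
    by the length of the 1st letter sequence (dictionary needs >= 2 names)."""
    costs = sorted(dp_alignment_cost(best_letters, name) for name in dictionary)
    top = costs[:k]
    length = len(best_letters)
    return {
        "BC-2": top[0] / length,
        "CD-2": (top[1] - top[0]) / length,
        "CM-2": mean(top) / length,
        "CV-2": pvariance(top) / length,
    }
```

Calling f2_features("GARCIA", dictionary) with any list of names returns the four normalized cost statistics; a low BC-2 together with a large CD-2 indicates an isolated, confident dictionary match.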
RECOGNITION ERRORS DETECTION
Correct detection rates per feature at 2.5% and 5.0% false rejection (baseline classification error: 9.7%):

  Feature    2.5%     5.0%
  BC-2       7.1%    12.9%
  CD-2      20.0%    26.5%
  CM-2      18.3%    26.0%
  CV-2      22.3%    29.5%
  CS-3      40.5%    54.3%
  CSD-3     27.1%    38.2%
  CSM-3     30.1%    37.4%
  CSV-3     46.7%    57.4%
  SR-3      44.7%    57.9%

Classification errors reported alongside the features range from the 9.7% baseline down to 7.5%: 9.7%, 9.4%, 9.2%, 8.0%, 9.0%, 7.6% and 7.5%.

OUT OF DICTIONARY DETECTION
Correct detection rates per feature at 2.5% and 5.0% false rejection (baseline classification error: 21.5%):

  Feature    2.5%     5.0%
  BC-2       2.9%     5.7%
  CD-2      17.6%    33.4%
  CM-2       3.0%     5.3%
  CV-2      17.5%    34.5%
  CS-3       9.3%    15.5%
  CSD-3      3.0%     6.3%
  CSM-3     53.0%    66.3%
  CSV-3     53.5%    67.9%
  SR-3      56.2%    68.3%

Classification errors reported alongside the features range from the 21.5% baseline down to 10.9%: 21.5%, 17.7%, 21.5%, 17.7%, 21.5%, 11.2%, 10.9% and 10.9%.

RECOGNITION ERRORS AND OUT OF DICTIONARY DETECTION
Correct detection of recognition errors or names out of the dictionary, at minimum classification error, combining the F-2 and F-3 feature sets with the neural network (sketched after this section): 54.8% correct detection at 2.5% false rejection and 65.8% at 5.0%, with a 13.1% classification error (baseline: 29.2%).

Confusion matrix for name classification as Correctly Recognized Name (CRN), Incorrectly Recognized Name (IRN) or Out of Dictionary Name (ODN); rows are the true classes, columns the assigned classes, with utterance counts in parentheses:

              CRN            IRN            ODN
  CRN    94.9% (1213)     0.9% (13)      4.2% (52)
  IRN    24.0% (92)      72.4% (279)     3.6% (14)
  ODN    18.0% (25)      49.7% (68)     32.3% (44)
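The confidence annotation described above maps each utterance's F-2 and F-3 features to a single confidence value with a neural network and then rejects names whose confidence falls below a threshold fixed by the allowed false rejection rate. A minimal sketch, assuming scikit-learn's MLPClassifier in place of the poster's unspecified network, with illustrative function names throughout:

```python
# Sketch: combine confidence features with a small neural network and pick the
# rejection threshold for a target false rejection rate. The network topology
# and toolkit are assumptions; the poster does not specify them.
from sklearn.neural_network import MLPClassifier

def train_confidence_nn(X_train, y_train):
    """X_train: per-utterance F-2/F-3 feature vectors; y_train: 1 if the name
    was correctly recognized, 0 for errors / out-of-dictionary names."""
    nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    nn.fit(X_train, y_train)
    return nn

def confidence(nn, X):
    # Posterior of the "correctly recognized" class as the unique confidence measure.
    return nn.predict_proba(X)[:, 1]

def threshold_at_false_rejection(correct_conf, target_fr=0.05):
    """Threshold that rejects at most target_fr of the correctly recognized names."""
    ranked = sorted(correct_conf)            # ascending confidences of correct names
    return ranked[int(target_fr * len(ranked))]

def correct_detection_rate(error_conf, threshold):
    """Fraction of misrecognized / out-of-dictionary names that fall below the
    threshold, i.e. that are correctly detected."""
    return sum(c < threshold for c in error_conf) / len(error_conf)
```

Sweeping target_fr traces the detection versus false-rejection trade-off behind the 2.5% and 5.0% columns in the tables above.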