1
Acoustical and Lexical Based Confidence Measures for a Very Large Vocabulary Telephone Speech Hypothesis-Verification System
Javier Macías-Guarasa, Javier Ferreiros, Rubén San-Segundo, Juan M. Montero and José M. Pardo
Grupo de Tecnología del Habla, Departamento de Ingeniería Electrónica, E.T.S.I. Telecomunicación, Universidad Politécnica de Madrid
macias@die.upm.es
2
Overview
• Introduction
• System architecture
• Motivation
• Databases & Dictionaries
• Experimental results
• Conclusions and future work
3
Abstract
• In large vocabulary speech recognition systems (LVSRS), classifying utterances as correctly or incorrectly recognized is of major interest
• Preliminary study on:
  – Word-level confidence estimation
  – Multiple features: acoustical and lexical decoders
  – Neural network based scheme
4
Introduction (I)
• ASR systems rank the output hypotheses according to scores
• Confidence in the proposed decoding is not a direct byproduct of the process
• A lot of work in recent years:
  – Acoustic and linguistic features
  – Single or multiple sets of parameters
  – Direct estimation, LDA, NNs, etc.
5
Introduction (and II)
• Traditionally
  – Acoustic features alone show poor results (likelihoods are not comparable across utterances)
  – The literature centers on methods to convert the HMM-decoded probabilities into useful confidence measures:
    · Likelihoods (normalized versions)
    · LM probabilities
    · n-best decoding lists
6
System Architecture
[Diagram labels: Intermediate Unit Generation; Lexical Access; Verification Module; Hypothesis (rough analysis); Verification (detailed analysis)]
• Hypothesis-verification strategy
• This work addresses the hypothesis module
7
Detailed Architecture
[Diagram labels: Speech; Preprocessing & VQ processes; Phonetic String Build-Up; Lexical Access; Hypothesis; HMMs; VQ books; Durations; Alignment costs; Dictionaries; Indexes; Phonetic string; List of candidate words]
8
Motivation
• Studies on systems that estimate a variable preselection list length
  – Number of words to pass to the verification stage
• Direct correlation with confidence estimation:
  – If the proposed list length is small, confidence is high
• Initial application to the hypothesis module only
9
Databases & Dictionaries
• Part of the VESTEL database
• Training
  – 5820 utterances, 3011 speakers
• Testing
  – 2536 utterances (vocabulary dependent), 2255 speakers
  – 1434 utterances (vocabulary independent), 1351 speakers
• Dictionaries: 10000 words (VD&I)
10
Baseline experiment
• Directly using the features (normalized to the range 0..1)
• Baseline features:
  – Acoustic log-likelihood (and normalized versions)
  – Lexical access cost for the 1st candidate
  – Standard deviation of lexical access costs
• Not very good results
  – Best one with the standard deviation
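As a rough illustration of the two operations named on this slide (scaling a feature to 0..1 and taking the standard deviation of the lexical access costs), here is a minimal Python sketch. The function names, the choice of min-max scaling, the population form of the standard deviation, and the `top_n` cutoff are all assumptions made for illustration; the slides do not say how either quantity was actually computed.

```python
import statistics


def min_max_normalize(values):
    """Rescale raw feature values to the range 0..1 (min-max scaling, assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value is identical, so map them all to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def lexical_cost_std(ranked_costs, top_n=10):
    """Standard deviation of the lexical access costs of the first top_n candidates.

    ranked_costs is assumed to be ordered best candidate first.
    The population standard deviation is an assumption, not taken from the slides.
    """
    return statistics.pstdev(ranked_costs[:top_n])


# Hypothetical costs for one utterance, best candidate first.
costs = [12.0, 15.5, 16.0, 16.2, 18.9]
spread = lexical_cost_std(costs, top_n=5)
scaled = min_max_normalize(costs)
```

The per-utterance spread would then itself be normalized across the data set before being thresholded as a confidence score.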
11
Baseline distributions
[Figure panels: Std deviation of LA costs; Acoustic likelihood (normalized)]
12
Baseline distributions
13
Neural Network estimator
• Used successfully in preselection list length estimation
• Able to combine parameters without effort
• 3-layer MLP
• Wide range of topology alternatives, coding schemes and features:
  – Direct parameters
  – Normalized
  – Lexical access costs distribution
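To make the "3-layer MLP" concrete, the sketch below shows the forward pass of a generic input-hidden-output perceptron that maps a feature vector to a single confidence value. Everything specific here is an assumption for illustration: the sigmoid activations, the single output unit, the hidden layer size, and the toy weights. The slides do not describe the actual topology, coding scheme or training procedure.

```python
import math


def sigmoid(x):
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_confidence(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of an input -> hidden -> output MLP (hypothetical layout).

    features : list of input values, assumed already scaled to 0..1
    w_hidden : one weight row per hidden unit
    b_hidden : one bias per hidden unit
    w_out    : one weight per hidden unit, feeding the single output
    b_out    : output bias
    """
    hidden = [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Toy example: 3 input features, 2 hidden units, made-up weights.
score = mlp_confidence(
    features=[0.2, 0.7, 0.5],
    w_hidden=[[0.5, -1.0, 0.3], [1.2, 0.4, -0.6]],
    b_hidden=[0.1, -0.2],
    w_out=[1.5, -0.8],
    b_out=0.05,
)
```

The output can be read as a confidence score and compared against a threshold to accept or reject the recognized word.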
14
NN based experiments
• Maximum correct classification rates: 70-75% for the three datasets (reasonable, taking into account the preselection rates achieved: 46.95%, 30.14% and 42.47%)
• Best single feature: standard deviation of the lexical access cost measured over the list of the first 10 candidates (0.1% of the dictionary size)
• Final system uses 8 parameters (lexical and acoustical based)
15
Final distributions
[Figure panels: Not using NN; Using NN]
16
Final distributions
[Figure panels: Using NN; Not using NN]
17
Additional results
• EER:
  – 30% for PERFDV
  – 25% for PEIV1000 and PRNOK5TR
  – Optimum threshold very close to the scale midpoint
• Correct rejection rates for a given false rejection rate:
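The equal error rate (EER) is conventionally the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to locate it by scanning thresholds over a set of confidence scores. The function names, the "accept when score >= threshold" convention and the toy data are assumptions; the slides do not describe how the EER was measured. Both classes (correct and incorrect) must be present in `labels`.

```python
def error_rates(scores, labels, threshold):
    """Return (false_acceptance_rate, false_rejection_rate) at one threshold.

    labels[i] is True when utterance i was correctly recognized.
    An utterance is accepted when its score is >= threshold (assumed convention).
    """
    n_correct = sum(1 for ok in labels if ok)
    n_wrong = len(labels) - n_correct
    false_accepts = sum(
        1 for s, ok in zip(scores, labels) if not ok and s >= threshold
    )
    false_rejects = sum(
        1 for s, ok in zip(scores, labels) if ok and s < threshold
    )
    return false_accepts / n_wrong, false_rejects / n_correct


def equal_error_rate(scores, labels):
    """Scan every observed score as a threshold; return (threshold, eer).

    The EER is approximated as the mean of the two rates at the threshold
    where they are closest to each other.
    """
    best = None
    for t in sorted(set(scores)):
        far, frr = error_rates(scores, labels, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Made-up scores: two misrecognized and two correctly recognized utterances.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [False, False, True, True]
threshold, eer = equal_error_rate(toy_scores, toy_labels)
```

On real data the scan would run over the scores of a held-out set, and the resulting threshold could be compared with the scale midpoint mentioned on the slide.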
18
Conclusions
• Introduced a word-level confidence estimation system based on NNs and a combination of lexical and acoustical features
• The NN improved on the results obtained using the features directly
• The best parameter is lexical-based and consistent with the acoustical versions reported in the literature (the standard deviation is similar to likelihood ratios and n-best related features)
19
Future work
• Extend the comparison of the NN vs. non-NN system to the full feature set
• Extend the work to the verification module (experiments already carried out show good results)
• Extend the approach to CRS (phrase-level confidence)
20
ROC Curves (NN vs. non NN)
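For readers who want to reproduce curves of this kind from their own confidence scores, the following sketch computes ROC points as (false rejection rate, correct rejection rate) pairs, one per threshold. The axis choice, the function name and the "reject when score < threshold" convention are assumptions; the original plots are not preserved in this transcript, so this is not a description of how they were drawn.

```python
def roc_points(scores, labels):
    """Return (false_rejection_rate, correct_rejection_rate) pairs.

    labels[i] is True when utterance i was correctly recognized.
    An utterance is rejected when its score is < threshold (assumed convention).
    One point is produced per distinct score, plus a final point at an
    infinite threshold where everything is rejected.
    """
    n_correct = sum(1 for ok in labels if ok)
    n_wrong = len(labels) - n_correct
    thresholds = sorted(set(scores)) + [float("inf")]
    points = []
    for t in thresholds:
        false_rej = sum(1 for s, ok in zip(scores, labels) if ok and s < t)
        correct_rej = sum(1 for s, ok in zip(scores, labels) if not ok and s < t)
        points.append((false_rej / n_correct, correct_rej / n_wrong))
    return points


# Made-up, perfectly separable scores for illustration only.
curve = roc_points([0.1, 0.2, 0.8, 0.9], [False, False, True, True])
```

Computing one such curve from the raw feature and one from the NN output, on the same test set, would give the kind of side-by-side comparison this slide refers to.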
21
Any questions?