Acoustical and Lexical Based Confidence Measures for a Very Large Vocabulary Telephone Speech Hypothesis-Verification System
Javier Macías-Guarasa, Javier Ferreiros, Rubén San-Segundo, Juan M. Montero and José M. Pardo
Grupo de Tecnología del Habla, Departamento de Ingeniería Electrónica, E.T.S.I. Telecomunicación, Universidad Politécnica de Madrid
macias@die.upm.es

Overview
• Introduction
• System architecture
• Motivation
• Databases & Dictionaries
• Experimental results
• Conclusions and future work

Abstract
• In LVSRS, classifying utterances as correctly or incorrectly recognized is of major interest
• Preliminary study on:
  – Word-level confidence estimation
  – Multiple features: acoustical and lexical decoders
  – Neural network based scheme

Introduction (I)
• ASR systems rank the output hypotheses according to scores
• Confidence in the proposed decoding is not a direct byproduct of the process
• A lot of work in recent years:
  – Acoustic and linguistic features
  – Single or multiple sets of parameters
  – Direct estimation, LDA, NNs, etc.

Introduction (and II)
• Traditionally:
  – Acoustic features alone show poor results (likelihoods are not comparable across utterances)
  – The literature centers on methods to convert the HMM decoding probabilities into useful confidence measures:
    • Likelihoods (normalized versions)
    • LM probabilities
    • n-best decoding lists

System Architecture
[Diagram: Hypothesis module (rough analysis): Intermediate Unit Generation, Lexical Access. Verification module (detailed analysis).]
• Hypothesis-verification strategy
• We work in the Hypothesis Module

Detailed Architecture
[Diagram of the Hypothesis module. Processing blocks: Preprocessing & VQ processes, Phonetic String Build-Up, Lexical Access. Data: Speech, Phonetic string, List of Candidate Words. Resources: HMMs, VQ books, Durats., Align. costs, Dicts, Indexes.]

Motivation
• Earlier studies on systems that estimate a variable preselection list length
  – Number of words to pass to the verification stage
• Direct correlation with confidence estimation:
  – If the proposed list length is small → high confidence
• Initial application to the hypothesis module only

Databases & Dictionaries
• Part of the VESTEL database
• Training
  – 5820 utterances, 3011 speakers
• Testing
  – 2536 utterances (vocabulary dependent), 2255 speakers
  – 1434 utterances (vocabulary independent), 1351 speakers
• Dictionaries: 10000 words (vocabulary dependent and independent)

Baseline experiment
• The features are used directly as confidence scores (normalized to the range 0..1)
• Baseline features:
  – Acoustic log-likelihood (and normalized versions)
  – Lexical access cost for the 1st candidate
  – Standard deviation of the lexical access costs
• Results are not very good
  – The best one is obtained with the standard deviation
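As a rough illustration of using a feature directly as a confidence score, the sketch below rescales raw values to 0..1 and applies a fixed threshold. The function names, the sample values, the 0.5 threshold, and the assumption that larger values mean higher confidence are all hypothetical; the slides do not specify them. In practice the scaling bounds would be estimated on training data rather than on the list being scored.

```python
def min_max_normalize(values):
    """Linearly rescale raw feature values into the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all values are equal, so no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def classify(confidences, threshold=0.5):
    """Accept an utterance when its confidence reaches the threshold.

    Assumption: a larger normalized value means higher confidence.
    """
    return [c >= threshold for c in confidences]


# Made-up raw feature values, one per utterance (illustration only).
raw_feature = [2.0, 8.0, 5.0, 11.0]
normalized = min_max_normalize(raw_feature)
decisions = classify(normalized)
print(normalized)
print(decisions)
```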

Baseline distributions
[Figure panels: ← Std Deviation LA | Acoustic likelihood (normalized) →]

Baseline distributions

Neural Network estimator
• Used successfully in preselection list length estimation
• Able to combine parameters without effort
• 3-layer MLP
• Wide range of topology alternatives, coding schemes and features:
  – Direct parameters
  – Normalized
  – Lexical access costs distribution
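A minimal sketch of how a 3-layer MLP (input, hidden, output) can map a feature vector to a single confidence value. The 8 inputs mirror the parameter count reported for the final system; the hidden layer size, the sigmoid activations, the random untrained weights, and the feature values are assumptions made for illustration and do not come from the slides.

```python
import math
import random


def sigmoid(x):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input layer -> one hidden layer -> one output unit."""
    hidden = [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical topology: 8 inputs, 5 hidden units, 1 output.
n_inputs, n_hidden = 8, 5
rng = random.Random(0)  # untrained random weights, for illustration only
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0

# Made-up feature vector, already normalized to 0..1.
features = [0.2, 0.9, 0.5, 0.1, 0.7, 0.3, 0.6, 0.4]
confidence = mlp_forward(features, w_hidden, b_hidden, w_out, b_out)
print(confidence)
```

A single sigmoid output keeps the score in 0..1, so it can be thresholded like any directly normalized feature.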

NN based experiments
• Maximum correct classification rates: 70-75% for the three datasets (reasonable, taking into account the preselection rates achieved: 46.95%, 30.14% and 42.47%)
• Best single feature: standard deviation of the lexical access cost, measured over the list of the first 10 candidates (0.1% of the dictionary size)
• Final system uses 8 parameters (lexical and acoustical based)
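The best single feature can be sketched as follows: take the lexical access costs of the first 10 candidates and compute their standard deviation. The helper name and the sample costs are hypothetical, and the sketch assumes that a lower cost means a better candidate and that the population standard deviation is used; the slides state neither.

```python
import statistics


def top_n_cost_std(costs, n=10):
    """Standard deviation of the n lowest lexical access costs.

    Assumption: lower cost means a better-ranked candidate.
    """
    best = sorted(costs)[:n]
    return statistics.pstdev(best)


# Made-up costs for 20 candidate words (illustration only).
costs = [float(i) for i in range(1, 21)]
feature = top_n_cost_std(costs, n=10)

# With a 10000-word dictionary, 10 candidates are 0.1% of its size.
dictionary_size = 10000
fraction = 10 / dictionary_size
print(feature, fraction)
```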

Final distributions
[Figure panels: ← Not using NN | Using NN →]

Final distributions
[Figure panels: ← Using NN | Not using NN →]

Additional results
• EER:
  – 30% for PERFDV
  – 25% for PEIV1000 and PRNOK5TR
  – Optimum threshold very close to the scale midpoint
• Correct rejection rates for given False rejection:
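For readers unfamiliar with the metric, the sketch below shows one common way to estimate an equal error rate: sweep the decision threshold and pick the point where the false rejection rate (correct words rejected) and the false acceptance rate (incorrect words accepted) are closest. The function names and the toy scores are hypothetical, and this is not necessarily the procedure used to obtain the figures above.

```python
def error_rates(scores, labels, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold.

    labels[i] is True when utterance i was correctly recognized.
    An utterance is accepted when its score is >= threshold.
    """
    correct = [s for s, ok in zip(scores, labels) if ok]
    incorrect = [s for s, ok in zip(scores, labels) if not ok]
    frr = sum(s < threshold for s in correct) / len(correct)
    far = sum(s >= threshold for s in incorrect) / len(incorrect)
    return frr, far


def equal_error_rate(scores, labels):
    """Sweep the observed scores as thresholds; return (EER, threshold)."""
    best = None
    for t in sorted(set(scores)):
        frr, far = error_rates(scores, labels, t)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


# Toy confidence scores and correctness labels (illustration only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [True, True, True, False, True, False, False, False]
eer, threshold = equal_error_rate(scores, labels)

# Correct rejection rate at that threshold is 1 minus the false acceptance rate.
frr, far = error_rates(scores, labels, threshold)
correct_rejection = 1 - far
print(eer, threshold, correct_rejection)
```

The same `error_rates` helper gives the correct rejection rate at any chosen false rejection operating point by sweeping the threshold.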

Conclusions
• Introduced a word-level confidence estimation system based on NNs and a combination of lexical and acoustical features
• The NN improved on the results obtained using the features directly
• The best parameter is lexical-based and consistent with the acoustical versions reported in the literature (standard deviation is similar to likelihood ratios and n-best related features)

Future work
• Extend the comparison of the NN vs. non-NN system to the full feature set
• Extend the work to the verification module (experiments already carried out show good results)
• Extend the approach to CRS (phrase-level confidence)

ROC Curves (NN vs. non NN)

Any questions?
