Acoustical and Lexical Based Confidence Measures for a Very Large Vocabulary Telephone Speech Hypothesis-Verification System Javier Macías-Guarasa, Javier.


1 Acoustical and Lexical Based Confidence Measures for a Very Large Vocabulary Telephone Speech Hypothesis-Verification System
Javier Macías-Guarasa, Javier Ferreiros, Rubén San-Segundo, Juan M. Montero and José M. Pardo
Grupo de Tecnología del Habla, Departamento de Ingeniería Electrónica, E.T.S.I. Telecomunicación, Universidad Politécnica de Madrid
macias@die.upm.es

2 Overview
• Introduction
• System architecture
• Motivation
• Databases & Dictionaries
• Experimental results
• Conclusions and future work

3 Abstract
• In LVSRS, classifying utterances as correctly or incorrectly recognized is of major interest
• Preliminary study on:
  – Word-level confidence estimation
  – Multiple features: acoustical and lexical decoders
  – Neural network based scheme

4 Introduction (I)
• ASR systems rank the output hypotheses according to scores
• Confidence in the proposed decoding is not a direct byproduct of the process
• A lot of work in recent years:
  – Acoustic and linguistic features
  – Single or multiple sets of parameters
  – Direct estimation, LDA, NNs, etc.

5 Introduction (and II)
• Traditionally:
  – Acoustic features alone show poor results (likelihoods are not comparable across utterances)
  – The literature has centered on methods to convert the HMM decoded probabilities into useful confidence measures:
    · likelihoods (normalized versions)
    · LM probabilities
    · n-best decoding lists

6 System Architecture
[Diagram labels: Intermediate Unit Generation; Lexical Access; Verification Module; Hypothesis; Verification; Rough Analysis; Detailed]
• Hypothesis-verification strategy
• We work in the Hypothesis Module

7 Detailed Architecture
[Diagram labels: Speech; Preprocessing & VQ processes; Phonetic String Build-Up; Phonetic string; Lexical Access; List of Candidate Words; Hypothesis; HMMs; VQ books; Durats.; Align. costs; Dicts; Indexes]

8 Motivation
• Studies on variable preselection list length estimation systems
  – Number of words to pass to the verification stage
• Direct correlation with confidence estimation:
  – If the proposed list length is small ⇒ high confidence
• Initial application to the hypothesis module only

9 Databases & Dictionaries
• Part of the VESTEL database
• Training
  – 5820 utterances, 3011 speakers
• Testing
  – 2536 utterances (vocabulary dependent), 2255 speakers
  – 1434 utterances (vocabulary independent), 1351 speakers
• Dictionaries: 10000 words (vocabulary dependent and independent)

10 Baseline experiment
• Directly using the features (normalized to the range 0..1)
• Baseline features:
  – Acoustic log-likelihood (and normalized versions)
  – Lexical access cost for the 1st candidate
  – Standard deviation of lexical access costs
• Not very good results
  – Best one with the standard deviation
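
The baseline idea on this slide (scale a single feature to 0..1 and use it directly as a confidence score) can be sketched roughly as below. This is only an illustration: the function names, the toy values, the 0.5 threshold and the convention that a higher value means higher confidence are all assumptions, not details taken from the presentation.

```python
def min_max_normalize(values):
    """Scale raw feature values linearly into the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all values identical, nothing to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def classify_by_threshold(normalized, threshold=0.5):
    """Accept an utterance (True) when its normalized feature reaches the threshold.

    The threshold value and the "higher is better" direction are assumptions.
    """
    return [x >= threshold for x in normalized]


# Toy raw feature values, one per utterance (hypothetical data).
raw_feature = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(raw_feature)
decisions = classify_by_threshold(normalized)
```

With a single feature the only free choice is the threshold, which is one way to read why the slide reports limited results for this setup.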

11 Baseline distributions
[Figure: ← Std Deviation LA; Acoustic likelihood (normalized) →]

12 Baseline distributions

13 Neural Network estimator
• Used successfully in preselection list length estimation
• Able to combine parameters without effort
• 3-layer MLP
• Wide range of topology alternatives, coding schemes and features:
  – Direct parameters
  – Normalized
  – Lexical access costs distribution
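
As a rough picture of what a 3-layer MLP confidence estimator computes, here is a minimal forward-pass sketch in plain Python. The slides do not give the topology, activation function or training procedure, so the sigmoid units, the hidden layer size, the random untrained weights and all names are assumptions made for illustration; the 8 inputs simply echo the parameter count reported for the final system.

```python
import math
import random


def sigmoid(x):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid, per unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_confidence(features, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> hidden layer -> single output unit read as a confidence."""
    hidden = layer(features, w_hidden, b_hidden)
    return layer(hidden, [w_out], [b_out])[0]


# Hypothetical sizes and untrained random weights, for illustration only.
rng = random.Random(0)
n_features, n_hidden = 8, 5
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0

features = [0.5] * n_features  # features already normalized to 0..1
confidence = mlp_confidence(features, w_hidden, b_hidden, w_out, b_out)
```

The output lies strictly between 0 and 1, so it can be thresholded in the same way as a single normalized feature.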

14 NN based experiments
• Maximum correct classification rates: 70-75% for the three datasets (reasonable, taking into account the preselection rates achieved: 46.95%, 30.14% and 42.47%)
• Best single feature: standard deviation of the lexical access cost, measured over the list of the first 10 candidates (0.1% of the dictionary size)
• Final system uses 8 parameters (lexical and acoustical-based)
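
The best single feature named on this slide, the standard deviation of the lexical access cost over the first 10 candidates, could be computed along the following lines. The slides do not say whether a population or sample standard deviation was used, nor how candidates are ordered, so this sketch assumes the population form and that a lower cost means a better candidate; the function name is hypothetical.

```python
import statistics


def cost_std_feature(costs, n_best=10):
    """Standard deviation of the lexical access costs of the n_best top candidates.

    Assumes lower cost = better candidate, so the list is sorted ascending
    before truncation. Uses the population standard deviation (an assumption).
    """
    best = sorted(costs)[:n_best]
    return statistics.pstdev(best)


# Hypothetical cost lists: one where the top candidate stands out,
# and one where all candidates score alike.
peaked = [1.0, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 9.0, 9.5]
flat = [6.0] * 12

peaked_feature = cost_std_feature(peaked)
flat_feature = cost_std_feature(flat)
```

In this toy data the list with a clearly separated first candidate gives a larger spread than the flat one, which is the kind of contrast such a feature is meant to capture.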

15 Final distributions
[Figure: ← Not using NN; Using NN →]

16 Final distributions
[Figure: ← Using NN; Not using NN →]

17 Additional results
• EER:
  – 30% for PERFDV
  – 25% for PEIV1000 and PRNOK5TR
  – Optimum threshold very close to the scale midpoint
• Correct rejection rates for a given false rejection rate:
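
For readers unfamiliar with the figures on this slide, the sketch below shows one common way to obtain an equal error rate (EER) from confidence scores: sweep a threshold and pick the point where the false rejection and false acceptance rates are closest. The scores are made-up toy data, the function names are hypothetical, and the presentation does not describe how its own EER values were computed.

```python
def error_rates(correct_scores, incorrect_scores, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold.

    False rejection: a correctly recognized utterance scoring below the threshold.
    False acceptance: an incorrectly recognized utterance scoring at or above it.
    """
    fr = sum(s < threshold for s in correct_scores) / len(correct_scores)
    fa = sum(s >= threshold for s in incorrect_scores) / len(incorrect_scores)
    return fr, fa


def equal_error_rate(correct_scores, incorrect_scores):
    """Sweep all observed scores as thresholds; return (EER estimate, threshold)."""
    best = None
    for t in sorted(set(correct_scores) | set(incorrect_scores)):
        fr, fa = error_rates(correct_scores, incorrect_scores, t)
        gap = abs(fr - fa)
        if best is None or gap < best[0]:
            best = (gap, (fr + fa) / 2, t)
    return best[1], best[2]


# Toy confidence scores (hypothetical), split by whether recognition was correct.
correct = [0.9, 0.8, 0.7, 0.4]
incorrect = [0.6, 0.3, 0.2, 0.1]

eer, threshold = equal_error_rate(correct, incorrect)
fr, fa = error_rates(correct, incorrect, threshold)
correct_rejection = 1.0 - fa  # share of incorrect utterances that are rejected
```

The same `error_rates` helper gives the correct rejection rate at any chosen false rejection operating point, since correct rejection is one minus the false acceptance rate.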

18 Conclusions
• Introduced a word-level confidence estimation system based on NNs and a combination of lexical and acoustical features
• The NN was shown to improve on the results obtained using the features directly
• The best parameter is lexical-based and consistent with the acoustical versions reported in the literature (standard deviation is similar to likelihood ratios and n-best related features)

19 Future work
• Extend the comparison of the NN vs. non-NN system to the full feature set
• Extend the work to the verification module (experiments already carried out show good results)
• Extend the approach to CRS (phrase-level confidence)

20 ROC Curves (NN vs. non NN)

21 Any questions?

