VARIABLE PRESELECTION LIST LENGTH ESTIMATION USING NEURAL NETWORKS IN A TELEPHONE SPEECH HYPOTHESIS-VERIFICATION SYSTEM

J. Macías-Guarasa, J. Ferreiros, A. Gallardo, R. San-Segundo, J.M. Pardo and *L. Villarrubia
Grupo de Tecnología del Habla. Universidad Politécnica de Madrid. Spain
*Grupo de Tecnología del Habla. Telefónica Investigación y Desarrollo. Madrid. Spain

SUMMARY
Previous work on this topic, presented at ICSLP'98:
- Flexible large vocabulary (up to 10000 words)
- Speaker independent
- Telephone speech
- Isolated word
- Two-stage, bottom-up strategy
- Variable preselection list length estimated using parametric and non-parametric approaches, with promising results
In this paper:
- Neural networks as a novel approach to estimating the preselection list length
- Postprocessing methods applied to the neural network output for the final estimation
- Encouraging results obtained

SYSTEM ARCHITECTURE
[Block diagram: speech goes through Preprocessing & VQ; the Phonetic String Build-Up stage (using HMMs, VQ codebooks, durations and alignment costs) produces a phonetic string; Lexical Access (using dictionaries and indexes) then produces the list of candidate words, which is passed to hypothesis verification.]

EXPERIMENTAL SETUP
Initial experiments carried out on a subset of the VESTEL telephone speech corpus:
- 1004 utterances in the training set
- 215 utterances in testing set 1
- 215 utterances in testing set 2
Vocabulary composed of 10000 words.
Experimental alternatives:
- Different output distribution codings
- Different preselection list length estimation methods
- Different postprocessing and optimisation methods for this estimation

BASELINE SYSTEM
- Target: 2% inclusion error rate (IER) and 90% pruning
- SCHMM: 23+2 automatically clustered, context-independent phoneme-like units
- Fixed-length preselection lists
- Evaluation factor: average effort (average preselection list length), provided the error rate is kept under 2%
- Fixed list length needed to keep IER at 2%: 916 (around 10% of the dictionary size). See Figure 1.

USING NEURAL NETWORKS
- An empirical relationship was found between recognition accuracy and some parameters (related to word length), so estimating this relationship should be possible
- Neural networks could solve this estimation problem
- This yields a novel strategy for variable preselection list length estimation

NETWORK DESIGN
Traditional MLP with one hidden layer:
- Initial topology: 4 inputs - 7 hidden - 11 outputs
- Trained with backpropagation: enough data is available
Input parameters:
- Any known-in-advance system parameter (inventory of up to 24)
- 4 initially selected: number of frames, phonetic string length, number of phones in the first candidate, normalised PSBU log probability
- Coding options: initially, simple scaling
Output coding:
- Each output corresponds to a different list length segment
- Problem: inhomogeneous number of activations per output
- Solution: train the segment length distribution (Table 1 and Figure 2). A sketch of this design follows.
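As a minimal illustration of the network design above, the following Python sketch builds the 4-7-11 MLP with simply-scaled inputs and trains the segment limits. This is not the authors' code: the sigmoid non-linearities, the (min, max) input scaling and the quantile-based derivation of the segment limits are our assumptions; the poster only specifies the topology, simple scaling, backpropagation training and the goal of homogeneous activations per output.

import numpy as np
import torch
import torch.nn as nn

N_INPUTS, N_HIDDEN, N_OUTPUTS = 4, 7, 11   # 4 inputs - 7 hidden - 11 outputs

# Simple scaling of the four known-in-advance parameters to [0, 1].
# `ranges` holds an assumed (min, max) per parameter over the training set.
def scale_inputs(params, ranges):
    return torch.tensor(
        [(v - lo) / (hi - lo) for v, (lo, hi) in zip(params, ranges)],
        dtype=torch.float32,
    )

# Train the segment length distribution: pick the 11 upper limits so each
# output neuron covers roughly the same number of training utterances
# (a quantile split -- our assumption about how Table 1 was derived).
def train_segment_limits(needed_lengths, n_segments=N_OUTPUTS):
    quantiles = np.linspace(0.0, 1.0, n_segments + 1)[1:]
    return np.quantile(np.asarray(needed_lengths), quantiles)

# Traditional MLP with one hidden layer, trained with backpropagation.
mlp = nn.Sequential(
    nn.Linear(N_INPUTS, N_HIDDEN),
    nn.Sigmoid(),                   # assumed squashing non-linearity
    nn.Linear(N_HIDDEN, N_OUTPUTS),
    nn.Sigmoid(),                   # assumed; keeps activations in (0, 1)
)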
POST-PROCESSING OF THE NN ESTIMATION
The network output is postprocessed to increase robustness. Two alternatives (a code sketch of both rules appears at the end of this summary):
- The winner output neuron decides (WN)
- Linear combination of normalised activations (LC): estimated length = Σ_i length(i) · normAct(i), where length(i) is the upper limit of neuron i (Table 1) and normAct(i) is the normalised activation of that neuron
Additionally, a fixed (-FX) or proportional (-PP) threshold can be added, trained to achieve a certain IER. The suffix -OPT indicates that the threshold is optimised to reach 2% IER, giving an indication of the maximum effort reduction achievable.

RESULTS
- The winner method (WN) is unable to achieve good results (the decision is too hard)
- The other approaches reach reasonable performance, with significant decreases in average effort. For example, LC reduces effort by up to 90%, but with an IER around 10%: too far from our 2% IER target!
- LC-PP-OPT shows maximum possible reductions of up to 34% while keeping IER < 2%
- LC-PP shows reductions of 34% allowing IER < 3%
- LC-FX shows reductions of 58% allowing IER < 4%

CONCLUSIONS AND FUTURE WORK
Encouraging results:
- Up to 30% reduction in average effort is possible with IER under 2% (LC-PP-OPT)
- 34% reduction with IER under 3% (LC-PP)
- 47-58% reduction with IER under 4% (LC-FX)
These are preliminary results: evaluation with the full VESTEL database is in progress. The biggest problem is the availability of enough data to correctly train the system.
Future work:
- Develop a hierarchical network structure
- Test alternatives in input parameter coding, output coding and network topology
- Study the relationship between the ANN output and recognition confidence

Figure 1: IER versus size of the preselection list.
Figure 2: Number of activations per output neuron in the training set.
Table 1: Trained preselection list length limits.
Table 2: For every method that actually reduced the average effort: relative reduction in average effort (compared to the fixed list length that achieved 2% IER) and inclusion rate.
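The post-processing rules referenced above translate into a short routine. The sketch below is hypothetical: the function name and the `fx_margin` / `pp_factor` parameters are ours, standing in for the trained -FX and -PP thresholds, whose exact form the poster does not give.

import numpy as np

def estimate_list_length(activations, segment_limits,
                         mode="LC", fx_margin=0.0, pp_factor=1.0):
    """Estimate the preselection list length from the MLP output.

    activations    -- the 11 raw output activations
    segment_limits -- upper list-length limit per output neuron (Table 1)
    mode           -- "WN" (winner neuron) or "LC" (linear combination)
    fx_margin / pp_factor -- fixed (-FX) or proportional (-PP) threshold,
        trained elsewhere to hit a target IER; the neutral defaults apply
        neither. -OPT would mean tuning one of them until IER reaches 2%.
    """
    act = np.asarray(activations, dtype=float)
    limits = np.asarray(segment_limits, dtype=float)
    if mode == "WN":
        length = limits[int(np.argmax(act))]   # the winner output neuron decides
    else:
        norm_act = act / act.sum()             # normalised activations
        length = float(norm_act @ limits)      # sum_i length(i) * normAct(i)
    length = length * pp_factor + fx_margin    # proportional, then fixed margin
    return int(np.clip(round(length), 1, limits[-1]))

For example, estimate_list_length(mlp_out, table1_limits, mode="LC", pp_factor=1.1) would correspond to an LC-PP variant with a 10% proportional margin (illustrative values only, not the trained thresholds from the paper).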