VARIABLE PRESELECTION LIST LENGTH ESTIMATION USING NEURAL NETWORKS IN A TELEPHONE SPEECH HYPOTHESIS-VERIFICATION SYSTEM

Presentation transcript:

J. Macías-Guarasa, J. Ferreiros, A. Gallardo, R. San-Segundo, J.M. Pardo and *L. Villarrubia
Grupo de Tecnología del Habla. Universidad Politécnica de Madrid. Spain
*Grupo de Tecnología del Habla. Telefónica Investigación y Desarrollo. Madrid. Spain

SUMMARY
Previous work on this topic, presented at ICSLP'98:
- Flexible large vocabulary (up to words)
- Speaker independent
- Telephone speech
- Isolated word
- Two-stage, bottom-up strategy
- Variable preselection list length estimated using parametric and non-parametric approaches, with promising results
In this paper:
- Neural networks as a novel approach to estimating the preselection list length
- Post-processing methods applied to the neural network output to obtain the final estimation
- Encouraging results obtained

SYSTEM ARCHITECTURE
[Block diagram: speech input → preprocessing & VQ (VQ codebooks) → phonetic string build-up (HMMs, durations) → lexical access (alignment costs, dictionaries, indexes) → hypothesis: list of candidate words.]

BASELINE SYSTEM
- Target: 2% inclusion error rate (IER) with 90% pruning
- SCHMM acoustic models: 23+2 automatically clustered, context-independent phoneme-like units
- Fixed-length preselection lists
- Evaluation factor: average effort (the average preselection list length), provided the error rate is kept under 2%
- Fixed list length needed to keep a 2% IER: 916 (around 10% of the dictionary size); see Figure 1

USING NEURAL NETWORKS
An empirical relationship (related to word length) was found between recognition accuracy and some known-in-advance parameters, so estimating this relationship should be possible, and neural networks could solve that estimation problem.

EXPERIMENTAL SETUP
Initial experiments were carried out on a subset of the VESTEL telephone speech corpus:
- 1004 utterances in the training set
- 215 utterances in testing set 1
- 215 utterances in testing set 2
- Vocabulary composed of words
Experimental alternatives:
- Different output distribution codings
- Different preselection list length estimation methods
- Different post-processing and optimisation methods for this estimation

NETWORK DESIGN
- Traditional MLP with one hidden layer; initial topology: 4 inputs, 7 hidden units, 11 outputs; trained with backpropagation (enough data is available)
- Input parameters: any known-in-advance system parameter (an inventory of up to 24); 4 initially selected: number of frames, phonetic string length, number of phones in the first candidate, and normalised PSBU (phonetic string build-up) log probability
- Input coding: initially, simple scaling
- Output coding: each output corresponds to a different list-length segment. Problem: an inhomogeneous number of activations per output. Solution: train the segment length distribution (Table 1 and Figure 2). A sketch of this design follows below.
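To make the design concrete, here is a minimal, hypothetical sketch of such a network and its output coding. It is not the authors' implementation: the function names, the equal-frequency quantile rule for training the segment limits, the [0, 1] scaling, and the learning details (sigmoid units, no biases, learning rate) are all assumptions.

```python
# Hypothetical sketch of a 4-7-11 MLP and the output segment coding.
# The quantile rule is one simple way to give every output neuron a
# similar number of training activations (the inhomogeneity problem).
import numpy as np

def train_segment_limits(train_lengths, n_segments=11):
    """Upper limits of the list-length segments, chosen as
    equal-frequency quantiles of the training list lengths."""
    qs = np.linspace(0.0, 1.0, n_segments + 1)[1:]
    return np.quantile(np.asarray(train_lengths), qs)

def encode_target(required_length, limits):
    """One-hot target: activate the first segment whose upper limit
    covers the list length required to include the correct word."""
    idx = min(int(np.searchsorted(limits, required_length)), len(limits) - 1)
    target = np.zeros(len(limits))
    target[idx] = 1.0
    return target

def scale_inputs(x, lo, hi):
    """'Simple scaling' of the 4 input parameters (number of frames,
    phonetic string length, phones in the first candidate, normalised
    PSBU log probability) to the [0, 1] range."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

class MLP:
    """4-7-11 multilayer perceptron trained with plain backpropagation.
    Sigmoid units; bias terms omitted for brevity."""
    def __init__(self, n_in=4, n_hid=7, n_out=11, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hid))
        self.W2 = rng.normal(0.0, 0.1, (n_hid, n_out))

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(self, x):
        self.h = self._sigmoid(x @ self.W1)        # hidden activations
        self.y = self._sigmoid(self.h @ self.W2)   # output activations
        return self.y

    def train_step(self, x, target, lr=0.1):
        y = self.forward(x)
        d_out = (y - target) * y * (1.0 - y)                  # output deltas
        d_hid = (d_out @ self.W2.T) * self.h * (1.0 - self.h)  # hidden deltas
        self.W2 -= lr * np.outer(self.h, d_out)
        self.W1 -= lr * np.outer(x, d_hid)
```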
POST-PROCESSING OF THE NN ESTIMATION
The network output is post-processed to increase robustness. Two alternatives:
- The winner output neuron decides (WN).
- Linear combination of normalised activations (LC):
  estimatedLength = Σ_i normAct(i) × length(i)
  where length(i) is the upper limit of the list-length segment assigned to neuron i (Table 1) and normAct(i) is the normalised activation of that neuron.
Additionally, a fixed (-FX) or proportional (-PP) threshold can be added to the estimate, trained to achieve a certain IER. The suffix -OPT indicates that the threshold was optimised to reach a 2% IER, giving an indication of the maximum effort reduction achievable. A sketch of these rules, together with the evaluation metrics, is given at the end of this transcript.

RESULTS
- The winner method (WN) is unable to achieve good results (the decision is too hard).
- The other approaches reach reasonable performance, with significant decreases in average effort. For example:
  - LC reduces the average effort by up to 90%, but with an IER around 10%: too far from our 2% IER target
  - LC-PP-OPT shows maximum possible reductions of up to 34% while keeping the IER under 2%
  - LC-PP shows reductions of 34% while allowing an IER under 3%
  - LC-FX shows reductions of 58% while allowing an IER under 4%

CONCLUSIONS AND FUTURE WORK
- A novel strategy for variable preselection list length estimation
- Encouraging results: up to a 30% reduction in average effort is possible with an IER under 2% (LC-PP-OPT); a 34% reduction with an IER under 3% (LC-PP); a 47-58% reduction with an IER under 4% (LC-FX)
- These are preliminary results; evaluation on the full VESTEL database is in progress
- Biggest problem: the availability of enough data to correctly train the system
- Future work: develop a hierarchical network structure; test alternatives in input parameter coding, output coding and network topology; study the relationship between the ANN output and recognition confidence

Figure 1: IER versus preselection list size.
Table 1: Trained preselection list length limits.
Table 2: For every method that actually reduced the average effort, the relative reduction in average effort (compared to the fixed list length that achieved a 2% IER) and the inclusion rate.
Figure 2: Number of activations per output neuron in the training set.
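As an illustration of the WN and LC rules and of the evaluation metrics above, here is a minimal, hypothetical sketch. The function names, the ceiling rounding, the exact way the -FX/-PP margins enter the estimate, and the representation of the correct word's rank are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the WN/LC post-processing rules and of the
# evaluation metrics. `limits` are the trained segment upper limits
# (Table 1); `activations` is the 11-dimensional MLP output.
import numpy as np

def wn_length(activations, limits):
    """WN: the winner output neuron alone decides the list length."""
    return limits[int(np.argmax(activations))]

def lc_length(activations, limits, fx=0.0, pp=0.0):
    """LC: estimatedLength = sum_i normAct(i) * length(i).
    A fixed (-FX) or proportional (-PP) safety margin, trained to reach
    a target IER, can optionally enlarge the estimate."""
    norm_act = np.asarray(activations) / np.sum(activations)
    estimate = float(norm_act @ np.asarray(limits))
    return int(np.ceil(estimate * (1.0 + pp) + fx))

def evaluate(list_lengths, correct_ranks, fixed_baseline=916):
    """IER: fraction of utterances whose correct word falls outside its
    preselection list. Average effort: mean list length, reported here
    as a relative reduction over the 916-word fixed-length baseline."""
    lengths = np.asarray(list_lengths)
    ranks = np.asarray(correct_ranks)  # rank of the correct word per utterance
    ier = float(np.mean(ranks > lengths))
    reduction = 1.0 - lengths.mean() / fixed_baseline
    return ier, reduction
```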