IMPROVED VARIABLE PRESELECTION LIST LENGTH ESTIMATION USING NNs IN A LARGE VOCABULARY TELEPHONE SPEECH RECOGNITION SYSTEM
J. Macías-Guarasa, J. Ferreiros, J. Colás, A. Gallardo-Antolín and J.M. Pardo
Grupo de Tecnología del Habla, Universidad Politécnica de Madrid, Spain
ICSLP'2000, Beijing (China)

Previous work on this topic (EUROSPEECH'99):
 Flexible large vocabulary (up to words)
 Speaker independent, isolated word, telephone speech
 Two-stage, bottom-up strategy
 Using neural networks as a novel approach to estimate the preselection list length
 Post-processing methods of the neural network output for the final estimation
In this paper:
 Parameter inventory increased
 New validation scheme
 Extensive testing, with improved results

MOTIVATION
The verification module is computationally expensive:
 Idea: reduce the preselection list length (PLL)
 Difficult, especially if the acoustic detail is low
 Estimate a different PLL for every word
 Methods: parametric, non-parametric, ...
To think about:
 Computational demands do not depend linearly on the PLL
 Final savings must take both modules into account
 Only average estimations are possible
[Figure: list length estimator mapping the input parameters to listLength(0) ... listLength(5); example for a vocabulary of 600 words]

SYSTEM ARCHITECTURE
[Block diagram of the two-stage system: speech is preprocessed and vector-quantised (VQ books); a phonetic string build-up stage (HMMs, durations) feeds the lexical access hypothesis generator (dictionary, indexes, alignment costs), which produces a preselection list whose length is set by the list length estimator; the verification module performs detailed matching over that list and outputs the recognised text.]

BASELINE SYSTEM
 Target: 2% inclusion error rate (IER) + 90% pruning
 SCHMM: 23+2 automatically clustered, context-independent phoneme-like units
 Fixed-length preselection lists

EXPERIMENTAL SETUP
VESTEL database, a realistic telephone speech corpus:
 PRNOK5TR: 5810 utterances in the training set
 PERFDV: 2502 utterances in testing set 1 (vocabulary dependent)
 PEIV1000: 1434 utterances in testing set 2 (vocabulary independent)
Vocabulary composed of words.
Experimental alternatives:
 Different output distribution codings
 Different preselection list length estimation methods
 Different post-processing and optimisation methods of this estimation

NN BASED PLL
Traditional MLP with one hidden layer:
 Final topology: 8 inputs - 5 hidden - 10 outputs
 Trained with back-propagation: enough data is available
Input parameters:
 Direct parameters
 Derived parameters: the direct parameters, normalised
 Lexical access statistical parameters, calculated over the lexical-access cost distribution
 Input coding: max-min normalisation, with/without clipping, single and multiple neurons, linear/nonlinear mapping, etc.
Output coding:
 Each output corresponds to a different list-length segment
 Problem: inhomogeneous number of activations per output
 Solution: train the segment length distribution (Table 1 and Figure 2)

POST-PROCESSING OF THE NN ESTIMATION
The network output is post-processed to increase robustness. Two alternatives:
 The winner output neuron decides (WINNER)
 Linear combination of normalised activations (SUMMA), where neuronLength(i) is the upper limit of neuron i (Table 1) and normAct(i) is the normalised activation of that neuron
Additionally, a fixed (-FX) or proportional (-PP) threshold can be added, trained to achieve a certain IER.
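The transcript preserves the definitions of the SUMMA terms (neuronLength(i) and normAct(i)) but not the combination formula itself. The Python/NumPy sketch below shows one plausible reading of the two post-processing alternatives; the segment upper limits, the activation vector, and the way the -FX threshold is applied are illustrative assumptions, not values or details taken from the paper.

```python
import numpy as np

# Illustrative segment upper limits for the 10 output neurons.  These are
# placeholders, NOT the values of Table 1 in the poster.
NEURON_LENGTH = np.array([10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000])

def winner_length(activations):
    """WINNER: the output neuron with the highest activation decides the list length."""
    return int(NEURON_LENGTH[int(np.argmax(activations))])

def summa_length(activations, fixed_threshold=0.0):
    """SUMMA: linear combination of normalised activations.

    Assumed form (the formula itself did not survive the transcript):
        length = sum_i normAct(i) * neuronLength(i)
    with the activations normalised to sum to one.  A fixed (-FX) threshold is
    simply added here; a proportional (-PP) one would scale the estimate instead.
    """
    norm_act = np.asarray(activations, dtype=float)
    norm_act = norm_act / norm_act.sum()
    estimate = float(np.dot(norm_act, NEURON_LENGTH))
    return estimate + fixed_threshold

# Dummy activation vector for the 10 output neurons, for demonstration only.
act = np.array([0.05, 0.10, 0.40, 0.20, 0.10, 0.05, 0.04, 0.03, 0.02, 0.01])
print(winner_length(act))                        # -> 50 (neuron 2 wins)
print(summa_length(act, fixed_threshold=50.0))   # weighted estimate plus -FX threshold
```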
NN DESIGN PARAMETER SELECTION
 Parameters selected according to results on a discrimination task (1st position vs. the rest)
 Best absolute results: multiple input neurons, nonlinear mapping
 No significant differences with a single input neuron
 8 final parameters selected

FINAL NN BASED SYSTEM
MLP:
 8 inputs - 5 hidden - 10 outputs
 Standard input coding normalisation
 Nonlinear output coding
Additional control parameters:
 Segment length assigned to the last output neuron (0, 2500, 5000, 10000)
 G (during training), B (during test)
 Objective inclusion rate in the threshold estimation process (98%, 98.5%, 99% and 99.5%)
 T

EVALUATION STRATEGY
How to compare the systems?
 The NN system yields a single point in the (average effort x inclusion rate) space
 The fixed-length system generates a full inclusion-rate curve
Alternatives:
 Use the single point: if it improves on both axes, OK; if not, ??? (sensitivity? spuriousness?)
 Extend the analysis:
   Use the estimated thresholds to build an artificial inclusion-rate histogram around the area of interest (96.5%-99%)
   Compare each point in this range with the fixed-list-length inclusion-rate curve
   Combine the comparisons on both axes: inclusion-rate improvement at the same average effort, and average-effort reduction at the same inclusion rate

RESULTS (I)
Best experiment:
 WINNER method: lack of precision in the discrimination
 SUMMA method: good results! SUMMA plus FX improves on the fixed-length system in almost all cases; high values for the last neuron length are needed (>5000)
Relative improvements are reported for the 10 best experiments, selected on training-set results.

RESULTS (II)
Quantitative improvements:
 Typical: 7-8% for PERFDV, 18-28% for PRNOK and PEIV1000
 Maximum: 10% for PERFDV, 32% for PRNOK and PEIV1000
Statistical confidence:
 We do not have enough data to prove conclusively that the results are statistically significant; all confidence bands overlap:
   We have a small database
   Our inclusion rates are very high
 But: we show improvements over a wide range of values

CONCLUSIONS AND FUTURE WORK
Summary:
 NNs are a suitable strategy for variable preselection list length estimation
 Improvements (both in IER and average effort):
   Typical: 7-8% for PERFDV, 18-28% for PRNOK and PEIV1000
   Maximum: 10% for PERFDV, 32% for PRNOK and PEIV1000
Future work:
 Bigger databases :-)
 Further exploiting the parameter inventory
 Extensibility to other architectures
 What happens if a single output neuron is used?
 Application to word confidence estimation tasks
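As a closing illustration of the comparison described under EVALUATION STRATEGY, the sketch below interpolates a fixed-length system's inclusion-rate curve to compute, for a single NN operating point, the average-effort reduction at equal inclusion rate and the inclusion-rate gain at equal average effort. Every number in it is invented for illustration; none of the curve values or operating points come from the paper.

```python
import numpy as np

# Invented fixed-length-system curve: average effort (mean preselection list
# length) versus inclusion rate.  These numbers are for illustration only.
fixed_effort = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
fixed_inclusion = np.array([0.955, 0.970, 0.980, 0.988, 0.993])

def effort_reduction_at_same_inclusion(nn_effort, nn_inclusion):
    """% average-effort reduction vs. the fixed-length curve at equal inclusion rate."""
    fixed_effort_needed = np.interp(nn_inclusion, fixed_inclusion, fixed_effort)
    return 100.0 * (fixed_effort_needed - nn_effort) / fixed_effort_needed

def inclusion_gain_at_same_effort(nn_effort, nn_inclusion):
    """Inclusion-rate gain (in percentage points) vs. the fixed-length curve at equal effort."""
    fixed_inclusion_reached = np.interp(nn_effort, fixed_effort, fixed_inclusion)
    return 100.0 * (nn_inclusion - fixed_inclusion_reached)

# One hypothetical NN operating point inside the 96.5%-99% area of interest.
print(effort_reduction_at_same_inclusion(300.0, 0.980))  # effort saved at 98% inclusion
print(inclusion_gain_at_same_effort(300.0, 0.980))       # inclusion gained at effort 300
```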