Temple University Training Acoustic Models Using SphinxTrain Jaykrishna Shukla, Mubin Amehed, and Cara Santin Department of Electrical and Computer Engineering.

Presentation transcript:

Temple University Training Acoustic Models Using SphinxTrain Jaykrishna Shukla, Mubin Amehed, and Cara Santin Department of Electrical and Computer Engineering Temple University URL:

Temple University: Slide 1 Goals: Complete the training process. Train using an ISIP-like setup. Learn the lexicon file. Understand the steps in generating CI phone models.

Temple University: Slide 2 Introduction to CI phone models. Last week we generated the feature vectors; now we need models for the CI phones. As before, the features are needed to distinguish between words and phones. What are CI phone models? CI (context-independent) phone models are phone models that do not consider the influence of surrounding phonemes on the pronunciation of a given phoneme.
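To make the distinction concrete, here is a minimal sketch in Python. It assumes that context-dependent units are triphones (a phone together with its left and right neighbors), which the slide does not state. The function names, the `SIL` boundary symbol, and the example phone sequence are illustrative and are not SphinxTrain output.

```python
def ci_units(phones):
    """Context-independent units: each phone stands alone."""
    return list(phones)


def cd_units(phones, boundary="SIL"):
    """Context-dependent units, assumed here to be (left, phone, right) triphones."""
    # Pad both ends so the first and last phone also get a neighbor.
    padded = [boundary] + list(phones) + [boundary]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


phones = ["W", "AH", "N"]  # illustrative phone sequence for one word
print(ci_units(phones))
print(cd_units(phones))
```

A CI model set needs one model per phone, while a CD set under this assumption needs one per distinct triple, which is why CD training needs more data and later ties parameters.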

Temple University: Slide 3 Process of generating the CI phone models. SphinxTrain uses an initialization technique called flat initialization. Flat initialization is a simple and effective way to initialize an acoustic model: it computes the global mean and variance from the training data and sets the model parameters to these values. After running the CI model script, a model definition file for the CI models is created. A similar process is repeated for CD (context-dependent) models, which are more accurate.
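The idea behind flat initialization can be sketched in a few lines of Python. This is an illustration of the technique as described on the slide, not SphinxTrain's own code: the function names, the dictionary layout, the three states per phone, and the toy 2-dimensional features are all assumptions made for the example.

```python
def global_mean_variance(utterances):
    """Per-dimension mean and variance over every frame of every utterance."""
    frames = [frame for utt in utterances for frame in utt]
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def flat_initialize(phones, states_per_phone, utterances):
    """Give every state of every phone the same global mean and variance."""
    mean, var = global_mean_variance(utterances)
    return {
        (phone, state): {"mean": list(mean), "var": list(var)}
        for phone in phones
        for state in range(states_per_phone)
    }


# Toy data: two utterances of 2-dimensional frames (real features are larger).
utterances = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0]],
]
models = flat_initialize(["AH", "N"], 3, utterances)
print(models[("AH", 0)])
```

Because every state starts from identical parameters, no phone-level alignment of the training data is needed up front; the later re-estimation iterations are what pull the models apart.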

Temple University: Slide 4 Parameter specifications. Feature vector: 13-dimensional. Iterations: 10 for CI models, 5 for CD models, and 8 for the tied models. Gaussians per state: can be split in multiples of 2, up to 8. Other parameters
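For reference, the values on this slide can be collected in one place. The sketch below is only a convenient record in Python; the class and field names are hypothetical and do not correspond to SphinxTrain's actual configuration variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    """Values taken from the slide; the field names are made up for this example."""
    feature_dim: int = 13             # 13-dimensional feature vector
    ci_iterations: int = 10           # iterations for CI models
    cd_iterations: int = 5            # iterations for CD models
    tied_iterations: int = 8          # iterations for the tied models
    max_gaussians_per_state: int = 8  # upper limit on Gaussians per state


settings = TrainingSettings()
print(settings)
```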

Temple University: Slide 5 This week's accomplishment. The following is the model architecture of the CI acoustic models.

Temple University: Slide 6 Future plan (today). Use the generated acoustic model with the Sphinx 4 decoder and obtain a WER. Train using a setup similar to ISIP's.
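WER (word error rate) is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below shows that calculation in Python using a word-level edit distance. It is a generic illustration, not the scoring tool shipped with Sphinx 4, and the function name and example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("two" -> "too") and one deletion ("four") over 4 words.
print(word_error_rate("one two three four", "one too three"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, since the denominator is the reference length only.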