Results of Tagalog Vowel Speech Recognition Using Continuous HMM
Arnel C. Fajardo, Ph.D. student (under the supervision of Professor Yoon-Joong Kim)


Basic structure of HTK

1. Data Preparation
Speech data for training and testing:
- Data (wave files): 625 wave files, 25 sets, 5 sets per speaker
- Training data: 5 sets per speaker, 25 sets
- Test data (speaker dependent): Test 1: 5 sets; Test 2: 10 sets
Format of the speech data (*.wav): 16 kHz, 16-bit linear PCM
File naming: a1001.wav => "a", e1001.wav => "e", i1001.wav => "i", o1001.wav => "o", u1001.wav => "u", ...
Variables:
- Two tests: 5 speakers (1 set each) and 10 speakers (1 set each)
- Hmmdefs: m5, m6

Compute Feature Vectors
Use: HCopy -C configs\HCopy.config -S scripts\HCopy.scp
- HCopy.exe computes the features from each wave file and saves the features in the same folder; MFCC features were used.
- -C configs\HCopy.config: configuration file for computing the features
- -S scripts\HCopy.scp: script file listing the wave-file / feature-file pairs

HCopy
Number of input files: 3
- Waveform files: *.wav
- Configuration file: HCopy.config
- Script file: HCopy.scp
Number of output files: 1
- MFCC files: *.mfc
Create HCopy.config in ....Configs/HCopy.config and write:
  # Coding parameters
  SOURCEKIND = WAVEFORM
  SOURCEFORMAT = NIST
  SOURCERATE = 625
  TARGETKIND = MFCC_0
  TARGETRATE =
  SAVECOMPRESSED = T
  SAVEWITHCRC = T
  WINDOWSIZE =
  USEHAMMING = T
  PREEMCOEF = 0.97
  NUMCHANS = 26
  CEPLIFTER = 22
  NUMCEPS = 12
  ENORMALISE = F

Script file: Scripts/HCopy.scp - lists each wave file with its target feature file (contents shown on the slide; an illustrative sketch follows).
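As an illustration of the HCopy script-file format - one source wave file and its target feature file per line - under the a1001.wav naming convention from the data-preparation slide (the actual paths on the slide are not reproduced here):
  a1001.wav  a1001.mfc
  e1001.wav  e1001.mfc
  i1001.wav  i1001.mfc
  (and so on for all 625 files)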

Prepare the Master Label File
Master Label File (word-level transcriptions): mlfs/words.mlf
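A minimal sketch of the word-level MLF format for this task, assuming the a1001/e1001 file naming from the data-preparation slide (the actual words.mlf is shown on the slide and may also transcribe leading/trailing sil):
  #!MLF!#
  "*/a1001.lab"
  a
  .
  "*/e1001.lab"
  e
  .
  (one such entry per utterance)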

Model list: modelList/wordList - the list of HMM model names.
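Given the five Tagalog vowels and the sil model used in the grammar later on, the word list presumably contains one model name per line, something like the following (an assumption, since the slide image is not reproduced here):
  a
  e
  i
  o
  u
  sil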

Generate the initial master macro file (hmmdefs)
HCompV -C configs\config -f 0.01 -m -S scripts\train.scp -M wordHmms\m0\ wordHmms\proto

HCompV.exe
Number of inputs: 3
- Input 1: -C configs/config - parameters for computing the features
  -f 0.01 - the variance floor macro (called vFloors) will be computed as 0.01 times the global variance
  -m - the means and the variances will be computed
- Input 2: -S scripts/train.scp - list of MFC feature files to be used in training
- Input 3: wordHmms/proto - the hand-written HMM prototype
Number of outputs: 1
- Output 1: -M wordHmms/m0 - directory for the results
  vFloors: variance floor macro
  proto: HMM prototype filled with the estimated global means and variances
  hmmdefs: will be written manually from proto

Input 1: configs/config (derived from scripts/Hcopy.config: Hcopy.config => configs/config)

wordHmms/m0/vFloors - global variance-floor constants used when computing the output probabilities b_j(o_t), shown on the slide.
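For reference, the state output probability that these floors constrain is the standard Gaussian-mixture density from the HTK Book (written here for a single data stream):
  b_j(o_t) = \sum_{m=1}^{M} c_{jm} \, \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})
where c_{jm} are the mixture weights of state j and \mathcal{N} is a multivariate Gaussian with diagonal covariance \Sigma_{jm}; vFloors gives the lower bounds applied to those diagonal variances during re-estimation.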

Input 2: scripts/train.scp - the list of training feature files (contents shown on the slide).
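A sketch of the train.scp format, which lists only the feature files, one per line (file names follow the same assumed naming as in the HCopy script example above):
  a1001.mfc
  e1001.mfc
  i1001.mfc
  (and so on for the training sets)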

Input 3: wordHmms/proto - a generic HMM prototype for monophone speech.
It has 3 emitting states. Note: NumStates is 5 because states 1 and 5 are the non-emitting entry and exit states.
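A minimal sketch of such a prototype, assuming 13-dimensional MFCC_0 features (NUMCEPS = 12 plus c0, matching the HCopy configuration) and three emitting states; the actual proto file on the slide may differ:
  ~o <VecSize> 13 <MFCC_0>
  ~h "proto"
  <BeginHMM>
    <NumStates> 5
    <State> 2
      <Mean> 13
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      <Variance> 13
        1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
    <State> 3
      <Mean> 13
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      <Variance> 13
        1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
    <State> 4
      <Mean> 13
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      <Variance> 13
        1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
    <TransP> 5
      0.0 1.0 0.0 0.0 0.0
      0.0 0.6 0.4 0.0 0.0
      0.0 0.0 0.6 0.4 0.0
      0.0 0.0 0.0 0.7 0.3
      0.0 0.0 0.0 0.0 0.0
  <EndHMM>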

wordHmms/proto + global means and variances => wordHmms/m0/proto
(The slide shows the resulting wordHmms/m0/proto produced by the HCompV command.)

Input 3 - wordHmms/m0/hmmdefs - Master Macro File (MMF)
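The MMF presumably contains one copy of the flat-started prototype per model, each renamed with an ~h macro. A structural sketch, assuming the a/e/i/o/u/sil model set (parameter values omitted; the actual file is shown on the slide):
  ~o <VecSize> 13 <MFCC_0>
  ~h "a"
  <BeginHMM>
    ... states and transition matrix copied from wordHmms/m0/proto ...
  <EndHMM>
  ~h "e"
  <BeginHMM>
    ...
  <EndHMM>
  (likewise for "i", "o", "u" and "sil")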

Step 2. Training
HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m0\hmmdefs -M wordHmms\m1 modelList\wordList

HERest
Number of inputs: 5
- -C configs/config: parameters for the features
- -I mlfs/words.mlf: master label file (word-level transcription of each speech file)
- modelList/wordList: word name list (HMM list)
- -S scripts/train.scp: list of MFC files for training
- -H wordHmms/m0/hmmdefs: hmmdefs (the set of HMM definitions for all words)
Number of outputs: 1
- -M wordHmms/m1: re-estimated hmmdefs

Input 1: configs/config - configuration for wordHmms/m1

Input 2: mlfs/words.mlf
Input 3: modelList/wordList
Input 4: scripts/train.scp

Input 5: wordHmms/m0/hmmdefs (MMF)

Output 1 of HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m0\hmmdefs -M wordHmms\m1 modelList\wordList
Result: wordHmms/m1/hmmdefs

Re-estimate hmmdefs repeatedly (m1 => m2 => m3 => m4 => m5):
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m1/hmmdefs -M wordHmms/m2 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m2/hmmdefs -M wordHmms/m3 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m3/hmmdefs -M wordHmms/m4 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m4/hmmdefs -M wordHmms/m5 modelList/wordList

Step 3. Recognition Test
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/tag_Net -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList

HVite
Number of inputs: 6
- -C configs/config: parameters for the MFC features
- modelList/wordList: HMM name list
- -S scripts/test.scp: list of MFC feature files for testing
- -w dic/tag_Net: word network for recognition
- dic/dict: pronouncing dictionary
- -H wordHmms/m5/hmmdefs: the set of HMMs
Number of outputs: 1
- -i mlfs/recOutWordm5.mlf: recognition result

dic/dict - Writing a pronouncing dictionary
Entry format: WORD [outsym] models
- WORD: word to be recognized
- [outsym]: string to output when the word is recognized
- models: list of HMM models that make up the word
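Since each vowel (and sil) is modelled by a single whole-word HMM, the dictionary is presumably trivial; a sketch under that assumption (the actual dic/dict is shown on the slide):
  a    a
  e    e
  i    i
  o    o
  u    u
  sil  sil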

BNF grammar rules:
- $ : variable
- {} : zero or more repetitions
- <> : one or more repetitions
- [] : optional

Grammar (dic/tag_v_Gram):
$words = a | e | i | o | u;
(sil $words sil)

HParse -C configs/config dic/tag_v_Gram dic/tag_Net
(The slide shows the input grammar file dic/tag_v_Gram.)

HParse -C configs/config dic/tag_v_Gram dic/tag_Net
Result of applying HParse to tag_v_Gram: the word network dic/tag_Net (the slide shows dic/tag_Net and configs/config).

HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/tag_Net -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
(The slide shows the inputs configs/config, scripts/test.scp and modelList/wordList.)

HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/tag_Net -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
(The slide shows the output mlfs/recOutWordm5.mlf.)

Step 4. Recognition results
HResults -I mlfs/words.mlf modelList/wordList mlfs/recOutWordm5.mlf
First test: 5 sets (each set represents 1 speaker) => 5 speakers. Results are shown on the slide.
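HResults prints its standard analysis table; the layout is roughly as follows, where the xx.xx figures are placeholders rather than the results reported on the slide:
  ====================== HTK Results Analysis =======================
    Ref : mlfs/words.mlf
    Rec : mlfs/recOutWordm5.mlf
  ------------------------ Overall Results --------------------------
  SENT: %Correct=xx.xx [H=xx, S=xx, N=xx]
  WORD: %Corr=xx.xx, Acc=xx.xx [H=xx, D=xx, S=xx, I=xx, N=xx]
  ====================================================================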

Step 4. Recognition results (continued)
HResults -I mlfs/words.mlf modelList/wordList mlfs/recOutWordm5.mlf
Second test: 10 sets (each set represents 1 speaker) => 10 speakers. Results are shown on the slide.

Comparison of m5 and m6 (hmmdefs): only a slight difference between them.
HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m4\hmmdefs -M wordHmms\m5 modelList\wordList
HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m5\hmmdefs -M wordHmms\m6 modelList\wordList
(The slide shows the m5 and m6 parameter values side by side.)

END