DSP homework 1 HMM Training and Testing

DSP homework 1 HMM Training and Testing Advisor: Lin-Shan Lee Date: 2007.04.10

Digit Recognizer
Goal: construct a digit recognizer.
Free HMM toolkit: HTK, http://htk.eng.cam.ac.uk/
Training data, testing data, and some scripts: http://speech.ee.ntu.edu.tw/courses/DSP/

Digit Recognizer - Flowchart

Thanks for HTK! We will focus on these HTK tools

HCopy - Feature extraction
HCopy -C lib/hcopy.cfg -S scripts/training_hcopy.scp
Converts each wave file to 39-dimensional MFCC features.
-C lib/hcopy.cfg
    Input and output formats (e.g. wav -> MFCC_Z_E_D_A) and the parameters of feature extraction.
    See Chapter 7 - Speech Signals and Front-end Processing.
-S scripts/training_hcopy.scp
    A mapping from input file name to output file name:
    speechdata/training/N110022.wav MFCC/training/N110022.mfc
    …
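The script file passed with -S is plain text with one "input output" pair per line. A minimal sketch of generating such lines follows; the function names and the default output directory are assumptions chosen to match the single example line on the slide, not part of the homework package.

```python
from pathlib import PurePosixPath


def scp_line(wav_path, out_dir="MFCC/training"):
    """Return one 'input output' line for an HCopy script file.

    The output name keeps the input's base name and swaps the
    extension for .mfc (assumed from the slide's example).
    """
    stem = PurePosixPath(wav_path).stem
    return f"{wav_path} {out_dir}/{stem}.mfc"


def build_scp(wav_paths, out_dir="MFCC/training"):
    """Join one line per wave file into the text of a script file."""
    return "\n".join(scp_line(p, out_dir) for p in wav_paths) + "\n"


if __name__ == "__main__":
    print(build_scp(["speechdata/training/N110022.wav"]), end="")
```

Writing the result of build_scp to a file gives something that can be passed to HCopy with -S, provided the output directory already exists.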

Initial MMF prototype
MMF: see HTK Book chapter 7.
~o <VECSIZE> 39 <MFCC_Z_E_D_A>
~h "proto"
<BeginHMM>
<NumStates> 5
<State> 2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 …
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 …
<State> 3
<Mean> 39
…
<Variance> 39
…
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.5 0.5 0.0 0.0
0.0 0.0 0.5 0.5 0.0
0.0 0.0 0.0 0.5 0.5
0.0 0.0 0.0 0.0 0.0
<EndHMM>
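When editing a prototype by hand, the transition matrix is easy to get wrong. The sketch below checks the properties visible in the slide's matrix: it is square, every row except the last sums to 1, the last (exit) row is all zeros, and no state jumps backwards. The function name is hypothetical, and the no-backward-jump check describes this left-to-right prototype rather than a general HTK rule.

```python
# Transition matrix copied from the prototype on the slide.
PROTO_TRANSP = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]


def check_transp(matrix, tol=1e-6):
    """Return a list of problems found in a left-to-right <TransP> matrix."""
    n = len(matrix)
    problems = []
    for i, row in enumerate(matrix):
        if len(row) != n:
            problems.append(f"row {i + 1}: expected {n} values, got {len(row)}")
            continue
        # The exit state (last row) has no outgoing transitions.
        expected = 0.0 if i == n - 1 else 1.0
        if abs(sum(row) - expected) > tol:
            problems.append(f"row {i + 1}: sums to {sum(row)}, expected {expected}")
        # Left-to-right topology: nothing below the diagonal.
        if any(p > 0.0 for p in row[:i]):
            problems.append(f"row {i + 1}: backward transition")
    return problems


if __name__ == "__main__":
    print(check_transp(PROTO_TRANSP) or "matrix looks consistent")
```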

HCompV - Initialize
HCompV -C lib/config.cfg -o hmmdef -M hmm -S scripts/training.scp lib/proto
Computes the global mean and variance of the features.
-C lib/config.cfg
    Sets the format of the input features, e.g. MFCC_Z_E_D_A.
-o hmmdef -M hmm
    Sets the output name: hmm/hmmdef.
-S scripts/training.scp
    A list of training data.
lib/proto
    A description of an HMM model, in HTK MMF format.
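The idea behind this step can be illustrated without HTK: pool every feature frame from every training file and compute one mean and one variance per dimension. The sketch below is an illustration only, not HCompV's code; the function name is hypothetical, and dividing by the frame count (rather than the count minus one) is an assumption.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all pooled feature frames.

    frames: list of equal-length lists of floats, one list per frame.
    Assumption: population variance (divide by the number of frames).
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


if __name__ == "__main__":
    # Two toy 2-dimensional frames; real frames here have 39 dimensions.
    toy_frames = [[0.0, 2.0], [2.0, 4.0]]
    print(global_mean_variance(toy_frames))
```

With this kind of initialization every emitting state of the prototype starts from the same mean and variance, and the later HERest passes pull the states apart.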

HCompV - Initialize

Initial HMM
[Diagram: hmm/hmmdef, bin/macro, bin/models_1mixsil, hmm/models]
bin/macro
    Constructs each HMM.
bin/models_1mixsil
    Adds the silence HMM.
These tools are written in C, but you can do the same steps with a text editor.

HERest - Adjust HMM model
HERest -C lib/config.cfg -S scripts/training.scp -I labels/Clean08TR.mlf -H hmm/macros -H hmm/models -M hmm lib/models.lst
Adjusts the parameters λ to maximize P(O|λ).
One run is one iteration of the EM algorithm. Run this command three times for three iterations.
-I labels/Clean08TR.mlf
    Sets the label file to "labels/Clean08TR.mlf".
lib/models.lst
    A list of word models: liN (零, zero), #i (一, one), #er (二, two), … jiou (九, nine), sil

HERest - Adjust HMM model
Chapter 4 - basic problem 3 for HMM
Given O and an initial model λ = (A, B, π), adjust λ to maximize P(O|λ).
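Basic problem 3 can be made concrete with a toy example. The sketch below runs Baum-Welch (EM) re-estimation on a small discrete HMM with a single observation sequence and shows that P(O|λ) does not decrease from one iteration to the next. It is an illustration only: HERest works on continuous mixture densities and many utterances at once, and every name and number here (A0, B0, PI0, OBS, em_step) is made up for the example.

```python
def forward(A, B, pi, obs):
    """alpha[t][i] = P(o_1..o_t, state at t is i | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(o_{t+1}..o_T | state at t is i, lambda)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def likelihood(A, B, pi, obs):
    """P(O | lambda), summed over the final forward probabilities."""
    return sum(forward(A, B, pi, obs)[-1])


def em_step(A, B, pi, obs):
    """One Baum-Welch iteration; returns the re-estimated (A, B, pi)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p_obs = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t, given O.
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)]
             for t in range(T)]
    new_A = []
    for i in range(n):
        denom = sum(gamma[t][i] for t in range(T - 1))
        new_A.append([
            sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                for t in range(T - 1)) / p_obs / denom
            for j in range(n)])
    new_B = []
    for i in range(n):
        denom = sum(gamma[t][i] for t in range(T))
        new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / denom
                      for k in range(m)])
    return new_A, new_B, gamma[0]


# Toy model and observation sequence (made up for illustration).
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.5, 0.5], [0.1, 0.9]]
PI0 = [0.6, 0.4]
OBS = [0, 1, 1, 0, 1, 1, 1, 0]

if __name__ == "__main__":
    A, B, pi = A0, B0, PI0
    for iteration in range(3):  # three iterations, as on the slide
        print(iteration, likelihood(A, B, pi, OBS))
        A, B, pi = em_step(A, B, pi, OBS)
    print(3, likelihood(A, B, pi, OBS))
```

The sketch skips the scaling or log arithmetic that a real implementation needs for long sequences; it is only safe for short toy inputs like this one.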

HHEd – Modify HMMs
[Diagram: lib/sil1.hed, lib/models_sp.lst, bin/spmodel_gen, hmm/models, hmm/models]
Adds the SP (short pause) HMM definition to the MMF file "hmm/models".
HHEd -H hmm/macros -H hmm/models -M hmm lib/sil1.hed lib/models_sp.lst
lib/sil1.hed
    A list of commands to modify the HMM definitions.
lib/models_sp.lst
    A new list of models: liN (零, zero), #i (一, one), #er (二, two), … jiou (九, nine), sil, sp
See HTK Book 3.2.2 (p. 33).

HERest - Adjust HMM model again
HERest -C lib/config.cfg -S scripts/training.scp -I labels/Clean08TR_sp.mlf -H hmm/macros -H hmm/models -M hmm lib/models_sp.lst

HHEd - Increase the number of mixtures
HHEd -H hmm/macros -H hmm/models -M hmm lib/mix2_10.hed lib/models_sp.lst
Increases the number of mixtures in each state.
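The contents of lib/mix2_10.hed are not shown on the slide. As a rough illustration of what increasing the mixture count can mean, the sketch below splits the heaviest component of one state: the component is copied, both copies get half the weight, and the two means are moved apart by 0.2 standard deviations. This follows the splitting rule as the HTK Book is understood to describe it for HHEd's mixture-up command, so treat the rule, the 0.2 factor, and the function name as assumptions.

```python
import math


def split_heaviest(weights, means, variances, perturb=0.2):
    """Split the heaviest mixture component of one state into two.

    weights:   list of mixture weights
    means:     list of mean vectors, one per component
    variances: list of diagonal variance vectors, one per component
    Returns new (weights, means, variances) with one more component.
    """
    k = max(range(len(weights)), key=lambda i: weights[i])
    half = weights[k] / 2.0
    std = [math.sqrt(v) for v in variances[k]]
    up = [m + perturb * s for m, s in zip(means[k], std)]
    down = [m - perturb * s for m, s in zip(means[k], std)]
    new_weights = weights[:k] + [half, half] + weights[k + 1:]
    new_means = means[:k] + [up, down] + means[k + 1:]
    new_vars = (variances[:k] + [list(variances[k]), list(variances[k])]
                + variances[k + 1:])
    return new_weights, new_means, new_vars


if __name__ == "__main__":
    # One Gaussian with 2 dimensions becomes a 2-component mixture.
    print(split_heaviest([1.0], [[0.0, 2.0]], [[1.0, 4.0]]))
```

Split components start out almost identical, which is why the flow runs HERest again after each increase.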

HERest - Adjust HMM model again
HERest -C lib/config.cfg -S scripts/training.scp -I labels/Clean08TR_sp.mlf -H hmm/macros -H hmm/models -M hmm lib/models_sp.lst

Flowchart

HParse – construct word net
HParse lib/grammar_sp lib/wdnet_sp
lib/grammar_sp
    Input: regular expression.
lib/wdnet_sp
    Output: word net.

HVite – Viterbi search
HVite -H hmm/macros -H hmm/models -S scripts/testing.scp -C lib/config.cfg -w lib/wdnet_sp -l '*' -i result/result.mlf -p 0.0 -s 0.0 lib/dict lib/models_sp.lst
See Chapter 8.0 - Search Algorithms for Speech Recognition.
-w lib/wdnet_sp
    Input word net.
-i result/result.mlf
    Output MLF file.
lib/dict
    Dictionary: a mapping from words to phone sequences, e.g.
    ling -> liN, er -> #er, …
    一 (one) -> sic_i i, 七 (seven) -> chi_i i
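The search itself can be illustrated on a toy model. The sketch below is a plain Viterbi decoder for a small discrete HMM, working in the log domain; it is not how HVite is implemented (HVite searches a word network with continuous densities), and the model values and names are made up for the example.

```python
import math


def _log(x):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(x) if x > 0.0 else float("-inf")


def viterbi(A, B, pi, obs):
    """Return (best state path, log probability of that path)."""
    n = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        step, ptr = [], []
        for j in range(n):
            # Best predecessor state for landing in state j.
            best_i = max(range(n), key=lambda i: delta[i] + _log(A[i][j]))
            ptr.append(best_i)
            step.append(delta[best_i] + _log(A[best_i][j]) + _log(B[j][o]))
        delta = step
        backpointers.append(ptr)
    # Trace back from the best final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy 2-state left-to-right model (made up for illustration).
A_TOY = [[0.5, 0.5], [0.0, 1.0]]
B_TOY = [[0.9, 0.1], [0.1, 0.9]]
PI_TOY = [1.0, 0.0]

if __name__ == "__main__":
    print(viterbi(A_TOY, B_TOY, PI_TOY, [0, 0, 1, 1]))
```

For the observation sequence [0, 0, 1, 1] the best path stays in the first state for two frames and then moves to the second, which is the kind of state alignment the recognizer computes for each candidate word sequence.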

HResults – compare with answers
HResults -e "???" sil -e "???" sp -I labels/answer.mlf lib/models_sp.lst result/result.mlf
Alignment: Longest Common Subsequence (LCS)
====================== HTK Results Analysis ====================
Date: Mon Apr 9 19:34:02 2007
Ref : labels/answer.mlf
Rec : result/result.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=91.88 [H=441, S=39, N=480]
WORD: %Corr=98.56, Acc=97.41 [H=1713, D=17, S=8, I=20, N=1738]
============================================================
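The percentages in this report follow from the counts in brackets: H is the number of correct labels, D deletions, S substitutions, I insertions, and N = H + D + S is the number of reference labels. A short sketch that recomputes the WORD and SENT lines from those counts (the function names are hypothetical):

```python
def word_scores(H, D, S, I):
    """Return (N, %Corr, Acc) from the word-level counts."""
    N = H + D + S                   # number of reference labels
    corr = 100.0 * H / N            # ignores insertions
    acc = 100.0 * (H - I) / N       # penalizes insertions as well
    return N, corr, acc


def sentence_correct(H, N):
    """Percentage of sentences recognized with no error at all."""
    return 100.0 * H / N


if __name__ == "__main__":
    N, corr, acc = word_scores(H=1713, D=17, S=8, I=20)
    print(f"WORD: %Corr={corr:.2f}, Acc={acc:.2f} [N={N}]")
    print(f"SENT: %Correct={sentence_correct(441, 480):.2f}")
```

Acc is the stricter of the two word-level numbers because inserted words lower it, so it is the one to watch when tuning.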

Training flowchart

Requirement 1 – run baseline
Download the homework package.
Set PATH for the HTK tools.
Execute the bash shell scripts in order:
    01_run_HCopy.sh
    02_run_HCompV.sh
    03_training.sh
    04_testing.sh
You can find the accuracy in result/accuracy.

Requirement 2 – improve recognition accuracy

Submit your homework
Deadline: 2007.4.25 0:00
Required files:
    result/accuracy – the best accuracy you have achieved
    hmm/models – your HMM models
    Any script you wrote or modified
    A document describing your training process and accuracy
Compress all of the files above into a zip file named “hw1_b9xxxxxx.zip”.
E-mail the zip file to the TA with the subject: [DSP] hw1_b9xxxxx <your name>

Suggestions and other information
Read HTK Book chapter 3 first (except 3.3 and 3.5).
Many hints are in the shell scripts.
Perl or a good editor is your friend.
If you have problems with bash shell scripts, see http://linux.vbird.org/linux_basic/0340bashshell-scripts.php
To run HTK under Windows, use Cygwin + the HTK Windows binary.
If you have any questions, mail the TA: 呂東烜 r95041@csie.ntu.edu.tw