HMM Toolkit (HTK)
Presentation by Daniel Whiteley, AME department


What is HTK?
The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition research, although it has been used for numerous other applications, including research into speech synthesis, character recognition, and DNA sequencing. HTK is in use at hundreds of sites worldwide.

What is HTK?
HTK consists of a set of library modules and tools available in C source form. The tools provide sophisticated facilities for speech analysis, HMM training, testing, and results analysis. The software supports HMMs using both continuous-density Gaussian mixtures and discrete distributions, and can be used to build complex HMM systems.

Basic HTK command format
The commands in HTK follow a basic command-line format:

    HCommand [options] files

Options are indicated by a dash followed by the option letter; universal options, shared by all tools, use capital letters. HTK does not rely on file extensions; instead, each file's header determines its format.
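
As a concrete illustration (the file names here are placeholders), the command below mixes the universal options -T (trace level), -C (configuration file), and -S (script file), which every tool understands, with the HRest-specific lowercase option -w:

    HRest -T 1 -C config -S trainlist -w 1.0 hmm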

Configuration files
You can also configure HTK modules using config files. A config file is applied with the -C option, or globally by setting the environment variable HCONFIG (setenv HCONFIG myconfig in csh), where myconfig contains your configuration settings. All possible configuration variables can be found in chapter 18 of the HTK manual. For most of our purposes, however, we only need a config file with these lines:

    SOURCEKIND = USER    # The user-defined file format (not sound)
    TARGETKIND = ANON_D  # Same base kind as the source (_D appends deltas)
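
Configuration variables can also be scoped to the module that reads them by prefixing the module name. A minimal sketch (that HPARM, the parameter-handling module, is the reader of these variables is an assumption to verify against the manual; the unscoped form above works fine for our purposes):

    # myconfig: these settings apply only to the HPARM module
    HPARM: SOURCEKIND = USER
    HPARM: TARGETKIND = ANON_D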

Using HTK
The parts of HMM modeling, and the tools covered below for each:
Data Preparation (HQuant, HCopy)
Model Training (HInit, HCompV, HRest)
Pattern Recognition (HVite)
Model Analysis (HResults)

Data Preparation
One small problem: HTK was tailored for speech recognition, so most of the data preparation tools are for audio. We therefore need to jury-rig our data into the HTK parameterized data file format. HTK parameter files consist of a sequence of samples preceded by a header. The samples are simply data vectors whose components are 2-byte integers or 4-byte floating-point numbers. For us, these vectors will be a sequence of joint angles received from a motion capture session.

HTK file format
The file begins with a 12-byte header containing the following fields:
nSamples (4-byte int): number of samples
samplePeriod (4-byte int): sample period, expressed as a multiple of 100 ns
sampleSize (2-byte int): number of bytes per sample vector
parameterKind (2-byte int): defines the type of data
For our purposes, parameterKind will be either 0x2400, the user-defined parameter kind, or 0x2800, the discrete case.
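
A minimal C sketch of writing such a file for 4-component float vectors, using the user-defined kind code quoted above. One caveat: HTK reads multi-byte fields big-endian by default, so on a little-endian machine you would either byte-swap the fields below or set NATURALREADORDER = T in your config.

    #include <stdio.h>
    #include <stdint.h>

    /* Write nSamples 4-component float vectors as an HTK parameter file.
       Byte-swapping for big-endian output is omitted for brevity. */
    int write_htk_user(const char *path, const float *vecs,
                       int32_t nSamples, int32_t samplePeriod)
    {
        const int16_t sampleSize = 16;      /* 4 floats x 4 bytes per vector */
        const int16_t parmKind   = 0x2400;  /* user-defined kind, as above */
        FILE *f = fopen(path, "wb");
        if (!f) return -1;
        fwrite(&nSamples,     sizeof nSamples,     1, f);  /* 12-byte header */
        fwrite(&samplePeriod, sizeof samplePeriod, 1, f);
        fwrite(&sampleSize,   sizeof sampleSize,   1, f);
        fwrite(&parmKind,     sizeof parmKind,     1, f);
        fwrite(vecs, sizeof(float), 4 * (size_t)nSamples, f);  /* the samples */
        fclose(f);
        return 0;
    }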

HMM model creation
In order to model the motion capture sequence, we need to create a prototype of the HMM. In this prototype, the values of B (the output distributions) and π (the initial probabilities) are arbitrary. The same is true for the transition matrix A, except that any transition probability you set to zero will remain zero. Models are written in a markup language similar to HTML. In addition, models in HTK have non-emitting beginning and ending states; these are counted in the total number of states but are not given definitions in the script.

HMM Model Example

    ~h "prototype"             (name of the file)
    <BeginHMM>
    <VecSize> 4 <USER>         (sample size and parameter kind)
    <NumStates> 5              (number of states)
    <State> 2 <NumMixes> 3     (number of Gaussian distributions)
    <Mixture> 1 0.3            (the distribution's ID and weight)
    <Mean> 4                   (mean observation vector)
    0.0 0.0 0.0 0.0
    <Variance> 4               (covariance matrix diagonal)
    1.0 1.0 1.0 1.0
    <Mixture> 2 0.4
    ...
    <State> 3
    ...
    <TransP> 5                 (transition matrix A)
    0.0 0.4 0.3 0.3 0.0
    0.0 0.2 0.5 0.3 0.0
    0.0 0.2 0.2 0.4 0.2
    0.0 0.1 0.2 0.3 0.4
    0.0 0.0 0.0 0.0 0.0
    <EndHMM>

All the transition probabilities for the ending state (the last row of A) are always zero.

Vector Quantization
In order to reduce computation, we can make the HMM discrete. To use a discrete HMM, we must first quantize the data into a set of standard vectors. Warning: quantizing the data inherently introduces error. Before quantizing the data, we must first have a standard set of vectors, a vector "codebook". This is made with HQuant.

HQuant
HQuant takes the training data, uses a K-means algorithm to evenly partition it, and takes the centroids of these partitions as our quantization vectors (QVs). A sample command:

    HQuant -C config -n 1 64 -S train.scp vqcook

-C config: use the configuration variables found in config
-n 1 64: the number of QVs (64) for a given data stream (stream 1)
-S train.scp: a script listing all of your training files
vqcook: the file our codebook will be written to

To reduce quantization time, a codebook using a binary tree search algorithm can be made using the -t option.
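
The quantization itself just maps each incoming vector to the index of its nearest codebook entry. A toy C sketch of that lookup (a hypothetical helper, assuming Euclidean distance and our 4-component vectors):

    #include <float.h>

    /* Return the index of the codebook entry qv[i] nearest to v. */
    int nearest_qv(const float v[4], const float qv[][4], int nQV)
    {
        int best = 0;
        float bestDist = FLT_MAX;
        for (int i = 0; i < nQV; i++) {
            float d = 0.0f;
            for (int k = 0; k < 4; k++) {
                float diff = v[k] - qv[i][k];
                d += diff * diff;   /* squared Euclidean distance */
            }
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }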

Converting to Discrete
The conversion of data files is done using the HCopy command. In order to quantize our data, we do this:

    HCopy -C quantize rawdata qvdata

where rawdata is our original data, qvdata is our quantized data, and quantize is a config file with these lines:

    SOURCEKIND = USER      # We start with our original data
    TARGETKIND = DISCRETE  # Convert it into discrete data
    SAVEASVQ = T           # Throw away the continuous data
    VQTABLE = vqcook       # Use our previously made codebook
                           # to quantize the data

Discrete HMM
Discrete HMMs are very similar to their continuous counterparts, save for a few changes. Discrete probabilities are stored in logarithmic form, where:

    P(v) = exp(-d(v)/2371.8)

    ~o <Discrete> <StreamInfo> 1 1
    ~h "dhmm"
    <BeginHMM>
    <NumStates> 5
    <State> 2 <NumMixes> 10    (number of discrete symbols)
    <DProb> 5461*10            (the * duplicates a value: 5461 repeated 10 times)
    ...
    <EndHMM>
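
Inverting that formula gives the stored value, d(v) = -2371.8 ln P(v), which also explains the 5461*10 above: a uniform distribution over the 10 symbols (P = 0.1) gives d of roughly 5461. A small C sketch of the conversion (a hypothetical helper, not part of HTK):

    #include <math.h>

    /* Convert a probability p to HTK's <DProb> scale by inverting
       P(v) = exp(-d(v)/2371.8); e.g. to_dprob(0.1) == 5461. */
    short to_dprob(double p)
    {
        return (short)(-2371.8 * log(p) + 0.5);  /* round to nearest */
    }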

Model Training (token HMM)
The initialization of our prototype can be done using HInit:

    HInit [options] hmm data1 data2 data3 ...

HInit is used mainly for left-right HMMs. A more ergodic HMM can instead be initialized with a flat start, setting all means and variances to their global counterparts using HCompV:

    HCompV -m -S trainlist hmm

where hmm is the HMM being trained.

Retraining
The model is then retrained using the Baum-Welch algorithm found in HRest:

    HRest -w 1.0 -v 0.0001 -S trainlist hmm

The -w and -v options set floors for the mixture weights and the variances respectively; the float given to -w is a multiplier of 10^-5. This can be iterated as many times as wanted to achieve the desired results.
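
As an aside (an assumption about HRest's options; check the HTK manual): HRest also re-estimates iteratively on its own, and the -i option caps the number of internal re-estimation cycles, e.g.:

    HRest -i 20 -w 1.0 -v 0.0001 -S trainlist hmm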

Dictionary Creation
In order to create a recognition program or script, we must first create a dictionary. A dictionary in HTK gives each word and its pronunciation; for our purposes, it will just consist of the token HMMs that we trained. Each entry lists the word, an optional displayed output in brackets (if not specified, the word itself is displayed), and the tokens used to form the word:

    RUNNING            run
    WALKING            walk
    JUMPING [SKIPPING] jump

Label Files
Label files contain a transcription of what is going on in the data sequence. Each line gives the start of a frame in samples, the end of the frame in samples, and the token found in that time frame:

    000000 100000 walk
    100001 200000 run
    200001 300000 jump

Master Label Files (MLFs)
During training and recognition, we may have many test files and their accompanying label files. The label files can be condensed into one file called a master label file, or MLF:

    #!MLF!#
    "*/a.lab"
    000000 100000 walk
    100001 200000 run
    200001 300000 jump
    .
    "*/b.lab"
    run
    .
    "*/jump*.lab"
    jump
    .

The entry for a.lab is the same as the original label file. If an entire file is one token, it can be labeled with just that token (b.lab). The wildcard operator can be used to label multiple files at once (jump*.lab).

Pattern Recognition
The recognition of a motion sequence is done by using HVite. To receive a transcription of the recognition data in MLF format, we use:

    HVite -a -i results -o SWT -H hmmlist \
          -I transcripts.mlf -S testfiles

-a: create the word network from the given transcriptions
-i results: the output transcription file, in MLF format
-o SWT: throw away unnecessary data in the output label files
-H hmmlist: a text file containing the list of HMMs used
-I transcripts.mlf: the MLF holding the test files' transcriptions
-S testfiles: the motion capture data to be recognized

Model Analysis
The analysis of the recognition results is done by HResults:

    HResults -I transcripts.mlf -H hmmlist results

-I transcripts.mlf: the MLF containing the reference labels
-H hmmlist: the list of HMMs used
results: the MLF containing the result labels

Note: the reference labels and the result labels must have different file extensions.