Special Research Project (2): Feature Extraction, Acoustic Model Training, WFST Decoding


Special Research Project (2): Feature Extraction, Acoustic Model Training, WFST Decoding
Prof. Lin-Shan Lee, TA: Yun-Chiao Li

Announcement
You will probably have many questions after today's material.
Post them on the ptt2 board "SpeechProj"; your question can probably help others too.

Linux Shell Script Basics
echo "Hello" (prints "Hello" on the screen)
a=ABC (assigns the string ABC to the variable a)
echo $a (prints ABC on the screen)
b=$a.log (assigns ABC.log to b)
cat $b > testfile (writes the contents of the file ABC.log into testfile)
command -h (most commands print their help information with -h)
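Putting these together, a minimal script you can run to check each behavior (the file names are illustrative):

    #!/bin/bash
    echo "Hello"            # prints Hello

    a=ABC                   # assign ABC to a (no spaces around '=')
    echo $a                 # prints ABC

    b=$a.log                # b is now the string ABC.log
    echo "some text" > $b   # create the file ABC.log so cat has input
    cat $b > testfile       # copy the contents of ABC.log into testfile

    bash --help | head -n 3 # most commands document themselves via -h/--help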

Feature Extraction
02.01.extract.feat.sh
02.02.convert.htk.feat.sh

Feature Extraction - MFCC

02.01.extract.feat.sh
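The slide shows the script itself. As a rough sketch, the core of a Kaldi MFCC-extraction step usually looks something like the lines below; the paths, the config file, and wav.scp are assumptions, not the actual contents of 02.01.extract.feat.sh:

    #!/bin/bash
    feat_dir=feat/mfcc
    mkdir -p $feat_dir

    # wav.scp maps utterance IDs to wave files; the features are written
    # as a Kaldi archive (.ark) plus an index (.scp) for random access.
    compute-mfcc-feats --config=conf/mfcc.conf \
      scp:material/wav.scp \
      ark,scp:$feat_dir/mfcc.ark,$feat_dir/mfcc.scp

    # Per-speaker cepstral mean/variance statistics, often computed next.
    compute-cmvn-stats --spk2utt=ark:material/spk2utt \
      scp:$feat_dir/mfcc.scp \
      ark,scp:$feat_dir/cmvn.ark,$feat_dir/cmvn.scp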

Example of MFCC

02.02.convert.htk.feat.sh
The Hidden Markov Model Toolkit (HTK) is the toolkit we used previously; in this project we learn Kaldi instead.
Vulcan provides an interface to convert features from one format to the other.
Type "bash 02.02.convert.htk.feat.sh" and the features will be converted to HTK format.
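For reference, Kaldi itself ships a converter for this kind of job; a minimal sketch, assuming the feature index from the previous step (the output directory and extension are also assumptions):

    #!/bin/bash
    # Write one HTK-format feature file per utterance.
    mkdir -p feat/htk
    copy-feats-to-htk --output-dir=feat/htk --output-ext=fea \
      scp:feat/mfcc/mfcc.scp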

Acoustic Model Training
03.01.mono0a.train.sh

Acoustic Model
Hidden Markov Model / Gaussian Mixture Model
3 states per model
Example
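In Kaldi, the 3-states-per-model structure is declared in a "topo" file. A minimal sketch of the standard left-to-right topology, written out by a shell heredoc (the phone ID range and transition probabilities are assumptions; the real file is generated by the setup scripts):

    #!/bin/bash
    # Each of states 0-2 emits through its own GMM (its PdfClass);
    # state 3 is the non-emitting final state.
    cat > topo.sketch <<EOF
    <Topology>
    <TopologyEntry>
    <ForPhones> 1 2 3 4 </ForPhones>
    <State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
    <State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
    <State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
    <State> 3 </State>
    </TopologyEntry>
    </Topology>
    EOF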

Acoustic model training (1/2)
Training the acoustic model needs labelled data: material/train.txt.
03.01.mono0a.train.sh
Since we lack alignment information at first, it initializes the HMMs by aligning the frames equally across the states (a flat start).
It then runs Gaussian Mixture Model (GMM) accumulation and estimation.
You might want to check "HMM Parameter Estimation" in the HTK Book, or "HMM problem 3" in the course.

Acoustic model training (2/2)
The alignment is refined at specific iterations, listed in the variable realign_iters.
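In standard Kaldi recipes this whole procedure (flat start, iterated accumulation/estimation, realignment at the iterations in realign_iters) is wrapped in one script; a hedged sketch of such a call, with the directory names being assumptions:

    #!/bin/bash
    # Monophone training: data dir, lang dir, output experiment dir.
    steps/train_mono.sh --nj 4 --cmd run.pl \
      data/train data/lang exp/mono0a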

Introduction to WFST

FST
An FSA "accepts" a set of strings; view an FSA as a representation of a possibly infinite set of strings.
Start states are bold; final/accepting states have an extra circle.
This example represents the infinite set {ab, aab, aaab, ...}.

WFST
Like a normal FSA, but with costs on the arcs and on final states.
Note: the cost comes after "/"; for a final state, "2/1" means final cost 1 on state 2.
This example maps ab to cost 3 (= 1 + 1 + 1) and everything else to ∞.
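With OpenFst you can write such a machine in a simple text format and compile it. A sketch of a weighted acceptor for a a* b that gives ab cost 3, in the spirit of the example above (the file names are assumptions):

    #!/bin/bash
    # Symbol table mapping labels to integer IDs.
    cat > isyms.txt <<EOF
    <eps> 0
    a 1
    b 2
    EOF
    # Text format: "src dst label cost"; a line with just a state
    # (plus an optional cost) marks a final state, so "2 1.0" below
    # means state 2 is final with cost 1.  ab = 1 + 1 + 1 = 3.
    cat > wfst.txt <<EOF
    0 1 a 1.0
    1 1 a 1.0
    1 2 b 1.0
    2 1.0
    EOF
    fstcompile --acceptor --isymbols=isyms.txt wfst.txt wfst.fst
    fstprint --acceptor --isymbols=isyms.txt wfst.fst   # inspect it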

WFST Composition
Notation: C = A ∘ B means C is A composed with B.
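OpenFst provides composition directly on the command line; a minimal sketch (the file names are assumptions; composition needs one of the machines arc-sorted on the matching side):

    #!/bin/bash
    # Sort A's output labels, then compose: C = A ∘ B.
    fstarcsort --sort_type=olabel A.fst A_sorted.fst
    fstcompose A_sorted.fst B.fst C.fst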

WFST Components
HCLG = H ∘ C ∘ L ∘ G
H: HMM structure
C: context-dependent relabeling
L: lexicon
G: language model (an acceptor)
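Inside Kaldi the composition is done stepwise, with determinization and minimization after each stage. A hedged, simplified sketch of the innermost step (the FST file names follow the usual lang-directory layout, which is an assumption here):

    #!/bin/bash
    # LG = L ∘ G, then determinize and minimize.
    fsttablecompose L_disambig.fst G.fst | \
      fstdeterminizestar --use-log=true | \
      fstminimizeencoded > LG.fst
    # C and H are composed on in the same pattern, disambiguation
    # symbols are removed, and self-loops are added to give HCLG.fst;
    # utils/mkgraph.sh (next slides) wraps all of these steps.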

Framework for Speech Recognition

WFST Components
L (lexicon)
Where is C? (context-dependency)
H (HMM)
G (language model)

Building the Decoding WFST
03.02.mono0a.mkgraph.sh

03.02.mono0a.mkgraph.sh
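The script presumably wraps Kaldi's standard graph-building utility; a hedged sketch of the key call (the directory names are assumptions, and the --mono flag is what older Kaldi versions required for monophone systems):

    #!/bin/bash
    # Compose H, C, L, and G into exp/mono0a/graph/HCLG.fst.
    utils/mkgraph.sh --mono data/lang_test exp/mono0a exp/mono0a/graph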

Decoding WFST
03.03.mono0a.fst.sh

Decoding WFST (1/2)
From HCLG we have the relationship from states to words.
We need another WFST, U, built from the utterance (its acoustic scores).
Compose U with HCLG, i.e., S = U ∘ HCLG.
Searching for the best path(s) through S gives the recognition result.

Decoding WFST (2/2)
During decoding we need to specify the relative weights of the acoustic model and the language model.
Split the corpus into training, development (dev), and test sets:
the training set is used to train the acoustic model;
try each acoustic-model weight on the dev set and keep the best one;
the test set is used to measure the final performance (Word Error Rate, WER).
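A hedged sketch of that tuning loop: decode the dev set at several acoustic scales and keep the one with the lowest WER. The tool names are real Kaldi binaries, but the paths, the weight grid, and the reference file are assumptions (and in practice the features first go through the same transforms used in training):

    #!/bin/bash
    for acwt in 0.05 0.0667 0.0833 0.1 0.125; do
      # Decode with this acoustic weight; emit word IDs as a text archive,
      # then map the IDs back to words for scoring.
      gmm-decode-faster --acoustic-scale=$acwt \
        --word-symbol-table=exp/mono0a/graph/words.txt \
        exp/mono0a/final.mdl exp/mono0a/graph/HCLG.fst \
        scp:feat/dev/mfcc.scp ark,t:- | \
        utils/int2sym.pl -f 2- exp/mono0a/graph/words.txt > dev_${acwt}.txt
      # Score against the dev-set reference transcriptions.
      compute-wer --text --mode=present \
        ark:material/dev.txt ark:dev_${acwt}.txt
    done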

03.03.mono0a.fst.sh (1/2)

03.03.mono0a.fst.sh (2/2)
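In standard recipes the whole decoding-plus-scoring step is wrapped as below; a minimal sketch, with the directory names again being assumptions rather than what 03.03.mono0a.fst.sh actually contains:

    #!/bin/bash
    # Decode the test set with the monophone graph; WER results
    # end up in exp/mono0a/decode_test/wer_* files.
    steps/decode.sh --nj 4 --cmd run.pl \
      exp/mono0a/graph data/test exp/mono0a/decode_test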

Homework
02.01~03.04.sh

To Do
Copy the data into your own directory by executing:
cp -r /share/
Then execute the following commands:
bash 01.format.data.sh
bash 02.01.extract.feat.sh
bash 02.02.convert.htk.feat.sh
…
Observe the output and write up what you see in your report.
You might want to check the HTK Book for the details of acoustic model training.

Some Helpful References
"使用加權有限狀態轉換器的基於混合詞與次詞以文字及語音指令偵測口語詞彙" (Spoken term detection with text and voice queries based on a hybrid of words and subwords, using weighted finite-state transducers), Chapter 3:
https://www.dropbox.com/s/dsaqh6xa9dp3dzw/wfst_thesis.pdf