專題研究 (Special Topic Research) Week 3: Language Model and Decoding

專題研究 Week 3: Language Model and Decoding. Prof. Lin-Shan Lee; TA: Yang-de Chen, r04942038@ntu.edu.tw

Speech Recognition System (語音辨識系統), built with Kaldi. [Block diagram] Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence. The decoder uses the Acoustic Models (trained from Speech Corpora via Model Training), the Lexicon, and the Language Model (built from Text and a Lexical Knowledge-base / Grammar via language model construction). HW1 and HW2 mark the parts of the system covered by the respective homework assignments; this week (HW2) covers the language model and decoding.

Language Modeling: providing linguistic constraints to help select the correct words. For example: Prob[the computer is listening] > Prob[they come tutor is list sunny]; Prob[電腦聽聲音, "the computer listens to sound"] > Prob[店老天呻吟, a nonsensical but similar-sounding string].
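As a reminder of how such sentence probabilities are computed (this decomposition is standard chain-rule/bigram reasoning, not taken from the slide): under a bigram model the first English example would be scored as

```latex
\[
P(\text{the computer is listening})
  \approx P(\text{the}\mid\langle s\rangle)\,
          P(\text{computer}\mid\text{the})\,
          P(\text{is}\mid\text{computer})\,
          P(\text{listening}\mid\text{is})\,
          P(\langle/s\rangle\mid\text{listening})
\]
```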

Language Model Training 00.train_lm.sh 01.format.sh

Language Model: Training Text (1/2)
train_text=ASTMIC_transcription/train.text
cut -d ' ' -f2- $train_text > exp/lm/LM_train.text
This removes the first column (the utterance IDs in the transcription), keeping only the word sequences.

Language Model: Training Text (2/2)
cut -d ' ' -f2- $train_text > exp/lm/LM_train.text
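As a quick sanity check (not part of the lab scripts), the effect of the cut command can be seen on a made-up transcript line, assuming the usual format of an utterance ID followed by the word sequence:

```bash
# Hypothetical transcript line: utterance ID in the first field, words after it.
echo "F001_utt01 this is an example sentence" | cut -d ' ' -f2-
# Output: this is an example sentence   (the ID column is removed)
```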

Language Model: ngram-count (1/3)
/share/srilm/bin/i686-m64/ngram-count
  -order 2 (you can modify it from 1 to 3)
  -kndiscount (modified Kneser-Ney smoothing)
  -text exp/lm/LM_train.text (your training text file from the previous slides)
  -vocab $lexicon (the lexicon, shown on a later slide)
  -unk (build an open-vocabulary language model)
  -lm $lm_output (your output language model name)
See http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html or run ngram-count -help.
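Putting the listed options together, a complete invocation might look like the sketch below; $lexicon and $lm_output are placeholders set inside 00.train_lm.sh, and the exact values here are illustrative.

```bash
# Illustrative paths; in the lab these variables are set by 00.train_lm.sh.
lexicon=material/lexicon.train.txt
lm_output=exp/lm/bigram.lm.arpa

# -order 2    : bigram LM (can be changed to 1~3)
# -kndiscount : modified Kneser-Ney smoothing
# -text       : training text produced in the previous step
# -vocab      : vocabulary / lexicon file
# -unk        : open-vocabulary LM (out-of-vocabulary words map to <unk>)
# -lm         : output language model (ARPA format)
/share/srilm/bin/i686-m64/ngram-count \
  -order 2 -kndiscount \
  -text exp/lm/LM_train.text \
  -vocab $lexicon -unk \
  -lm $lm_output
```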

Language Model: ngram-count (2/3). Smoothing: many events never occur in the training data, e.g. Prob[Jason immediately stands up] = 0 because Prob[immediately | Jason] = 0. Smoothing assigns some non-zero probability to all events, even those never seen in the training data. See https://class.coursera.org/nlp/lecture, Week 2, Language Modeling.
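One hedged way to see the effect of smoothing, assuming $lm_output points at the LM trained above: compute the perplexity of a sentence that never appears in the training data with SRILM's ngram tool. With Kneser-Ney smoothing the log-probability stays finite; without smoothing the unseen bigrams would receive zero probability.

```bash
# The test sentence is made up; $lm_output is the LM from the previous step.
echo "Jason immediately stands up" > /tmp/unseen.txt
/share/srilm/bin/i686-m64/ngram -lm $lm_output -ppl /tmp/unseen.txt
# Prints the total logprob and perplexity of the sentence under the LM.
```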

Language Model : ngram-count (3/3) Lexicon lexicon=material/lexicon.train.txt

01.format.sh: try replacing the default language model with YOUR language model!
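What 01.format.sh does internally is not spelled out on the slide; in standard Kaldi recipes the ARPA LM is converted into the grammar acceptor G.fst with arpa2fst, roughly as in the sketch below (options and paths are assumptions based on common Kaldi usage, not taken from the lab scripts).

```bash
# Rough sketch of the usual ARPA-to-G.fst conversion in Kaldi recipes.
# $lm_output is the ARPA LM trained above; data/lang/words.txt is the word
# symbol table (path assumed, the lab's directory layout may differ).
arpa2fst --disambig-symbol=#0 \
         --read-symbol-table=data/lang/words.txt \
         $lm_output data/lang/G.fst
```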

Decoding

Decoding: find the most probable word sequence $W = w_1, w_2, \ldots, w_m$ given the acoustic observations $X = x_1, x_2, \ldots, x_n$.

Framework for Speech Recognition: $\hat{W} = \arg\max_W P(W \mid X) = \arg\max_W P(X \mid W)\,P(W)$, where $P(X \mid W)$ is the acoustic model (together with the lexicon) and $P(W)$ is the language model.

Decoding WFST Decoding 04a.01.mono.mkgraph.sh 04a.02.mono.fst.sh 07a.01.tri.mkgraph.sh 07a.02.tri.fst.sh

WFST Decoder: Components. A WFST is like a weighted FSA but with two tapes: input and output. Ex.: input tape "ac" → output tape "xz", cost = 0.5 + 2.5 + 3.5 = 6.5. Ex.: input tape "bc" → output tape "yz", cost = 1.5 + 2.5 + 3.5 = 7.5.
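The transition weights above suggest a tiny three-state transducer; the following is a guess at how it could be written in OpenFst's text format (states, arcs, and the final weight are inferred from the costs on the slide, so treat it as a hypothetical reconstruction).

```bash
# Hypothetical OpenFst text format for the example transducer:
#   state 0 -(a:x / 0.5)-> 1,  state 0 -(b:y / 1.5)-> 1,
#   state 1 -(c:z / 2.5)-> 2,  state 2 is final with weight 3.5.
cat > toy_fst.txt <<'EOF'
0 1 a x 0.5
0 1 b y 1.5
1 2 c z 2.5
2 3.5
EOF
# With matching input/output symbol tables this file could be compiled by
# OpenFst's fstcompile; "ac" then costs 0.5 + 2.5 + 3.5 = 6.5 as on the slide.
```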

WFST Decoder: HCLG = H ∘ C ∘ L ∘ G, where H: HMM structure; C: context-dependent relabeling; L: lexicon; G: language model acceptor.
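Conceptually, the graph-building scripts compose these four transducers from the inside out; a very rough sketch with plain OpenFst tools is given below (the real 04a.01.*/07a.01.* mkgraph scripts in Kaldi additionally perform determinization, minimization, and self-loop handling, and the file names here are assumptions).

```bash
# Rough conceptual sketch of HCLG = H ∘ C ∘ L ∘ G (file names assumed).
fstcompose L.fst G.fst   > LG.fst    # lexicon composed with the grammar/LM
fstcompose C.fst LG.fst  > CLG.fst   # add context-dependency relabeling
fstcompose H.fst CLG.fst > HCLG.fst  # add the HMM structure
```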

WFST Components: [figures of the component FSTs] L (lexicon), H (HMM), and G (language model); the slide asks "Where is C?" (the context-dependency transducer).

Training WFST 04a.01.mono.mkgraph.sh 07a.01.tri.mkgraph.sh

Decoding WFST (1/2) 04a.02.mono.fst.sh 07a.02.tri.fst.sh

Decoding WFST (2/2). During decoding, we need to specify the relative weights of the acoustic model and the language model. Split the corpus into training, development (dev), and test sets: the training set is used to train the acoustic model; the acoustic-model weight is tuned on the dev set (try several values and keep the best); the test set is used only to measure final performance (Word Error Rate, WER).
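A hedged sketch of the tuning loop described above, using the generic Kaldi decoding script rather than the lab's own 04a.02/07a.02 scripts; the directory names and weight values are purely illustrative.

```bash
# Try several acoustic-model weights on the dev set (values illustrative),
# then pick the one with the lowest WER and report on the test set only once.
for acwt in 0.05 0.0667 0.0833 0.1 0.125; do
  steps/decode.sh --acwt $acwt \
    exp/mono/graph data/dev exp/mono/decode_dev_acwt${acwt}
done
# Compare WER across exp/mono/decode_dev_acwt*/ and keep the best acwt.
```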

Decoding Viterbi Decoding 04b.mono.viterbi.sh 07b.tri.viterbi.sh

Viterbi Decoding. Viterbi algorithm (for the acoustic model): given the acoustic model and the observations, find the best state sequence. Viterbi algorithm (for decoding): given the acoustic model, lexicon, and language model, plus the observations, find the best word sequence: $\hat{W} = \arg\max_W P(W \mid X) = \arg\max_W P(X \mid W)\,P(W) = \arg\max_W P(W) \max_Q P(X \mid Q)\, P(Q \mid W)$, where $Q$ ranges over HMM state sequences.

Viterbi Decoding 04b.mono.viterbi.sh 07b.tri.viterbi.sh

Homework: language model training, WFST decoding, and Viterbi decoding.
Scripts: 00.train_lm.sh, 01.format.sh, 04a.01.mono.mkgraph.sh, 04a.02.mono.fst.sh, 07a.01.tri.mkgraph.sh, 07a.02.tri.fst.sh, 04b.mono.viterbi.sh, 07b.tri.viterbi.sh

ToDo
Step 1: finish the code in 00.train_lm.sh and get your LM.
Step 2: use your LM in 01.format.sh.
Step 3.1: run 04a.01.mono.mkgraph.sh and 04a.02.mono.fst.sh (WFST decoding for mono-phone).
Step 3.2: run 07a.01.tri.mkgraph.sh and 07a.02.tri.fst.sh (WFST decoding for tri-phone).
Step 4.1: run 04b.mono.viterbi.sh (Viterbi decoding for mono-phone).
Step 4.2: run 07b.tri.viterbi.sh (Viterbi decoding for tri-phone).

ToDo (optional)
Train the LM with YOUR own training text or even YOUR own lexicon.
Train the LM (ngram-count) with different arguments: http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html
Watch the online course on Coursera (Week 2, Language Modeling): https://class.coursera.org/nlp/lecture
Try different AM/LM combinations and report the recognition results.

Questions ?