專題研究 (Special Research Project) WEEK 3: LANGUAGE MODEL AND DECODING. Prof. Lin-Shan Lee. TAs: Hung-Tsung Lu, Cheng-Kuan Wei.

語音辨識系統 (Speech Recognition System) overview diagram: Input Speech goes through Front-end Signal Processing to produce Feature Vectors, which the Linguistic Decoding and Search Algorithm turns into the Output Sentence. The decoder draws on Acoustic Models (built by Acoustic Model Training from Speech Corpora), a Lexicon (from the Lexical Knowledge-base), and a Language Model (built by Language Model Construction from Text Corpora / Grammar). We use Kaldi as the tool.

Language Modeling: providing linguistic constraints to help the selection of correct words.
Prob [the computer is listening] > Prob [they come tutor is list sunny]
Prob [電腦聽聲音] > Prob [店老天呻吟]
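For reference, these sentence probabilities come from the chain rule with an n-gram history approximation; with the bigram model trained below (ngram-count -order 2), the factorization is, in standard notation (a sketch, not taken from the slides):

    P(w_1, ..., w_N) ≈ Π_i P(w_i | w_{i-1})        (bigram approximation)
    e.g.  P(the computer is listening)
          ≈ P(the|<s>) · P(computer|the) · P(is|computer) · P(listening|is) · P(</s>|listening)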

Language Model Training: 00.train_lm.sh, 01.format.sh

Language Model: Training Text (1/2)
train_text=ASTMIC_transcription/train.text
cut -d ' ' -f 1 --complement $train_text > /exp/lm/LM_train.text
(This removes the first column of each line.)

Language Model: Training Text (2/2)
cut -d ' ' -f 1 --complement $train_text > /exp/lm/LM_train.text
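To make the effect of this preprocessing concrete, here is a minimal sketch with a made-up transcript line (the real format of train.text may differ slightly): each line starts with an utterance ID, and cut strips that first field so only the word sequence remains for LM training.

    # hypothetical line in $train_text:   ASTMIC_0001 今天 天氣 很 好
    # after removing the first field:     今天 天氣 很 好
    cut -d ' ' -f 1 --complement $train_text > /exp/lm/LM_train.text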

Language Model: ngram-count (1/3)
/share/srilm/bin/i686-m64/ngram-count
-order 2 (you can set it from 1 to 3)
-kndiscount (modified Kneser-Ney smoothing)
-text /exp/lm/LM_train.text (the training text produced above)
-vocab $lexicon (the lexicon, shown two slides below)
-unk (build an open-vocabulary language model)
-lm $lm_output (the output file name for your language model)
See the SRILM manual page ngram-count.1.html for the full list of options.
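Putting these options together, the complete command looks roughly as follows; the paths and variables are the ones named on the slides, and the value of $lm_output is a hypothetical file name you would choose yourself:

    lexicon=material/lexicon.train.txt   # vocabulary/lexicon file (see the Lexicon slide below)
    lm_output=/exp/lm/bigram.lm          # hypothetical output file name for the ARPA LM
    /share/srilm/bin/i686-m64/ngram-count \
      -order 2 \
      -kndiscount \
      -text /exp/lm/LM_train.text \
      -vocab $lexicon \
      -unk \
      -lm $lm_output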

Language Model: ngram-count (2/3)
Smoothing: many events never occur in the training data, e.g. Prob [Jason immediately stands up] = 0 because Prob [immediately | Jason] = 0.
Smoothing assigns some non-zero probability to every event, even those never seen in the training data.
See also Week 2 – Language Modeling.
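One quick way to see why smoothing matters is to evaluate the model on held-out sentences: without smoothing, any sentence containing an unseen n-gram gets probability zero (infinite perplexity). A sketch using SRILM's ngram tool, assuming a hypothetical held-out text file heldout.txt:

    # report log-probability and perplexity of the smoothed bigram LM on held-out text
    # (heldout.txt is an assumed file of sentences not used for training)
    /share/srilm/bin/i686-m64/ngram -order 2 -lm $lm_output -ppl heldout.txt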

Language Model: ngram-count (3/3)
Lexicon: lexicon=material/lexicon.train.txt

01.format.sh: try replacing the default language model with YOUR language model!
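01.format.sh turns the ARPA-format LM into the grammar WFST G.fst that the decoding graph needs. A minimal sketch of the usual Kaldi conversion, assuming a standard lang directory containing words.txt; the actual paths and extra steps inside 01.format.sh may differ:

    # convert the ARPA LM into G.fst (paths here are assumptions, not the lab's exact ones)
    arpa2fst --disambig-symbol=#0 \
             --read-symbol-table=data/lang/words.txt \
             $lm_output data/lang/G.fst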

Decoding
WFST decoding: 04a.01.mono.mkgraph.sh, 04a.02.mono.fst.sh, 07a.01.tri.mkgraph.sh, 07a.02.tri.fst.sh
Viterbi decoding: 04b.mono.viterbi.sh, 07b.tri.viterbi.sh

WFST: Introduction (1/3)
FSA (or FSM): finite-state automaton / finite-state machine.
An FSA "accepts" a set of strings; view an FSA as a representation of a possibly infinite set of strings.
In the figures, the start state is in bold and final/accepting states have an extra circle.
This example represents the infinite set {ab, aab, aaab, ...}.

WFST: Introduction (2/3)
Weighted FSA: like a normal FSA but with costs on the arcs and on the final states.
Note: the cost comes after "/"; for a final state, "2/1" means final cost 1 on state 2.
This example maps "ab" to a total cost of 3 (the sum of the arc costs and the final cost shown in the figure).

WFST: Introduction (3/3)
WFST: like a weighted FSA but with two tapes, input and output.
Example: input tape "ac", output tape "xz", total cost 6.5 (the sum of the arc costs and the final cost in the figure).
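In Kaldi these machines are handled with OpenFst, which accepts a simple text format: one arc per line as "source-state dest-state input-label output-label [cost]", plus one line per final state giving its final cost. The toy transducer below is made up for illustration (it is not the machine drawn on the slides); it maps "a b" to "x y" with total cost 1.5 + 2.0 + 1.0 = 4.5:

    # symbol table mapping labels to integer IDs (0 is reserved for epsilon)
    cat > syms.txt <<EOF
    <eps> 0
    a 1
    b 2
    x 3
    y 4
    EOF
    # arcs: src dst ilabel olabel cost ; last line: final state and its final cost
    cat > toy.txt <<EOF
    0 1 a x 1.5
    1 2 b y 2.0
    2 1.0
    EOF
    fstcompile --isymbols=syms.txt --osymbols=syms.txt toy.txt toy.fst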

WFST Composition. Notation: C = A ∘ B means C is A composed with B.
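With OpenFst, composition is a single command; a sketch assuming A.fst and B.fst have already been compiled with compatible symbol tables (the file names are placeholders):

    # C = A ∘ B : output labels of A are matched against input labels of B
    fstcompose A.fst B.fst C.fst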

WFST Components: HCLG = H ∘ C ∘ L ∘ G
H: HMM structure
C: context-dependency relabeling
L: lexicon
G: language model acceptor
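Conceptually the graph is built by composing from the right (G first), as sketched below; in practice Kaldi's graph-building script also interleaves determinization, minimization, and disambiguation-symbol handling, so treat this as the idea only, not the real recipe:

    # conceptual only: build HCLG by repeated composition
    fstcompose L.fst G.fst    LG.fst
    fstcompose C.fst LG.fst   CLG.fst
    fstcompose H.fst CLG.fst  HCLG.fst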

Framework for Speech Recognition (figure).

WFST Components (example figures for L (lexicon), H (HMM), and G (language model)). Where is C (the context-dependency transducer)?

Training WFST: 04a.01.mono.mkgraph.sh, 07a.01.tri.mkgraph.sh
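These two scripts compile the decoding graph HCLG.fst for the mono-phone and tri-phone systems respectively. They wrap Kaldi's standard graph-building utility; a typical invocation looks like the following (the directory names are assumptions, not necessarily the lab's):

    # build exp/mono/graph/HCLG.fst from the lang directory and the trained acoustic model
    utils/mkgraph.sh data/lang exp/mono exp/mono/graph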

Decoding WFST (1/3)
From HCLG we have the relationship from states to words.
We need another WFST, U, built from the utterance to be recognized.
Compose U with HCLG, i.e. S = U ∘ HCLG.
Searching for the best path(s) in S gives the recognition result.

Decoding WFST (2/3): 04a.02.mono.fst.sh, 07a.02.tri.fst.sh
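These scripts run the WFST decoder over the test features using the graph built above. For a GMM system the underlying Kaldi command has roughly this shape; the option values and file names here are illustrative placeholders, not the exact ones used by the scripts:

    # decode with the compiled graph and write lattices for later scoring
    gmm-latgen-faster --beam=13.0 --lattice-beam=6.0 --acoustic-scale=0.1 \
      --word-symbol-table=exp/mono/graph/words.txt \
      exp/mono/final.mdl exp/mono/graph/HCLG.fst \
      "ark:feats.ark" "ark:|gzip -c > lat.1.gz"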

Decoding WFST (3/3)
During decoding we need to specify the relative weights of the acoustic model and the language model.
Split the corpus into training, development (dev), and test sets:
- the training set is used to train the acoustic model;
- try all candidate acoustic/language model weights on the dev set and keep the best one;
- the test set is used to measure the final performance (word error rate, WER).
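In Kaldi-style scoring this tuning is usually done by rescoring the decoded lattices with a range of language-model weights on the dev set and keeping the weight with the lowest WER. A rough sketch with standard Kaldi tools; file names are placeholders and the lab's own scoring script may differ in details:

    # try several LM weights on the dev lattices and report WER for each
    # (ref.txt is assumed to hold reference transcripts in Kaldi text-archive format)
    for lmwt in 9 11 13 15; do
      lattice-scale --inv-acoustic-scale=$lmwt "ark:gunzip -c lat.1.gz |" ark:- | \
        lattice-best-path --word-symbol-table=words.txt ark:- ark,t:hyp.$lmwt.txt
      compute-wer --mode=present ark:ref.txt ark:hyp.$lmwt.txt
    done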

Viterbi Decoding
Viterbi algorithm: given the acoustic model and the observations, find the best state sequence.
Best state sequence → phone sequence (AM) → word sequence (lexicon) → best word sequence (LM).
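For reference, the core of the Viterbi algorithm is the standard dynamic-programming recursion over HMM states (textbook form, cf. 數位語音處理概論 4.0); this is the general formula, not something specific to these scripts:

    delta_t(j) = max_i [ delta_{t-1}(i) * a_ij ] * b_j(o_t)
    psi_t(j)   = argmax_i [ delta_{t-1}(i) * a_ij ]
    (the best state sequence is recovered by backtracking psi from the last frame)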

Viterbi Decoding: 04b.mono.viterbi.sh, 07b.tri.viterbi.sh

Homework
Language model training, WFST decoding, Viterbi decoding:
00.train_lm.sh, 01.format.sh, 04a.01.mono.mkgraph.sh, 04a.02.mono.fst.sh, 07a.01.tri.mkgraph.sh, 07a.02.tri.fst.sh, 04b.mono.viterbi.sh, 07b.tri.viterbi.sh

ToDo
Step 1. Finish the code in 00.train_lm.sh and get your LM.
Step 2. Use your LM in 01.format.sh.
Step 3.1. Run 04a.01.mono.mkgraph.sh and 04a.02.mono.fst.sh (WFST decoding for mono-phone).
Step 3.2. Run 07a.01.tri.mkgraph.sh and 07a.02.tri.fst.sh (WFST decoding for tri-phone).
Step 4.1. Run 04b.mono.viterbi.sh (Viterbi decoding for mono-phone).
Step 4.2. Run 07b.tri.viterbi.sh (Viterbi decoding for tri-phone).

ToDo (Optional)
Train the LM with YOUR own training text, or even YOUR own lexicon.
Train the LM with different ngram-count arguments (see the SRILM ngram-count manual page).
Watch the online course on Coursera (Week 2 – Language Modeling).
Read 數位語音處理概論 (Introduction to Digital Speech Processing): 4.0 (Viterbi), 6.0 (Language Model), 9.0 (WFST).
Try different AM/LM combinations and report the recognition results.

Questions?