 Feature extractor: converts the audio signal into Mel-Frequency Cepstral Coefficient (MFCC) feature vectors
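Sphinx-4 performs this step internally in its Java front end. As a rough illustration of what a feature extractor computes, here is a minimal sketch in Python using librosa (an assumption, not part of Sphinx-4):

    import librosa

    # Minimal MFCC-extraction sketch (librosa assumed; Sphinx-4's own
    # front end does the equivalent in Java).
    signal, sr = librosa.load("input.wav", sr=16000)         # load and resample to 16 kHz
    mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                                 n_fft=400, hop_length=160)  # 25 ms window, 10 ms hop
    print(mfccs.shape)  # (13, number_of_frames)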

 Acoustic Observations
 Hidden States
 Acoustic Observation likelihoods
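As a rough illustration (toy numbers, not Sphinx-4 code), the three ingredients of a phone HMM can be written down directly:

    import math

    # Illustrative 3-state left-to-right phone HMM (toy numbers).
    states = ["s1", "s2", "s3"]                     # hidden states
    trans = {("s1", "s1"): 0.6, ("s1", "s2"): 0.4,  # transition probabilities
             ("s2", "s2"): 0.7, ("s2", "s3"): 0.3,
             ("s3", "s3"): 1.0}
    means = {"s1": 0.0, "s2": 1.0, "s3": 2.0}       # 1-D stand-ins for MFCC vectors

    def emission(state, x, var=1.0):
        # Acoustic observation likelihood P(x | state); real systems use
        # mixtures of Gaussians over multi-dimensional MFCC features.
        return math.exp(-(x - means[state]) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)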

[Slide figure: the example word “Six”]

 Constructs the HMMs for units of speech
 Produces observation likelihoods
 Sampling rate is critical! (WSJ vs. WSJ_8k)
 Available models include TIDIGITS, RM1, AN4, HUB4
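A hedged sketch of the sampling-rate point (librosa assumed): the input audio must match the rate the acoustic model was trained at, e.g. 16 kHz for WSJ vs. 8 kHz for WSJ_8k.

    import librosa

    # Match the input's sample rate to the acoustic model's training rate.
    signal, sr = librosa.load("input.wav", sr=None)  # keep the file's native rate
    if sr != 16000:                                  # WSJ models expect 16 kHz
        signal = librosa.resample(signal, orig_sr=sr, target_sr=16000)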

 Word likelihoods

 ARPA format example:
 1-grams: board, bottom, bunch
 2-grams: as the, at all, at the
 3-grams: in the lowest, in the middle, in the on
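For context, entries in a real ARPA file also carry base-10 log probabilities and back-off weights; a sketch with illustrative numbers:

    \data\
    ngram 1=3
    ngram 2=3

    \1-grams:
    -1.2041 board  -0.3010
    -1.3010 bottom -0.3010
    -1.3979 bunch  -0.3010

    \2-grams:
    -0.6021 as the
    -0.9031 at all
    -0.6990 at the

    \end\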

public <basicCmd> = <startPolite> <command> <endPolite>;
public <startPolite> = (please | kindly | could you) *;
public <endPolite> = [ please | thanks | thank you ];
<command> = <action> <object>;
<action> = (open | close | delete | move);
<object> = [the | a] (window | file | menu);
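A complete JSGF file also begins with a format header and a grammar declaration (the grammar name below is assumed):

    #JSGF V1.0;
    grammar commands;
    // ...rules as above...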

 Maps words to phoneme sequences

 Example from cmudict.06d:
 POULTICE P OW L T AH S
 POULTICES P OW L T AH S IH Z
 POULTON P AW L T AH N
 POULTRY P OW L T R IY
 POUNCE P AW N S
 POUNCED P AW N S T
 POUNCEY P AW N S IY
 POUNCING P AW N S IH NG
 POUNCY P UW NG K IY
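A minimal sketch of loading such a dictionary in Python (file name taken from the slide; comment handling assumed):

    # Parse a CMUdict-style file into {word: [phones]}.
    pronunciations = {}
    with open("cmudict.06d") as f:
        for line in f:
            if line.startswith(";;;"):  # skip comment lines
                continue
            parts = line.split()
            if parts:
                pronunciations[parts[0]] = parts[1:]

    print(pronunciations["POULTRY"])    # ['P', 'OW', 'L', 'T', 'R', 'IY']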

 Constructs the search graph of HMMs from:  Acoustic model  Statistical Language model ~or~  Grammar  Dictionary

 Can be statically or dynamically constructed

 FlatLinguist
 DynamicFlatLinguist
 LexTreeLinguist

 Maps feature vectors to search graph

 Searches the graph for the “best fit”

 P(sequence of feature vectors | word/phone)  a.k.a. P(O|W): “how likely is it that this input was generated by that word?”
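In Bayesian terms, the decoder searches for the word sequence W* = argmax_W P(W | O) = argmax_W P(O | W) * P(W), where P(O | W) comes from the acoustic model and P(W) from the language model.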

Possible alignments of the phones of “five” (f ay v) to a sequence of ten frames:
 f ay ay ay ay v v v v v
 f f ay ay ay ay v v v v
 f f f ay ay ay ay v v v
 f f f f ay ay ay ay v v
 f f f f ay ay ay ay ay v
 f f f f f ay ay ay ay v
 f f f f f f ay ay ay v
 …

[Figure: search trellis over time with observations O1, O2, O3]

 Uses algorithms to weed out low scoring paths during decoding

 Words!

 Word Error Rate (WER) is the most common metric
 Measures the number of edits (insertions, deletions, substitutions) needed to transform the recognized sentence into the reference sentence

 Reference: “This is a reference sentence.”
 Result: “This is neuroscience.”
 Requires 2 deletions and 1 substitution (D S D)
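A minimal sketch of this computation in Python (standard Levenshtein distance over words, not Sphinx-4 code):

    def word_error_rate(reference, hypothesis):
        # Edit distance over words: d[i][j] = edits to turn the first i
        # reference words into the first j hypothesis words.
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                      # i deletions
        for j in range(len(hyp) + 1):
            d[0][j] = j                      # j insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return d[len(ref)][len(hyp)] / len(ref)

    # 3 edits (2 deletions, 1 substitution) over 5 reference words = 0.6
    print(word_error_rate("this is a reference sentence", "this is neuroscience"))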

 Limited Vocab Multi-Speaker

 Extensive Vocab Single Speaker

*If the audio input is noisy, multiply the expected error rate by 2

Other variables:
 -Continuous vs. Isolated
 -Conversational vs. Read
 -Dialect

 Questions?

[Figure: Viterbi trellis over observations O1, O2, O3. Moving from state f to state ay at time 2 scores P(ay|f) * P(O2|ay), while staying in f scores P(f|f) * P(O2|f); a path's score is the running product, e.g. P(O1) * P(ay|f) * P(O2|ay).]

 Common Sphinx4 FAQs can be found online: c/Sphinx4-faq.html
 What follows are some less-frequently-asked questions

 Q. Is a search graph created for every recognition result, or one for the whole recognition app?
 A. This depends on which Linguist is used. The FlatLinguist generates the entire search graph up front and holds it in memory, so it is only useful for small-vocabulary recognition tasks. The LexTreeLinguist generates search states dynamically, allowing it to handle very large vocabularies.

 Q. How does the Viterbi algorithm save computation over exhaustive search?
 A. The Viterbi algorithm saves memory and computation by reusing solutions to subproblems already solved within the larger solution. Probability calculations that repeat across different paths through the search graph are computed only once.
 Viterbi cost: on the order of n^2 to n^3
 Exhaustive search cost: on the order of 2^n to 3^n
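A minimal textbook Viterbi sketch (not Sphinx-4's implementation) showing the reuse: each trellis cell V[t][s] is computed once and shared by every path that passes through it.

    def viterbi(observations, states, start_p, trans_p, emit_p):
        # V[t][s]: best score of any path ending in state s after t+1 observations.
        V = [{s: start_p[s] * emit_p(s, observations[0]) for s in states}]
        for t in range(1, len(observations)):
            V.append({})
            for s in states:
                best_prev = max(V[t - 1][p] * trans_p.get((p, s), 0.0) for p in states)
                V[t][s] = best_prev * emit_p(s, observations[t])
        final = V[-1]
        best_state = max(final, key=final.get)
        return best_state, final[best_state]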

 Q. Does the linguist use a grammar to construct the search graph if it is available?  A. Yes, a grammar graph is created

 Q. What algorithm does the Pruner use?  A. Sphinx4 uses absolute and relative beam pruning

 Absolute Beam Width – number of active search paths
 Relative Beam Width – probability threshold
 Word Insertion Probability – word-break likelihood
 Language Weight – boosts language model scores
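A sketch of how the two beam settings interact during pruning (parameter names are illustrative, not Sphinx-4 API):

    def prune(scored_paths, absolute_beam_width, relative_beam_width):
        # scored_paths: list of (path, probability) pairs.
        if not scored_paths:
            return []
        ranked = sorted(scored_paths, key=lambda p: p[1], reverse=True)
        ranked = ranked[:absolute_beam_width]           # keep at most N paths
        threshold = ranked[0][1] * relative_beam_width  # fraction of the best score
        return [p for p in ranked if p[1] >= threshold]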

 Silence Insertion Probability – likelihood of inserting silence
 Filler Insertion Probability – likelihood of inserting filler words

 To call a Java example from Python:

    import subprocess

    # Launch the Sphinx-4 transcriber JAR with a 1000 MB max heap.
    subprocess.call(["java", "-mx1000m", "-jar",
                     "/Users/Username/sphinx4/bin/Transcriber.jar"])
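On current Python versions the same call is usually written subprocess.run(["java", ...], check=True), which raises CalledProcessError if the JVM exits with a nonzero status.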

 Speech and Language Processing, 2nd Ed., Daniel Jurafsky and James Martin, Pearson, 2009
 Artificial Intelligence, 6th Ed., George Luger, Addison-Wesley, 2009
 Sphinx Whitepaper
 Sphinx Forum