Acoustic / Lexical Model Derk Geene

Speech recognition  P(words|signal)= P(signal|words) P(words) / P(signal)  P(signal|words): Acoustic model  P(words): Language model  Idea: Maximize P(signal|words) P(words)  Today: Acoustic model
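The decision rule on this slide can be sketched in a few lines. The score tables below are made-up numbers for illustration only; a real recognizer scores word sequences with trained acoustic and language models.

```python
import math

# Bayes decision rule for speech recognition: choose the word sequence W
# that maximizes P(signal|W) * P(W). P(signal) is the same for every W,
# so it can be dropped from the argmax.

acoustic = {            # P(signal | words), hypothetical values
    "recognize speech": 1e-4,
    "wreck a nice beach": 3e-4,
}
language = {            # P(words), hypothetical values
    "recognize speech": 1e-5,
    "wreck a nice beach": 1e-8,
}

def decode(hypotheses):
    # Work in log space to avoid numerical underflow on long utterances.
    return max(hypotheses,
               key=lambda w: math.log(acoustic[w]) + math.log(language[w]))

best = decode(list(acoustic.keys()))
print(best)  # the language model outweighs the acoustic preference
```

Note that even though the acoustic model prefers "wreck a nice beach", the language model prior tips the product toward "recognize speech".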

Variability  Sources of variation: speaker, pronunciation, environment, context.  A static acoustic model will not work in real applications.  Dynamically adapt P(signal|words) while using the system.

Measuring errors (1)  Test set: 500 sentences of 6–10 words each, from 5 to 10 different speakers.  10% relative error reduction.  Training set / development set: decide optimal parameter settings on the development set first.
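"Relative" error reduction, as used on this slide, is measured against the old error rate, not in absolute percentage points. A one-line sketch:

```python
# A drop from 10% WER to 9% WER is 1 point absolute, but a
# 10% *relative* error reduction: (10 - 9) / 10 = 0.10.

def relative_error_reduction(old_wer, new_wer):
    return (old_wer - new_wer) / old_wer

print(relative_error_reduction(10.0, 9.0))  # 0.1
```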

Measuring errors (2)  Word recognition errors: substitution, deletion, insertion.
Correct: Did mob mission area of the Copeland ever go to m4 in nineteen eighty one?
Recognized: Did mob mission area ** the copy land ever go to m4 in nineteen east one?

Measuring errors (3)  Word error rate:
Word error rate = 100% × (Subs + Dels + Ins) / (#words in correct sentence)
Correct: The effect is clear
Recognised: Effect is not clear
Error rate, aligning word by word: 75%
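WER is normally computed from a minimum-edit-distance alignment of the two word sequences, which counts fewer errors than the naive word-by-word matching above (here 50% instead of 75%). A self-contained sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = 100% * (Subs + Dels + Ins) / #words in the reference,
    with error counts from a minimum-edit-distance word alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution / match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

# Delete "The", insert "not": 2 errors over 4 reference words.
print(word_error_rate("the effect is clear", "effect is not clear"))  # 50.0
```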

Units of speech (1)  Modeling is language dependent.  A good modeling unit is: accurate, trainable, generalizable.

Units of speech (2)  Whole-word models: only suitable for small-vocabulary recognition.  Phone models: suitable for large-vocabulary recognition. Problem: they over-generalize → less accurate.  Syllable models.

Context dependency (1)  Recognition accuracy can be improved by using context-dependent parameters.  Important in fast / spontaneous speech.  Example: the phoneme /ee/

 Peat  Wheel

Context dependency (2)  Triphone model: a phonetic model that takes into consideration both the left and the right neighbouring phones.  If two phones have the same identity but different left or right contexts, they are considered different triphones.  Interword context-dependent phones.  Place in the word: beginning, middle, end.
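Expanding a phone sequence into triphones can be sketched as follows. The `l-p+r` notation and the `sil` word-boundary marker are one common convention (e.g. in HTK-style systems), not necessarily the one used in these slides:

```python
def to_triphones(phones):
    """Expand a phone sequence into left-context/phone/right-context
    triphones, written l-p+r. Boundary contexts are marked 'sil'."""
    out = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else "sil"
        right = phones[i + 1] if i < len(phones) - 1 else "sil"
        out.append(f"{left}-{p}+{right}")
    return out

# "peat" = /p iy t/: its vowel becomes the triphone p-iy+t, a different
# unit from the same vowel in "wheel" (w-iy+l), as on the slide above.
print(to_triphones(["p", "iy", "t"]))
```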

Context dependency (3)  Stress: longer duration, higher pitch, more intensity.  Word-level stress: IMport – imPORT; Italy – Italian.  Sentence-level stress: I did have dinner.

 Radio

Context dependency (4)  Very many triphones ⇒ too little training data per triphone.  Many phones have the same effects on their neighbours: /b/ & /p/ are labials (pronounced using the lips); /r/ & /w/ are liquids.  Clustered acoustic-phonetic units, based on questions such as: Is the left-context phone a fricative? Is the right-context phone a front vowel?
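The clustering idea can be sketched as follows: triphones whose contexts answer the same phonetic questions share parameters. The question sets below are small illustrative fragments (ARPAbet-style symbols), not a complete phone inventory:

```python
# Each question maps to the set of phones that answer it "yes".
QUESTIONS = {
    "left_is_fricative": {"f", "v", "s", "z", "sh", "zh", "th", "dh"},
    "right_is_front_vowel": {"iy", "ih", "eh", "ae"},
}

def answer_questions(left, right):
    # The tuple of yes/no answers acts as a (very coarse) cluster key.
    return (left in QUESTIONS["left_is_fricative"],
            right in QUESTIONS["right_is_front_vowel"])

# s-t+iy and z-t+ih land in the same cluster: both have a fricative on
# the left and a front vowel on the right.
print(answer_questions("s", "iy") == answer_questions("z", "ih"))  # True
```

Real systems grow a decision tree over hundreds of such questions, splitting states only while enough training data remains in each leaf.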

Acoustic model  After feature extraction, we have a sequence of feature vectors, such as the MFCC vector, as input data. Feature stream Phonemes / units Words Segmentation and labeling Lexical access problem

Acoustic model  Signal  Phonemes  Problem: phonemes can be pronounced differently Speaker differences Speaker rate Microphone

Acoustic model  Phonemes  Words  The three major ways to do this: Vector Quantization Hidden Markov Models Neural Networks

Acoustic model  Problem: Multiple pronunciations: owt aa ey tow t ax m aa ey tow 0,5 0,8 m Dialect variation Coarticulation 0,5 0,2

The End