HUMAN LANGUAGE TECHNOLOGY: From Bits to Blogs


Joseph Picone, PhD
Professor and Chair, Department of Electrical and Computer Engineering
Temple University
URL:

Speech Recognition Architectures

Core components of modern speech recognition systems:
Transduction: conversion of an electrical or acoustic signal to a digital signal;
Feature Extraction: conversion of samples to vectors containing the salient information;
Acoustic Model: statistical representation of basic sound patterns (e.g., hidden Markov models);
Language Model: statistical model of common words or phrases (e.g., N-grams);
Search: finding the best hypothesis for the data using an optimization procedure.

[Figure: block diagram with Input Speech, Acoustic Front-end, Acoustic Models P(A|W), Language Model P(W), Search, and Recognized Utterance]

Statistical Approach: Noisy Communication Channel Model

Speech Recognition Overview

Based on a noisy communication channel model in which the intended message is corrupted by a sequence of noisy models.

The Bayesian approach is most common: P(W|A) = P(A|W) P(W) / P(A)
Objective: minimize word error rate by maximizing P(W|A)
P(A|W): Acoustic Model
P(W): Language Model
P(A): Evidence (ignored)

Acoustic models use hidden Markov models with Gaussian mixtures.
P(W) is estimated using probabilistic N-gram models.
Parameters can be trained using generative (ML) or discriminative (e.g., MMIE, MCE, or MPE) approaches.

[Figure: block diagram with Input Speech, Acoustic Front-end, Acoustic Models P(A|W), Language Model P(W), Search, and Recognized Utterance; callout: Feature Extraction]
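Because the evidence P(A) does not depend on the word sequence W, maximizing P(W|A) amounts to maximizing P(A|W) P(W), or equivalently the sum of the two log-probabilities. The following is a minimal sketch of that decision rule, not the presentation's own implementation. The function name `best_hypothesis` and all scores are hypothetical, made-up values chosen only for illustration.

```python
import math


def best_hypothesis(acoustic_logprob, lm_logprob):
    """Return the word sequence W maximizing log P(A|W) + log P(W).

    P(A) is identical for every candidate W, so it is left out of the comparison.
    """
    scores = {
        w: acoustic_logprob[w] + lm_logprob[w]
        for w in acoustic_logprob
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical scores for one utterance (illustrative numbers, not model output).
acoustic_logprob = {
    "recognize speech": math.log(0.0020),    # log P(A|W)
    "wreck a nice beach": math.log(0.0025),
}
lm_logprob = {
    "recognize speech": math.log(0.010),     # log P(W)
    "wreck a nice beach": math.log(0.001),
}

best, scores = best_hypothesis(acoustic_logprob, lm_logprob)
print(best)
```

In this toy case the acoustic model slightly prefers the second hypothesis, but the language model prior tips the combined score toward the first.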

Features: Convert a Signal to a Sequence of Vectors
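As a rough illustration of turning a signal into a sequence of vectors, the sketch below splits samples into overlapping frames and computes two simple values per frame: log energy and zero-crossing rate. This is an assumption made for illustration only. The presentation does not say which features it uses, and practical front-ends typically rely on spectral features. The function name `frame_features` and the default frame sizes are hypothetical.

```python
import math


def frame_features(samples, frame_len=400, hop=160):
    """Split samples into overlapping frames; return [log_energy, zcr] per frame.

    The defaults assume 25 ms frames every 10 ms at 16 kHz (an assumption).
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)  # small floor avoids log(0) on silence
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_len - 1)
        features.append([log_energy, zcr])
    return features


# Tiny synthetic signal so the result is easy to check by hand.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
vectors = frame_features(signal, frame_len=4, hop=2)
print(len(vectors), "frames, each a vector of", len(vectors[0]), "values")
```

Each frame becomes one fixed-length vector, so an utterance of any duration maps to a sequence that the acoustic model can score frame by frame.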

Speech Recognition Overview (repeat of the overview slide, with the callout "Research Focus")

Acoustic Models: Capture the Time-Frequency Evolution

Language Modeling: Word Prediction
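Word prediction with an N-gram model can be sketched with the simplest useful case, a bigram model estimated by maximum likelihood from counts. This is an illustrative sketch under stated assumptions: the corpus is made up, the names `train_bigram` and `predict_next` are hypothetical, and no smoothing is applied, which a practical language model would need.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Estimate P(next | prev) by maximum likelihood from tokenized sentences."""
    counts = defaultdict(Counter)
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]  # sentence start and end markers
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(nxts.values()) for w, c in nxts.items()}
        for prev, nxts in counts.items()
    }


def predict_next(model, prev):
    """Return the most probable next word after prev, or None if prev is unseen."""
    if prev not in model:
        return None
    return max(model[prev], key=model[prev].get)


# Made-up training corpus for illustration.
corpus = [
    ["speech", "recognition", "is", "hard"],
    ["speech", "recognition", "is", "fun"],
    ["speech", "synthesis", "is", "fun"],
]
model = train_bigram(corpus)
print(predict_next(model, "speech"))
```

The unseen-word case returns `None` here. A smoothed model would instead back off to a lower-order estimate.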

Search: Finding the Best Path
breadth-first
time synchronous
beam pruning
supervision
word prediction
natural language
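A time-synchronous search with beam pruning can be sketched as a Viterbi pass over a small hidden Markov model. All hypotheses advance one frame at a time, and any hypothesis scoring too far below the current best is dropped. This is a toy sketch, not the search used in the presentation. The two-state model, its probabilities, and the names `viterbi_beam` and `prune` are hypothetical.

```python
import math


def prune(active, beam):
    """Keep only hypotheses within `beam` (log units) of the best score."""
    if not active:
        return active
    best = max(score for score, _ in active.values())
    return {s: v for s, v in active.items() if v[0] >= best - beam}


def viterbi_beam(observations, states, log_start, log_trans, log_emit, beam=10.0):
    """Time-synchronous Viterbi search; returns (best_state_path, best_log_score)."""
    # active maps state -> (log score, best path ending in that state)
    active = {
        s: (log_start[s] + log_emit[s][observations[0]], [s]) for s in states
    }
    active = prune(active, beam)
    for obs in observations[1:]:
        new_active = {}
        for prev, (score, path) in active.items():
            for s in states:
                cand = score + log_trans[prev][s] + log_emit[s][obs]
                if s not in new_active or cand > new_active[s][0]:
                    new_active[s] = (cand, path + [s])
        active = prune(new_active, beam)
    if not active:
        raise ValueError("all hypotheses were pruned")
    best_state = max(active, key=lambda s: active[s][0])
    score, path = active[best_state]
    return path, score


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical two-state model: silence vs. speech, observing low/high energy.
states = ["sil", "speech"]
log_start = logs({"sil": 0.8, "speech": 0.2})
log_trans = {
    "sil": logs({"sil": 0.7, "speech": 0.3}),
    "speech": logs({"sil": 0.2, "speech": 0.8}),
}
log_emit = {
    "sil": logs({"low": 0.9, "high": 0.1}),
    "speech": logs({"low": 0.2, "high": 0.8}),
}
observations = ["low", "high", "high", "low"]

path, score = viterbi_beam(observations, states, log_start, log_trans, log_emit)
narrow_path, _ = viterbi_beam(
    observations, states, log_start, log_trans, log_emit, beam=1.0
)
print(path)
```

A narrower beam reduces the number of active hypotheses per frame at the risk of pruning the path that would have won. In this toy example both beam widths recover the same path.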

Speech Recognition is Information Extraction

Traditional Output:
best word sequence
time alignment of information

Other Outputs:
word graphs
N-best sentences
confidence measures
metadata such as speaker identity, accent, and prosody

Applications:
information localization
data mining
emotional state
stress, fatigue, deception
