Speech Recognition MIT 6.893 SMA 5508 Spring 2004 Larry Rudolph (MIT)

Slides:



Advertisements
Similar presentations
Annoucements  Next labs 9 and 10 are paired for everyone. So don’t miss the lab.  There is a review session for the quiz on Monday, November 4, at 8:00.
Advertisements

Language and communication What is language? How do we communicate? Pragmatic principles Common ground.
2.2 Validation & Verification
The Perception of Speech. Speech is for rapid communication Speech is composed of units of sound called phonemes –examples of phonemes: /ba/ in bat, /pa/
Writing Workshop Here are some typical writing style issues which people have trouble with.
Learning about the CELDT
PHONEXIA Can I have it in writing?. Discuss and share your answers to the following questions: 1.When you have English lessons listening to spoken English,
Confidence Measures for Speech Recognition Reza Sadraei.
Test-taking strategies: How to approach Multiple-choice and True/False Questions.
More Writing Workshop Use care with words like “thing” and “where”. For example: – “things like vision” – “visual illusions where colours are distorted”
Part-of-speech Tagging cs224n Final project Spring, 2008 Tim Lai.
03/04/2005ENEE408G Spring 2005 Multimedia Signal Processing 1 ENEE408G: Capstone Design Project: Multimedia Signal Processing Design Project 3: Digital.
Part of speech (POS) tagging
Some definitions Morphemes = smallest unit of meaning in a language Phrase = set of one or more words that go together (from grammar) (e.g., subject clause,
HUMANOID ANIMATION DRIVEN BY HUMAN VOICE Thesis Advisor : Dr. Donald P. Brutzman Second Reader : Dr. Xiaoping Yun A Thesis By Ozan APAYDIN, Turkish Navy.
Handwritten Character Recognition using Hidden Markov Models Quantifying the marginal benefit of exploiting correlations between adjacent characters and.
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
1 Functional Testing Motivation Example Basic Methods Timing: 30 minutes.
Natural Language Processing and Speech Enabled Applications by Pavlovic Nenad.
Natural Language Understanding
ISSUES IN SPEECH RECOGNITION Shraddha Sharma
Robert Hass CIS 630 April 14, 2010 NP NP↓ Super NP tagging JJ ↓
Speech Recognition. My computer doesn’t understand me……….. Software is now mainstream Many people use it within office/home setting for inputting text.
14 Days Until CAHSEE!!! 15 February  Essay Revision Questions are based on the text of brief rough drafts, and they appear in two basic forms:
Knowledge Base approach for spoken digit recognition Vijetha Periyavaram.
Group 3 Teacher: Kate Chen Student: Nicole Ivy Julie Yuki Sandy Kelly.
Real-Time Speech Recognition Subtitling in Education Respeaking 2009 Dr Mike Wald University of Southampton.
Supervisor: Dr. Eddie Jones Electronic Engineering Department Final Year Project 2008/09 Development of a Speaker Recognition/Verification System for Security.
MIT 6.893; SMA 5508 Spring 2004 Larry Rudolph Lecture Introduction Speechbuilder Tutorial.
1 CSE 552/652 Hidden Markov Models for Speech Recognition Spring, 2006 Oregon Health & Science University OGI School of Science & Engineering John-Paul.
1 Computational Linguistics Ling 200 Spring 2006.
Helynn Boughner EDU 674 Prof. Klein.  Is any technology that can help a person do a task. It can be as high- tech, as a computer system that speaks the.
Machine Translation  Machine translation is of one of the earliest uses of AI  Two approaches:  Traditional approach using grammars, rewrite rules,
Mining Reference Tables for Automatic Text Segmentation Eugene Agichtein Columbia University Venkatesh Ganti Microsoft Research.
How Solvable Is Intelligence? A brief introduction to AI Dr. Richard Fox Department of Computer Science Northern Kentucky University.
CELDT Learning about the CELDT Created by Mike Hammar.
Creating User Interfaces Directed Speech. XML. VoiceXML Classwork/Homework: Sign up to be Voxeo developer. Do tutorials.
Automatic Cue-Based Dialogue Act Tagging Discourse & Dialogue CMSC November 3, 2006.
Topic #1: Introduction EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
MIT 6.893; SMA 5508 Spring 2004 Larry Rudolph Lecture Introduction Sketching Interface.
MIT 6.893; SMA 5508 Spring 2004 Larry Rudolph Lecture 4: Graphic User Interface Graphical User Interface Larry Rudolph MIT 6.893; SMA 5508 Spring 2004.
1 CSE 552/652 Hidden Markov Models for Speech Recognition Spring, 2005 Oregon Health & Science University OGI School of Science & Engineering John-Paul.
CS 416 Artificial Intelligence Lecture 19 Reasoning over Time Chapter 15 Lecture 19 Reasoning over Time Chapter 15.
Speech Recognition with CMU Sphinx Srikar Nadipally Hareesh Lingareddy.
Metadata By N.Gopinath AP/CSE Metadata and it’s role in the lifecycle. The collection, maintenance, and deployment of metadata Metadata and tool integration.
© 2013 by Larson Technical Services
HMM-Based Speech Synthesis Erica Cooper CS4706 Spring 2011.
Pervasive Computing MIT SMA 5508 Spring 2006 Larry Rudolph 1 Location, Location, Location Larry Rudolph.
CERTIFICATE IV IN BUSINESS JULY 2015 BSBWRT401A - Write Complex Documents.
The single most important skill for a computer programmer is problem solving Problem solving means the ability to formulate problems, think creatively.
Study Skills Lesson 7 Test Taking Preparing for Tests Daily: –Highlight important words and ideas in notes –Read assignments the teacher gives –Keep.
Lexical, Prosodic, and Syntactics Cues for Dialog Acts.
SEESCOASEESCOA SEESCOA Meeting Activities of LUC 9 May 2003.
Speech Processing 1 Introduction Waldemar Skoberla phone: fax: WWW:
Automated Speach Recognotion Automated Speach Recognition By: Amichai Painsky.
Pervasive Computing MIT SMA 5508 Spring 2006 Larry Rudolph 1 Debugging Applications in Pervasive Computing Larry Rudolph May 1, 2006 SMA 5508; MIT.
Understanding Naturally Conveyed Explanations of Device Behavior Michael Oltmans and Randall Davis MIT Artificial Intelligence Lab.
Phone-Level Pronunciation Scoring and Assessment for Interactive Language Learning Speech Communication, 2000 Authors: S. M. Witt, S. J. Young Presenter:
Speech Recognition Created By : Kanjariya Hardik G.
Chapter 18 Query Processing and Optimization. Chapter Outline u Introduction. u Using Heuristics in Query Optimization –Query Trees and Query Graphs –Transformation.
A Simple English-to-Punjabi Translation System By : Shailendra Singh.
Using Commonsense Reasoning to Improve Voice Recognition.
How to Take Tests! 4 Learn all about: 4 True/False 4 Matching 4 Multiple Choice 4 Fill in the Blank.
Speech User Interface 10/26/2010. Pervasive Information Access Information & Services I-Land vision by Streitz, et. al.
IIS for Speech Processing Michael J. Watts
Using Speech Recognition to Predict VoIP Quality
Glottodidactics Lesson 4.
Dialog Design 4 Speech & Natural Language
Retrieval of audio testimonials via voice search
Huawei CBG AI Challenges
Presentation transcript:

Speech Recognition MIT SMA 5508 Spring 2004 Larry Rudolph (MIT)

Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly A long term goal Since 1950, AI researchers claimed Crucial problem Will be solved within the decade Finally, it appears true Failure rates still too high 90% hit rate is 10% error rate want 98% or 99% success rate

6.893 Spring 2004: User Interface Larry Rudolph Spectrum of choices Constrained Domain Unconstrained Domain Speaker Dependent Voice tags (e.g. phone) Trained Dictation (Viavoice) Speaker Independent Galaxy (we are here) What everyone wants

Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly Waveform to Phonemes Waveform is very fuzzy We think there is a large break between words and sentences hard to see from waveform Mapping waveform segments to phonemes is not accurate

Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly Phonemes to words Group phonemes into words not always 1-1 mapping missing phonemes false phonemes (extra ones) accents many possible choices Word should be known to system domain or dictionary

Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly words to sentences People do not always speak grammatically correct some invariant rules (for speech) extra or missing words phrases not always sentences Easier when sentence is in domain domain specified by grammar

Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly sentences into meaning Dictation system: want sentences Other system: want to understand Integrate high-level processing Most applications need it anyway Helps with recognition useful to disambiguate input

Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly meaning into action What happens after meaning? Respond to user (even a beep) Usually generate more substantial response Action should be valid in context

Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly Disambiguation Each transformation is rarely highly accurate Lots of choices Subsequent steps can rule out choices from previous steps

Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly disambiguation strategy Select “n-best” choices and pass on Each step restricts possible meaning Make heavy use of probability Viterby search state transitions along with probabilities. push through n choices at once

Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly after domain dependent Handling out-of-vocabulary words Multimodal input improve recognition rates e.g. lip reading sometimes easier to point than say