© 2013 by Larson Technical Services


Outline
- Grammar-based speech recognition
- Statistical language model-based recognition
- Speech Synthesis
- Dialog Management
- Natural Language Processing

Speech Recognition (ASR, STT)
- Grammar-Based: Developer specifies words to be recognized
- Statistical Language Models: Developer records and tags phrases

| Recognition Technology | Source | Target | Typical Technique |
|---|---|---|---|
| Automatic speech recognition (ASR) | Spoken language | Text | Hidden Markov Model, Neural Net, Table lookup |
| Touchtone recognition | Caller presses buttons on phone | Digits | Tone recognition |
| Speaker Identification | | Names of registered callers | Table lookup |
| Voice Activity Detection | Caller speaks or does not speak | “On” or “Off” | Attention word |
| Classification | | Categories | Statistical analysis |
| Language Identification | | National language names | |

Touchtone Recognition
Caller responds to voice menus by pressing touchtone buttons on the telephone keypad
- Advantages
  - Highly accurate
- Disadvantages
  - Lost in space
  - Time-consuming menus where user must convert choice to a digit
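The "tone recognition" behind touchtone input can be sketched as follows. Each key sends one row tone plus one column tone (the standard DTMF frequencies), and the Goertzel recurrence measures the power at each candidate frequency. This is a minimal sketch for clean, noise-free audio; `detect_key` and `synth_key` are hypothetical helper names, and the 8 kHz sample rate is an assumption.

```python
import math

# Standard DTMF frequencies (Hz): one row tone plus one column tone per key.
ROW_FREQS = [697, 770, 852, 941]
COL_FREQS = [1209, 1336, 1477]
KEYPAD = ["123", "456", "789", "*0#"]


def goertzel_power(samples, freq, sample_rate):
    """Return the signal power near `freq` using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row tone and column tone and map them to a key."""
    row = max(range(len(ROW_FREQS)),
              key=lambda i: goertzel_power(samples, ROW_FREQS[i], sample_rate))
    col = max(range(len(COL_FREQS)),
              key=lambda i: goertzel_power(samples, COL_FREQS[i], sample_rate))
    return KEYPAD[row][col]


def synth_key(key, sample_rate=8000, duration=0.05):
    """Generate a clean two-tone test signal for `key` (demo helper only)."""
    row = next(r for r, line in enumerate(KEYPAD) if key in line)
    col = KEYPAD[row].index(key)
    f1, f2 = ROW_FREQS[row], COL_FREQS[col]
    n = int(sample_rate * duration)
    return [math.sin(2 * math.pi * f1 * t / sample_rate)
            + math.sin(2 * math.pi * f2 * t / sample_rate)
            for t in range(n)]
```

Because only a handful of fixed frequencies have to be told apart, this kind of detection is far simpler than recognizing speech, which is consistent with the "highly accurate" advantage listed above.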

Speech Recognition
- Advantages
  - User does not convert choices to a digit
- Disadvantages
  - Occasional failure to recognize what user said
  - Time-consuming dialogs
    - Users may interrupt prompts by “barge-in”

Speech Recognition Engines

| | Low-end | High-end | Other |
|---|---|---|---|
| Speaking mode | Isolated (discrete) | Continuous | Keywords |
| Enrollment | Speaker dependent | Speaker independent | Adaptive |
| Vocabulary size | Small | Large | Switch vocabularies |
| Speaking style | Read | Spontaneous | |
| Number of simultaneous callers | Single-threaded | Multi-threaded | |

How Speech Recognition Works
Audio Input -> Feature Extraction (digital signal processing) -> Phoneme Identification -> Word Identification -> Words and Phrases
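The feature extraction stage can be illustrated with a toy example. The sketch below cuts the audio into short overlapping frames and computes two simple values per frame, log energy and zero-crossing rate. These two features are an assumption chosen for brevity; production recognizers typically use richer spectral features, and the frame sizes shown (25 ms windows every 10 ms at 8 kHz) are only common defaults.

```python
import math


def frames(samples, frame_len=200, hop=80):
    """Yield overlapping frames (25 ms window, 10 ms hop at 8 kHz)."""
    for start in range(0, len(samples) - frame_len + 1, hop):
        yield samples[start:start + frame_len]


def extract_features(samples, frame_len=200, hop=80):
    """Return one (log_energy, zero_crossing_rate) pair per frame."""
    feats = []
    for frame in frames(samples, frame_len, hop):
        energy = sum(x * x for x in frame) / len(frame)
        log_energy = math.log(energy + 1e-10)  # small floor avoids log(0)
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a < 0) != (b < 0))
        feats.append((log_energy, crossings / (len(frame) - 1)))
    return feats
```

Each frame's feature values would then be passed to the phoneme identification stage.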

How Speech Recognition Works
Audio Input -> Feature Extraction -> Phoneme Identification -> Word Identification -> Words and Phrases
Acoustic Model (used for Phoneme Identification)
- Transform features to phonemes
- Sounds in a language
- Different for each language
- May be speaker dependent (speaker must train model)
- May be speaker independent (pretrained)
- Usually supplied by ASR vendor

How Speech Recognition Works
Audio Input -> Feature Extraction -> Phoneme Identification -> Word Identification -> Words and Phrases
Language Model (used for Word Identification)
- Words in a language and their pronunciation
- Transform phonemes to words
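The step from phonemes to words can be sketched as a lookup in a pronunciation table. The three-word table and its ARPAbet-style phoneme symbols below are illustrative assumptions, and `phonemes_to_words` is a hypothetical name. A real recognizer scores many competing hypotheses rather than requiring an exact match.

```python
# Illustrative pronunciation table: phoneme sequence -> word.
LEXICON = {
    ("W", "AH", "N"): "one",
    ("T", "UW"): "two",
    ("TH", "R", "IY"): "three",
}


def phonemes_to_words(phonemes):
    """Segment a phoneme list into lexicon words.

    Returns a list of words, or None when no exact segmentation exists.
    """
    if not phonemes:
        return []
    for length in range(1, len(phonemes) + 1):
        word = LEXICON.get(tuple(phonemes[:length]))
        if word is not None:
            rest = phonemes_to_words(phonemes[length:])
            if rest is not None:
                return [word] + rest
    return None
```

For example, the phoneme sequence `T UW W AH N` segments into the words "two" and "one", while a sequence containing a sound outside the table yields no result.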

Grammar-based Speech Recognition
Audio Input -> Feature Extraction -> Phoneme Identification -> Word Identification -> Words and Phrases
- Context-free Grammar (CFG)
- Grammar -> Grammar Compiler -> Language Model -> Word Identification
- Lexicon

Where are grammars used?
- Interactive Voice Response (IVR) systems
- Automated telephone agents
- Each step may use a different grammar
- Grammar defines only the words which the user may speak during a step
- Application developers specify grammars for each step
- The same grammar may be reused in multiple applications
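The idea that each dialog step has its own grammar can be sketched as follows. The step names, word lists, and the `interpret` function are hypothetical; the grammar for each step is reduced to a plain set of allowed phrases.

```python
# Hypothetical dialog steps, each with its own set of allowed phrases.
STEP_GRAMMARS = {
    "ask_digit": {"one", "two", "three"},
    "confirm": {"yes", "no"},
}


def interpret(step, utterance):
    """Accept the utterance only if the active step's grammar allows it."""
    phrase = " ".join(utterance.lower().split())
    return phrase if phrase in STEP_GRAMMARS[step] else None
```

Under these assumptions, "yes" is accepted during the `confirm` step but rejected during `ask_digit`, because only the active step's grammar is consulted.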

Example Grammar

<grammar type="application/srgs+xml" root="single_digit" mode="voice">
  <rule id="single_digit">
    <one-of>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>

Example Grammar

<grammar type="application/srgs+xml" root="twenties" mode="voice">
  <rule id="twenties">
    <one-of>
      <item> twenty </item>
      <item> twenty <ruleref uri="#single_digit"/> </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
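To see which phrases a grammar like the one above accepts, the rules can be expanded mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` and handles only the `<rule>`, `<one-of>`, `<item>`, and local `<ruleref>` elements used on these slides. Function names are hypothetical, and recursive rule references, repeats, and weights are not supported.

```python
import itertools
import xml.etree.ElementTree as ET

GRAMMAR = """
<grammar type="application/srgs+xml" root="twenties" mode="voice">
  <rule id="twenties">
    <one-of>
      <item> twenty </item>
      <item> twenty <ruleref uri="#single_digit"/> </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item> <item> two </item> <item> three </item>
      <item> four </item> <item> five </item> <item> six </item>
      <item> seven </item> <item> eight </item> <item> nine </item>
    </one-of>
  </rule>
</grammar>
"""


def expand(element, rules):
    """Return every word sequence (a tuple of words) the element accepts."""
    if element.tag == "ruleref":
        # Local reference such as uri="#single_digit".
        return expand(rules[element.get("uri").lstrip("#")], rules)
    if element.tag == "one-of":
        # Alternatives: collect the expansions of each <item>.
        return [seq for item in element for seq in expand(item, rules)]
    # <rule> and <item>: literal text interleaved with child elements.
    parts = [[tuple((element.text or "").split())]]
    for child in element:
        parts.append(expand(child, rules))
        parts.append([tuple((child.tail or "").split())])
    return [sum(combo, ()) for combo in itertools.product(*parts)]


def phrases(grammar_xml):
    """List all phrases accepted by the grammar's root rule."""
    root = ET.fromstring(grammar_xml.strip())
    rules = {rule.get("id"): rule for rule in root.findall("rule")}
    return [" ".join(seq) for seq in expand(rules[root.get("root")], rules)]
```

Calling `phrases(GRAMMAR)` lists "twenty" followed by "twenty one" through "twenty nine". A bare digit such as "nine" is not listed, because `single_digit` is reachable only through the root rule `twenties`.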

Grammar with 3 Rules

<grammar type="application/srgs+xml" root="request" mode="voice">
  <rule id="request">
    <ruleref uri="#color"/>
    <ruleref uri="#size"/>
  </rule>
  <rule id="size">
    <one-of>
      <item> small </item>
      <item> medium </item>
      <item> large </item>
    </one-of>
  </rule>
  <rule id="color">
    <one-of>
      <item> red </item>
      <item> green </item>
      <item> blue </item>
    </one-of>
  </rule>
</grammar>

Grammar Exercise
Extend the grammar to include the combination of “color,” “size,” and “product,” where product may be “T-shirt” or “vest”.

XML and ABNF Grammar Formats

XML format
<rule id="single_digit">
  <one-of>
    <item> one </item>
    <item> two </item>
    <item> three </item>
    <item> four </item>
    <item> five </item>
    <item> six </item>
    <item> seven </item>
    <item> eight </item>
    <item> nine </item>
  </one-of>
</rule>
- Verbose
- Validated by XML tools

ABNF format
$single_digit = one | two | three | four | five | six | seven | eight | nine;
- Terse
- Familiar to compiler experts
- Not validated by XML tools
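The correspondence between the two formats can be made concrete with a small converter. This is a sketch that handles only a rule containing a single flat `<one-of>` list, as in the slide; `rule_to_abnf` is a hypothetical name, not part of any grammar toolkit.

```python
import xml.etree.ElementTree as ET

RULE_XML = """
<rule id="single_digit">
  <one-of>
    <item> one </item> <item> two </item> <item> three </item>
    <item> four </item> <item> five </item> <item> six </item>
    <item> seven </item> <item> eight </item> <item> nine </item>
  </one-of>
</rule>
"""


def rule_to_abnf(rule_xml):
    """Convert a rule with one flat <one-of> list to an ABNF rule definition."""
    rule = ET.fromstring(rule_xml.strip())
    one_of = rule.find("one-of")
    # Normalize whitespace inside each <item> to single spaces.
    alternatives = [" ".join(item.text.split())
                    for item in one_of.findall("item")]
    return "${} = {};".format(rule.get("id"), " | ".join(alternatives))
```

Applied to `RULE_XML`, the function produces the same ABNF line shown on the slide, which illustrates that the two formats carry the same information.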

Summary: Grammar-Based Speech Recognition
- Various speech recognition technologies are used for a large variety of applications.
- Speech grammars are used to constrain the words that a user may speak during a single step of an automated conversation.
- Trained application developers create a grammar for each step of an automated conversation.

Answer: Grammar Exercise

<grammar type="application/srgs+xml" root="request" mode="voice">
  <rule id="request">
    <ruleref uri="#color"/>
    <ruleref uri="#size"/>
    <ruleref uri="#product"/>
  </rule>
  <rule id="size">
    <one-of>
      <item> small </item>
      <item> medium </item>
      <item> large </item>
    </one-of>
  </rule>
  <rule id="color">
    <one-of>
      <item> red </item>
      <item> green </item>
      <item> blue </item>
    </one-of>
  </rule>
  <rule id="product">
    <one-of>
      <item> T-shirt </item>
      <item> vest </item>
    </one-of>
  </rule>
</grammar>