
VoiceXML Overview
James A. Larson, Intel Corporation
jim@larson-tech.com
(c) 2007 Larson Technical Services

Outline
  Motivation for VoiceXML
  W3C Speech Interface Framework
  Languages
    Dialog: VoiceXML 2.0
    Speech Synthesis: SSML
    Grammars: SRGS
    Semantic Interpretation: SI
    VoiceXML 2.1

VoiceXML in the Marketplace
  VoiceXML 2.0 is now ratified as a Recommendation (i.e., an official standard) by the W3C
  Hundreds of millions of VoiceXML calls are answered every day
  VoiceXML is the standard for building speech-enabled applications

Motivation for Speech Applications
  Users access Web sites from any telephone, anywhere, any time.
  Speaking and listening are the natural usage modes for phones.

Strengths of VoiceXML Applications
  Traditional system-directed dialogs for novice users
  Mixed-initiative dialogs for experienced users
  Novice users smoothly become experienced users at their own pace

Limitations of VoiceXML Applications
  No special analysis of speech input
    Not suitable for training speech skills: reading, ESL, singing, etc.
  VUI conversational bandwidth is slower than GUI conversational bandwidth
    Using a VUI is like drinking from Lake Superior with a straw

Exercise 1
  Name or describe a speech application you could use at work.
  Name or describe a speech application you or a family member could use at home.

XML
  XML = eXtensible Markup Language
  Elements are surrounded by tags
    <prompt>Welcome to the voice system</prompt>
  Elements may be nested
    <prompt>Welcome to Ajax Travel <break/> we have the cheapest fares</prompt>
  Elements may have attributes
    <choice next="#boat">
    <grammar type="application/srgs+xml" version="1.0" root="by_boat" src="boat.grxml"/>
  Because "<", ">", and "&" have special meanings, write
    "&lt;" in place of "<"
    "&gt;" in place of ">"
    "&amp;" in place of "&"
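The escaping rule on this slide can be checked mechanically. The following is a minimal sketch in Python, using only the standard library (`xml.sax.saxutils.escape` and `xml.etree.ElementTree`); the prompt text is invented for illustration and is not from the slides.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical prompt text containing all three special characters.
raw_text = "Say <yes> or <no> for AT&T"

# escape() replaces &, <, and > with &amp;, &lt;, and &gt;.
escaped = escape(raw_text)
print(escaped)

# The escaped text can now sit safely inside an element.
document = "<prompt>" + escaped + "</prompt>"
prompt = ET.fromstring(document)

# The parser turns the entities back into the original characters.
print(prompt.text)
```

Without the escaping step the parser would treat `<yes>` as the start of a new element and reject the bare `&`.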


[Figure: Web architecture for visual and voice access. Labeled components: Web browser; voice browser on a speech server/gateway (capture voice, ASR, DTMF, replay audio, TTS); Web server (HTML documents, VoiceXML documents, scripts, grammars, multimedia files, audio files); database server.]

W3C Speech Interface Framework
  [Figure: framework components: VoiceXML 2.0, Speech Synthesis, Grammar, Semantic Interpretation, Call Control, Other]

Status of W3C Speech Interface Languages
  [Chart: each language plotted against its W3C maturity stage. The positions are not recoverable from the transcript.]
  Stages: Requirements, Working Draft, Last Call, Candidate, Proposed, Recommendation
  Languages: VoiceXML 2.0, Grammar (SRGS), Synthesis (SSML), Semantic Interpretation (SISR), VoiceXML 2.1, Call Control (CCXML), V3


Example of VoiceXML 2.0 Fragment
  One fragment combines four languages:
    Dialog Language (VoiceXML 2.0): <vxml>, <form>, <field>, <prompt>
    Speech Synthesis Markup Language (SSML): <break/>, <emphasis>
    Speech Recognition Grammar Specification (SRGS): <grammar>, <rule>, <one-of>, <item>
    Semantic Interpretation (SI): <tag>

  <?xml version="1.0"?>
  <vxml version="2.0">
    <form>
      …
      <field name="account">
        <prompt>
          Which account <break/>
          <emphasis> savings </emphasis> or <emphasis> checking </emphasis>
        </prompt>
        <grammar type="application/srgs+xml" root="account_type" mode="voice">
          <rule id="account_type">
            <one-of>
              <item> savings </item>
              <item> checking </item>
              <item> CD </item>
              <item> certificate of deposit <tag>$ = "CD"</tag> </item>
            </one-of>
          </rule>
        </grammar>
      </field>
      …
    </form>
  </vxml>

  The last build of this slide writes the tag as a named property instead: <tag>new.account = "CD"</tag>

VoiceXML 2.0 features
  Menus, forms, sub-dialogs: <menu>, <form>, <subdialog>
  Inputs
    Speech recognition: <grammar>
    Recording: <record>
    Keypad: <grammar mode="dtmf">
  Output
    Audio files: <audio>
    Text-to-speech: <prompt>
  Variables: <var>, <script>, <assign>
  Events: <nomatch>, <noinput>, <help>, <catch>, <throw>
  Transition and submission: <goto>, <submit>
  Telephony
    Connection control: <transfer>, <disconnect>
    Telephony information
  Platform: objects
  Performance: fetch

Telephony features
  Simple connection control
    Transfer to 3rd party: <transfer>
    Add 3rd party: <transfer bridge="true">
    Disconnect user: <disconnect>
  Telephony information
    Automatic Number Identification
    Dialed Number Information Service
    Information Indicator Digit

Platform features
  Invoke platform-specific functionality: <object>
    SpeechObject custom credit card dialog
    Cell phone current location (latitude/longitude)
    Thermostat settings
  Control platform properties: <property>
    Speech recognition threshold level
    Recognition-based barge-in vs. energy-based barge-in

Performance features
  Voice browsers optimize fetching
  Authors are given close control over fetching

Typical Form Fill-In
  <form>
    <block>
      <prompt> Welcome to the electronic payment system. </prompt>
    </block>
    <field name="card_number">
      <prompt> Please enter your credit card number. </prompt>
      <grammar src="http://www.ajax.com/credit_card_number.grxml"/>
    </field>
    <field name="date">
      <prompt> Please enter your expiration date. </prompt>
      <grammar src="http://www.ajax.com/credit_card_date.grxml"/>
    </field>
  </form>
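How a voice browser walks through such a form can be sketched in a few lines of Python. This is a simplified illustration, not the full Form Interpretation Algorithm of the VoiceXML specification: the function name `run_form`, the two validator functions standing in for the `.grxml` grammars, and the sample replies are all assumptions made for the example.

```python
def is_card_number(text):
    # Stand-in for credit_card_number.grxml: exactly 16 digits (assumption).
    return text.isdigit() and len(text) == 16


def is_expiry(text):
    # Stand-in for credit_card_date.grxml: MMYY with a valid month (assumption).
    return len(text) == 4 and text.isdigit() and 1 <= int(text[:2]) <= 12


def run_form(welcome, fields, replies):
    """Fill each field in document order, re-prompting after a bad reply."""
    transcript = [welcome]            # the <block> plays once, up front
    filled = {}
    pending_replies = iter(replies)
    while len(filled) < len(fields):
        # Select the first field whose variable is still unfilled.
        field = next(f for f in fields if f["name"] not in filled)
        transcript.append(field["prompt"])
        reply = next(pending_replies, None)
        if reply is None:             # no more caller input
            break
        if field["accepts"](reply):   # stand-in for matching the <grammar>
            filled[field["name"]] = reply
    return filled, transcript


fields = [
    {"name": "card_number",
     "prompt": "Please enter your credit card number.",
     "accepts": is_card_number},
    {"name": "date",
     "prompt": "Please enter your expiration date.",
     "accepts": is_expiry},
]

# The first reply is too short, so the card number prompt is repeated.
filled, transcript = run_form(
    "Welcome to the electronic payment system.",
    fields,
    ["1234", "4111111111111111", "1227"],
)
print(filled)
print(transcript)
```

A real interpreter would throw a nomatch event on the rejected reply rather than silently repeating the prompt; event handling is covered on the slides that follow.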

Exercise 2
  Capture "birth date"
  <form>
    <block>
      <prompt> _____________________ </prompt>
    </block>
    <field name="month">
      <prompt> _______________________________ </prompt>
      <grammar src="http://www.ajax.com/month.grxml"/>
    </field>
    <field name="day">
      <prompt> ______________________________ </prompt>
      <grammar src="http://www.ajax.com/day.grxml"/>
    </field>
    <field name="year">
      <grammar src="http://www.ajax.com/year.grxml"/>
    </field>
  </form>

Event Handlers
  Deal with exceptional or error conditions
  Control mechanism for dialog turn retries
    <catch event="noinput"> … </catch>
    <catch event="nomatch"> … </catch>
    <catch event="help"> … </catch>
  Shorthand notation available
    <noinput> … </noinput>, etc.
  Scoped according to where they occur
    <form>, <field>, etc.

Adding Event Handlers
  <form>
    <block>
      <prompt> When were you born? </prompt>
    </block>
    <field name="month">
      <catch event="noinput"> … </catch>
      <catch event="nomatch"> … </catch>
      <prompt> What month? </prompt>
      <grammar src="http://www.ajax.com/month.grxml"/>
    </field>
  </form>

Default Event Handlers
  <catch event="nomatch">
    <prompt> I did not understand, please try again </prompt>
  </catch>
  <catch event="help">
    <prompt> Sorry, no help is available. </prompt>
  </catch>
  <catch event="noinput">
    <prompt> I did not hear anything, please speak again </prompt>
  </catch>
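The scoping rule for event handlers, with the defaults as the last resort, can be sketched as a simple lookup from the innermost scope outward. This Python sketch is a deliberate simplification: it ignores the `count` and `cond` attributes and event-name prefix matching, and the handler texts attached to the field and form are hypothetical.

```python
# Platform defaults, mirroring the three default handlers on the slide.
PLATFORM_DEFAULTS = {
    "nomatch": "I did not understand, please try again",
    "help": "Sorry, no help is available.",
    "noinput": "I did not hear anything, please speak again",
}


def find_handler(event, scopes):
    """Return the prompt of the innermost handler that catches `event`.

    `scopes` is ordered from innermost (field) to outermost (document).
    """
    for handlers in scopes:
        if event in handlers:
            return handlers[event]
    # No author-supplied handler anywhere: fall back to the default.
    return PLATFORM_DEFAULTS[event]


# Hypothetical handlers an author attached to a field and to its form.
month_field = {"nomatch": "Please say a month, such as January."}
birth_form = {"help": "I need your month, day, and year of birth."}

scopes = [month_field, birth_form]
for event in ("nomatch", "help", "noinput"):
    print(event, "->", find_handler(event, scopes))
```

Here nomatch is caught at the field, help at the form, and noinput falls through to the default.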

Exercise 3
  Write event handlers for the month field
  <catch event="nomatch">
    <prompt> __________________________ </prompt>
  </catch>
  <catch event="help">
    <prompt> ____________________ </prompt>
  </catch>
  <catch event="noinput">
    <prompt> ___________________________________ </prompt>
  </catch>


Speech Synthesis ML
  Processing stages: Structure Analysis, Text Normalization, Text-to-Phoneme Conversion, Prosody Analysis, Waveform Production
  Structure Analysis
    Markup support: p, s
    Non-markup behavior: infer structure by automated text analysis
  Critic: I don't want to use TTS; it's too difficult to understand.
  Response: Developers can replay audio files, use TTS, or use a combination of both. Developers can rely upon defaults from the TTS engine, or specify commands to override defaults.

Before and after Structure Analysis
  Before structure analysis
    Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.
  After structure analysis
    <p>
      <s> Dr. Smith lives at 214 Elm Dr. </s>
      <s> He weighs 214 lb. </s>
      <s> He plays bass guitar. </s>
      <s> He also likes to fish; last week he caught a 19 lb. bass. </s>
    </p>

Speech Synthesis ML: Text Normalization
  Markup support: say-as for dates, times, etc.; sub for aliasing
  Non-markup behavior: automatically identify and convert constructs

After Text Normalization
  <p>
    <s> <sub alias="doctor">Dr.</sub> Smith lives at 214 Elm <sub alias="drive">Dr.</sub> </s>
    <s> He weighs 214 <sub alias="pounds">lb.</sub> </s>
    <s> He plays bass guitar. </s>
    <s> He also likes to fish; last week he caught a 19 <sub alias="pound">lb.</sub> bass. </s>
  </p>
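What the `sub` element asks a synthesizer to do can be illustrated by a small Python sketch that walks the marked-up text and speaks each alias in place of the written form. The function name `spoken_text` is hypothetical, only the first two sentences are used, and real synthesizers do far more than this substitution.

```python
import xml.etree.ElementTree as ET

# First two sentences of the slide, with each sentence wrapped in <s>.
ssml = (
    '<p><s><sub alias="doctor">Dr.</sub> Smith lives at 214 Elm '
    '<sub alias="drive">Dr.</sub></s>'
    '<s>He weighs 214 <sub alias="pounds">lb.</sub></s></p>'
)


def spoken_text(element):
    """Collect the words to speak, using alias in place of <sub> content."""
    if element.tag == "sub":
        parts = [element.get("alias", "")]
    else:
        parts = [element.text or ""]
        for child in element:
            parts.append(spoken_text(child))
            # Text that follows a child belongs to the parent element.
            parts.append(child.tail or "")
    # Collapse runs of whitespace left over from the markup layout.
    return " ".join(" ".join(parts).split())


result = spoken_text(ET.fromstring(ssml))
print(result)
```

Both occurrences of "Dr." look identical in the text; only the markup tells the synthesizer that one is "doctor" and the other is "drive".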

Speech Synthesis ML: Text-to-Phoneme Conversion
  Markup support: phoneme, say-as
  Non-markup behavior: look up in pronunciation dictionary

After text-to-phoneme conversion
  <p>
    <s> <sub alias="doctor">Dr.</sub> Smith lives at <say-as interpret-as="address">214</say-as> Elm <sub alias="drive">Dr.</sub> </s>
    <s> He weighs <say-as interpret-as="number">214</say-as> <sub alias="pounds">lb.</sub> </s>
    <s> He plays <phoneme alphabet="ipa" ph="beɪs">bass</phoneme> guitar. </s>
    <s> He also likes to fish; last week he caught a <say-as interpret-as="number">19</say-as> <sub alias="pound">lb.</sub> <phoneme alphabet="ipa" ph="bæs">bass</phoneme>. </s>
  </p>

Speech Synthesis ML: Prosody Analysis
  Markup support: emphasis, break, prosody
  Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax

Prosody Analysis (initial text)
  <prompt>
    Environmental control menu. Do you want to adjust the lighting or temperature?
  </prompt>

Prosody Analysis
  <prompt>
    Environmental control menu <break/>
    <emphasis level="reduced"> do you want to adjust the </emphasis>
    <emphasis level="strong"> lighting </emphasis> <break/>
    or <emphasis level="strong"> temperature? </emphasis>
  </prompt>

Speech Synthesis ML: Waveform Production
  Markup support: voice, audio (audio icons, branding, advertising)

Waveform Production
  <prompt>
    <audio src="http://www.example.com/adjust.wav">
      <desc> Environmental control menu. Do you want to adjust the lighting or temperature </desc>
    </audio>
  </prompt>

Exercise 4 (insert SSML commands)
  <prompt>
    Welcome to Ajax Bank do you want to withdraw or deposit funds?
  </prompt>


Grammars
  Describe what the user may say at a point in the dialog
  Enable the speech recognition engine to work faster and more accurately
  Consist of one or more "rules"

Example Grammar (XML form of grammars)
  <grammar type="application/srgs+xml" root="zero_to_ten" mode="voice">
    <rule id="zero_to_ten">
      <one-of>
        <item> zero </item>
        <item> <ruleref uri="#single_digit"/> </item>
        <item> ten </item>
      </one-of>
    </rule>
    <rule id="single_digit">
      <one-of>
        <item> one </item>
        <item> two </item>
        <item> three </item>
        <item> four </item>
        <item> five </item>
        <item> six </item>
        <item> seven </item>
        <item> eight </item>
        <item> nine </item>
      </one-of>
    </rule>
  </grammar>

  Notes on the example
    root="zero_to_ten": the grammar processor should start with the "zero_to_ten" rule
    mode="voice": this is a grammar used by the speech recognizer (there may also be grammars for DTMF recognizers)
    <rule id="zero_to_ten">: rule describing the numbers zero through ten
    <rule id="single_digit">: rule describing the single digits one through nine
    <one-of>: describes alternatives
    <ruleref>: references another rule

  Exercise 5: Write a grammar that recognizes the numbers zero to nineteen
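To see what the example grammar accepts, it can be expanded into its full phrase list. The Python sketch below is an illustration under stated assumptions, not a grammar processor: it handles only `<rule>`, `<one-of>`, `<item>`, and local `<ruleref>` (no repeats, weights, or special rules), it wraps the rule reference in an `<item>`, and the helper name `phrases` is hypothetical.

```python
import xml.etree.ElementTree as ET

GRAMMAR = """
<grammar type="application/srgs+xml" root="zero_to_ten" mode="voice">
  <rule id="zero_to_ten">
    <one-of>
      <item> zero </item>
      <item> <ruleref uri="#single_digit"/> </item>
      <item> ten </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item> <item> two </item> <item> three </item>
      <item> four </item> <item> five </item> <item> six </item>
      <item> seven </item> <item> eight </item> <item> nine </item>
    </one-of>
  </rule>
</grammar>
"""


def phrases(node, rules):
    """Return every phrase that `node` accepts."""
    if node.tag == "ruleref":
        # Follow a local reference such as uri="#single_digit".
        return phrases(rules[node.get("uri").lstrip("#")], rules)
    if node.tag == "one-of":
        # Alternatives: the union of what each <item> accepts.
        return [p for item in node for p in phrases(item, rules)]
    # <rule> and <item>: a sequence of words and child elements.
    results = [(node.text or "").split()]
    for child in node:
        results = [r + p.split() for r in results for p in phrases(child, rules)]
        results = [r + (child.tail or "").split() for r in results]
    return [" ".join(r) for r in results]


root = ET.fromstring(GRAMMAR.strip())
rules = {rule.get("id"): rule for rule in root.findall("rule")}

# Start from the rule named by the root attribute.
accepted = phrases(rules[root.get("root")], rules)
print(accepted)
print("seven" in accepted, "eleven" in accepted)
```

Extending the XML string with a rule for eleven through nineteen and re-running the expansion is one way to check an answer to Exercise 5.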

More Grammar Elements
  Repeat and optional
    <rule id="goodness" scope="public">
      <item repeat="0-3"> very </item> good
    </rule>
  Sequence
    <rule id="twenty_thru_twentynine">
      twenty <ruleref uri="#single_digit"/>
    </rule>
  Garbage
    <rule id="James_Lewis">
      <item> James <ruleref special="GARBAGE"/> Lewis </item>
    </rule>
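The `repeat` attribute takes an exact count ("2"), a range ("0-3"), or an open-ended range ("1-"). A minimal Python sketch of how such a value can be read, and what the "goodness" rule therefore accepts, follows. The function names and the cap applied to open-ended ranges are assumptions made for the example.

```python
def parse_repeat(spec):
    """Turn a repeat value into (minimum, maximum); None means unbounded."""
    low, dash, high = spec.partition("-")
    if not dash:
        # A single number means exactly that many repetitions.
        return int(low), int(low)
    return int(low), int(high) if high else None


def expansions(word, spec, rest, cap=5):
    """List the phrases accepted by <item repeat=spec> word </item> rest."""
    low, high = parse_repeat(spec)
    if high is None:
        # Open-ended ranges are infinite; stop at an arbitrary cap.
        high = max(low, cap)
    return [" ".join([word] * n + rest.split()) for n in range(low, high + 1)]


goodness = expansions("very", "0-3", "good")
print(goodness)
print(parse_repeat("1-"))
```

A minimum of zero is what makes the repeated item optional.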

Reusing existing grammars
  <grammar type="application/srgs+xml" root="size" src="http://www.example.com/size.grxml"/>


Semantic Interpretation
  Semantic Interpretation defines how to extract and modify the results returned by the speech recognition engine
  Semantic interpretation instructions are contained in the <tag> element
  Two kinds of syntax for <tag> contents:
    Semantic Literals (literal values)
    Semantic Scripts (ECMAScript)

Semantic Interpretation
  Semantic Literals example:
    <rule id="drink">
      <one-of>
        <item> coca cola <tag> coke </tag> </item>
        <item> cola <tag> coke </tag> </item>
        <item> black fizzy stuff <tag> coke </tag> </item>
        <item> coke </item>
      </one-of>
    </rule>
  Default assignment: the last item has no <tag>, so the recognized text "coke" is itself the result
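The effect of semantic literals, including the default assignment, can be sketched as a lookup table built from the rule. This Python illustration is not a Semantic Interpretation processor; the function name `literal_results` is hypothetical and the sketch handles only flat `<item>` alternatives.

```python
import xml.etree.ElementTree as ET

RULE = """
<rule id="drink">
  <one-of>
    <item> coca cola <tag> coke </tag> </item>
    <item> cola <tag> coke </tag> </item>
    <item> black fizzy stuff <tag> coke </tag> </item>
    <item> coke </item>
  </one-of>
</rule>
"""


def literal_results(rule):
    """Map each phrase to its literal, or to itself when no <tag> is given."""
    table = {}
    for item in rule.iter("item"):
        phrase = " ".join((item.text or "").split())
        tag = item.find("tag")
        # Default assignment: without a <tag>, the phrase is the result.
        table[phrase] = tag.text.strip() if tag is not None else phrase
    return table


drink = literal_results(ET.fromstring(RULE.strip()))
print(drink)
```

Whatever the caller says, the dialog only ever sees the single value "coke".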

Semantic Interpretation
  [Figure: recognition data flow. The ASR applies the grammar to the caller's speech and produces text; the Semantic Interpretation Processor turns that text into an ECMAScript object for the VoiceXML Interpreter.]
  No Semantic Interpretation scripts in the grammar
    Caller says "fourteen"; the VoiceXML Interpreter receives the text "fourteen"
  Grammar with Semantic Interpretation scripts
    <item> fourteen <tag>new.quantity="14";</tag> </item>
    Caller says "fourteen"; the VoiceXML Interpreter receives the ECMAScript object { quantity: "14" }

Semantic Interpretation
  Semantic Scripts employ ECMAScript
  Advantages:
    Richer structure (objects)
    Ability to perform computations

Semantic Interpretation
  Utterance: "Large white"
  Example grammar rule with Script Syntax:
    <rule id="action">
      <one-of>
        <item> small <tag> out.size = "small"; </tag> </item>
        <item> medium <tag> out.size = "medium"; </tag> </item>
        <item> large <tag> out.size = "large"; </tag> </item>
      </one-of>
      <one-of>
        <item> green <tag> out.color = "green"; </tag> </item>
        <item> blue <tag> out.color = "blue"; </tag> </item>
        <item> white <tag> out.color = "white"; </tag> </item>
      </one-of>
    </rule>
  ECMAScript structure:
    action: { size: "large", color: "white" }

Semantic Interpretation
  Utterance: "What is 1 + 2 + 3?"
  Example grammar rule with Script Syntax:
    <rule id="calculator">
      What is <ruleref uri="#digit"/> <tag> $.total = $digit; </tag>
      <item repeat="1-">
        plus <ruleref uri="#digit"/> <tag> $.total = $.total + $digit; </tag>
      </item>
    </rule>
  ECMAScript structure:
    calculator: { total: 6 }
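The computation performed by the calculator rule can be traced with a short Python sketch that mimics the two `<tag>` scripts. This is an illustration only: the `#digit` rule is not shown on the slide, so the `DIGITS` table is an assumption, as is the function name `interpret`.

```python
# Assumed contents of the #digit rule: spoken word -> numeric value.
DIGITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def interpret(utterance):
    """Return {"total": n} for 'what is D plus D ...', else None."""
    words = utterance.lower().rstrip("?").split()
    if words[:2] != ["what", "is"]:
        return None
    rest = words[2:]
    # Expect: digit (plus digit)+, so an odd count of at least three words.
    if len(rest) < 3 or len(rest) % 2 == 0 or rest[0] not in DIGITS:
        return None
    total = DIGITS[rest[0]]              # $.total = $digit;
    for plus, digit in zip(rest[1::2], rest[2::2]):
        if plus != "plus" or digit not in DIGITS:
            return None
        total += DIGITS[digit]           # $.total = $.total + $digit;
    return {"total": total}


result = interpret("What is one plus two plus three?")
print(result)
```

An utterance with no "plus" part returns None because `repeat="1-"` requires at least one addition.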

Exercise 6
  Fill in the contents of <tag>
  Utterance: "From savings to checking"
  Grammar rule:
    <rule id="transfer">
      from
      <one-of>
        <item> savings <tag> ________________________ </tag> </item>
        <item> checking <tag> ________________________ </tag> </item>
      </one-of>
      to
      <one-of>
        <item> savings <tag> ________________________ </tag> </item>
        <item> checking <tag> ________________________ </tag> </item>
      </one-of>
    </rule>
  ECMAScript structure:
    transfer: { source_account: "savings", target_account: "checking" }

Outline
  Motivation for VoiceXML
  W3C Speech Interface Framework Languages
  Dialog: VoiceXML 2.0
  Speech Synthesis: SSML
  Grammars: SRGS
  Semantic Interpretation: SI
  VoiceXML 2.1

VoiceXML 2.1
VoiceXML's success and popularity resulted in many implementations early in the standardization process.
Additional, innovative features were conceived after the content of VoiceXML 2.0 was agreed.
Goals of VoiceXML 2.1:
  Ensure portability by specifying a set of commonly implemented extensions
  Remain backwards-compatible with VoiceXML 2.0
  Follow a "fast track" to standardization

VoiceXML 2.1
Standardized extensions:
  Locate barge-in occurrences within prompts
  Access recognition utterances for analysis
  Increase performance by reducing server round-trips
  Extended call transfer types

Summary
W3C Speech Interface Framework languages can work together or separately:
  Dialog: VoiceXML
  Grammar: SRGS
  Synthesis: SSML
  Semantic Interpretation: SI
  Call Control: CCXML
See http://www.w3.org/voice/ for details

Industry Organizations
  World Wide Web Consortium: http://www.w3.org
  W3C Voice Browser Working Group: http://www.w3.org/voice/
  W3C Multi-Modal Working Group: http://www.w3.org/2002/mmi/
  VoiceXML Forum: http://www.voicexml.org
  SALT Forum: http://www.saltforum.org
  Speech Technology Magazine: http://www.amcommexpos.com/

Books
  James A. Larson, VoiceXML: An Introduction to Developing Speech Applications, 2002, Upper Saddle River, NJ: Prentice Hall.
  Eve Astrid Andersson et al., Early Adopter VoiceXML, 2001, Birmingham, UK: Wrox.
  Bruce Balentine & David P. Morgan, How to Build a Speech Recognition Application: A Style Guide for Telephony Dialogues, 1999, San Ramon, CA: Enterprise Integration Group.
  Rick Beasley et al., Voice Application Development with VoiceXML, 2002, Indianapolis: Sams.
  Bob Edgar, The VoiceXML Handbook, 2001, New York: CMP.
  Susan Weinschenk & Dean T. Barker, Designing Effective Speech Interfaces, 2000, New York: John Wiley & Sons.
  Chetan Sharma & Jeff Kunins, VoiceXML: Strategies and Techniques for Effective Voice Application Development with VoiceXML 2.0, 2002, New York: John Wiley.
  Michael H. Cohen, James P. Giangola, & Jennifer Balogh, Voice User Interface Design, 2004, Addison Wesley.

Other Resources
  The VoiceXML Guide: http://www.vxmlguide.com/

Tutorials and Articles
  VoiceXML Forum: http://www.voicexmlforum.org/
  VoiceXML Review: http://www.voicexmlreview.org/
  World of VoiceXML: http://www.kenrehor.com/voicexml/

Online Voice SDKs
  Name                           URL
  BeVocal Cafe                   http://cafe.bevocal.com
  Tellme Studio                  http://studio.tellme.com
  VoiceGenie Developer Workshop  http://developer.voicegenie.com
  Voxpilot voxbuilder            http://www.voxbuilder.com

Questions?

Thanks for your attention

Answer to Exercise 2
    <form>
      <block>
        <prompt> When were you born? </prompt>
      </block>
      <field name="month">
        <prompt> What month? </prompt>
        <grammar src="http://www.ajax.com/month.grxml"/>
      </field>
      <field name="day">
        <prompt> What day of the month? </prompt>
        <grammar src="http://www.ajax.com/day.grxml"/>
      </field>
      <field name="year">
        <prompt> What year? </prompt>
        <grammar src="http://www.ajax.com/year.grxml"/>
      </field>
    </form>

Answer to Exercise 3
Write event handlers for the month field:
    <catch event="nomatch">
      <prompt> Which month, for example, January, February, or March? </prompt>
    </catch>
    <catch event="help">
      <prompt> In what month were you born? </prompt>
    </catch>
    <catch event="noinput">
      <prompt> Say the name of the month you were born in. </prompt>
    </catch>

Answer to Exercise 4
    <prompt>
      Welcome to Ajax Bank
      <break/>
      <emphasis level="reduced"> do you want to </emphasis>
      <emphasis level="strong"> withdraw </emphasis>
      or
      <emphasis level="strong"> deposit </emphasis>
      funds?
    </prompt>

Answer to Exercise 5
Write a grammar for zero to nineteen:
    <grammar type="application/srgs+xml" root="zero_to_19" mode="voice">
      <rule id="zero_to_19">
        <one-of>
          <item> zero </item>
          <!-- alternatives inside one-of must be items, so the rule reference is wrapped -->
          <item> <ruleref uri="#single_digit"/> </item>
          <item> ten </item>
          <item> eleven </item>
          <item> twelve </item>
          <item> thirteen </item>
          <item> fourteen </item>
          <item> fifteen </item>
          <item> sixteen </item>
          <item> seventeen </item>
          <item> eighteen </item>
          <item> nineteen </item>
        </one-of>
      </rule>
      <rule id="single_digit">
        <one-of>
          <item> one </item>
          <item> two </item>
          <item> three </item>
          <item> four </item>
          <item> five </item>
          <item> six </item>
          <item> seven </item>
          <item> eight </item>
          <item> nine </item>
        </one-of>
      </rule>
    </grammar>

Answer to Exercise 6
Example utterance: "From savings to checking"
Grammar rule:
    <rule id="transfer">
      from
      <one-of>
        <item> savings <tag> out.source_account = "savings"; </tag> </item>
        <item> checking <tag> out.source_account = "checking"; </tag> </item>
      </one-of>
      to
      <one-of>
        <item> savings <tag> out.target_account = "savings"; </tag> </item>
        <item> checking <tag> out.target_account = "checking"; </tag> </item>
      </one-of>
    </rule>
ECMAScript structure:
    transfer: {
      source_account: "savings",
      target_account: "checking"
    }