Introduction to VoiceXML 2.0
Rob Marchand, Director of Product Management, VoiceGenie Technologies Inc.

Introduction to VoiceXML
Audience
  o Managers and programmers with little experience with VoiceXML
Attendees will learn
  o The basic principles of VoiceXML
  o Just enough syntax to design and code simple speech applications requiring voice menus and voice forms

VoiceXML in the Marketplace
VoiceXML 2.0 is now ratified as a Recommendation (i.e., an official standard) by the W3C
Hundreds of millions of VoiceXML calls are answered every day
VoiceXML is the standard for building speech-enabled applications

W3C and VoiceXML Forum
W3C manages the technical evolution and development of the VoiceXML language
VoiceXML Forum focuses on providing best practices, certification testing, resources and tools
Together the W3C and VoiceXML Forum accelerate the adoption of VoiceXML-based speech applications

Outline
Motivation for VoiceXML
W3C Speech Interface Framework
Languages
  o Dialog: VoiceXML 2.0
  o Speech Synthesis: SSML
  o Grammars: SRGS
  o Semantic Interpretation: SI
  o Call Control

Motivation for Speech Applications
Users access Web sites from any telephone, anywhere, any time.
Speaking and listening are the natural usage modes for phones.

Speech-enabled Applications Are Possible Now
Increased computing power at less expense
  o Due to improved chip design and manufacturing techniques
Improved speech recognition
  o Due to refinements to basic speech recognition algorithms
Improved dialog design using voice
  o Minimizes the number of words and phrases that the speech recognizer must process at any point during the dialog

Strength of VoiceXML Applications
Traditional system-directed dialogs for novice users
Mixed initiative dialogs for experienced users
Novice users smoothly become experienced users at their own pace

Limitations of VoiceXML Applications
No special analysis of speech input
  o Not suitable for training speech skills: reading, ESL, singing, etc.
VUI conversational bandwidth is slower than GUI conversational bandwidth
  o Using a VUI is like drinking from Lake Superior with a straw

Exercise 1
Name or describe a speech application you could use at work.
Name or describe a speech application you or a family member can use at home.

XML
  o XML = eXtensible Markup Language
  o Elements are surrounded by tags: Welcome to the voice system
  o Elements may be nested: Welcome to Ajax Travel we have the cheapest fares
  o Elements may have attributes
  o Because “<”, “>”, and “&” have special meanings, use “&lt;” in place of “<”, “&gt;” in place of “>”, and “&amp;” in place of “&”
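
The four XML points above can be illustrated with one small document. This is a sketch only: the tags that surrounded the slide's sample text are not preserved in this transcript, so the choice of prompt and emphasis elements, the bargein attribute, and the escaped sentence are all assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="welcome">
    <block>
      <!-- An element: content surrounded by a start tag and an end tag. -->
      <prompt>Welcome to the voice system</prompt>

      <!-- Nested elements: emphasis sits inside prompt (assumed markup). -->
      <prompt>
        Welcome to Ajax Travel
        <emphasis>we have the cheapest fares</emphasis>
      </prompt>

      <!-- An attribute on the start tag (bargein), plus escaped characters:
           &lt; stands for the less-than sign and &amp; for the ampersand. -->
      <prompt bargein="false">Fares &lt; 100 dollars &amp; up</prompt>
    </block>
  </form>
</vxml>
```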

[Figure: system architecture. Labels: Database Server (DB), Web Server (HTML, Scripts, VoiceXML, Grammars, Audio Files, Multimedia Files, Documents), Web Browser, Speech Server/Gateway with Voice Browser (Capture Voice, ASR, DTMF, Replay Audio, TTS)]

W3C Speech Interface Framework
[Figure: framework components. Labels: VoiceXML 2.0, Speech Synthesis, Grammar, Semantic Interpretation, Call Control, Other]

Status of W3C Speech Interface Languages
[Figure: status chart. Languages: Call Control, Semantic Interpretation, Synthesis, Grammar, VoiceXML 2.0, VoiceXML 2.1, PLS, V 3. Stages: Requirements, Working Draft, Last Call Working Draft, Candidate Recommendation, Proposed Recommendation, Recommendation]

VoiceXML 2.0 Fragment
… Which account savings or checking … savings … checking … CD … certificate of deposit … $ = “CD” …
Languages used in the fragment:
  o Dialog Language (VoiceXML 2.0)
  o Speech Synthesis Markup Language (SSML)
  o Speech Recognition Grammar Specification (SRGS)
  o Semantic Interpretation (SI)
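
A fragment of this kind might look like the sketch below, which shows where each of the four languages appears. The slide's original markup is not preserved, so the form and rule names (choose_account, account_type) and the placement of emphasis are assumptions. The slide writes the semantic result as $ = “CD”, notation from a draft of the Semantic Interpretation specification; the sketch assumes the “out” rule variable of the SISR 1.0 Recommendation instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="choose_account">
    <!-- Dialog language (VoiceXML 2.0): form, field, and filled. -->
    <field name="account">
      <!-- SSML inside the prompt; the emphasis placement is an assumption. -->
      <prompt>
        Which account, <emphasis>savings</emphasis> or <emphasis>checking</emphasis>?
      </prompt>

      <!-- SRGS: an inline grammar listing what the caller may say. -->
      <grammar type="application/srgs+xml" version="1.0" mode="voice"
               root="account_type" tag-format="semantics/1.0">
        <rule id="account_type" scope="public">
          <one-of>
            <!-- SI: each tag sets the value handed back to the field. -->
            <item>savings <tag>out = "savings";</tag></item>
            <item>checking <tag>out = "checking";</tag></item>
            <item>CD <tag>out = "CD";</tag></item>
            <item>certificate of deposit <tag>out = "CD";</tag></item>
          </one-of>
        </rule>
      </grammar>

      <filled>
        <prompt>You chose <value expr="account"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Both “CD” and “certificate of deposit” return the same value, so the application only has to handle one spelling of each account type.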

VoiceXML 2.0 features
Menus, forms, sub-dialogs
Inputs
  o Speech recognition
  o Recording
  o Keypad
Output
  o Audio files
  o Text-to-speech
Variables
Events
Transition and submission
Telephony
  o Connection control
  o Telephony information
Platform
  o Objects
Performance
  o Fetch

A Typical Voice Menu
Do you want to listen, next, prior, buy, or exit?
Choices: listen, next, prior, buy, exit
Exercise 2: Write a menu that asks the user a “yes/no” question to confirm that the user wants to buy the audio “three blind mice”.

Answer to Exercise 2: A “yes/no” menu
Do you want to buy three blind mice now?
Choices: yes, no
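
One way to write the yes/no menu in VoiceXML 2.0 is sketched below. The prompt and the two choices come from the slide; the menu id and the two target forms (buy, cancel) with their wording are hypothetical, added only so the document is complete.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <menu id="confirm_purchase">
    <prompt>Do you want to buy three blind mice now?</prompt>
    <!-- Each choice names the phrase to recognize and where to go next. -->
    <choice next="#buy">yes</choice>
    <choice next="#cancel">no</choice>
  </menu>

  <!-- Hypothetical targets for the two choices. -->
  <form id="buy">
    <block>Thank you for your purchase. <exit/></block>
  </form>
  <form id="cancel">
    <block>No purchase was made. <exit/></block>
  </form>
</vxml>
```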

Typical Form Fill-In
Welcome to the electronic payment system.
Please enter your credit card number.
Please enter your expiration date.
Exercise 3: Write a form that solicits the month, day, and year for the user’s birth date.

Answer to Exercise 3
When were you born?
What month?
What day of the month?
What year?
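
The birth date form could be sketched as follows. The prompts come from the slide; the grammar files month.grxml, day.grxml, and year.grxml are hypothetical placeholders that an application would have to supply, and the confirmation at the end is an illustrative addition.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="birth_date">
    <!-- Played once, before the first field is visited. -->
    <block>When were you born?</block>

    <!-- Each field prompts for one value and names a grammar for it. -->
    <field name="month">
      <prompt>What month?</prompt>
      <grammar type="application/srgs+xml" src="month.grxml"/>
    </field>

    <field name="day">
      <prompt>What day of the month?</prompt>
      <grammar type="application/srgs+xml" src="day.grxml"/>
    </field>

    <field name="year">
      <prompt>What year?</prompt>
      <grammar type="application/srgs+xml" src="year.grxml"/>
    </field>

    <!-- Runs once all three fields have values. -->
    <filled mode="all" namelist="month day year">
      <prompt>
        You were born on <value expr="month + ' ' + day + ', ' + year"/>.
      </prompt>
    </filled>
  </form>
</vxml>
```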

Event Handlers
Deal with exceptional or error conditions
Control mechanism for dialog turn retries
Shorthand notation available
Scoped according to where they occur

Adding Event Handlers When were you born? ….. ….. What month? …..

Default Event Handlers
Sorry, no help is available.
I did not understand, please try again.
I did not hear anything, please speak again.

Exercise 4 Write event handlers for the month field ____________________ __________________________ ___________________________________

Answer to Exercise 4
Write event handlers for the month field
Which month, for example, January, February, or March?
Say the name of the month you were born in.
In what month were you born?
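
The three messages above could be attached to the month field as shown in this sketch. The transcript does not say which message belongs to which event, so the mapping (help, nomatch, noinput, following the order of the default handlers slide) is an assumption, as is the hypothetical grammar file month.grxml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="birth_date">
    <block>When were you born?</block>

    <field name="month">
      <prompt>What month?</prompt>
      <grammar type="application/srgs+xml" src="month.grxml"/>

      <!-- Caller asked for help. -->
      <help>Which month, for example, January, February, or March?</help>

      <!-- Caller said something the grammar does not cover. -->
      <nomatch>Say the name of the month you were born in.</nomatch>

      <!-- Caller said nothing before the timeout. -->
      <noinput>In what month were you born?</noinput>
    </field>
  </form>
</vxml>
```

Because the handlers sit inside the field, they apply only to that field; handlers placed at form or document level would act as defaults for everything beneath them.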

Speech Synthesis ML
Stages: Structure Analysis, Text Normalization, Text-to-Phoneme Conversion, Prosody Analysis, Waveform Production
Structure Analysis
  o Markup support: p, s
  o Non-markup behavior: infer structure by automated text analysis

Before and after Structure Analysis
Before structure analysis
  o Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.
After structure analysis
  o Dr. Smith lives at 214 Elm Dr. He weighs 214 lb.
  o He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.

Speech Synthesis ML: Text Normalization
  o Markup support: say-as for dates, times, etc.; sub for aliasing
  o Non-markup behavior: automatically identify and convert constructs

After Text Normalization
Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.

Speech Synthesis ML: Text-to-Phoneme Conversion
  o Markup support: phoneme, say-as
  o Non-markup behavior: look up in pronunciation dictionary

After text-to-phoneme conversion Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.
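
The first three stages can be made concrete with SSML 1.0 markup for the sample text. This is a sketch, not the slide's own markup (which is not preserved here): the paragraph split, the sub aliases, and the IPA transcriptions for the two senses of “bass” are assumptions chosen to show p, s, sub, and phoneme at work.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Structure analysis: p and s mark paragraphs and sentences. -->
  <p>
    <!-- Text normalization: sub says which expansion of "Dr." is meant. -->
    <s><sub alias="Doctor">Dr.</sub> Smith lives at 214 Elm <sub alias="Drive">Dr.</sub></s>
    <s>He weighs 214 <sub alias="pounds">lb.</sub></s>
  </p>
  <p>
    <!-- Text-to-phoneme conversion: phoneme separates the instrument from the fish. -->
    <s>He plays <phoneme alphabet="ipa" ph="beɪs">bass</phoneme> guitar.</s>
    <s>He also likes to fish; last week he caught a 19
       <sub alias="pound">lb.</sub>
       <phoneme alphabet="ipa" ph="bæs">bass</phoneme>.</s>
  </p>
</speak>
```

Without the markup, a synthesizer has to guess each of these from context, which is the non-markup behavior listed for each stage.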

Speech Synthesis ML: Prosody Analysis
  o Markup support: emphasis, break, prosody
  o Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax

Prosody Analysis (Initial text) Environmental control menu. Do you want to adjust the lighting or temperature?

Prosody Analysis (Add pause at phrase boundaries) Environmental control menu Do you want to adjust the lighting or temperature?

Prosody analysis (De-emphasize familiar words) Environmental control menu Do you want to adjust the lighting or temperature?

Prosody Analysis (pause to let the listener catch up) Environmental control menu do you want to adjust the lighting or temperature?

Prosody Analysis (Add emphasis to focus listener’s attention) Environmental control menu do you want to adjust the lighting or temperature?

Speech Synthesis ML: Waveform Production
  o Markup support: voice, audio (audio icons, branding, advertising)

Waveform Production
Environmental control menu. Do you want to adjust the lighting or temperature?

Exercise 5 (insert SSML commands)
Welcome to Ajax Bank do you want to withdraw or deposit funds?

Answer to Exercise 5
Welcome to Ajax Bank do you want to withdraw or deposit funds?
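
The SSML commands in the slide's answer are not preserved in this transcript. One plausible answer, following the prosody steps shown for the environmental control menu (pause at the phrase boundary, emphasis on the words the listener must choose between), is sketched below; the pause length and the emphasized words are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <s>Welcome to <emphasis>Ajax Bank</emphasis>.</s>
  <!-- Pause so the listener can absorb the greeting. -->
  <break time="500ms"/>
  <s>
    Do you want to <emphasis>withdraw</emphasis>
    or <emphasis>deposit</emphasis> funds?
  </s>
</speak>
```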

Grammars
Describe what the user may say at a point in the dialog
Enable the speech recognition engine to work faster and more accurately
Consist of one or more “rules”

Example Grammar
zero, ten, one, two, three, four, five, six, seven, eight, nine
  o XML form of grammars
  o One rule describes the single digits; another rule describes the digits zero through ten
  o The grammar processor should start with the “zero_to_ten” rule
  o This is a grammar used by the speech recognizer. (There may also be grammars for DTMF recognizers.)
  o The one-of element describes alternatives
  o The ruleref element references another rule
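
The grammar these callouts describe might look like the following SRGS document. The root rule name zero_to_ten comes from the slide; the name single_digit and the exact split of words between the two rules are assumptions based on the word order in the transcript.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- mode="voice": used by the speech recognizer (mode="dtmf" would be for keypad input).
     root: the rule the grammar processor starts with. -->
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="zero_to_ten">

  <!-- Rule describing the digits zero through ten. -->
  <rule id="zero_to_ten" scope="public">
    <one-of>
      <item>zero</item>
      <item>ten</item>
      <!-- ruleref pulls in another rule from this grammar. -->
      <item><ruleref uri="#single_digit"/></item>
    </one-of>
  </rule>

  <!-- Rule describing single digits; one-of lists the alternatives. -->
  <rule id="single_digit">
    <one-of>
      <item>one</item>
      <item>two</item>
      <item>three</item>
      <item>four</item>
      <item>five</item>
      <item>six</item>
      <item>seven</item>
      <item>eight</item>
      <item>nine</item>
    </one-of>
  </rule>
</grammar>
```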

Exercise 6: Write a grammar that recognizes the digits zero to nineteen

Answer to Exercise 6
Write a grammar for zero to nineteen
zero, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen
one, two, three, four, five, six, seven, eight, nine

More Grammar Elements
Repeat and optional: very good
Sequence: twenty
Garbage: James Lewis

Exercise 7
Write a grammar that recognizes the digits zero to thirty-nine

Answer to Exercise 7
Write a grammar for zero to thirty-nine
zero, twenty, twenty, thirty, thirty
one, two, three, four, five, six, seven, eight, nine
ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen
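
A grammar for zero to thirty-nine can combine a sequence with an optional item. The sketch below is one possible answer, not necessarily the slide's: all rule names are hypothetical, and the optional single digit after “twenty” or “thirty” is written with repeat="0-1".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="zero_to_thirty_nine">

  <rule id="zero_to_thirty_nine" scope="public">
    <one-of>
      <item>zero</item>
      <item><ruleref uri="#single_digit"/></item>
      <item><ruleref uri="#ten_to_nineteen"/></item>
      <!-- Sequence: a tens word followed by an optional single digit,
           covering "twenty", "twenty one", ..., "thirty nine". -->
      <item>
        <one-of>
          <item>twenty</item>
          <item>thirty</item>
        </one-of>
        <item repeat="0-1"><ruleref uri="#single_digit"/></item>
      </item>
    </one-of>
  </rule>

  <rule id="single_digit">
    <one-of>
      <item>one</item> <item>two</item> <item>three</item>
      <item>four</item> <item>five</item> <item>six</item>
      <item>seven</item> <item>eight</item> <item>nine</item>
    </one-of>
  </rule>

  <rule id="ten_to_nineteen">
    <one-of>
      <item>ten</item> <item>eleven</item> <item>twelve</item>
      <item>thirteen</item> <item>fourteen</item> <item>fifteen</item>
      <item>sixteen</item> <item>seventeen</item> <item>eighteen</item>
      <item>nineteen</item>
    </one-of>
  </rule>
</grammar>
```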

Reusing existing grammars
<grammar type = "application/srgs+xml" root = "size" src = "

Semantic Interpretation
To create smart voice user interfaces, we need to extract the semantic information from speech utterances
Example:
  o Utterance: “I want to fly from Dublin to Paris”
  o Semantic Interpretation: { origin: “Dublin”, destination: “Paris” }

Semantic Interpretation
[Figure: processing pipeline. The ASR matches the utterance “fourteen” against a grammar with Semantic Interpretation scripts (fourteen: $.quantity=“14”;) and passes text to the Semantic Interpretation Processor, which produces the ECMAScript object { quantity: “14” }. The VoiceXML Interpreter passes quantity = “14” to the application.]

Semantic Interpretation
Semantic Interpretation defines the content of tag elements in SRGS grammars
Two kinds of syntax for tag contents:
  o Semantic Literals (literal values)
  o Semantic Scripts (ECMAScript)

Semantic Interpretation
Semantic Literals example:
  o coca cola: coke
  o cola: coke
  o black fizzy stuff: coke
  o coke: coke (default assignment)
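
Written out as an SRGS grammar, the literals example might look like the sketch below. The phrases and the value “coke” come from the slide; the rule name drink is hypothetical, and the tag-format value assumes the string-literal syntax of the SISR 1.0 Recommendation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="drink"
         tag-format="semantics/1.0-literals">

  <rule id="drink" scope="public">
    <one-of>
      <!-- With literals, the tag content is itself the returned value. -->
      <item>coca cola <tag>coke</tag></item>
      <item>cola <tag>coke</tag></item>
      <item>black fizzy stuff <tag>coke</tag></item>
      <!-- No tag: relies on the default assignment noted on the slide,
           so the recognized text "coke" is the result. -->
      <item>coke</item>
    </one-of>
  </rule>
</grammar>
```

Whatever the caller says, the application sees the single value “coke”.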

Semantic Interpretation
Semantic Scripts employ ECMAScript
Advantages:
  o Richer structure (objects)
  o Ability to perform computations

Semantic Interpretation
Example grammar rule with Script Syntax:
  o small: $.size = "small";
  o medium: $.size = "medium";
  o large: $.size = "large";
  o green: $.color = "green";
  o blue: $.color = "blue";
  o white: $.color = "white";
Utterance: “Large white”
ECMAScript structure: action: { size: "large", color: "white" }
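
The rule above could be written as the following SRGS grammar. The rule name action and the property names come from the slide. The slide uses $ for the rule variable, notation from a draft of the specification; this sketch assumes the SISR 1.0 Recommendation, where the rule variable is called “out” and starts as an empty object.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="action"
         tag-format="semantics/1.0">

  <rule id="action" scope="public">
    <!-- First word: a size. Each tag adds a property to the result object. -->
    <one-of>
      <item>small <tag>out.size = "small";</tag></item>
      <item>medium <tag>out.size = "medium";</tag></item>
      <item>large <tag>out.size = "large";</tag></item>
    </one-of>
    <!-- Second word: a color. -->
    <one-of>
      <item>green <tag>out.color = "green";</tag></item>
      <item>blue <tag>out.color = "blue";</tag></item>
      <item>white <tag>out.color = "white";</tag></item>
    </one-of>
  </rule>
</grammar>
```

For the utterance “large white”, the two tags that fire build the object with size “large” and color “white” shown on the slide.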

Semantic Interpretation Example grammar rule with Script Syntax: What is $.total = $digit; plus $.total = $.total + $digit; ECMAScript structure: calculator: { total: 6 } What is ?

Exercise 8
Fill in the contents of the grammar rule for the utterance “From savings to checking”
Grammar rule:
  o from savings ________________________
  o from checking ________________________
  o to savings ________________________
  o to checking ________________________
ECMAScript structure: transfer: { source_account: "savings", target_account: "checking" }

Answer to Exercise 8
Utterance: “From savings to checking”
Grammar rule:
  o from savings: $.source_account = "savings";
  o from checking: $.source_account = "checking";
  o to savings: $.target_account = "savings";
  o to checking: $.target_account = "checking";
ECMAScript structure: transfer: { source_account: "savings", target_account: "checking" }
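
As a complete grammar, the answer might be written as below. The rule name transfer and the property names come from the slide; as in the slide's other script examples, $ is replaced here by the “out” variable of the SISR 1.0 Recommendation, which is an assumption about the target platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="transfer"
         tag-format="semantics/1.0">

  <rule id="transfer" scope="public">
    from
    <one-of>
      <item>savings <tag>out.source_account = "savings";</tag></item>
      <item>checking <tag>out.source_account = "checking";</tag></item>
    </one-of>
    to
    <one-of>
      <item>savings <tag>out.target_account = "savings";</tag></item>
      <item>checking <tag>out.target_account = "checking";</tag></item>
    </one-of>
  </rule>
</grammar>
```

The same word (“savings”) sets a different property depending on whether it follows “from” or “to”, which is something a literal value alone could not express.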

CCXML
Provides call control support for VoiceXML and other dialog languages
Separate interpreter from VoiceXML
  o Lives on its own thread
  o Handles asynchronous events
May be used to create standalone applications
Replaces the call control elements currently in VoiceXML 2.0 (or provides the underlying support for them)

CCXML
[Figure: a VoiceXML-only deployment compared with a VoiceXML + CCXML deployment]

CCXML Features
  o Multi-party conferencing (human and machine)
  o Sophisticated multi-call handling and control
  o Support for async external messages and events
  o More sophisticated call control than VoiceXML
  o Call control protocol independence
Goal: support very high density and performance

CCXML A new participant has entered the conference.

Exercise 9 Announce when a caller leaves

Answer to Exercise 9
A participant has left the conference.

Example Applications with CCXML-VoiceXML
Alerts
  o Stock value changes, order is available, flight is delayed, road closure, school closure
Conference
  o Add additional person to the conference
  o Whisper
  o Eject
Find me
  o Try alternative telephone numbers
Instant messaging
  o Notify me when John calls in to access his
Control home appliances
  o Turn on/off coffee pot, oven, air conditioner, lights, arm/disarm the security system
Call Center/Customer Care Applications

VoiceXML 2.1
VoiceXML’s success and popularity resulted in many implementations early in the standardization process
Additional, innovative features were conceived after VoiceXML 2.0 content was agreed
Goals of VoiceXML 2.1:
  o Ensure portability by specifying a set of commonly implemented extensions
  o Backwards-compatible with VoiceXML 2.0
  o Follow a “fast track” to standardization

VoiceXML 2.1
Standardized extensions:
  o Locate barge-in occurrences within prompts
  o Interact directly with XML-based infrastructure
  o Access recognition utterances for analysis
  o Increase performance by reducing server round-trips
  o Extended call transfer types

Summary
W3C Speech Interface Framework
  o Dialog: VoiceXML
  o Grammar: SRGS
  o Synthesis: SSML
  o Semantic Interpretation: SI
  o Call Control: CCXML
Can work together or separately
For details, see http://

Resources

Industry Organizations
World Wide Web Consortium
  o W3C Voice Browser Working Group
  o W3C Multi-Modal Working Group
VoiceXML Forum
SALT Forum
Speech Technology Magazine

Books
James A. Larson, VoiceXML: An Introduction to Developing Speech Applications, 2002, Upper Saddle River, NJ: Prentice Hall.
Eve Astrid Andersson et al., Early Adopter VoiceXML, 2001, Birmingham, UK: Wrox.
Bruce Balentine & David P. Morgan, How to Build a Speech Recognition Application: A Style Guide for Telephony Dialogues, 1999, San Ramon, CA: Enterprise Integration Group.
Rick Beasley et al., Voice Application Development with VoiceXML, 2002, Indianapolis: Sams.
Bob Edgar, The VoiceXML Handbook, 2001, New York: CMP.
Susan Weinschenk & Dean T. Barker, Designing Effective Speech Interfaces, 2000, New York: John Wiley & Sons.
Chetan Sharma & Jeff Kunins, VoiceXML: Strategies and Techniques for Effective Voice Application Development with VoiceXML 2.0, 2002, New York: John Wiley.
Michael H. Cohen, James P. Giangola, & Jennifer Balogh, Voice User Interface Design, 2004, Addison Wesley.

Tutorials and Articles
VoiceXML Forum
VoiceXML Review
World of VoiceXML

Online Voice SDKs
Name                             URL
BeVocal Cafe                     http://cafe.bevocal.com
Hey Anita FreeSpeech             http://
Tellme Studio                    http://studio.tellme.com
VoiceGenie Developer Workshop
Voxeo Community                  http://
Voxpilot voxbuilder              http://

Downloadable Voice Interpreters
Name                             URL
IBM WebSphere Voice Server SDK

Public VoiceXML Interpreters
OpenVXI - VoiceXML Interpreter
  o Source: Carnegie Mellon University, Department of Computer Science, Speech Group
  o URL: s.cmu.edu/openvxi/index.html
PublicVoiceXML - VoiceXML platform
  o Source: Public Voice Lab, Vienna, Austria
  o URL: cexml.org/

Introduction to VoiceXML Questions?