VoiceXML: Speech Recognition Grammars


Acknowledgements
- Prof. McTear, Natural Language Processing, University of Ulster, http://www.infj.ulst.ac.uk/nlp/index.html
- Bevocal documentation

Overview
- Types of grammar
- Grammar design and use
- Optional items in a grammar
- Semantic tags
- DTMF grammars
- Grammar rules
- Built-in grammars
- Grammar scope

What is a grammar?
A grammar defines the words and patterns of words that a user can say at any particular point in a dialogue.
Uses:
- Speech recognition: to constrain the speech recognition process by specifying permissible sequences of words
- Language understanding: to determine the structure and/or meaning of a sequence of words
For example, "Transfer one hundred dollars from my checking to my savings account" might be parsed and transformed into the structure:
    <transfer>
      <command> transfer </command>
      <destination> savings </destination>
      <source> checking </source>
      <amount> 100 </amount>
    </transfer>

Types of grammar
- Finite-state and phrase structure
  - take the form of rules with a left-hand and a right-hand side, e.g.
        noun_phrase -> determiner adjective noun
        flight -> <destination> <date> <time>
  - used in language understanding and speech recognition
- N-gram (used in speech recognition)
  - based on probabilities of word combinations, e.g. bigrams, trigrams

Grammar in VoiceXML
Grammars may be specified:
- Inline, i.e. embedded in a VoiceXML page
- External, i.e. stored as files on Web servers, etc.
Grammar formats:
- XML, ABNF (Augmented BNF syntax), Java Speech Grammar Format (JSGF), GSL (Nuance's Grammar Specification Language)
- The W3C specification covers the XML and ABNF formats
- IBM Voice Toolkit supports the XML and ABNF grammar formats
- Bevocal Café, Voxpilot and Tellme support the XML and GSL grammar formats
For further details on the W3C Speech Recognition Grammar Specification, see http://www.w3.org/TR/speech-grammar/

Inline and External Grammar Definitions
An inline grammar is defined within the <grammar> element in a VoiceXML document.
- In an inline grammar, if the grammar consists of exactly one rule, that rule does not have to have a name.
- GSL grammars use special characters, so wrap an inline GSL grammar in a CDATA section:
    <grammar ...usage attributes...>
      <![CDATA[
        ...grammar header...
        ...grammar rule definitions...
      ]]>
    </grammar>
An external grammar is defined in an external file and referenced in the VoiceXML document.
- In an external grammar document, all rules must be named.
- The contents of an external GSL grammar file should not be inside a CDATA section and should not contain a <grammar> element:
    ;GSL2.0
    ...grammar rule definitions...

<option> element
- Specifies a set of possible responses for a field
- If the number of possible responses is small, a set of <option> elements can be used instead of a <grammar> element:
    <form>
      <field name="choice">
        <prompt>Say students, courses, or reports</prompt>
        <option>students</option>
        <option>courses</option>
        <option>reports</option>
      </field>
    </form>
- <option> can also be used for alternative DTMF input, e.g.
    <option dtmf="1" value="balance"> balance </option>
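To see the two uses of <option> together, here is a minimal sketch of a complete VoiceXML 2.0 document that accepts either the spoken word or a key press. The form id, the prompt wording and the nomatch message are illustrative assumptions, not part of the original slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="main_menu">
    <field name="choice">
      <prompt>
        Say students, courses, or reports, or press 1, 2, or 3.
      </prompt>
      <!-- Each option defines a spoken phrase, a DTMF key, and the value
           assigned to the field when either one is recognised -->
      <option dtmf="1" value="students">students</option>
      <option dtmf="2" value="courses">courses</option>
      <option dtmf="3" value="reports">reports</option>
      <nomatch>
        Sorry, I did not understand.
        <reprompt/>
      </nomatch>
      <filled>
        <prompt>You chose <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Because every option carries an explicit value attribute, the field holds the same string whether the caller speaks or uses the keypad.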

Grammar Design
A grammar should cover all the ways that a user might say something:
- Alternative choices within a category, e.g.
    studentname [john rosemary etc]
- Alternative words for the same concept, e.g.
    [comms communications]
- Alternative sentences that have the same meaning, e.g.
    [(student john scott taking databases)
     (databases john scott)
     (john scott taking the course databases)]
Note: careful wording of prompts can constrain the user to saying what has been predicted by the grammar designer.
These examples use the GSL grammar format, which is more suitable than the XML format for the presentation of examples.

Grammars for words
Simple words (or touch-tone strings): tokens
- GSL:
    <grammar type = …> (student name) </grammar>
- XML:
    <grammar> <token>student name</token> </grammar>
Alternative words
- GSL:
    Choice [ students courses reports ]
- XML:
    <rule id="choice">
      <one-of>
        <item> students </item>
        <item> courses </item>
        <item> reports </item>
      </one-of>
    </rule>

Making items optional
- GSL:
    Name (?firstname lastname)
- XML:
    <rule id="name">
      <item repeat="0-1"> firstname </item>
      <item> lastname </item>
    </rule>

Making items optional (2)
    ( [ news weather sports ] ?please )
    ( ?[ (i'd like) (tell me) ] ?the [ news weather sports ] ?please )
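For comparison, the second GSL expression above could be written in the W3C XML format roughly as follows. This is a sketch: the rule id "request" and the en-US language tag are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="request">
  <rule id="request" scope="public">
    <!-- ?[ (i'd like) (tell me) ] : optional lead-in phrase -->
    <item repeat="0-1">
      <one-of>
        <item>i'd like</item>
        <item>tell me</item>
      </one-of>
    </item>
    <!-- ?the : optional article -->
    <item repeat="0-1">the</item>
    <!-- [ news weather sports ] : exactly one of these is required -->
    <one-of>
      <item>news</item>
      <item>weather</item>
      <item>sports</item>
    </one-of>
    <!-- ?please : optional politeness marker -->
    <item repeat="0-1">please</item>
  </rule>
</grammar>
```

The GSL ? operator maps onto repeat="0-1", and the square-bracket alternatives map onto <one-of>.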

Repeating items
XML:
- repeat="0-1" means the item is optional, i.e. it occurs zero times or once
- repeat="n-" means the item is repeated n or more times, e.g. "0-" = zero or more times
- repeat="m-n" means the item is repeated between m and n times (inclusive), e.g. "1-3" = between one and three times
- repeat="n" means the item is repeated exactly n times
GSL:
- +(item): the item is repeated 1 or more times
- *(item): the item is repeated 0 or more times
- ?(item): the item is optional
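The repeat forms listed above can be combined in one rule. The following sketch assumes a hypothetical account number made of a three-digit branch code followed by four to six further digits; the rule names and phrases are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="account">
  <rule id="account" scope="public">
    <!-- zero or one time: optional carrier phrase -->
    <item repeat="0-1">my account number is</item>
    <!-- exactly three times: the branch code -->
    <item repeat="3"><ruleref uri="#digit"/></item>
    <!-- between four and six times (inclusive): the account digits -->
    <item repeat="4-6"><ruleref uri="#digit"/></item>
    <!-- zero or more times -->
    <item repeat="0-">please</item>
  </rule>

  <!-- Private helper rule: a single spoken digit -->
  <rule id="digit">
    <one-of>
      <item>zero</item>
      <item>oh</item>
      <item>one</item>
      <item>two</item>
      <item>three</item>
      <item>four</item>
      <item>five</item>
      <item>six</item>
      <item>seven</item>
      <item>eight</item>
      <item>nine</item>
    </one-of>
  </rule>
</grammar>
```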

Grammar Slots (Tags)
Grammar slots are used in grammars to return a value representing the meaning of the word(s) recognised, e.g. 'checking account' and 'checking' should return the same value.
GSL:
    <field name="MainMenu">
      …
      <![CDATA[
        ( ?[ (i'd like) (tell me) ] ?the
          [ (news ?reports)               { <selection news> }
            (weather ?[info information]) { <selection weather> }
            (sports ?[updates news])      { <selection sports> }
          ]
          ?please )
      ]]>
      <filled>
        <assign name="selected" expr="MainMenu.selection"/>
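The same main-menu grammar might look as follows in the XML format, with the slot set in <tag> elements. This sketch assumes a platform that supports the W3C Semantic Interpretation syntax (tag-format="semantics/1.0", where the result is built on the variable out); some platforms instead use the older $= form that appears in the tutorial exercise, so the tag content may need adapting. The form id and prompts are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="menu">
    <field name="MainMenu">
      <prompt>Would you like news, weather, or sports?</prompt>
      <grammar type="application/srgs+xml" version="1.0" mode="voice"
               root="request" tag-format="semantics/1.0">
        <rule id="request">
          <item repeat="0-1">
            <one-of>
              <item>i'd like</item>
              <item>tell me</item>
            </one-of>
          </item>
          <item repeat="0-1">the</item>
          <one-of>
            <!-- Each alternative sets the same slot, "selection" -->
            <item>
              news <item repeat="0-1">reports</item>
              <tag>out.selection = "news";</tag>
            </item>
            <item>
              weather
              <item repeat="0-1">
                <one-of><item>info</item><item>information</item></one-of>
              </item>
              <tag>out.selection = "weather";</tag>
            </item>
            <item>
              sports
              <item repeat="0-1">
                <one-of><item>updates</item><item>news</item></one-of>
              </item>
              <tag>out.selection = "sports";</tag>
            </item>
          </one-of>
          <item repeat="0-1">please</item>
        </rule>
      </grammar>
      <filled>
        <!-- The field holds the result object, so the slot is read as a property -->
        <prompt>You selected <value expr="MainMenu.selection"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Whichever wording the caller uses ("weather", "tell me the weather info please"), the application only ever sees the slot value.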

Grammar rules: sentences
Grammars often consist of sub-grammars. Here ColoredObject is built from the sub-grammars Color and Object:
    ;GSL2.0
    ColoredObject:public (Color Object)
    Color [ [red pink]      { <color red> }
            [yellow canary] { <color yellow> }
            [green khaki]   { <color green> }
          ]
    Object [ [truck car]    { <object vehicle> }
             [ball block]   { <object toy> }
             [shirt blouse] { <object clothing> }
           ]
Result:
    "yellow shirt", "canary blouse" => { color: yellow; object: clothing; }

Grammar with sub-rules
- Sub-grammars and rules are referenced in the XML format using a rule reference.
- A rule reference can point to a local rule, or to an external grammar rule contained in another file or even on another server on the Internet.
- Designing a grammar that consists of sub-grammars requires considerable planning to ensure that all possible utterances are covered and to avoid redundancy and repetition in the grammar.
- It is often useful to map out the grammar diagrammatically, or in a simple format such as GSL or ABNF, before attempting to code the rules in XML format.
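As an illustration of rule references, the colored-object grammar from the previous slide could be coded in XML along these lines. It is a sketch under two assumptions: the platform supports tag-format="semantics/1.0" (where rules.name holds the result of the referenced rule), and the rule ids are chosen freely. The external URL in the comment is hypothetical and only shows the form of a reference to a rule in another file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="coloredObject"
         tag-format="semantics/1.0">

  <!-- Public top-level rule: a color followed by an object -->
  <rule id="coloredObject" scope="public">
    <ruleref uri="#color"/>
    <tag>out.color = rules.color;</tag>
    <ruleref uri="#object"/>
    <tag>out.object = rules.object;</tag>
    <!-- A rule in another file is referenced by URI plus rule name, e.g.
         <ruleref uri="http://www.example.com/grammars/polite.grxml#please"/>
         (hypothetical URL, shown only to illustrate the form) -->
  </rule>

  <!-- Private sub-rules, referenced locally with "#id" -->
  <rule id="color">
    <one-of>
      <item>
        <one-of><item>red</item><item>pink</item></one-of>
        <tag>out = "red";</tag>
      </item>
      <item>
        <one-of><item>yellow</item><item>canary</item></one-of>
        <tag>out = "yellow";</tag>
      </item>
      <item>
        <one-of><item>green</item><item>khaki</item></one-of>
        <tag>out = "green";</tag>
      </item>
    </one-of>
  </rule>

  <rule id="object">
    <one-of>
      <item>
        <one-of><item>truck</item><item>car</item></one-of>
        <tag>out = "vehicle";</tag>
      </item>
      <item>
        <one-of><item>ball</item><item>block</item></one-of>
        <tag>out = "toy";</tag>
      </item>
      <item>
        <one-of><item>shirt</item><item>blouse</item></one-of>
        <tag>out = "clothing";</tag>
      </item>
    </one-of>
  </rule>
</grammar>
```

As in the GSL version, "yellow shirt" and "canary blouse" are intended to produce the same pair of slot values, color yellow and object clothing.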

Rule Scope - GSL
Each defined rule has a scope of either private or public.
- A rule with public scope is visible outside its grammar: it can be referenced by name from other grammars and can be activated for recognition (it can serve as a top-level rule).
- A rule with private scope is visible only within its containing grammar and may be referenced only by other rules within the same grammar.
To mark a rule as public, the format is:
    RuleName:public ruleExpansion
- If no rules in the grammar are explicitly marked with :public, then all rules in the grammar are public.
- If any rule in the grammar is marked with :public, then all public rules must be so marked.
- The root rule in a GSL grammar is always the first public rule.
For example, the following set of definitions creates one public rule named Snapper and two private rules named SnapperType and FishColors:
    SnapperType [mutton FishColors]
    FishColors [black gray red]
    Snapper:public (SnapperType snapper)

Rule scope - XML
- By default, VoiceXML 2.0 grammar rules are "private". This means that the rules can only be referenced within the same grammar file.
- To allow a grammar rule to be referenced from an external source, such as a VoiceXML document or another grammar, the rule needs to be scoped as public using the scope attribute.
    <!-- Public: can be referenced from outside the grammar -->
    <rule id="choice" scope="public">
      <!-- References a rule in the same grammar -->
      <ruleref uri="#studentname"/>
    </rule>
    <!-- Not public: can only be referenced by a rule in the same grammar -->
    <rule id="studentname">
      <one-of>
        <item> john </item>
        <item> rosemary </item>
      </one-of>
    </rule>

Grammar Headers - GSL
Inline:
    <grammar type="application/x-nuance-gsl">
External:
    ;GSL2.0
    ...grammar rule definitions...
- The header does not define a top-level rule
Referencing an external grammar, or a top-level rule in an external grammar:
    <grammar src="foo.gsl"/>
    <grammar src="foo.gsl#Month"/>

Grammar Headers - XML
Inline:
    <grammar type="application/srgs+xml" root="source" version="1.0">
      <!-- grammar rule(s) -->
    </grammar>
External:
    <?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
        "http://www.w3.org/TR/speech-grammar/grammar.dtd">
    <grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
        tag-format="semantics/1.0" mode="voice" root="transfer">
Note: the root rule for the grammar must be defined.

Grammar Scope
Grammar elements can be included within any VoiceXML element that receives user input:
- field
- link: for transitions to other documents, e.g. operator.vxml
- menu: grammar implicitly specified by the <choice> element
- form: for mixed-initiative dialogues
By default, the scope of a grammar is limited to the element in which it is defined.
- Scope can be set using the scope attribute, e.g. grammars defined within forms or menus can be given document scope.
- Grammars defined in the root document scope to the entire application.
Scope levels:
- Session: specified by platform developers, e.g. universal 'help'
- Application: specified as a <grammar> element within a <link> element in a root document; active in the root and child documents
- Document: specified by a <grammar> element with scope="document" within a <form> element in the document, or by a <link> element within the document's <vxml> element
- Dialog (form or menu): specified by a grammar with scope="dialog" within a <form> element (the default if no scope is specified), or by a <link> element in the <form> element
- Input item (within a form item): e.g. a <field> grammar
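The document, dialog and field levels can be seen side by side in a single file. The sketch below is illustrative only: the form ids, prompts and grammar phrases are assumptions, and operator.vxml is the example target mentioned on the slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">

  <!-- Document scope: a <link> directly under <vxml> is active in every
       dialog of this document -->
  <link next="operator.vxml">
    <grammar type="application/srgs+xml" version="1.0" mode="voice" root="op">
      <rule id="op"><item>operator</item></rule>
    </grammar>
  </link>

  <form id="students">
    <!-- Input item scope: the built-in digits grammar is active only
         while this field is being collected -->
    <field name="student_id" type="digits?length=6">
      <prompt>What is the six digit student number?</prompt>
    </field>
    <filled>
      <prompt>Looking up student <value expr="student_id"/>.</prompt>
    </filled>
  </form>

  <form id="reports">
    <!-- Form-level grammar raised from the default dialog scope to
         document scope, so saying "reports" in the other form moves here -->
    <grammar type="application/srgs+xml" version="1.0" mode="voice"
             root="rep" scope="document">
      <rule id="rep"><item>reports</item></rule>
    </grammar>
    <block>
      <prompt>This is the reports section.</prompt>
    </block>
  </form>
</vxml>
```

Placing the same <link> in an application root document instead would make it active in every document of the application.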

Using Grammar Effectively
- A grammar should cover effectively the range of responses that can be encountered to a prompt; this can include the essential input as well as extraneous words and phrases.
- A grammar that is too large will hinder speech processing and potentially lead to more misrecognitions.
- Scope is important: grammars should not overlap.
- Excessive use of global grammars (defined in the root document) can increase the possibility of overlapping.

Tutorial Exercise 1. Using tags
Integrate the following rule and its grammar into an application that takes in the name of a student and the name of a course and outputs the student's name along with a course code.
    <rule id="rule2" scope="public">
      <one-of>
        <item>
          <one-of>
            <item> comms </item>
            <item> communications </item>
          </one-of>
          <tag>$="01"</tag>
        </item>
        <item> algorithms <tag>$="02"</tag></item>
        <item> programming <tag>$="03"</tag></item>
        <item> databases <tag>$="04"</tag></item>
      </one-of>
    </rule>

DTMF
- DTMF (touch-tone) can be used as an alternative to speech input, particularly when speech recognition is unreliable or problematic.
- In VoiceXML 2.0, dtmf is a value of the mode attribute of the <grammar> element:
    <grammar mode="dtmf" type="application/srgs+xml" version="1.0" root="digit">
      <rule id="digit" scope="public">
        <one-of>
          <item> 1 <tag>$= "students" </tag> </item>
          <item> 2 <tag>$= "courses" </tag> </item>
          <item> 3 <tag>$= "reports" </tag> </item>
        </one-of>
      </rule>
    </grammar>

DTMF and/or speech in GSL
    Rating ( ?[ (i feel ?like) (it is ?a) (its ?a) ]
             [ [one dtmf-1]   { <numRating 1> }
               [two dtmf-2]   { <numRating 2> }
               [three dtmf-3] { <numRating 3> }
               ….
             ] )

DTMF after counts
Event counts (the count attribute) can be used, e.g. to give the user an opportunity to choose using speech, then advise use of the keypad if speech is unsuccessful:
    <nomatch count="1"> <reprompt/> </nomatch>
    <nomatch count="2"> please use your keypad </nomatch>
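Put together with a speech grammar and a DTMF grammar, the escalation above might look like the following field. This is a sketch, not a reference solution: it assumes a platform that accepts tag-format="semantics/1.0" (platforms using the $= tag form would need the tags rewritten), and the prompts and ids are invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="menu">
    <field name="choice">
      <prompt>Say students, courses, or reports.</prompt>

      <!-- Speech input -->
      <grammar mode="voice" type="application/srgs+xml" version="1.0"
               root="choice" tag-format="semantics/1.0">
        <rule id="choice" scope="public">
          <one-of>
            <item>students <tag>out = "students";</tag></item>
            <item>courses <tag>out = "courses";</tag></item>
            <item>reports <tag>out = "reports";</tag></item>
          </one-of>
        </rule>
      </grammar>

      <!-- Keypad input, returning the same values as the speech grammar -->
      <grammar mode="dtmf" type="application/srgs+xml" version="1.0"
               root="digit" tag-format="semantics/1.0">
        <rule id="digit" scope="public">
          <one-of>
            <item>1 <tag>out = "students";</tag></item>
            <item>2 <tag>out = "courses";</tag></item>
            <item>3 <tag>out = "reports";</tag></item>
          </one-of>
        </rule>
      </grammar>

      <!-- First failure: simply ask again -->
      <nomatch count="1">
        Sorry, I did not catch that.
        <reprompt/>
      </nomatch>
      <!-- Second and later failures: steer the caller to the keypad -->
      <nomatch count="2">
        Please use your keypad. Press 1 for students, 2 for courses,
        or 3 for reports.
      </nomatch>

      <filled>
        <prompt>You chose <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Because both grammars return words rather than digits, the confirmation prompt reads the same regardless of the input mode.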

Tutorial Exercise 2: DTMF and speech
- Create a file with choices (student details | course details | reports) that allows speech as well as DTMF input.
- Include a nomatch (or noinput) event that asks the user to use the keypad the second time that speech input is unsuccessful.
- The system should confirm with words rather than DTMF.
Fragments to start from:
    <grammar mode="dtmf" type="application/srgs+xml" version="1.0" root="digit">
      <rule id="digit" scope="public">
        <one-of>
          <item> 1 <tag>$= "student details" </tag> </item>

    <grammar type="application/srgs+xml" root="choice" version="1.0">
      <rule id="choice" scope="public">
        <item> student details <tag>$= "student details" </tag> </item>

Built-In Grammars
Built-in grammars are provided in VoiceXML:
- boolean (true or false; in DTMF, 1 is true and 2 is false)
- date
- digits (e.g. "three four seven")
- currency
- number (e.g. "three hundred and forty seven")
- phone
- time
A built-in grammar is specified with the type attribute of the <field> element:
    <field name="age" type="number">
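Several built-in types can be combined in one form. The sketch below uses hypothetical field names and prompts; note also that the set of built-in grammars actually available, and the phrases they accept, depend on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="payment">
    <field name="due" type="date">
      <prompt>On what date should the payment be made?</prompt>
    </field>
    <field name="amount" type="currency">
      <prompt>How much would you like to pay?</prompt>
    </field>
    <field name="ok" type="boolean">
      <prompt>Shall I go ahead?</prompt>
    </field>
    <!-- Runs once all three fields have been filled -->
    <filled mode="all" namelist="due amount ok">
      <prompt>Payment details received.</prompt>
    </filled>
  </form>
</vxml>
```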

Built-In Grammar: Digits
Digit recognition is performed in VoiceXML by using a built-in grammar for digits that is declared as a field type. For example:
    <field name="pin" type="digits">
- The user can say one or more digits between 0 and 9, and the result will be a string of digits.
- If the field value is used in a prompt, it will be spoken as a sequence of digits, e.g. "one five six four".
You can also parameterise the digits built-in grammar as follows:
- digits?minlength=n : a string of at least n digits
- digits?maxlength=n : a string of at most n digits
- digits?length=n : a string of exactly n digits
For example:
    <field type="digits?minlength=3;maxlength=5">

Digits grammar example
    <form>
      <field name="pin" type="digits?length=4">
        <prompt>what is your pin?</prompt>
      </field>
      <block>
        <prompt>
          Confirming your pin is
          <say-as interpret-as="vxml:digits"><value expr="pin"/></say-as>
        </prompt>
      </block>
    </form>

Built-in grammar: boolean
- The boolean grammar contains ways of saying 'yes' or 'no'.
- The particular words within the boolean grammar depend on the 'locale', i.e. the language type, e.g. US English, UK English, etc.
- The words may also vary from one platform to another. IBM Voice Toolkit, UK English:
  - yes, true, positive, right, ok, sure, affirmative, check, yep, correct
  - no, false, negative, wrong, not, nope, incorrect
- The return value sent is a boolean true or false.
- If the field name is subsequently used in a value element within a prompt, the TTS engine will speak either yes or no.
- Users can also provide DTMF input: 1 is yes, and 2 is no.

Boolean grammar example
    <form scope="dialog">
      <field name="pin" type="digits?length=4" modal="false">
        <prompt> what is your pin? </prompt>
      </field>
      <field name="confirm" type="boolean" modal="false">
        <prompt>
          Please confirm your pin is
          <say-as interpret-as="vxml:digits"><value expr="pin"/></say-as>
        </prompt>
      </field>
    </form>
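The example stops once the confirmation has been collected. One way to act on the answer, sketched here with an assumed <filled> handler and invented prompt wording, is to clear both fields when the caller says no so that the form asks for the pin again.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="get_pin">
    <field name="pin" type="digits?length=4">
      <prompt>What is your pin?</prompt>
    </field>
    <field name="confirm" type="boolean">
      <prompt>
        Please confirm your pin is
        <say-as interpret-as="vxml:digits"><value expr="pin"/></say-as>
      </prompt>
      <filled>
        <if cond="confirm">
          <prompt>Thank you.</prompt>
        <else/>
          <!-- Answer was "no": forget both values so the form
               interpretation algorithm revisits the pin field -->
          <prompt>Let's try again.</prompt>
          <clear namelist="pin confirm"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```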

Sample input for built-in field types
- currency: three twenty five; sixteen dollars and fifty seven cents; ten dollars; nine million two hundred thousand dollars
- date: may fifth; march; the thirty first of december two thousand; yesterday; today; tomorrow
- phone: seven three five eight four nine zero; two one two four nine six two seven oh six

Sample input for UK English built-in field types (continued)
- number: ten million five hundred thousand and fifty three; minus one point five; plus one point five; point seven
- digits: zero, oh, one, two, three, four, five, six, seven, eight, nine
- time: one o'clock; five past one; three fifteen; seven thirty; half past eight; oh four hundred hours; sixteen fifty; twelve noon; midnight

Tutorial Exercise 3. Built-in grammars
Aim: to include built-in grammars
- Create an application in which the user has to speak their account number, which consists of 6 digits (use the built-in digits grammar).
- Extend the application with other built-in grammars, such as date.
- Experiment with the use of the DTMF simulator to enter the values for account number, date, etc.