Listener Controlled Navigation of VoiceXML Documents. Gopal Gupta, N. Annamalai, H. Reddy. Dept. of Computer Science, UT Dallas
VoiceWEB VoiceXML: the open-standard language for serving voice/audio documents. Voice/audio documents can be browsed using a voice browser with a speaker and microphone, or using a regular phone. Voice browser : VoiceXML :: Browser : HTML
VoiceXML (Cont’d) VoiceXML allows scripts, CGIs, etc. It can take input from the listener via speech (filling out forms, as in HTML). It is used extensively for automated call handling and makes information accessible over (cell) phones. The next revolution on the Web.
Systems Developed by Our Lab Our lab (ALPS Lab, UTD) has developed two systems to automatically convert HTML to dynamic VoiceXML: A. HTML to VoiceXML Transcoder (in Java; initial prototype in Prolog). B. Dynamic VoiceXML Generator (using dynamic SRGS grammar).
HTML to VoiceXML Transcoder An HTML file cannot be converted on a tag-by-tag or sentence-by-sentence basis. The structure of the HTML file should be carried over to the VoiceXML file. The HTML file is parsed and the root node of the input file is obtained. Any HTML file's root node would be the <html> node. Transcoding is done in two phases: (i) the HTML file is parsed into a document tree; (ii) the nodes of the tree are converted to VoiceXML using a mapping function.
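The two phases can be sketched in Java as follows. This is a minimal illustration, not the Transcoder's actual code: the `Node` class, the `toVoiceXml` mapping function, and the decision to speak every text node as a prompt inside one `<form>`/`<block>` while skipping the head are all assumptions made for the example. Phase one is represented by a hand-built tree rather than a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical document-tree node; names are illustrative only. */
final class Node {
    final String tag;   // e.g. "html", "body", "h1", or "#text"
    final String text;  // only set for "#text" nodes
    final List<Node> children = new ArrayList<>();

    Node(String tag, String text) { this.tag = tag; this.text = text; }

    static Node element(String tag, Node... kids) {
        Node n = new Node(tag, null);
        for (Node k : kids) n.children.add(k);
        return n;
    }

    static Node text(String text) { return new Node("#text", text); }
}

final class TranscoderSketch {
    /** Phase 2: walk the tree from the root and emit VoiceXML. */
    static String toVoiceXml(Node root) {
        StringBuilder out = new StringBuilder();
        out.append("<vxml version=\"2.0\" xmlns=\"http://www.w3.org/2001/vxml\">\n");
        out.append("  <form id=\"page\">\n    <block>\n");
        mapNode(root, out);
        out.append("    </block>\n  </form>\n</vxml>\n");
        return out.toString();
    }

    private static void mapNode(Node n, StringBuilder out) {
        switch (n.tag) {
            case "head":
                return; // assumption: head content (title, meta) is not spoken
            case "#text":
                out.append("      <prompt>").append(escape(n.text)).append("</prompt>\n");
                return;
            default:
                // Structural nodes contribute only through their children.
                for (Node child : n.children) mapNode(child, out);
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Phase 1 stand-in: the tree a parser would build for a tiny page.
        Node root = Node.element("html",
            Node.element("head", Node.element("title", Node.text("Example 1"))),
            Node.element("body", Node.element("h1", Node.text("Hello World"))));
        System.out.println(toVoiceXml(root));
    }
}
```

Because the mapping works on the tree rather than on individual tags, decisions such as skipping the head or grouping a section under one form can be made with the surrounding structure in view.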
HTML Parsing Sample
Input HTML file: a page titled "Example 1" whose body contains a centered H1 heading, "Hello World".
Document tree built by the parser:
(htmlRoot = new RootNode()).addNode(
    new PageNode()
        .addNode(new HeadNode()
            .addNode(new TitleNode()
                .addNode(new StringNode().setHtmlData("Example1"))
            ) // end TitleNode
        ) // end HeadNode
        .addNode(new BodyNode()
            .addNode(new H1Node().setAlign("center")
                .addNode(new StringNode().setHtmlData("Hello World"))
            ) // end H1Node
        ) // end BodyNode
); // end PageNode
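For the chained calls in this sample to compile, each `addNode` and setter must return the node it was called on. The source does not show the node classes themselves, so the definitions below are a hypothetical sketch that merely reuses the class and method names from the sample. The `getChildren`, `getAlign`, and `getHtmlData` accessors are added for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class; the real Transcoder's node classes are not shown in the source. */
class HtmlNode {
    private final List<HtmlNode> children = new ArrayList<>();

    /** Returns this node (not the child) so that calls can be chained. */
    HtmlNode addNode(HtmlNode child) {
        children.add(child);
        return this;
    }

    List<HtmlNode> getChildren() { return children; }
}

class RootNode extends HtmlNode {}
class PageNode extends HtmlNode {}
class HeadNode extends HtmlNode {}
class TitleNode extends HtmlNode {}
class BodyNode extends HtmlNode {}

class H1Node extends HtmlNode {
    private String align;

    /** Returns H1Node so that addNode can follow in the same chain. */
    H1Node setAlign(String align) { this.align = align; return this; }
    String getAlign() { return align; }
}

class StringNode extends HtmlNode {
    private String htmlData;

    StringNode setHtmlData(String data) { this.htmlData = data; return this; }
    String getHtmlData() { return htmlData; }
}
```

Returning the parent from `addNode` is what lets a whole subtree be written as one nested expression whose shape mirrors the HTML it represents.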
Translation Logic The entire VXML page should contain only blocks and forms. The basic differences between an HTML form and a VoiceXML form are the submission method and the form declaration. Automatic name generation is required for VXML forms. Forms are used for collecting input from the user. Input can be of more than one type.
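Automatic name generation can be illustrated with a small counter-based helper. The class name `VxmlNameGenerator` and the `prefix_N` naming scheme are assumptions for this sketch; the source only states that generated names are required, not how they are formed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: hands out unique names for generated VoiceXML forms and fields. */
final class VxmlNameGenerator {
    private final Map<String, Integer> counters = new HashMap<>();

    /** Returns prefix_1, prefix_2, ... with an independent counter per prefix. */
    String next(String prefix) {
        int n = counters.merge(prefix, 1, Integer::sum);
        return prefix + "_" + n;
    }
}
```

A transcoder could call `next("form")` each time it emits a `<form>` element and `next("field")` for each input that has no usable name of its own, which keeps identifiers unique within the generated page.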
Dynamic VoiceXML Generator The following slides introduce the Dynamic VoiceXML Generator (DVG). The static VoiceXML output by the Transcoder is fed as input to the DVG. The DVG adds dynamic grammar and direction-control elements to the static VoiceXML document to enable anchoring and recalling of marks. The dynamic VoiceXML document allows the user to mark a portion of the document with any user-specified name.
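One way to picture a dynamic grammar is as an SRGS rule that is regenerated whenever the set of marks changes. The sketch below is an assumption about the general approach, not the DVG's actual output: the class name, the rule id `anchor`, and the `en-US` language tag are chosen for the example.

```java
import java.util.List;

/** Hypothetical generator for an SRGS XML grammar that accepts the current mark names. */
final class AnchorGrammarSketch {
    /** Builds a grammar whose single rule matches any one of the given labels. */
    static String buildGrammar(List<String> labels) {
        if (labels.isEmpty()) {
            // SRGS requires at least one alternative inside <one-of>.
            throw new IllegalArgumentException("at least one label is required");
        }
        StringBuilder g = new StringBuilder();
        g.append("<grammar xmlns=\"http://www.w3.org/2001/06/grammar\" version=\"1.0\"")
         .append(" xml:lang=\"en-US\" mode=\"voice\" root=\"anchor\">\n");
        g.append("  <rule id=\"anchor\">\n    <one-of>\n");
        for (String label : labels) {
            g.append("      <item>").append(escape(label)).append("</item>\n");
        }
        g.append("    </one-of>\n  </rule>\n</grammar>\n");
        return g.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Rebuilding the grammar from the current list of marks keeps recognition restricted to a small, known vocabulary even though the names themselves are chosen by the listener.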
Problem with VoiceXML Navigation of the voice document is completely controlled by the page author. After each dialog (form), the author has to ask where the listener would like to go next. The listener has absolutely no control over navigation. This is tedious, and advanced applications are not possible. Analogy: a scroll vs. a book.
Our Solution: Voice Anchors Voice anchors are speech labels that listeners can place on a dialog. A listener can return to that dialog later by uttering the label. This concept is hard to implement, as free-form speech recognition is not possible. It needs to be incorporated in the voice browser.
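At its core, a voice anchor is a mapping from a spoken label to a dialog. The sketch below shows only that bookkeeping, under the assumption that the recognizer has already turned the utterance into text. `VoiceAnchorRegistry`, `place`, and `recall` are hypothetical names, and the case-insensitive matching is a choice made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical registry mapping listener-chosen labels to dialog (form) ids. */
final class VoiceAnchorRegistry {
    private final Map<String, String> anchorToDialog = new HashMap<>();

    /** Attaches a label to the dialog the listener is currently hearing. */
    void place(String label, String dialogId) {
        anchorToDialog.put(normalize(label), dialogId);
    }

    /** Looks up the dialog for a label; empty if no such anchor was placed. */
    Optional<String> recall(String label) {
        return Optional.ofNullable(anchorToDialog.get(normalize(label)));
    }

    private static String normalize(String label) {
        return label.trim().toLowerCase(Locale.ROOT);
    }
}
```

For example, after `place("Pricing", "form_3")`, a later `recall("pricing")` yields `form_3`, which the browser could use as the target of a jump.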
System Architecture
Voice Anchors We have developed a number of methods for attaching voice anchors. Most practical method: via spelling. The user can then state the anchor as a whole word to return to the labeled dialog. The system can also have default anchors (turning a scroll into a book) and a number of default navigation strategies, e.g. skimming section headings first.
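Spelling sidesteps free-form recognition because only single letters need to be recognized. The sketch below assumes the recognizer delivers one token per spoken letter and shows how those tokens could be joined into an anchor name. The class and method names are hypothetical, and restricting labels to the letters a to z is an assumption of the example.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that assembles an anchor label from individually recognized letters. */
final class SpelledAnchorSketch {
    /** Joins single-letter tokens into a lowercase label, rejecting anything else. */
    static String fromSpelling(List<String> letters) {
        StringBuilder word = new StringBuilder();
        for (String token : letters) {
            String t = token.trim().toLowerCase(Locale.ROOT);
            if (t.length() != 1 || t.charAt(0) < 'a' || t.charAt(0) > 'z') {
                throw new IllegalArgumentException("not a single letter: " + token);
            }
            word.append(t);
        }
        if (word.length() == 0) {
            throw new IllegalArgumentException("empty spelling");
        }
        return word.toString();
    }
}
```

The resulting word can then be added to the set of labels the browser listens for, so that the listener can later say it as a whole word.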
Applications Our system finds application in the following two scenarios: (1) the result of a database query is a plain VoiceXML document and the listener wishes to navigate through it; (2) a mobile user wishes to navigate through a textually rich HTML document while driving.