AN EXTENSIBLE TRANSCODER FOR HTML TO VOICEXML CONVERSION
by Narayanan Annamala, Gopal Gupta, B. Prabhakaran
DEPARTMENT OF COMPUTER SCIENCE, THE UNIVERSITY OF TEXAS AT DALLAS

Overview
• Goal: make information accessible to visually impaired individuals.
• Screen readers work well, but are not completely voice/audio based.
• Screen readers are a workaround for a technology that was not created with blind people in mind.
• We should leverage a technology that is more easily usable by blind individuals: VoiceXML.
• Web access via voice has become important for other reasons as well, e.g., cell phones.

VoiceXML
• VoiceXML is an XML language for marking up voice/audio data.
• A standard developed by the VoiceXML Forum (AT&T, Motorola, IBM, Lucent); now a W3C standard.
• VoiceXML is a markup language for creating telephone-based human-computer interfaces.
• VoiceXML pages are "browsed" via a voice browser running on a computer.
• Users interact with a VoiceXML page through spoken input or telephone key presses.
• The voice browser plays audio files and speech synthesized by TTS (text-to-speech) converters.

Sample VXML
<![CDATA[[ [(yes)]{ } [(no)] } ]]]>
Would you like to get rich quick?
Gotcha. You want to be rich!
You don't want to be rich.

HTML to VXML
• To make the web accessible via VoiceXML, all web pages would need to be rewritten in VXML.
• The natural solution is to perform this translation automatically.
• The objective of our research is to develop a translator that converts HTML to VoiceXML.
• HTML pages can then be translated to VXML and browsed via voice/audio on a voice browser.

Application of the Transcoder
[Figure. Components: Mobile User, PSTN, Voice Server, Transcoder, Internet, Web Server. Labeled flows: Req., http req., html, VoiceXML, Audio.]

Application of the Transcoder
[Figure. Components: Client, Voice Browser, Transcoder, Internet, Web Server. Labeled flows: http req., HTML, VXML, HTML, Audio.]

Application of the Transcoder
[Figure. Components: Client, Internet, Web Server, Transcoder, Voice Browser. Labeled flows: http req., HTML, Audio, VXML.]

Transcoder: Objectives
• Provide a means for the visually impaired to access the Web.
• Strive to express the structure of HTML pages in voice form.
• Allow the application to be customized to the user's preferences.
• Make the transcoder extensible, so that it can accommodate new HTML tags in the future.

VoiceXML Example
HTML file and corresponding VoiceXML file, shown side by side. Both carry the title "Sample Page" and the text "The output is in the form of audio". The VoiceXML file begins with the text "starting of the vxml page".

HTML vs VoiceXML
HTML:
1. Single unit, presented with full efficiency.
2. Displays several inputs at the same time.
3. Input does not need any grammar for validation.
VoiceXML:
1. Consists of forms and blocks alone.
2. Inputs are collected sequentially.
3. Every input needs a grammar for validation.

Assumptions
The input HTML file needs to comply with the following rules:
• Every open tag should have a corresponding close tag.
• The input file should be error free.
• The file should use only the tags that are specified in the HTML standard. Some browsers insert special characters during editing.

System Model
The application is realized in two phases:
I. Parsing Phase
II. Translation Phase
Parsing Phase: The input HTML file is parsed and the HTML node tree is obtained as output. The parser used for this purpose is the Web-Wise Systems HTML parser.
Translation Phase: Each HTML node is converted into the corresponding VoiceXML node.

System Architecture
[Figure. Components: Input Provider, Parser, Translator, Internal data sheet, External data sheet, Output VoiceXML file.]

Parsing Phase
The structure of the HTML file should be carried over to the VoiceXML file.
The HTML file is parsed and the root node of the input file is obtained.
The root node of any HTML file is the <html> node.

Parsing Example
Input HTML file: a page titled "Example 1" whose body holds a centered H1 heading, "Hello World".
Output parse tree:

    (htmlRoot = new RootNode()).addNode(
        new PageNode().addNode(
            new HeadNode().addNode(
                new TitleNode().addNode(
                    new StringNode().setHtmlData("Example1"))
            ) // end TitleNode
        ) // end HeadNode
        .addNode(
            new BodyNode().addNode(
                new H1Node().setAlign("center").addNode(
                    new StringNode().setHtmlData("Hello World"))
            ) // end H1Node
        ) // end BodyNode
    ); // end PageNode

Translating Phase: Issues
Translating phase: The node tree is traversed recursively (left to right, depth first), and each HTML node is converted to the appropriate VoiceXML node.
Issues:
• Inputs must be verified before submission, which differs from HTML.
• VoiceXML is highly structured and follows strict conventions. For example, an element containing the text "It is a beautiful city" is syntactically correct, but it can only be a child of a field or block.
• One-to-one conversion is not always possible.
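The recursive, left-to-right, depth-first walk can be sketched in Java roughly as follows. This is a minimal illustration, not the transcoder's actual code: HtmlNode and DepthFirstTranslator are hypothetical names, the "#text" tag convention is an assumption, and the sketch simply speaks every text node inside a block so that the output contains only forms and blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed HTML node; the real parser's node
// classes are not reproduced here.
class HtmlNode {
    final String tag;   // e.g. "body", "h1", or "#text" for a string node
    final String text;  // only meaningful for "#text" nodes
    final List<HtmlNode> children = new ArrayList<>();

    HtmlNode(String tag, String text) {
        this.tag = tag;
        this.text = text;
    }

    HtmlNode add(HtmlNode child) {
        children.add(child);
        return this;
    }
}

class DepthFirstTranslator {
    // Visits a node, then its children from left to right (depth first).
    // Text nodes become a <block> with a <prompt>; other nodes only recurse.
    static void translate(HtmlNode node, StringBuilder out) {
        if (node.tag.equals("#text")) {
            out.append("<block><prompt>")
               .append(escape(node.text))
               .append("</prompt></block>\n");
            return;
        }
        for (HtmlNode child : node.children) {
            translate(child, out);
        }
    }

    // Escapes the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps the translated blocks in a single form inside a vxml document.
    static String toVoiceXml(HtmlNode root) {
        StringBuilder out = new StringBuilder("<vxml version=\"2.0\">\n<form>\n");
        translate(root, out);
        return out.append("</form>\n</vxml>\n").toString();
    }
}
```

Because children are visited in document order, text in the head (such as the title) is spoken before text in the body, which mirrors the order of the parse tree.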

Translation Logic
The entire VXML page should contain only blocks and forms.
HTML forms and VoiceXML forms differ mainly in submission method and form declaration.
Automatic name generation is required for VXML forms.
Forms are used for collecting inputs from the user.
Input can be obtained through more than one type of control.

Forms: radio tag
Radio tags provide choices, of which the user selects one.
When one choice is selected, the others become inactive.
In HTML, radio tags do not have a closing tag.
The challenge is to identify the last radio button of the same group.
Example, input HTML section: radio buttons "Male" and "Female", followed by the heading "End of Radio".

Forms: radio tag (contd.)
Output VoiceXML section:
……
Please select an Entrée, what sex
Male
Female
…….
[Figure. Node tree: Form node, Radio: male sex, Radio: female sex, h1, String: 'end of radio'.]
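One way to find where a set of radio buttons ends is to group the buttons by their name attribute and emit one VoiceXML field per group. The Java sketch below assumes that approach; RadioButton and RadioTranslator are hypothetical names, the prompt wording is illustrative, and the sketch assumes names, values, and labels contain no XML special characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of one HTML radio input: its name, value, and visible label.
class RadioButton {
    final String name;
    final String value;
    final String label;

    RadioButton(String name, String value, String label) {
        this.name = name;
        this.value = value;
        this.label = label;
    }
}

class RadioTranslator {
    // Groups buttons by name, keeping first-seen order, so that each group
    // can become a single VoiceXML field.
    static Map<String, List<RadioButton>> groupByName(List<RadioButton> buttons) {
        Map<String, List<RadioButton>> groups = new LinkedHashMap<>();
        for (RadioButton b : buttons) {
            groups.computeIfAbsent(b.name, k -> new ArrayList<>()).add(b);
        }
        return groups;
    }

    // Emits a field with one <option> per radio button in the group.
    static String toField(String name, List<RadioButton> group) {
        StringBuilder sb = new StringBuilder();
        sb.append("<field name=\"").append(name).append("\">\n");
        sb.append("  <prompt>Please choose: ").append(name).append("</prompt>\n");
        for (RadioButton b : group) {
            sb.append("  <option value=\"").append(b.value).append("\">")
              .append(b.label).append("</option>\n");
        }
        sb.append("</field>\n");
        return sb.toString();
    }
}
```

Using option elements lets the voice browser derive the grammar from the listed choices, so no separate grammar has to be written for the field.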

Form: Text Box
Text boxes and text areas are used to obtain string inputs from the user.
A string input has no fixed sample space, e.g., the name of a person.
VoiceXML inputs always need a grammar.
The <record> element is used to solve the problem. The user can specify the record time and other attributes.
<submit> needs a list of fields and a URL for submission.
Inputs should be verified with the user before submission.

Form: text box (contd.)
Sample HTML extract:
……. Firstname ……..
Corresponding VoiceXML extract:
At tone, speak First name:
I did not hear anything, please try again
Your input is
…….
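A text box could be turned into a VoiceXML recording along the following lines. This Java sketch is an assumption about the shape of the output rather than the transcoder's real emitter: TextBoxTranslator and its parameters are hypothetical, the prompts reuse the wording from the slide, and the field name and label are assumed to contain no XML special characters.

```java
class TextBoxTranslator {
    // Builds a <record> item for a text box. maxSeconds is the recording
    // limit, which the user is assumed to be able to configure.
    static String toRecord(String fieldName, String label, int maxSeconds) {
        return "<record name=\"" + fieldName + "\" beep=\"true\" maxtime=\""
                + maxSeconds + "s\">\n"
             + "  <prompt>At tone, speak " + label + "</prompt>\n"
             + "  <noinput>I did not hear anything, please try again</noinput>\n"
             // Playing the recording back lets the user verify the input
             // before it is submitted.
             + "  <filled><prompt>Your input is <value expr=\"" + fieldName
                + "\"/></prompt></filled>\n"
             + "</record>\n";
    }
}
```

For example, `TextBoxTranslator.toRecord("firstname", "First name", 10)` yields a recording item limited to ten seconds that plays the captured audio back once it is filled.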

Links
In HTML, links are given by the <a> tag in two ways:
• To a different part of the same document.
• To a different document altogether.
In VXML, links are handled as follows:
• To internal targets: sub-dialogs are created. A sub-dialog is like a function call.
• To external documents: the target HTML URL is converted to a VoiceXML page, and the resulting VoiceXML URL is provided.
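The two link cases can be told apart by looking at the href value. In the Java sketch below, a fragment-only href is treated as internal and mapped to a subdialog, while anything else is routed back through the transcoder. LinkTranslator, the TRANSCODER_URL endpoint, the "call_" name prefix, and the use of goto for external targets are all assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class LinkTranslator {
    // Hypothetical address of a transcoder service that returns VoiceXML
    // for the HTML page named in the url parameter.
    static final String TRANSCODER_URL = "http://transcoder.example/convert?url=";

    // Maps an HTML href to a VoiceXML element.
    static String translate(String href) {
        if (href.startsWith("#")) {
            // Internal link: invoke the dialog for that anchor like a function call.
            // The anchor name is assumed to be a valid variable name.
            String anchor = href.substring(1);
            return "<subdialog name=\"call_" + anchor + "\" src=\"#" + anchor + "\"/>";
        }
        // External link: point at the transcoded version of the target page.
        return "<goto next=\"" + TRANSCODER_URL + encode(href) + "\"/>";
    }

    // Percent-encodes the target URL so it can travel as a query parameter.
    static String encode(String url) {
        try {
            return URLEncoder.encode(url, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```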

Table
In HTML, tables are used to present information in tabular form.
A table contains rows and columns, and rows may contain tables, so nested tables are possible.
The information is text, so it can be read out.
Our system maintains table numbers, row numbers, and column numbers, and differentiates row and column headings.
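Keeping table, row, and column numbers makes it possible to announce where each cell sits before reading it. The Java sketch below shows one possible wording for a flat (non-nested) table; TableReader, its parameters, and the sentence format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TableReader {
    // Produces one spoken line per cell, giving the table number, the
    // 1-based row and column numbers, and the column heading if present.
    static List<String> speak(int tableNumber, String[] columnHeadings, String[][] rows) {
        List<String> lines = new ArrayList<>();
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < rows[r].length; c++) {
                String heading = c < columnHeadings.length
                        ? columnHeadings[c]
                        : "no heading";
                lines.add("Table " + tableNumber
                        + ", row " + (r + 1)
                        + ", column " + (c + 1)
                        + ", " + heading + ": " + rows[r][c]);
            }
        }
        return lines;
    }
}
```

Each returned line could then be placed in its own prompt, so the listener always knows which cell is being read.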

Frames
Frames are an integral part of HTML.
The source HTML contains only links to other HTML pages (each link is a separate frame).
Limitation of the oral medium: all frames cannot be spoken simultaneously.
Transition to frames is provided through a VoiceXML element.
HTML URLs are converted into VoiceXML pages.
All frame URLs are stored in a separate array and transcoded to VoiceXML recursively.

Text Display Tags
Tags used for display do not make much sense in VoiceXML.
The function of some display tags can be spoken out orally.
……. and ……. are tags used to speak out the text enclosed between them.
The content to be spoken can be tailored using the Interface sheet.
The Interface sheet is also used to add new HTML tags, making the system extensible.

Extensible Feature of Transcoder
A. Input Attributes
   Input duration in seconds for Text-box:
   Input duration in seconds for Text-Area:
   ………….
B. HTML Tags / Corresponding Text spoken
   ………… / Starting of text quoted from elsewhere
   ………… / Ignore
   …………..
Row A: Input attributes can be supplied by the user.
Row B: Treatment of HTML tags can be altered or ignored. New tags can be added in this section.
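The tag section of the sheet behaves like a lookup table from HTML tag to spoken text. The Java sketch below models it as a map; InterfaceSheet and its methods are hypothetical, and the tag names in the usage note are illustrative because the slide does not show which tags the sample rows belong to.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class InterfaceSheet {
    // Marker meaning that a tag should produce no spoken text at all.
    static final String IGNORE = "Ignore";

    private final Map<String, String> spokenText = new HashMap<>();

    // Adds or changes the treatment of a tag; tag names are case-insensitive.
    void put(String tag, String text) {
        spokenText.put(tag.toLowerCase(Locale.ROOT), text);
    }

    // Returns the text to speak when the tag is met, or an empty string
    // when the tag is marked Ignore or is not listed in the sheet.
    String announce(String tag) {
        String text = spokenText.get(tag.toLowerCase(Locale.ROOT));
        if (text == null || text.equals(IGNORE)) {
            return "";
        }
        return text;
    }
}
```

Supporting a new tag then only requires a new entry, for example `sheet.put("blockquote", "Starting of text quoted from elsewhere")`, with no change to the translation code.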

Conclusion
Our transcoder is capable of converting any HTML file (version 4.0 or lower) to a corresponding VoiceXML file.
Prominent features of the transcoder: extensibility and user interactivity.
HTML-to-VoiceXML conversion paves the way for anytime, anywhere Internet access for the visually impaired (as well as cell phone users).

Future Work
• Process applets and scripts that may be present in the input HTML page.
• Build a true voice-based web (see next talk).

Related Work
• The visually impaired have used screen readers.
• F. James proposed the Auditory HTML Access System (AHA), which used distinct tones.
• Neither of the above two systems has an interactive feature.
• Goose et al. proposed an HTML to VoXML converter. VoXML is the ancestor of VoiceXML.
