Multimodal user interfaces: Implementation
Chris Vandervelpen
Overview
–Introduction
–VoiceXML
–X + V
–From models to X + V
–Demo: ACCESS NetFront
–Conclusions
–Questions
Introduction
Focus on speech and direct manipulation on mobile devices
How can we deploy a multimodal UI?
–Build our own framework, using speech synthesizers/recognizers, that interprets the designed models (reinventing the wheel)
–Build software that generates standardized markup from the models (use existing technologies): start point
VoiceXML
Markup language for speech-only interfaces
Telephone interfaces
Using grammars for speech recognition
–Java Speech Grammar Format (JSGF)
–Nuance Grammar Specification Language (NGSL)
Speech output
–Synthesis
–Prerecorded audio
VoiceXML
<![CDATA[ #JSGF V1.0; grammar cities; = brussels | antwerp | amsterdam; ]]>
What departure city do you like?
For example, brussels, antwerp or amsterdam
Your departure city is ………
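The field above can also be produced by software, in line with the "generate markup from the models" approach. The following is a minimal Python sketch, not the author's tool. It assumes a VoiceXML 2.0 document with an inline JSGF grammar; the rule name `<city>`, the form id `flight`, the media type `application/x-jsgf`, the use of `<help>` for the example sentence, and the function name `voicexml_city_field` are all assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

VXML_NS = "http://www.w3.org/2001/vxml"


def voicexml_city_field(field_name, cities, question):
    """Return a VoiceXML 2.0 document with one field and an inline JSGF grammar.

    field_name is assumed to be a plain identifier, cities a non-empty list
    of single-word city names.
    """
    # Assumption: the rule is called <city>; the slide's rule name is unknown.
    jsgf = "#JSGF V1.0; grammar cities; public <city> = " + " | ".join(cities) + ";"
    if len(cities) > 1:
        example = ", ".join(cities[:-1]) + " or " + cities[-1]
    else:
        example = cities[0]
    # Assumption: the platform accepts inline JSGF under this media type.
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="{VXML_NS}">
  <form id="flight">
    <field name="{escape(field_name)}">
      <grammar type="application/x-jsgf"><![CDATA[ {jsgf} ]]></grammar>
      <prompt>{escape(question)}</prompt>
      <help>For example, {escape(example)}</help>
      <filled>
        <prompt>Your departure city is <value expr="{escape(field_name)}"/></prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


if __name__ == "__main__":
    doc = voicexml_city_field(
        "departure_city",
        ["brussels", "antwerp", "amsterdam"],
        "What departure city do you like?",
    )
    # Sanity check: the generated document must at least be well-formed XML.
    ET.fromstring(doc.encode("utf-8"))
    print(doc)
```

Keeping the markup in a visible template makes it easy to compare the generated document with a hand-written one; a real generator would take the field name, choices and prompts from the UI model.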
VoiceXML
Mixed-initiative forms
–A single user utterance can fill several fields
–Supports more natural language
For example
–I want to fly from “brussels” to “amsterdam”
–Fills in the departure_city and destination_city fields
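The idea of one utterance filling several fields can be illustrated with a toy slot filler. This Python sketch is not how a VoiceXML interpreter works internally (there, a form-level grammar returns the slot values); it only mimics the observable behaviour with regular expressions. The function names `fill_fields` and `next_prompt` and the fixed city list are assumptions for illustration.

```python
import re

# Assumption: the same three cities as in the grammar example.
CITIES = ("brussels", "antwerp", "amsterdam")
CITY = "(" + "|".join(CITIES) + ")"


def fill_fields(utterance, fields=None):
    """Fill departure_city and destination_city from a single utterance.

    Fields that the utterance does not mention keep their previous value.
    """
    fields = dict(fields or {"departure_city": None, "destination_city": None})
    text = utterance.lower()
    departure = re.search(r"\bfrom " + CITY + r"\b", text)
    if departure:
        fields["departure_city"] = departure.group(1)
    destination = re.search(r"\bto " + CITY + r"\b", text)
    if destination:
        fields["destination_city"] = destination.group(1)
    return fields


def next_prompt(fields):
    """Return a prompt for the first field that is still empty, or None."""
    for name, value in fields.items():
        if value is None:
            return f"Please say your {name.replace('_', ' ')}."
    return None


if __name__ == "__main__":
    filled = fill_fields("I want to fly from brussels to amsterdam")
    print(filled, next_prompt(filled))
    partial = fill_fields("to antwerp please")
    print(partial, next_prompt(partial))
```

When only some fields are filled, the dialog falls back to prompting for the remaining ones one at a time, which is the system-directed part of a mixed-initiative form.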
X + V
–XHTML: visual channel
–VoiceXML snippets: speech channel
Synchronization between modalities using XML Events
Multimodal browsers supporting X+V
–ACCESS NetFront multimodal browser (PocketPC)
–Opera
al/x+v/12/
X + V
<input id="to" name="to" size="20" ev:event="inputfocus" ev:handler="#voice_city_to" />
X + V
<![CDATA[ #JSGF V1.0; grammar cities; = brussels | antwerp | amsterdam; ]]>
What departure city do you like?
For example, brussels, antwerp or amsterdam
…….
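To show how the visual input and the voice snippet fit into one page, here is a hedged Python sketch that emits a small X+V-style document and checks that every `ev:handler` points at an element that exists. The namespace URIs are the standard XHTML, VoiceXML and XML Events ones; the `inputfocus` event and the handler id come from the slide. The page layout, the voice field name, and the function names `xv_page` and `dangling_handlers` are assumptions, and the step that copies the recognized value back into the input is left out.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"
VXML_NS = "http://www.w3.org/2001/vxml"
EV_NS = "http://www.w3.org/2001/xml-events"


def xv_page(input_id, handler_id, question):
    """Return an X+V-style page: one XHTML input tied to one voice form.

    input_id and handler_id are assumed to be plain identifiers.
    """
    return f"""<html xmlns="{XHTML_NS}" xmlns:vxml="{VXML_NS}" xmlns:ev="{EV_NS}">
  <head>
    <title>X+V sketch</title>
    <vxml:form id="{handler_id}">
      <vxml:field name="{input_id}_voice">
        <vxml:prompt>{escape(question)}</vxml:prompt>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <input id="{input_id}" name="{input_id}" size="20"
           ev:event="inputfocus" ev:handler="#{handler_id}" />
  </body>
</html>
"""


def dangling_handlers(page):
    """List ev:handler references that do not match any id in the page."""
    root = ET.fromstring(page)
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    handler_attr = f"{{{EV_NS}}}handler"  # ElementTree's {namespace}name form
    return [
        el.get(handler_attr)
        for el in root.iter()
        if el.get(handler_attr) and el.get(handler_attr).lstrip("#") not in ids
    ]


if __name__ == "__main__":
    page = xv_page("to", "voice_city_to", "What departure city do you like?")
    print(page)
    print("dangling handlers:", dangling_handlers(page))
```

A check like `dangling_handlers` is useful in a generator because a mistyped handler id would otherwise fail silently: the page renders, but giving the input focus never starts the voice dialog.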
X + V
Also usable with XForms
–VoiceXML snippets and XForms both update the same XForms instance model
–The shared instance model keeps the modalities synchronized
Models to X + V
Annotate the UI description for speech [Shao2003: Transcoding HTML to VoiceXML Using Annotations]
Extend this approach to UIML and X + V
–Identify particular information structures: text areas, menu/list structures, top-level visual region
–Define their representation in XHTML and VoiceXML
–Generate the synchronization XML Events code
Model to X + V
Define a generic UIML widget vocabulary mapping for both GUI and speech [Plomp2002]
TextEntry
– (VoiceXML)
– (XHTML)
–System.Windows.Forms.TextBox
Collection
– (VoiceXML)
– (XHTML)
–System.Windows.Forms.Panel
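A vocabulary mapping of this kind can be sketched as a lookup table from generic widget classes to target-specific widgets. In the Python sketch below only the two System.Windows.Forms entries are taken from the slide; the VoiceXML and XHTML element names (`field`, `form`, `input`, `div`) are placeholders chosen for illustration, since the slide's own entries are not legible, and the names `VOCABULARY`, `resolve` and `map_parts` are hypothetical.

```python
# Generic UIML class -> concrete widget per target.
# Assumption: the "voicexml" and "xhtml" values are illustrative placeholders;
# only the "winforms" values appear on the slide.
VOCABULARY = {
    "TextEntry": {
        "voicexml": "field",
        "xhtml": "input",
        "winforms": "System.Windows.Forms.TextBox",
    },
    "Collection": {
        "voicexml": "form",
        "xhtml": "div",
        "winforms": "System.Windows.Forms.Panel",
    },
}


def resolve(part_class, target):
    """Return the concrete widget for a generic class on a given target."""
    try:
        return VOCABULARY[part_class][target]
    except KeyError:
        raise KeyError(f"no mapping for class {part_class!r} on target {target!r}") from None


def map_parts(parts, target):
    """Map (part_id, generic_class) pairs to (part_id, concrete_widget) pairs."""
    return [(part_id, resolve(part_class, target)) for part_id, part_class in parts]


if __name__ == "__main__":
    # A tiny UI description: one container holding one text entry.
    ui = [("order", "Collection"), ("to", "TextEntry")]
    for target in ("voicexml", "xhtml", "winforms"):
        print(target, map_parts(ui, target))
```

Because the UI description only names generic classes, the same description can be rendered to the GUI and to speech by switching the target, which is what makes generating both X+V channels from one model feasible.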
Demo
ACCESS NetFront multimodal browser (PocketPC)
–Ordering pizza
–Ordering Chinese
Conclusions
X + V
–built-in modality synchronization
–alternative to building our own multimodal implementation
–declarative
–transformation from UIML possible
Questions?