Architecture for Web Multimodal Application
Jan Sedivy
IBM, Voice Technologies and Systems, Czech Republic, Prague
SpeechTEK, August 21, 2007
© 2007 IBM Corporation

Introduction - need
- Design a simple multimodal architecture.
- The architecture supports all kinds of multimodal applications, from simple form filling to interactive movies with animation.
- Small resource requirements: runs on a PDA and over the Internet.
- Use open standards where possible.
- No compromises in multimodality: the user can switch freely between voice (VUI) and GUI.
- Simple and fast development.

Key Components - approach
- IBM Embedded ViaVoice
- Embedded VoiceXML Browser (EVB), a research prototype
- Standard HTML browser: Internet Explorer or Firefox
- The Adobe Flash Player
- An (XML) protocol that enables an external application to control the browser

Embedded ViaVoice overview
- Embedded ViaVoice® delivers IBM speech technology to mobile devices and automobile components.
- Robust speech recognition with a low error rate, plus text-to-speech
- Statistical language model (SLM) and action classification support free-form commands, so no user's manual is needed
- Embedded grammars or large word lists
- N-best results, confidence scores, out-of-vocabulary detection
- Speaker and noisy-environment adaptation
- Push-to-activate button, automatic gain control, automatic end-of-utterance detection, transient noise detection
- Broad range of languages
- Eclipse-based, easy-to-use developer toolkit
- C/C++ code: highly portable, scalable, small footprint, low CPU MIPS
- IBM provides porting, integration, testing, and consulting services, along with customized development workshops.
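
The N-best list, confidence scores, and out-of-vocabulary detection listed above are typically consumed by application logic that decides whether to accept a result, ask the user to choose, or reject it. The following is only a minimal sketch of that decision, not the Embedded ViaVoice API: the `Hypothesis` type, the `decide` function, the 0.0 to 1.0 confidence range, and both thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    """One entry of a recognizer's N-best list (hypothetical shape)."""
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def decide(nbest: List[Hypothesis],
           accept: float = 0.6,
           margin: float = 0.15) -> Tuple[str, List[str]]:
    """Return (action, candidates) for an N-best list.

    action is "accept", "disambiguate", or "reject".
    The thresholds are illustrative, not values from the talk.
    """
    if not nbest:
        return ("reject", [])
    ranked = sorted(nbest, key=lambda h: h.confidence, reverse=True)
    best = ranked[0]
    # A low top score is treated like an out-of-vocabulary utterance.
    if best.confidence < accept:
        return ("reject", [])
    # Hypotheses scoring close to the best one are ambiguous alternatives.
    close = [h.text for h in ranked if best.confidence - h.confidence < margin]
    if len(close) > 1:
        return ("disambiguate", close)
    return ("accept", [best.text])
```

For example, `decide([Hypothesis("open book", 0.8), Hypothesis("open box", 0.75)])` yields a "disambiguate" action with both phrases, which a GUI could present as two buttons.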

IBM Embedded VoiceXML Browser overview
- Small, fast, and portable Embedded VoiceXML Browser (EVB)
- VoiceXML 2.0 compliant
- Written in plain C++ (no templates, etc.); compact and portable code
- Targeted at small portable devices: PDAs, handhelds, set-top boxes, etc.
- Runs on top of IBM's Embedded Speech Engine and TTS
- Ported to Win32, WinCE (iPAQ), and Linux
- Runs as a viewer: VoiceXML snippets are pushed to the EVB

Flash Player - overview
The Adobe Flash Player is a widely distributed multimedia and application player created and distributed by Macromedia (a division of Adobe Systems). Flash Player runs SWF files that can be created by the Adobe Flash authoring tool, by Adobe Flex, or by a number of other Macromedia and third-party tools.
Flash Player has support for an embedded scripting language called ActionScript (AS), which is based on ECMAScript. ActionScript matured from a script without variables to one that supports object-oriented code.

HTML Browsers - overview
- HTML browsers: MS IE 6, IE 7, Firefox
- The browsers support add-ons

PDA architecture
- GUI: Adobe Flash Player
- VUI: Embedded VoiceXML Browser in viewer mode
- Application control: ActionScript
- ActionScript synchronizes the GUI and VUI and generates:
  - VoiceXML code snippets
  - Dynamic grammars, grammars, prompts (links)
  - All other dialog parameters
- Result processing (n-best, disambiguation, similarity, OOV, ...)
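
To make the idea of a generated VoiceXML snippet with a dynamic grammar concrete, here is a minimal sketch. In the architecture described above this generation is done in ActionScript; Python is used here only for illustration. The function name `build_snippet`, the field name, and the rule id `choice` are assumptions, and the slides do not show the actual snippets sent to the EVB.

```python
import xml.etree.ElementTree as ET
from typing import List

VXML_NS = "http://www.w3.org/2001/vxml"
SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_snippet(field_name: str, prompt: str, phrases: List[str]) -> str:
    """Build a one-field VoiceXML 2.0 form with an inline SRGS grammar.

    The grammar is "dynamic" in the sense that its alternatives come
    from the phrases currently visible in the GUI (for example, links).
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")
    field = ET.SubElement(form, "field", {"name": field_name})
    ET.SubElement(field, "prompt").text = prompt
    grammar = ET.SubElement(field, "grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "root": "choice",  # hypothetical rule id
        "mode": "voice",
    })
    rule = ET.SubElement(grammar, "rule", {"id": "choice"})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        # ElementTree escapes special characters such as "&" on output.
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(vxml, encoding="unicode")
```

Calling `build_snippet("link", "Which page?", ["home", "bookshelf"])` returns a well-formed XML string that could then be handed to a VoiceXML browser running in viewer mode; how the snippet is transported to the EVB is not covered by this sketch.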

Internet Extensions
- EVB Life-Cycle Manager add-on:
  - Starts, initializes, runs, and shuts down the browser
  - Prevents multiple VXML browsers from running at the same time
  - Version policy mechanism that provides new-version notifications
- Security Server:
  - Permits opening a socket in a different domain
  - Used to communicate with the EVB
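
Two of the life-cycle duties above, preventing a second browser instance and notifying about a newer version, can be sketched as follows. This is not the EVB Life-Cycle Manager's implementation, which the slides do not describe. It swaps in a common technique (holding a bound localhost TCP port as an instance lock) and assumes dotted numeric version strings; all function names are hypothetical.

```python
import socket
from typing import Optional, Tuple


def acquire_instance_lock(port: int,
                          host: str = "127.0.0.1") -> Optional[socket.socket]:
    """Try to become the single running instance.

    Returns the bound socket (keep it open for the process lifetime),
    or None if the port is already held, i.e. another instance runs.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()
        return None
    return sock


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "1.10.0" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def update_available(installed: str, latest: str) -> bool:
    """True when the latest published version is newer than the installed one."""
    return parse_version(latest) > parse_version(installed)
```

Comparing versions as integer tuples rather than strings matters: as text, "1.10.0" sorts before "1.2.9", while as tuples it is correctly treated as newer.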

Internet Architecture
[Diagram: Life-cycle manager, Security server, EVB, Add-ons, Browser, Client, Internet]

Sample application - Literacy Tutor
IBM, Corporate Citizenship & Corporate Affairs
Project goals
- Use speech recognition technology, over the web, to help children and adults improve their literacy skills
Value to customer
- Gain literacy skills through practice and positive reinforcement
- Improve pronunciation in a private setting
- Interaction with the tutor character introduces 'fun' and increases computer skills
- Web = anywhere/anytime access:
  - Can resume where left off
  - Can share progress with family
  - Build and share books on the web

Home page

Functionality
- Practice Reading: main application
  - Flash application that uses EVB+EVV to decode speech
  - Flash animates a tutor character that interacts with the reader
- Reporting: performance reports for teachers, indicating students' strengths as well as problem areas
- Book Library: add/remove books from a classroom, rate books, book browser
- Classroom Management: add/delete students, adjust reading level, add/delete classrooms as well as teachers and schools
- Book Authoring: separate tool to author new books

Bookshelf

Children's book/character

Adult book/character

Student Performance

Reading Companion - summary
- We currently have more than 200 schools and not-for-profit organizations participating in the grant program, involving more than 11,000 users (children and adults) in 9 countries: Canada, United States, Spain, United Kingdom, Ireland, South Africa, Mexico, Venezuela, and India.
- Community relations managers are reviewing proposals from prospective organizations, since we hope to expand the program this year to 100 more sites.
- Market value: US$10,000 per site (regardless of the number of users)

Thank You!