Multimodal Architecture for Integrating Voice and Ink XML Formats
Under the guidance of Dr. Charles Tappert
By Darshan Desai, Shobhana Misra, Yani Mulyani, Than NyiNyi
Agenda
- Introduction of Architecture
- System Architecture
- Implemented Design Model
- Sample Dialogue Design
- InkXML
- InkXML Architecture
- Tools Used
- Conclusion
Introduction of Architecture
- Generic nature: supports the development of multimodal applications that integrate speech, ink, and touch-tone digit input, and can interpret unimodal speech, ink, or touch-tone input as well as combined multimodal input.
- The system consists of Ink/Voice SDKs and a multimodal integrator:
  - The Voice SDK provides the voice-processing capabilities.
  - The Ink SDK processes information entered through the ink medium.
  - The multimodal integrator handles disambiguation and errors, and generates confirmation feedback.
- Dialogue design
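The integrator's confirm-or-reprompt behavior described above can be sketched in Java (the implemented interpreter is stated as Java/C++). This is a minimal illustrative sketch; the class name, method names, and confidence threshold are assumptions, not the actual system's API.

```java
// Hypothetical sketch of the multimodal integrator's confirmation loop.
// All names and the 0.5 threshold are illustrative assumptions.
public class MultimodalIntegrator {
    // Minimum recognizer confidence before input is echoed back for confirmation.
    static final double CONFIRM_THRESHOLD = 0.5;

    // Decide the next dialogue act for a recognized speech or ink input.
    static String nextAct(String recognizedText, double confidence) {
        if (confidence < CONFIRM_THRESHOLD) {
            return "REPROMPT";              // too uncertain: ask the user again
        }
        return "CONFIRM:" + recognizedText; // echo back for a yes/no confirmation
    }

    // Apply the user's yes/no answer to a pending confirmation.
    static String resolve(String pendingAct, boolean userSaidYes) {
        if (userSaidYes && pendingAct.startsWith("CONFIRM:")) {
            return "ACCEPT:" + pendingAct.substring(8); // strip "CONFIRM:" prefix
        }
        return "REPROMPT";                  // the "Sorry, my mistake" path
    }

    public static void main(String[] args) {
        String act = nextAct("one eight one four six five", 0.9);
        System.out.println(act);                 // CONFIRM:one eight one four six five
        System.out.println(resolve(act, false)); // REPROMPT
        System.out.println(resolve(act, true));  // ACCEPT:one eight one four six five
    }
}
```

The same loop drives both modalities: a low-confidence or rejected result sends control back to the prompting step, which matches the "Sorry, my mistake" turns in the sample dialogue.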
System Architecture
Implemented Design Model
(Diagram.) Components: PSTN, database, VoiceXML browser, InkXML interpreter (Java/C++), ink input device, Cisco router, speech-to-text engine, TTS engine, handwriting-recognition engine, and voice input/output device.
Sample Dialogue Design (banking information application)

System: You can access your existing account or you can open a new account. What would you like to do?
User: Check existing account.
System: Did you say existing account?
User: Yes.
System: Please enter your account number.
User: One eight one four six five.
System: Did you write one eight seven four six five?
User: No.
System: Sorry, my mistake. Please enter your account number.
User: One eight one four six five.
System: Did you write one eight one four six five?
User: Yes.
System: Please speak your four-digit PIN number.
User: One two three four.
System: Did you say one two three four?
User: Yes.
System: Please use the ink to input your full name.
(Control passes to the ink medium. The system waits for the user to input the new text and submit.)
System: Did you write Haeey Potter?
User: No.
System: Sorry, my mistake. Please use the ink to input your full name.
(Control passes to the ink medium. The system waits for the user to input the new text and submit.)
System: Did you write Harry Potter?
User: Yes.
System: Thank you for accessing your account.
Cont.

System: Choose personal information, checking, or savings.
User: Personal information.
System: Did you say personal information?
User: Yes.
System: What would you like to do? Access your information or change your information.
User: Change information.
System: Did you say change information?
User: Yes.
System: Would you like to change the address, change the telephone number, or exit?
User: Address.
System: Did you say address?
User: Yes.
System: Please enter your new address by ink.
(Control passes to the ink medium. The system waits for the user to input the new text and submit. Once the user has submitted the data, control switches back to voice.)
System: Did you write one martine av white plains new york one zero six zero three?
User: Yes.
System: Your address has been changed.
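The confirmation pattern in the dialogue above maps naturally onto a VoiceXML form. The following is an illustrative sketch of one such step (the field names and grammar URI are hypothetical, not taken from the implemented application):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative VoiceXML sketch of the "enter, echo back, confirm" step;
     field names and the grammar file are assumptions. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="account">
    <field name="acct">
      <prompt>Please enter your account number.</prompt>
      <grammar src="digits.grxml" type="application/srgs+xml"/>
    </field>
    <field name="ok" type="boolean">
      <prompt>Did you say <value expr="acct"/>?</prompt>
      <filled>
        <if cond="!ok">
          <prompt>Sorry, my mistake.</prompt>
          <!-- clearing both fields makes the form re-collect the number -->
          <clear namelist="acct ok"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Clearing the fields on a "no" answer causes the VoiceXML form-interpretation algorithm to revisit the account-number field, reproducing the re-prompt loop in the sample dialogue.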
InkXML
- InkXML's primary goal is to bring the full power of web development and content delivery to ink applications.
- InkXML enables the exchange of virtual ink among devices such as handhelds, laptops, desktops, and servers.
- InkXML will provide the ink component of web-based multimodal applications.
- Numerous standards already exist that are closely related to, or could be used to represent, digital ink (e.g., ITU-T T.150, UNIPEN, and Jot).
- InkXML has two kinds of requirements: functional (enumerating the functions required by ink applications) and pragmatic (making InkXML usable and efficient for developing ink applications).
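For illustration, a digital-ink document of this kind might look like the following. The element names here are assumptions modeled on pen-trace formats such as W3C InkML, not InkXML's actual schema, which the slides do not specify:

```xml
<!-- Hypothetical sketch of an ink document; element names are assumptions
     modeled on trace-based ink formats, not InkXML's actual schema. -->
<ink>
  <traceGroup label="full-name">
    <!-- each trace is one pen-down..pen-up stroke as x y coordinate pairs -->
    <trace>10 20, 12 24, 15 29, 19 33</trace>
    <trace>25 20, 25 35</trace>
  </traceGroup>
</ink>
```

Representing strokes as ordered coordinate traces is what lets the same document be replayed as virtual ink on another device or fed to a handwriting-recognition engine.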
InkXML Architecture
(Layered diagram, top to bottom:)
- Application
- SDK Library
- Ink Log Generator
- API
- Event Handler
- Driver
- Pen Hardware
Tools Used
Software:
- VoiceXML gateway (Nuance Voice Server)
- Tomcat server
- Ink SDK (IBM)
- Windows 2000 Server
- Pingtel softphone (for SIP dial-up)
Hardware:
- Wacom pen tablet
- Cisco 2600 router with FXO card
- Enterprise server
- Microphone and speakers
Conclusion
The proposed architecture for developing multimodal voice/ink applications for noisy mobile environments combines different input modalities to facilitate the development of robust, user-friendly multimodal applications with superior error handling.
We envision that users will soon employ smart devices, such as wireless phones with integrated pen tablets and more powerful processors, to take full advantage of the proposed multimodal voice/ink architecture. Such smart devices should be able to perform enhanced media processing locally, including voice recognition, speech synthesis, and handwriting recognition.
Graphics-generation capabilities on the user's pen tablet should further enhance the efficiency of multimodal applications and may allow applications to be developed for a broader spectrum of the population, including permanently and temporarily disabled users.
Advisors
- Dr. Charles Tappert
- Dr. Zouheir Trabelsi
- Yi-Min Chee (IBM T.J. Watson)
- Dr. Michael Perrone (IBM T.J. Watson)