Multi-Modal Dialogue in Personal Navigation Systems
Arthur Chan
Introduction
The term "multi-modal" is a general description of an application that can be operated in multiple input/output modes.
- Input: voice, pen, gesture, facial expression
- Output: voice, graphical output
Multi-modal Dialogue (MMD) in Personal Navigation Systems
Motivation of this presentation
- Navigation systems give MMD an interesting scenario: a case where MMD is genuinely useful.
Structure of this presentation: three system papers
- AT&T MATCH: speech and pen input, with pen gestures
- SpeechWorks walking-directions system: speech and stylus input
- Univ. of Saarland REAL: speech and pen input; both GPS and a magnetic tracker are used
Multi-modal Language Processing for Mobile Information Access
Overall Function
- A working city guide and navigation system
- Easy access to restaurant and subway information
- Runs on a Fujitsu pen computer
- Users are free to give speech commands or draw on the display with a stylus
Types of Inputs
- Speech input: "show cheap italian restaurants in chelsea"
- Simultaneous speech and pen input: circle an area and say "show cheap italian restaurants in neighborhood" at the same time (see the sketch below)
- Other functionality includes reviews and subway routing
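To make the simultaneous-input idea concrete, here is a minimal sketch, not taken from the MATCH paper, of fusing a circled region with a parsed spoken query. The restaurant records, the parsed constraints, and the point_in_polygon helper are all invented for illustration.

```python
# Toy fusion of a pen gesture (circled area) with a spoken query.
# All data and helper names are hypothetical, not part of MATCH.

def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the circled polygon?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Pen input: the circled neighborhood, as a polygon of map points.
circled_area = [(0, 0), (4, 0), (4, 4), (0, 4)]

# Speech input: "show cheap italian restaurants in neighborhood",
# assumed already parsed into attribute constraints.
spoken_constraints = {"price": "cheap", "cuisine": "italian"}

restaurants = [
    {"name": "Trattoria A", "price": "cheap", "cuisine": "italian", "pos": (1, 1)},
    {"name": "Bistro B", "price": "expensive", "cuisine": "french", "pos": (2, 3)},
    {"name": "Osteria C", "price": "cheap", "cuisine": "italian", "pos": (9, 9)},
]

# Fusion: keep restaurants that satisfy the spoken constraints AND
# fall inside the area indicated by the pen gesture.
matches = [
    r["name"] for r in restaurants
    if all(r[k] == v for k, v in spoken_constraints.items())
    and point_in_polygon(r["pos"], circled_area)
]
print(matches)  # ['Trattoria A']
```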
Input Overview
- Speech input: uses the AT&T Watson speech recognition engine
- Pen input (electronic ink): allows the use of pen gestures, which can be complex pen input; special aggregation techniques are used for these gestures
- Inputs are combined using lattice combination (a simplified sketch follows below)
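The system combines inputs at the lattice level. The toy sketch below substitutes n-best lists for lattices and simply sums log scores over compatible speech/gesture pairs; the hypotheses, scores, and compatibility rule are invented here and do not reproduce the paper's actual algorithm.

```python
# Toy stand-in for multimodal lattice combination: instead of true
# lattices, combine n-best speech and gesture hypotheses by summing
# log scores and keeping only compatible pairs.
speech_nbest = [
    ("show restaurants in this area", -2.1),
    ("show restaurants in this aria", -3.5),
]

gesture_nbest = [
    ("area:chelsea", -0.7),          # circled region read as Chelsea
    ("point:restaurant_42", -1.9),   # a single tap on one restaurant
]

def compatible(speech, gesture):
    """A deictic phrase like 'this area' needs an area-type gesture."""
    if "this area" in speech:
        return gesture.startswith("area:")
    return True

joint = [
    (s, g, s_score + g_score)
    for s, s_score in speech_nbest
    for g, g_score in gesture_nbest
    if compatible(s, g)
]
best = max(joint, key=lambda x: x[2])
print(best)
# ('show restaurants in this area', 'area:chelsea', -2.8)
```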
Pen Gesture and Speech Input
For example:
U: "How do I get to this place?"
S: "Where do you want to go from?"
U: "25th St & 3rd Avenue"
Summary
Interesting aspects of the system:
- Illustrates a real-life scenario where multi-modal inputs can be used
- Design issue: how should different inputs be used together?
- Algorithmic issue: how should different inputs be combined?
Multi-modal Spoken Dialog with Wireless Devices
Overview
- Work by SpeechWorks, jointly conducted by speech recognition and user interface teams
- Two distinct elements:
  - Speech recognition: in an embedded domain, which paradigm should be used? Embedded, network, or distributed speech recognition?
  - User interface: how to "situationalize" the application?
Overall Function
Walking Directions Application
- Assumes the user is walking in an unfamiliar city
- Runs on a Compaq iPAQ 3765 PocketPC
- Users can:
  - Select a city and start/end addresses
  - Display a map and control the display
  - Display directions, including interactive directions in the form of a list of steps
- Accepts speech input and stylus input, but not pen gestures
Choice of speech recognition paradigm
- Embedded speech recognition: only simple commands can be used, due to computational limits
- Network speech recognition: bandwidth is required, and the network connection can sometimes be cut off
- Distributed speech recognition: the client handles the acoustic front-end, the server handles decoding (see the sketch below)
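A rough sketch of the distributed split, assuming the client computes frame-level spectral features and the server runs the decoder. The framing parameters and the stubbed server_decode function are placeholders, not the SpeechWorks implementation.

```python
# Sketch of the distributed-speech-recognition split: the client does
# the acoustic front-end, the server does the decoding.  The feature
# extraction (framed log-energy spectra) is a crude placeholder and
# server_decode() is a stub.
import numpy as np

def client_front_end(samples, sample_rate=8000, frame_ms=25, shift_ms=10):
    """Runs on the handheld: turn raw audio into compact feature frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        frame = samples[start:start + frame_len] * np.hamming(frame_len)
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        frames.append(np.log(spectrum + 1e-10))
    return np.array(frames)  # far fewer bytes than the raw waveform

def server_decode(features):
    """Runs on the server: placeholder for the real decoder."""
    return f"<decoded {features.shape[0]} frames of dim {features.shape[1]}>"

# One second of fake audio standing in for the microphone signal.
audio = np.random.randn(8000)
features = client_front_end(audio)   # this is what crosses the wireless link
print(server_decode(features))
```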
User Interface Situationalization
Potential scenarios:
- Sitting at a desk
- Getting out of a cab, building, or subway and preparing to walk somewhere
- Walking somewhere with hands free
- Walking somewhere carrying things
- Driving somewhere in heavy traffic
- Driving somewhere in light traffic
- Being a passenger in a car
- Being in a highly noisy environment
Their conclusion
The balance of audio and visual information can be reduced to four complementary modes:
- Single-modal: 1. visual mode; 2. audio mode
- Multi-modal: 3. visual-dominant; 4. audio-dominant
A glance at the UI
Summary
Interesting aspects:
- A good discussion of how speech recognition can be used in an embedded domain
- A good discussion of how users would actually use the dialogue application
Multi-modal Dialog in a Mobile Pedestrian Navigation System
Overview
Pedestrian navigation system with two components:
- IRREAL: indoor navigation system, using a magnetic tracker
- ARREAL: outdoor navigation system, using GPS
Speech Input/Output
- Speech input: HTK; IBM ViaVoice Embedded and Logox were being evaluated
- Speech output: Festival
Visual output
- Both 2D and 3D spatialization are supported
Interesting aspects
- Tailors the system to elderly users
- Speaker clustering to improve the recognition rate for elderly speakers
- Model selection: choose between two model sets based on likelihood (see the sketch below)
  - Elderly models
  - Normal adult models
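As a hedged illustration of likelihood-based model selection, the sketch below scores an utterance under two single-Gaussian "models" and picks the higher-scoring one. The real system's acoustic models are not described here; all parameters and data are fabricated.

```python
# Toy likelihood-based selection between an "elderly" and a
# "normal adult" model, each represented here as one diagonal
# Gaussian over feature frames.  Parameters are invented.
import numpy as np

def diag_gaussian_loglik(frames, mean, var):
    """Total log-likelihood of frames under a diagonal Gaussian."""
    return np.sum(
        -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)
    )

models = {
    "elderly": (np.array([1.0, -0.5]), np.array([1.5, 2.0])),
    "normal_adult": (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
}

# Feature frames extracted from the incoming utterance (fabricated).
utterance = np.array([[0.9, -0.4], [1.1, -0.6], [0.8, -0.3]])

# Score the utterance under each model and pick the more likely one;
# decoding would then proceed with that model's recognizer (not shown).
scores = {
    name: diag_gaussian_loglik(utterance, mean, var)
    for name, (mean, var) in models.items()
}
chosen = max(scores, key=scores.get)
print(chosen, scores)
```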
Conclusion
Aspects of multi-modal dialogue:
- What kinds of inputs should be used?
- How can speech and other inputs be combined and interact?
- How will users actually use the system?
- How should the system respond to the users?