LINGTOUR: a PDA for tourists

Alain Goyé, Eric Lecolinet, Mutsuko Tomokiyo, Gérard Chollet
GET-ENST, 46, rue Barrault, 75634 Paris Cedex 13
goye | elc | lin | chollet@enst.fr

Catherine Pelachaud
IUT de Montreuil - Université Paris 8, 140, rue de la Nouvelle France, 93100 Montreuil, France
c.pelachaud@iut.univ-paris8.fr

Ding Xiaoqing, Mao Yuhang
Dept. of Electronic Engineering, Tsinghua University, Beijing, 100084, China
dingxq@tsinghua.edu.cn

Ni Yang
Institut National des Télécommunications, Département Electronique et Physique, 9, rue Charles Fourier, 91011 Evry Cedex, France
yang.ni@int-evry.fr

Multimodal interfaces for a travel assistant

LINGTOUR: a history
Collaboration with Tsinghua University:
– Memorandum of understanding (2000)
– Spoken French-Chinese dictionary with Le Robert
– Master's thesis of Dong Qingfu: "Realization of Intelligent Camera Capable of Character Recognition and Translation"

The LINGTOUR project: multilingual information management
Initially, a PDA for travellers:
– Virtual guide: access to multilingual practical and cultural information for tourists
– Communication assistant: translation help, navigation within a lexicon, and access to typical conversations
– Travel assistant: orientation and interpretation of the surroundings using local and positioning information
A personal assistant (PDA or smartphone) with multimodal and ergonomic capabilities:
– Inputs: text, speech, stylus, images
– Outputs: text, speech, images, video

PDA-server interactions
[Diagram. Labels: multimodal navigation in maps and lexicon; sound capture; selection / extraction of text; refinement / correction of the image; data exchanged (images, sound; images, sound, text); character recognition, speech recognition; multilingual translation, speech synthesis; supervision; Tsinghua University.]

Exploiting the specific features of the PDA
The PDA's multimodal capabilities are exploited as fully as possible:
– The touch screen, microphone, and camera are used jointly as inputs, without any keyboard.
– Graphics and sound are used alternately or simultaneously, depending on the context, to present information.
The PDA connects to the Internet whenever possible:
– to download up-to-date information,
– to offload to a remote server the tasks that are too complex or too costly in memory,
– to allow a human operator to step in if necessary.
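
The offloading rule on this slide can be illustrated with a minimal sketch. The names `Task` and `choose_executor`, the units and the thresholds are all assumptions made for illustration; the slides do not describe how the decision is actually taken.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    memory_kb: int   # estimated memory the task needs (hypothetical unit)
    complexity: int  # rough cost score, higher means heavier (hypothetical scale)


def choose_executor(task: Task, free_memory_kb: int, max_complexity: int, online: bool) -> str:
    """Return where a task would run: "local", "remote" or "deferred"."""
    too_heavy = task.memory_kb > free_memory_kb or task.complexity > max_complexity
    if not too_heavy:
        return "local"      # the PDA can handle it on its own
    if online:
        return "remote"     # export the task to the server
    return "deferred"       # too heavy and no connection: wait for one
```

For example, a lexicon lookup would typically stay on the device, while a heavy translation request would go to the server whenever a connection is available.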

Three types of multimodal interface
Gesture and voice: combination of control menus and voice input
– Controlling zoomable interfaces for graphic or text input
Intelligent camera: image refinement
– Based on the correlation of a series of images
– To improve character recognition
Cultural agents: animated conversational agents adapted to the culture
– Adding non-verbal behaviour to speech (face, eyes, gestures), depending on the culture

Gesture and voice: ZUIs and 2D control menus
PDA constraint: screen size
ZUIs: zoomable user interfaces
– Concept of semantic zoom: progressive disclosure of levels of detail
Control menus [1]:
– Selection and control of the action (movement, zoom) in a single gesture
– No change of context, no need for multiple interactions to perform a single operation
[1] Pook, S., Lecolinet, E., Vaysseix, G. and Barillot, E., Control Menus: Execution and Control in a Single Interactor. Proc. ACM Conf. on Human Factors in Computing Systems (CHI) 2000, 263-264. ACM Press.

Gesture and voice: characteristics of control menus
– Combine the selection and the control of an operation in a single gesture
– Can integrate up to two scroll bars (vertical and horizontal)
– The user keeps their attention on the content
– Can have sub-menus
– Like pie menus [2] and marking menus [3], offer a novice mode and an expert mode:
  – The spatial layout of the menus helps memorization
  – Quick gestures: the menus do not appear on the screen
  – Implicit transition from one mode to the other
[2] Hopkins, D., The design and implementation of Pie menus. Dr Dobb's journal of software tools, 1991, 16 (12), 16-26.
[3] Kurtenbach, G. et al., The Hotbox: efficient access to a large number of menu-items. Proc. ACM – CHI, 1993, 231-327.
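
The behaviour listed above (one gesture both selects and controls an operation, and quick gestures never display the menu) can be sketched as follows. This is not the implementation from [1]: the item names, the activation radius and the display delay are assumed values chosen only for illustration.

```python
import math

ITEMS = ["pan", "zoom", "rotate", "select"]  # hypothetical items, sector 0 centred on +x
ACTIVATION_RADIUS = 30.0  # pixels to cross before an item is selected (assumed value)
MENU_DELAY = 0.3          # seconds of hesitation before the menu is drawn (assumed value)


def pick_item(dx, dy):
    """Map the drag direction to one of the menu sectors."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector_width = 2 * math.pi / len(ITEMS)
    index = int((angle + sector_width / 2) // sector_width) % len(ITEMS)
    return ITEMS[index]


def interpret_gesture(samples):
    """samples: list of (t, x, y) tuples starting at pen-down.

    Returns (item, control_value, menu_was_shown)."""
    t0, x0, y0 = samples[0]
    item = None
    menu_shown = False
    control = 0.0
    for t, x, y in samples[1:]:
        dx, dy = x - x0, y - y0
        dist = math.hypot(dx, dy)
        if item is None:
            if dist >= ACTIVATION_RADIUS:
                item = pick_item(dx, dy)   # selection happens on crossing the radius
            elif t - t0 >= MENU_DELAY:
                menu_shown = True          # novice mode: the user hesitated
        if item is not None:
            control = dist - ACTIVATION_RADIUS  # same gesture keeps driving the operation
    return item, control, menu_shown
```

A fast stroke selects an item before the delay expires, so the menu is never drawn (expert mode); a hesitant stroke displays it first (novice mode). The switch between modes is implicit because it depends only on timing.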

Gesture and voice: applications of control menus
– Navigation in a city map
– Navigation in a lexicon: words and phrases useful to tourists, organized hierarchically into categories such as: accommodation > hotel > reservation…
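
A hierarchical lexicon of this kind can be pictured as a nested mapping. The sketch below is only an illustration: the category path comes from the slide, but the data layout, the helper names `browse` and `children`, and the sample phrases are assumptions.

```python
# Illustrative sample data: categories lead to sub-categories, leaves map a
# French phrase to its Chinese translation.
LEXICON = {
    "accommodation": {
        "hotel": {
            "reservation": {
                "Je voudrais réserver une chambre.": "我想订一个房间。",
            },
        },
    },
    "greetings": {
        "Bonjour.": "你好。",
    },
}


def browse(lexicon, path):
    """Follow a list of category names and return the node reached."""
    node = lexicon
    for category in path:
        node = node[category]
    return node


def children(node):
    """List what a menu would offer at this level: sub-categories or phrases."""
    return sorted(node)
```

Calling `browse(LEXICON, ["accommodation", "hotel", "reservation"])` returns the phrases stored under that category, which a menu level could then display.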

Gesture and voice: multilingual speech recognition
Speech recognition engine:
– Limited vocabulary, but
– Speaker-independent
– No training required
Recognition in several languages:
– Shared common acoustic models, which facilitates future extension to new languages
– Models adaptable to users and to conditions of use
[Diagram. Labels: French; Chinese; common acoustic models; language-specific models.]

Gesture and voice: voice combined with gestures
Voice input is used differently depending on the context:
– Navigation in the map: "tap and talk", i.e. a voice menu gives access to various information about the object pointed at.
– Navigation in the lexicon: first as a shortcut to categories, then to enter words or phrases. The translation is displayed or synthesized in the target language.
– Possible improvement: keyword detection ("word spotting").

The "intelligent" camera: see, recognize and translate
Character recognition, for Chinese in particular, now achieves high performance.
To limit the computing cost:
– Recognition is performed on a sub-part of the image.
– This sub-part can be chosen semi-automatically during a preliminary delimitation and segmentation phase.
The recognized text can then be translated:
– Locally: to facilitate translation, a voice menu lets the user choose the context (bus stop signs, street names, monuments, etc.)
– Or by a remote server, via a radio communication service.
It can also be read aloud by speech synthesis.

Intelligent camera: using the camera [4]
[Figure. Labels: capture; recognition; translation.]
[4] Mao, Y., Dong, Q., Qi, Y. and Chollet, G., Realization of an Intelligent Camera capable of Character Recognition and Translation. Proc. of Sino-French Symp. on Speech and Language Processing, Beijing, October 2000. Available at: http://www.tsi.enst.fr/~chollet/Projets/Chine/Lingtour/IntelCamera.doc

Intelligent camera: improving image resolution
Difficulty:
– Images are taken from a distance, in the street
– Cheaper camera, hence quality and resolution insufficient for recognition
Solution: image refinement
– Correlation and reconstruction of a series of successive images
– Exploitation of the small differences caused by the natural movement of the hand holding the camera
Result: an image with a higher resolution than any single capture.

Intelligent camera: principle of image refinement
[Diagram. Steps:]
– Camera on the PDA, vibration of the hand
– Acquisition of an image sequence
– Sub-pixel motion estimation
– Recomposition into a single image
– Image of better resolution
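
The recomposition step can be illustrated with a deliberately simplified, one-dimensional "shift-and-add" sketch. This is a standard textbook technique, swapped in for illustration; the slides do not say which reconstruction method the project uses. The sub-pixel shifts are assumed to be already known (the motion estimation step is not shown), and the function name is hypothetical.

```python
def shift_and_add(frames, shifts, factor):
    """Merge low-resolution 1-D frames into one signal `factor` times finer.

    frames: list of equal-length lists of samples
    shifts: offset of each frame in fine-grid steps (0 <= shift < factor);
            these stand in for the sub-pixel motion estimates and are
            assumed to be known here
    """
    length = len(frames[0]) * factor
    total = [0.0] * length
    count = [0] * length
    for frame, shift in zip(frames, shifts):
        for i, value in enumerate(frame):
            pos = i * factor + shift   # where this sample falls on the fine grid
            total[pos] += value
            count[pos] += 1
    # Average where several frames landed on the same fine-grid cell;
    # cells that no frame reached stay None (they would need interpolation).
    return [t / c if c else None for t, c in zip(total, count)]
```

Two frames that sample the same scene half a pixel apart fill every cell of a grid twice as fine, whereas a single frame leaves every other cell empty. This is why small hand movements between captures carry usable information.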

Intelligent camera: image refinement results
Notable improvement:
– in visual quality
– in character recognition rate

Cultural agents: why conversational agents?
A conversational agent [5] conveys information in a more attractive and more user-friendly manner than speech synthesis alone.
Non-verbal expressions make it possible:
– to disambiguate the meaning of an utterance,
– to emphasize certain words or utterance fragments…
The agent provides information at different levels:
– syntactic
– semantic
– emotional
In a multicultural context, a visual demonstration can also be a better vehicle for teaching certain customs.
[5] Pelachaud, C., Carofiglio, V., De Carolis, B. and de Rosis, F., Embodied Contextual Agent in Information Delivering Application. First Intl. Joint Conf. on Autonomous Agents & Multi-Agent Systems, Bologna, July 2002.

Cultural agents: "Greta", a facial animation engine
Objective: an animated model able to simulate the dynamic aspects of the human face quickly and realistically.
Implementation: a facial animation engine whose 3D model takes the form of a young woman.
Greta is:
– the core of an MPEG-4 decoder,
– compliant with the "Simple Facial Animation Object Profile" specifications of the standard,
– able to generate the structure of an original model, animate it, and render it in real time.

Cultural agents: adapting conversational agents
Porting animated agents to the PDA:
– The device's processing power and screen size are limited.
– The complexity and level of detail of the animation have to be adapted.
Adapting the behaviour to users:
– Despite recent advances in realism, current agents have only one type of behaviour, which often reflects Western culture.
– Cultural and social adaptation to the context: the same information must be delivered differently, for example to a French person and to a Chinese person, or to a journalist and to a private individual.

Conversational and cultural agents: semantic representation
Basis: a language-independent semantic representation, based on the XML-XSD standard:
– Description of the communicative function of gestures and of the signals that make up the gestures.
Overlay of culture-specific attributes, which influence:
– the choice of a gesture (smile, or shake/nod of the head),
– the duration of a gaze…
More generally, these influences can concern:
– the definition of a signal (masking of one signal by another),
– sound intensity,
– sound duration, etc.
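
The idea of a culture-neutral base with a culture-specific overlay can be sketched as follows. The markup is invented for this example: the element names, attribute names, culture identifiers and values are all hypothetical and do not reproduce the project's actual XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a communicative function with default signals and
# optional per-culture overrides. All names and values are invented.
DOCUMENT = """
<utterance>
  <function name="greeting">
    <default gesture="smile" gaze_ms="800"/>
    <culture id="culture_a" gesture="nod" gaze_ms="1500"/>
    <culture id="culture_b" gaze_ms="300"/>
  </function>
</utterance>
""".strip()


def resolve(function_element, culture_id):
    """Start from the default signals and apply the culture-specific overlay."""
    signals = dict(function_element.find("default").attrib)
    for overlay in function_element.findall("culture"):
        if overlay.get("id") == culture_id:
            overrides = {k: v for k, v in overlay.attrib.items() if k != "id"}
            signals.update(overrides)
    return signals


root = ET.fromstring(DOCUMENT)
greeting = root.find("function[@name='greeting']")
```

Keeping the overrides separate from the defaults means the same communicative function can be rendered for a new culture by adding one overlay entry, without touching the base description.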

Conversational and cultural agents…
– In certain cultures, not looking at one's interlocutor can be interpreted as a lack of attention or interest.
– In other cultures, looking someone straight in the eye can be interpreted as a form of aggression.

Results and next steps
At the end of the work that this project has initiated, we hope to be in a position to demonstrate:
1) the feasibility of integrating on a mobile terminal (PDA, smartphone…) the various interfaces presented here:
– 2D control menus (gesture and voice),
– capture and recognition of text (intelligent camera),
– conversational agents (cultural agents);
2) the benefits of the improvements we recommend for each of these functionalities:
– integration of voice commands in the menus,
– image refinement by spatio-temporal correlation,
– enrichment of the agents with cultural attributes.

Evaluating this work within the EURO-CHINA programme…
– Collaboration under way with Peer2Phone (voice over IP via Wi-Fi)
– Presentation at the end of April in Beijing
– A proposal with our Chinese partners for the Olympics in Beijing
